Here are the rules of engagement for the cluster
The cluster is a very versatile tool that can help us work together to achieve ambitious goals. But as with any shared resource, certain rules must apply to all of us to make cooperation possible, and as headache-free as possible.
The following are general rules.
1) Do not copy datasets around.
The cluster can store an impressive amount of data, but not infinitely. If everyone copied datasets around, we'd fill up the storage space pretty quickly. We also have to be very careful that we know exactly what versions of the datasets we are working with, and where they are stored.
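One way to use a shared dataset without copying it is to create a symbolic link to it from your own working directory. This is a minimal sketch, not an official cluster tool. A link takes up almost no space and always points at the single shared copy, so it stays clear which version you are using and where it is stored. The function name `link_dataset` and any paths passed to it are hypothetical; use the dataset locations defined for the cluster.

```python
from pathlib import Path


def link_dataset(shared: Path, workdir: Path) -> Path:
    """Expose a shared dataset inside workdir via a symlink instead of a copy.

    Hypothetical helper: `shared` is the dataset's shared location,
    `workdir` is your own project directory.
    """
    shared = shared.resolve()
    if not shared.exists():
        raise FileNotFoundError(shared)

    workdir.mkdir(parents=True, exist_ok=True)
    link = workdir / shared.name

    if link.is_symlink() or link.exists():
        # Never overwrite something that is not already a link to this dataset.
        if link.resolve() != shared:
            raise FileExistsError(f"{link} exists and does not point to {shared}")
        return link

    # The link only stores the target path; no data is duplicated.
    link.symlink_to(shared, target_is_directory=shared.is_dir())
    return link
```

Calling it a second time with the same arguments simply returns the existing link. If a regular file or directory with the same name is already in the working directory, it raises `FileExistsError` instead of replacing it.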
2) Do not (try to) hog the system resources.
This one can be tricky, because hogging resources can be completely involuntary. For instance, some Python packages, such as NumPy, quietly start extra threads to speed up computation: NumPy's linear algebra routines rely on a BLAS library (such as OpenBLAS or MKL), which by default may try to use every CPU core on the machine.
Like everything else, processing power is not an infinite resource, and we must be careful not to hog it.
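A common way to keep NumPy from grabbing every core is to cap its thread pools with environment variables before the library is loaded. The sketch below assumes the BLAS backend is OpenBLAS or MKL (both honour these variables). The helper name `limited_thread_env` and the thread count are hypothetical; use the number of cores you were actually allocated.

```python
import os

# Variables read at start-up by the OpenMP runtime and by the
# OpenBLAS / MKL libraries that NumPy commonly links against.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limited_thread_env(n_threads: int, base=None) -> dict:
    """Return a copy of the environment with BLAS/OpenMP thread pools capped.

    Hypothetical helper: pass the result as `env=` when launching a job,
    so the limits are in place before NumPy is imported.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    env = dict(os.environ if base is None else base)
    for var in THREAD_VARS:
        env[var] = str(n_threads)
    return env
```

For example, `subprocess.run(["python", "my_script.py"], env=limited_thread_env(4), check=True)` starts a (hypothetical) script with at most 4 BLAS threads. Setting the variables inside an already running script only works if it happens before `import numpy`, because the libraries read them once when they are loaded.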
3) Do not try to "cheat the system"
If any facet of the cluster seems off or counterintuitive to you, please do not try to hack around it. The cluster's configuration is very specific, and suited to