How do I get an account for Memex?
To request an account you should contact your departmental IT Manager and get approval from your PI:
DGE/DPB - Garret Huntress or request HPC Access
DTM - Michael Acierno
EMB - Ed Hirschmugl or Fred Tan
GL - Gabor Szilagyi
OBS - Chris Burns or Andrew Benson
HQ - Floyd Fayton
Everyone with a valid Memex login account will be automatically added to our Memex-Announce Google Group (memex-announce@carnegiescience.edu).
How do I login to memex?
The basic command to login from Putty or any terminal:
$ ssh username@memex.carnegiescience.edu
Password:
Last login: Fri Jul 28 09:15:05 2017 from 192.70.249.30
Rocks 6.2 (SideWinder)
Profile built 02:15 24-Nov-2015
Kickstarted 02:20 24-Nov-2015
Login Node
____ _ ____ _
/ ___|__ _ _ __ _ __ ___ __ _(_) ___/ ___| ___(_) ___ _ __ ___ ___
| | / _ | __| _ \ / _ \/ _ | |/ _ \___ \ / __| |/ _ \ _ \ / __/ _ \
| |__| (_| | | | | | | __/ (_| | | __/___) | (__| | __/ | | | (_| __/
\____\__,_|_| |_| |_|\___|\__, |_|\___|____/ \___|_|\___|_| |_|\___\___|.edu
|___/
…………. (bunch of machine information)
[username@memex ~]$
where your username/password is the same as your username/password for Gmail. For example, if your email is bush@carnegiescience.edu your login for Memex is bush@memex.carnegiescience.edu.
What partition or queue should I use?
The majority of the partitions corresponding to each Department (DGE, DPB, DTM, EMB, GL, HQ, and OBS), except for the GPU, SHARED, and PREEMPTION partitions. Typing "sinfo -a -s", will give you a summary of all the partitions. All of the Department partitions have no time limit or memory limits (up to 128GB) for its nodes. The GPU nodes have the same specifications except they also have an NVIDIA K80 GPU. Department nodes are also shared in SHARED or PREEMPTION, which have limits to make sure Department nodes are generally available to Departments.
Nodes in the SHARED partition have a two hour limit for all of its jobs. This ensures no Department user is waiting more than two hours to use their Department's nodes.
Nodes in the PREEMPTION partition have a 7 day time limit and are limited to 40GB of memory per node job. PREEMPTION jobs can be suspended if a user (who must be apart of the Department that owns the node) requests those nodes via their Departments partition. **Note: Maximum memory usage per node for PREEMPTION is not enforced but testing is underway now to fix this issue.
Please use your own discretion for which shared partition suites your needs, SHARED or PREEMPTION.
Most nodes (memex-g[01-02],memex-c[001-100]) have 128GB, 24-cores, and 1.6TB local storage (~30GB /tmp). Nodes, memex-c[101-108] have 128GB, 28-cores but only 250GB local storage (~30GB /tmp).
Parition Name |
Wallclock Hours |
Total Cores |
Number of Nodes |
MemLimit per Node (GB) |
---|---|---|---|---|
SHARED | 2 | 2160 | 78 | 128 |
PREEMPTION | 168 | 696 | 30 | 40 |
DGE | unlimited | 240 | 10 | 128 |
DPB | unlimited |
240 | 10 | 128 |
DTM | unlimited |
960 | 36 | 128 (memex-c[109-124] have 256GB) |
EMB | unlimited |
120 | 5 | 128 |
GL | unlimited |
240 | 10 | 128 |
HQ | unlimited |
120 | 4 | 128 |
OBS | unlimited |
960 | 40 | 128 |
GPU | unlimited |
48 (+2 K80s) |
2 | 128 |
How do I run a job on Memex?
For an interactive job use SLURM’s “salloc” or "srun" from the command line. Here’s an example to grab 4 nodes in a bash shell,
$ salloc -N 4 bash
or
$ srun -N4 --pty bash -i
Then run your application with 96 CPUs (24 CPUs per node),
$ mpirun -n 96 a.out < input_file > output_file
For a batch/non-interactive job, here is a SLURM batch script,
$ cat batch_script.sh #!/bin/bash #SBATCH --nodes=1 # only grab one node on Memex #SBATCH --ntasks-per-node=24 # 24 is the max for most nodes #SBATCH --time=02:00:00 # two hour limit for SHARED only, 7d for PREEMPTION #SBATCH -p SHARED,PREEMPTION # if SHARED isn’t available, 2nd choice is PREEMPTION #SBATCH --mem-per-cpu=2000 # default is 1000M, and PREEMPTION has a 40000M total limit per node #SBATCH --output=slurm-${SLURM_JOBID}.out # you can specify directory and file name of SLURM’s output log #SBATCH --error=slurm-${SLURM_JOBID}.out # you can specify directory and file name of SLURM’s error log #SBATCH --mail-user=username@carnegiescience.edu # Use your email for notifications #SBATCH --mail-type=FAIL # only send email if the job fails (BEGIN,END,SUSPEND are options as well and can be used in combination) echo "SLURM_JOBID: " $SLURM_JOBID echo "SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID echo "SLURM_ARRAY_JOB_ID: " $SLURM_ARRAY_JOB_ID echo “$SLURM_JOB_NODELIST” module load Intel/2018 mpirun -n $SLURM_NTASKS a.out 2>1 slurm-${SLURM_JOBID}.out # if you redirect your applications output to append the SLURM output
Submit the script to SLURM with
$ sbatch < batch_script.sh
Monitor all user jobs with,
For more information, please send an email to memexsupport@memex.carnegiescience.edu.
How do I use job dependencies in SLURM?
Here is an example of a regular shell script to submit jobs with dependencies:
#!/bin/bash
# Launch first job
JOB=`sbatch job.sh | egrep -o -e "\b[0-9]+$"`
# Launch a job that should run if the first is successful
sbatch --dependency=afterok:${JOB} after_success.sh
# Launch a job that should run if the first job is unsuccessful
sbatch --dependency=afternotok:${JOB} after_fail.sh
Where the content of job.sh, after_success.sh, and after_fail.sh are SLURM batch scripts.
How do I monitor my job on Memex?
There are several ways to monitor your job on Memex. If you want to monitor the progress of your application, you can “tail -f output_file.txt”, where the filename is specified in your batch or interactive SLURM submission. For example,
mpirun -n a.out 2>1 output_file.txt
If you’re interested in monitoring resource usage (CPU/memory/network), you can go to our Ganglia page, or use the following command for running jobs only:
sstat --format=AveCPU,AveCPUFreq,MaxDiskRead,MaxDiskWrite,AvePages,MaxRSS,MaxVMSize,JobID,NTasks -j XXXXX.batch
Where XXXXX is a SLURM jobid. Otherwise, please email memexsupport@carnegiescience.edu for other options.
How do I backup my Memex data?
We do NOT have a central backup in place for user data. However, you can backup your critical data to Google Drive using the “rclone” tool on Memex.
Here's a video on how to setup for Google Drive:
Here are examples of how to use Rclone commands:
$ rclone ls remote:path
$ rclone copy /local/path remote:path # copies /local/path to the remote
$ rclone sync /local/path remote:path # syncs /local/path to the remote
Also, please don't use spaces for the names of your Google Drive backups! For example, if you're backing up a directory or file, please make a path with no spaces on GDrive (i.e. rclone sync /home/user/bio GDrive:biobackup). If you have issues, please submit a ticket to memexsupport@carnegiescience.edu.
How can I request software?
Please email memexsupport@carnegiescience.edu for request software to be added to Memex. Normally this request can take up to one week to fulfil but please be advised, some applications may not work on Memex. If the application can be installed, a module should be made available as well.
How to monitor Memex's hardware?
Visit our Ganglia page by browsing to http://10.15.176.1/ganglia, after you've logged into Memex. Make sure you log into Memex using X11 forwarding ("ssh -X username@memex.carnegiescience.edu" or "ssh -XY username@memex.carnegiescience.edu").
For slower connections, setting up VNC (from Mac or Windows, using memex.carnegiescience.edu not calc.d**.carnegiescience.edu) would be a better option.
For Memex news and events?
Subscribe to Carnegie's HPC calendar for important updates and if you have a Memex account, you'll automatically receive emails from the official mailing list (memex-announce@carnegiescience.edu). For general questions that are not affecting your work, please join the Memex Discuss mailing list or follow our Slack Channel for Carnegie HPC.
How to use Lustre?
Here is general information on Lustre,
https://www.nics.tennessee.edu/computing-resources/file-systems/io-lustr...
Linux Basics for HPC....