This page describes how to use VASP6 on the LCRC cluster Blues at Argonne National Laboratory.
To use VASP6 on Blues, put the executables on your PATH by adding the following line to your ~/.soft file for Blues' Sandy Bridge nodes.
PATH += /soft/vasp/6.beta/snb
To use the build of VASP6 tuned for the Haswell nodes, use the following line instead.
PATH += /soft/vasp/6.beta/hsw
You should not have both the snb and hsw VASP directories in your PATH variable simultaneously. If you do, only the executables found in the first VASP directory listed in PATH will be used.
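To confirm which build your shell will pick up, you can ask bash to list every match on PATH; the first one listed is the one that runs. (vasp_std is the conventional name of the standard VASP executable and is an assumption here.)

```shell
# List every vasp_std found on PATH; the first match wins at run time
type -a vasp_std || echo "vasp_std not found in PATH"
```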
You can also specify the full path to the executable instead of modifying PATH. For example, the multiple k-point or standard (std) build for Sandy Bridge lives under /soft/vasp/6.beta/snb.
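As a sketch, assuming the conventional executable name vasp_std (an assumption; check the actual names with ls on the build directory), the full path can be assembled like this:

```shell
VASP_DIR=/soft/vasp/6.beta/snb   # Sandy Bridge build directory from this page
VASP_EXE=$VASP_DIR/vasp_std      # vasp_std name is assumed; verify with: ls $VASP_DIR
echo "$VASP_EXE"                 # -> /soft/vasp/6.beta/snb/vasp_std
```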
The following softenv keys are required.
The number of MPI processes per node, the number of OpenMP threads per process, and the CPU affinity must all be defined correctly to use VASP6 efficiently.
If you want to use only a single OpenMP thread per MPI process, use the following command in your bash run script; VASP will then perform like a standard MPI program.
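A minimal sketch of that setting in a bash run script, using the standard OpenMP environment variable:

```shell
# One OpenMP thread per MPI process: VASP behaves like a plain MPI code
export OMP_NUM_THREADS=1
```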
The NCORE and NPAR tags in the INCAR file cause VASP6 to distribute work the way VASP5 does. With OMP_NUM_THREADS=1, NCORE=8 is recommended on Sandy Bridge nodes and NCORE=16 on Haswell nodes.
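For reference, a minimal INCAR fragment for a pure-MPI run on a Sandy Bridge node might look like this (only the NCORE line comes from this page; add your own calculation settings around it):

```
# Pure-MPI setup on a Sandy Bridge node (OMP_NUM_THREADS=1)
NCORE = 8    ! band parallelization over groups of 8 cores
```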
If you use more than one OpenMP thread per MPI process, then you should set OMP_NUM_THREADS to a divisor of the number of cores per processor. On the Sandy Bridge nodes, each processor has eight cores, so good choices for OMP_NUM_THREADS are 1, 2, 4, and 8. It is hard to predict which value of OMP_NUM_THREADS is optimal because it depends on the number of nodes, the size of the caches, the topology of the processor, and the size of your model. You need to experiment by varying OMP_NUM_THREADS and the number of MPI processes in a scaling study on a short test (i.e., ten SCF steps) of your model.
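As a sketch, the candidate configurations on a 16-core Sandy Bridge node can be enumerated like this (a dry run that only prints the combinations; substitute your actual mpiexec command when running the scaling study):

```shell
# Enumerate MPI-process / OpenMP-thread combinations for a 16-core node
cores_per_node=16
for threads in 1 2 4 8; do
  procs=$((cores_per_node / threads))
  echo "OMP_NUM_THREADS=$threads -> $procs MPI processes per node"
done
```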
OMP_NUM_THREADS greater than one is most beneficial when you want to use more than one core per atom in VASP. We have found that an optimized value of OMP_NUM_THREADS extends fair parallel scaling to roughly twice as many nodes, so running more than one OpenMP thread per MPI process can perform better than a single thread per process.
It is important to define CPU affinity when using OMP_NUM_THREADS > 1. The VASP executables on Blues were built with the mvapich2-2.3a libraries. Use the MV2_CPU_MAPPING environment variable to pin all the threads of an MPI process to consecutive cores. The following script uses four OpenMP threads per MPI process and defines a suitable environment for running VASP6 on a Sandy Bridge node.
Start of Example Script
#PBS -l nodes=8:ppn=4
#PBS -l walltime=$ll
# send mail at begin, end, abort, or never (b, e, a, n):
#PBS -m bea
# Note that 4 MPI processes per node with 4 threads per process
# will use 16 cores per node.
ulimit -s unlimited
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=4
mpiexec -hostfile $PBS_NODEFILE \
        -np $(wc -l < $PBS_NODEFILE) \
        -env MV2_CPU_MAPPING 0-3:4-7:8-11:12-15 \
        vasp_std   # standard-build executable, assumed to be on your PATH
End of Example Script
Note that MV2_CPU_MAPPING=0-3:4-7:8-11:12-15 assigns four threads from MPI process 0 to cores 0 through 3, four threads from MPI process 1 to cores 4 through 7, four threads from MPI process 2 to cores 8 through 11, and four threads from MPI process 3 to cores 12 through 15.
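The mapping string can be generated mechanically for any process and thread count. Here is a small bash sketch of consecutive-core pinning for the 4-process, 4-thread case from the script above:

```shell
# Build an MV2_CPU_MAPPING string: one colon-separated core range per
# MPI process, each range 'threads' consecutive cores wide
threads=4
procs=4
map=""
for ((p = 0; p < procs; p++)); do
  start=$((p * threads))
  end=$((start + threads - 1))
  map="${map:+$map:}$start-$end"
done
echo "$map"   # -> 0-3:4-7:8-11:12-15
```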