LAMMPS Molecular Dynamics Simulator

From the LAMMPS website http://www.lammps.org

LAMMPS is a classical molecular dynamics code with a focus on materials modeling. It’s an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.

LAMMPS has potentials for solid-state materials (metals, semiconductors) and soft matter (biomolecules, polymers) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. Many of its models have versions that provide accelerated performance on CPUs, GPUs, and Intel Xeon Phis. The code is designed to be easy to modify or extend with new functionality.

Building LAMMPS on CSD3

Because LAMMPS has so many possible configurations, we normally find it more useful to help users compile their specific setup than to provide a global install. For example, to build a configuration of LAMMPS that supports parallel computation on CPUs using the ReaxFF force-field:

# module setup
module purge
module load rhel8/default-ccl
module load cmake/3.29.4/gcc/cery5wyj
# download LAMMPS
git clone --branch stable_29Aug2024_update2 --depth=1 https://github.com/lammps/lammps.git
cd lammps
# create and switch to build directory
mkdir build && cd build

# setup with Intel compilers preset.
cmake -C ../cmake/presets/intel.cmake ../cmake
# activate the reaxff package
cmake -D PKG_REAXFF=yes .

# compile lammps with mpi support
cmake --build . --clean-first

cd ../..

This will produce an executable lmp in the build directory, which you can then use with the reaxff pair style.
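
As a quick sanity check (a sketch, assuming the directory layout created above), the help output of the executable lists the packages compiled into it, and REAXFF should appear among them:

# print the help text and show the "Installed packages" section
./lammps/build/lmp -h | grep -i -A5 "installed packages"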

Some LAMMPS packages provide versions optimised for the specific hardware on CSD3. For example, the EAM force-field has CPU- and GPU-specific optimisations. To build a configuration that supports running EAM calculations on CPU:

# load the CSD3 modules for CPU, INTEL and CMAKE 
module purge
module load rhel8/cclake/intel
module load intel-oneapi-mkl/2024.1.0/intel/vnktbkgm
module load cmake/3.29.4/gcc/cery5wyj
# download LAMMPS
git clone --branch stable_29Aug2024_update2 --depth=1 https://github.com/lammps/lammps.git
cd lammps

# create and switch to build directory
mkdir build && cd build

# setup with Intel compilers preset.
cmake -C ../cmake/presets/intel.cmake ../cmake
# activate support for EAM forcefields
cmake -D PKG_MANYBODY=yes .
# activate intel-optimised support for intel architectures
cmake -D PKG_INTEL=yes -D INTEL_ARCH=cpu -D INTEL_LRT_MODE=c++11 .

# compile lammps 
cmake --build . --clean-first
cd ../..

This will produce the executable lmp for the Intel CPU architecture; to benefit from the Intel optimisations at run time, the intel suffix needs to be enabled, as sketched below.
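
A minimal sketch, assuming a hypothetical input file in.eam that uses an eam pair style:

# hypothetical run on a cclake node: -sf intel switches supported pair styles
# (such as eam) to their INTEL-optimised variants
mpirun -np 4 ./lammps/build/lmp -sf intel -in in.eam

To build a configuration that supports running EAM calculations on the Ampere partition: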

# load the CSD3 modules for the GPU architecture

module purge
module load rhel8/default-amp 
module load cmake/3.21.4/gcc-9.4.0-pucmh2y 
module load gcc/9.4.0/gcc-11.2.0-72sgv5z
# download LAMMPS
git clone --depth=1 --branch stable_29Aug2024_update2 https://github.com/lammps/lammps.git lammps_gpu

cd lammps_gpu
# create and switch to build directory
mkdir build && cd build

# setup with gcc compiler preset.
cmake -C ../cmake/presets/gcc.cmake ../cmake
# activate support for EAM forcefields
cmake -D PKG_MANYBODY=yes .
# activate support for Ampere GPUs
cmake -D PKG_GPU=yes -D GPU_ARCH=sm_80 -D GPU_API=cuda .
# activate colloid package (used in sbatch example). Replace with any package you plan to use
cmake -D PKG_COLLOID=yes .

# compile lammps 
cmake --build . --clean-first

cd ../..

This will also produce an executable lmp in the build directory, which you can use with the eam pair style.
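
At run time the GPU acceleration is enabled with the gpu suffix and package options, the same -sf gpu -pk gpu flags used in the sbatch script in the next section. A minimal sketch, assuming a hypothetical input file in.eam and a single GPU:

# hypothetical interactive test on a GPU node: 2 MPI ranks sharing 1 GPU,
# with -sf gpu selecting the GPU-accelerated variants of supported styles
mpirun -np 2 ./lammps_gpu/build/lmp -sf gpu -pk gpu 1 -in in.eam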

Running LAMMPS

LAMMPS can be run with an sbatch script similar to:

#!/bin/bash
#! Update the account (-A) and partition (-p) in the following two lines to suit your needs.
#SBATCH -A MYACCOUNT
#SBATCH -p ampere
#SBATCH -t 8:00:00
#SBATCH --exclusive
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8
#SBATCH --cpus-per-task=8
#SBATCH -o lammps.out
#SBATCH -e lammps.err

module purge
# Load the default module environment for the partition set in the SBATCH directives above.
module load rhel8/default-amp


app="lammps_gpu/build/lmp" # Assuming lammps was installed on this directory, see setup. Edit otherwise.
infile="lammps_gpu/examples/colloid/in.colloid"

# Create an MPS wrapper script to run multiple MPI tasks per GPU
rm -f mps-wrapper.sh

cat <<EOF >> mps-wrapper.sh
#!/bin/bash
# Example mps-wrapper.sh usage:
# > mpirun [mpirun args] ./mps-wrapper.sh [cmd] [cmd args]


# Set CUDA device
numa_nodes=\$(hwloc-calc --physical --intersect NUMAnode \$(hwloc-bind --get --taskset))
export CUDA_VISIBLE_DEVICES=\$numa_nodes

# Wait for MPS to start
sleep 1

# Run the command
numactl --membind=\$numa_nodes "\$@"
result=\$?

exit \$result
EOF

# Start MPS Server before running mpirun
# Only this path is supported by MPS
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log-$(id -un)
# Launch MPS from a single rank per node
CUDA_VISIBLE_DEVICES=0,1,2,3 nvidia-cuda-mps-control -d

chmod +x ./mps-wrapper.sh
cmd="mpirun --npernode ${SLURM_NTASKS_PER_NODE} -np ${SLURM_NTASKS} --bind-to none ./mps-wrapper.sh $app -sf gpu -pk gpu 4 -i $infile"
echo $cmd
eval $cmd

# Shutdown MPS control daemon before finishing
echo quit | nvidia-cuda-mps-control

where we are using the input file in.colloid to run on the GPU system, making use of all 4 GPUs on 1 compute node and running 2 MPI tasks per GPU thanks to the CUDA Multi-Process Service (MPS) wrapper script created inside the job script. This wrapper script allows multiple MPI processes to run efficiently and concurrently on a single GPU by exploiting the Hyper-Q capability of recent NVIDIA cards. This optimises GPU usage when each MPI process uses only a fraction of the available GPU memory (80 GB on the Ampere cards). Without MPS, the MPI processes sharing a GPU would be serialised, wasting GPU time.
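
To confirm that the MPS daemon is active during the job, one option (a sketch, not required for the run) is to query the control daemon and inspect the compute processes with nvidia-smi:

# ask the MPS control daemon for the PIDs of the running MPS servers
echo get_server_list | nvidia-cuda-mps-control
# nvidia-smi should show the LAMMPS ranks alongside the nvidia-cuda-mps-server process
nvidia-smi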

To use more or fewer GPUs, change the --nodes and --tasks-per-node options, bearing in mind that our GPU compute nodes have 4 GPUs per node. As always, the -A option should be changed to your specific SLURM account, and the job time limit can be adjusted with the -t option.
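
For example, a sketch of the directives for a two-node run of the same script (8 GPUs in total), keeping 2 MPI tasks per GPU; the values are illustrative:

#SBATCH --nodes=2            # 2 nodes x 4 GPUs per node = 8 GPUs
#SBATCH --tasks-per-node=8   # still 2 MPI tasks per GPU on each node
#SBATCH --cpus-per-task=8

The mpirun line itself does not need editing, since SLURM_NTASKS and SLURM_NTASKS_PER_NODE follow these directives and the -pk gpu 4 option counts GPUs per node. Note, however, that the example script starts the MPS control daemon only on the node where the batch script runs, so for a multi-node job it would need to be launched on every node.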