GPU: 4 x GeForce RTX 3080
CPU: 2 x Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz

Command (shown for 4 MPI tasks):
mpirun -np 4 lmp -sf gpu -in in.gpu

Input script (in.gpu):
# 3d Lennard-Jones melt
variable x index 1
variable y index 1
variable z index 1
variable xx equal 20*$x
variable yy equal 20*$y
variable zz equal 20*$z
units lj
atom_style atomic
lattice fcc 0.8442
region box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box 1 box
create_atoms 1 box
mass 1 1.0
velocity all create 1.44 87287 loop geom
pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5
neighbor 0.3 bin
neigh_modify delay 0 every 20 check no
fix 1 all nve
run 100000
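
For reference, the system size follows from the script itself: an fcc lattice has 4 atoms per unit cell, and the box is 20*x by 20*y by 20*z cells. Below is a minimal Python sketch of that arithmetic. The helper name atom_count is made up for illustration and is not part of LAMMPS.

```python
# Rough size estimate for the input script above.
# Assumption: the box is completely filled by create_atoms (no vacancies).
ATOMS_PER_FCC_CELL = 4  # basis atoms in a conventional fcc unit cell


def atom_count(x=1, y=1, z=1, cells_per_unit=20):
    """Atoms created for index variables x, y, z (hypothetical helper)."""
    nx, ny, nz = cells_per_unit * x, cells_per_unit * y, cells_per_unit * z
    return ATOMS_PER_FCC_CELL * nx * ny * nz


n_atoms = atom_count()  # x = y = z = 1, as in the script
print("atoms:", n_atoms)

# Atoms per MPI task for the task counts used in the runs.
for tasks in (1, 2, 3, 4, 24, 36, 48):
    print(f"{tasks:2d} tasks -> {n_atoms / tasks:8.1f} atoms per task")
```

With the defaults this gives 32000 atoms, which is a small system. That may be part of why adding devices helps less than one would hope, though the numbers alone do not prove it.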
GPU runs, 1/2/3/4 GPUs:
Performance: 558428.477 tau/day, 1292.659 timesteps/s
91.2% CPU use with 1 MPI tasks x 1 OpenMP threads
Performance: 710200.030 tau/day, 1643.982 timesteps/s
91.8% CPU use with 2 MPI tasks x 1 OpenMP threads
Performance: 861403.974 tau/day, 1993.991 timesteps/s
91.8% CPU use with 3 MPI tasks x 1 OpenMP threads
Performance: 1002403.932 tau/day, 2320.379 timesteps/s
89.5% CPU use with 4 MPI tasks x 1 OpenMP threads
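
To put the GPU numbers in perspective, here is a small Python sketch that turns the timesteps/s values above into speedup and parallel efficiency relative to the single-task run. The function name strong_scaling is just an illustration. The rates are copied from the log lines.

```python
# Timesteps/s copied from the GPU log lines above, keyed by MPI task count.
gpu_rate = {1: 1292.659, 2: 1643.982, 3: 1993.991, 4: 2320.379}


def strong_scaling(rate):
    """Return {n: (speedup, efficiency)} relative to the smallest run."""
    n0 = min(rate)
    result = {}
    for n, r in sorted(rate.items()):
        speedup = r / rate[n0]
        # Efficiency = measured speedup divided by the ideal speedup n / n0.
        result[n] = (speedup, speedup / (n / n0))
    return result


gpu_scaling = strong_scaling(gpu_rate)
for n, (speedup, eff) in gpu_scaling.items():
    print(f"{n} task(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

By this measure the 4-task run is roughly 1.8x faster than the 1-task run, i.e. about 45% parallel efficiency.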
CPU-only runs, 24/36/48 MPI tasks:
Performance: 546301.574 tau/day, 1264.587 timesteps/s
99.7% CPU use with 24 MPI tasks x 1 OpenMP threads
Performance: 788594.307 tau/day, 1825.450 timesteps/s
99.6% CPU use with 36 MPI tasks x 1 OpenMP threads
Performance: 864797.031 tau/day, 2001.845 timesteps/s
99.4% CPU use with 48 MPI tasks x 1 OpenMP threads
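
As a sanity check on the two units in the Performance lines, and to compare the best CPU-only run with the best GPU run, here is a short Python sketch. It assumes the default timestep of 0.005 tau for "units lj", since the script does not set one. The variable names are illustrative only.

```python
DT_LJ = 0.005            # default timestep for "units lj" (script sets none)
SECONDS_PER_DAY = 86400


def tau_per_day(steps_per_s, dt=DT_LJ):
    """Convert timesteps/s to tau/day."""
    return steps_per_s * dt * SECONDS_PER_DAY


# Timesteps/s copied from the log lines above.
cpu_rate = {24: 1264.587, 36: 1825.450, 48: 2001.845}
best_gpu_rate = 2320.379  # 4 MPI tasks with -sf gpu

# Cross-check against the reported tau/day for the 48-task run.
print(f"48 tasks: {tau_per_day(cpu_rate[48]):.1f} tau/day (log: 864797.031)")

# CPU scaling from 24 to 48 tasks, and best GPU run vs. best CPU run.
cpu_eff_48 = (cpu_rate[48] / cpu_rate[24]) / (48 / 24)
gpu_vs_cpu = best_gpu_rate / cpu_rate[48]
print(f"CPU efficiency, 24 -> 48 tasks: {cpu_eff_48:.0%}")
print(f"4-task GPU run vs 48-task CPU run: {gpu_vs_cpu:.2f}x")
```

Going from 24 to 48 CPU tasks gives about 1.58x (roughly 79% efficiency), and the 4-task GPU run is about 1.16x faster than the 48-task CPU run.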