I am currently trying to reduce the run-time of the MC portion by increasing the number of nodes and running in parallel. When running on one node with all cores available, there are no issues, but when launching across nodes with 'mpirun', there seems to be a conflict between the nodes when writing to the 'summary.h5' file.
instead of mpirun -np $SLURM_NNODES --bind-to core --map-by core python3 run_PBR_v_1_8.py
use srun
then Slurm takes care of launching the MPI ranks and handles the dispatching
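Concretely, the substitution for the launch line above might look like this (a sketch; the exact --mpi value depends on your site's Slurm build, see --mpi=list further down):

```shell
# Before: manual MPI launch, which fights Slurm over rank placement
# mpirun -np $SLURM_NNODES --bind-to core --map-by core python3 run_PBR_v_1_8.py

# After: let Slurm dispatch the MPI ranks itself
srun --mpi=pmi2 python3 run_PBR_v_1_8.py
```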
the script below runs a calculation on 2 nodes, each with 4 MPI ranks x 96 OMP threads
for a total of 768 threads
some aspects of your Slurm script will vary based on your HPC environment and OpenMC build
I always build OpenMC with Intel oneAPI, and I also build my own parallel HDF5
#SBATCH -J OMC-mpi # Job name
#SBATCH --nodes=2 # -N; number of nodes on which to run
#SBATCH --ntasks=8 # -n; number of tasks to run = Total Number of MPI ranks
#SBATCH --ntasks-per-node=4 # number of MPI tasks to invoke on each node
#SBATCH --cpus-per-task=96 # OMP threads per MPI rank
#SBATCH --threads-per-core=2 # use both hardware threads per core
#SBATCH --hint=multithread # Allow hyperthreads
#SBATCH --exclusive
#SBATCH --no-requeue
#SBATCH --output=job_%j.log # Standard output and error log
#SBATCH -vv
# MPI Settings
module load intel/2025.3.0
module load mpi/2021.17
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # use cpus-per-task
export OMP_PLACES=threads # one place per hardware thread
export OMP_PROC_BIND=close # close for cache locality
export I_MPI_DEBUG=9
export LD_LIBRARY_PATH=/opt/local/lib64:$LD_LIBRARY_PATH
export HDF5_USE_FILE_LOCKING=FALSE
srun -vv --mpi=pmi2 ~/home/bin/openmc -s $OMP_NUM_THREADS model*xml
Thanks for your reply. I tried to emulate your script and ran into 'invalid distribution specification' errors. The system architecture I am working with is listed below:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 16
Vendor ID: AuthenticAMD
CPU family: 25
Model: 17
Model name: AMD EPYC 9454 48-Core Processor
Stepping: 1
CPU MHz: 2750.000
CPU max MHz: 3810.7910
CPU min MHz: 1500.0000
BogoMIPS: 5499.87
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 32768K
NUMA node0 CPU(s): 0-5
NUMA node1 CPU(s): 6-11
NUMA node2 CPU(s): 12-17
NUMA node3 CPU(s): 18-23
NUMA node4 CPU(s): 24-29
NUMA node5 CPU(s): 30-35
NUMA node6 CPU(s): 36-41
NUMA node7 CPU(s): 42-47
NUMA node8 CPU(s): 48-53
NUMA node9 CPU(s): 54-59
NUMA node10 CPU(s): 60-65
NUMA node11 CPU(s): 66-71
NUMA node12 CPU(s): 72-77
NUMA node13 CPU(s): 78-83
NUMA node14 CPU(s): 84-89
NUMA node15 CPU(s): 90-95
Because of this, I defined my SLURM allocations accordingly. As far as I can see, my script is fairly similar to yours. Do you see anything that could be causing this issue? I am unfamiliar with srun, so I am unable to troubleshoot it proficiently.
for now, generate the model.xml and stick with running OpenMC directly
you are not familiar with MPI, so you are going to have difficulty executing an MPI OpenMC Python run
you can get back to Python later
#SBATCH -p od3
#SBATCH --nodes=2
#SBATCH --ntasks=32 # 2 nodes x 16 NUMA domains
#SBATCH --ntasks-per-node=16 # one MPI rank per NUMA domain
#SBATCH --cpus-per-task=6 # CPUs per NUMA domain
#SBATCH --exclusive
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK # use cpus-per-task
srun -vv --mpi=pmi2 ~/home/bin/openmc -s $OMP_NUM_THREADS model*xml
you may need a different --mpi depending on your HPC
check srun --mpi=list
use whatever is available, preferably pmi2 or newer (pmix, pmix_v3, pmix_v5, etc.)
you may also need module loads depending on your HPC
you're on your own for that
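As a sanity check, the decomposition in the script above follows directly from the lscpu output in the earlier post (numbers taken from that listing):

```python
# Topology from lscpu: 96 CPUs per node, 16 NUMA domains, 1 thread per core.
cpus = 96
numa_domains = 16
slurm_nodes = 2

ntasks_per_node = numa_domains            # one MPI rank per NUMA domain
cpus_per_task = cpus // numa_domains      # CPUs available to each rank
ntasks = slurm_nodes * ntasks_per_node    # total MPI ranks across the job

print(ntasks, ntasks_per_node, cpus_per_task)  # 32 16 6
```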