Dear all,
I’m having problems running OpenMC on a cluster machine. The software, OpenMC 0.13.3, has been compiled from source using MPICH 4.0.3 and HDF5 1.12.2 (HDF5 compiled with the options “-D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64”). The Python interface has not been installed. Job submission is performed with Slurm.
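For context, the build was done roughly along the following lines (the exact CMake invocation may have differed slightly; the paths correspond to the install locations visible in the ldd output below):

export HDF5_ROOT=/var/sw/lib/hdf5/hdf5-install       # HDF5 1.12.2 install prefix
export PATH=/var/sw/libmpi/mpich-install/bin:$PATH   # MPICH 4.0.3 compiler wrappers
CXX=mpicxx cmake .. -DCMAKE_INSTALL_PREFIX=/var/sw/openmc/build/exec
make -j && make install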
[root@n0009 ~]# ldd /var/sw/openmc/build/exec/bin/openmc
linux-vdso.so.1 (0x00007ffc5d68a000)
libopenmc.so => /var/sw/openmc/build/exec/lib64/libopenmc.so (0x00007f4677ddc000)
libhdf5.so.1000 => /var/sw/lib/hdf5/hdf5-install/lib/libhdf5.so.1000 (0x00007f4677653000)
libz.so.1 => /lib64/libz.so.1 (0x00007f467743c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f4677238000)
libhdf5_hl.so.1000 => /var/sw/lib/hdf5/hdf5-install/lib/libhdf5_hl.so.1000 (0x00007f4677015000)
libpng16.so.16 => /lib64/libpng16.so.16 (0x00007f4676de0000)
libmpicxx.so.12 => /var/sw/libmpi/mpich-install/lib/libmpicxx.so.12 (0x00007f4676bbf000)
libmpi.so.12 => /var/sw/libmpi/mpich-install/lib/libmpi.so.12 (0x00007f46732c5000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f4672f30000)
libm.so.6 => /lib64/libm.so.6 (0x00007f4672bae000)
libgomp.so.1 => /lib64/libgomp.so.1 (0x00007f4672976000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f467275e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f467253e000)
libc.so.6 => /lib64/libc.so.6 (0x00007f4672179000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4678261000)
librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00007f4671f5e000)
libefa.so.1 => /lib64/libefa.so.1 (0x00007f4671d54000)
libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007f4671b34000)
librt.so.1 => /lib64/librt.so.1 (0x00007f467192c000)
libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f4671709000)
libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f4671483000)
Originally, I was running OpenMC calculations with the .h5 cross-section libraries located in a directory shared between the different nodes of the cluster. In that situation, the code got stuck while reading the .h5 libraries, without printing any error.
The problem has been partially solved by using the ‘sbcast’ command to broadcast the .h5 libraries to each node, so that every node works on its own local copy. However, even in this case the solver gets stuck, this time while writing the statepoint.h5 file. I suspect the problem is due to the read/write permissions of the .h5 files across the different nodes.
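For completeness, the submission script follows roughly this pattern (node counts, directory names, and file names below are placeholders, not the exact ones on our cluster):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
# Create a node-local destination directory on every allocated node
srun --ntasks-per-node=1 mkdir -p /tmp/xs_data
# Broadcast each .h5 library from the shared directory to local storage on all nodes
for f in /shared/xs_data/*.h5; do
    sbcast "$f" /tmp/xs_data/$(basename "$f")
done
sbcast /shared/xs_data/cross_sections.xml /tmp/xs_data/cross_sections.xml
# Point OpenMC at the node-local copy (the paths inside cross_sections.xml have to be
# consistent with this local location as well)
export OPENMC_CROSS_SECTIONS=/tmp/xs_data/cross_sections.xml
srun /var/sw/openmc/build/exec/bin/openmc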
As a further detail, the problem does not occur when running calculations directly on the node I’m logged into (i.e., without going through Slurm).
Can you please help me solve this issue?
Cordially,
Matteo