Problem running a multi-node parallel depletion calculation with MPICH

The three nodes have the same configuration:

  • Two 16-core/32-thread CPUs and 32 GB of RAM
  • Ubuntu 20.04
  • Python 3.8.10
  • OpenMC 0.12.0
  • Firewall disabled
  • Passwordless SSH login between the nodes

Installation process:

sudo apt install mpich libmpich-dev
sudo apt install g++ cmake libhdf5-dev
mkdir build && cd build
export CXX=mpicxx
cmake ..
make
sudo make install
cd ..
pip3 install mpi4py
pip3 install .
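Before launching the full depletion run, a small diagnostic can confirm that mpi4py was built against MPICH and that mpirun actually places ranks on all three nodes. This is only a sketch: the script name check_mpi.py and the helper ranks_per_host are hypothetical and not part of OpenMC; it relies only on mpi4py's MPI.Get_library_version and Comm.gather.

```python
"""Report where MPI ranks land and which MPI library mpi4py uses.

Hypothetical helper script (not part of OpenMC); launch it the same way as
the depletion run, e.g. ``mpirun -f mpi_config_file python3 check_mpi.py``.
"""
import socket
from collections import Counter

try:
    from mpi4py import MPI
except ImportError:  # lets the pure helper below be used without mpi4py
    MPI = None


def ranks_per_host(hostnames):
    """Count how many MPI ranks were placed on each host name."""
    return dict(sorted(Counter(hostnames).items()))


def main():
    if MPI is None:
        print("mpi4py is not installed")
        return
    comm = MPI.COMM_WORLD
    # Every rank reports its host name; rank 0 collects the full list.
    hosts = comm.gather(socket.gethostname(), root=0)
    if comm.rank == 0:
        # Should mention MPICH if mpi4py was built against the apt MPICH.
        print(MPI.Get_library_version().strip())
        print(f"{comm.size} ranks in total")
        for host, count in ranks_per_host(hosts).items():
            print(f"{host}: {count} rank(s)")


if __name__ == "__main__":
    main()
```

If the rank counts per host are not what you expect, the host file or the mpirun options are the first thing to check.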

When I launch the run with:

mpirun -f mpi_config_file python3 run_depletion.py

I get the following output:

                                %%%%%%%%%%%%%%%
                           %%%%%%%%%%%%%%%%%%%%%%%%
                        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                                    %%%%%%%%%%%%%%%%%%%%%%%%
                                     %%%%%%%%%%%%%%%%%%%%%%%%
                 ###############      %%%%%%%%%%%%%%%%%%%%%%%%
                ##################     %%%%%%%%%%%%%%%%%%%%%%%
                ###################     %%%%%%%%%%%%%%%%%%%%%%%
                ####################     %%%%%%%%%%%%%%%%%%%%%%
                #####################     %%%%%%%%%%%%%%%%%%%%%
                ######################     %%%%%%%%%%%%%%%%%%%%
                #######################     %%%%%%%%%%%%%%%%%%
                 #######################     %%%%%%%%%%%%%%%%%
                 ######################     %%%%%%%%%%%%%%%%%
                  ####################     %%%%%%%%%%%%%%%%%
                    #################     %%%%%%%%%%%%%%%%%
                     ###############     %%%%%%%%%%%%%%%%
                       ############     %%%%%%%%%%%%%%%
                          ########     %%%%%%%%%%%%%%
                                      %%%%%%%%%%%

                   | The OpenMC Monte Carlo Code
         Copyright | 2011-2020 MIT and OpenMC contributors
           License | https://docs.openmc.org/en/latest/license.html
           Version | 0.12.0
         Date/Time | 2022-04-01 14:13:44
     MPI Processes | 96
    OpenMP Threads | 32

 Reading settings XML file...
 Reading cross sections XML file...
 Reading materials XML file...
 Reading geometry XML file...
 Reading O16 from /home/neal/openmc/xs/endfb71_hdf5/O16.h5
 Reading U234 from /home/neal/openmc/xs/endfb71_hdf5/U234.h5
 Reading U235 from /home/neal/openmc/xs/endfb71_hdf5/U235.h5
 Reading U236 from /home/neal/openmc/xs/endfb71_hdf5/U236.h5
 Reading U238 from /home/neal/openmc/xs/endfb71_hdf5/U238.h5
 Reading O17 from /home/neal/openmc/xs/endfb71_hdf5/O17.h5
 Reading He3 from /home/neal/openmc/xs/endfb71_hdf5/He3.h5
 Reading He4 from /home/neal/openmc/xs/endfb71_hdf5/He4.h5
 Reading Zr90 from /home/neal/openmc/xs/endfb71_hdf5/Zr90.h5
 Reading Zr91 from /home/neal/openmc/xs/endfb71_hdf5/Zr91.h5
 Reading Zr92 from /home/neal/openmc/xs/endfb71_hdf5/Zr92.h5
 Reading Zr94 from /home/neal/openmc/xs/endfb71_hdf5/Zr94.h5
 Reading Zr96 from /home/neal/openmc/xs/endfb71_hdf5/Zr96.h5
 Reading Cr50 from /home/neal/openmc/xs/endfb71_hdf5/Cr50.h5
 Reading Cr52 from /home/neal/openmc/xs/endfb71_hdf5/Cr52.h5
 Reading Cr53 from /home/neal/openmc/xs/endfb71_hdf5/Cr53.h5
 Reading Cr54 from /home/neal/openmc/xs/endfb71_hdf5/Cr54.h5
 Reading Fe54 from /home/neal/openmc/xs/endfb71_hdf5/Fe54.h5
 Reading Fe56 from /home/neal/openmc/xs/endfb71_hdf5/Fe56.h5
 Reading Fe57 from /home/neal/openmc/xs/endfb71_hdf5/Fe57.h5
 Reading Fe58 from /home/neal/openmc/xs/endfb71_hdf5/Fe58.h5
 Reading Sn112 from /home/neal/openmc/xs/endfb71_hdf5/Sn112.h5
 Reading Sn114 from /home/neal/openmc/xs/endfb71_hdf5/Sn114.h5
 Reading Sn115 from /home/neal/openmc/xs/endfb71_hdf5/Sn115.h5
 Reading Sn116 from /home/neal/openmc/xs/endfb71_hdf5/Sn116.h5
 Reading Sn117 from /home/neal/openmc/xs/endfb71_hdf5/Sn117.h5
 Reading Sn118 from /home/neal/openmc/xs/endfb71_hdf5/Sn118.h5
 Reading Sn119 from /home/neal/openmc/xs/endfb71_hdf5/Sn119.h5
 Reading Sn120 from /home/neal/openmc/xs/endfb71_hdf5/Sn120.h5
 Reading Sn122 from /home/neal/openmc/xs/endfb71_hdf5/Sn122.h5
 Reading Sn124 from /home/neal/openmc/xs/endfb71_hdf5/Sn124.h5
 Reading B10 from /home/neal/openmc/xs/endfb71_hdf5/B10.h5
 Reading B11 from /home/neal/openmc/xs/endfb71_hdf5/B11.h5
 Reading H1 from /home/neal/openmc/xs/endfb71_hdf5/H1.h5
 Reading H2 from /home/neal/openmc/xs/endfb71_hdf5/H2.h5
 Reading c_H_in_H2O from /home/neal/openmc/xs/endfb71_hdf5/c_H_in_H2O.h5
 Minimum neutron data temperature: 294.000000 K
 Maximum neutron data temperature: 294.000000 K
 Preparing distributed cell instances...
 Writing summary.h5 file...
 Reading Br81 from /home/neal/openmc/xs/endfb71_hdf5/Br81.h5
 Reading Kr82 from /home/neal/openmc/xs/endfb71_hdf5/Kr82.h5
 Reading Kr83 from /home/neal/openmc/xs/endfb71_hdf5/Kr83.h5
 Reading Kr84 from /home/neal/openmc/xs/endfb71_hdf5/Kr84.h5
 Reading Kr85 from /home/neal/openmc/xs/endfb71_hdf5/Kr85.h5
 Reading Kr86 from /home/neal/openmc/xs/endfb71_hdf5/Kr86.h5
 Reading Sr89 from /home/neal/openmc/xs/endfb71_hdf5/Sr89.h5
 Reading Sr90 from /home/neal/openmc/xs/endfb71_hdf5/Sr90.h5
 Reading Y89 from /home/neal/openmc/xs/endfb71_hdf5/Y89.h5
 Reading Y90 from /home/neal/openmc/xs/endfb71_hdf5/Y90.h5
 Reading Y91 from /home/neal/openmc/xs/endfb71_hdf5/Y91.h5
 Reading Zr93 from /home/neal/openmc/xs/endfb71_hdf5/Zr93.h5
 Reading Zr95 from /home/neal/openmc/xs/endfb71_hdf5/Zr95.h5
 Reading Nb95 from /home/neal/openmc/xs/endfb71_hdf5/Nb95.h5
 WARNING: Negative value(s) found on probability table for nuclide Nb95 at 250K
 WARNING: Negative value(s) found on probability table for nuclide Nb95 at 294K
 WARNING: Negative value(s) found on probability table for nuclide Nb95 at 600K
 WARNING: Negative value(s) found on probability table for nuclide Nb95 at 900K
 WARNING: Negative value(s) found on probability table for nuclide Nb95 at 1200K
 Reading Mo92 from /home/neal/openmc/xs/endfb71_hdf5/Mo92.h5
 Reading Mo94 from /home/neal/openmc/xs/endfb71_hdf5/Mo94.h5
 Reading Mo95 from /home/neal/openmc/xs/endfb71_hdf5/Mo95.h5
 Reading Mo96 from /home/neal/openmc/xs/endfb71_hdf5/Mo96.h5
 Reading Mo97 from /home/neal/openmc/xs/endfb71_hdf5/Mo97.h5
 Reading Mo98 from /home/neal/openmc/xs/endfb71_hdf5/Mo98.h5
 Reading Mo99 from /home/neal/openmc/xs/endfb71_hdf5/Mo99.h5
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 250K
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 294K
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 600K
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 900K
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 1200K
 WARNING: Negative value(s) found on probability table for nuclide Mo99 at 2500K
 Reading Mo100 from /home/neal/openmc/xs/endfb71_hdf5/Mo100.h5
 Reading Tc99 from /home/neal/openmc/xs/endfb71_hdf5/Tc99.h5
 Reading Ru100 from /home/neal/openmc/xs/endfb71_hdf5/Ru100.h5
 Reading Ru101 from /home/neal/openmc/xs/endfb71_hdf5/Ru101.h5
 Reading Ru102 from /home/neal/openmc/xs/endfb71_hdf5/Ru102.h5
 Reading Ru103 from /home/neal/openmc/xs/endfb71_hdf5/Ru103.h5
 Reading Ru104 from /home/neal/openmc/xs/endfb71_hdf5/Ru104.h5
 Reading Ru105 from /home/neal/openmc/xs/endfb71_hdf5/Ru105.h5
 Reading Ru106 from /home/neal/openmc/xs/endfb71_hdf5/Ru106.h5
 Reading Rh103 from /home/neal/openmc/xs/endfb71_hdf5/Rh103.h5
 Reading Rh105 from /home/neal/openmc/xs/endfb71_hdf5/Rh105.h5
 Reading Pd104 from /home/neal/openmc/xs/endfb71_hdf5/Pd104.h5
 Reading Pd105 from /home/neal/openmc/xs/endfb71_hdf5/Pd105.h5
 Reading Pd106 from /home/neal/openmc/xs/endfb71_hdf5/Pd106.h5
 Reading Pd107 from /home/neal/openmc/xs/endfb71_hdf5/Pd107.h5
 Reading Pd108 from /home/neal/openmc/xs/endfb71_hdf5/Pd108.h5
 Reading Ag107 from /home/neal/openmc/xs/endfb71_hdf5/Ag107.h5
 Reading Ag109 from /home/neal/openmc/xs/endfb71_hdf5/Ag109.h5
 Reading Ag110_m1 from /home/neal/openmc/xs/endfb71_hdf5/Ag110_m1.h5
 Reading Ag111 from /home/neal/openmc/xs/endfb71_hdf5/Ag111.h5
 Reading Cd110 from /home/neal/openmc/xs/endfb71_hdf5/Cd110.h5
 Reading Cd111 from /home/neal/openmc/xs/endfb71_hdf5/Cd111.h5
 Reading Cd112 from /home/neal/openmc/xs/endfb71_hdf5/Cd112.h5
 Reading Cd113 from /home/neal/openmc/xs/endfb71_hdf5/Cd113.h5
 Reading Cd114 from /home/neal/openmc/xs/endfb71_hdf5/Cd114.h5
 Reading In113 from /home/neal/openmc/xs/endfb71_hdf5/In113.h5
 Reading In115 from /home/neal/openmc/xs/endfb71_hdf5/In115.h5
 Reading Sb121 from /home/neal/openmc/xs/endfb71_hdf5/Sb121.h5
 Reading Sb123 from /home/neal/openmc/xs/endfb71_hdf5/Sb123.h5
 Reading Sb125 from /home/neal/openmc/xs/endfb71_hdf5/Sb125.h5
 Reading Te127_m1 from /home/neal/openmc/xs/endfb71_hdf5/Te127_m1.h5
 Reading Te129_m1 from /home/neal/openmc/xs/endfb71_hdf5/Te129_m1.h5
 Reading Te132 from /home/neal/openmc/xs/endfb71_hdf5/Te132.h5
 Reading I127 from /home/neal/openmc/xs/endfb71_hdf5/I127.h5
 Reading I129 from /home/neal/openmc/xs/endfb71_hdf5/I129.h5
 Reading I130 from /home/neal/openmc/xs/endfb71_hdf5/I130.h5
 Reading I131 from /home/neal/openmc/xs/endfb71_hdf5/I131.h5
 WARNING: Negative value(s) found on probability table for nuclide I131 at 250K
 WARNING: Negative value(s) found on probability table for nuclide I131 at 294K
 WARNING: Negative value(s) found on probability table for nuclide I131 at 600K
 WARNING: Negative value(s) found on probability table for nuclide I131 at 900K
 WARNING: Negative value(s) found on probability table for nuclide I131 at 1200K
 WARNING: Negative value(s) found on probability table for nuclide I131 at 2500K
 Reading I135 from /home/neal/openmc/xs/endfb71_hdf5/I135.h5
 Reading Xe128 from /home/neal/openmc/xs/endfb71_hdf5/Xe128.h5
 Reading Xe130 from /home/neal/openmc/xs/endfb71_hdf5/Xe130.h5
 Reading Xe131 from /home/neal/openmc/xs/endfb71_hdf5/Xe131.h5
 Reading Xe132 from /home/neal/openmc/xs/endfb71_hdf5/Xe132.h5
 Reading Xe133 from /home/neal/openmc/xs/endfb71_hdf5/Xe133.h5
 WARNING: Negative value(s) found on probability table for nuclide Xe133 at
          2500K
 Reading Xe134 from /home/neal/openmc/xs/endfb71_hdf5/Xe134.h5
 Reading Xe135 from /home/neal/openmc/xs/endfb71_hdf5/Xe135.h5
 Reading Xe136 from /home/neal/openmc/xs/endfb71_hdf5/Xe136.h5
 Reading Cs133 from /home/neal/openmc/xs/endfb71_hdf5/Cs133.h5
 Reading Cs134 from /home/neal/openmc/xs/endfb71_hdf5/Cs134.h5
 Reading Cs135 from /home/neal/openmc/xs/endfb71_hdf5/Cs135.h5
 Reading Cs136 from /home/neal/openmc/xs/endfb71_hdf5/Cs136.h5
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at 250K
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at 294K
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at 600K
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at 900K
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at
          1200K
 WARNING: Negative value(s) found on probability table for nuclide Cs136 at
          2500K
 Reading Cs137 from /home/neal/openmc/xs/endfb71_hdf5/Cs137.h5
 Reading Ba134 from /home/neal/openmc/xs/endfb71_hdf5/Ba134.h5
 Reading Ba137 from /home/neal/openmc/xs/endfb71_hdf5/Ba137.h5

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 552199 RUNNING AT node3
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@node1] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:878): assert (!closed) failed
[proxy:0:0@node1] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@node1] main (pm/pmiserv/pmip.c:200): demux engine error waiting for event
[mpiexec@node1] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:74): one of the processes terminated badly; aborting
[mpiexec@node1] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:22): launcher returned error waiting for completion
[mpiexec@node1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:215): launcher returned error waiting for completion
[mpiexec@node1] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

I have noticed two things:
1. When mpirun is restricted to the current (single) node, the calculation runs normally.
2. When mpirun launches processes on all three nodes, the CPU load on the other two nodes reaches 100%, but the run then fails with the error above.
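The run header above reports 96 MPI processes, each with 32 OpenMP threads. A quick back-of-the-envelope check shows how many threads that asks each node to run. It assumes (not confirmed in the thread) that the ranks are spread evenly over the three nodes and that "two 16-core 32-thread CPUs" means 64 hardware threads per node. Hydra's "EXIT CODE: 9" is commonly read as the process having received signal 9; the last line only confirms which signal that number is on Linux.

```python
"""Back-of-the-envelope check of threads per node for the failed run.

Assumptions (not confirmed in the thread): ranks are spread evenly over the
three nodes and each node has 2 CPUs x 32 hardware threads = 64 threads.
"""
import signal


def threads_per_node(mpi_ranks, omp_threads, nodes):
    """Total OpenMP threads one node must run if ranks are spread evenly."""
    ranks_per_node = mpi_ranks // nodes
    return ranks_per_node * omp_threads


ranks, omp, nodes = 96, 32, 3   # from the run header: 96 MPI x 32 OpenMP
hw_threads_per_node = 2 * 32    # assumed: two 16-core/32-thread CPUs
demand = threads_per_node(ranks, omp, nodes)
print(f"{demand} threads per node on {hw_threads_per_node} hardware threads "
      f"({demand // hw_threads_per_node}x oversubscribed)")

# Signal number 9 on Linux is SIGKILL.
print(signal.Signals(9).name)
```

Under these assumptions each node would be asked to run 1024 threads on 64 hardware threads, which is consistent with the 100% CPU load observed on the other nodes.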

Here are my scripts and configuration files:
hosts.py (272 Bytes)
mpi_config_file.py (27 Bytes)
run_depletion.py (4.9 KB)

I would appreciate your help. ☺️

I'm not sure, but I think this could be an issue of serial vs. parallel mpi4py.

Also try setting this in the script:

openmc.deplete.pool.USE_MULTIPROCESSING = False

Try looking at this thread for details.

I hope that helps.

The issue has been resolved! It turned out to be a mistake in my configuration file. 😂
mpi_config_file.py (18 Bytes)
I now use the following command:

mpirun --hostfile mpi_config_file python3 dir.py
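The thread does not say what exactly was wrong in the configuration file. MPICH's Hydra launcher reads a host file with one host per line, optionally written as `host:N` to give that host N process slots, with `#` starting a comment. The sketch below (hypothetical helpers parse_hostfile and total_slots, not an MPICH tool) parses that basic format and reports the total slot count, which is a quick way to catch a malformed or unintended host file before a long run. Treating a bare host name as one slot is an assumption of this sketch.

```python
"""Parse an MPICH Hydra host file ("host" or "host:N" per line).

A minimal sketch: it only understands the basic format, treats a missing
":N" as one process slot (an assumption), and ignores '#' comments.
"""


def parse_hostfile(text):
    """Return a list of (host, slots) tuples from host-file text."""
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        host, _, count = line.partition(":")
        host, count = host.strip(), count.strip()
        if not host or (count and not count.isdigit()):
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        entries.append((host, int(count) if count else 1))
    return entries


def total_slots(text):
    """Sum of the process slots listed in the host file."""
    return sum(slots for _, slots in parse_hostfile(text))


if __name__ == "__main__":
    example = "node1:32\nnode2:32\nnode3:32\n"  # hypothetical file contents
    print(parse_hostfile(example))
    print(total_slots(example))
```

For instance, a file listing `node1:32`, `node2:32` and `node3:32` gives 96 slots, the same count as the 96 MPI processes in the failed run's header. How Hydra chooses the process count when `-n` is omitted is worth checking in the MPICH documentation for your version.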