Hi all,
I am attempting to run an OpenMC depletion simulation in parallel on the BlueWaters supercomputer, and my simulation gets stuck between depletion steps. I have previously run OpenMC criticality simulations on the cluster and they work perfectly. I am testing parallel depletion using the simple pincell depletion example. My simulation will not get past the first depletion step; it hangs at "Creating state point openmc_simulation_n0.h5...".
The simulation does not crash, it just hangs indefinitely at this point. I have attached my OpenMC input (simple_depletion_test.py) and output file (error_depletion.txt).
I have been working with the BW user support group to try to resolve this problem. We initially suspected that not all MPI ranks were exiting the statepoint-writing routine, leaving the other ranks waiting on the one that was stuck. However, through a debugging process, we concluded that all MPI ranks do exit it, and it does not seem to be an MPI/parallel-HDF5 issue. At that point, they weren't sure what else could be wrong.
If anyone has any insight into this problem, it would be greatly appreciated.
Best,
Gwen
error_depletion.txt (47.7 KB)
simple_depletion_test.py (5.02 KB)
Hello Gwen,
I had a similar error some time ago. I had installed OpenMC with conda-forge and the simulation got stuck between depletion steps. So I installed OpenMC from source, then installed the Python API, and it worked. I have no idea what was happening, but that was my solution.
I hope this helps,
Javier
Hello Gwendolyn Chee,
Yesterday Paul merged a new pull request, "Allow depletion with MPI & serial HDF5". If you install OpenMC from the openmc-dev develop branch, you should not encounter this problem.
I ran the pin cell depletion problem with 2 MPI processes and didn't get any errors.
-Pranto
Hi Javier,
Thanks for your response. I should have been clearer about how I installed OpenMC on BlueWaters. I also installed it from source and then installed the Python API, but this error still occurs.
Best,
Gwen
Hi Pranto,
This is great! I’ll give it a shot. Thank you.
Best,
Gwen
Gwendolyn,
Thank you for bringing this to our attention. Unfortunately, I don't think pull request 1566 will be terribly helpful, as previous versions would refuse to import openmc.deplete if you 1) were running with multiple MPI tasks and 2) did not have the parallel version of HDF5 accessible via h5py. In any event, I am still interested in fixing the existing issue, because that fix is not especially scalable, and scalability is something you definitely want on a supercomputer.
Looking at your input file, my initial suspicion is that something is hanging on MPI tasks that don't have any depletable materials. Your input file is just a pincell with a single burnable material, so OpenMC will only use one task for tracking reaction rates, computing fission yields, and updating compositions. There is nothing wrong with that distribution, but it does lead me to think about the change in PR 1565. However, that only fixes the case where non-default fission yields are used, so I'm less sure about that.
Can you provide the following diagnostic information:
- Output of the shell command "openmc --version"
- Python version, and output of the Python command "openmc.__version__"
- Do the contents of the statepoint files "statepoint.10.h5" and "openmc_simulation_n0.h5" agree? (A short sketch for checking this follows the list.)
- Simulation environment, e.g. MPI tasks, memory per node
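For the version and statepoint checks, a minimal sketch along these lines would do (assuming h5py is available, as it is wherever the OpenMC Python API works, and that both files are in the current directory):

```python
# Sketch for two of the requested diagnostics: the Python API version and a
# comparison of the top-level contents of the two statepoint files.
import h5py
import openmc

print("Python API version:", openmc.__version__)

with h5py.File("statepoint.10.h5", "r") as sp, \
        h5py.File("openmc_simulation_n0.h5", "r") as sim:
    sp_keys, sim_keys = set(sp.keys()), set(sim.keys())
    print("Only in statepoint.10.h5:", sorted(sp_keys - sim_keys))
    print("Only in openmc_simulation_n0.h5:", sorted(sim_keys - sp_keys))
```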
Any additional information from your debugging process would be extremely helpful.
Best regards,
Andrew
Hi Andrew,
Thanks for your response. You're correct that PR 1566 was not very helpful, and the same error still occurs: the simulation gets stuck between depletion steps.
I do not set any fission yield mode, so I should still be using the default fission yields, and PR 1565 therefore probably had no impact on the error (which turned out to be true, since I recompiled OpenMC on the supercomputer from the develop branch).
The information below is for the OpenMC develop branch compiled from source. Diagnostic information:
- Output of "openmc --version": [Wed May 20 11:41:03 2020] [unknown] Fatal error in MPI_Init: Other MPI error, error stack: MPIR_Init_thread(525): MPID_Init(228).......: channel initialization failed MPID_Init(611).......: PMI2 init failed: 1 Aborted
- "openmc.__version__": 0.12.0-dev
- Yes, they are exactly the same, except that the statepoint.10.h5 file also has a source_bank table. I've attached both files.
- MPI tasks: 16; OpenMP threads: 16; memory per node: 32 GB. (I ran the simulation on two types of nodes: XK nodes, which have 32 GB of memory, and XE nodes, which have 64 GB. Here is a link to more information about the supercomputer I am using.) I'm not sure if this is helpful to you, but I've attached my job script for the run (job_script).
Debugging process:
Since the simulation was getting stuck at the "Creating state point openmc_simulation_n0.h5..." line, we inspected the openmc_statepoint_write function in src/state_point.cpp and, after adding many print statements, concluded that all MPI ranks were reaching line 314 and leaving the function. It is difficult to pinpoint where in the source code the simulation hangs, since there is no error message, only an indefinite hang, and we could not easily run a parallel debugger on the supercomputer. At this point, the supercomputer user support suggested that this might be an issue outside their scope of assistance.
I hope that this information is helpful. I have also attached again my input python script and job results for your convenience.
Best,
Gwen
job_script (2.07 KB)
openmc_simulation_n0.h5 (22.1 KB)
statepoint.10.h5 (112 KB)
job_results (46.8 KB)
simple_depletion_test.py (5.02 KB)
Gwendolyn,
Thank you for this update. After that routine finishes writing the statepoint file, control goes back to the Python side for depletion. The lack of a source bank in openmc_simulation_n0.h5 is expected, as this is what the Operator requests.
To aid the discovery, would you be able to add breakpoints or print statements at the following locations:
- At the end of Operator.write_bos_data
- Surrounding the call to self._timed_deplete in PredictorIntegrator.__call__
- At the start and end of Results.save
If you are using print statements, you can include "comm.rank" in the message, which will tell you which MPI task has made it to each point. comm is already imported in operator.py and results.py, but you will have to import it into integrators.py with "from . import comm".
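For reference, here is a minimal, self-contained sketch (not actual OpenMC code, just the pattern) of a rank-tagged print, assuming mpi4py is available, as it is for parallel depletion:

```python
# Rank-tagged debug print: shows which MPI task reaches a given point.
from mpi4py import MPI

comm = MPI.COMM_WORLD

def checkpoint(label):
    print(f"rank {comm.rank}: {label}", flush=True)

# Example usage, mirroring what would surround self._timed_deplete:
checkpoint("before timed deplete")
# ... the call under investigation would go here ...
checkpoint("after timed deplete")
```

Inside the OpenMC modules themselves you would use the package's own comm via "from . import comm" instead.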
I believe the issue concerns how the depletion is dispatched for MPI tasks that don't have any depletable materials. If my suspicion is correct, the fix should be very easy.
Regards,
Drew
Drew,
Thanks for the help, I do hope it is an easy fix.
I added print statements to the three methods. I have attached the output of the job script (debug_job_results). It seems to be getting stuck in the Results.save function at MPI rank 1. I have also attached my edited operator.py, integrators.py, and results.py files for your reference.
Best,
Gwen
debug_job_results (48 KB)
results.py (16.4 KB)
integrators.py (45 KB)
operator.py (28.4 KB)
Gwendolyn,
Thank you for the update. Unfortunately, my original suspicion does not appear to be correct. However, your debugging output is extremely helpful.
The hangup appears to be in the master rank, as it is not exiting _timed_deplete. The other ranks are likely hanging at one of the comm.barrier calls inside Results.save, stuck there waiting for rank 0 (the one performing the depletion) to arrive.
This leads me to believe the problem is somewhere inside the multiprocessing module, which is how openmc.deplete dispatches the depletion work. For additional information, can you add two more print statements inside openmc/deplete/pool.py, before and after the matrices object is constructed (around lines 47-52 of that file)? This will let us find out where the hangup is occurring inside pool.deplete.
Additionally, can you reach out to the BlueWaters support staff about using multiprocessing.Pool? They might have some information on how best to configure your computing resources, or additional insight.
Cheers,
Drew
Drew,
Thanks for the quick response. I added the print statements around the matrices object in pool.py.
I noticed that when I re-run the simulation with the print statements, it still gets stuck in the Results.save function each time, but at a different MPI rank each run. It just so happened that it got stuck at rank 1 in the run from my previous comment, so it might not be an issue with the master rank. I've attached three different job results; the simulation gets stuck at ranks 3, 7, and 12 in the three runs. I've also attached my edited openmc/deplete/pool.py script.
I’ll ask the BlueWaters support staff about multiprocessing.Pool when their services open again on Monday.
Thanks,
Gwen
pool.py (1.95 KB)
fhr_p1b_c1b_8.o11272662 (48.4 KB)
fhr_p1b_c1b_8.o11272667 (48.4 KB)
fhr_p1b_c1b_8.o11272669 (48.4 KB)
Drew,
I chased the bug down further, and the line it is getting stuck at is line 766 of operator.py. It goes into Results.save, gets to line 471, then goes into Operator.get_results_info and gets stuck at volume_list = comm.allgather(volume).
Best,
Gwen
Gwendolyn,
Thank you for the update.
Since you are running a parallel simulation, the last line in the debug files is not necessarily the line responsible for the hangup. Each rank progresses and prints at a similar but not identical pace, and pauses at specific MPI instructions (barrier, allgather, etc.). This explains why the last printed line does not always come from the same rank. Yes, this is where most of the MPI ranks are hanging, but they are hanging because not all ranks are present at that line. The challenge is finding the real bottleneck, which I believe to be the master rank inside the depletion routines.
If you search for which ranks exit the depletion function with `grep "end timed deplete" fhr_p1b_c1b_8.o11272662`, you will see that all ranks are present except rank 0. The command `grep ".* 0$" fhr_p1b_c1b_8.o11272662` reveals that the last line reached by rank 0 is the matrix formation, right before the call to multiprocessing.Pool. Since you have a single burnable material and more than one MPI rank, the other ranks have no reaction rates or compositions to deplete and do not need to dispatch any work to subprocesses via multiprocessing.Pool, unlike the master rank.
I believe that the simulation is stalling because the master rank cannot spin up additional processes via multiprocessing.Pool, and therefore rank 0 hangs inside pool.deplete. All the other ranks skip this step, and move on to Results.save, where they hang on collective MPI actions like comm.allgather.
Unfortunately, this means we need additional information on how the MPI distribution used on/preferred by BlueWaters interacts with multiprocessing.Pool.
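As an illustration of the hang mechanism, here is a standalone mpi4py sketch (not OpenMC code): a collective call such as comm.allgather only returns once every rank in the communicator has reached it.

```python
# Run with, for example: mpiexec -n 4 python allgather_demo.py
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD

if comm.rank == 0:
    # Stand-in for rank 0 being held up inside pool.deplete
    time.sleep(10)

# The other ranks arrive here immediately but block until rank 0 also arrives.
volumes = comm.allgather(comm.rank)
print(f"rank {comm.rank} finished allgather: {volumes}", flush=True)
```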
I hope this makes sense, as it is proving to be more than the “simple fix” I believed it to be.
Regards,
Drew
Drew,
Yes, that completely makes sense.
I've been talking with the BlueWaters user support team, and they mentioned that MPI does not mix well with multiprocessing.Pool on Cray systems such as BlueWaters: "MPI stacks often are not very tolerant of calling fork() (which Pool does), since the fork()ed children all try to access the same network hardware (at a low level, since MPI wants to be very fast)." We also discussed whether this is a common issue with supercomputers, and they said that "it is much more likely to work on a 'whitebox cluster' such as, e.g., a campus compute cluster or most XSEDE machines (which are partially designed with this in mind) than on the heavy-duty, purely HPC-only Cray machines that are Blue Waters and the large DOE machines."
We first discussed various fixes that could be employed, including more involved options such as loading an MPI stack newly compiled by them on BlueWaters for this problem. But then we realized that multiprocessing.Pool is only used in pool.py, so we checked whether its use is performance-critical for depletion on BlueWaters by altering the source code slightly to use itertools.starmap instead of pool.starmap. We found that removing multiprocessing.Pool did not have a large impact on runtime. For example, I ran a simulation with 10 active batches, 1 inactive batch, and 10,000 particles on 32 nodes with 32 OpenMP threads. The total runtime for one depletion step was 318 seconds, and the time spent in the starmap function on rank 0 was only 1.6 seconds (~0.5% of the runtime).
I'm sure that multiprocessing.Pool speeds up the depletion run slightly, but it does not seem strictly necessary, especially since only rank 0 uses it in my case. I'm not sure whether anyone else using Cray machines has come across this issue, or whether it is specific to BlueWaters since it is a fairly old machine. For now, I'll run my simulations with this small change to the source code so that depletion works on BlueWaters.
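For anyone curious, the change amounts to something like the sketch below; the function and variable names are placeholders for illustration, not the actual openmc/deplete/pool.py code.

```python
# Illustrative sketch of swapping multiprocessing.Pool.starmap for
# itertools.starmap (names here are placeholders, not OpenMC code).
from itertools import starmap
from multiprocessing import Pool

def deplete_material(matrix, composition):
    # Placeholder for the per-material depletion work
    return [matrix * c for c in composition]

if __name__ == "__main__":
    inputs = [(2, [1.0, 2.0]), (3, [4.0, 5.0])]

    # Original approach: dispatch per-material work to subprocesses
    with Pool() as pool:
        results_parallel = pool.starmap(deplete_material, inputs)

    # Workaround on BlueWaters: do the same work serially in the parent process
    results_serial = list(starmap(deplete_material, inputs))

    assert results_parallel == results_serial
```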
Thanks for your help,
Gwen
Gwendolyn,
Thank you for your patience and this update from the BlueWaters team. I'm talking with some of the developers now about whether and how we can implement a fix. Supporting depletion on HPC environments is a worthy task, but it can come with additional hardware/networking issues that are beyond our control.
As you have indicated, itertools.starmap is a suitable alternative. With a single burnable material, as in your case, the multiprocessing routines don't provide a substantial improvement. However, as the number of burnable materials increases (in total and on each MPI rank), the multiprocessing routines become more important. They allow Python to obtain updated compositions "in parallel" inside an MPI rank, similar in kind to running hybrid OpenMP threads inside an MPI environment.
If we do move forward with some additional fixes or discussion I will be sure to link them here for completeness. Additionally, if you or anyone working on HPC environments have proposed fixes, a pull request would be greatly appreciated!
Regards,
Drew
Drew,
Thanks for the clarification! During my talks with the Blue Waters support team, they also suggested replacing multiprocessing.Pool with an MPI-based executor instead (maybe this: https://github.com/adrn/mpipool). I'll play around with it and see if I get anywhere. If I do, I'll let you know and make a PR.
Best,
Gwen
Hi Gwen,
Drew and I did talk about potentially using MPIPoolExecutor from the mpi4py package (since it is already being used). The problem with any MPI-based solution is that it won't address the need for parallelism within a single MPI process. As an example, if you are running on a node with two sockets, each having 16 cores, we would normally recommend using two MPI ranks, which then use OpenMP to fill up all the cores (for the transport solve). For depletion, multiprocessing is the means by which we can fill up all the cores within a socket, and using an MPI-based pool won't achieve the same thing.

I'm still a little puzzled as to why multiprocessing is causing problems. Effectively, what it is doing is the same as what OpenMP does (fork-join), and there is no MPI communication that happens during that part of the process. Anyway, we'll keep brainstorming on this. One potential option is to just have a switch that disables the use of multiprocessing.
Best,
Paul
Hi Paul,
Thanks for the explanation. That makes sense. This might be an issue unique to BlueWaters if the same thing is not occurring on the Argonne supercomputers.
Best,
Gwen
Gwendolyn,
After a good bit of discussion, Paul and I have merged a fix into the develop branch that should help this issue: https://github.com/openmc-dev/openmc/pull/1593
By setting the value of openmc.deplete.pool.USE_MULTIPROCESSING to False, the depletion will be done without multiprocessing and therefore should not hang on BlueWaters or similar HPC systems. Thank you for bringing this to our attention and providing us with extremely helpful information.
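For anyone finding this thread later, a minimal sketch of how the switch would be used in a depletion script (the commented Operator/integrator lines are illustrative placeholders, assuming the develop branch containing that PR):

```python
import openmc.deplete.pool

# Disable the multiprocessing.Pool dispatch inside pool.deplete
openmc.deplete.pool.USE_MULTIPROCESSING = False

# ... then set up and run depletion as usual, for example:
# op = openmc.deplete.Operator(geometry, settings, chain_file)
# integrator = openmc.deplete.PredictorIntegrator(op, timesteps, power)
# integrator.integrate()
```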
Regards,
Drew
That’s great to hear! I’m glad I could help.
Best,
Gwen