Simulation gets "stuck" between batches

Hi all, I am having an issue where OpenMC simulations get “stuck” between batches (e.g. a run completes N of N+M batches and then never makes further progress). When it gets stuck, each MPI process sits at 100% usage on a single thread, even though I have OpenMP configured to run 24 threads per MPI process during an actual batch. A given problem file always seems to get stuck on the same batch number, and running the same problem with fewer total particles (below the point where it gets stuck) completes successfully.

It appears my problem is similar, or possibly identical, to these two threads:

in which it sounded like this was an issue with the tracklength tally estimator. Changing to a collision estimator is not a good solution for my problem, I think, as I have significant vacuum/void regions. I will try changing the tally dimensions to see if I can work around it, but I wanted to ask whether there has been any progress on this bug. And if I do need to change the tally dimensions, does anyone understand what actually causes the hang?

I am running OpenMC v0.15.0.

Thank you,
Alex

Hi Alex, welcome to the community.
I also encountered a similar issue a long time ago, and it turned out that I had a geometry error. I found it by running in geometry debug mode, even with a low number of histories: openmc.run(geometry_debug=True). A minimal sketch is below.
Have you checked your geometry?
Sorry if this doesn’t solve your problem.
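What I had in mind is something like this (the particle count here is just an illustrative reduced value for the check):

import openmc

# Re-run the existing model with overlap checking enabled; a reduced
# particle count keeps the extra per-particle cost of the check manageable.
openmc.run(geometry_debug=True, particles=10000)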

Yes, I’ve tried running in geometry debug mode (with a much smaller number of particles) and it doesn’t turn up any cell overlaps or other geometry issues. I have not tried running the full number of particles in geometry debug mode, as I assumed it would take a very long time, but if that’s recommended I will try it.

Sorry Alex, I can’t recommend that if the calculation is that time-consuming and you need the time for other work.
I was just thinking that the problem might come from the geometry, with neutrons only reaching that region late in the run, since the random number sequence in any Monte Carlo code is reproducible rather than truly random, right?
But it may instead come from the MPI/OpenMP threading and not be a geometry problem at all. Have you tried using threads only? Rather than, say, 12 MPI ranks with 12 threads each, you could use 144 OpenMP threads via OMP_NUM_THREADS=144 or openmc.run(threads=144), as in the sketch below. But that will also need a lot of time, and right now time is limited.
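For example (the thread counts here are just illustrative):

import openmc

# OpenMP-only run: no mpi_args, so a single process is launched and all
# parallelism comes from the OpenMP threads.
openmc.run(threads=144)

# For comparison, a hybrid run (12 MPI ranks x 12 threads each) could be
# launched as:
# openmc.run(threads=12, mpi_args=['mpiexec', '-n', '12'])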

I hope other members can help you further. Sorry, Alex.

Right - when doing geometry debugging I ran enough particles that each cell was checked, but not the full particle count, because the user guide notes there is a computational cost and the full run already takes a long time.

I’ve tried a few run configurations, including a threads-only (no MPI) run, and all of them hang.

After posting I found these two issues on GitHub:

Changing the mesh dimensions by a factor of 2 did not change the behavior, but my simulations did complete either by switching from a tracklength to a collision estimator or by using RectilinearMesh instead of RegularMesh (see the sketch below). So if there are bugs in those code paths, they seem to still be present?
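For anyone who hits the same thing, here is a rough sketch of the two workarounds that let my runs complete; the mesh bounds, dimensions, and flux tally are illustrative, not my actual model:

import numpy as np
import openmc

# Illustrative mesh bounds and dimensions
nx, ny, nz = 50, 50, 50
lower_left = (-30.0, -30.0, -30.0)
upper_right = (30.0, 30.0, 30.0)

# Workaround 1: keep RegularMesh but force a collision estimator
reg_mesh = openmc.RegularMesh()
reg_mesh.lower_left = lower_left
reg_mesh.upper_right = upper_right
reg_mesh.dimension = (nx, ny, nz)

flux_collision = openmc.Tally(name='flux (collision)')
flux_collision.filters = [openmc.MeshFilter(reg_mesh)]
flux_collision.scores = ['flux']
flux_collision.estimator = 'collision'  # instead of the default tracklength

# Workaround 2: an equivalent RectilinearMesh with explicit grid planes,
# keeping the tracklength estimator
rect_mesh = openmc.RectilinearMesh()
rect_mesh.x_grid = np.linspace(lower_left[0], upper_right[0], nx + 1)
rect_mesh.y_grid = np.linspace(lower_left[1], upper_right[1], ny + 1)
rect_mesh.z_grid = np.linspace(lower_left[2], upper_right[2], nz + 1)

flux_tracklength = openmc.Tally(name='flux (rectilinear)')
flux_tracklength.filters = [openmc.MeshFilter(rect_mesh)]
flux_tracklength.scores = ['flux']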

Hi Alex, is it possible for you to share a script that reproduces the problem? A minimal model (materials, geometry, tallies, and settings) that gets stuck?

Unfortunately I’m not able to share the actual script. I did confirm that the small example in this issue (Simple Simulation runs into endless loop with openmc.MeshSurfaceFilter · Issue #2855 · openmc-dev/openmc · GitHub) hangs for me with RegularMesh but completes with RectilinearMesh.

Hi Alex, I tried to replicate the scenario from that GitHub issue and also got “stuck” when using RegularMesh, and it works if I change to RectilinearMesh, as you said. Here is the notebook: regmeshstuck.ipynb

However, when I plot the source defined by marquezj on a regular mesh to look at the xy and xz views, I see that the source itself is thin along the z-axis. Here are the plots:
[xy and xz plots of the source distribution]

So I expect the flux to be low in the region marquezj is interested in, x (-30 to 30), y (-30 to 30), z (30 to 60), as defined here:

mesh_surface = openmc.RegularMesh() # <-- Important
mesh_surface.lower_left = [-30, -30, 30]  # <-- Min z is important
mesh_surface.upper_right = [30, 30, 60]  # <-- Max z is important
mesh_surface.dimension = [1, 1, 1]

When I change the z-min in lower_left to be slightly higher or lower, OpenMC does not get “stuck”:

mesh_surface.lower_left = [-30, -30, 30.001]  # <-- Min z is important

It also works if you use [-30, -30, 29.999].
And, as predicted, the current is all zero.

If I move the mesh to z between 0 and 1 cm, where I know there are a lot of particles,

mesh_surface.lower_left = [-30, -30, 0]  # <-- Min z is important
mesh_surface.upper_right = [30, 30, 1]  # <-- Max z is important
mesh_surface.dimension = [1, 1, 1]

then, as expected, we get a nonzero surface current for that position and source configuration.
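For completeness, the current tally attached to that mesh looks roughly like this (the tally name is just illustrative):

import openmc

# mesh_surface is the RegularMesh defined in the block above; MeshSurfaceFilter
# scores partial currents across each face of the mesh.
current_tally = openmc.Tally(name='surface current')
current_tally.filters = [openmc.MeshSurfaceFilter(mesh_surface)]
current_tally.scores = ['current']

tallies = openmc.Tallies([current_tally])
tallies.export_to_xml()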

In another case, I used the original mesh (mesh_surface.lower_left = [-30, -30, 30]) but changed the source definition to the default isotropic point source at the origin by simply commenting out the source line:

# settings.source = source
settings.run_mode = 'fixed source'
settings.particles = 10000
settings.batches =  10

Then I expect the source to be distributed isotropically, including along the z-axis, like this:
[plot of the isotropic default source distribution]

and we can expect the surface current to be nonzero on x (-30 to 30), y (-30 to 30), z (30 to 60), and here is the output of the surface current tally.

So if this is the same situation you are encountering, you might want to plot your source using a regular mesh tally to see the distribution you defined, and then slightly shift the mesh coordinates if the problem is in the mesh definition; a rough sketch of how to do that is below.
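The mesh bounds, dimensions, tally name, and statepoint filename here are all illustrative:

import matplotlib.pyplot as plt
import openmc

# Coarse mesh covering the whole geometry, just to visualize where particles go
mesh = openmc.RegularMesh()
mesh.lower_left = (-50.0, -50.0, -50.0)
mesh.upper_right = (50.0, 50.0, 50.0)
mesh.dimension = (50, 50, 50)

flux_map = openmc.Tally(name='flux map')
flux_map.filters = [openmc.MeshFilter(mesh)]
flux_map.scores = ['flux']

# ... add flux_map to the model's tallies, export to XML, and run OpenMC ...

# After the run, reshape the tally result and look at an xz slice
with openmc.StatePoint('statepoint.10.h5') as sp:
    tally = sp.get_tally(name='flux map')
    # Mesh filter bins vary fastest in x, so reshape the flat array as (nz, ny, nx)
    flux = tally.mean.ravel().reshape(mesh.dimension[::-1])

plt.imshow(flux[:, mesh.dimension[1] // 2, :], origin='lower')
plt.xlabel('x bin')
plt.ylabel('z bin')
plt.title('flux in the xz plane (y midplane)')
plt.show()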
Sorry if this doesn’t help solve your problem.