"Deplete" test hangs when OpenMC is built with libmesh

Hi, I’m having trouble with the test regression_tests/deplete/test.py when I’ve built OpenMC with libmesh. It looks like PETSc returns an error, but the calling process just hangs. My libmesh version is 1.7.0, and libmesh was built against PETSc 3.14.2. I only discovered the PETSc error printout when I ran pytest with the option --capture=no, and I had to kill the test manually with Ctrl-C.
The PETSc error is

PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.14.2, unknown
[0]PETSC ERROR: openmc on a named helen-Latitude-7400 by helen Fri Jun 4 10:39:04 2021
[0]PETSC ERROR: Configure options --prefix=/home/helen/local/petsc --with-make-np=4 --download-hypre=1 --with-debugging=no --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-strumpack=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices --prefix=/home/helen/local/petsc
[0]PETSC ERROR: #1 User provided function() line 0 in unknown file
^C
!!! KeyboardInterrupt !!!

The full traceback is too long to reproduce in its entirety, but here’s the end of it:

/usr/lib/python3.8/multiprocessing/popen_fork.py:27: KeyboardInterrupt
======================================================= 459 deselected, 1 warning in 137.18s (0:02:17) =======================================================
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

traceback.xml (38.3 KB)

I’ve attached the full traceback as one giant XML comment (XML because of the limited set of allowed file formats).

Thanks for reporting, @helen-brooks! I’ll take a look sometime soon.


I’m experiencing nearly the exact same error message when I run my own depletion simulation. A similar, but slightly different, error occurs when I try to run the code from this notebook locally. The slightly different traceback for the notebook is included below:

 Creating state point openmc_simulation_n0.h5...
Traceback (most recent call last):
  File "/home/lgross/gcmr/depletion/test_notebook/depletion.py", line 47, in <module>
    integrator.integrate()
  File "/home/lgross/openmc/openmc/deplete/abc.py", line 817, in integrate
    proc_time, conc_list, res_list = self(conc, res.rates, dt, source_rate, i)
  File "/home/lgross/openmc/openmc/deplete/integrators.py", line 56, in __call__
    proc_time, conc_end = self._timed_deplete(conc, rates, dt)
  File "/home/lgross/openmc/openmc/deplete/abc.py", line 711, in _timed_deplete
    results = deplete(
  File "/home/lgross/openmc/openmc/deplete/pool.py", line 121, in deplete
    x_result = list(pool.starmap(func, inputs))
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/lgross/.pyenv/versions/3.10.5/lib/python3.10/multiprocessing/pool.py", line 475, in _map_async
    iterable = list(iterable)
  File "/home/lgross/openmc/openmc/deplete/chain.py", line 662, in form_matrix
    k = self.nuclide_dict[target]
KeyError: 'Cs136'
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple MacOS to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Signal received
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.16.6, unknown 
[0]PETSC ERROR: openmc on a arch-moose named ulam by lgross Thu Jul 20 14:47:35 2023
[0]PETSC ERROR: Configure options --download-hypre=1 --with-shared-libraries=1 --download-hdf5=1 --download-hdf5-fortran-bindings=0   --with-debugging=no --download-fblaslapack=1 --download-metis=1 --download-ptscotch=1 --download-parmetis=1 --download-superlu_dist=1 --download-mumps=1 --download-strumpack=1 --download-scalapack=1 --download-slepc=1 --with-mpi=1 --with-openmp=1 --with-cxx-dialect=C++11 --with-fortran-bindings=0 --with-sowing=0 --with-64-bit-indices 
[0]PETSC ERROR: #1 User provided function() at unknown file:0
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=59
:
system msg for write_line failure : Bad file descriptor

Was there ever any consensus on what is causing this issue? Figured I’d ask before diving into valgrind.

Figured out this issue was due to a mistake in chain_simple.xml. The issue discussion is here and the solution PR is here.
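For anyone who hits a similar KeyError in form_matrix, a quick sanity check on a chain file is to verify that every reaction target is also declared as a nuclide. Below is a minimal sketch using only the standard library; the `nuclide`/`reaction` element names and the `name`/`target` attributes are assumptions based on the depletion chain XML format, so adjust them to match your file.

```python
import xml.etree.ElementTree as ET

def missing_targets(chain_xml):
    """Return reaction targets that are never declared as nuclides.

    Assumes a chain layout of <nuclide name="..."> elements containing
    <reaction target="..."> children (attribute names are assumptions).
    """
    root = ET.fromstring(chain_xml)
    # All nuclide names declared anywhere in the chain
    declared = {n.get("name") for n in root.iter("nuclide")}
    missing = set()
    for nuclide in root.iter("nuclide"):
        for reaction in nuclide.iter("reaction"):
            target = reaction.get("target")
            # Decay/reaction entries without a target (e.g. fission) are skipped
            if target is not None and target not in declared:
                missing.add(target)
    return missing

# Hypothetical example: Xe135 captures to Xe136, which is never declared,
# mirroring the KeyError: 'Cs136' raised in form_matrix.
chain = """
<depletion_chain>
  <nuclide name="Xe135">
    <reaction type="(n,gamma)" target="Xe136" Q="0.0"/>
  </nuclide>
</depletion_chain>
"""
print(missing_targets(chain))  # -> {'Xe136'}
```

Any name this prints would raise `KeyError` when `form_matrix` looks it up in `nuclide_dict`, which is exactly what happened with Cs136 in chain_simple.xml.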