HDF5 error when running OpenMC 0.12.0 with OpenMPI on cluster

Dear All,

I am running OpenMC 0.12.0 in parallel with OpenMPI on an HPC cluster. I installed it with Anaconda.

So far, all of my simulations have crashed due to HDF5 errors. A few weeks ago, I was using the 0.12.0-dev version, and the same error occurred, though not systematically.

I am pasting the error message in this email as it seems I can’t upload a text file:

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 140299942157312:
#000: H5Odeprec.c line 200 in H5Oget_info_by_idx1(): can’t get info for object
major: Object header
minor: Can’t get value
#001: H5Oint.c line 2431 in H5O__get_info_by_idx(): can’t retrieve object info
major: Object header
minor: Can’t get value
#002: H5Oint.c line 2327 in H5O_get_info(): can’t retrieve object’s btree & heap info
major: Object header
minor: Can’t get value
#003: H5Goh.c line 401 in H5O__group_bh_info(): can’t retrieve symbol table size info
major: Symbol table
minor: Can’t get value
#004: H5Gstab.c line 671 in H5G__stab_bh_size(): iteration operator failed
major: B-Tree node
minor: Unable to initialize object
#005: H5B.c line 1992 in H5B_get_info(): B-tree iteration failed
major: B-Tree node
minor: Iteration failed
#006: H5B.c line 1899 in H5B__get_info_helper(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#007: H5AC.c line 1625 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#008: H5C.c line 2362 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#009: H5C.c line 6621 in H5C_load_entry(): Can’t read image
major: Object cache
minor: Read failed
#010: H5Fio.c line 118 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#011: H5PB.c line 732 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#012: H5Faccum.c line 211 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#013: H5FDint.c line 205 in H5FD_read(): driver read request failed
major: Virtual File Layer
minor: Read failed
#014: H5FDsec2.c line 716 in H5FD_sec2_read(): file read failed: time = Mon Sep 14 18:46:35 2020
, filename = ‘/home/groups/rewing1/cross-sections/lib80x_hdf5/Gd157.h5’, file descriptor = 5, errno = 5, error message = ‘Input/output error’, buf = 0x55b4efdcb4c0, total read size = 544, bytes this sub-read = 544, bytes actually read = 18446744073709551615, offset = 8334376
major: Low-level I/O
minor: Read failed

The same message repeats again and again.

Would somebody know what is going on?

Thanks.

Julien
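
A note on reading the last frame of the trace above: the "bytes actually read" figure looks like a signed -1 printed as an unsigned 64-bit integer, which would mean the underlying read call itself failed with errno 5. The short sketch below only decodes those two numbers from the log; the variable names are made up for illustration, and the errno name is whatever the local platform reports.

```python
import ctypes
import errno

# Value copied from the "bytes actually read" field of the trace.
bytes_read_reported = 18446744073709551615

# Reinterpret the unsigned 64-bit value as signed; ctypes wraps without
# overflow checking, so 2**64 - 1 becomes -1.
as_signed = ctypes.c_int64(bytes_read_reported).value

# Look up the symbolic name for errno 5 on this platform (EIO on Linux).
errno_name = errno.errorcode.get(5, "unknown")

print(as_signed, errno_name)
```

If that reading is right, the failing step is a low-level file read on the cross-section file, which may be worth keeping in mind alongside any library version issue.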

Hi @Julien. I have a suspicion about what might be happening. Strange HDF5 errors usually arise when the HDF5 library that OpenMC was compiled against differs from the one dynamically loaded at runtime. This tends to happen in depletion simulations because they are driven through Python. The sequence of events is usually something like the following:

  • You compile OpenMC using one version of HDF5, let’s say 1.12 for this example.
  • You go to run a depletion simulation, which relies on Python.
  • In the process, import h5py is called, which dynamically loads an HDF5 library, but not necessarily the same version that was used to compile OpenMC (let’s say 1.10 for this example)
  • When the openmc.deplete module dynamically loads the OpenMC shared library, the HDF5 libraries have already been loaded by virtue of import h5py.
  • Thus, OpenMC will try to use version 1.10 of HDF5 even though it was originally compiled with 1.12.

The reason I point out these specific versions is because there was an API change in 1.12 related to the H5Oget_info_by_idx function (see related PR here).

So, the solution here is to make sure that h5py itself gets built/installed with the same HDF5 library you are using to build OpenMC.
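
A quick way to check for the mismatch described above is to ask h5py which HDF5 version it reports and compare it with the version OpenMC was built against. This is only a minimal sketch: the helper names are hypothetical, and the "expected" version string is something you have to supply yourself from your own build.

```python
import importlib


def h5py_hdf5_version():
    """Return the HDF5 version string reported by h5py, or None if h5py is not installed."""
    try:
        h5py = importlib.import_module("h5py")
    except ImportError:
        return None
    return h5py.version.hdf5_version


def same_major_minor(version_a, version_b):
    """True if two 'X.Y.Z' version strings agree on the X.Y part."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]


if __name__ == "__main__":
    # Assumption: replace this with the HDF5 version your OpenMC build used.
    expected = "1.10.6"
    found = h5py_hdf5_version()
    if found is None:
        print("h5py is not installed in this environment")
    else:
        print("h5py reports HDF5", found, "| matches expected X.Y:", same_major_minor(found, expected))
```

Comparing only the major.minor part is a deliberate simplification here, since the API change mentioned above is between the 1.10 and 1.12 series.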

Hi Paul,

Thanks for the help. I think I understand. I do not use openmc.deplete but my own depletion module. However, my module uses openmc.data.IncidentNeutron, which imports h5py.

I first tried to install HDF5 1.12.0 in my conda environment (I assumed OpenMC on conda-forge was compiled with HDF5 1.12.0?) and then install OpenMC 0.12.0 from conda, but it did not work: conda install openmc ran into version conflicts, with HDF5 I believe.

Now, I have decided to create a new conda environment with HDF5 1.10.6 installed, in which I installed OpenMC from source (to ensure it is compiled with the same HDF5 as the one used during simulations). I will see if the error still appears.

Something I have noticed is that when I git clone openmc and check out master, the installed version ends up being 0.11.0, whereas I expected 0.12.0. Am I doing something wrong?

Thanks a lot.

Julien

Nope, looks like we forgot to update the master branch. I’ve just updated it so that it points to the v0.12.0 tag now. Thanks for pointing that out!

EDIT: Disregard this, I recompiled everything and it is now working, thanks!

Hello, after installing OpenMC from source with DAGMC enabled in a Miniconda environment, I believe I’m having a similar problem. When I run DAGMC-enabled codes, including the DAGMC tests in the test suite, I get the following errors:

E           RuntimeError: OpenMC aborted unexpectedly.
/home/me/openmc/openmc/executor.py:38: RuntimeError
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.4, library is 1.10.6

I installed HDF5 using sudo apt install libhdf5-dev, and it says that 1.10.4 is the latest version available. I then installed the OpenMC Python API using pip install -e .[test] in the OpenMC home directory, which installed version 3.4.0 of h5py.

I’ve tried uninstalling and reinstalling both h5py and libhdf5-dev, and I’ve also tried reinstalling h5py using HDF5_VERSION=1.10.4 pip install --no-binary=h5py h5py as described in the link Paul provided above, but I still get the same error. Any help with this issue would be greatly appreciated.
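
For anyone debugging a "Headers are X, library is Y" message like the one above, it can help to see which libhdf5 shared objects are actually mapped into the running Python process. The sketch below is Linux-only and reads /proc/self/maps; the function names are hypothetical, and it only lists paths, it does not decide which one is the "right" library.

```python
import re


def hdf5_libs_in_maps(maps_text):
    """Return the sorted, unique libhdf5 shared-object paths named in /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6 and re.search(r"libhdf5[^/]*\.so", fields[5]):
            paths.add(fields[5].strip())
    return sorted(paths)


def hdf5_libs_in_this_process():
    """List libhdf5 libraries mapped into the current process (empty list if unavailable)."""
    try:
        with open("/proc/self/maps") as maps_file:
            return hdf5_libs_in_maps(maps_file.read())
    except OSError:
        return []


if __name__ == "__main__":
    # Import h5py (and anything else that pulls in HDF5) before this point,
    # otherwise no HDF5 library will have been loaded yet.
    for path in hdf5_libs_in_this_process():
        print(path)
```

If more than one path shows up, or the path points somewhere other than the installation OpenMC was compiled against (for example inside the Miniconda environment rather than the system location used by apt), that would be consistent with the header/library mismatch in the warning.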