Failure in unit test

When running the test suite with 0.11.0, we’ve run into a problem with one of the unit tests, the output from which is below:

platform linux -- Python 3.7.3, pytest-4.3.1, py-1.8.0, pluggy-0.9.0
rootdir: /openmc, inifile: pytest.ini
plugins: remotedata-0.3.1, openfiles-0.3.2, doctestplus-0.3.0, arraydiff-0.3
collected 429 items

regression_tests/asymmetric_lattice/test.py F
regression_tests/cmfd_feed/test.py …
regression_tests/cmfd_feed_2g/test.py F
regression_tests/cmfd_feed_expanding_window/test.py .
regression_tests/cmfd_feed_ng/test.py F
regression_tests/cmfd_feed_ref_d/test.py .
regression_tests/cmfd_feed_rolling_window/test.py .
regression_tests/cmfd_nofeed/test.py .
regression_tests/cmfd_restart/test.py .
regression_tests/complex_cell/test.py .
regression_tests/confidence_intervals/test.py .
regression_tests/create_fission_neutrons/test.py .
regression_tests/dagmc/legacy/test.py s
regression_tests/dagmc/refl/test.py s
regression_tests/dagmc/uwuw/test.py s
regression_tests/density/test.py .
regression_tests/deplete/test.py F
regression_tests/diff_tally/test.py F
regression_tests/distribmat/test.py .
regression_tests/eigenvalue_genperbatch/test.py .
regression_tests/eigenvalue_no_inactive/test.py .
regression_tests/energy_cutoff/test.py .
regression_tests/energy_grid/test.py .
regression_tests/energy_laws/test.py .
regression_tests/enrichment/test.py .
regression_tests/entropy/test.py .
regression_tests/filter_distribcell/test.py F
regression_tests/filter_energyfun/test.py .
regression_tests/filter_mesh/test.py F
regression_tests/fixed_source/test.py .
regression_tests/infinite_cell/test.py .
regression_tests/iso_in_lab/test.py .
regression_tests/lattice/test.py .
regression_tests/lattice_hex/test.py F
regression_tests/lattice_hex_coincident/test.py .
regression_tests/lattice_hex_x/test.py F
regression_tests/lattice_multiple/test.py .
regression_tests/lattice_rotated/test.py .
regression_tests/mg_basic/test.py .
regression_tests/mg_basic_delayed/test.py .
regression_tests/mg_convert/test.py .
regression_tests/mg_legendre/test.py .
regression_tests/mg_max_order/test.py .
regression_tests/mg_survival_biasing/test.py .
regression_tests/mg_tallies/test.py .
regression_tests/mgxs_library_ce_to_mg/test.py .
regression_tests/mgxs_library_condense/test.py F
regression_tests/mgxs_library_correction/test.py .
regression_tests/mgxs_library_distribcell/test.py F
regression_tests/mgxs_library_hdf5/test.py F
regression_tests/mgxs_library_histogram/test.py .
regression_tests/mgxs_library_mesh/test.py F
regression_tests/mgxs_library_no_nuclides/test.py F
regression_tests/mgxs_library_nuclides/test.py .
regression_tests/multipole/test.py .
regression_tests/output/test.py .
regression_tests/particle_restart_eigval/test.py .
regression_tests/particle_restart_fixed/test.py .
regression_tests/periodic/test.py .
regression_tests/photon_production/test.py .
regression_tests/photon_source/test.py .
regression_tests/plot/test.py .
regression_tests/plot_overlaps/test.py .
regression_tests/plot_voxel/test.py .
regression_tests/ptables_off/test.py .
regression_tests/quadric_surfaces/test.py F
regression_tests/reflective_plane/test.py .
regression_tests/resonance_scattering/test.py .
regression_tests/rotation/test.py .
regression_tests/salphabeta/test.py .
regression_tests/score_current/test.py .
regression_tests/seed/test.py .
regression_tests/source/test.py .
regression_tests/source_file/test.py .
regression_tests/sourcepoint_batch/test.py .
regression_tests/sourcepoint_latest/test.py .
regression_tests/sourcepoint_restart/test.py .
regression_tests/statepoint_batch/test.py .
regression_tests/statepoint_restart/test.py .
regression_tests/statepoint_sourcesep/test.py .
regression_tests/surface_tally/test.py .
regression_tests/survival_biasing/test.py .
regression_tests/tallies/test.py F
regression_tests/tally_aggregation/test.py F
regression_tests/tally_arithmetic/test.py .
regression_tests/tally_assumesep/test.py .
regression_tests/tally_nuclides/test.py .
regression_tests/tally_slice_merge/test.py .
regression_tests/trace/test.py .
regression_tests/track_output/test.py .
regression_tests/translation/test.py .
regression_tests/trigger_batch_interval/test.py .
regression_tests/trigger_no_batch_interval/test.py .
regression_tests/trigger_no_status/test.py .
regression_tests/trigger_tallies/test.py .
regression_tests/triso/test.py F
regression_tests/uniform_fs/test.py .
regression_tests/universe/test.py .
regression_tests/void/test.py .
regression_tests/volume_calc/test.py .
unit_tests/test_capi.py …
unit_tests/test_cell.py …
unit_tests/test_complex_cell_capi.py

At this point, the process fails abruptly without an error message. We attempted to run test_complex_cell_capi.py by itself, and got the error:

*** The MPI_Type_free() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

So the theory is that the first finalize() in the final lines of the complex_cell function (below) is causing the error.

model.export_to_xml()

openmc.capi.finalize()  # first finalize(): suspected cause of the MPI_Type_free error
openmc.capi.init()

openmc.capi.finalize()

Any insights from others who’ve recently run the tests?

Side question: we have quite a few failures in the regression tests, but since the results are hashed, it’s difficult to gauge the magnitude of the discrepancies. Is it possible these are just rounding errors?

Hi Sourena,

Can you provide the following information:

  • Information on how you built openmc, specifically the cmake command

  • Compiler information: which cxx / mpicxx C++ compiler and version

  • Output of the following test commands from the openmc directory:

      pytest -v -s --sw
      pytest tests/unit_tests/test_complex_cell_capi.py -v -s --sw

    The switches in the pytest commands should provide information on what is failing by capturing the standard out (-s).

To your question on the hashed tests, round-off errors are not out of the question. However, the number of failed regression tests suggests something possibly more substantial. I have different compiler versions than what openmc tests against, so two of my regression tests fail consistently because the computed values are off in the 5th or 6th decimal place.

Thank you,

Andrew

Hi Andrew. I’m probably the best one to answer the software building questions.

The cmake command was:

CC="$openmpi_root/bin/mpicc" \
CXX="$openmpi_root/bin/mpicxx" \
FC="$openmpi_root/bin/mpifort" \
CFLAGS="-march=skylake" \
CXXFLAGS="-march=skylake" \
FFLAGS="-march=skylake" \
LDFLAGS="-march=skylake" \
HDF5_ROOT="$hdf5_root" \
cmake3 \
  -D "CMAKE_BUILD_TYPE:STRING=Release" \
  -D "CMAKE_INSTALL_PREFIX:STRING=$install_dir" \
  -D "openmp:BOOL=ON" \
  -D "optimize:BOOL=ON" \
  -D "debug:BOOL=OFF" \
  ..

Open MPI 4.0.1 was used with GCC 8.3.0, though at run time OpenMC uses the GCC runtime libraries (such as libstdc++.so.6) shipped with Anaconda 2019.03, which were compiled with GCC 7.3.0.

The output of pytest unit_tests/test_complex_cell_capi.py -v -s from the tests directory is:


============================= test session starts ==============================
platform linux -- Python 3.7.3, pytest-4.3.1, py-1.8.0, pluggy-0.9.0 -- /minerva/opt/anaconda3-2019.03/bin/python
cachedir: .pytest_cache
rootdir: /minerva/opt/openmc/build/openmc-0.11.0-dev/openmc-0.11.0-dev, inifile: pytest.ini
plugins: remotedata-0.3.1, openfiles-0.3.2, doctestplus-0.3.0, arraydiff-0.3
collected 5 items

unit_tests/test_complex_cell_capi.py::test_cell_box[1-expected_box0] *** The MPI_Type_free() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[devnode:04671] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

(There is no additional output.)

I attached the output of pytest --ignore=unit_tests/test_complex_cell_capi.py -v -s.

openmc_test_results.txt (627 KB)

Hi Cliff,

Thank you for this information. Looking through the log file, I think I have some explanations for some of the failed regression and/or unit tests.

  • Lack of an NJOY executable. Some of the unit tests related to openmc.data rely on having an njoy executable that can be called simply as “njoy” from the command line. If you have already installed njoy, a simple fix is to add the location of the executable to your PATH. Instructions on the use of njoy in the test suite can be found at https://docs.openmc.org/en/latest/devguide/tests.html#prerequisites
  • Lack of cythonized modules. Some of the unit tests are failing due to the inability to perform resonance reconstruction, which is done with openmc/data/resonance.pyx. The Cython modules are compiled when the Python package is installed with either “python setup.py develop” or “pip install -e .” from the main directory.
  • Cross section differences. From the output file, I can see there are small differences in isotopes produced through depletion. This could be due to compiler differences and/or differences in cross sections. Have you downloaded and installed the latest set of cross sections using “tools/ci/download-xs.sh”? Some changes were recently made to the structure of thermal scattering data, and openmc checks during transport that the version of the libraries is up to date (if the library version did not match what openmc expects, you would likely see all regression tests failing). A consolidated sketch of these setup steps follows this list.
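
In case it helps, here is a rough, hypothetical sketch of those setup steps as shell commands (the njoy location is a placeholder; adjust the paths to your system):

export PATH=/path/to/njoy/bin:$PATH   # make the njoy executable callable as "njoy"
pip install -e .                      # from the main openmc directory; compiles the Cython modules
./tools/ci/download-xs.sh             # fetch the cross section data used by the test suite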

When the regression tests fail, most of the outputs are retained in the directory of the corresponding regression test. Could you compare the files and look for large errors? This can be done with the “diff” command, e.g. “diff tests/regression_tests/<test_name>/results*.dat”. Errors here will be similar to those in the depletion test, where compiler issues and/or cross section differences are at play. If you have the latest set of cross sections and the errors are very minor, it is likely compiler differences. OpenMC is tested against gcc/g++ 5.4.0, and I see minor errors using g++ 9.1.0.
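
For example, something along these lines (using a few of the failing tests from your log; the exact result file names may differ) would show the differences for each test:

cd tests/regression_tests
for t in lattice_hex tallies triso; do
    echo "=== $t ==="
    diff "$t"/results*.dat
done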

As for the complex_cell_capi failure, I am unsure of the cause. Is it possible for you to build openmc using the MPICH implementation? That is the MPI implementation used by the test suite, and I believe it can be installed through Anaconda as well.

Regards.

Andrew

Just wanted to follow up on this. There are definitely some issues with Open MPI, but I’m working through them right now. I should have a pull request up soon that at least gets these tests working with Open MPI. Like Andrew mentioned, there is some variability among different OSes due to subtle things like which versions of glibc and libm are being used, so it’s not surprising that you see some failures, but hopefully we can get it down to just a few.

Best,
Paul

Thanks for the pointers Andrew.

I wasn’t necessarily looking to use NJOY2016, but for the sake of testing I got it working on our cluster, and that of course made some tests happy. To fix the lack of cythonized modules for the tests, I simply renamed the openmc directory in the source tree so that the tests skipped looking there first and instead used the modules already installed in my PYTHONPATH.

I think Paul and I now have a fix in place for Open MPI, addressing the problem seen in test_complex_cell_capi: https://github.com/openmc-dev/openmc/pull/1335

We’re now down to 17 “failing” tests:
asymmetric_lattice: SHA512 hash, so discrepancy of unknown significance
cmfd_feed_2g: Close-ish, possibly just too few particles
cmfd_feed_ng: Close-ish, possibly just too few particles
full: Insignificant numerical discrepancies
diff_tally: Insignificant numerical discrepancies
filter_distribcell: SHA512 hash, so discrepancy of unknown significance
lattice_hex: Insignificant numerical discrepancies
lattice_hex_ox_surf: Insignificant numerical discrepancies
lattice_multiple: Insignificant numerical discrepancies
mgxs_library_condense: Insignificant numerical discrepancies
mgxs_library_distribcell: Insignificant numerical discrepancies
mgxs_library_hdf5: Insignificant numerical discrepancies
mgxs_library_no_nuclides: Insignificant numerical discrepancies
quadric_surfaces: Equivalent within provided uncertainty
tallies: SHA512 hash, so discrepancy of unknown significance
tally_aggregation: Insignificant numerical discrepancies
triso: Equivalent within provided uncertainty

It would really be nice not to have hashed results to compare to since all that can be said is that my OpenMC is not producing the exact same results as the reference OpenMC compile. I really can’t say anything about whether my OpenMC is working correctly or not based on that.

I suppose I should also be testing with multiple processes/threads. Paul had mentioned that I’d need mpi4py, so I’ve installed that now. How do I go about testing parallel execution? If running OpenMC in Python, I’d typically run something like this:

openmc.run(mpi_args=['srun', '--ntasks=4', '--cpus-per-task=11',
                     '--account=code_test', '--constraint=qa-openmc-0.11.0-dev',
                     '--hint=compute_bound', '--mpi=pmi2',
                     '--cpu_bind=sockets', '--mem_bind=local'],
           threads=11)

How do I configure the test suite to work in such an environment?

Hi Cliff,

Yes, I agree that having hashed test results makes it difficult to ascertain whether things are still within reason when they differ. I’ve put up an issue about the hashed results so we can work towards getting rid of them in favor of plain ASCII results.

As for running the test suite in parallel, you should run pytest with the extra --mpi flag. There are also --mpiexec= (to specify the mpiexec executable, srun in your case) and --mpi-np= (number of processes) options. I don’t think we have a good way right now of specifying the other exotic arguments you may need (account, constraint, etc.). If running plain srun is not sufficient, let us know and I’ll look into adding an option for specifying arbitrary MPI arguments during testing.
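
For example, an invocation along these lines should exercise the regression tests with three MPI processes launched through srun (adjust the path and process count as needed):

pytest -v -s --mpi --mpiexec=srun --mpi-np=3 tests/regression_tests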

Best regards,
Paul

Hi Paul,

I’m trying to get the tests to run with MPI by using Slurm environment variables instead of command line options. Here’s what I have so far:

SLURM_ACCOUNT="code_test" \
SLURM_CONSTRAINT="qa-openmc-0.11.0-dev" \
SLURM_CPU_BIND="sockets" \
SLURM_CPUS_PER_TASK=3 \
SLURM_HINT="compute_bound" \
SLURM_MEM_BIND="local" \
SLURM_MPI_TYPE="pmi2" \
SLURM_TIMELIMIT=5 \
OMP_NUM_THREADS=3 \
pytest -v -s --mpi --mpiexec=srun --mpi-np=3

However, if I run that, the tests fail with something like:

tests/regression_tests/asymmetric_lattice/test.py::test_asymmetric_lattice [mn41:450932] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 165
[mn41:450932] PMIX ERROR: NOT-FOUND in file gds_ds12_lock_pthread.c at line 199
[mn41:450933] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 165
[mn41:450933] PMIX ERROR: NOT-FOUND in file gds_ds12_lock_pthread.c at line 199
[mn41:450934] PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 165
[mn41:450934] PMIX ERROR: NOT-FOUND in file gds_ds12_lock_pthread.c at line 199
[mn41:450932] OPAL ERROR: Unreachable in file pmix3x_client.c at line 112
[mn41:450933] OPAL ERROR: Unreachable in file pmix3x_client.c at line 112
[mn41:450934] OPAL ERROR: Unreachable in file pmix3x_client.c at line 112

The application appears to have been direct launched using “srun”,
but OMPI was not built with SLURM’s PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

version 16.05 or later: you can use SLURM’s PMIx support. This
requires that you configure and build SLURM --with-pmix.

Versions earlier than 16.05: you must use either SLURM’s PMI-1 or
PMI-2 support. SLURM builds PMI-1 by default, or you can manually
install PMI-2. You must then build Open MPI using --with-pmi pointing
to the SLURM PMI library location.

Please configure as appropriate and try again.

If I only run the regression_tests but omit the cmfd_* and deplete tests, then the other tests (like the one above) run with MPI without producing the above error. For the cmfd_* and deplete tests, mpi4py is imported; could that be the source of my problem? How should I be running the tests to avoid the issue?

Cliff

Cliff,

Are you able to use mpi4py to successfully run a short script? For example, you could try the following:

from mpi4py import MPI
import openmc.capi
openmc.capi.init(intracomm=MPI.COMM_WORLD)
openmc.capi.run()
openmc.capi.finalize()

Best,
Paul

Good question Paul.

I’ll start simple first. I’ll grab the input files from the basic example and try out your test script directly with no srun:

OMP_NUM_THREADS=3 OPENMC_CROSS_SECTIONS=/minerva/opt/openmc/data/nndc_hdf5/cross_sections.xml ./openmctest.py

That works, with OpenMC reporting 1 MPI process and 3 OpenMP threads.

Let’s try running your script with srun now:

OMP_NUM_THREADS=3 OPENMC_CROSS_SECTIONS=/minerva/opt/openmc/data/nndc_hdf5/cross_sections.xml srun --ntasks=3 --cpus-per-task=3 --account=code_test --constraint=qa-openmc-0.11.0-dev --hint=compute_bound --mpi=pmi2 --cpu_bind=sockets --mem_bind=local ./openmctest.py
That works too, with OpenMC reporting 3 MPI processes and 3 OpenMP threads.

I’m guessing that means the testing code isn’t quite set up properly and needs fixing? Anything else I can do to help troubleshoot this?

Yes, it appears to be an issue with the way the test suite is set up.

I made a script called penv.sh that simply calls printenv, and ignores any arguments.
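
In other words, penv.sh is just a stand-in for the launcher, something along these lines:

#!/bin/sh
# print the environment inherited by the would-be "mpiexec" call;
# ignore whatever launcher arguments the test harness passes in
printenv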

If I run:

pytest -v -s --mpi --mpiexec=penv.sh --mpi-np=3 \
    tests/regression_tests/tally_arithmetic

then there are no MPI environment variables shown. The mpiexec command would then set up a fresh MPI environment, and all would be well for the tally_arithmetic test.

If I run:

pytest -v -s --mpi --mpiexec=penv.sh --mpi-np=3 \
    tests/regression_tests/tally_arithmetic \
    tests/regression_tests/deplete

then, because mpi4py is loaded by the deplete test, a bunch of MPI environment variables are set before the mpiexec command is launched for tally_arithmetic. Those environment variables must interfere with how the subsequent mpiexec command and the MPI library set up communications for the OpenMC processes.

For the second case where the deplete test is selected to run, I see these extra environment variables for the tally_arithmetic test:

HFI_NO_BACKTRACE=1
IPATH_NO_BACKTRACE=1
OMPI_APP_CTX_NUM_PROCS=1
OMPI_MCA_ess=singleton
OMPI_MCA_orte_ess_num_procs=1
OMPI_MCA_orte_launch=1
OMPI_MCA_orte_precondition_transports=75b54118aca3fbf7-9c0f505f774f772d
OMPI_MCA_pmix=^s1,s2,cray,isolated
ORTE_SCHIZO_DETECTION=ORTE
PMIX_BFROP_BUFFER_TYPE=PMIX_BFROP_BUFFER_NON_DESC
PMIX_DSTORE_21_BASE_PATH=/tmp/ompi.devnode.503/pid.123913/pmix_dstor_ds21_123913
PMIX_DSTORE_ESH_BASE_PATH=/tmp/ompi.devnode.503/pid.123913/pmix_dstor_ds12_123913
PMIX_GDS_MODULE=ds21,ds12,hash
PMIX_MCA_mca_base_component_show_load_errors=1
PMIX_NAMESPACE=222822401
PMIX_PTL_MODULE=tcp,usock
PMIX_RANK=0
PMIX_SECURITY_MODE=native
PMIX_SERVER_TMPDIR=/tmp/ompi.devnode.503/pid.123913
PMIX_SERVER_URI21=222822400.0;tcp4://127.0.0.1:38411
PMIX_SERVER_URI2=222822400.0;tcp4://127.0.0.1:38411
PMIX_SERVER_URI3=222822400.0;tcp4://127.0.0.1:38411
PMIX_SYSTEM_TMPDIR=/tmp
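
If it would help confirm the theory, I could presumably point --mpiexec at a small wrapper that scrubs these inherited variables before handing off to srun. A rough, untested sketch:

#!/bin/sh
# hypothetical wrapper: drop the MPI-related variables inherited from the
# in-process mpi4py/Open MPI initialization, then launch through srun
for v in $(printenv | grep -E '^(OMPI_|PMIX_|ORTE_)' | cut -d= -f1); do
    unset "$v"
done
exec srun "$@"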