Unstructured Mesh Tallies Benchmark

Hello @paulromano @pshriwise @Shimwell

I am wondering whether there have been any publications or background work, other than this PR ( Allow MOAB k-d tree to be configured by paulromano · Pull Request #2976 · openmc-dev/openmc · GitHub ), on benchmarking performance when tallying on unstructured meshes. We are currently using the FASTER and LAUNCH supercomputers at Texas A&M, along with a desktop CPU, to compare the performance of OpenMP and OpenMPI when using unstructured meshes. A few of the points we are looking at are:

- Figure out which setup works well with OpenMP on an AMD 7950X CPU but not on FASTER’s Intel Xeon 8352Y (a personal CPU vs. an HPC system).
- Compare OpenMP vs. MPI performance between a small mesh and a large mesh setup, to show how OpenMP can perform as well as (or even better than) MPI for the small mesh setup but can’t keep up with MPI for the large mesh setup.
- Compare conda-provided OpenMC vs. manually built OpenMC, with various optimization flags.
- Run the large mesh setup with the optimal OpenMC build to estimate how much time is needed to run the large mesh experiment on LAUNCH.
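
A minimal sketch of how the OpenMP vs. MPI comparison could be laid out for a fixed core budget, so that every run uses the same number of cores and only the split between MPI ranks and OpenMP threads changes. The helper names (`run_configs`, `speedup`, `efficiency`) are hypothetical, and the sketch assumes an MPI-enabled OpenMC build launched through `mpiexec` with the thread count set via `openmc --threads`; it only builds the command lines and does not run anything.

```python
from typing import Dict, List


def run_configs(total_cores: int) -> List[Dict]:
    """Enumerate (MPI ranks, OpenMP threads) splits that use exactly total_cores.

    Hypothetical helper: returns the command line for each split without
    executing it, so it can be fed to a job script or subprocess call.
    """
    configs = []
    for ranks in range(1, total_cores + 1):
        if total_cores % ranks != 0:
            continue  # keep only splits where ranks * threads == total_cores
        threads = total_cores // ranks
        cmd = ["openmc", "--threads", str(threads)]
        if ranks > 1:
            # Assumes OpenMC was built with MPI support.
            cmd = ["mpiexec", "-n", str(ranks)] + cmd
        configs.append({"ranks": ranks, "threads": threads, "cmd": cmd})
    return configs


def speedup(t_reference: float, t_run: float) -> float:
    """Wall-clock speedup of a run relative to a reference run."""
    return t_reference / t_run


def efficiency(t_serial: float, t_run: float, cores: int) -> float:
    """Parallel efficiency: speedup over a one-core run divided by core count."""
    return t_serial / (t_run * cores)
```

For example, `run_configs(4)` yields three splits (1 rank × 4 threads, 2 × 2, and 4 × 1), which covers the pure OpenMP, hybrid, and pure MPI cases on the same hardware.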

We would like to see how performance is impacted, produce results that help identify the optimal settings for large mesh runs (MPI performs better, but with large meshes the memory requirement becomes too large), and possibly develop methods to improve performance.
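
To make the memory concern concrete, here is a rough back-of-the-envelope sketch. It assumes each MPI rank holds its own copy of the tally results while OpenMP threads within a rank share one copy, and that each tally bin stores three double-precision values (current value, running sum, running sum of squares). Both points are assumptions about the layout rather than measured figures, and the function name `tally_results_bytes` is hypothetical. The mesh itself and any search tree built on it are also replicated per rank and are not counted here.

```python
BYTES_PER_DOUBLE = 8
VALUES_PER_BIN = 3  # assumed: value, sum, sum of squares


def tally_results_bytes(n_elements: int, n_scores: int = 1, n_ranks: int = 1) -> int:
    """Estimate total tally-results memory across all MPI ranks on a node.

    Rough lower bound only: ignores the mesh data structure, k-d tree,
    cross sections, and particle banks.
    """
    per_rank = n_elements * n_scores * VALUES_PER_BIN * BYTES_PER_DOUBLE
    return per_rank * n_ranks


if __name__ == "__main__":
    # Example: 10 million elements, 2 scores, 1 rank vs. 64 ranks on one node.
    for ranks in (1, 64):
        gib = tally_results_bytes(10_000_000, n_scores=2, n_ranks=ranks) / 2**30
        print(f"{ranks:>3} rank(s): {gib:.2f} GiB of tally results")
```

Under these assumptions the footprint grows linearly with the number of ranks per node, which is why a hybrid split (fewer ranks, more threads per rank) may be worth including in the large mesh comparison.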