Hi,
I am a Computer Scientist interested in OpenMC from a CS systems research point of view. I am looking to run a “typical” OpenMC example: a small problem that can be run on one node using OpenMP. I am only interested in profiling OpenMC and do not intend to use the OpenMC API directly.
I am trying to get the “pincell” example running, but I am unable to find the “cross_sections.xml” file anywhere in the OpenMC GitHub repository. From the docs, I have figured out that I need some HDF5 files that describe the nuclide properties, and I have been able to download the “nndc” directory with files in the .ace format. However, I have been unable to use the openmc-ace-to-hdf5 script successfully on these .ace files.
I would be grateful if someone could help point out how I can generate the cross_sections.xml file from these .ace files for the “pincell” example. Any help would be much appreciated.
Regards,
Srinivasan Ramesh
Hi,
Sorry for this post - it seems that I was not following the documentation closely enough. I was able to generate the cross_sections.xml file successfully. Kindly ignore this post.
Regards,
Srinivasan Ramesh
Hi Srinivasan,
I’ve attached a tarball with a single assembly model that should be suitable for your needs. One thing to note when doing performance analysis is that the run is divided into “inactive” and “active” batches. The distribution of fission source sites has to be iterated on until it reaches stationarity; at that point, the code can begin tallying physical quantities of interest. Because tallies are accumulated only during the “active” batches, the performance (number of particles simulated per second) will be lower during the “active” batches than during the “inactive” batches, where no tallies are being accumulated. If you want consistent numbers across the entire run, you can simply remove the tallies (by making sure there is no tallies.xml file in your directory).
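For reference, the batch structure is controlled through settings.xml. A minimal sketch using the Python API (the numbers below are just placeholders, not a recommendation) would be:

    import openmc

    # Minimal sketch with illustrative values:
    # "inactive" batches only converge the fission source; tallies
    # accumulate during the remaining "active" batches.
    settings = openmc.Settings()
    settings.batches = 500        # total batches (inactive + active)
    settings.inactive = 10        # source-convergence batches, no tallying
    settings.particles = 100_000  # particles per batch
    settings.run_mode = 'eigenvalue'
    settings.export_to_xml()      # writes settings.xml for the openmc executable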
For cross sections, it sounds like you might have figured it out. The easiest thing to do though is download the data that we use for CI testing, which you can find here:
https://anl.box.com/s/na85do11dfh0lb9utye2il5o6yaxx8hi
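If you ever do want to build the library from the NNDC ACE files yourself, the conversion can also be done directly with the openmc.data module. A rough sketch (the “nndc” and “hdf5” directory names are just assumptions for illustration):

    import glob, os
    import openmc.data

    # Rough sketch: convert ACE files to HDF5 and build cross_sections.xml.
    os.makedirs('hdf5', exist_ok=True)
    library = openmc.data.DataLibrary()
    for ace_file in sorted(glob.glob('nndc/*.ace')):
        nuclide = openmc.data.IncidentNeutron.from_ace(ace_file)
        h5_file = os.path.join('hdf5', nuclide.name + '.h5')
        nuclide.export_to_hdf5(h5_file)   # one HDF5 file per nuclide
        library.register_file(h5_file)    # record it in the library index
    library.export_to_xml(os.path.join('hdf5', 'cross_sections.xml'))

You would then point OpenMC at the resulting cross_sections.xml, e.g. via the OPENMC_CROSS_SECTIONS environment variable.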
Let me know if I can be of any more help.
Best regards,
Paul
assembly.tar.gz (7.73 KB)
Hi Paul,
Thanks much for your reply! I had gotten the “pincell” example to work with about 100,000 particles for 500 or so batches (10 of them inactive), with a single generation per batch. As I mentioned before, I am interested in the effect of power capping on the “rate of progress” (particles/second). It may interest you to know that this application (at least for the “pincell” example) appears to have a fairly constant rate of progress throughout the run, EXCEPT for the initial portion of the run, where the rate of progress is higher than in the remainder - this corresponds exactly with the number of “inactive” batches specified in settings.xml.
Actually, your explanation is helpful to us - it shows that OpenMC has two phases. I need to re-run the power-capping experiment with the .xml files you have provided here.
Another observation we made:
===> the “rate of progress” for OpenMC almost exactly follows the “active power capping function” being applied to the PACKAGE (cores, caches). I have attached some images of how this rate of progress is affected by a variety of power capping functions (linearly decreasing power cap, step functions, etc). This is intended to mimic some of the behaviours that can be expected from future system power management software, and their interactions with daemons on the node that manage and actually implement the power capping.
Regards,
Srinivasan Ramesh
Hi Paul,
I had a quick question - have you profiled OpenMC’s memory usage on production or benchmarking runs? If so, what is the memory usage (as a percentage of node memory) for production runs? Roughly how many particles per node do you suggest in order to study OpenMC on a single node in a way that is “representative” of how it runs in production? I remember you mentioning that OpenMC has a random (unstructured) memory access pattern, so it is sensitive to DRAM bandwidth. For the .xml files you provided above, I see roughly ~0.1% memory usage when using all 24 cores to parallelize the code with OpenMP while instantiating 100,000 particles. I was wondering if this is typical?
Regards,
Srinivasan Ramesh
Hi Srinivasan,
The memory usage in OpenMC depends primarily on:
- The number of nuclides in the problem (and how many temperatures they appear at)
- The number/type of user-specified tallies
The number of nuclides depends on the problem you are modeling. For the simplest models, it might only be a handful, but for complex reactor models with fuel that has been depleted, there can be hundreds of nuclides (generally one is limited to nuclides that have neutron cross sections available, which is 400-500). A problem with 300 nuclides will require about 1 GB of memory just for the cross section data at a single temperature.
For tallies, the memory entirely depends on what a user is looking for in a problem. The memory required could be anything from 0 (if no tallies are specified) up to 1 TB (for a finely resolved full reactor model where the user requests reaction rates in every region). One important thing to keep in mind is that no decomposition of data is currently done, so the memory requirements are really “per process”. Tally data and cross sections are shared among OpenMP threads though, so there is very little memory overhead when using threads.
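As a back-of-the-envelope illustration of the nuclide scaling above (the per-nuclide constant below is only derived from the ~1 GB for 300 nuclides figure, not a measured value):

    # Rough per-process cross-section memory estimate; the per-nuclide
    # constant is a back-of-the-envelope figure (~1 GB / 300 nuclides
    # at a single temperature).
    MB_PER_NUCLIDE_PER_TEMP = 1024.0 / 300   # ~3.4 MB

    def estimate_xs_memory_mb(n_nuclides, n_temperatures=1):
        return n_nuclides * n_temperatures * MB_PER_NUCLIDE_PER_TEMP

    print(estimate_xs_memory_mb(300))  # ~1024 MB, the 1 GB case above
    print(estimate_xs_memory_mb(10))   # a handful of nuclides: ~34 MB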
Let me know if you have further questions about memory usage.
Best regards,
Paul
Hi Paul,
Thank you for the reply! I have some followup questions because I am seeing some behavior that I do not understand.
Speaking of the example you forwarded (the assembly model in the tarball attached earlier in this thread):
- Kindly correct me if I am wrong: this problem has roughly 50-60 nuclides and one tally (a flux tally), based on the tallies.xml file.
- For this problem, I am trying to calculate what is called the “Misses Per Operation (MPO)” metric, which is the number of last-level cache (LLC) misses divided by the number of instructions retired. This is a CPU-frequency-independent metric that indicates whether the application is CPU bound or memory bound: the higher the MPO, the more memory-bound the application. (A measurement sketch follows this list.)
- With the STREAM benchmark as the reference point for memory-bound codes (it streams through arrays far larger than the cache, so it gets essentially no cache reuse) and the LAMMPS LJ benchmark at the other extreme (CPU bound, with a problem size that almost fits within the last-level cache), I expected the MPO value of OpenMC to be closer to STREAM than to LAMMPS. But it is the other way around! Numbers:
a. STREAM (MPO): 0.0509
b. LAMMPS (MPO): 0.00032
c. OpenMC (MPO): 0.00020 (!)
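For reference, here is a rough sketch of one way to measure MPO with Linux perf; the event names and the OpenMC invocation below are illustrative and will vary by CPU and setup:

    import subprocess

    # Sketch only: the perf events (LLC-load-misses, instructions) and the
    # "openmc -s 24" invocation are illustrative assumptions.
    def measure_mpo(cmd):
        """Return last-level cache misses per retired instruction for cmd."""
        result = subprocess.run(
            ["perf", "stat", "-x,", "-e", "LLC-load-misses,instructions"] + cmd,
            stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True)
        counts = {}
        for line in result.stderr.splitlines():
            fields = line.split(",")
            if len(fields) >= 3 and fields[0].strip().isdigit():
                counts[fields[2]] = int(fields[0])
        return counts["LLC-load-misses"] / counts["instructions"]

    # Example (hypothetical invocation): mpo = measure_mpo(["openmc", "-s", "24"])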
This tells me that the example I am running is CPU bound, and this correlates with what I am seeing with the power-capping experiments. From our conversation, I had noted down that OpenMC is memory latency sensitive, and makes a lot of unstructured memory references that lead to inefficient use of the cache.
I was hoping you could help me understand these MPO results. Very interested in your response!
Best,