SEGV in XS calculation on 8K processes

Hi,

I encountered a SEGV when running OpenMC on Edison. It occurs exactly on the last process (rank = 8191) in calculate_sab_xs and is reproducible only with 8K processes. Runs of the same benchmark on 2K and 4K processes work perfectly, which is also why it is not easy to debug. Could it be related to an integer overflow during the cross section calculation when many processes are involved?

OpenMC: Version: 0.5.2 Git SHA1: e9a5810ae36a4017c10f9fc24d2019c2fe9ca4d6
Platform: Cray XC30 system, https://www.nersc.gov/users/computational-systems/edison/
Compiler: ftn --version GNU Fortran (GCC) 4.9.1 20140716 (Cray Inc.)

Hi Nan,

It shouldn’t be related to integer overflow. Can you compile with debug flags turned on and re-run to get a traceback with line numbers?

Paul

Sure, I will enable debug info and try again. Since an 8K allocation takes time to be scheduled, I will get back to you when it is done.

Here it is,

Nan, how many particles per batch are you running on this problem? I suspect this may be caused by running too few particles when you have 8K processors.

Thanks,
Paul

There were 5120000 particles, i.e. 640 particles/node, which is indeed too few. How many particles per node would be enough to avoid this problem?
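
For reference, here is a rough sketch of the arithmetic behind the per-process count. It assumes the particles in a batch are divided as evenly as possible across MPI ranks, which is only an assumption here and not a statement of how OpenMC actually distributes work. The function name particles_per_rank is hypothetical.

```python
def particles_per_rank(n_particles, n_ranks):
    """Split n_particles as evenly as possible over n_ranks.

    Assumption for illustration only: the first `extra` ranks each take one
    additional particle. This is not taken from the OpenMC source.
    """
    base, extra = divmod(n_particles, n_ranks)
    return [base + 1 if rank < extra else base for rank in range(n_ranks)]


n_particles = 5_120_000  # particle count quoted in this thread

# Process counts of 2K, 4K and 8K, taken as powers of two because the last
# rank in the failing run is 8191.
for n_ranks in (2048, 4096, 8192):
    counts = particles_per_rank(n_particles, n_ranks)
    print(n_ranks, min(counts), max(counts))
```

Under that assumption an even split over 8192 ranks gives 625 particles per rank (640 would correspond to 8000 ranks), compared with 1250 on 4096 ranks and 2500 on 2048 ranks.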

-Nan

It’s hard to say because it’s problem-dependent. In any event, the code should not segfault like that, so I will look into how to prevent it.

Thanks,
Paul