Poor performance in openmc.deplete.pool

Dear developers,
I'm using your burnup module as an external dependency for a method of characteristics (MOC) code, and I have run into slow performance. Almost all of the run time is spent in the pool in the openmc.deplete.pool module. USE_MULTIPROCESSING is enabled and get_cpucount = 32. I noticed that changing the openmc.deplete.pool.NUM_PROCESSES variable has no effect: I set it to 1 and to 32, and the calculation time for one step did not change (the model has about 40k materials).

I concluded that most of the time goes into building the matrices from the burnup chains in chain.form_matrix. These matrices appear to be computed in n_result = list(pool.starmap(func, inputs)). However, in the multiprocessing source, pool.starmap relies on _map_async, and _map_async looks to me like it performs the calculation sequentially:

    if not hasattr(iterable, '__len__'):
        iterable = list(iterable)

(from Pool._map_async in CPython's multiprocessing/pool.py)
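One way to check whether pool.starmap really runs sequentially is a standalone experiment outside OpenMC. The sketch below is only an illustration under stated assumptions: fake_task and run_starmap are hypothetical names, fake_task stands in for the real per-material function (no OpenMC code is called), and a POSIX system is assumed so the "fork" start method is available. It records which process ran each task and times the same workload with 1 and 4 processes.

```python
import multiprocessing as mp
import os
import time


def fake_task(index, work_seconds):
    """Hypothetical stand-in for one per-material job (not OpenMC code).

    Sleeps to simulate work and reports which process ran it.
    """
    time.sleep(work_seconds)
    return index, os.getpid()


def run_starmap(n_processes, n_tasks=40, work_seconds=0.02):
    """Run fake_task over n_tasks inputs with Pool.starmap.

    Returns (elapsed_seconds, set_of_worker_pids, results).
    """
    # A generator of argument tuples, similar to the `inputs` passed to starmap.
    # _map_async converts it to a list so it can compute a chunk size from
    # len(); fake_task is not called during that conversion.
    inputs = ((i, work_seconds) for i in range(n_tasks))
    ctx = mp.get_context("fork")  # assumption: POSIX platform with fork
    start = time.perf_counter()
    with ctx.Pool(n_processes) as pool:
        results = list(pool.starmap(fake_task, inputs))
    elapsed = time.perf_counter() - start
    pids = {pid for _, pid in results}
    return elapsed, pids, results


if __name__ == "__main__":
    for n in (1, 4):
        elapsed, pids, _ = run_starmap(n)
        print(f"processes={n}: {elapsed:.2f} s, {len(pids)} worker PID(s)")
```

If this shows several worker PIDs and a shorter wall time with more processes, the pool itself is running tasks in parallel, and an unchanged step time would point to something other than _map_async.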

I would like to ask the community: what is the bottleneck in the burnup calculation, and have you observed sequential execution in the function openmc.deplete.pool.deplete? Are there any hints for improving performance?
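To see where the time goes in one step, a generic profiling wrapper like the sketch below can be placed around the call. profile_call and slow_step are hypothetical helper names, not part of OpenMC; only the standard cProfile and pstats modules are used. One caveat: cProfile measures only the process it runs in, so when the per-material work is done by pool workers, the parent mostly shows time spent waiting inside pool.starmap, and functions executed in the workers (such as chain.form_matrix) do not appear in the report. Running a step with multiprocessing disabled would presumably put that work back into the profiled process.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, sort_key="cumulative", **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the `top`
    most expensive entries sorted by `sort_key`. Only code executed in
    this process is measured; time spent inside multiprocessing worker
    processes is not attributed to individual functions.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.strip_dirs().sort_stats(sort_key).print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical stand-in for one depletion step; replace with your own call.
    def slow_step(n):
        return sum(i * i for i in range(n))

    value, report = profile_call(slow_step, 200_000)
    print(report)
```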
I would be grateful for any help.