The issue fixed in #5767 should really have an associated benchmark.
Unfortunately, it seems from @stephenworsley's experiment that the current
It also looks like switching to tracemalloc would probably fix this -- i.e. enable us to see this kind of inefficiency.
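
For illustration only, here is a rough sketch of how a peak-allocation measurement with `tracemalloc` might look. The helper names `peak_traced_bytes` and `allocate_temporary` are hypothetical, not existing code, and `tracemalloc.reset_peak()` requires Python 3.9+. The point is that a short-lived temporary allocation still shows up in the traced peak:

```python
import tracemalloc


def peak_traced_bytes(func, *args, **kwargs):
    """Return the peak number of bytes traced by tracemalloc while func runs.

    Hypothetical helper: only allocations made through Python's memory
    allocators are counted.
    """
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        # Discard any earlier peak so only func's allocations count (3.9+).
        tracemalloc.reset_peak()
        func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak - baseline


def allocate_temporary(n_bytes):
    """Stand-in for the operation under test: makes a short-lived buffer."""
    buf = bytearray(n_bytes)
    return len(buf)


if __name__ == "__main__":
    peak = peak_traced_bytes(allocate_temporary, 10_000_000)
    print(f"peak traced allocation: {peak / 1e6:.1f} MB")
```

A value like this could then be reported as a custom tracked number by the benchmark, assuming the benchmark framework supports that.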