Happy to see that he mentioned working on a memory pool in his todo list; malloc's mutex will really hinder multi-threaded performance here.
I recommend he use "Scalable Lock-Free Dynamic Memory Allocation" (2004) by Maged M. Michael.
Incidentally, this is one of the guys who came up with the lock-free queue algorithm he uses.
"Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms" (1996) by Maged M. Michael and Michael L. Scott
https://www.cs.rochester.edu/research/synchronization/pseudo...
http://dl.acm.org/citation.cfm?id=248106
https://www.research.ibm.com/people/m/michael/podc-1996.pdf
Also see the Concurrent Data Structures (CDS) library: https://github.com/khizmax/libcds
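
To make the memory pool point concrete, here is a rough sketch of the simplest way to keep malloc's lock off the hot path: a per-thread free list of fixed-size blocks. To be clear, this is not the allocator from Michael's paper and not what the author's library does; the names ThreadLocalPool and local_pool are made up for illustration, and I'm assuming all queue nodes have the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size block pool. Each thread gets its own instance,
// so allocate/release on the fast path never take a shared lock.
template <std::size_t BlockSize>
class ThreadLocalPool {
    static_assert(BlockSize >= sizeof(void*),
                  "a free block must be able to hold the free-list link");

    struct FreeNode { FreeNode* next; };
    FreeNode* free_list_ = nullptr;

public:
    ThreadLocalPool() = default;
    ThreadLocalPool(const ThreadLocalPool&) = delete;
    ThreadLocalPool& operator=(const ThreadLocalPool&) = delete;

    // Hand every cached block back to malloc when the owning thread exits.
    ~ThreadLocalPool() {
        while (free_list_ != nullptr) {
            FreeNode* n = free_list_;
            free_list_ = n->next;
            std::free(n);
        }
    }

    void* allocate() {
        if (free_list_ != nullptr) {      // fast path: reuse a cached block
            FreeNode* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        void* p = std::malloc(BlockSize); // slow path: fall back to malloc
        if (p == nullptr) throw std::bad_alloc();
        return p;
    }

    // The block must have come from a ThreadLocalPool with the same BlockSize.
    void release(void* p) {
        assert(p != nullptr);
        // Reuse the block's own storage to hold the free-list link.
        free_list_ = new (p) FreeNode{free_list_};
    }
};

// One pool per thread and per block size.
template <std::size_t BlockSize>
ThreadLocalPool<BlockSize>& local_pool() {
    thread_local ThreadLocalPool<BlockSize> pool;
    return pool;
}
```

Usage would be something like `void* node = local_pool<64>().allocate();` and later `local_pool<64>().release(node);`. The catch is that in a producer/consumer queue the nodes are usually allocated on one thread and released on another, so the producer's pool stays empty (it keeps hitting malloc) while the consumer's pool only grows. Handling that case well is exactly why a real lock-free allocator like the one in the paper is the better long-term answer.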