Over the last few months I have spent a fortnight or so learning how to manage multiple worker threads on Apple operating systems running on Apple’s ARM processors. As on many ARM processors, the cores are grouped into clusters, and the cores in a cluster share a level of cache.
Apple’s OSes will tell you many things about the cores, but are very enigmatic about the ordinal numbers of those cores. That’s because they don’t provide an API for pinning threads to cores; they insist that the OS can do that better. They didn’t tell you much about the cores at all until we pointed out the gap in macOS 11, and they added query functions in macOS 12.
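For anyone curious what those queries look like, here is a minimal sketch in C. It assumes the query functions in question are the `hw.nperflevels` and `hw.perflevelN.*` sysctl keys read through `sysctlbyname`; the helper names `read_sysctl_int` and `print_core_clusters` are mine, purely for illustration. On anything other than an Apple OS the sketch compiles to a stub that reports the keys as unavailable.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: read an integer sysctl by name.
   Returns -1 if the key is missing, or on non-Apple systems. */
static long long read_sysctl_int(const char *name)
{
#if defined(__APPLE__)
    union { int32_t i32; int64_t i64; } value;
    size_t size = sizeof value;
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
        return -1;
    /* Some keys are 32-bit and some are 64-bit, so check the size returned. */
    if (size == sizeof value.i32)
        return value.i32;
    if (size == sizeof value.i64)
        return value.i64;
    return -1;
#else
    (void)name;
    return -1;
#endif
}

/* Hypothetical helper: print one line per performance level.
   Returns the number of levels, or -1 if the keys are unavailable. */
static int print_core_clusters(void)
{
    long long levels = read_sysctl_int("hw.nperflevels");
    if (levels < 0) {
        puts("perflevel sysctl keys are not available here");
        return -1;
    }
    for (long long i = 0; i < levels; i++) {
        char key[64];
        snprintf(key, sizeof key, "hw.perflevel%lld.physicalcpu", i);
        long long cores = read_sysctl_int(key);
        snprintf(key, sizeof key, "hw.perflevel%lld.cpusperl2", i);
        long long per_l2 = read_sysctl_int(key);
        printf("level %lld: %lld cores, %lld cores per L2 cache\n",
               i, cores, per_l2);
    }
    return (int)levels;
}
```

Note what is absent: you get counts per performance level and per cache, but nothing that maps a running thread to a numbered core.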
I have been trying to get our multi-threaded performance tests to run in a reasonably consistent amount of elapsed time, so that we can use them to observe the effects of changes in algorithms. The noise level when I started was horrible: factors of two in elapsed time for identical code and test data.
In late January, I got it to an acceptable level of consistency and the changes were released. A customer soon came back with a bug report, pointing out that the new release had got significantly slower. It had. The performance tests had been so noisy that we weren’t sure, but their example was good, and clearly demonstrated the slowdown.
On Thursday, I spent the afternoon refining their example, and then carefully taking out my changes. It appears that the slowdowns were caused by the overheads of turning the priority of each thread up when it had work to do, and down again when it didn’t. Apple’s pthreads implementation is slow, because it isn’t the native threading of the OS, but a wrapper over the native Mach threads.
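To make the pattern concrete, here is a rough sketch in C of raising a thread's priority around each piece of work. It is an assumption on my part that the priority changes go through Apple's quality-of-service call `pthread_set_qos_class_self_np`; the choice of QoS classes and the names `set_busy`, `run_with_raised_priority` and `increment` are hypothetical. On non-Apple systems the priority change is a no-op so the sketch still compiles.

```c
#include <pthread.h>
#if defined(__APPLE__)
#include <pthread/qos.h>
#endif

typedef void (*work_fn)(void *);

/* Hypothetical helper: raise the calling thread's QoS when busy,
   lower it when idle. Returns 0 on success, an errno value otherwise. */
static int set_busy(int busy)
{
#if defined(__APPLE__)
    qos_class_t cls = busy ? QOS_CLASS_USER_INITIATED : QOS_CLASS_UTILITY;
    return pthread_set_qos_class_self_np(cls, 0);
#else
    (void)busy;   /* no-op stand-in on other systems */
    return 0;
#endif
}

/* Run one work item with the priority turned up, then turn it down again.
   That is two priority changes for every item of work. */
static int run_with_raised_priority(work_fn fn, void *arg)
{
    int up = set_busy(1);
    fn(arg);
    int down = set_busy(0);
    return up != 0 ? up : down;
}

/* Trivial work item used for illustration. */
static void increment(void *arg)
{
    ++*(int *)arg;
}
```

With short work items, the two priority changes per item can plausibly cost more than the work itself, which would fit the slowdown described above.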
The thing that had worked to get consistency was reaching through the pthreads layer to use a Mach threads feature that pthreads lacks: the ability to say that specific threads should be associated with each other because they’ll be communicating frequently. This is obscure macOS functionality, to say the least. What I pulled out on Thursday was all the work I’d done, following Apple’s guidelines, before adding the associations. I have ideas about doing more, but I need to let my brain cool down a bit first. I also need to let statistics from the daily performance test runs accumulate!
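Here is a hedged sketch in C of what reaching through pthreads can look like. It assumes the Mach feature in question is the affinity tag set with `thread_policy_set` and `THREAD_AFFINITY_POLICY`, where threads given the same tag are treated as candidates for sharing a cache; the helper name `set_affinity_tag` is hypothetical. The tag is only a hint and the kernel may refuse it, so the return code is passed back to the caller. On non-Apple systems the helper is a stub that returns -1.

```c
#include <pthread.h>
#if defined(__APPLE__)
#include <mach/mach.h>
#include <mach/thread_policy.h>
#endif

/* Hypothetical helper: give a pthread a Mach affinity tag.
   Returns 0 (KERN_SUCCESS) on success, another kern_return_t value if the
   kernel declines the request, or -1 on non-Apple systems. */
static int set_affinity_tag(pthread_t thread, int tag)
{
#if defined(__APPLE__)
    thread_affinity_policy_data_t policy = { .affinity_tag = tag };
    /* pthread_mach_thread_np exposes the Mach thread behind a pthread. */
    thread_port_t port = pthread_mach_thread_np(thread);
    kern_return_t kr = thread_policy_set(port,
                                         THREAD_AFFINITY_POLICY,
                                         (thread_policy_t)&policy,
                                         THREAD_AFFINITY_POLICY_COUNT);
    return (int)kr;
#else
    (void)thread;
    (void)tag;
    return -1;
#endif
}
```

Workers that talk to each other a lot would get the same non-zero tag, for example `set_affinity_tag(worker, 1)` for every worker in one group and a different tag for the next group.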
I want to award that customer a “Good bug report!” prize.