mirror of
https://github.com/rust-lang/rust.git
synced 2026-05-23 02:27:39 +03:00
a47c9f870f
Improve performance of spsc_queue and stream. This PR makes two main changes: 1. It switches the `spsc_queue` node caching strategy from keeping a shared counter of the number of nodes in the cache to keeping a consumer only counter of the number of node eligible to be cached. 2. It separates the consumer and producers fields of `spsc_queue` and `stream` into a producer cache line and consumer cache line. Overall, it speeds up `mpsc` in `spsc` mode by 2-10x. Variance is higher than I'd like (that 2-10x speedup is on one benchmark), I believe this is due to the drop check in `send` (`fn stream::Queue::send:107`). I think this check can be combined with the sleep detection code into a version which only uses 1 shared variable, and only one atomic access per `send`, but I haven't looked through the select implementation enough to be sure. The code currently assumes a cache line size of 64 bytes. I added a CacheAligned newtype in `mpsc` which I expect to reuse for `shared`. It doesn't really belong there, it would probably be best put in `core::sync::atomic`, but putting it in `core` would involve making it public, which I thought would require an RFC. Benchmark runner is [here](https://github.com/JLockerman/queues/tree/3eca46279c53eb75833c5ecd416de2ac220bd022/shootout), benchmarks [here](https://github.com/JLockerman/queues/blob/3eca46279c53eb75833c5ecd416de2ac220bd022/queue_bench/src/lib.rs#L170-L293). Fixes #44512.