Replace `ShardedHashMap` method `insert` with debug-checked `insert_unique`
Currently, every use of `ShardedHashMap::insert` checks that it won't evict an old value for an already-present key. I haven't found any issue where that condition is actually violated, so I'm replacing it with `ShardedHashMap::insert_unique`, which only performs the check when `debug_assertions` are enabled. This might improve performance.
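A minimal sketch of the idea, using a single-shard stand-in rather than the real `ShardedHashMap` (names and shapes here are illustrative only):

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;

// Single-shard stand-in for the real sharded map.
struct ShardedHashMap<K, V>(Mutex<HashMap<K, V>>);

impl<K: Eq + Hash, V> ShardedHashMap<K, V> {
    /// Insert a key that the caller promises is not already present.
    /// The promise is only verified when `debug_assertions` are enabled,
    /// so release builds skip the check entirely.
    fn insert_unique(&self, key: K, value: V) {
        let old = self.0.lock().unwrap().insert(key, value);
        debug_assert!(old.is_none(), "insert_unique overwrote an existing value");
    }
}
```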
r? @petrochenkov
Replace the hacky macro with a generic function and a new
`FlatMapInPlaceVec` trait. The result is more verbose, but more readable and more conventional.
LLM disclosure: I asked Claude Code to critique this file and it
suggested the generic function + trait idea. I implemented the idea
entirely by hand.
compiler: update hashbrown to 0.17
See the library-side update in rust-lang/rust#155154 for the bug fixes this brings.
This PR also updates `indexmap` in the compiler as a direct dependent.
`HashStable::hash_stable` takes a `&mut Hcx`. In contrast,
`ToStableHashKey::to_stable_hash_key` takes a `&Hcx`. But there are
some places where the latter calls the former, and due to the mismatch a
`clone` call is required to get a mutable `StableHashingContext`.
This commit changes `to_stable_hash_key` to instead take a `&mut Hcx`.
This eliminates the mismatch, the need for the clones, and the need for
the `Clone` impls.
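In trait-signature terms, the change looks roughly like this (simplified sketch; the real traits carry more bounds and methods):

```rust
// Before: the immutable context forces callers that hash via
// `hash_stable` to clone the context to obtain a mutable one.
trait ToStableHashKeyBefore<Hcx> {
    type KeyType;
    fn to_stable_hash_key(&self, hcx: &Hcx) -> Self::KeyType;
}

// After: `&mut Hcx` matches `HashStable::hash_stable`, so no clone
// is needed.
trait ToStableHashKeyAfter<Hcx> {
    type KeyType;
    fn to_stable_hash_key(&self, hcx: &mut Hcx) -> Self::KeyType;
}
```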
Take first task group for further execution
Continuing from https://github.com/rust-lang/rust/pull/153768#discussion_r2938828363.
I thought that keeping the first group of tasks for immediate execution, instead of pushing it to rayon's local task queue in `par_slice` and immediately popping it back, would avoid overwhelming the work-stealing machinery and potentially blocking the original thread. So I've implemented this change.
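A minimal sketch of the idea, with illustrative names rather than the real `par_slice` internals:

```rust
// Spawn all groups except the first for other threads to steal, and run
// the first group directly on the current thread, skipping the push/pop
// round-trip through the local task queue.
fn par_groups<T: Sync>(groups: &[&[T]], f: impl Fn(&T) + Sync) {
    rayon::scope(|s| {
        let (first, rest) = match groups.split_first() {
            Some(split) => split,
            None => return,
        };
        for group in rest {
            let f = &f;
            s.spawn(move |_| group.iter().for_each(f));
        }
        // The first group executes immediately, without being queued.
        first.iter().for_each(&f);
    });
}
```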
Benchmarks with 8 threads:
| Benchmark | `baseline~~9` time | `new~take-first-group~1` time | % |
| --- | ---: | ---: | ---: |
| 🟣 **hyper**:check | 0.1110s | 0.1086s | 💚 -2.13% |
| 🟣 **hyper**:check:initial | 0.1314s | 0.1298s | 💚 -1.23% |
| 🟣 **hyper**:check:unchanged | 0.0771s | 0.0755s | 💚 -2.14% |
| 🟣 **clap**:check | 0.3787s | 0.3757s | -0.80% |
| 🟣 **clap**:check:initial | 0.4680s | 0.4564s | 💚 -2.48% |
| 🟣 **clap**:check:unchanged | 0.2337s | 0.2301s | 💚 -1.52% |
| 🟣 **syn**:check | 0.4321s | 0.4265s | 💚 -1.31% |
| 🟣 **syn**:check:initial | 0.5586s | 0.5401s | 💚 -3.31% |
| 🟣 **syn**:check:unchanged | 0.3434s | 0.3429s | -0.14% |
| 🟣 **regex**:check | 0.2755s | 0.2661s | 💚 -3.40% |
| 🟣 **regex**:check:initial | 0.3350s | 0.3347s | -0.11% |
| 🟣 **regex**:check:unchanged | 0.1851s | 0.1832s | 💚 -1.01% |
| Total | 3.5296s | 3.4695s | 💚 -1.70% |
| Summary | 1.0000s | 0.9837s | 💚 -1.63% |
The `HashStable` and `ToStableHashKey` traits both have a type parameter
that is sometimes called `CTX` and sometimes called `HCX`. (In practice
this type parameter is always instantiated as `StableHashingContext`.)
Similarly, variables with these types are sometimes called `ctx` and
sometimes called `hcx`. This inconsistency has bugged me for some time.
The `HCX`/`hcx` form is more informative (the `H`/`h` indicates what
type of context it is) and it matches other cases like `tcx`, `dcx`,
`icx`.
Also, RFC 430 says that type parameters should have names that are
"concise UpperCamelCase, usually single uppercase letter: T". In this
case `H` feels insufficient, and `Hcx` feels better.
Therefore, this commit changes the code to use `Hcx`/`hcx` everywhere.
Unalign `PackedFingerprint` on all hosts, not just x86 and x86-64
Back in https://github.com/rust-lang/rust/pull/78646, `DepNode` was modified to store an unaligned `PackedFingerprint` instead of an 8-byte-aligned `Fingerprint`. That reduced the size of `DepNode` from 24 bytes to 17 bytes (nowadays 18 bytes), resulting in considerable memory savings in incremental builds.
See https://github.com/rust-lang/rust/pull/152695#issuecomment-3907091509 for a benchmark demonstrating the impact of *removing* that optimization.
At the time (and today), the unaligning was only performed on x86 and x86-64 hosts, because those CPUs are known to generally have low overhead for unaligned memory accesses. Hosts with other CPU architectures would continue to use an 8-byte-aligned fingerprint and a 24-byte DepNode.
Given the subsequent rise of aarch64 (especially on macOS) and other architectures, it's a shame that some commonly-used builds of rustc don't get those memory-size benefits, based on a decision made several years ago under different circumstances.
We don't have benchmarks to show the actual effect of unaligning DepNode fingerprints on various non-x86 hosts, but it seems very likely to be a good idea on Apple chips, and I have no particular reason to think that it will be catastrophically bad on other hosts. And we don't typically perform this kind of speculative pessimization in other parts of the compiler.
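For illustration, a standalone sketch of the layout effect described above (simplified; the real `DepKind` and `Fingerprint` carry more machinery):

```rust
#[derive(Clone, Copy)]
struct Fingerprint(u64, u64); // 16 bytes, align 8

#[derive(Clone, Copy)]
#[repr(packed)]
struct PackedFingerprint(Fingerprint); // 16 bytes, align 1

struct DepKind(u16);

struct DepNodeAligned {
    kind: DepKind,
    hash: Fingerprint,
}

struct DepNodePacked {
    kind: DepKind,
    hash: PackedFingerprint,
}

fn main() {
    // Aligned: the 2-byte kind is padded out to the fingerprint's
    // 8-byte alignment, so the node occupies 24 bytes.
    assert_eq!(std::mem::size_of::<DepNodeAligned>(), 24);
    // Packed: no padding is needed, so the node shrinks to 2 + 16 = 18.
    assert_eq!(std::mem::size_of::<DepNodePacked>(), 18);
}
```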
The current code for indexing into bucket arrays is quite tricky and unsafe,
partly because it has to keep manually assuring the compiler that a bucket
index is always less than 21.
By encapsulating that knowledge in a 21-value enum, we can make the code
clearer and safer, without giving up performance.
Having a dedicated `BucketIndex` type could also help with further cleanups of
`VecCache` indexing.
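A sketch of the shape this enables (illustrative, not the actual `VecCache` code):

```rust
// Exactly 21 variants: the type itself proves the index is in 0..21.
#[derive(Clone, Copy)]
#[repr(u8)]
enum BucketIndex {
    B0, B1, B2, B3, B4, B5, B6, B7, B8, B9, B10,
    B11, B12, B13, B14, B15, B16, B17, B18, B19, B20,
}

struct Buckets<T> {
    buckets: [T; 21],
}

impl<T> Buckets<T> {
    fn get(&self, index: BucketIndex) -> &T {
        // `index as usize` is statically known to be < 21, so the
        // compiler can elide the bounds check without any `unsafe`.
        &self.buckets[index as usize]
    }
}
```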
Remove redundant `is_dyn_thread_safe` checks
Refactor uses of `FromDyn` to reduce the number of redundant `is_dyn_thread_safe` checks: replace `FromDyn::from` with `check_dyn_thread_safe`, used in tandem with the existing `FromDyn::derive`, so that future users also avoid the redundancy.
The PR is split into multiple commits for easier review.
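A hedged sketch of the redundancy in question, using stand-ins rather than the real `rustc_data_structures::sync` items:

```rust
fn is_dyn_thread_safe() -> bool {
    true // stand-in; the real check consults the parallel-compiler mode
}

/// One explicit, up-front check.
fn check_dyn_thread_safe() {
    assert!(is_dyn_thread_safe());
}

struct FromDyn<T>(T);

impl<T> FromDyn<T> {
    fn from(value: T) -> Self {
        check_dyn_thread_safe(); // re-checks on every call
        FromDyn(value)
    }

    fn derive<O>(&self, value: O) -> FromDyn<O> {
        // No check: an existing wrapper proves the check already ran.
        FromDyn(value)
    }
}

// Before: two checks for two values.
fn wrap_before(a: u32, b: u32) -> (FromDyn<u32>, FromDyn<u32>) {
    (FromDyn::from(a), FromDyn::from(b))
}

// After: one explicit check, then derive.
fn wrap_after(a: u32, b: u32) -> (FromDyn<u32>, FromDyn<u32>) {
    check_dyn_thread_safe();
    let a = FromDyn(a);
    let b = a.derive(b);
    (a, b)
}
```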
LinkedGraph: support adding nodes and edges in arbitrary order
If an edge refers to a node that is not yet known, we just leave that node's data empty; the data can be added later.
Use this support to avoid skipping edges in `RetainedDepGraph`.
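A minimal sketch of the behavior (illustrative types, not the real `LinkedGraph`):

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Graph {
    nodes: HashMap<String, Vec<u32>>, // node name -> node data
    edges: Vec<(String, String)>,
}

impl Graph {
    fn add_edge(&mut self, from: &str, to: &str) {
        // Placeholder entries with empty (default) data keep the edge
        // instead of skipping it.
        self.nodes.entry(from.to_owned()).or_default();
        self.nodes.entry(to.to_owned()).or_default();
        self.edges.push((from.to_owned(), to.to_owned()));
    }

    fn add_node(&mut self, name: &str, data: Vec<u32>) {
        // Fills in (or overwrites) the placeholder's empty data.
        self.nodes.insert(name.to_owned(), data);
    }
}

fn main() {
    let mut g = Graph::default();
    g.add_edge("a", "b"); // "b" is not known yet: placeholder created
    g.add_node("b", vec![1, 2]); // its data arrives later
    assert_eq!(g.edges.len(), 1);
}
```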
This is a continuation of https://github.com/rust-lang/rust/pull/152590: that PR just fixes the ICE, while this one also preserves all the edges in debug dumps.
This is also a minimized version of https://github.com/rust-lang/rust/pull/151821, with fewer data-structure hacks.
core: make atomic primitives type aliases of `Atomic<T>`
Tracking issue: https://github.com/rust-lang/rust/issues/130539
This makes `AtomicI32` and friends type aliases of `Atomic<T>`, encoding their alignment requirements through an internal `Storage` associated type. The same mechanism encodes that `AtomicBool` stores a `u8` internally.
Modulo the `Send`/`Sync` implementations, this PR does not move any trait implementations, methods or associated functions – I'll leave that for another PR.
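A hedged sketch of the encoding, with simplified stand-ins for the real `core` items:

```rust
use std::cell::UnsafeCell;

// A `Storage` associated type fixes both the representation and the
// alignment of each primitive's atomic.
trait AtomicPrimitive {
    type Storage;
}

#[repr(align(4))]
struct AlignedI32(i32); // i32 atomics must be 4-byte aligned

impl AtomicPrimitive for i32 {
    type Storage = AlignedI32;
}

impl AtomicPrimitive for bool {
    type Storage = u8; // `AtomicBool` is backed by a byte
}

struct Atomic<T: AtomicPrimitive>(UnsafeCell<T::Storage>);

// The named primitives become plain type aliases.
type AtomicI32 = Atomic<i32>;
type AtomicBool = Atomic<bool>;
```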
The manual `DynSend` implementation for `AtomicPtr` blocks the auto-implementation for the other atomic primitives, since they now all forward to the same `Atomic<T>` type. This breakage cannot occur in user code, as it depends on `DynSend` being a custom auto-trait.
Use `HashStable` derive in more places
This applies the `HashStable` derive in a couple more places. Also, `stable_hasher` is declared for `HashStable_NoContext`.
If an edge refers to a node that is not yet known, we just leave that node's data empty; the data can be added later.
Use this support to avoid skipping edges in `DepGraphQuery`.
Use `scope` for `par_slice` instead of `join`
This uses `scope` instead of nested `join`s in `par_slice`, so that each group of items is independent and does not end up blocking on another.
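A rough sketch of the difference (illustrative, not the real `par_slice`):

```rust
// join: each recursive step pairs one group with all the rest, so a
// finished group still waits for its sibling before the pair returns.
fn with_join<T: Sync>(groups: &[&[T]], f: &(impl Fn(&T) + Sync)) {
    if let [rest @ .., last] = groups {
        rayon::join(|| with_join(rest, f), || last.iter().for_each(f));
    }
}

// scope: every group is an independent task; only the scope itself
// waits, and it waits for all of them at once.
fn with_scope<T: Sync>(groups: &[&[T]], f: &(impl Fn(&T) + Sync)) {
    rayon::scope(|s| {
        for group in groups {
            s.spawn(move |_| group.iter().for_each(f));
        }
    });
}
```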