Fix: On wasm targets, call `panic_in_cleanup` if panic occurs in cleanup
Previously this was not correctly implemented. Each funclet may need its own terminate block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which can have a terminate_block for each funclet. We key on the first basic block of the funclet -- in particular, this is the start block for the old case of the top level terminate function.
I also fixed the `terminate` handler to not be invoked when a foreign exception is raised, mimicking the behavior from msvc. On wasm, in order to avoid generating a `catch_all` we need to call `llvm.wasm.get.exception` and `llvm.wasm.get.ehselector`.
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.
Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
extend unpin noalias tests to cover mutable references
https://github.com/rust-lang/rust/pull/152946 made a change to the logic for this attribute that the test should have flagged as problematic -- but the test only checked `Box`, not `&mut`, and those have independent code paths. So extend the test to also cover `&mut`.
@b-naber would be nice if you could confirm that the added tests do fail with your PR.
Simplify internals of `{Rc,Arc}::default`
This commit simplifies the internal implementation of `Default` for these two pointer types to have the same performance characteristics as before (a side effect of changes in rust-lang/rust#131460) while avoid use of internal private APIs of Rc/Arc. To preserve the same codegen as before some non-generic functions needed to be tagged as `#[inline]` as well, but otherwise the same IR is produced before/after this change.
The motivation of this commit is I was studying up on the state of initialization of `Arc` and `Rc` and figured it'd be nicer to reduce the use of internal APIs and instead use public stable APIs where possible, even in the implementation itself.
Simplify the canonical enum clone branches to a copy statement
I have overhauled MatchBranchSimplification in this PR. This pass tries to unify statements one by one, which is more readable and extensible.
This PR also unifies the following pattern that is mostly generated by GVN into one basic block that contains the copy statement:
```rust
match a {
Foo::A(_) => *a,
Foo::B => Foo::B
}
```
Fixes https://github.com/rust-lang/rust/issues/128081.
Pass alignments through the shim as `Alignment` (not `usize`)
We're using `Layout` on both sides, so might as well skip the transmutes back and forth to `usize`.
The mir-opt test shows that doing so allows simplifying the boxed-slice drop slightly, for example.
tests: adapt align-offset.rs for InstCombine improvements in LLVM 23
Upstream [has improved InstCombine](https://github.com/llvm/llvm-project/commit/8d2078332c23b10dcf3571adc1a186e5c65f82df) so that it can shrink added constants using known zeroes, which caused a little bit of change in this test. As far as I can tell either output is fine, so we just accept both.
@rustbot label: +llvm-main
UnsafePinned: implement opsem effects of UnsafeUnpin
This implements the next step for https://github.com/rust-lang/rust/issues/125735: actually making `UnsafePinned` have special opsem effects by suppressing the `noalias` *even if* the type is wrapped in an `Unpin` wrapper.
For backwards compatibility we also still keep the `Unpin` hack, i.e. a type must be both `Unpin` and `UnsafeUnpin` to get `noalias`.
Optimize indexing slices and strs with inclusive ranges
Instead of separately checking for `end == usize::MAX` and `end + 1 > slice.len()`, we can check for `end >= slice.len()`. Also consolidate all the str indexing related panic functions into a single function which reports the correct error depending on the arguments, as the slice indexing code already does.
The downside of all this is that the panic message is slightly less specific when trying to index with `[..=usize::MAX]`: instead of saying "attempted to index str up to maximum usize" it just says "end byte index {end} out of bounds". But this is a rare enough case that I think it is acceptable
This commit simplifies the internal implementation of `Default` for
these two pointer types to have the same performance characteristics as
before (a side effect of changes in 131460) while avoid use of internal
private APIs of Rc/Arc. To preserve the same codegen as before some
non-generic functions needed to be tagged as `#[inline]` as well, but
otherwise the same IR is produced before/after this change.
The motivation of this commit is I was studying up on the state of
initialization of `Arc` and `Rc` and figured it'd be nicer to reduce the
use of internal APIs and instead use public stable APIs where possible,
even in the implementation itself.
Make operational semantics of pattern matching independent of crate and module
The question of "when does matching an enum against a pattern of one of its variants read its discriminant" is currently an underspecified part of the language, causing weird behavior around borrowck, drop order, and UB.
Of course, in the common cases, the discriminant must be read to distinguish the variant of the enum, but currently the following exceptions are implemented:
1. If the enum has only one variant, we currently skip the discriminant read.
- This has the advantage that single-variant enums behave the same way as structs in this regard.
- However, it means that if the discriminant exists in the layout, we can't say that this discriminant being invalid is UB. This makes me particularly uneasy in its interactions with niches – consider the following example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=5904a6155cbdd39af4a2e7b1d32a9b1a)), where miri currently doesn't detect any UB (because the semantics don't specify any):
<details><summary>Example 1</summary>
```rust
#![allow(dead_code)]
use core::mem::{size_of, transmute};
#[repr(u8)]
enum Inner {
X(u8),
}
enum Outer {
A(Inner),
B(u8),
}
fn f(x: &Inner) {
match x {
Inner::X(v) => {
println!("{v}");
}
}
}
fn main() {
assert_eq!(size_of::<Inner>(), 2);
assert_eq!(size_of::<Outer>(), 2);
let x = Outer::B(42);
let y = &x;
f(unsafe { transmute(y) });
}
```
</details>
2. For the purpose of the above, enums with marked with `#[non_exhaustive]` are always considered to have multiple variants when observed from foreign crates, but the actual number of variants is considered in the current crate.
- This means that whether code has UB can depend on which crate it is in: https://github.com/rust-lang/rust/issues/147722
- In another case of `#[non_exhaustive]` affecting the runtime semantics, its presence or absence can change what gets captured by a closure, and by extension, the drop order: https://github.com/rust-lang/rust/issues/147722#issuecomment-3674554872
- Also at the above link, there is an example where removing `#[non_exhaustive]` can cause borrowck to suddenly start failing in another crate.
3. Moreover, we currently make a more specific check: we only read the discriminant if there is more than one *inhabited* variant in the enum.
- This means that the semantics can differ between `foo<!>`, and a copy of `foo` where `T` was manually replaced with `!`: rust-lang/rust#146803
- Moreover, due to the privacy rules for inhabitedness, it means that the semantics of code can depend on the *module* in which it is located.
- Additionally, this inhabitedness rule is even uglier due to the fact that closure capture analysis needs to happen before we can determine whether types are uninhabited, which means that whether the discriminant read happens has a different answer specifically for capture analysis.
- For the two above points, see the following example ([playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=a07d8a3ec0b31953942e96e2130476d9)):
<details><summary>Example 2</summary>
```rust
#![allow(unused)]
mod foo {
enum Never {}
struct PrivatelyUninhabited(Never);
pub enum A {
V(String, String),
Y(PrivatelyUninhabited),
}
fn works(mut x: A) {
let a = match x {
A::V(ref mut a, _) => a,
_ => unreachable!(),
};
let b = match x {
A::V(_, ref mut b) => b,
_ => unreachable!(),
};
a.len(); b.len();
}
fn fails(mut x: A) {
let mut f = || match x {
A::V(ref mut a, _) => (),
_ => unreachable!(),
};
let mut g = || match x {
A::V(_, ref mut b) => (),
_ => unreachable!(),
};
f(); g();
}
}
use foo::A;
fn fails(mut x: A) {
let a = match x {
A::V(ref mut a, _) => a,
_ => unreachable!(),
};
let b = match x {
A::V(_, ref mut b) => b,
_ => unreachable!(),
};
a.len(); b.len();
}
fn fails2(mut x: A) {
let mut f = || match x {
A::V(ref mut a, _) => (),
_ => unreachable!(),
};
let mut g = || match x {
A::V(_, ref mut b) => (),
_ => unreachable!(),
};
f(); g();
}
```
</details>
In light of the above, and following the discussion at rust-lang/rust#138961 and rust-lang/rust#147722, this PR ~~makes it so that, operationally, matching on an enum *always* reads its discriminant.~~ introduces the following changes to this behavior:
- matching on a `#[non_exhaustive]` enum will always introduce a discriminant read, regardless of whether the enum is from an external crate
- uninhabited variants now count just like normal ones, and don't get skipped in the checks
As per the discussion below, the resolution for point (1) above is that it should land as part of a separate PR, so that the subtler decision can be more carefully considered.
Note that this is a breaking change, due to the aforementioned changes in borrow checking behavior, new UB (or at least UB newly detected by miri), as well as drop order around closure captures. However, it seems to me that the combination of this PR with rust-lang/rust#138961 should have smaller real-world impact than rust-lang/rust#138961 by itself.
Fixesrust-lang/rust#142394Fixesrust-lang/rust#146590Fixesrust-lang/rust#146803 (though already marked as duplicate)
Fixes parts of rust-lang/rust#147722Fixesrust-lang/miri#4778
r? @Nadrieril @RalfJung
@rustbot label +A-closures +A-patterns +T-opsem +T-lang
We're using `Layout` on both sides, so might as well skip the transmutes back and forth to `usize`.
The mir-opt test shows that doing so allows simplifying the boxed-slice drop slightly, for example.
llvm will look at both
1. the values of "target-features" and
2. the function string attributes.
this removes the redundant function string attribute because it is not needed at all.
rustc sets the `+backchain` attribute through `target_features_attr(...)`
Replace `self.end() == usize::MAX` and `self.end() + 1 > slice.len()`
with `self.end() >= slice.len()`. Same reasoning as previous commit.
Also consolidate the str panicking functions into function.
The checks for `self.end() == usize::MAX` and `self.end() + 1 > slice.len()`
can be replaced with `self.end() >= slice.len()`, since
`self.end() < slice.len()` implies both
`self.end() <= slice.len()` and
`self.end() < usize::MAX`.
Add a `codegen-llvm` test to check the number of `icmp` instrucitons
generated for each `SliceIndex` method on the various range types. This
will be updated in the next commit when `SliceIndex::get` is optimized
for `RangeInclusive`.
Upstream has improved InstCombine so that it can shrink added constants
using known zeroes, which caused a little bit of change in this test. As
far as I can tell either output is fine, so we just accept both.
Cleanup offload datatransfer
There are 3 steps to run code on a GPU: Copy data from the host to the device, launch the kernel, and move it back.
At the moment, we have a single variable describing the memory handling to do in each step, but that makes it hard for LLVM's opt pass to understand what's going on. We therefore split it into three variables, each only including the bits relevant for the corresponding stage.
cc @jdoerfert @kevinsala
r? compiler
Stabilize `core::hint::cold_path`
`cold_path` has been around unstably for a while and is a rather useful tool to have. It does what it is supposed to and there are no known remaining issues, so stabilize it here (including const).
Newly stable API:
```rust
// in core::hint
pub const fn cold_path();
```
I have opted to exclude `likely` and `unlikely` for now since they have had some concerns about ease of use that `cold_path` doesn't suffer from. `cold_path` is also significantly more flexible; in addition to working with boolean `if` conditions, it can be used in `match` arms, `if let`, closures, and other control flow blocks. `likely` and `unlikely` are also possible to implement in user code via `cold_path`, if desired.
Closes: https://github.com/rust-lang/rust/issues/136873 (tracking issue)
---
There has been some design and implementation work for making `#[cold]` function in more places, such as `if` arms, `match` arms, and closure bodies. Considering a stable `cold_path` will cover all of these usecases, it does not seem worth pursuing a more powerful `#[cold]` as an alternative way to do the same thing. If the lang team agrees, then:
Closes: https://github.com/rust-lang/rust/issues/26179
Closes: https://github.com/rust-lang/rust/pull/120193
Remove dummy loads on offload codegen
The current logic generates two dummy loads to prevent some globals from being optimized away. This blocks memtransfer loop hoisting optimizations, so it's time to remove them.
r? @ZuseZ4
skip codegen for intrinsics with big fallback bodies if backend does not need them
This hopefully fixes the perf regression from https://github.com/rust-lang/rust/pull/148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.
Fixes https://github.com/rust-lang/rust/issues/149945
Cc @scottmcm @bjorn3
Fix autodiff codegen tests
Preparing autodiff for release on nightly. Since we haven't been running these tests in CI, they regressed over the last months. These changes fixes this and hopefully make the tests more robust for the future.
r? compiler
Add codegen test for SLP vectorization
close: rust-lang/rust#142519
This PR adds a codegen regression test for rust-lang/rust#142519. A regression in LLVM to fail to auto-vectorize, leading to significant performance loss.
The SLP vectorizer correctly groups the 4-byte operations into <4 x i8> vectors.
The loop state is maintained in SIMD registers (phi <4 x i8>).
The test remains robust across architectures (AArch64 vs x86_64) by allowing flexible store types (i32 or <4 x i8>).