Remove fewer Storage calls in CopyProp and GVN
Modify the CopyProp and GVN MIR optimization passes to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see rust-lang/rust#141649.
### Details
The idea is to use a new `MaybeUninitializedLocals` analysis and remove only the storage calls of locals that are maybe-uninit when accessed in a new location.
preserve SIMD element type information
Preserve the SIMD element type and provide it to LLVM for better optimization.
This is relevant for AArch64 types like `int16x4x2_t`, see also https://github.com/llvm/llvm-project/issues/181514. Such types are defined like so:
```rust
#[repr(simd)]
struct int16x4_t([i16; 4]);
#[repr(C)]
struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
```
Previously this would be translated to the opaque `[2 x <8 x i8>]`, with this PR it is instead `[2 x <4 x i16>]`. That change is not relevant for the ABI, but using the correct type prevents bitcasts that can (indeed, do) confuse the LLVM pattern matcher.
This change will make it possible to implement the deinterleaving loads on AArch64 in a portable way (without neon-specific intrinsics), which means that e.g. Miri or the cranelift backend can run them without additional support.
discussion at [#t-compiler > loss of vector element type information](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/loss.20of.20vector.20element.20type.20information/with/584483611)
hwaddress: automatically add `-Ctarget-feature=+tagged-globals`
Note that since HWAddressSanitizer is/should be a target modifier, we do not have to worry about whether this LLVM target feature changes the ABI.
Fixes: rust-lang/rust#148185
test `#[naked]` with `#[link_section = "..."]` on windows
As a part of https://github.com/rust-lang/rust/pull/147811 I ran into that we actually don't match (current) LLVM output.
r? @mati865
Use convergent attribute to funcs for GPU targets
On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary.
This affects the amdgpu and nvptx targets.
cc @kjetilkjeka, @kulst for nvptx
cc @ZuseZ4
r? @nnethercote, as you already reviewed this in the other PR
Split out from rust-lang/rust#149637, the part here should be uncontroversial.
Explicitly forget the zero remaining elements in `vec::IntoIter::fold()`.
[Original description:] ~~This seems to help LLVM notice that dropping the elements in the destructor of `IntoIter` is not necessary. In cases it doesn’t help, it should be cheap since it is just one assignment.~~
This PR adds a function to `vec::IntoIter()` which is used used by `fold()` and `spec_extend()`, when those operations complete, to forget the zero remaining elements and only deallocate the allocation, ensuring that there will never be a useless loop to drop zero remaining elements when the iterator is dropped.
This is my first ever attempt at this kind of codegen micro-optimization in the standard library, so please let me know what should go into the PR or what sort of additional systematic testing might indicate this is a good or bad idea.
GCI: During reachability analysis don't try to evaluate the initializer of overly generic free const items
We generally don't want the initializer of free const items to get evaluated if they have any non-lifetime generic parameters. However, while I did account for that in HIR analysis & mono item collection (rust-lang/rust#136168 & rust-lang/rust#136429), I didn't account for reachability analysis so far which means that on main we still evaluate such items if they are *public* for example.
The closed PR rust-lang/rust#142293 from a year ago did address that as a byproduct but of course it wasn't merged since its primary goal was misguided. This PR extracts & improves upon the relevant parts of that PR which are necessary to fix said issue.
Follow up to rust-lang/rust#136168 & rust-lang/rust#136429.
Partially supersedes rust-lang/rust#142293.
Part of rust-lang/rust#113521.
r? @BoxyUwU
Rollup of 5 pull requests
Successful merges:
- rust-lang/rust#154376 (Remove more BuiltinLintDiag variants - part 4)
- rust-lang/rust#154731 (llvm: Fix array ABI test to not check equality implementation)
- rust-lang/rust#127534 (feat(core): impl Step for NonZero<u*>)
- rust-lang/rust#154703 (Fix trailing comma in lifetime suggestion for empty angle brackets)
- rust-lang/rust#154776 (Fix ICE in read_discriminant for enums with non-contiguous discriminants)
LLVM has moved memcmp expansion in the pipeline, resulting in the bcmp
call being expanded into loads and register comparisons, which breaks
the test.
Based on history, I believe the test actually intended validate that
these arrays were being passed as pointer arguments, which can be done
more directly.
Clang changed to representing tuples of scalable vectors as
structs rather than as wide vectors (that is, scalable vector types
where the `N` part of the `<vscale x N x ty>` type was multiplied by
the number of vectors). rustc mirrored this in the initial implementation
of scalable vectors.
Earlier versions of our patches used the wide vector representation and
our intrinsic patches used the legacy
`llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating
these tuples/getting/setting the vectors, which were only supported
due to LLVM's `AutoUpgrade` pass converting these intrinsics into
`llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics
with the wide vector representation.
With the current struct representation, Clang has special handling in
codegen for generating `insertvalue`/`extractvalue` instructions for
these operations, which must be replicated by rustc's codegen for our
intrinsics to use. This patch implements new intrinsics in
`core::intrinsics::scalable` (mirroring the structure of
`core::intrinsics::simd`) which rustc lowers to the appropriate
`insertvalue`/`extractvalue` instructions.
this enables packed-stack just as -mpacked-stack in clang and gcc.
packed-stack is needed on s390x for kernel development.
Co-authored-by: Ralf Jung <post@ralfj.de>
simd_fmin/fmax: make semantics and name consistent with scalar intrinsics
This is the SIMD version of https://github.com/rust-lang/rust/pull/153343: change the documented semantics of the SIMD float min/max intrinsics to that of the scalar intrinsics, and also make the name consistent. The overall semantic change this amounts to is that we restrict the non-determinism: the old semantics effectively mean "when one input is an SNaN, the result non-deterministically is a NaN or the other input"; the new semantics say that in this case the other input must be returned. For all other cases, old and new semantics are equivalent. This means all users of these intrinsics that were correct with the old semantics are still correct: the overall set of possible behaviors has become smaller, no new possible behaviors are being added.
In terms of providers of this API:
- Miri, GCC, and cranelift already implement the new semantics, so no changes are needed.
- LLVM is adjusted to use `minimumnum nsz` instead of `minnum`, thus giving us the new semantics.
In terms of consumers of this API:
- Portable SIMD almost certainly wants to match the scalar behavior, so this is strictly a bugfix here.
- Stdarch mostly stopped using the intrinsic, except on nvptx, where arguably the new semantics are closer to what we actually want than the old semantics (https://github.com/rust-lang/stdarch/issues/2056).
Q: Should there be an `f` in the intrinsic name to indicate that it is for floats? E.g., `simd_fminimum_number_nsz`?
Also see https://github.com/rust-lang/rust/issues/153395.
stabilizes `core::range::RangeFrom`
stabilizes `core::range::RangeFromIter`
add examples for `remainder` method on range iterators
`RangeFromIter::remainder` was not stabilized (see issue 154458)
fromrangeiter-overflow-checks: accept optional `signext` for argument
On some targets such as LoongArch64 and RISCV64, the ABI requires sign-extension for 32-bit integer arguments, so LLVM may emit the `signext` attribute for the `%range` parameter. The existing CHECK pattern required the argument to be exactly `i32 noundef %range`, causing the test to fail on those targets.
Allow an optional `signext` attribute in the CHECK pattern so the test passes consistently across architectures without affecting the intended codegen validation.
debuginfo: emit DW_TAG_call_site entries
Set `FlagAllCallsDescribed` on function definition DIEs so LLVM emits DW_TAG_call_site entries, letting debuggers and analysis tools track tail calls.
Add `-Zsanitize=kernel-hwaddress`
The Linux kernel has a config option called `CONFIG_KASAN_SW_TAGS` that enables `-fsanitize=kernel-hwaddress`. This is not supported by Rust.
One slightly awkward detail is that `#[sanitize(address = "off")]` applies to both `-Zsanitize=address` and `-Zsanitize=kernel-address`. Probably it was done this way because both are the same LLVM pass. I replicated this logic here for hwaddress, but it might be undesirable.
Note that `#[sanitize(kernel_hwaddress = "off")]` could be supported as an annotation on statics, but since it's also missing for `#[sanitize(hwaddress = "off")]`, I did not add it.
MCP: https://github.com/rust-lang/compiler-team/issues/975
Tracking issue: https://github.com/rust-lang/rust/issues/154171
cc @rcvalle @maurer @ojeda
On some targets such as LoongArch64 and RISCV64, the ABI requires
sign-extension for 32-bit integer arguments, so LLVM may emit the
`signext` attribute for the `%range` parameter. The existing CHECK
pattern required the argument to be exactly `i32 noundef %range`,
causing the test to fail on those targets.
Allow an optional `signext` attribute in the CHECK pattern so the test
passes consistently across architectures without affecting the intended
codegen validation.
refactor RangeFromIter overflow-checks impl
Crates with different overflow-checks settings accessing the same RangeFromIter resulted in incorrect values being yielded
Fixesrust-lang/rust#154124
r? @tgross35
[BPF] add target feature allows-misaligned-mem-access
This PR adds the allows-misaligned-mem-access target feature to the BPF target. The feature can enable misaligned memory access support in the LLVM backend, aligning Rust’s BPF target behavior with the corresponding LLVM update introduced in [llvm/llvm-project#167013](https://github.com/llvm/llvm-project/pull/167013) (included in LLVM 22).
On targets with convergent operations, we need to add the convergent
attribute to all functions that run convergent operations. Following
clang, we can conservatively apply the attribute to all functions when
compiling for such a target and rely on LLVM optimizing away the
attribute in cases where it is not necessary.
This affects the amdgpu and nvptx targets.