Error on invalid macho section specifier
The Mach-O section specifier used by `#[link_section = "..."]` is stricter than e.g. the ELF one. LLVM will error when you get it wrong, which is easy to do if you're used to ELF. So, provide some guidance for the simplest mistakes, based on the LLVM validation.
Currently, compilation fails with an LLVM error; see https://godbolt.org/z/WoE8EdK1K.
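For example (a minimal sketch; the static and the section names are illustrative only):
```rust
// ELF-style specifier: a single section name. LLVM rejects this on Mach-O
// targets, because it expects "<segment>,<section>".
// #[unsafe(link_section = ".mydata")]

// Mach-O-style specifier: segment and section, separated by a comma.
#[cfg(target_vendor = "apple")]
#[unsafe(link_section = "__DATA,__mydata")]
pub static MY_STATIC: u32 = 42;
```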
The LLVM validation logic is at
https://github.com/llvm/llvm-project/blob/a0f0d6342e0cd75b7f41e0e6aae0944393b68a62/llvm/lib/MC/MCSectionMachO.cpp#L199-L203
LLVM validates the other components of the section specifier too, but it feels a bit fragile to duplicate those checks. If you get that far, hopefully the LLVM errors will be sufficient to get unstuck.
---
sidequest from https://github.com/rust-lang/rust/pull/147811
r? JonathanBrouwer
specifically, is this the right place for this sort of validation? `rustc_attr_parsing` also does some validation.
Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns a pointer to workgroup memory that is guaranteed to be
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T`, but can also be larger,
smaller, or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
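A minimal usage sketch (the import path and feature gate are assumptions; the intrinsic signature is the one described above):
```rust
#![feature(core_intrinsics)]

use core::intrinsics::gpu_launch_sized_workgroup_mem;

// Inside a gpu-kernel: view the launch-sized workgroup memory as `f32`s.
// `n` has to come from out-of-band information (e.g. a kernel argument),
// because the intrinsic does not report how many bytes were allocated.
unsafe fn zero_workgroup_scratch(n: usize) {
    let ptr: *mut f32 = gpu_launch_sized_workgroup_mem::<f32>();
    for i in 0..n {
        ptr.add(i).write(0.0);
    }
}
```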
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, because the pointer is not guaranteed to be dereferenceable
(that depends on the size allocated at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so users have to pass it
out-of-band or rely on target-specific mechanisms for now.
Tracking issue: rust-lang/rust#135516
Make `//@ skip-filecheck` a normal compiletest directive
The `skip-filecheck` directive is currently used by mir-opt tests to suppress the default behaviour of running LLVM's `FileCheck` tool to check MIR output against FileCheck rules in the test file.
The `skip-filecheck` directive was not included in the big migration to `//@` directive syntax (https://github.com/rust-lang/rust/pull/121370), perhaps because it was parsed and processed in the *miropt-test-tools* helper crate, not in compiletest itself.
Recently I noticed that a small number of *codegen-llvm* tests were using the `//@ build-pass` directive, which has the non-obvious effect of skipping FileCheck in codegen tests. That's quite confusing, so I decided to have the mir-opt tests migrate over to a proper `//@ skip-filecheck` directive, which could then be used by codegen tests as well.
(I also added skip-filecheck support to assembly tests, which are very similar to codegen tests, though there are currently no assembly tests that actually use `//@ skip-filecheck`.)
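For illustration, a hypothetical codegen test that only needs to build successfully and deliberately carries no FileCheck annotations:
```rust
//@ skip-filecheck
//@ compile-flags: -Copt-level=3

// No CHECK lines here on purpose: we only care that codegen succeeds.
pub fn increments(x: u32) -> u32 {
    x.wrapping_add(1)
}
```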
---
Support for using `//@ build-pass` in codegen tests to skip FileCheck was introduced in https://github.com/rust-lang/rust/pull/113603. With hindsight, I think doing things that way was pretty clearly a mistake, and we'll be better off with `//@ skip-filecheck`.
r? jieyouxu
codegen: Copy to an alloca when an argument passed via an indirect pointer is neither by-val nor by-move.
Fixes https://github.com/rust-lang/rust/issues/155241.
When a value is passed via an indirect pointer, the value needs to be copied to a new alloca. For x86_64-unknown-linux-gnu, `Thing` is such a case:
```rust
#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

pub fn foo() {
    let thing = Thing(0, 0, 0);
    bar(thing);
    assert_eq!(thing.0, 0);
}

#[inline(never)]
#[unsafe(no_mangle)]
pub fn bar(mut thing: Thing) {
    thing.0 = 1;
}
```
Before `foo` calls `bar`, `thing` needs to be copied to a fresh alloca, and that alloca is what gets passed to `bar`:
```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
call void @bar(ptr %0)
```
This patch applies the same rule to untupled arguments (arguments spread out of a `rust-call` ABI tuple) as well.
```rust
#![feature(fn_traits)]

#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

#[inline(never)]
#[unsafe(no_mangle)]
pub fn foo() {
    let thing = (Thing(0, 0, 0),);
    (|mut thing: Thing| {
        thing.0 = 1;
    }).call(thing);
    assert_eq!(thing.0.0, 0);
}
```
For this case, this patch changes from
```llvm
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %thing)
```
to
```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %0)
```
However, the same rule cannot be applied to tail calls: that would be unsound, because the caller's stack frame is overwritten by the callee's stack frame. Fortunately, https://github.com/rust-lang/rust/pull/151143 has already handled this special case, so we must not copy again there.
No copy is needed for by-move arguments, because the argument is passed to the callee "in place".
Likewise, no copy is needed for by-val arguments, because the `byval` attribute implies that a hidden copy of the pointee is made between the caller and the callee.
NOTE: The patch has a trick for tail calls, which are passed by-move. For ordinary calls we could choose to copy into an alloca even for by-move arguments, but tail calls strictly require by-move.
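As a sketch of why tail calls are different (this uses the unstable explicit tail calls feature; the types mirror the example above and are illustrative):
```rust
#![feature(explicit_tail_calls)]

#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

fn callee(mut thing: Thing) {
    thing.0 = 1;
}

fn caller(thing: Thing) {
    // The caller's frame (including any alloca we could have copied `thing`
    // into) is gone by the time `callee` runs, so the indirect argument must
    // be passed by-move instead of being copied into a caller-owned temporary.
    become callee(thing);
}
```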
Skipping FileCheck in codegen/assembly tests is normally not very useful, but a
small number of existing tests were using `//@ build-pass` to do so anyway, so
it's clearer for them to explicitly use `//@ skip-filecheck` instead.
Add autocast support for `x86amx`
Builds on rust-lang/rust#140763 by further adding autocasts for `x86amx` from/to vectors of size 8192 bits.
This also disables SIMD vector abi checks for the `"unadjusted"` abi because
- This is primarily used to link with LLVM intrinsics, which don't actually lower to function calls with vector arguments. This is true even with other cg backends.
- This ABI is internal and perma-unstable (and also super specific), so it is very unlikely that this will cause breakages.
- (The primary reason) Without doing this, we can't actually use 8192-bit vectors to represent `x86amx`
> Why do we need a bypass for `x86amx`? Can't we use a `#[lang_item]` or something?
If `x86amx` were a normal LLVM type, this approach would've worked, and I would also prefer it. But LLVM specifies that
> No instruction is allowed for this type. There are no arguments, arrays, pointers, vectors or constants of this type.
So we can't treat it like a normal type at all -- even if we added it as a lang item, we would still have to add special cases everywhere to check whether we are passing to the correct LLVM intrinsic, and only then use the `x86amx` type. IMO this is needlessly complex and way worse than this solution, which just adds it to the autocast list in cg_llvm.
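For reference, a rough sketch of the Rust-side shape this enables (the type name is illustrative, not from the PR):
```rust
#![feature(repr_simd)]

// 1024 x i8 = 8192 bits: a Rust-side stand-in for an AMX tile value.
// cg_llvm can now autocast between this vector type and LLVM's `x86amx`
// at the boundary of `extern "unadjusted"` LLVM-intrinsic declarations.
#[repr(simd)]
pub struct Tile([i8; 1024]);
```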
r? codegen
Add regression test for dead code elimination with drop + panic
Add a codegen test for rust-lang/rust#114532.
The bug was that dead code elimination failed when a `Drop` impl contained a `panic!` and a potentially-panicking external function was called after the value was created. This has been fixed since 1.82, but no regression test was added.
The test verifies that `foo()` compiles to just a call to `unknown()` + `ret void`, with no panic or panicking call in the function body.
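A sketch of the shape being tested (the names and the exact guard condition are assumptions based on the description above, not the test verbatim):
```rust
struct Guard(bool);

impl Drop for Guard {
    fn drop(&mut self) {
        if self.0 {
            panic!("guard was armed");
        }
    }
}

unsafe extern "C-unwind" {
    // The potentially-panicking (unwinding) external function.
    fn unknown();
}

pub fn foo() {
    let g = Guard(false);  // value created before the external call
    unsafe { unknown() };  // may unwind, which would run `Guard::drop`
    drop(g);
    // Expected optimized IR: a call to `unknown` followed by `ret void`,
    // with no panic machinery left in the body.
}
```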
Closes rust-lang/rust#114532
Remove fewer Storage calls in CopyProp and GVN
Modify the CopyProp and GVN MIR optimization passes to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see rust-lang/rust#141649.
### Details
The idea is to use a new `MaybeUninitializedLocals` analysis and remove only the storage calls of locals that are maybe-uninit when accessed in a new location.
Dead code elimination used to fail when a Drop impl contained a panic
and a potentially-panicking external function was called after the value
was created. This has been fixed since 1.82, but no regression test was added.
The test verifies that foo() compiles to just a call to unknown() and
ret void, with no panic or panicking call in the function body.
Signed-off-by: Naveen R. Iyer <iyernaveenr@gmail.com>
preserve SIMD element type information
Preserve the SIMD element type and provide it to LLVM for better optimization.
This is relevant for AArch64 types like `int16x4x2_t`; see also https://github.com/llvm/llvm-project/issues/181514. Such types are defined like so:
```rust
#[repr(simd)]
struct int16x4_t([i16; 4]);
#[repr(C)]
struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
```
Previously this would be translated to the opaque `[2 x <8 x i8>]`, with this PR it is instead `[2 x <4 x i16>]`. That change is not relevant for the ABI, but using the correct type prevents bitcasts that can (indeed, do) confuse the LLVM pattern matcher.
This change will make it possible to implement the deinterleaving loads on AArch64 in a portable way (without neon-specific intrinsics), which means that e.g. Miri or the cranelift backend can run them without additional support.
discussion at [#t-compiler > loss of vector element type information](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/loss.20of.20vector.20element.20type.20information/with/584483611)
hwaddress: automatically add `-Ctarget-feature=+tagged-globals`
Note that since HWAddressSanitizer is/should be a target modifier, we do not have to worry about whether this LLVM target feature changes the ABI.
Fixes: rust-lang/rust#148185
test `#[naked]` with `#[link_section = "..."]` on windows
As part of https://github.com/rust-lang/rust/pull/147811 I ran into the fact that we don't actually match (current) LLVM output.
r? @mati865
Apply the convergent attribute to functions for GPU targets
On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary.
This affects the amdgpu and nvptx targets.
cc @kjetilkjeka, @kulst for nvptx
cc @ZuseZ4
r? @nnethercote, as you already reviewed this in the other PR
Split out from rust-lang/rust#149637; the part here should be uncontroversial.
Explicitly forget the zero remaining elements in `vec::IntoIter::fold()`.
[Original description:] ~~This seems to help LLVM notice that dropping the elements in the destructor of `IntoIter` is not necessary. In cases it doesn’t help, it should be cheap since it is just one assignment.~~
This PR adds a function to `vec::IntoIter` which is used by `fold()` and `spec_extend()` when those operations complete, to forget the zero remaining elements and only deallocate the allocation. This ensures that there will never be a useless loop to drop zero remaining elements when the iterator is dropped.
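A simplified model of the idea (not the real `vec::IntoIter` internals): once everything has been moved out, collapse the not-yet-yielded range to empty so the destructor only frees the allocation.
```rust
use std::ptr::NonNull;

// `ptr..end` is the range of elements that have not been yielded yet;
// allocation bookkeeping is omitted for brevity.
struct RawIntoIter<T> {
    ptr: NonNull<T>,
    end: *mut T,
}

impl<T> RawIntoIter<T> {
    /// Called by `fold()`/`spec_extend()` after they have moved every element
    /// out, so that `Drop` sees zero remaining elements and skips the
    /// element-dropping loop entirely, only freeing the allocation.
    fn forget_remaining_elements(&mut self) {
        // SAFETY: in this model `end` points into (or one past) a live
        // allocation, so it is never null.
        self.ptr = unsafe { NonNull::new_unchecked(self.end) };
    }
}
```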
This is my first ever attempt at this kind of codegen micro-optimization in the standard library, so please let me know what should go into the PR or what sort of additional systematic testing might indicate this is a good or bad idea.
GCI: During reachability analysis don't try to evaluate the initializer of overly generic free const items
We generally don't want the initializer of free const items to get evaluated if they have any non-lifetime generic parameters. However, while I did account for that in HIR analysis & mono item collection (rust-lang/rust#136168 & rust-lang/rust#136429), I didn't account for reachability analysis so far, which means that on main we still evaluate such items if they are *public*, for example.
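For illustration, a hypothetical item of this kind (using the unstable `generic_const_items` feature):
```rust
#![feature(generic_const_items)]

// A *free* const item with a non-lifetime generic parameter: its initializer
// only makes sense for a concrete `T`, so it must not be evaluated eagerly
// just because the item is reachable/public.
pub const SIZE_OF<T>: usize = core::mem::size_of::<T>();
```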
The closed PR rust-lang/rust#142293 from a year ago did address that as a byproduct, but of course it wasn't merged since its primary goal was misguided. This PR extracts & improves upon the relevant parts of that PR that are necessary to fix said issue.
Follow up to rust-lang/rust#136168 & rust-lang/rust#136429.
Partially supersedes rust-lang/rust#142293.
Part of rust-lang/rust#113521.
r? @BoxyUwU
Rollup of 5 pull requests
Successful merges:
- rust-lang/rust#154376 (Remove more BuiltinLintDiag variants - part 4)
- rust-lang/rust#154731 (llvm: Fix array ABI test to not check equality implementation)
- rust-lang/rust#127534 (feat(core): impl Step for NonZero<u*>)
- rust-lang/rust#154703 (Fix trailing comma in lifetime suggestion for empty angle brackets)
- rust-lang/rust#154776 (Fix ICE in read_discriminant for enums with non-contiguous discriminants)
LLVM has moved memcmp expansion in the pipeline, resulting in the bcmp
call being expanded into loads and register comparisons, which breaks
the test.
Based on history, I believe the test actually intended to validate that
these arrays were being passed as pointer arguments, which can be done
more directly.
Clang changed to representing tuples of scalable vectors as
structs rather than as wide vectors (that is, scalable vector types
where the `N` part of the `<vscale x N x ty>` type was multiplied by
the number of vectors). rustc mirrored this in the initial implementation
of scalable vectors.
Earlier versions of our patches used the wide vector representation and
our intrinsic patches used the legacy
`llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating
these tuples/getting/setting the vectors, which were only supported
due to LLVM's `AutoUpgrade` pass converting these intrinsics into
`llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics
with the wide vector representation.
With the current struct representation, Clang has special handling in
codegen for generating `insertvalue`/`extractvalue` instructions for
these operations, which must be replicated by rustc's codegen for our
intrinsics to use. This patch implements new intrinsics in
`core::intrinsics::scalable` (mirroring the structure of
`core::intrinsics::simd`) which rustc lowers to the appropriate
`insertvalue`/`extractvalue` instructions.
This enables packed-stack, just like `-mpacked-stack` in clang and gcc.
packed-stack is needed on s390x for kernel development.
Co-authored-by: Ralf Jung <post@ralfj.de>
simd_fmin/fmax: make semantics and name consistent with scalar intrinsics
This is the SIMD version of https://github.com/rust-lang/rust/pull/153343: change the documented semantics of the SIMD float min/max intrinsics to that of the scalar intrinsics, and also make the name consistent. The overall semantic change this amounts to is that we restrict the non-determinism: the old semantics effectively mean "when one input is an SNaN, the result non-deterministically is a NaN or the other input"; the new semantics say that in this case the other input must be returned. For all other cases, old and new semantics are equivalent. This means all users of these intrinsics that were correct with the old semantics are still correct: the overall set of possible behaviors has become smaller, no new possible behaviors are being added.
In terms of providers of this API:
- Miri, GCC, and cranelift already implement the new semantics, so no changes are needed.
- LLVM is adjusted to use `minimumnum nsz` instead of `minnum`, thus giving us the new semantics.
In terms of consumers of this API:
- Portable SIMD almost certainly wants to match the scalar behavior, so this is strictly a bugfix here.
- Stdarch mostly stopped using the intrinsic, except on nvptx, where arguably the new semantics are closer to what we actually want than the old semantics (https://github.com/rust-lang/stdarch/issues/2056).
Q: Should there be an `f` in the intrinsic name to indicate that it is for floats? E.g., `simd_fminimum_number_nsz`?
Also see https://github.com/rust-lang/rust/issues/153395.