Add a `Local::arg(i)` helper constructor
While reading through stuff I was noticing just how many `+1` fixes there were in various places (and comments explaining those fixups), so this adds a new inherent helper on `Local` for making an argument to help make this clearer.
r? mir
While reading through stuff I was noticing just how many `+1` fixes there were in various places (and comments explaining that fixup), so this adds a new inherent helper on `Local` for making an argument to help make this clearer.
c-variadic: document `Clone` and `Drop` instances and require `VaArgSafe: Copy`
tracking issue: https://github.com/rust-lang/rust/issues/44930
Fixing some things that came up in the stabilization PR
r? tgross35
cc @kpreid
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.
Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
Tracking issue: rust-lang/rust#135516
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer.
Fixes https://github.com/rust-lang/rust/issues/155241.
When a value is passed via an indirect pointer, the value needs to be copied to a new alloca. For x86_64-unknown-linux-gnu, `Thing` is the case:
```rust
#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);
pub fn foo() {
let thing = Thing(0, 0, 0);
bar(thing);
assert_eq!(thing.0, 0);
}
#[inline(never)]
#[unsafe(no_mangle)]
pub fn bar(mut thing: Thing) {
thing.0 = 1;
}
```
Before passing the thing to the bar function, the thing needs to be copied to an alloca that is passed to bar.
```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
call void @bar(ptr %0)
```
This patch applies the rule to the untupled arguments as well.
```rust
#![feature(fn_traits)]
#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);
#[inline(never)]
#[unsafe(no_mangle)]
pub fn foo() {
let thing = (Thing(0, 0, 0),);
(|mut thing: Thing| {
thing.0 = 1;
}).call(thing);
assert_eq!(thing.0.0, 0);
}
```
For this case, this patch changes from
```llvm
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %thing)
```
to
```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %0)
```
However, the same rule cannot be applied to tail calls that would be unsound, because the caller's stack frame is overwritten by the callee's stack frame. Fortunately, https://github.com/rust-lang/rust/pull/151143 has already handled the special case. We must not copy again.
No copy is needed for by-move arguments, because the argument is passed to the called "in-place".
No copy is also needed for by-val arguments, because the attribute implies that a hidden copy of the pointee is made between the caller and the callee.
NOTE: The patch has a trick for tail calls that we pass by-move. We can choose to copy an alloca even for by-move arguments, but tail calls require MUST-by-move.
preserve SIMD element type information
Preserve the SIMD element type and provide it to LLVM for better optimization.
This is relevant for AArch64 types like `int16x4x2_t`, see also https://github.com/llvm/llvm-project/issues/181514. Such types are defined like so:
```rust
#[repr(simd)]
struct int16x4_t([i16; 4]);
#[repr(C)]
struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
```
Previously this would be translated to the opaque `[2 x <8 x i8>]`, with this PR it is instead `[2 x <4 x i16>]`. That change is not relevant for the ABI, but using the correct type prevents bitcasts that can (indeed, do) confuse the LLVM pattern matcher.
This change will make it possible to implement the deinterleaving loads on AArch64 in a portable way (without neon-specific intrinsics), which means that e.g. Miri or the cranelift backend can run them without additional support.
discussion at [#t-compiler > loss of vector element type information](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/loss.20of.20vector.20element.20type.20information/with/584483611)
test `#[naked]` with `#[link_section = "..."]` on windows
As a part of https://github.com/rust-lang/rust/pull/147811 I ran into that we actually don't match (current) LLVM output.
r? @mati865
Because the things in this module aren't MIR and don't use anything
from `rustc_middle::mir`. Also, modules that use `mono` often don't use
anything else from `rustc_middle::mir`.
Instead of just using regular struct lowering for these types, which
results in an incorrect ABI (e.g. returning indirectly), use
`BackendRepr::ScalableVector` which will lower to the correct type and
be passed in registers.
This also enables some simplifications for generating alloca of scalable
vectors and greater re-use of `scalable_vector_parts`.
A LLVM codegen test demonstrating the changed IR this generates is
included in the next commit alongside some intrinsics that make these
tuples usable.
Remove `TyCtxt::node_lint` method and `rustc_middle::lint_level` function
Part of https://github.com/rust-lang/rust/issues/153099.
With this PR, we can finally get rid of `lint_level`. \o/
r? @JonathanBrouwer
It's defined in `rustc_span::source_map` which doesn't make any sense
because it has nothing to do with source maps. This commit moves it to
the crate root, a more sensible spot for something this basic.
Make `size`/`align` always correct rather than conditionally on the
`safe` field. This makes it less error prone and easier to work with for
`MaybeDangling` / potential future pointer kinds like `Aligned<_>`.
Fix: On wasm targets, call `panic_in_cleanup` if panic occurs in cleanup
Previously this was not correctly implemented. Each funclet may need its own terminate block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which can have a terminate_block for each funclet. We key on the first basic block of the funclet -- in particular, this is the start block for the old case of the top level terminate function.
I also fixed the `terminate` handler to not be invoked when a foreign exception is raised, mimicking the behavior from msvc. On wasm, in order to avoid generating a `catch_all` we need to call `llvm.wasm.get.exception` and `llvm.wasm.get.ehselector`.
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.
Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.
Add amdgpu_dispatch_ptr intrinsic
There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
Tracking issue: rust-lang/rust#135024
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.
The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.
The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.
Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
As this is the only unwinding intrinsic we use and
codegen_llvm_intrinsic_call currently doesn't handle unwinding
intrinsics, this uses the conventional call path for llvm.wasm.throw.