Commit Graph

1097 Commits

Author SHA1 Message Date
Ralf Jung e402c1edf0 miri: remove retag statements, make typed copies retag implicitly instead 2026-05-02 17:40:33 +02:00
Jonathan Brouwer 716bccea07 Rollup merge of #156028 - scottmcm:local-arg, r=wesleywiser
Add a `Local::arg(i)` helper constructor

While reading through stuff I was noticing just how many `+1` fixes there were in various places (and comments explaining those fixups), so this adds a new inherent helper on `Local` for making an argument to help make this clearer.

r? mir
2026-05-02 10:18:28 +02:00
Scott McMurray 548cdc1412 Add a Local::arg(i) helper constructor
While reading through stuff I was noticing just how many `+1` fixes there were in various places (and comments explaining that fixup), so this adds a new inherent helper on `Local` for making an argument to help make this clearer.
2026-04-30 20:52:02 -07:00
Jacob Pratt 0ab1a47246 Rollup merge of #155821 - folkertdev:doc-va-list-clone, r=joshtriplett
c-variadic: document `Clone` and `Drop` instances and require `VaArgSafe: Copy`

tracking issue: https://github.com/rust-lang/rust/issues/44930

Fixing some things that came up in the stabilization PR

r? tgross35
cc @kpreid
2026-04-30 22:28:32 -04:00
Folkert de Vries ac12c696b8 remove custom va_end implementation in the LLVM backend
it should use the fallback body instead
2026-04-30 20:32:15 +02:00
Hood Chatham b072d24e26 Fix: On wasm targets, call panic_in_cleanup if panic occurs in cleanup
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.

Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
2026-04-28 09:56:22 -07:00
Jonathan Brouwer dde4886801 Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst
Add intrinsic for launch-sized workgroup memory on GPUs

Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.

Tracking issue: rust-lang/rust#135516
2026-04-25 23:07:48 +02:00
Flakebi 13ec3de673 Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
2026-04-24 10:03:45 +02:00
bors f676c20edd Auto merge of #155343 - dianqk:indirect-by-ref, r=nikic
codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer.



Fixes https://github.com/rust-lang/rust/issues/155241.

When a value is passed via an indirect pointer, the value needs to be copied to a new alloca. For x86_64-unknown-linux-gnu, `Thing` is the case:

```rust
#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

pub fn foo() {
    let thing = Thing(0, 0, 0);
    bar(thing);
    assert_eq!(thing.0, 0);
}

#[inline(never)]
#[unsafe(no_mangle)]
pub fn bar(mut thing: Thing) {
    thing.0 = 1;
}
```

Before passing the thing to the bar function, the thing needs to be copied to an alloca that is passed to bar.

```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
call void @bar(ptr %0)
```

This patch applies the rule to the untupled arguments as well.

```rust
#![feature(fn_traits)]

#[derive(Clone, Copy)]
struct Thing(usize, usize, usize);

#[inline(never)]
#[unsafe(no_mangle)]
pub fn foo() {
    let thing = (Thing(0, 0, 0),);
    (|mut thing: Thing| {
        thing.0 = 1;
    }).call(thing);
    assert_eq!(thing.0.0, 0);
}
```

For this case, this patch changes from

```llvm
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %thing)
```

to

```llvm
%0 = alloca [24 x i8], align 8
call void @llvm.memcpy.p0.p0.i64(ptr align 8 %0, ptr align 8 %thing, i64 24, i1 false)
; call example::foo::{closure#0}
call void @_RNCNvCs15qdZVLwHPA_7example3foo0B3_(ptr ..., ptr %0)
```

However, the same rule cannot be applied to tail calls that would be unsound, because the caller's stack frame is overwritten by the callee's stack frame. Fortunately, https://github.com/rust-lang/rust/pull/151143 has already handled the special case. We must not copy again.

No copy is needed for by-move arguments, because the argument is passed to the called "in-place".

No copy is also needed for by-val arguments, because the attribute implies that a hidden copy of the pointee is made between the caller and the callee.


NOTE: The patch has a trick for tail calls that we pass by-move. We can choose to copy an alloca even for by-move arguments, but tail calls require MUST-by-move.
2026-04-22 15:47:21 +00:00
dianqk 10d8329061 codegen: Copy to an alloca when the argument is neither by-val nor by-move for indirect pointer. 2026-04-22 17:37:17 +08:00
Folkert de Vries 41afd5f8d6 handle uefi and test assembly versus regular functions 2026-04-16 11:37:46 +02:00
Folkert de Vries 7787bd915b fix macho section specifier & windows test 2026-04-16 11:37:46 +02:00
Folkert de Vries bc4aad37ca naked-functions: properly document the -Zfunction-sections windows status 2026-04-16 11:37:45 +02:00
Folkert de Vries 872301bfdd naked functions: respect function_sections on linux/macos 2026-04-16 11:37:45 +02:00
Folkert de Vries 0e522d6c62 naked functions: respect function_sections on windows
For `gnu` function_sections is off by default.
2026-04-16 11:37:45 +02:00
Jacob Pratt 803a7227fb Rollup merge of #155005 - folkertdev:simd-element-type-llvm, r=nnethercote
preserve SIMD element type information

Preserve the SIMD element type and provide it to LLVM for better optimization.

This is relevant for AArch64 types like `int16x4x2_t`, see also https://github.com/llvm/llvm-project/issues/181514. Such types are defined like so:

```rust
#[repr(simd)]
struct int16x4_t([i16; 4]);

#[repr(C)]
struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
```

Previously this would be translated to the opaque `[2 x <8 x i8>]`, with this PR it is instead `[2 x <4 x i16>]`. That change is not relevant for the ABI, but using the correct type prevents bitcasts that can (indeed, do) confuse the LLVM pattern matcher.

This change will make it possible to implement the deinterleaving loads on AArch64 in a portable way (without neon-specific intrinsics), which means that e.g. Miri or the cranelift backend can run them without additional support.

discussion at [#t-compiler > loss of vector element type information](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/loss.20of.20vector.20element.20type.20information/with/584483611)
2026-04-14 00:37:24 -04:00
Folkert de Vries 6f428df8df preseve SIMD element type information
and provide it to LLVM for better optimization
2026-04-13 13:26:50 +02:00
David Wood da948999eb cg_ssa: transmute between scalable vectors
Like regular SIMD vectors, we can support casting between scalable
vectors of integral or floating-point types without needing a temporary.
2026-04-13 04:24:28 +00:00
Jonathan Brouwer b040d5493e Rollup merge of #154598 - folkertdev:windows-naked-link-section, r=mati865
test `#[naked]` with `#[link_section = "..."]` on windows

As a part of https://github.com/rust-lang/rust/pull/147811 I ran into that we actually don't match (current) LLVM output.

r? @mati865
2026-04-08 23:04:33 +02:00
Nicholas Nethercote 3ff4201fd1 Move rustc_middle::mir::mono to rustc_middle::mono
Because the things in this module aren't MIR and don't use anything
from `rustc_middle::mir`. Also, modules that use `mono` often don't use
anything else from `rustc_middle::mir`.
2026-04-07 08:33:54 +10:00
Folkert de Vries 72b6825828 test #[naked] with #[link_section = "..."] on windows 2026-04-06 14:58:52 +02:00
David Wood a2f7f3c1eb ty_utils: lower tuples to ScalableVector repr
Instead of just using regular struct lowering for these types, which
results in an incorrect ABI (e.g. returning indirectly), use
`BackendRepr::ScalableVector` which will lower to the correct type and
be passed in registers.

This also enables some simplifications for generating alloca of scalable
vectors and greater re-use of `scalable_vector_parts`.

A LLVM codegen test demonstrating the changed IR this generates is
included in the next commit alongside some intrinsics that make these
tuples usable.
2026-04-03 10:27:30 +00:00
Wesley Wiser c9d3a00cd1 Revert "Fix: On wasm targets, call panic_in_cleanup if panic occurs in cleanup"
This reverts commit acbfd79acf.
2026-04-01 21:29:42 -05:00
teor 55d9f7cb6c Fix typos and outdated comments 2026-03-20 11:22:53 +10:00
bors b2fabe39bd Auto merge of #153673 - JonathanBrouwer:rollup-cGOKonI, r=JonathanBrouwer
Rollup of 7 pull requests

Successful merges:

 - rust-lang/rust#153560 (Introduce granular tidy_ctx's check in extra_checks)
 - rust-lang/rust#153666 (Add a regression test for rust-lang/rust#153599)
 - rust-lang/rust#153493 (Remove `FromCycleError` trait)
 - rust-lang/rust#153549 (tests/ui/binop: add annotations for reference rules)
 - rust-lang/rust#153641 (Move `Spanned`.)
 - rust-lang/rust#153663 (Remove `TyCtxt::node_lint` method and `rustc_middle::lint_level` function)
 - rust-lang/rust#153664 (Add test for rust-lang/rust#109804)
2026-03-11 05:12:10 +00:00
Jonathan Brouwer 3ed43bb774 Rollup merge of #153663 - GuillaumeGomez:migrate-diag, r=JonathanBrouwer
Remove `TyCtxt::node_lint` method and `rustc_middle::lint_level` function

Part of https://github.com/rust-lang/rust/issues/153099.

With this PR, we can finally get rid of `lint_level`. \o/

r? @JonathanBrouwer
2026-03-10 22:46:57 +01:00
Nicholas Nethercote c12ab08c14 Move Spanned.
It's defined in `rustc_span::source_map` which doesn't make any sense
because it has nothing to do with source maps. This commit moves it to
the crate root, a more sensible spot for something this basic.
2026-03-11 06:25:23 +11:00
Guillaume Gomez 916d760c47 Replace TyCtxt::node_lint call with TyCtxt::emit_node_lint in rustc_codegen_ssa 2026-03-10 17:14:01 +01:00
David Wood db5e2dc248 abi: s/ScalableVector/SimdScalableVector
Renaming to remove any ambiguity as to what "vector" refers to in this
context
2026-03-10 11:52:22 +00:00
bors 64b72a1fa5 Auto merge of #150447 - WaffleLapkin:maybe-dangling-semantics, r=RalfJung
Implement `MaybeDangling` compiler support



Tracking issue: https://github.com/rust-lang/rust/issues/118166



cc @RalfJung
2026-03-05 12:21:27 +00:00
Waffle Lapkin 312055fad5 refactor PointeeInfo
Make `size`/`align` always correct rather than conditionally on the
`safe` field. This makes it less error prone and easier to work with for
`MaybeDangling` / potential future pointer kinds like `Aligned<_>`.
2026-03-05 11:53:38 +01:00
Folkert de Vries 391a7554f3 enable PassMode::Indirect { on_stack: true } tail call arguments 2026-03-04 19:43:12 +01:00
Jonathan Brouwer ad4b2c01a1 Rollup merge of #153046 - bjorn3:cg_ssa_cleanups, r=TaKO8Ki
Couple of cg_ssa refactorings

These should help a bit with using cg_ssa in cg_clif at some point in the future.
2026-03-02 09:49:22 +01:00
Folkert de Vries e6cf5a22e7 test u128 passing on linux and windows 2026-02-27 10:51:55 +01:00
Folkert de Vries 31ae3d2be8 guaranteed tail calls: support indirect arguments 2026-02-27 10:24:39 +01:00
Jacob Pratt cb78bc4dd4 Rollup merge of #151771 - hoodmane:wasm-double-panic, r=bjorn3
Fix: On wasm targets, call `panic_in_cleanup` if panic occurs in cleanup

Previously this was not correctly implemented. Each funclet may need its own terminate block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which can have a terminate_block for each funclet. We key on the first basic block of the funclet -- in particular, this is the start block for the old case of the top level terminate function.

I also fixed the `terminate` handler to not be invoked when a foreign exception is raised, mimicking the behavior from msvc. On wasm, in order to avoid generating a `catch_all` we need to call `llvm.wasm.get.exception` and `llvm.wasm.get.ehselector`.
2026-02-25 21:42:53 -05:00
bjorn3 df4b228c71 Merge const_data_from_alloc into static_addr_of
In Cranelift a Value can't hold arbitrarily sized values.
2026-02-25 11:11:06 +00:00
Hood Chatham acbfd79acf Fix: On wasm targets, call panic_in_cleanup if panic occurs in cleanup
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.

Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
2026-02-24 17:47:27 +01:00
Scott McMurray 0187acccad Include the Align::MAX limit in the !range metadata for loading alignment from a vtable 2026-02-20 20:22:31 -08:00
Camille Gillot 6d4b1b38e7 Remove ShallowInitBox. 2026-02-17 11:25:50 +00:00
Hans Wennborg 73d06be9f3 Set hidden visibility on naked functions in compiler-builtins
88b46460fa made builtin functions hidden,
but it doesn't apply to naked functions, which are generated through a
different code path.
2026-02-02 14:52:34 +01:00
Matthias Krüger 3a69035338 Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic

Add `simd_splat` which lowers to the LLVM canonical splat sequence.

```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```

Right now we try to fake it using one of

```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```

or (in `stdarch`)

```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```

Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:

- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804

---

As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.

Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.

Currently this just adds the intrinsic, it does not actually use it anywhere yet.
2026-01-24 21:04:15 +01:00
Folkert de Vries 71f34429ac const-eval: do not call immediate_const_vector on vector of pointers 2026-01-24 10:40:47 +01:00
Ralf Jung 29ed211215 codegen: clarify some variable names around function calls 2026-01-21 18:01:30 +01:00
Jacob Pratt 6912c676cd Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
2026-01-15 19:35:46 -05:00
Flakebi 91d4e40e02 Add amdgpu_dispatch_ptr intrinsic
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
2026-01-09 10:41:37 +01:00
Folkert de Vries 76d0843f8d naked functions: emit .private_extern on macos 2026-01-06 16:48:04 +01:00
Fang He f9007bcb87 fix condition checks for SVE <vscale x N x i1> when N != 16 2025-12-28 14:59:40 +08:00
Fang He 435a027c71 fix typo and rephrase 2025-12-28 14:59:08 +08:00
bjorn3 cb23b54eb1 Add a hack for llvm.wasm.throw
As this is the only unwinding intrinsic we use and
codegen_llvm_intrinsic_call currently doesn't handle unwinding
intrinsics, this uses the conventional call path for llvm.wasm.throw.
2025-12-27 17:46:26 +00:00