Commit Graph

203 Commits

Author SHA1 Message Date
Jonathan Brouwer dde4886801 Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst
Add intrinsic for launch-sized workgroup memory on GPUs

Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.

Tracking issue: rust-lang/rust#135516
2026-04-25 23:07:48 +02:00
Flakebi 13ec3de673 Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
2026-04-24 10:03:45 +02:00
Adwin White 6279106e72 fix all errors 2026-04-20 00:18:28 +08:00
teor dafb6bb801 Refactor FnDecl and FnSig flags into packed structs 2026-04-16 07:08:08 +10:00
David Wood 957320cdb1 cg_llvm: sve_cast intrinsic
Abstract over the existing `simd_cast` intrinsic to implement a new
`sve_cast` intrinsic - this is better than allowing scalable vectors to
be used with all of the generic `simd_*` intrinsics.
2026-04-03 10:37:42 +00:00
David Wood 4fbcb031de cg_llvm: sve_tuple_{create,get,set} intrinsics
Clang changed to representing tuples of scalable vectors as
structs rather than as wide vectors (that is, scalable vector types
where the `N` part of the `<vscale x N x ty>` type was multiplied by
the number of vectors). rustc mirrored this in the initial implementation
of scalable vectors.

Earlier versions of our patches used the wide vector representation and
our intrinsic patches used the legacy
`llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating
these tuples/getting/setting the vectors, which were only supported
due to LLVM's `AutoUpgrade` pass converting these intrinsics into
`llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics
with the wide vector representation.

With the current struct representation, Clang has special handling in
codegen for generating `insertvalue`/`extractvalue` instructions for
these operations, which must be replicated by rustc's codegen for our
intrinsics to use. This patch implements new intrinsics in
`core::intrinsics::scalable` (mirroring the structure of
`core::intrinsics::simd`) which rustc lowers to the appropriate
`insertvalue`/`extractvalue` instructions.
2026-04-03 10:27:30 +00:00
Guillaume Gomez 67ab3ac423 Rollup merge of #154043 - RalfJung:simd-min-max, r=Amanieu,calebzulawski,antoyo
simd_fmin/fmax: make semantics and name consistent with scalar intrinsics

This is the SIMD version of https://github.com/rust-lang/rust/pull/153343: change the documented semantics of the SIMD float min/max intrinsics to that of the scalar intrinsics, and also make the name consistent. The overall semantic change this amounts to is that we restrict the non-determinism: the old semantics effectively mean "when one input is an SNaN, the result non-deterministically is a NaN or the other input"; the new semantics say that in this case the other input must be returned. For all other cases, old and new semantics are equivalent. This means all users of these intrinsics that were correct with the old semantics are still correct: the overall set of possible behaviors has become smaller, no new possible behaviors are being added.

In terms of providers of this API:
- Miri, GCC, and cranelift already implement the new semantics, so no changes are needed.
- LLVM is adjusted to use `minimumnum nsz` instead of `minnum`, thus giving us the new semantics.

In terms of consumers of this API:
- Portable SIMD almost certainly wants to match the scalar behavior, so this is strictly a bugfix here.
- Stdarch mostly stopped using the intrinsic, except on nvptx, where arguably the new semantics are closer to what we actually want than the old semantics (https://github.com/rust-lang/stdarch/issues/2056).

Q: Should there be an `f` in the intrinsic name to indicate that it is for floats? E.g., `simd_fminimum_number_nsz`?

Also see https://github.com/rust-lang/rust/issues/153395.
2026-03-29 00:06:50 +01:00
Ralf Jung 986a280644 simd_fmin/fmax: make semantics and name consistent with scalar intrinsics 2026-03-18 15:17:56 +01:00
N1ark abb5228ec1 Merge fabsfN into fabs::<F>
Add `bounds::FloatPrimitive`

Exhaustive float pattern match

Fix GCC

use span bugs
2026-03-16 21:49:04 +00:00
Ralf Jung c7220f423b rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering 2026-03-15 14:53:00 +01:00
Benno Lossin 7b428597ff add field representing types 2026-02-27 15:54:20 +01:00
jasper3108 7857058a6b nix vtable_for intrinsic 2026-02-20 10:16:36 +01:00
jasper3108 01627b7441 Support getting TypeId's Trait and vtable 2026-02-20 10:16:36 +01:00
Ralf Jung 5e65109f21 add write_box_via_move intrinsic and use it for vec!
This allows us to get rid of box_new entirely
2026-02-16 17:27:40 +01:00
Folkert de Vries b935f379b4 implement carryless_mul 2026-02-14 21:23:30 +01:00
Matthias Krüger 3a69035338 Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic

Add `simd_splat` which lowers to the LLVM canonical splat sequence.

```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```

Right now we try to fake it using one of

```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```

or (in `stdarch`)

```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```

Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:

- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804

---

As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.

Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.

Currently this just adds the intrinsic, it does not actually use it anywhere yet.
2026-01-24 21:04:15 +01:00
Folkert de Vries dd9241d150 c_variadic: use Clone instead of LLVM va_copy 2026-01-20 18:38:50 +01:00
Folkert de Vries 80c0b99de0 add simd_splat intrinsic 2026-01-19 16:48:28 +01:00
Jacob Pratt 6912c676cd Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
2026-01-15 19:35:46 -05:00
Flakebi 91d4e40e02 Add amdgpu_dispatch_ptr intrinsic
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
2026-01-09 10:41:37 +01:00
Oli Scherer a3359bdd4f Compile-Time Reflection MVP: tuples 2026-01-08 11:41:00 +00:00
Marcelo Domínguez 58e2610f71 Expose workgroup/thread dims as intrinsic args 2026-01-02 11:50:32 +01:00
Ivar Flakstad d5bf1a4c9a Introduce vtable_for intrinsic and use it to implement try_as_dyn and try_as_dyn_mut for fallible coercion from &T / &mut T to &dyn Trait. 2025-12-16 06:39:58 -04:00
Marcelo Domínguez 5128ce10a0 Implement offload intrinsic 2025-11-25 20:04:27 +01:00
Camille Gillot 72444372ae Replace OffsetOf by an actual sum. 2025-11-18 00:10:03 +00:00
Stuart Cook d3475140ee Rollup merge of #128666 - pitaj:intrinsic-overflow_checks, r=BoxyUwU
Add `overflow_checks` intrinsic

This adds an intrinsic which allows code in a pre-built library to inherit the overflow checks option from a crate depending on it. This enables code in the standard library to explicitly change behavior based on whether `overflow_checks` are enabled, regardless of the setting used when standard library was compiled.

This is very similar to the `ub_checks` intrinsic, and refactors the two to use a common mechanism.

The primary use case for this is to allow the new `RangeFrom` iterator to yield the maximum element before overflowing, as requested [here](https://github.com/rust-lang/rust/issues/125687#issuecomment-2151118208). This PR includes a working `IterRangeFrom` implementation based on this new intrinsic that exhibits the desired behavior.

[Prior discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Ability.20to.20select.20code.20based.20on.20.60overflow_checks.60.3F)
2025-11-09 13:22:23 +11:00
Peter Jaszkowiak cc8b95cc54 add overflow_checks intrinsic 2025-11-08 10:57:35 -07:00
sayantn 75de619159 Add alignment parameter to simd_masked_{load,store} 2025-11-04 02:30:59 +05:30
Dawid Lachowicz 2a5dac7682 Remove no longer used contract_checks intrinsic
The contract_checks compiler flag is now used to determine
if runtime contract checks should be enabled, as opposed
to the compiler intrinsic as previously.
2025-10-11 00:16:44 +01:00
Dawid Lachowicz e4ead0ec70 Guard HIR lowered contracts with contract_checks
Refactor contract HIR lowering to ensure no contract code is
executed when contract-checks are disabled.

The call to contract_checks is moved to inside the lowered fn
body, and contract closures are built conditionally, ensuring
no side-effects present in contracts occur when those are disabled.
2025-10-11 00:16:29 +01:00
ltdk e8a8e061bf Make missed precondition-free float intrinsics safe 2025-09-23 18:15:11 -04:00
ltdk 055e05a338 Mark float intrinsics with no preconditions as safe 2025-09-21 20:37:51 -04:00
ltdk 987f9603f9 Sort safe intrinsic list 2025-09-17 15:48:47 -04:00
sayantn 62b4347e80 Add funnel_sh{l,r} functions and intrinsics
- Add a fallback implementation for the intrinsics
 - Add LLVM backend support for funnel shifts

Co-Authored-By: folkertdev <folkert@folkertdev.nl>
2025-09-03 14:13:24 +05:30
Folkert de Vries d25910eaeb make prefetch intrinsics safe 2025-08-20 00:35:42 +02:00
Marcelo Domínguez 250d77e5d7 Complete functionality and general cleanup 2025-08-14 16:30:15 +00:00
Marcelo Domínguez 5c631041aa Basic implementation of autodiff intrinsic 2025-08-14 16:29:58 +00:00
Ralf Jung de1b999ff6 atomicrmw on pointers: move integer-pointer cast hacks into backend 2025-07-23 08:32:55 +02:00
Deadbeef 3f2dc2bd1a add const_make_global; err for const_allocate ptrs if didn't call
Co-Authored-By: Ralf Jung <post@ralfj.de>
Co-Authored-By: Oli Scherer <github333195615777966@oli-obk.de>
2025-07-16 00:32:12 +08:00
Camille GILLOT 36bc0948e0 Generalize TyCtxt::item_name. 2025-07-13 13:50:00 +00:00
Oli Scherer 486ffda9dc Add opaque TypeId handles for CTFE 2025-07-09 16:37:11 +00:00
sayantn 2038405ff7 Add simd_funnel_sh{l,r} and simd_round_ties_even 2025-06-15 04:33:41 +05:30
Ralf Jung 62418f4c56 intrinsics: rename min_align_of to align_of 2025-06-12 17:50:25 +02:00
Ralf Jung 2a3a6150d4 move all intrinsic typeck logic into the one big match 2025-06-07 21:45:58 +02:00
Ralf Jung 8808c9d34b intrinsics: use const generic to set atomic ordering 2025-06-07 21:45:58 +02:00
Oli Scherer fd3da4bebd Replace some Option<Span> with Span and use DUMMY_SP instead of None 2025-06-05 14:14:59 +00:00
Scott McMurray 4668124cc7 slice.get(i) should use a slice projection in MIR, like slice[i] does 2025-05-30 12:04:41 -07:00
Ralf Jung 4794ea176b atomic_load intrinsic: use const generic parameter for ordering 2025-05-28 22:57:55 +02:00
Urgau e7247df590 Use intrinsics for {f16,f32,f64,f128}::{minimum,maximum} operations 2025-05-09 17:11:23 +02:00
Oli Scherer 5d2952100f Use is_lang_item and as_lang_item instead of handrolling their logic 2025-04-22 11:02:37 +00:00