Mirrors/rust - rust - Gitea @ Femelysm.ru

mirror of https://github.com/rust-lang/rust.git synced 2026-04-26 13:01:27 +03:00

Author	SHA1	Message	Date
Jonathan Brouwer	dde4886801	Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst Add intrinsic for launch-sized workgroup memory on GPUs Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferenceable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferenceable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu and nvptx, so users have to pass this out-of-band or rely on target specific ways for now. Tracking issue: rust-lang/rust#135516	2026-04-25 23:07:48 +02:00
Flakebi	13ec3de673	Add intrinsic for launch-sized workgroup memory on GPUs Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferenceable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferenceable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu and nvptx, so users have to pass this out-of-band or rely on target specific ways for now.	2026-04-24 10:03:45 +02:00
Adwin White	6279106e72	fix all errors	2026-04-20 00:18:28 +08:00
teor	dafb6bb801	Refactor FnDecl and FnSig flags into packed structs	2026-04-16 07:08:08 +10:00
David Wood	957320cdb1	cg_llvm: `sve_cast` intrinsic Abstract over the existing `simd_cast` intrinsic to implement a new `sve_cast` intrinsic - this is better than allowing scalable vectors to be used with all of the generic `simd_*` intrinsics.	2026-04-03 10:37:42 +00:00
David Wood	4fbcb031de	cg_llvm: `sve_tuple_{create,get,set}` intrinsics Clang changed to representing tuples of scalable vectors as structs rather than as wide vectors (that is, scalable vector types where the `N` part of the `<vscale x N x ty>` type was multiplied by the number of vectors). rustc mirrored this in the initial implementation of scalable vectors. Earlier versions of our patches used the wide vector representation and our intrinsic patches used the legacy `llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating these tuples/getting/setting the vectors, which were only supported due to LLVM's `AutoUpgrade` pass converting these intrinsics into `llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics with the wide vector representation. With the current struct representation, Clang has special handling in codegen for generating `insertvalue`/`extractvalue` instructions for these operations, which must be replicated by rustc's codegen for our intrinsics to use. This patch implements new intrinsics in `core::intrinsics::scalable` (mirroring the structure of `core::intrinsics::simd`) which rustc lowers to the appropriate `insertvalue`/`extractvalue` instructions.	2026-04-03 10:27:30 +00:00
Guillaume Gomez	67ab3ac423	Rollup merge of #154043 - RalfJung:simd-min-max, r=Amanieu,calebzulawski,antoyo simd_fmin/fmax: make semantics and name consistent with scalar intrinsics This is the SIMD version of https://github.com/rust-lang/rust/pull/153343: change the documented semantics of the SIMD float min/max intrinsics to that of the scalar intrinsics, and also make the name consistent. The overall semantic change this amounts to is that we restrict the non-determinism: the old semantics effectively mean "when one input is an SNaN, the result non-deterministically is a NaN or the other input"; the new semantics say that in this case the other input must be returned. For all other cases, old and new semantics are equivalent. This means all users of these intrinsics that were correct with the old semantics are still correct: the overall set of possible behaviors has become smaller, no new possible behaviors are being added. In terms of providers of this API: - Miri, GCC, and cranelift already implement the new semantics, so no changes are needed. - LLVM is adjusted to use `minimumnum nsz` instead of `minnum`, thus giving us the new semantics. In terms of consumers of this API: - Portable SIMD almost certainly wants to match the scalar behavior, so this is strictly a bugfix here. - Stdarch mostly stopped using the intrinsic, except on nvptx, where arguably the new semantics are closer to what we actually want than the old semantics (https://github.com/rust-lang/stdarch/issues/2056). Q: Should there be an `f` in the intrinsic name to indicate that it is for floats? E.g., `simd_fminimum_number_nsz`? Also see https://github.com/rust-lang/rust/issues/153395.	2026-03-29 00:06:50 +01:00
Ralf Jung	986a280644	simd_fmin/fmax: make semantics and name consistent with scalar intrinsics	2026-03-18 15:17:56 +01:00
N1ark	abb5228ec1	Merge `fabsfN` into `fabs::<F>` Add `bounds::FloatPrimitive` Exhaustive float pattern match Fix GCC use span bugs	2026-03-16 21:49:04 +00:00
Ralf Jung	c7220f423b	rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering	2026-03-15 14:53:00 +01:00
Benno Lossin	7b428597ff	add field representing types	2026-02-27 15:54:20 +01:00
jasper3108	7857058a6b	nix vtable_for intrinsic	2026-02-20 10:16:36 +01:00
jasper3108	01627b7441	Support getting TypeId's Trait and vtable	2026-02-20 10:16:36 +01:00
Ralf Jung	5e65109f21	add write_box_via_move intrinsic and use it for vec! This allows us to get rid of box_new entirely	2026-02-16 17:27:40 +01:00
Folkert de Vries	b935f379b4	implement `carryless_mul`	2026-02-14 21:23:30 +01:00
Matthias Krüger	3a69035338	Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee add `simd_splat` intrinsic Add `simd_splat` which lowers to the LLVM canonical splat sequence. ```llvm insertelement <N x elem> poison, elem %x, i32 0 shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer ``` Right now we try to fake it using one of ```rust fn splat(x: u32) -> u32x8 { u32x8::from_array([x; 8]) } ``` or (in `stdarch`) ```rust fn splat(value: $elem_type) -> $name { #[derive(Copy, Clone)] #[repr(simd)] struct JustOne([$elem_type; 1]); let one = JustOne([value]); // SAFETY: 0 is always in-bounds because we're shuffling // a simd type with exactly one element. unsafe { simd_shuffle!(one, one, [0; $len]) } } ``` Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples: - https://github.com/rust-lang/rust/issues/60637 - https://github.com/rust-lang/rust/issues/137407 - https://github.com/rust-lang/rust/issues/122623 - https://github.com/rust-lang/rust/issues/97804 --- As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends. Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below. Currently this just adds the intrinsic, it does not actually use it anywhere yet.	2026-01-24 21:04:15 +01:00
Folkert de Vries	dd9241d150	`c_variadic`: use `Clone` instead of LLVM `va_copy`	2026-01-20 18:38:50 +01:00
Folkert de Vries	80c0b99de0	add `simd_splat` intrinsic	2026-01-19 16:48:28 +01:00
Jacob Pratt	6912c676cd	Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee Add amdgpu_dispatch_ptr intrinsic There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way. As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`. Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu. The HSA kernel dispatch packet contains important information like the launch size and workgroup size. The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference. The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`. The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function. Is this ok or is there a better way (also, should it return a pointer instead of a reference)? Short version: ```rust #[cfg(target_arch = "amdgpu")] pub fn amdgpu_dispatch_ptr() -> *const (); ``` Tracking issue: rust-lang/rust#135024	2026-01-15 19:35:46 -05:00
Flakebi	91d4e40e02	Add amdgpu_dispatch_ptr intrinsic Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu. The HSA kernel dispatch packet contains important information like the launch size and workgroup size. The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference. The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`. The return type of the intrinsic (`*const ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function. Short version: ```rust #[cfg(target_arch = "amdgpu")] pub fn amdgpu_dispatch_ptr() -> *const (); ```	2026-01-09 10:41:37 +01:00
Oli Scherer	a3359bdd4f	Compile-Time Reflection MVP: tuples	2026-01-08 11:41:00 +00:00
Marcelo Domínguez	58e2610f71	Expose workgroup/thread dims as intrinsic args	2026-01-02 11:50:32 +01:00
Ivar Flakstad	d5bf1a4c9a	Introduce `vtable_for` intrinsic and use it to implement `try_as_dyn` and `try_as_dyn_mut` for fallible coercion from `&T` / `&mut T` to `&dyn Trait`.	2025-12-16 06:39:58 -04:00
Marcelo Domínguez	5128ce10a0	Implement offload intrinsic	2025-11-25 20:04:27 +01:00
Camille Gillot	72444372ae	Replace OffsetOf by an actual sum.	2025-11-18 00:10:03 +00:00
Stuart Cook	d3475140ee	Rollup merge of #128666 - pitaj:intrinsic-overflow_checks, r=BoxyUwU Add `overflow_checks` intrinsic This adds an intrinsic which allows code in a pre-built library to inherit the overflow checks option from a crate depending on it. This enables code in the standard library to explicitly change behavior based on whether `overflow_checks` are enabled, regardless of the setting used when standard library was compiled. This is very similar to the `ub_checks` intrinsic, and refactors the two to use a common mechanism. The primary use case for this is to allow the new `RangeFrom` iterator to yield the maximum element before overflowing, as requested [here](https://github.com/rust-lang/rust/issues/125687#issuecomment-2151118208). This PR includes a working `IterRangeFrom` implementation based on this new intrinsic that exhibits the desired behavior. [Prior discussion on Zulip](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Ability.20to.20select.20code.20based.20on.20.60overflow_checks.60.3F)	2025-11-09 13:22:23 +11:00
Peter Jaszkowiak	cc8b95cc54	add `overflow_checks` intrinsic	2025-11-08 10:57:35 -07:00
sayantn	75de619159	Add alignment parameter to `simd_masked_{load,store}`	2025-11-04 02:30:59 +05:30
Dawid Lachowicz	2a5dac7682	Remove no longer used contract_checks intrinsic The contract_checks compiler flag is now used to determine if runtime contract checks should be enabled, as opposed to the compiler intrinsic as previously.	2025-10-11 00:16:44 +01:00
Dawid Lachowicz	e4ead0ec70	Guard HIR lowered contracts with contract_checks Refactor contract HIR lowering to ensure no contract code is executed when contract-checks are disabled. The call to contract_checks is moved to inside the lowered fn body, and contract closures are built conditionally, ensuring no side-effects present in contracts occur when those are disabled.	2025-10-11 00:16:29 +01:00
ltdk	e8a8e061bf	Make missed precondition-free float intrinsics safe	2025-09-23 18:15:11 -04:00
ltdk	055e05a338	Mark float intrinsics with no preconditions as safe	2025-09-21 20:37:51 -04:00
ltdk	987f9603f9	Sort safe intrinsic list	2025-09-17 15:48:47 -04:00
sayantn	62b4347e80	Add `funnel_sh{l,r}` functions and intrinsics - Add a fallback implementation for the intrinsics - Add LLVM backend support for funnel shifts Co-Authored-By: folkertdev <folkert@folkertdev.nl>	2025-09-03 14:13:24 +05:30
Folkert de Vries	d25910eaeb	make `prefetch` intrinsics safe	2025-08-20 00:35:42 +02:00
Marcelo Domínguez	250d77e5d7	Complete functionality and general cleanup	2025-08-14 16:30:15 +00:00
Marcelo Domínguez	5c631041aa	Basic implementation of `autodiff` intrinsic	2025-08-14 16:29:58 +00:00
Ralf Jung	de1b999ff6	atomicrmw on pointers: move integer-pointer cast hacks into backend	2025-07-23 08:32:55 +02:00
Deadbeef	3f2dc2bd1a	add `const_make_global`; err for `const_allocate` ptrs if didn't call Co-Authored-By: Ralf Jung <post@ralfj.de> Co-Authored-By: Oli Scherer <github333195615777966@oli-obk.de>	2025-07-16 00:32:12 +08:00
Camille GILLOT	36bc0948e0	Generalize TyCtxt::item_name.	2025-07-13 13:50:00 +00:00
Oli Scherer	486ffda9dc	Add opaque TypeId handles for CTFE	2025-07-09 16:37:11 +00:00
sayantn	2038405ff7	Add `simd_funnel_sh{l,r}` and `simd_round_ties_even`	2025-06-15 04:33:41 +05:30
Ralf Jung	62418f4c56	intrinsics: rename min_align_of to align_of	2025-06-12 17:50:25 +02:00
Ralf Jung	2a3a6150d4	move all intrinsic typeck logic into the one big match	2025-06-07 21:45:58 +02:00
Ralf Jung	8808c9d34b	intrinsics: use const generic to set atomic ordering	2025-06-07 21:45:58 +02:00
Oli Scherer	fd3da4bebd	Replace some `Option<Span>` with `Span` and use DUMMY_SP instead of None	2025-06-05 14:14:59 +00:00
Scott McMurray	4668124cc7	`slice.get(i)` should use a slice projection in MIR, like `slice[i]` does	2025-05-30 12:04:41 -07:00
Ralf Jung	4794ea176b	atomic_load intrinsic: use const generic parameter for ordering	2025-05-28 22:57:55 +02:00
Urgau	e7247df590	Use intrinsics for `{f16,f32,f64,f128}::{minimum,maximum}` operations	2025-05-09 17:11:23 +02:00
Oli Scherer	5d2952100f	Use `is_lang_item` and `as_lang_item` instead of handrolling their logic	2025-04-22 11:02:37 +00:00
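
The `simd_fmin`/`fmax` entry in the log above (#154043) says that when one input is an SNaN, the new semantics require the other input to be returned. The following is a minimal scalar and lane-wise model of that rule in plain Rust. The function names `minimum_number`, `maximum_number`, and `lanewise_minimum_number` are hypothetical and chosen for illustration; this is not the compiler's implementation. For inputs that compare equal (such as `-0.0` and `+0.0`) the sketch simply returns the second argument, which is one of the outcomes the `nsz` flag mentioned in the commit message would appear to permit.

```rust
/// Hypothetical model: return the smaller input, treating any NaN
/// (quiet or signaling) as "missing" so the other input is returned.
pub fn minimum_number(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        return b; // if both are NaN, the result is still NaN
    }
    if b.is_nan() {
        return a;
    }
    // Equal inputs (including -0.0 vs +0.0) fall through to `b`;
    // the sign of a zero result is not pinned down by this sketch.
    if a < b { a } else { b }
}

/// Hypothetical model of the matching maximum operation.
pub fn maximum_number(a: f32, b: f32) -> f32 {
    if a.is_nan() {
        return b;
    }
    if b.is_nan() {
        return a;
    }
    if a > b { a } else { b }
}

/// Apply `minimum_number` to each pair of lanes, using arrays as a
/// stand-in for SIMD vectors.
pub fn lanewise_minimum_number<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| minimum_number(a[i], b[i]))
}
```

Under this model, `minimum_number(f32::NAN, 1.0)` returns `1.0`, and a NaN is produced only when both inputs are NaN.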

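The `overflow_checks` entry in the log above (#128666) names as its primary use case a `RangeFrom` iterator that yields the maximum element before overflowing. The sketch below illustrates that behavior on `u8`. It is an assumption-laden model, not the standard library's code: the `IterFrom` type is hypothetical, and a plain `overflow_checks` field stands in for the intrinsic, which in the real compiler is resolved from the downstream crate's settings. Wrapping when checks are disabled is also an assumption of this sketch.

```rust
/// Hypothetical stand-in for an unbounded "from N onwards" iterator.
pub struct IterFrom {
    next: u8,
    /// Set once `u8::MAX` has been yielded.
    exhausted: bool,
    /// Stand-in for the value the `overflow_checks` intrinsic would report.
    overflow_checks: bool,
}

impl IterFrom {
    pub fn new(start: u8, overflow_checks: bool) -> Self {
        IterFrom { next: start, exhausted: false, overflow_checks }
    }
}

impl Iterator for IterFrom {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        if self.exhausted {
            if self.overflow_checks {
                // The overflow is reported only on the step *after*
                // the maximum element has been yielded.
                panic!("attempt to add with overflow");
            }
            // Checks disabled: wrap around (an assumption of this sketch).
            self.exhausted = false;
            self.next = u8::MIN;
        }
        let current = self.next;
        match current.checked_add(1) {
            Some(n) => self.next = n,
            None => self.exhausted = true,
        }
        Some(current)
    }
}
```

Starting at 254 with checks enabled, this yields 254 and 255 and panics only on the third call; with checks disabled, the third call yields 0.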

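The `funnel_sh{l,r}` entry in the log above mentions a fallback implementation for the funnel-shift intrinsics. A funnel shift concatenates two words, shifts the double-width value, and keeps one half. The sketch below shows one way such a fallback could look for `u32`, by widening to `u64`. The function names are hypothetical, and requiring `shift < 32` is an assumption of this sketch rather than a statement about the real intrinsics' contract.

```rust
/// Hypothetical fallback: shift the 64-bit value `hi:lo` left by `shift`
/// and return the upper 32 bits.
pub fn funnel_shl_u32(hi: u32, lo: u32, shift: u32) -> u32 {
    assert!(shift < 32, "shift must be smaller than the bit width");
    let wide = ((hi as u64) << 32) | (lo as u64);
    ((wide << shift) >> 32) as u32
}

/// Hypothetical fallback: shift the 64-bit value `hi:lo` right by `shift`
/// and return the lower 32 bits.
pub fn funnel_shr_u32(hi: u32, lo: u32, shift: u32) -> u32 {
    assert!(shift < 32, "shift must be smaller than the bit width");
    let wide = ((hi as u64) << 32) | (lo as u64);
    (wide >> shift) as u32
}
```

Passing the same value for both halves turns a funnel shift into a rotate, so `funnel_shl_u32(x, x, n)` agrees with `x.rotate_left(n)` for `n < 32`.
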
203 Commits