Commit Graph

3225 Commits

Author SHA1 Message Date
Jonathan Brouwer 82b5849618 Rollup merge of #150831 - folkertdev:more-va-arg-2, r=workingjubilee
c-variadic: make `va_arg` match on `Arch` exhaustive

tracking issue: https://github.com/rust-lang/rust/issues/44930

Continuing from https://github.com/rust-lang/rust/pull/150094, the more annoying cases remain. These are mostly very niche targets without Clang `va_arg` implementations, so it might just be easier to defer to LLVM rather than risk us getting the ABI subtly wrong. That does mean, I think, that we cannot stabilize c-variadic on those targets.

Alternatively, we could ask target maintainers to contribute an implementation. I'd honestly prefer that they make that change in LLVM (likely by just using `CodeGen::emitVoidPtrVAArg`), which we can then mirror.
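
For illustration, a minimal sketch of the "exhaustive match, or defer to LLVM" shape (the types and target names below are made up for the example and are not the real rustc internals):

```rust
// Illustrative only: none of these types are the actual rustc ones.
enum Arch {
    X86_64,
    Aarch64,
    Msp430,
    Xtensa,
}

enum VaArgLowering {
    /// Targets where we mirror Clang's ABI-specific `va_arg` lowering.
    Custom(&'static str),
    /// Niche targets without a Clang implementation: emit LLVM's own
    /// `va_arg` instruction rather than risk a subtly wrong ABI.
    DeferToLlvm,
}

fn lower_va_arg(arch: Arch) -> VaArgLowering {
    // Exhaustive match: adding a new `Arch` variant forces an explicit
    // decision here instead of silently falling into a default arm.
    match arch {
        Arch::X86_64 => VaArgLowering::Custom("SysV register/stack save area"),
        Arch::Aarch64 => VaArgLowering::Custom("AAPCS64 va_list layout"),
        Arch::Msp430 | Arch::Xtensa => VaArgLowering::DeferToLlvm,
    }
}

fn main() {
    // Niche targets take the defer-to-LLVM route.
    assert!(matches!(lower_va_arg(Arch::Xtensa), VaArgLowering::DeferToLlvm));
}
```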

r? @workingjubilee
2026-02-05 12:16:56 +01:00
Jonathan Brouwer b66ead827c Rollup merge of #152020 - Sa4dUs:offload-remove-dummy-loads, r=ZuseZ4
Remove dummy loads on offload codegen

The current logic generates two dummy loads to prevent some globals from being optimized away. This blocks memtransfer loop hoisting optimizations, so it's time to remove them.

r? @ZuseZ4
2026-02-05 08:32:45 +01:00
Folkert de Vries d2b5ba2ff7 c-variadic: make va_arg match on Arch exhaustive 2026-02-05 00:41:10 +01:00
Folkert de Vries aa0ce237b4 c-variadic: minor cleanups of va_arg 2026-02-05 00:38:08 +01:00
bors db3e99bbab Auto merge of #150605 - RalfJung:fallback-intrinsic-skip, r=mati865
skip codegen for intrinsics with big fallback bodies if backend does not need them

This hopefully fixes the perf regression from https://github.com/rust-lang/rust/pull/148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.

Fixes https://github.com/rust-lang/rust/issues/149945
Cc @scottmcm @bjorn3
2026-02-04 17:12:58 +00:00
Marcelo Domínguez 212c8c3811 Remove dummy loads 2026-02-04 15:26:56 +01:00
bjorn3 d2a0557afb Convert to inline diagnostics in all codegen backends 2026-02-04 13:12:49 +00:00
Stuart Cook 3d102a7812 Rollup merge of #150893 - ZuseZ4:move-un-register-lib, r=oli-obk
offload: move (un)register lib into global_ctors

Right now we initialize the openmp/offload runtime before every single offload call, and tear it down directly afterwards.
What we should do instead is initialize it once in the binary's startup code and tear it down at the end of execution. This PR implements those changes.

With these changes, our generated IR uses far fewer globals, which in turn simplifies the refactoring in https://github.com/rust-lang/rust/pull/150683, where I introduce a new variant of our offload intrinsic.
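
As a rough mental model of the change (illustrative Rust only; the real work happens in the generated LLVM IR, not in source code):

```rust
// Before: every offload call site paid for runtime setup and teardown.
fn launch_kernel_old() {
    register_lib();   // initialize the openmp/offload runtime
    launch();         // the actual kernel launch
    unregister_lib(); // tear the runtime down again
}

// After: (un)registration is emitted into global constructors/destructors,
// so it runs once at program startup/shutdown and call sites only launch.
fn launch_kernel_new() {
    launch();
}

// Stubs so the sketch compiles on its own.
fn register_lib() {}
fn unregister_lib() {}
fn launch() {}

fn main() {
    launch_kernel_old();
    launch_kernel_new();
}
```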

r? oli-obk
2026-01-28 19:03:51 +11:00
Manuel Drehwald 1f11bf6649 Leave note to drop tgt_init_all_rtls in the future 2026-01-27 10:43:22 -08:00
Manuel Drehwald 7eae36f017 Add an early return if handling multiple offload calls 2026-01-27 10:43:03 -08:00
bors 873d4682c7 Auto merge of #151337 - the8472:bail-before-memcpy2, r=Mark-Simulacrum
optimize `vec.extend(slice.to_vec())`, take 2

Redoing https://github.com/rust-lang/rust/pull/130998
It was reverted in https://github.com/rust-lang/rust/pull/151150 due to flakiness. I have traced this to layout randomization perturbing the test (the failure reproduces locally with layout randomization enabled); randomization is now excluded for that test.
2026-01-25 19:45:35 +00:00
bors 75963ce795 Auto merge of #151065 - nagisa:add-preserve-none-abi, r=petrochenkov
abi: add a rust-preserve-none calling convention

This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.

For relatively tight loops implemented with tail calling (`become`), each function with the regular calling convention is still responsible for restoring the initial values of the preserved registers. So it is not unusual to end up with a situation where each step in the tail-call loop spills and reloads registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or for other purposes.
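
To make the pattern concrete, here is a rough nightly-only sketch of such a tail-call loop (assumes the `explicit_tail_calls` feature; the functions are made up and the exact syntax may shift as the feature evolves):

```rust
#![feature(explicit_tail_calls)]
#![allow(incomplete_features)]

// Two mutually tail-calling steps. With the regular calling convention each
// step must save and restore any callee-saved registers it touches; a
// preserve-none convention shifts that burden to the single outer caller.
fn step_a(n: u64, acc: u64) -> u64 {
    if n == 0 {
        return acc;
    }
    become step_b(n - 1, acc + 1);
}

fn step_b(n: u64, acc: u64) -> u64 {
    if n == 0 {
        return acc;
    }
    become step_a(n - 1, acc + 2);
}

fn main() {
    println!("{}", step_a(10, 0));
}
```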

I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come up with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
2026-01-25 02:49:32 +00:00
Matthias Krüger 3a69035338 Rollup merge of #151346 - folkertdev:simd-splat, r=workingjubilee
add `simd_splat` intrinsic

Add `simd_splat` which lowers to the LLVM canonical splat sequence.

```llvm
%v0 = insertelement <N x elem> poison, elem %x, i32 0
%splat = shufflevector <N x elem> %v0, <N x elem> poison, <N x i32> zeroinitializer
```

Right now we try to fake it using one of

```rust
fn splat(x: u32) -> u32x8 {
    u32x8::from_array([x; 8])
}
```

or (in `stdarch`)

```rust
fn splat(value: $elem_type) -> $name {
    #[derive(Copy, Clone)]
    #[repr(simd)]
    struct JustOne([$elem_type; 1]);
    let one = JustOne([value]);
    // SAFETY: 0 is always in-bounds because we're shuffling
    // a simd type with exactly one element.
    unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```

Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:

- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804

---

As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.

Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.

Currently this just adds the intrinsic, it does not actually use it anywhere yet.
2026-01-24 21:04:15 +01:00
Simonas Kazlauskas 6db94dbc25 abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and
is particularly useful in combination with the new `explicit_tail_calls`
feature.

For relatively tight loops implemented with tail calling (`become`), each
function with the regular calling convention is still responsible for
restoring the initial values of the preserved registers. So it is not
unusual to end up with a situation where each step in the tail-call loop
spills and reloads registers, along the lines of:

    foo:
        push r12
        ; do things
        pop r12
        jmp next_step

This adds up quickly, especially when most of the clobberable registers
are already used to pass arguments or for other purposes.

I was thinking of making the name of this ABI a little less LLVM-derived
and more like a conceptual inverse of `rust-cold`, but could not come up
with a great name (`rust-cold` is itself not a great name: cold in what
context? from which perspective? is it supposed to mean that the
function is rarely called?)
2026-01-24 19:23:17 +02:00
Jonathan Brouwer 48b9a6c298 Rollup merge of #151527 - tgross35:f16-fixme-cleanup, r=folkertdev
Clean up or resolve cfg-related instances of `FIXME(f16_f128)`

* Replace target-specific config that has a `FIXME` with `cfg(target_has_reliable_f*)`
* Take care of trivial intrinsic-related FIXMEs
* Split `FIXME(f16_f128)` into `FIXME(f16)`, `FIXME(f128)`, or `FIXME(f16,f128)` to more clearly identify what they block

The individual commit messages have more details.
2026-01-23 11:07:57 +01:00
Jonathan Brouwer dec8d6ebcf Rollup merge of #150780 - fzakaria:fzakaria/section-threshold, r=jackh726
Add -Z large-data-threshold

This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.
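
As a hypothetical illustration of the effect (the statics, sizes, and exact invocation below are made up for the example):

```rust
// Built with something along the lines of:
//   rustc -Ccode-model=medium -Z large-data-threshold=65536 main.rs
//
// `SMALL` (4 KiB) is below the threshold, so it can be addressed with 32-bit
// RIP-relative offsets; `BIG` (16 MiB) is above it, so it is placed in a
// large data section and addressed with absolute 64-bit addressing.
pub static SMALL: [u8; 4 * 1024] = [0; 4 * 1024];
pub static BIG: [u8; 16 * 1024 * 1024] = [1; 16 * 1024 * 1024];

fn main() {
    println!("{} {}", SMALL[0], BIG[0]);
}
```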
2026-01-23 11:07:55 +01:00
Trevor Gross 490b307740 cleanup: Start splitting FIXME(f16_f128) into f16, f128, or f16,f128
Make it easier to identify which FIXMEs are blocking stabilization of
which type.
2026-01-22 23:41:57 -06:00
Jonathan Brouwer 704eaef9d4 Rollup merge of #151465 - RalfJung:fn-call-vars, r=mati865
codegen: clarify some variable names around function calls

I looked at rust-lang/rust#145932 to try to understand how it works, and quickly got lost in the variable names -- what refers to the caller, what to the callee? So here's my attempt at making those more clear. Hopefully the new names are correct.^^

Cc @JamieCunliffe
2026-01-22 13:35:42 +01:00
Jacob Pratt 512cc8d785 Rollup merge of #151442 - clubby789:crate-type-port, r=JonathanBrouwer
Port `#![crate_type]` to the attribute parser

Tracking issue: https://github.com/rust-lang/rust/issues/131229

~~Note that the actual parsing that is used in the compiler session is unchanged, as it must happen very early on; this just ports the validation logic.~~

Also added `// tidy-alphabetical-start` to `check_attr.rs` to make it a bit less conflict-prone
2026-01-22 00:37:43 -05:00
Jamie Hill-Daniel 66b78b700b Port crate_type to attribute parser 2026-01-22 02:34:28 +00:00
León Orell Valerian Liehr 558a59258e Support debuginfo for assoc const bindings 2026-01-21 18:52:08 +01:00
Ralf Jung 29ed211215 codegen: clarify some variable names around function calls 2026-01-21 18:01:30 +01:00
Jacob Pratt 2206d935f7 Rollup merge of #149209 - lto_refactors8, r=jackh726
Move LTO to OngoingCodegen::join

This will make it easier to move all this code to link_binary in the future.

Follow up to https://github.com/rust-lang/rust/pull/147810
Part of https://github.com/rust-lang/compiler-team/issues/908
2026-01-21 02:04:01 -05:00
Manuel Drehwald 43111396e3 move initialization of omp/ol runtimes into global_ctor/dtor 2026-01-20 20:06:08 -05:00
Jacob Pratt db9ff0d44f Rollup merge of #151429 - s390x, r=durin42
s390x: Support aligned stack datalayout

LLVM 23 will mark the stack as aligned for more efficient code:

https://github.com/llvm/llvm-project/pull/176041

r? durin42

@rustbot label llvm-main
2026-01-20 19:46:32 -05:00
Jacob Pratt 43d2006c25 Rollup merge of #150436 - va-list-copy, r=workingjubilee,RalfJung
`c_variadic`: impl `va_copy` and `va_end` as Rust intrinsics

tracking issue: https://github.com/rust-lang/rust/issues/44930

Implement `va_copy` as (the rust equivalent of) `memcpy`, which is the behavior of all current LLVM targets. By providing our own implementation, we can guarantee its behavior. These guarantees are important for implementing c-variadics in e.g. const-eval.

Discussed in [#t-compiler/const-eval > c-variadics in const-eval](https://rust-lang.zulipchat.com/#narrow/channel/146212-t-compiler.2Fconst-eval/topic/c-variadics.20in.20const-eval/with/565509704).

I've also updated the comment for `Drop` a bit. The background here is that the C standard requires that `va_end` is used in the same function (and really, in the same scope) as the corresponding `va_start` or `va_copy`. That is because historically `va_start` would start a scope, which `va_end` would then close. e.g.

https://softwarepreservation.computerhistory.org/c_plus_plus/cfront/release_3.0.3/source/incl-master/proto-headers/stdarg.sol

```c
#define         va_start(ap, parmN)     {\
        va_buf  _va;\
        _vastart(ap = (va_list)_va, (char *)&parmN + sizeof parmN)
#define         va_end(ap)      }
#define         va_arg(ap, mode)        *((mode *)_vaarg(ap, sizeof (mode)))
```

The C standard still has to consider such implementations, but for Rust they are irrelevant. Hence we can use `Clone` for `va_copy` and `Drop` for `va_end`.
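
A usage sketch under that model (nightly-only `c_variadic`; this assumes the existing `VaListImpl` API where `arg` reads the next variadic argument):

```rust
#![feature(c_variadic)]

/// Sums `n` variadic `u64` arguments twice: once through the original list
/// and once through a copy of it.
unsafe extern "C" fn sum_twice(n: usize, mut args: ...) -> u64 {
    // Cloning the implicit `VaListImpl` plays the role of C's `va_copy`.
    let mut copy = args.clone();
    let mut total = 0;
    for _ in 0..n {
        total += unsafe { args.arg::<u64>() };
        total += unsafe { copy.arg::<u64>() };
    }
    total
    // Both lists are dropped here; the `Drop` impl plays the role of `va_end`.
}

fn main() {
    // Calling a Rust-defined C-variadic function is itself unsafe.
    let total = unsafe { sum_twice(2, 1u64, 2u64) };
    assert_eq!(total, 6);
}
```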
2026-01-20 19:46:29 -05:00
Matthew Maurer 39296ff8f8 s390x: Support aligned stack datalayout
LLVM 23 will mark the stack as aligned for more efficient code:

https://github.com/llvm/llvm-project/pull/176041
2026-01-20 20:21:11 +00:00
Folkert de Vries dd9241d150 c_variadic: use Clone instead of LLVM va_copy 2026-01-20 18:38:50 +01:00
Nikita Popov 08da3685ed Don't use evex512 with LLVM 22
As Intel has walked back on the existence of AVX 10.1-256, LLVM
no longer uses evex512, and avx-10.n-512 are now avx-10.n instead,
so we can skip all the special handling on LLVM 22.
2026-01-20 14:47:09 +01:00
Nikita Popov 0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
Nikita Popov bf3ac98d69 Update amdgpu data layout
This changed in:
https://github.com/llvm/llvm-project/commit/853760bca6aa7a960b154cef8c61f87271870b8a
2026-01-20 14:46:58 +01:00
Stuart Cook 1262ff906b Rollup merge of #150288 - offload-bench-fix, r=ZuseZ4
Add scalar support for offload

This PR adds scalar support to the offload feature. The scalar management has two main parts:

On the host side, each scalar arg is cast to the `ix` type, zero-extended to `i64`, and passed to the kernel in that form.
On the device, each scalar arg (an `i64` at that point) is truncated to `ix` and then cast back to the original type.
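
A conceptual model of that round trip in plain Rust (illustrative only; the real handling happens on LLVM values during codegen, and `f32`/`u32` stand in for an arbitrary scalar and its same-width `ix` integer type):

```rust
// Host side: reinterpret the scalar as its same-width integer type ("ix"),
// then zero-extend it to i64 before passing it to the kernel.
fn host_encode(x: f32) -> u64 {
    x.to_bits() as u64
}

// Device side: truncate the i64 back to the same-width integer, then
// reinterpret it as the original scalar type.
fn device_decode(raw: u64) -> f32 {
    f32::from_bits(raw as u32)
}

fn main() {
    let sent = host_encode(3.5);
    assert_eq!(device_decode(sent), 3.5);
}
```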

r? @ZuseZ4
2026-01-20 18:00:08 +11:00
Marcelo Domínguez 307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Folkert de Vries 80c0b99de0 add simd_splat intrinsic 2026-01-19 16:48:28 +01:00
bors d940e56841 Auto merge of #151363 - JonathanBrouwer:rollup-yIXELnN, r=JonathanBrouwer
Rollup of 2 pull requests

Successful merges:

 - rust-lang/rust#151336 (Port rustc codegen attrs)
 - rust-lang/rust#151359 (compiler: Temporarily re-export `assert_matches!` to reduce stabilization churn)

r? @ghost
2026-01-19 13:09:33 +00:00
Jonathan Brouwer a56e2d3037 Rollup merge of #151071 - gen-openmp-metadata, r=nnethercote
Generate openmp metadata

LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails out if we don't have a global called `openmp`, so let's generate it when people enable our experimental offload feature. OpenMP is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.

r? compiler
2026-01-19 08:31:31 +01:00
Zalathar 7ec34defe9 Temporarily re-export assert_matches! to reduce stabilization churn 2026-01-19 18:26:53 +11:00
Edvin Bryntesson 9a931e8bf2 Port #[rustc_allocator_zeroed_variant] to attr parser 2026-01-18 20:13:13 +01:00
Manuel Drehwald 5c85d522d0 Generate global openmp metadata to trigger llvm openmp-opt pass 2026-01-16 14:57:32 -05:00
Jacob Pratt 6912c676cd Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like the launch and workgroup size, and to make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as a lang item or to add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
2026-01-15 19:35:46 -05:00
The 8472 3df0dc8803 mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-15 20:38:40 +01:00
Jieyou Xu cd79ff2e2c Revert "avoid phi node for pointers flowing into Vec appends #130998"
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
2026-01-15 09:37:16 +08:00
Jonathan Brouwer d23e780a57 Rollup merge of #150966 - arch-powerpc64le, r=petrochenkov
rustc_target: Remove unused Arch::PowerPC64LE

This variant was added in https://github.com/rust-lang/rust/pull/147645, but it is actually unused, since target_arch for powerpc64le- targets is "powerpc64". (The difference between powerpc64- and powerpc64le- targets is identified by target_endian.)

Note: This is an internal cleanup and does NOT remove `powerpc64le-*` targets.
2026-01-14 22:29:57 +01:00
bors 86a49fd71f Auto merge of #130998 - the8472:bail-before-memcpy, r=nnethercote
avoid phi node for pointers flowing into Vec appends

Elide temporary allocations in patterns like `vec.append(&mut slice.to_vec())`

related discussion: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/nocapture.20and.20allocation.20elimination
2026-01-14 16:36:26 +00:00
Taiki Endo 7d80e7d720 rustc_target: Remove unused Arch::PowerPC64LE
target_arch for powerpc64le- targets is "powerpc64".
2026-01-14 23:12:57 +09:00
Marcelo Domínguez 2c9c5d14a2 Allow bounded types 2026-01-14 11:37:31 +01:00
Marcelo Domínguez bc751adcdb Minor doc and ty fixes 2026-01-14 11:37:31 +01:00
Nicholas Nethercote 3aa31788b5 Remove Deref/DerefMut impl for Providers.
It's described as a "backwards compatibility hack to keep the diff
small". Removing it requires only a modest amount of churn, and the
resulting code is clearer without the invisible derefs.
2026-01-14 15:55:59 +11:00
The 8472 e6071522db mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-12 02:54:22 +01:00
Nicholas Nethercote 5e510929c6 Remove useless call to erase_and_anonymize_regions.
The only thing we do with the result is consult the `.def_id` field,
which is unaffected by `erase_and_anonymize_regions`.
2026-01-12 09:22:58 +11:00