3468 Commits

Author SHA1 Message Date
Marcelo Domínguez 307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Folkert de Vries 80c0b99de0 add simd_splat intrinsic 2026-01-19 16:48:28 +01:00
bors d940e56841 Auto merge of #151363 - JonathanBrouwer:rollup-yIXELnN, r=JonathanBrouwer
Rollup of 2 pull requests

Successful merges:

 - rust-lang/rust#151336 (Port rustc codegen attrs)
 - rust-lang/rust#151359 (compiler: Temporarily re-export `assert_matches!` to reduce stabilization churn)

r? @ghost
2026-01-19 13:09:33 +00:00
Jonathan Brouwer a56e2d3037 Rollup merge of #151071 - gen-openmp-metadata, r=nnethercote
Generate openmp metadata

LLVM has an openmp-opt pass, which is part of the default O3 pipeline.
The pass bails if we don't have a global called openmp, so let's generate it if people enable our experimental offload feature. openmp is a superset of the offload feature, so they share optimizations.
In follow-up PRs I'll start verifying that LLVM optimizes Rust the way we want it.

r? compiler
2026-01-19 08:31:31 +01:00
Zalathar 7ec34defe9 Temporarily re-export assert_matches! to reduce stabilization churn 2026-01-19 18:26:53 +11:00
Edvin Bryntesson 9a931e8bf2 Port #[rustc_allocator_zeroed_variant] to attr parser 2026-01-18 20:13:13 +01:00
Manuel Drehwald 5c85d522d0 Generate global openmp metadata to trigger llvm openmp-opt pass 2026-01-16 14:57:32 -05:00
Jacob Pratt 6912c676cd Rollup merge of #150607 - dispatch-ptr-intrinsic, r=workingjubilee
Add amdgpu_dispatch_ptr intrinsic

There is an ongoing discussion in rust-lang/rust#150452 about using address spaces from the Rust language in some way.
As that discussion will likely not conclude soon, this PR adds one rustc_intrinsic with an addrspacecast to unblock getting basic information like launch and workgroup size and make it possible to implement something like `core::gpu`.

Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to `addrspace(0)`, so it can be returned as a Rust reference.
The returned pointer/reference is valid for the whole program lifetime, and is therefore `'static`.
The return type of the intrinsic (`&'static ()`) does not mention the struct so that rustc does not need to know the exact struct type. An alternative would be to define the struct as lang item or add a generic argument to the function.
Is this ok or is there a better way (also, should it return a pointer instead of a reference)?

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```

Tracking issue: rust-lang/rust#135024
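
A rough sketch of how a caller might use the returned pointer: cast the `*const ()` to a hand-written mirror of the HSA kernel dispatch packet and read the workgroup size. The `DispatchPacket` struct, its field names, and `fake_dispatch_ptr` are assumptions made for this sketch (the layout follows the 64-byte HSA kernel dispatch packet as commonly documented); the stand-in function replaces the real intrinsic so the example can run on a host.

```rust
/// Hypothetical mirror of the 64-byte HSA kernel dispatch packet.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct DispatchPacket {
    pub header: u16,
    pub setup: u16,
    pub workgroup_size_x: u16,
    pub workgroup_size_y: u16,
    pub workgroup_size_z: u16,
    pub reserved0: u16,
    pub grid_size_x: u32,
    pub grid_size_y: u32,
    pub grid_size_z: u32,
    pub private_segment_size: u32,
    pub group_segment_size: u32,
    pub kernel_object: u64,
    pub kernarg_address: u64,
    pub reserved2: u64,
    pub completion_signal: u64,
}

/// Host-side stand-in packet; on amdgpu the real packet is provided by the runtime.
static FAKE_PACKET: DispatchPacket = DispatchPacket {
    header: 0,
    setup: 1,
    workgroup_size_x: 32,
    workgroup_size_y: 1,
    workgroup_size_z: 1,
    reserved0: 0,
    grid_size_x: 256,
    grid_size_y: 1,
    grid_size_z: 1,
    private_segment_size: 0,
    group_segment_size: 0,
    kernel_object: 0,
    kernarg_address: 0,
    reserved2: 0,
    completion_signal: 0,
};

/// Hypothetical stand-in for the intrinsic: same untyped return type.
pub fn fake_dispatch_ptr() -> *const () {
    (&FAKE_PACKET as *const DispatchPacket).cast()
}

/// Reads the workgroup size from an untyped dispatch pointer.
///
/// # Safety
/// `ptr` must point to a live, properly aligned `DispatchPacket`.
pub unsafe fn workgroup_size(ptr: *const ()) -> [u16; 3] {
    let packet = unsafe { &*ptr.cast::<DispatchPacket>() };
    [
        packet.workgroup_size_x,
        packet.workgroup_size_y,
        packet.workgroup_size_z,
    ]
}

fn main() {
    // SAFETY: `fake_dispatch_ptr` returns a pointer to a static `DispatchPacket`.
    let size = unsafe { workgroup_size(fake_dispatch_ptr()) };
    println!("workgroup size: {:?}", size);
}
```

Keeping the struct definition on the caller's side is what lets the intrinsic stay untyped, as described above.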
2026-01-15 19:35:46 -05:00
The 8472 3df0dc8803 mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-15 20:38:40 +01:00
Jieyou Xu cd79ff2e2c Revert "avoid phi node for pointers flowing into Vec appends #130998"
This reverts PR <https://github.com/rust-lang/rust/pull/130998> because
the added test seems to be flaky / non-deterministic, and has been
failing in unrelated PRs during merge CI.
2026-01-15 09:37:16 +08:00
Jonathan Brouwer d23e780a57 Rollup merge of #150966 - arch-powerpc64le, r=petrochenkov
rustc_target: Remove unused Arch::PowerPC64LE

This variant was added in https://github.com/rust-lang/rust/pull/147645, but is actually unused, since target_arch for powerpc64le- targets is "powerpc64". (The difference between powerpc64- and powerpc64le- targets is identified by target_endian.)

Note: This is an internal cleanup and does NOT remove `powerpc64le-*` targets.
2026-01-14 22:29:57 +01:00
bors 86a49fd71f Auto merge of #130998 - the8472:bail-before-memcpy, r=nnethercote
avoid phi node for pointers flowing into Vec appends

Elide temporary allocations in patterns like `vec.append(slice.to_vec())`

related discussion: https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/nocapture.20and.20allocation.20elimination
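
The pattern in question looks roughly like the first function below. `Vec::append` takes `&mut Vec<T>`, so the temporary needs a `&mut` borrow. The second function is the hand-written equivalent without the temporary. Both function names are made up for this sketch, and whether the temporary allocation is actually elided depends on the optimizer.

```rust
/// Appends by first copying the slice into a temporary `Vec`.
fn append_via_temporary(dst: &mut Vec<u8>, src: &[u8]) {
    let mut tmp = src.to_vec();
    dst.append(&mut tmp);
}

/// Copies straight from the slice, with no temporary in the source code.
fn append_direct(dst: &mut Vec<u8>, src: &[u8]) {
    dst.extend_from_slice(src);
}

fn main() {
    let mut a = vec![1, 2];
    let mut b = vec![1, 2];
    append_via_temporary(&mut a, &[3, 4]);
    append_direct(&mut b, &[3, 4]);
    // Both forms produce the same contents.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```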
2026-01-14 16:36:26 +00:00
Taiki Endo 7d80e7d720 rustc_target: Remove unused Arch::PowerPC64LE
target_arch for powerpc64le- targets is "powerpc64".
2026-01-14 23:12:57 +09:00
Marcelo Domínguez 2c9c5d14a2 Allow bounded types 2026-01-14 11:37:31 +01:00
Marcelo Domínguez bc751adcdb Minor doc and ty fixes 2026-01-14 11:37:31 +01:00
Nicholas Nethercote 3aa31788b5 Remove Deref/DerefMut impl for Providers.
It's described as a "backwards compatibility hack to keep the diff
small". Removing it requires only a modest amount of churn, and the
resulting code is clearer without the invisible derefs.
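
A minimal sketch of the kind of invisible deref being removed, using hypothetical stand-in types (`Queries`, `Providers`, and the `answer` field are not the real rustc definitions):

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the inner table of query providers.
#[derive(Default)]
struct Queries {
    answer: Option<fn() -> u32>,
}

/// Hypothetical stand-in for the outer `Providers` struct.
#[derive(Default)]
struct Providers {
    queries: Queries,
}

// The kind of convenience impl being removed: it lets callers write
// `providers.answer` even though the field lives in `providers.queries`.
impl Deref for Providers {
    type Target = Queries;
    fn deref(&self) -> &Queries {
        &self.queries
    }
}

impl DerefMut for Providers {
    fn deref_mut(&mut self) -> &mut Queries {
        &mut self.queries
    }
}

fn forty_two() -> u32 {
    42
}

/// Registers the provider through the invisible deref.
fn provide_via_deref(providers: &mut Providers) {
    providers.answer = Some(forty_two as fn() -> u32);
}

/// Registers the provider with the field path spelled out.
fn provide_explicit(providers: &mut Providers) {
    providers.queries.answer = Some(forty_two as fn() -> u32);
}

fn main() {
    let mut a = Providers::default();
    let mut b = Providers::default();
    provide_via_deref(&mut a);
    provide_explicit(&mut b);
    assert_eq!(a.queries.answer.map(|f| f()), b.queries.answer.map(|f| f()));
}
```

The explicit form is slightly longer, but the reader can see which struct actually owns the field.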
2026-01-14 15:55:59 +11:00
The 8472 e6071522db mark rust_dealloc as captures(address)
Co-authored-by: Ralf Jung <post@ralfj.de>
2026-01-12 02:54:22 +01:00
Nicholas Nethercote 5e510929c6 Remove useless call to erase_and_anonymize_regions.
The only thing we do with the result is consult the `.def_id` field,
which is unaffected by `erase_and_anonymize_regions`.
2026-01-12 09:22:58 +11:00
Matthias Krüger cbdfa9167f Rollup merge of #150908 - llvm-f16-cfg, r=nikic
llvm: Update `reliable_f16` configuration for LLVM22

Since yesterday, the LLVM `main` branch should have working `f16` on all platforms that Rust supports; this will be LLVM version 22, so update how `cfg(target_has_reliable_f16)` is set to reflect this.

Within the rust-lang organization, this currently has no effect. The goal is to start catching problems as early as possible in external CI that runs top-of-tree rust against top-of-tree LLVM, and once testing for the rust-lang bump to LLVM 22 starts. Hopefully this will mean that we can fix any problems that show up before the bump actually happens, meaning `f16` will be about ready for stabilization at that point (with some considerations for the GCC patch at [1] propagating).

References:

* https://github.com/llvm/llvm-project/commit/919021b0df8c91417784bfd84a6ad4869a0d2206
* https://github.com/llvm/llvm-project/commit/054ee2f8706b582859fcf96d1771aa68c37d9e6a
* https://github.com/llvm/llvm-project/commit/db26ce5c5572a1a54ce307c762689ab63e5c5485
* https://github.com/llvm/llvm-project/commit/549d7c4f35a99598a269004ee13b237d2565b5ec
* https://github.com/llvm/llvm-project/commit/4903c6260cbd781881906007f9c82aceb71fd7c7

[1]: https://github.com/gcc-mirror/gcc/commit/8b6a18ecaf44553230b90bf28adfb9fe9c9d5ab9
2026-01-11 09:56:50 +01:00
Stuart Cook 30585ebbd3 Rollup merge of #150494 - extern_linkage_dso_local, r=bjorn3
Fix dso_local for external statics with linkage

Tracking issue of the feature: rust-lang/rust#127488

DSO local attributes are not correctly applied to extern statics with `#[linkage = "foo"]`, as we generate an internal global for such statics and then evaluate (and apply) DSO attributes on the internal one instead.

Fix this by applying DSO local attributes to the actual extern ones, too.
2026-01-11 14:27:55 +11:00
Trevor Gross 07fa70e104 llvm: Update reliable_f16 configuration for LLVM22
Since yesterday, the LLVM `main` branch should have working `f16` on all
platforms that Rust supports; this will be LLVM version 22, so update
how `cfg(target_has_reliable_f16)` is set to reflect this.

Within the rust-lang organization, this currently has no effect. The
goal is to start catching problems as early as possible in external CI
that runs top-of-tree rust against top-of-tree LLVM, and once testing
for the rust-lang bump to LLVM 22 starts. Hopefully this will mean that
we can fix any problems that show up before the bump actually happens,
meaning `f16` will be about ready for stabilization at that point (with
some considerations for the GCC patch at [1] propagating).

References:

* https://github.com/llvm/llvm-project/commit/919021b0df8c91417784bfd84a6ad4869a0d2206
* https://github.com/llvm/llvm-project/commit/054ee2f8706b582859fcf96d1771aa68c37d9e6a
* https://github.com/llvm/llvm-project/commit/db26ce5c5572a1a54ce307c762689ab63e5c5485
* https://github.com/llvm/llvm-project/commit/549d7c4f35a99598a269004ee13b237d2565b5ec
* https://github.com/llvm/llvm-project/commit/4903c6260cbd781881906007f9c82aceb71fd7c7

[1]: https://github.com/gcc-mirror/gcc/commit/8b6a18ecaf44553230b90bf28adfb9fe9c9d5ab9
2026-01-09 19:44:24 -06:00
Tshepang Mbambo 229673ac85 make sentence more simple 2026-01-09 22:49:32 +02:00
Tshepang Mbambo 8e61f0de27 cg_llvm: add a pause to make comment less confusing 2026-01-09 22:47:59 +02:00
Flakebi 91d4e40e02 Add amdgpu_dispatch_ptr intrinsic
Add a rustc intrinsic `amdgpu_dispatch_ptr` to access the kernel
dispatch packet on amdgpu.
The HSA kernel dispatch packet contains important information like the
launch size and workgroup size.

The Rust intrinsic lowers to the `llvm.amdgcn.dispatch.ptr` LLVM
intrinsic, which returns a `ptr addrspace(4)`, plus an addrspacecast to
`addrspace(0)`, so it can be returned as a Rust reference.

The returned pointer/reference is valid for the whole program lifetime,
and is therefore `'static`.

The return type of the intrinsic (`*const ()`) does not mention the
struct so that rustc does not need to know the exact struct type.
An alternative would be to define the struct as lang item or add a
generic argument to the function.

Short version:
```rust
#[cfg(target_arch = "amdgpu")]
pub fn amdgpu_dispatch_ptr() -> *const ();
```
2026-01-09 10:41:37 +01:00
Matthias Krüger d464630301 Rollup merge of #150811 - defid-aliases, r=bjorn3
Store defids instead of symbol names in the aliases list

I was honestly surprised this worked in the past. This causes a cycle error since we now compute a symbol name in codegen_attrs, and then compute codegen attrs when we try to get the symbol name.

It only worked when there weren't any codegen attributes to begin with, causing symbol name computation to skip the call to codegen_attrs.

Like this we won't have the same problem.

r? @bjorn3
2026-01-08 22:21:21 +01:00
bjorn3 fe9715b5e8 Remove support for ScalarPair unadjusted arguments 2026-01-08 18:44:38 +00:00
Matthias Krüger fc4464bf7b Rollup merge of #150094 - more-va-arg, r=workingjubilee
`c_variadic`: provide our own `va_arg` implementation for more targets

tracking issue: https://github.com/rust-lang/rust/issues/44930

Provide our own implementations in order to guarantee the behavior of `va_arg`. We will only be able to stabilize `c_variadic` on targets where we know and guarantee the properties of `va_arg`.

r? workingjubilee
2026-01-08 16:25:28 +01:00
Jana Dönszelmann 6b88c6b7c2 store defids instead of symbol names in the aliases list 2026-01-08 16:25:27 +01:00
bjorn3 f1ab003658 Don't compute FnAbi for LLVM intrinsics in backends 2026-01-08 10:47:29 +00:00
bjorn3 acc8c0bd65 Reduce usage of FnAbi in codegen_llvm_intrinsic_call 2026-01-08 10:45:43 +00:00
Matthias Krüger dadacb6589 Rollup merge of #150747 - fix/liloading-enzyme-err, r=lqd
tests/ui/runtime/on-broken-pipe/with-rustc_main.rs: Not needed so remove

related: https://github.com/rust-lang/rust/issues/145899#issuecomment-3705550673

print error from EnzymeWrapper::get_or_init(sysroot) as a note

r? @ZuseZ4

e.g.

1. when libEnzyme is not found

```shell
$ rustc +stage1 -Z autodiff=Enable -C lto=fat src/main.rs
error: autodiff backend not found in the sysroot: failed to find a `libEnzyme-21` folder in the sysroot candidates:
       * /Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib
  |
  = note: it will be distributed via rustup in the future
```

2. when libEnzyme could not be loaded successfully

```shell
$ rustc +stage1 -Z autodiff=Enable -C lto=fat src/main.rs
error: failed to load our autodiff backend: DlOpen { source: "dlopen(/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib, 0x0005): tried: \'/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (slice is not valid mach-o file), \'/System/Volumes/Preboot/Cryptexes/OS/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (no such file), \'/Volumes/WD_BLACK_SN850X_HS_1TB/rust-lang/rust/build/aarch64-apple-darwin/stage1/lib/rustlib/aarch64-apple-darwin/lib/libEnzyme-21.dylib\' (slice is not valid mach-o file)" }
```
2026-01-08 07:27:55 +01:00
Farid Zakaria 93f2e80f4a Add -Z large-data-threshold
This flag allows specifying the threshold size for placing static data
in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses
RIP-relative addressing (32-bit offsets), while larger data uses
absolute 64-bit addressing. This allows the compiler to generate more
efficient code for smaller data while still supporting data larger than
2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang.
The default threshold is 65536 bytes (64KB) if not specified, matching
LLVM's default behavior.
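
A small test program for trying the flag might look like the sketch below: one static well under the default 64KB threshold and one well over it. The static names and sizes are arbitrary choices for illustration. Which section each one lands in is decided by the backend when building with `-Ccode-model=medium`, and would need to be checked by inspecting the resulting object file.

```rust
/// 1 KiB table: well under a 64 KiB threshold.
static SMALL: [u8; 1024] = [1; 1024];

/// 1 MiB table: well over a 64 KiB threshold.
static LARGE: [u8; 1 << 20] = [2; 1 << 20];

/// Sums the bytes so that both statics are actually read.
fn checksum(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

fn main() {
    println!("small: {}", checksum(&SMALL));
    println!("large: {}", checksum(&LARGE));
}
```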
2026-01-07 11:57:48 -08:00
sgasho 14ac6a1d3a Modified to output error messages appropriate to the situation 2026-01-07 00:33:57 +09:00
Augie Fackler accfc34e43 rustc_codegen_llvm: update alignment for double on AIX
This was recently fixed upstream in LLVM, so we update our default
layout to match.

@rustbot label: +llvm-main
2026-01-05 14:08:51 -05:00
Manuel Drehwald fa584faca5 Update test and verify that tgt_(un)register_lib have the right type 2026-01-04 06:58:31 -08:00
sgasho d7fa6e527f enrich error info when tries to dlopen Enzyme 2026-01-04 22:56:17 +09:00
Ralf Jung 57e44f5046 skip codegen for intrinsics with big fallback bodies if backend does not need them 2026-01-02 23:14:02 +01:00
Jonathan Brouwer 58f5089d8a Rollup merge of #150444 - Sa4dUs:offload-intrinsic2, r=ZuseZ4
Expose kernel launch options as offload intrinsic args

Allows modifying the workgroup and thread grid dimensions directly from the intrinsic call.

```rust
core::intrinsics::offload(_kernel_1, [256, 1, 1], [32, 1, 1], (x,))
```
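
To make the two dimension arguments concrete, here is a host-side model of the call above. It assumes the first array is the grid of workgroups and the second is the threads per workgroup; that reading, the `LaunchDims` type, and `total_threads` are all assumptions of this sketch, not part of the intrinsic.

```rust
/// Hypothetical host-side model of the two launch-dimension arguments.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LaunchDims {
    /// Assumed meaning: number of workgroups along x, y, z.
    workgroups: [u32; 3],
    /// Assumed meaning: threads per workgroup along x, y, z.
    threads: [u32; 3],
}

impl LaunchDims {
    /// Total number of threads launched under the assumed interpretation.
    fn total_threads(&self) -> u64 {
        let product = |dims: [u32; 3]| dims.iter().map(|&d| u64::from(d)).product::<u64>();
        product(self.workgroups) * product(self.threads)
    }
}

fn main() {
    let dims = LaunchDims {
        workgroups: [256, 1, 1],
        threads: [32, 1, 1],
    };
    println!("total threads: {}", dims.total_threads());
}
```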

r? `@ZuseZ4`
2026-01-02 19:00:15 +01:00
Marcelo Domínguez 58e2610f71 Expose workgroup/thread dims as intrinsic args 2026-01-02 11:50:32 +01:00
Wesley Wiser f74896fc01 Cleanup debuginfo_compression unstable flag
It isn't necessary to declare the option as a top-level flag when
it is accessible from `unstable_opts`.
2026-01-01 19:30:02 -06:00
Folkert de Vries f89cce3acb c_variadic: provide va_arg for more targets 2026-01-01 13:38:53 +01:00
bors 8d670b93d4 Auto merge of #150546 - JonathanBrouwer:rollup-jkqji1j, r=JonathanBrouwer
Rollup of 5 pull requests

Successful merges:

 - rust-lang/rust#146798 (RISC-V: Implement (Zkne or Zknd) intrinsics correctly)
 - rust-lang/rust#150337 (docs: fix typo in std::io::buffered)
 - rust-lang/rust#150530 (Remove `feature(string_deref_patterns)`)
 - rust-lang/rust#150543 (`rust-analyzer` subtree update)
 - rust-lang/rust#150544 (Use --print target-libdir in run-make tests)

r? `@ghost`
`@rustbot` modify labels: rollup
2025-12-31 18:42:17 +00:00
Jonathan Brouwer dc103c4cd9 Rollup merge of #146798 - a4lg:riscv-intrinsics-zkne_or_zknd, r=Amanieu
RISC-V: Implement (Zkne or Zknd) intrinsics correctly

On rust-lang/stdarch#1765, it has been pointed out that two RISC-V (64-bit only) intrinsics to perform AES key scheduling have the wrong target feature.
`aes64ks1i` and `aes64ks2` instructions require *either* the Zkne (scalar cryptography: AES encryption) or the Zknd (scalar cryptography: AES decryption) extension (or both), but the corresponding Rust intrinsics (in `core::arch::riscv64`) required *both* Zkne and Zknd extensions.

An excerpt from the original intrinsics:

```rust
#[target_feature(enable = "zkne", enable = "zknd")]
```

To fix that, we need to:

1.  Represent a condition where *either* Zkne or Zknd is available and
2.  Work around an issue: `llvm.riscv.aes64ks1i` / `llvm.riscv.aes64ks2` LLVM intrinsics require either Zkne or Zknd extension.

This PR attempts to resolve them by:

1.  Adding a perma-unstable RISC-V target feature: `zkne_or_zknd` (implied from both `zkne` and `zknd`) and
2.  Using inline assembly to construct machine code directly (because `zkne_or_zknd` alone implies neither Zkne nor Zknd, we cannot use LLVM intrinsics).

The author confirmed that we can construct an AES key scheduling function with decent performance using fixed `aes64ks1i` and `aes64ks2` intrinsics (with optimization enabled).
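
The "implied from both" relationship can be modelled with a small feature-expansion sketch. The `IMPLIES` table and the helper functions are hypothetical and only mirror the description above; they are not the compiler's actual target-feature tables.

```rust
use std::collections::BTreeSet;

/// Hypothetical implication table: each feature implies the listed ones.
const IMPLIES: &[(&str, &[&str])] = &[
    ("zkne", &["zkne_or_zknd"]),
    ("zknd", &["zkne_or_zknd"]),
];

/// Expands a set of enabled features with everything they imply.
fn expand(enabled: &[&str]) -> BTreeSet<String> {
    let mut set: BTreeSet<String> = enabled.iter().map(|s| s.to_string()).collect();
    loop {
        let mut changed = false;
        for (feature, implied) in IMPLIES {
            if set.contains(*feature) {
                for name in *implied {
                    changed |= set.insert(name.to_string());
                }
            }
        }
        if !changed {
            break;
        }
    }
    set
}

/// Old requirement: both extensions must be enabled.
fn old_requirement(features: &BTreeSet<String>) -> bool {
    features.contains("zkne") && features.contains("zknd")
}

/// New requirement: the combined feature is enough.
fn new_requirement(features: &BTreeSet<String>) -> bool {
    features.contains("zkne_or_zknd")
}

fn main() {
    let only_decrypt = expand(&["zknd"]);
    println!("old: {}", old_requirement(&only_decrypt));
    println!("new: {}", new_requirement(&only_decrypt));
}
```

With only `zknd` enabled, the old requirement rejects the intrinsic while the new one accepts it, which is the behaviour the instructions themselves call for.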
2025-12-31 17:32:04 +01:00
Jonathan Brouwer d898dccc21 Rollup merge of #150511 - Sa4dUs:offload-inline, r=ZuseZ4
Allow inline calls to offload intrinsic

Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block.

r? `@ZuseZ4`

fixes: https://github.com/rust-lang/rust/issues/150413
2025-12-31 14:30:48 +01:00
Marcelo Domínguez 9d8b4cc70d Restore builder at the end of saved bb 2025-12-31 13:10:29 +01:00
Gary Guo 5467a398c2 Fix dso_local for external statics with linkage
The current code applies `dso_local` to the internally generated symbol
instead of the actual external one.
2025-12-29 19:26:34 +00:00
dianqk fe075ad212 Removes the serde dependency in rustc_codegen_llvm 2025-12-28 15:52:20 +08:00
bjorn3 9bcd6ed4c9 Partially inline get_fn_addr/get_fn in codegen_llvm_intrinsic_call
This moves all LLVM intrinsic handling out of the regular call path for
cg_gcc and makes it easier to hook into this code for future cg_llvm
changes.
2025-12-27 17:46:26 +00:00
bjorn3 ae8ef1f5eb Inline BuilderMethods::call for intrinsics
Intrinsics only need a fraction of the functionality offered by
BuilderMethods::call and in particular don't need the FnAbi to be
computed other than (currently) as a step towards computing the function
value type.
2025-12-27 17:46:26 +00:00
bjorn3 62e17e920c Move llvm intrinsic call to backend 2025-12-27 17:46:25 +00:00