764 Commits

Author SHA1 Message Date
Jonathan Brouwer dde4886801 Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst
Add intrinsic for launch-sized workgroup memory on GPUs

Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns a pointer to workgroup memory that is guaranteed to be
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T`, but can also be larger,
smaller, or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.
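
A minimal usage sketch (illustrative only: it assumes the intrinsic is reachable as `core::intrinsics::gpu_launch_sized_workgroup_mem` and that the kernel was launched with at least 128 `f32`s of workgroup memory; the module path and element count are not part of this change):

```rust
// Inside a gpu-kernel: view the launch-sized workgroup memory as f32s.
// Safety: the launch must have reserved at least 128 * size_of::<f32>()
// bytes of workgroup memory; the returned pointer is aligned for f32.
unsafe fn scale_workgroup_buffer(factor: f32) {
    let mem: *mut f32 = core::intrinsics::gpu_launch_sized_workgroup_mem::<f32>();
    // Every call returns the same address, so all threads in the
    // workgroup operate on the same buffer.
    for i in 0..128 {
        *mem.add(i) *= factor;
    }
}
```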

## Alternative Interfaces

We also considered exposing dynamic workgroup memory as extern
static variables in Rust, mirroring how it is represented in LLVM IR.
However, because the pointer is not guaranteed to be dereferenceable
(that depends on the size allocated at runtime), such a global would
have to be zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. Accessing an
external global outside its declared size is defined behavior in LLVM.

There is no analogous way to query the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so for now users have to pass the
size out-of-band or rely on target-specific mechanisms.

Tracking issue: rust-lang/rust#135516
2026-04-25 23:07:48 +02:00
Flakebi 13ec3de673 Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns a pointer to workgroup memory that is guaranteed to be
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T`, but can also be larger,
smaller, or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

We also considered exposing dynamic workgroup memory as extern
static variables in Rust, mirroring how it is represented in LLVM IR.
However, because the pointer is not guaranteed to be dereferenceable
(that depends on the size allocated at runtime), such a global would
have to be zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. Accessing an
external global outside its declared size is defined behavior in LLVM.

There is no analogous way to query the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so for now users have to pass the
size out-of-band or rely on target-specific mechanisms.
2026-04-24 10:03:45 +02:00
Augie Fackler f6b8f0b6f1 rustc_llvm: update opt-level handling for LLVM 23
LLVM 23 removed the Os and Oz optimization pipelines; the upstream PR says
to use O2 with optsize or minsize instead, as appropriate.
2026-04-22 13:25:35 -04:00
sayantn a5372be2a1 Add target arch verification for LLVM intrinsics 2026-04-12 23:33:27 +05:30
sayantn c21f4ee437 Check for AutoUpgraded intrinsics, and lint on uses of deprecated intrinsics 2026-04-12 23:33:15 +05:30
Jonathan Brouwer 66a00ba2ef Rollup merge of #153995 - Flakebi:gpu-use-convergent, r=nnethercote
Add convergent attribute to functions for GPU targets

On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.

cc @kjetilkjeka, @kulst for nvptx
cc @ZuseZ4

r? @nnethercote, as you already reviewed this in the other PR

Split out from rust-lang/rust#149637; the part here should be uncontroversial.
2026-04-08 14:21:57 +02:00
David Wood a24ee0329e cg_llvm/debuginfo: scalable vectors
Generate debuginfo for scalable vectors, following the structure that
Clang generates for scalable vectors.
2026-04-03 10:37:42 +00:00
bors fb27476aaf Auto merge of #154468 - Kobzol:revert-154200, r=dingxiangfei2009
Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009"

This reverts commit 2f1603077b, reversing
changes made to 6e3c17424d.

Debugging perf. hit from https://github.com/rust-lang/rust/pull/154384.

The binary size hits from https://github.com/rust-lang/rust/pull/154468#issuecomment-4144557076 were due to this PR, though not all of the compile-time hits were.
2026-03-28 16:59:18 +00:00
Manuel Drehwald a3261a2307 Revert "Link LLVM dynamically on aarch64-apple-darwin"
This reverts commit e7c268f883.
2026-03-28 05:11:48 +01:00
Jakub Beránek 8b44562bc8 Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009"
This reverts commit 2f1603077b, reversing
changes made to 6e3c17424d.
2026-03-27 20:08:24 +01:00
Jonathan Brouwer 2f1603077b Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009
debuginfo: emit DW_TAG_call_site entries

Set `FlagAllCallsDescribed` on function definition DIEs so LLVM emits DW_TAG_call_site entries, letting debuggers and analysis tools track tail calls.
2026-03-25 19:52:50 +01:00
Jonathan Brouwer 0cd8de3843 Rollup merge of #153049 - Darksonn:kasan-sw-tags, r=fmease
Add `-Zsanitize=kernel-hwaddress`

The Linux kernel has a config option called `CONFIG_KASAN_SW_TAGS` that enables `-fsanitize=kernel-hwaddress`, which Rust does not currently support.

One slightly awkward detail is that `#[sanitize(address = "off")]` applies to both `-Zsanitize=address` and `-Zsanitize=kernel-address`. This was probably done because both use the same LLVM pass. I replicated that logic here for hwaddress, but it might be undesirable.

Note that `#[sanitize(kernel_hwaddress = "off")]` could be supported as an annotation on statics, but since it's also missing for `#[sanitize(hwaddress = "off")]`, I did not add it.
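
A small sketch of the attribute behavior described above (illustrative only; the feature gate name is an assumption, while the attribute and flag names are taken from this message):

```rust
// Illustrative only: the exact feature gate name is an assumption.
#![feature(sanitize)]

// Mirroring how `#[sanitize(address = "off")]` opts a function out of both
// -Zsanitize=address and -Zsanitize=kernel-address, this change makes
// `#[sanitize(hwaddress = "off")]` cover kernel-hwaddress as well.
#[sanitize(hwaddress = "off")]
fn skipped_by_hwasan_and_kernel_hwasan() {}

fn main() {}
```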

MCP: https://github.com/rust-lang/compiler-team/issues/975
Tracking issue: https://github.com/rust-lang/rust/issues/154171

cc @rcvalle @maurer @ojeda
2026-03-25 19:52:49 +01:00
Scott Young 9677d7a587 debuginfo: emit DW_TAG_call_site entries 2026-03-22 08:42:21 -04:00
sgasho e7c268f883 Link LLVM dynamically on aarch64-apple-darwin 2026-03-22 16:06:31 +09:00
Alice Ryhl a197752e88 Add kernel-hwaddress sanitizer
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
2026-03-17 20:23:59 +00:00
Flakebi 8e932ed79c Add convergent attribute to functions for GPU targets
On targets with convergent operations, we need to add the convergent
attribute to all functions that run convergent operations. Following
clang, we can conservatively apply the attribute to all functions when
compiling for such a target and rely on LLVM optimizing away the
attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.
2026-03-17 10:51:31 +01:00
Ralf Jung c7220f423b rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering 2026-03-15 14:53:00 +01:00
Josh Stone 52dfa94cdc Update the minimum external LLVM to 21 2026-03-12 16:45:42 -07:00
Stuart Cook cc0a60fd74 Rollup merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=cuviper
Always use the ThinLTO pipeline for pre-link optimizations

When using cargo this was already effectively the case for all dependencies, as cargo passes -Clinker-plugin-lto without -Clto=fat/thin, and -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one, and according to the benchmarks in [^1] there is barely any runtime performance difference between executables built with fat LTO from the fat vs. the ThinLTO pre-link pipeline.

This also helps avoid yet another code path if we want to support Unified LTO, that is, a single bitcode file that can be used for both fat LTO and ThinLTO with linker-plugin LTO. (We already support this when rustc itself does the LTO, since ThinLTO bitcode is enough of a superset of fat LTO bitcode that it happens to work by accident, as long as nothing explicitly prevents mixing the two for the current set of LTO features that rustc exposes.) I'm still investigating whether rustc would benefit from Unified LTO and how exactly to integrate it.

[^1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
2026-03-08 14:01:35 +11:00
Stuart Cook 4400f2f835 Rollup merge of #153202 - dpaoliello:arm64unwind, r=cuviper
[win] Fix truncated unwinds for Arm64 Windows

Panic backtraces on ARM64 Windows are truncated because Rust's LLVM configuration sets `NoTrapAfterNoreturn = true`, which suppresses the generation of `brk #0x1` (trap) instructions after calls to `noreturn` functions. Without this trap instruction, the return address from a `noreturn` call points past the end of the calling function into an unrelated function, causing `RtlLookupFunctionEntry` to return the wrong unwind information, which terminates the stack walk prematurely.

In general, `NoTrapAfterNoreturn = true` is not recommended for Windows: there have been security vulnerabilities in the past where an attacker managed to return from a noreturn function, or the function wasn't actually noreturn, resulting in execution of whatever code followed the call.

This change disables setting `NoTrapAfterNoreturn = true` for Windows.
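
An illustrative sketch of the affected pattern (not code from the PR): when a call to a noreturn function is the last instruction of its caller and no trap is emitted after it, the recorded return address falls past the end of the caller.

```rust
// The panic never returns; without a trailing trap instruction the return
// address recorded for this call can point past the end of `fail`, so
// RtlLookupFunctionEntry resolves unwind info for an unrelated function
// and the backtrace is cut short.
#[inline(never)]
fn fail() -> ! {
    panic!("backtrace should include `fail` and `main`");
}

fn main() {
    fail();
}
```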

Fixes rust-lang/rust#140489
2026-03-08 14:01:34 +11:00
bjorn3 71a31b30d9 Always use the ThinLTO pipeline for pre-link optimizations
When using cargo this was already effectively the case for all
dependencies, as cargo passes -Clinker-plugin-lto without -Clto=fat/thin,
and -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO
pre-link pipeline is faster than the fat LTO one, and according to the
benchmarks in [1] there is barely any runtime performance difference
between executables built with fat LTO from the fat vs. the ThinLTO
pre-link pipeline.

[1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
2026-03-05 17:40:58 +00:00
Alan Egerton f352e743b1 Use shlex instead of shell-words 2026-03-05 10:59:18 +00:00
Augie Fackler cbc711ea01 rustc_llvm: add missing - to flag-comparison logic
The build script here wants to sniff for `-stdlib=libc++` but was
missing the leading dashes. We caught this on the Rust/LLVM HEADs
builder which also uses libc++.
2026-03-04 15:48:08 -05:00
Jonathan Brouwer 6a44bbd91b Rollup merge of #152712 - eggyal:quote-lib-paths, r=ChrisDenton
Use shell-words to parse output from llvm-config

llvm-config might output paths that contain spaces, in which case the naive approach of splitting on whitespace breaks; instead we ask llvm-config to quote any paths and use the [shell-words](https://crates.io/crates/shell-words) crate by @tmiasko (a new dependency) to parse the output.

r? ChrisDenton
Fixes rust-lang/rust#152707
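
A small sketch of the difference (illustrative; the path and flags are made up, and a later commit in this log swaps shell-words for shlex, whose `split` serves the same purpose):

```rust
// Naive whitespace splitting breaks a quoted path containing a space into
// two tokens; parsing the quoted llvm-config output with shell-words keeps
// each argument intact.
fn main() {
    let output = r#"-I"C:/Program Files/LLVM/include" -std=c++17"#;

    let naive: Vec<&str> = output.split_whitespace().collect();
    assert_eq!(naive.len(), 3); // the path with a space was split in two

    let parsed = shell_words::split(output).expect("valid quoting");
    assert_eq!(parsed, vec!["-IC:/Program Files/LLVM/include", "-std=c++17"]);
}
```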
2026-03-03 19:11:48 +01:00
Daniel Paoliello 614bac581b [win] Fix truncated unwinds for Arm64 Windows 2026-02-27 14:53:09 -08:00
bjorn3 474a7168ab Remove explicit EmitThinLTOSummary argument
In favor of passing a NULL ThinLTOSummaryBufferRef, and improve type
safety on the Rust side.
2026-02-21 11:47:45 +00:00
bjorn3 a086b3617e Remove ModuleBuffer ThinBuffer duplication 2026-02-21 11:47:45 +00:00
bjorn3 a5372d1dba Replace LLVMRustThinLTOBuffer with separate LLVMRustBuffers for bitcode and summary 2026-02-21 11:47:45 +00:00
bjorn3 8b2c10ff82 Replace LLVMRustModuleBuffer with generic LLVMRustBuffer 2026-02-21 11:47:45 +00:00
bjorn3 c51cd0e691 Deduplicate some code in LLVMRustOptimize 2026-02-20 12:19:41 +00:00
bjorn3 6366a698e3 Remove -Zemit-thin-lto flag
As far as I can tell it was introduced to allow fat LTO with
-Clinker-plugin-lto. Later a change was made to automatically disable
ThinLTO summary generation when -Clinker-plugin-lto -Clto=fat is used,
so we can safely remove it.
2026-02-20 12:19:41 +00:00
Alan Egerton c30e20a049 Use shell-words to parse output from llvm-config
llvm-config might output paths that contain spaces, in which case the
naive approach of splitting on whitespace breaks; instead we ask
llvm-config to quote any paths and use the shell-words crate to parse
the output.
2026-02-16 23:57:17 +00:00
Manuel Drehwald c89a89bb14 Fix multi-cgu+debug builds using autodiff by delaying autodiff till lto 2026-02-11 14:08:56 -05:00
Jonathan Brouwer dec8d6ebcf Rollup merge of #150780 - fzakaria:fzakaria/section-threshold, r=jackh726
Add -Z large-data-threshold

This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.
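
An illustrative sketch of what the threshold governs (the statics and sizes are made up for the example):

```rust
// Built with -Ccode-model=medium -Zlarge-data-threshold=65536 (the default),
// SMALL stays below the threshold and is addressed RIP-relative with 32-bit
// offsets, while BIG exceeds it and is placed in a large data section with
// absolute 64-bit addressing.
static SMALL: [u8; 4 * 1024] = [0; 4 * 1024];
static BIG: [u8; 4 * 1024 * 1024] = [0; 4 * 1024 * 1024];

fn main() {
    println!("{} {}", SMALL.len(), BIG.len());
}
```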
2026-01-23 11:07:55 +01:00
Matthew Maurer b639b0a4d8 llvm: Tolerate dead_on_return attribute changes
The attribute now has a size parameter and sorts differently:
* Explicitly omit size parameter during construction on 23+
* Tolerate alternate sorting in tests

https://github.com/llvm/llvm-project/pull/171712
2026-01-21 23:39:03 +00:00
Nikita Popov 0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
Marcelo Domínguez 307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Farid Zakaria 93f2e80f4a Add -Z large-data-threshold
This flag allows specifying the threshold size for placing static data
in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses
RIP-relative addressing (32-bit offsets), while larger data uses
absolute 64-bit addressing. This allows the compiler to generate more
efficient code for smaller data while still supporting data larger than
2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang.
The default threshold is 65536 bytes (64KB) if not specified, matching
LLVM's default behavior.
2026-01-07 11:57:48 -08:00
Jonathan Brouwer d898dccc21 Rollup merge of #150511 - Sa4dUs:offload-inline, r=ZuseZ4
Allow inline calls to offload intrinsic

Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block.

r? `@ZuseZ4`

fixes: https://github.com/rust-lang/rust/issues/150413
2025-12-31 14:30:48 +01:00
Marcelo Domínguez 9d8b4cc70d Restore builder at the end of saved bb 2025-12-31 13:10:29 +01:00
Jonathan Brouwer 122f02ad02 Rollup merge of #150394 - DKLoehr:passplugin, r=nikic
Accommodate LLVM PassPlugin rename

LLVM [recently moved](https://github.com/llvm/llvm-project/pull/173279) their `PassPlugin` files to a new folder. This PR updates our `PassWrapper` to point to the new location.
2025-12-29 17:17:56 +01:00
dianqk fe075ad212 Removes the serde dependency in rustc_codegen_llvm 2025-12-28 15:52:20 +08:00
Devon Loehr 634251cba8 Accommodate upstream PassPlugin rename 2025-12-26 15:40:40 +00:00
Manuel Drehwald dfef2e96fe Remove the need to call clang for std::offload usages 2025-12-23 05:20:07 -08:00
sgasho ddd5aad8a3 feat: dlopen Enzyme 2025-12-16 00:31:32 +09:00
Alina Sbirlea ad73972e99 Fix for LLVM 22 making lowering decisions dependent on RuntimeLibraryInfo.
LLVM reference commit:
https://github.com/llvm/llvm-project/commit/04c81a99735c04b2018eeb687e74f9860e1d0e1b.
2025-12-04 20:23:00 +00:00
Stuart Cook 2b150f2c65 Rollup merge of #147936 - Sa4dUs:offload-intrinsic, r=ZuseZ4
Offload intrinsic

This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata.

- `offload(kernel, (..args))`: an intrinsic that generates the necessary host-side LLVM-IR code.
- `rustc_offload_kernel`: a builtin attribute that marks device kernels to be handled appropriately.

Example usage (pseudocode):
```rust
// Host-side wrapper: the offload intrinsic emits the host LLVM IR that
// launches `kernel_1` on the device with the given argument tuple.
fn kernel(x: *mut [f64; 128]) {
    core::intrinsics::offload(kernel_1, (x,))
}

// On the host (gated on linux here) the device kernel is only declared ...
#[cfg(target_os = "linux")]
extern "C" {
    pub fn kernel_1(array_b: *mut [f64; 128]);
}

// ... while on the device it is defined and marked as an offload kernel.
#[cfg(not(target_os = "linux"))]
#[rustc_offload_kernel]
extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) {
    unsafe { (*x)[0] = 21.0 };
}
```
2025-11-26 23:32:03 +11:00
Marcelo Domínguez 5128ce10a0 Implement offload intrinsic 2025-11-25 20:04:27 +01:00
Manuel Drehwald 5fbe5dae42 Only try to link against offload functions if llvm.enzyme is enabled 2025-11-23 00:19:53 -08:00
Manuel Drehwald 89d50591c0 Replace the first of 4 binary invocations for offload 2025-11-21 02:41:17 -08:00