Commit Graph

610 Commits

Author SHA1 Message Date
Stephen 8477a690b1 add support for Zprint-codegen-stats-json 2026-05-11 16:28:51 -04:00
Devon Loehr 7f697083d6 Fix formatting 2026-05-05 14:34:16 +00:00
Devon Loehr 412090247c Narrow definitions 2026-05-05 14:18:59 +00:00
Devon Loehr 596d9853bb Adjust getMCSubtargetInfo signature for LLVM 23+ 2026-05-04 18:30:23 +00:00
Jonathan Brouwer c6912bf401 Rollup merge of #155692 - fneddy:fix_naked-dead-code-elimination, r=folkertdev
disable naked-dead-code-elimination test if no RET mnemonic is available

This test emits the x86_64-specific `ret` asm instruction and should not be compiled on any other arch.
2026-04-28 20:24:33 +02:00
Eddy (Eduard) Stefes 2a8e588c90 Add --print=backend-has-mnemonic and needs-asm-mnemonic directive
Add infrastructure to query LLVM backend for specific assembly mnemonics
and use it in compiletest to conditionally run tests based on instruction
availability.

This fixes test failures with naked-dead-code-elimination which requires
the `RET` mnemonic.

Co-authored-by: Folkert de Vries <flokkievids@gmail.com>
2026-04-28 10:21:15 +02:00
Jonathan Brouwer dde4886801 Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst
Add intrinsic for launch-sized workgroup memory on GPUs

Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferenceable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.

Tracking issue: rust-lang/rust#135516
2026-04-25 23:07:48 +02:00
Flakebi 13ec3de673 Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferenceable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferenceable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu and nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
2026-04-24 10:03:45 +02:00
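
Because the size of launch-sized workgroup memory has to be passed out-of-band, a kernel needs some way to turn that byte count into a number of usable elements. The following is a minimal host-side sketch of that arithmetic only. The helper name `dereferenceable_elems` is hypothetical and not part of the commit, and the sketch does not call the intrinsic itself.

```rust
use std::mem::size_of;

/// Hypothetical helper (not part of the commit): given the number of bytes
/// of workgroup memory requested at kernel launch, return how many whole
/// values of type `T` fit into that region.
fn dereferenceable_elems<T>(launch_bytes: usize) -> usize {
    match size_of::<T>() {
        // Zero-sized types occupy no memory, so there is nothing to count.
        0 => 0,
        // Integer division drops any trailing partial element.
        elem_size => launch_bytes / elem_size,
    }
}
```

With this sketch, `dereferenceable_elems::<f32>(1024)` yields 256, while `dereferenceable_elems::<[f64; 128]>(512)` yields 0, which corresponds to the case where the launch size is smaller than `T`.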
Augie Fackler f6b8f0b6f1 rustc_llvm: update opt-level handling for LLVM 23
LLVM 23 removed Os and Oz optimization pipelines and the PR says to use
O2 with optsize or minsize instead as appropriate.
2026-04-22 13:25:35 -04:00
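
The opt-level change above can be pictured as a small mapping: the size-oriented levels no longer select their own pipeline, but run the O2 pipeline with a size attribute attached. This is a sketch under assumptions. The enum and function names are illustrative, not the actual rustc_llvm types.

```rust
/// Optimization level as requested by the user (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OptLevel {
    No,
    Less,
    Default,
    Aggressive,
    Size,
    MinSize,
}

/// Pipelines that remain once the Os and Oz pipelines are gone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pipeline {
    O0,
    O1,
    O2,
    O3,
}

/// Attribute that carries the size preference instead of a pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SizeAttr {
    Unset,
    OptSize,
    MinSize,
}

/// Map a requested level to a pipeline plus an optional size attribute.
fn lower_opt_level(level: OptLevel) -> (Pipeline, SizeAttr) {
    match level {
        OptLevel::No => (Pipeline::O0, SizeAttr::Unset),
        OptLevel::Less => (Pipeline::O1, SizeAttr::Unset),
        OptLevel::Default => (Pipeline::O2, SizeAttr::Unset),
        OptLevel::Aggressive => (Pipeline::O3, SizeAttr::Unset),
        // Former Os: O2 pipeline with optsize.
        OptLevel::Size => (Pipeline::O2, SizeAttr::OptSize),
        // Former Oz: O2 pipeline with minsize.
        OptLevel::MinSize => (Pipeline::O2, SizeAttr::MinSize),
    }
}
```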
sayantn a5372be2a1 Add target arch verification for LLVM intrinsics 2026-04-12 23:33:27 +05:30
sayantn c21f4ee437 Check for AutoUpgraded intrinsics, and lint on uses of deprecated intrinsics 2026-04-12 23:33:15 +05:30
Jonathan Brouwer 66a00ba2ef Rollup merge of #153995 - Flakebi:gpu-use-convergent, r=nnethercote
Use convergent attribute to funcs for GPU targets

On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.

cc @kjetilkjeka, @kulst for nvptx
cc @ZuseZ4

r? @nnethercote, as you already reviewed this in the other PR

Split out from rust-lang/rust#149637, the part here should be uncontroversial.
2026-04-08 14:21:57 +02:00
David Wood a24ee0329e cg_llvm/debuginfo: scalable vectors
Generate debuginfo for scalable vectors, following the structure that
Clang generates for scalable vectors.
2026-04-03 10:37:42 +00:00
Jakub Beránek 8b44562bc8 Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009"
This reverts commit 2f1603077b, reversing
changes made to 6e3c17424d.
2026-03-27 20:08:24 +01:00
Jonathan Brouwer 2f1603077b Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009
debuginfo: emit DW_TAG_call_site entries

Set `FlagAllCallsDescribed` on function definition DIEs so LLVM emits DW_TAG_call_site entries, letting debuggers and analysis tools track tail calls.
2026-03-25 19:52:50 +01:00
Scott Young 9677d7a587 debuginfo: emit DW_TAG_call_site entries 2026-03-22 08:42:21 -04:00
Alice Ryhl a197752e88 Add kernel-hwaddress sanitizer
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
2026-03-17 20:23:59 +00:00
Flakebi 8e932ed79c Use convergent attribute to funcs for GPU targets
On targets with convergent operations, we need to add the convergent
attribute to all functions that run convergent operations. Following
clang, we can conservatively apply the attribute to all functions when
compiling for such a target and rely on LLVM optimizing away the
attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.
2026-03-17 10:51:31 +01:00
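
The conservative rule described in the convergent-attribute commit amounts to a per-target decision rather than a per-function one. The sketch below illustrates that shape. The function names are hypothetical, and the architecture strings `"amdgpu"` and `"nvptx64"` are assumed here to be the identifiers for the two affected targets.

```rust
/// Hypothetical check: does this target have convergent operations?
/// The architecture strings are assumptions made for this sketch.
fn target_needs_convergent(arch: &str) -> bool {
    matches!(arch, "amdgpu" | "nvptx64")
}

/// Build the attribute list for any function on the given target.
/// On targets with convergent operations the attribute is added to every
/// function, leaving it to the optimizer to drop it where it is not needed.
fn function_attrs<'a>(arch: &str, base: &[&'a str]) -> Vec<&'a str> {
    let mut attrs = base.to_vec();
    if target_needs_convergent(arch) {
        attrs.push("convergent");
    }
    attrs
}
```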
Ralf Jung c7220f423b rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering 2026-03-15 14:53:00 +01:00
Josh Stone 52dfa94cdc Update the minimum external LLVM to 21 2026-03-12 16:45:42 -07:00
Stuart Cook cc0a60fd74 Rollup merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=cuviper
Always use the ThinLTO pipeline for pre-link optimizations

When using cargo this was already effectively done for all dependencies as cargo passes -Clinker-plugin-lto without -Clto=fat/thin. -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one. And according to the benchmarks in [^1] there is barely any runtime performance difference between executables that used fat LTO with the fat vs ThinLTO pre-link pipeline.

This also helps avoid having yet another code path if we want to support Unified LTO, that is, a single bitcode file that can be used for both fat LTO and ThinLTO when using linker plugin LTO. We already support it when rustc does LTO, as ThinLTO bitcode is enough of a superset of fat LTO bitcode that, for the current set of LTO features that rustc exposes, it happens to work by accident as long as there is no explicit check preventing mixing of them. I'm currently still investigating if rustc would benefit from Unified LTO and how exactly to integrate it.

[^1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
2026-03-08 14:01:35 +11:00
bjorn3 71a31b30d9 Always use the ThinLTO pipeline for pre-link optimizations
When using cargo this was already effectively done for all dependencies
as cargo passes -Clinker-plugin-lto without -Clto=fat/thin.
-Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO
pre-link pipeline is faster than the fat LTO one. And according to the
benchmarks in [1] there is barely any runtime performance difference
between executables that used fat LTO with the fat vs ThinLTO pre-link
pipeline.

[1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
2026-03-05 17:40:58 +00:00
Daniel Paoliello 614bac581b [win] Fix truncated unwinds for Arm64 Windows 2026-02-27 14:53:09 -08:00
bjorn3 474a7168ab Remove explicit EmitThinLTOSummary argument
In favor of passing a NULL ThinLTOSummaryBufferRef. Also improve type
safety on the Rust side.
2026-02-21 11:47:45 +00:00
bjorn3 a086b3617e Remove ModuleBuffer ThinBuffer duplication 2026-02-21 11:47:45 +00:00
bjorn3 a5372d1dba Replace LLVMRustThinLTOBuffer with separate LLVMRustBuffers for bitcode and summary 2026-02-21 11:47:45 +00:00
bjorn3 8b2c10ff82 Replace LLVMRustModuleBuffer with generic LLVMRustBuffer 2026-02-21 11:47:45 +00:00
bjorn3 c51cd0e691 Deduplicate some code in LLVMRustOptimize 2026-02-20 12:19:41 +00:00
bjorn3 6366a698e3 Remove -Zemit-thin-lto flag
As far as I can tell it was introduced to allow fat LTO with
-Clinker-plugin-lto. Later a change was made to automatically disable
ThinLTO summary generation when -Clinker-plugin-lto -Clto=fat is used,
so we can safely remove it.
2026-02-20 12:19:41 +00:00
Manuel Drehwald c89a89bb14 Fix multi-cgu+debug builds using autodiff by delaying autodiff till lto 2026-02-11 14:08:56 -05:00
Jonathan Brouwer dec8d6ebcf Rollup merge of #150780 - fzakaria:fzakaria/section-threshold, r=jackh726
Add -Z large-data-threshold

This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.
2026-01-23 11:07:55 +01:00
Matthew Maurer b639b0a4d8 llvm: Tolerate dead_on_return attribute changes
The attribute now has a size parameter and sorts differently:
* Explicitly omit size parameter during construction on 23+
* Tolerate alternate sorting in tests

https://github.com/llvm/llvm-project/pull/171712
2026-01-21 23:39:03 +00:00
Nikita Popov 0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
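
The addrspacecast stripping described above can be illustrated on a toy value model. This is only a sketch: the `Value` enum and `strip_addrspacecast` are hypothetical stand-ins, not the LLVM or rustc_codegen_llvm types.

```rust
/// Toy stand-in for an IR value (hypothetical, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum Value {
    /// A stack slot, identified by name.
    Alloca { name: &'static str },
    /// A cast of another value into a different address space.
    AddrSpaceCast(Box<Value>),
}

/// Return the value to hand to a lifetime intrinsic: if the pointer is an
/// addrspacecast, look through it to the operand, otherwise return it as is.
fn strip_addrspacecast(value: &Value) -> &Value {
    match value {
        Value::AddrSpaceCast(inner) => &**inner,
        other => other,
    }
}
```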
Marcelo Domínguez 307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Farid Zakaria 93f2e80f4a Add -Z large-data-threshold
This flag allows specifying the threshold size for placing static data
in large data sections when using the medium code model on x86-64.

When using -Ccode-model=medium, data smaller than this threshold uses
RIP-relative addressing (32-bit offsets), while larger data uses
absolute 64-bit addressing. This allows the compiler to generate more
efficient code for smaller data while still supporting data larger than
2GB.

This mirrors the -mlarge-data-threshold flag available in GCC and Clang.
The default threshold is 65536 bytes (64KB) if not specified, matching
LLVM's default behavior.
2026-01-07 11:57:48 -08:00
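
The placement rule behind -Z large-data-threshold can be sketched as a single comparison. The names below are hypothetical, and how a value of exactly the threshold size is treated is an assumption of this sketch, since the commit message only describes data that is smaller or larger than the threshold.

```rust
/// Default threshold stated in the commit message: 65536 bytes (64KB).
const DEFAULT_LARGE_DATA_THRESHOLD: u64 = 65_536;

/// How a static is addressed under the medium code model.
#[derive(Debug, PartialEq, Eq)]
enum Addressing {
    /// RIP-relative addressing with a 32-bit offset.
    RipRelative32,
    /// Absolute 64-bit addressing, data placed in a large data section.
    Absolute64,
}

/// Hypothetical helper: pick the addressing mode for a static of
/// `size_bytes`, using the default threshold when none is given.
fn addressing_for(size_bytes: u64, threshold: Option<u64>) -> Addressing {
    let threshold = threshold.unwrap_or(DEFAULT_LARGE_DATA_THRESHOLD);
    // Assumption: a value exactly at the threshold counts as large.
    if size_bytes < threshold {
        Addressing::RipRelative32
    } else {
        Addressing::Absolute64
    }
}
```

Under the default, a 1 KiB static stays RIP-relative and a 1 MiB static is addressed absolutely. Lowering the threshold to 512 bytes moves the 1 KiB static to absolute addressing as well.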
Jonathan Brouwer d898dccc21 Rollup merge of #150511 - Sa4dUs:offload-inline, r=ZuseZ4
Allow inline calls to offload intrinsic

Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block.

r? `@ZuseZ4`

fixes: https://github.com/rust-lang/rust/issues/150413
2025-12-31 14:30:48 +01:00
Marcelo Domínguez 9d8b4cc70d Restore builder at the end of saved bb 2025-12-31 13:10:29 +01:00
Jonathan Brouwer 122f02ad02 Rollup merge of #150394 - DKLoehr:passplugin, r=nikic
Accommodate LLVM PassPlugin rename

LLVM [recently moved](https://github.com/llvm/llvm-project/pull/173279) their `PassPlugin` files to a new folder. This PR updates our `PassWrapper` to point to the new location.
2025-12-29 17:17:56 +01:00
dianqk fe075ad212 Removes the serde dependency in rustc_codegen_llvm 2025-12-28 15:52:20 +08:00
Devon Loehr 634251cba8 Accommodate upstream PassPlugin rename 2025-12-26 15:40:40 +00:00
Manuel Drehwald dfef2e96fe Remove the need to call clang for std::offload usages 2025-12-23 05:20:07 -08:00
sgasho ddd5aad8a3 feat: dlopen Enzyme 2025-12-16 00:31:32 +09:00
Alina Sbirlea ad73972e99 Fix for LLVM22 making lowering decisions dependent on RuntimeLibraryInfo.
LLVM reference commit:
https://github.com/llvm/llvm-project/commit/04c81a99735c04b2018eeb687e74f9860e1d0e1b.
2025-12-04 20:23:00 +00:00
Stuart Cook 2b150f2c65 Rollup merge of #147936 - Sa4dUs:offload-intrinsic, r=ZuseZ4
Offload intrinsic

This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata.

- `offload(kernel, (..args))`: an intrinsic that generates the necessary host-side LLVM-IR code.
- `rustc_offload_kernel`: a builtin attribute that marks device kernels to be handled appropriately.

Example usage (pseudocode):
```rust
fn kernel(x: *mut [f64; 128]) {
    core::intrinsics::offload(kernel_1, (x,))
}

#[cfg(target_os = "linux")]
extern "C" {
    pub fn kernel_1(array_b: *mut [f64; 128]);
}

#[cfg(not(target_os = "linux"))]
#[rustc_offload_kernel]
extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) {
    unsafe { (*x)[0] = 21.0 };
}
```
2025-11-26 23:32:03 +11:00
Marcelo Domínguez 5128ce10a0 Implement offload intrinsic 2025-11-25 20:04:27 +01:00
Manuel Drehwald 5fbe5dae42 Only try to link against offload functions if llvm.enzyme is enabled 2025-11-23 00:19:53 -08:00
Manuel Drehwald 89d50591c0 Replace the first of 4 binary invocations for offload 2025-11-21 02:41:17 -08:00
Quinn Okabayashi c7e50d0f37 Remove unused LLVMModuleRef argument 2025-11-12 15:46:08 +00:00
bors 87f9dcd5e2 Auto merge of #147935 - luca3s:add-rtsan, r=petrochenkov
Add LLVM realtime sanitizer

This is a new attempt at adding the [LLVM real-time sanitizer](https://clang.llvm.org/docs/RealtimeSanitizer.html) to rust.

Previously this was attempted in https://github.com/rust-lang/rfcs/pull/3766.

Since then the `sanitize` attribute was introduced in https://github.com/rust-lang/rust/pull/142681 and it is a lot more flexible than the old `no_sanitize` attribute. This allows adding real-time sanitizer without the need for a new attribute, like it was proposed in the RFC. Because I only add a new value to an existing command line flag and to an attribute, I don't think an MCP is necessary.

Currently real-time sanitizer is usable in Rust code with the [rtsan-standalone](https://crates.io/crates/rtsan-standalone) crate. This downloads or builds the sanitizer runtime and then links it into the Rust binary.

The first commit adds support for more detailed sanitizer information.
The second commit then actually adds real-time sanitizer.
The third adds a warning against using real-time sanitizer with async functions, closures and blocks because it doesn't behave as expected when used with async functions. I am not sure if this is actually wanted, so I kept it in a separate commit.
The fourth commit adds the documentation for real-time sanitizer.
2025-11-08 12:24:15 +00:00
Lucas Baumann d198633b95 add realtime sanitizer 2025-11-06 13:20:12 +01:00