Commit Graph

299 Commits

Author SHA1 Message Date
Flakebi 13ec3de673 Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.

# Interface

With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.

It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).

All calls to this intrinsic return a pointer to the same address.

See the intrinsic documentation for more details.

## Alternative Interfaces

It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.

# Implementation Details

Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.

There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
2026-04-24 10:03:45 +02:00
sayantn a5372be2a1 Add target arch verification for LLVM intrinsics 2026-04-12 23:33:27 +05:30
sayantn c21f4ee437 Check for AutoUpgraded intrinsics, and lint on uses of deprecated intrinsics 2026-04-12 23:33:15 +05:30
Jonathan Brouwer 66a00ba2ef Rollup merge of #153995 - Flakebi:gpu-use-convergent, r=nnethercote
Use convergent attribute to funcs for GPU targets

On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.

cc @kjetilkjeka, @kulst for nvptx
cc @ZuseZ4

r? @nnethercote, as you already reviewed this in the other PR

Split out from rust-lang/rust#149637, the part here should be uncontroversial.
2026-04-08 14:21:57 +02:00
David Wood a24ee0329e cg_llvm/debuginfo: scalable vectors
Generate debuginfo for scalable vectors, following the structure that
Clang generates for scalable vectors.
2026-04-03 10:37:42 +00:00
Jakub Beránek 8b44562bc8 Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009"
This reverts commit 2f1603077b, reversing
changes made to 6e3c17424d.
2026-03-27 20:08:24 +01:00
Scott Young 9677d7a587 debuginfo: emit DW_TAG_call_site entries 2026-03-22 08:42:21 -04:00
Flakebi 8e932ed79c Use convergent attribute to funcs for GPU targets
On targets with convergent operations, we need to add the convergent
attribute to all functions that run convergent operations. Following
clang, we can conservatively apply the attribute to all functions when
compiling for such a target and rely on LLVM optimizing away the
attribute in cases where it is not necessary.

This affects the amdgpu and nvptx targets.
2026-03-17 10:51:31 +01:00
Ralf Jung c7220f423b rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering 2026-03-15 14:53:00 +01:00
Josh Stone 52dfa94cdc Update the minimum external LLVM to 21 2026-03-12 16:45:42 -07:00
bjorn3 a086b3617e Remove ModuleBuffer ThinBuffer duplication 2026-02-21 11:47:45 +00:00
bjorn3 8b2c10ff82 Replace LLVMRustModuleBuffer with generic LLVMRustBuffer 2026-02-21 11:47:45 +00:00
Matthew Maurer b639b0a4d8 llvm: Tolerate dead_on_return attribute changes
The attribute now has a size parameter and sorts differently:
* Explicitly omit size parameter during construction on 23+
* Tolerate alternate sorting in tests

https://github.com/llvm/llvm-project/pull/171712
2026-01-21 23:39:03 +00:00
Nikita Popov 0be66603ac Avoid passing addrspacecast to lifetime intrinsics
Since LLVM 22 the alloca must be passed directly. Do this by
stripping the addrspacecast if it exists.
2026-01-20 14:47:04 +01:00
Marcelo Domínguez 307a4fcdf8 Add scalar support for both host and device 2026-01-19 22:28:42 +01:00
Jonathan Brouwer d898dccc21 Rollup merge of #150511 - Sa4dUs:offload-inline, r=ZuseZ4
Allow inline calls to offload intrinsic

Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block.

r? `@ZuseZ4`

fixes: https://github.com/rust-lang/rust/issues/150413
2025-12-31 14:30:48 +01:00
Marcelo Domínguez 9d8b4cc70d Restore builder at the end of saved bb 2025-12-31 13:10:29 +01:00
dianqk fe075ad212 Removes the serde dependency in rustc_codegen_llvm 2025-12-28 15:52:20 +08:00
Manuel Drehwald dfef2e96fe Remove the need to call clang for std::offload usages 2025-12-23 05:20:07 -08:00
sgasho ddd5aad8a3 feat: dlopen Enzyme 2025-12-16 00:31:32 +09:00
Stuart Cook 2b150f2c65 Rollup merge of #147936 - Sa4dUs:offload-intrinsic, r=ZuseZ4
Offload intrinsic

This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata.

- `offload(kernel, (..args))`: an intrinsic that generates the necessary host-side LLVM-IR code.
- `rustc_offload_kernel`: a builtin attribute that marks device kernels to be handled appropriately.

Example usage (pseudocode):
```rust
fn kernel(x: *mut [f64; 128]) {
    core::intrinsics::offload(kernel_1, (x,))
}

#[cfg(target_os = "linux")]
extern "C" {
    pub fn kernel_1(array_b: *mut [f64; 128]);
}

#[cfg(not(target_os = "linux"))]
#[rustc_offload_kernel]
extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) {
    unsafe { (*x)[0] = 21.0 };
}
```
2025-11-26 23:32:03 +11:00
Marcelo Domínguez 5128ce10a0 Implement offload intrinsic 2025-11-25 20:04:27 +01:00
Manuel Drehwald 5fbe5dae42 Only try to link against offload functions if llvm.enzyme is enabled 2025-11-23 00:19:53 -08:00
Manuel Drehwald 89d50591c0 Replace the first of 4 binary invocations for offload 2025-11-21 02:41:17 -08:00
Quinn Okabayashi c7e50d0f37 Remove unused LLVMModuleRef argument 2025-11-12 15:46:08 +00:00
bors 87f9dcd5e2 Auto merge of #147935 - luca3s:add-rtsan, r=petrochenkov
Add LLVM realtime sanitizer

This is a new attempt at adding the [LLVM real-time sanitizer](https://clang.llvm.org/docs/RealtimeSanitizer.html) to rust.

Previously this was attempted in https://github.com/rust-lang/rfcs/pull/3766.

Since then the `sanitize` attribute was introduced in https://github.com/rust-lang/rust/pull/142681 and it is a lot more flexible than the old `no_santize` attribute. This allows adding real-time sanitizer without the need for a new attribute, like it was proposed in the RFC. Because i only add a new value to a existing command line flag and to a attribute i don't think an MCP is necessary.

Currently real-time santizer is usable in rust code with the [rtsan-standalone](https://crates.io/crates/rtsan-standalone) crate. This downloads or builds the sanitizer runtime and then links it into the rust binary.

The first commit adds support for more detailed sanitizer information.
The second commit then actually adds real-time sanitizer.
The third adds a warning against using real-time sanitizer with async functions, cloures and blocks because it doesn't behave as expected when used with async functions. I am not sure if this is actually wanted, so i kept it in a seperate commit.
The fourth commit adds the documentation for real-time sanitizer.
2025-11-08 12:24:15 +00:00
Lucas Baumann d198633b95 add realtime sanitizer 2025-11-06 13:20:12 +01:00
Manuel Drehwald 360b38cceb Fix device code generation, to account for an implicit dyn_ptr argument. 2025-11-06 03:34:38 -05:00
Matthias Krüger 3d671c0d54 Rollup merge of #148103 - Zalathar:compression, r=wesleywiser
cg_llvm: Pass `debuginfo_compression` through FFI as an enum

There are only three possible values, making an enum more appropriate.

This avoids string allocation on the Rust side, and avoids ad-hoc `!strcmp` to convert back to an enum on the C++ side.
2025-10-31 18:41:51 +01:00
Tomasz Miąsko 2a03a948b9 Deduce captures(none) for a return place and parameters
Extend attribute deduction to determine whether parameters using
indirect pass mode might have their address captured. Similarly to
the deduction of `readonly` attribute this information facilitates
memcpy optimizations.
2025-10-25 22:53:52 +02:00
Zalathar 73b734bf63 Pass debuginfo_compression through FFI as an enum 2025-10-25 23:58:19 +11:00
Guillaume Gomez 3938f42bb1 Rollup merge of #147608 - Zalathar:debuginfo, r=nnethercote
cg_llvm: Use `LLVMDIBuilderCreateGlobalVariableExpression`

- Part of rust-lang/rust#134001
- Follow-up to rust-lang/rust#146763

---

This PR dismantles the somewhat complicated `LLVMRustDIBuilderCreateStaticVariable` function, and replaces it with equivalent calls to `LLVMDIBuilderCreateGlobalVariableExpression` and `LLVMGlobalSetMetadata`.

A key difference is that the new code does not replicate the attempted downcast of `InitVal`. As far as I can tell, those downcasts were actually dead, because `llvm::ConstantInt` and `llvm::ConstantFP` are not subclasses of `llvm::GlobalVariable`. I tried replacing those code paths with fatal errors, and was unable to induce failure in any of the relevant test suites I ran.

I have also confirmed that if the calls to `create_static_variable` are commented out, debuginfo tests will fail, demonstrating some amount of relevant test coverage.

The new `DIBuilder` methods have been added via an extension trait, not as inherent methods, to avoid impeding rust-lang/rust#142897.
2025-10-13 11:25:23 +02:00
Zalathar 1081d98551 Use LLVMDIBuilderCreateGlobalVariableExpression
Note that the code in `LLVMRustDIBuilderCreateStaticVariable` that tried to
downcast `InitVal` appears to have been dead, because `llvm::ConstantInt` and
`llvm::ConstantFP` are not subclasses of `llvm::GlobalVariable`.
2025-10-12 23:36:26 +11:00
AMS21 0abecda9ed Replace LLVMRustContextCreate with normal LLVM-C API calls
Since `LLVMRustContextCreate` can easily be replaced with a call to
`LLVMContextCreate` and `LLVMContextSetDiscardValueNames`.
2025-10-10 15:45:40 +02:00
bors 4b57d8154a Auto merge of #147519 - Zalathar:rollup-o5f16uo, r=Zalathar
Rollup of 3 pull requests

Successful merges:

 - rust-lang/rust#147446 (PassWrapper: use non-deprecated lookupTarget method)
 - rust-lang/rust#147473 (Do `x check` on various bootstrap tools in CI)
 - rust-lang/rust#147509 (remove intrinsic wrapper functions from LLVM bindings)

r? `@ghost`
`@rustbot` modify labels: rollup
2025-10-09 10:54:43 +00:00
Stuart Cook 4dfd977c8b Rollup merge of #147488 - AMS21:remove_llvm_rust_insert_private_global, r=nikic
refactor: Remove `LLVMRustInsertPrivateGlobal` and `define_private_global`

Since it can easily be implemented using the existing LLVM C API in
terms of `LLVMAddGlobal` and `LLVMSetLinkage` and `define_private_global`
was only used in one place.

Work towards https://github.com/rust-lang/rust/issues/46437
2025-10-09 18:43:26 +11:00
AMS21 064e3b8212 remove intrinsic wrapper functions from LLVM bindings 2025-10-09 09:26:44 +02:00
AMS21 036ab3a925 refactor: Remove LLVMRustInsertPrivateGlobal and define_private_global
Since it can easily be implemented using the existing LLVM C API in
terms of `LLVMAddGlobal` and `LLVMSetLinkage` and `define_private_global`
was only used in one place.
2025-10-08 21:59:48 +02:00
AMS21 1aed495ed7 refactor: replace LLVMRustAtomicLoad/Store with LLVM built-in functions 2025-10-08 13:53:09 +02:00
dianqk 1bd89bd42e codegen: Generate dbg_value for the ref statement 2025-10-02 14:55:51 +08:00
Zalathar 906bf49ade Declare all "fixed" metadata kinds as MetadataKindId 2025-09-30 20:10:10 +10:00
Matthias Krüger c29fb2e57e Rollup merge of #144197 - KMJ-007:type-tree, r=ZuseZ4
TypeTree support in autodiff

# TypeTrees for Autodiff

## What are TypeTrees?
Memory layout descriptors for Enzyme. Tell Enzyme exactly how types are structured in memory so it can compute derivatives efficiently.

## Structure
```rust
TypeTree(Vec<Type>)

Type {
    offset: isize,  // byte offset (-1 = everywhere)
    size: usize,    // size in bytes
    kind: Kind,     // Float, Integer, Pointer, etc.
    child: TypeTree // nested structure
}
```

## Example: `fn compute(x: &f32, data: &[f32]) -> f32`

**Input 0: `x: &f32`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, size: 4, kind: Float,
        child: TypeTree::new()
    }])
}])
```

**Input 1: `data: &[f32]`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, size: 4, kind: Float,  // -1 = all elements
        child: TypeTree::new()
    }])
}])
```

**Output: `f32`**
```rust
TypeTree(vec![Type {
    offset: -1, size: 4, kind: Float,
    child: TypeTree::new()
}])
```

## Why Needed?
- Enzyme can't deduce complex type layouts from LLVM IR
- Prevents slow memory pattern analysis
- Enables correct derivative computation for nested structures
- Tells Enzyme which bytes are differentiable vs metadata

## What Enzyme Does With This Information:

Without TypeTrees (current state):
```llvm
; Enzyme sees generic LLVM IR:
define float ``@distance(ptr*`` %p1, ptr* %p2) {
; Has to guess what these pointers point to
; Slow analysis of all memory operations
; May miss optimization opportunities
}
```

With TypeTrees (our implementation):
```llvm
define "enzyme_type"="{[]:Float@float}" float ``@distance(``
    ptr "enzyme_type"="{[]:Pointer}" %p1,
    ptr "enzyme_type"="{[]:Pointer}" %p2
) {
; Enzyme knows exact type layout
; Can generate efficient derivative code directly
}
```

# TypeTrees - Offset and -1 Explained

## Type Structure

```rust
Type {
    offset: isize, // WHERE this type starts
    size: usize,   // HOW BIG this type is
    kind: Kind,    // WHAT KIND of data (Float, Int, Pointer)
    child: TypeTree // WHAT'S INSIDE (for pointers/containers)
}
```

## Offset Values

### Regular Offset (0, 4, 8, etc.)
**Specific byte position within a structure**

```rust
struct Point {
    x: f32, // offset 0, size 4
    y: f32, // offset 4, size 4
    id: i32, // offset 8, size 4
}
```

TypeTree for `&Point` (internal representation):
```rust
TypeTree(vec![
    Type { offset: 0, size: 4, kind: Float },   // x at byte 0
    Type { offset: 4, size: 4, kind: Float },   // y at byte 4
    Type { offset: 8, size: 4, kind: Integer }  // id at byte 8
])
```

Generates LLVM:
```llvm
"enzyme_type"="{[]:Float@float}"
```

### Offset -1 (Special: "Everywhere")
**Means "this pattern repeats for ALL elements"**

#### Example 1: Array `[f32; 100]`
```rust
TypeTree(vec![Type {
    offset: -1, // ALL positions
    size: 4,    // each f32 is 4 bytes
    kind: Float, // every element is float
}])
```

Instead of listing 100 separate Types with offsets `0,4,8,12...396`

#### Example 2: Slice `&[i32]`
```rust
// Pointer to slice data
TypeTree(vec![Type {
    offset: -1, size: 8, kind: Pointer,
    child: TypeTree(vec![Type {
        offset: -1, // ALL slice elements
        size: 4,    // each i32 is 4 bytes
        kind: Integer
    }])
}])
```

#### Example 3: Mixed Structure
```rust
struct Container {
    header: i64,        // offset 0
    data: [f32; 1000],  // offset 8, but elements use -1
}
```

```rust
TypeTree(vec![
    Type { offset: 0, size: 8, kind: Integer }, // header
    Type { offset: 8, size: 4000, kind: Pointer,
        child: TypeTree(vec![Type {
            offset: -1, size: 4, kind: Float // ALL array elements
        }])
    }
])
```
2025-09-28 18:13:11 +02:00
Matthias Krüger e8578c8808 Rollup merge of #146763 - Zalathar:di-builder, r=jdonszelmann
cg_llvm: Replace some DIBuilder wrappers with LLVM-C API bindings (part 5)

- Part of rust-lang/rust#134001
- Follow-up to rust-lang/rust#146673

---

This is another batch of LLVMDIBuilder binding migrations, replacing some our own LLVMRust bindings with bindings to upstream LLVM-C APIs.

Some of these are a little more complex than most of the previous migrations, because they split one LLVMRust binding into multiple LLVM bindings, but nothing too fancy.

This appears to be the last of the low-hanging fruit. As noted in https://github.com/rust-lang/rust/issues/134001#issuecomment-2524979268, the remaining bindings are difficult or impossible to migrate at present.
2025-09-28 09:15:23 +02:00
Josh Stone fe440ec934 llvm: add a destructor to call releaseSerializer 2025-09-24 16:53:17 -07:00
Augie Fackler 42cf78f762 llvm: update remarks support on LLVM 22
LLVM change dfbd76bda01e removed separate remark support entirely, but
it turns out we can just drop the parameter and everything appears to
work fine.

Fixes 146912 as far as I can tell (the test passes.)

@rustbot label llvm-main
2025-09-23 13:25:04 -04:00
Folkert de Vries 3565b0699d emit attribute for readonly non-pure inline assembly 2025-09-21 21:16:06 +02:00
Zalathar 741e1e2ec7 Remove unused LLVMRustDIBuilder(Create|Dispose)
These should have been removed earlier, when we switched to the corresponding
LLVM-C bindings.
2025-09-20 12:48:48 +10:00
Zalathar e39e5a0d15 Use LLVMDIBuilderCreate(Auto|Parameter)Variable 2025-09-19 20:56:58 +10:00
Zalathar 9daa026cad Use LLVMDIBuilder(CreateExpression|InsertDeclareRecordAtEnd) 2025-09-19 17:15:32 +10:00
Karan Janthe 3ba5f19182 autodiff: typetree recursive depth query from enzyme with fallback
Signed-off-by: Karan Janthe <karanjanthe@gmail.com>
2025-09-19 05:42:27 +00:00