Mirrors/rust - rust - Gitea @ Femelysm.ru

mirror of https://github.com/rust-lang/rust.git synced 2026-04-27 18:57:42 +03:00

Author	SHA1	Message	Date
Jonathan Brouwer	dde4886801	Rollup merge of #146181 - Flakebi:dynamic-shared-memory, r=ZuseZ4,Sa4dus,workingjubilee,RalfJung,nikic,kjetilkjeka,kulst Add intrinsic for launch-sized workgroup memory on GPUs Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferenceable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferenceable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu and nvptx, so users have to pass this out-of-band or rely on target-specific ways for now. Tracking issue: rust-lang/rust#135516	2026-04-25 23:07:48 +02:00
Flakebi	13ec3de673	Add intrinsic for launch-sized workgroup memory on GPUs Workgroup memory is a memory region that is shared between all threads in a workgroup on GPUs. Workgroup memory can be allocated statically or after compilation, when launching a gpu-kernel. The intrinsic added here returns the pointer to the memory that is allocated at launch-time. # Interface With this change, workgroup memory can be accessed in Rust by calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T` intrinsic. It returns the pointer to workgroup memory guaranteeing that it is aligned to at least the alignment of `T`. The pointer is dereferenceable for the size specified when launching the current gpu-kernel (which may be the size of `T` but can also be larger or smaller or zero). All calls to this intrinsic return a pointer to the same address. See the intrinsic documentation for more details. ## Alternative Interfaces It was also considered to expose dynamic workgroup memory as extern static variables in Rust, like they are represented in LLVM IR. However, due to the pointer not being guaranteed to be dereferenceable (that depends on the allocated size at runtime), such a global must be zero-sized, which makes global variables a bad fit. # Implementation Details Workgroup memory in amdgpu and nvptx lives in address space 3. Workgroup memory from a launch is implemented by creating an external global variable in address space 3. The global is declared with size 0, as the actual size is only known at runtime. It is defined behavior in LLVM to access an external global outside the defined size. There is no similar way to get the allocated size of launch-sized workgroup memory on amdgpu and nvptx, so users have to pass this out-of-band or rely on target-specific ways for now.	2026-04-24 10:03:45 +02:00
Augie Fackler	f6b8f0b6f1	rustc_llvm: update opt-level handling for LLVM 23 LLVM 23 removed Os and Oz optimization pipelines and the PR says to use O2 with optsize or minsize instead as appropriate.	2026-04-22 13:25:35 -04:00
sayantn	a5372be2a1	Add target arch verification for LLVM intrinsics	2026-04-12 23:33:27 +05:30
sayantn	c21f4ee437	Check for AutoUpgraded intrinsics, and lint on uses of deprecated intrinsics	2026-04-12 23:33:15 +05:30
Jonathan Brouwer	66a00ba2ef	Rollup merge of #153995 - Flakebi:gpu-use-convergent, r=nnethercote Use convergent attribute to funcs for GPU targets On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary. This affects the amdgpu and nvptx targets. cc @kjetilkjeka, @kulst for nvptx cc @ZuseZ4 r? @nnethercote, as you already reviewed this in the other PR Split out from rust-lang/rust#149637, the part here should be uncontroversial.	2026-04-08 14:21:57 +02:00
David Wood	a24ee0329e	cg_llvm/debuginfo: scalable vectors Generate debuginfo for scalable vectors, following the structure that Clang generates for scalable vectors.	2026-04-03 10:37:42 +00:00
bors	fb27476aaf	Auto merge of #154468 - Kobzol:revert-154200, r=dingxiangfei2009 Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009" This reverts commit `2f1603077b`, reversing changes made to `6e3c17424d`. Debugging perf. hit from https://github.com/rust-lang/rust/pull/154384. The binary size hits from https://github.com/rust-lang/rust/pull/154468#issuecomment-4144557076 were due to this PR, not all of the compile-time hits though.	2026-03-28 16:59:18 +00:00
Manuel Drehwald	a3261a2307	Revert "Link LLVM dynamically on aarch64-apple-darwin" This reverts commit `e7c268f883`.	2026-03-28 05:11:48 +01:00
Jakub Beránek	8b44562bc8	Revert "Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009" This reverts commit `2f1603077b`, reversing changes made to `6e3c17424d`.	2026-03-27 20:08:24 +01:00
Jonathan Brouwer	2f1603077b	Rollup merge of #154200 - resrever:enable-dwarf-call-sites, r=dingxiangfei2009 debuginfo: emit DW_TAG_call_site entries Set `FlagAllCallsDescribed` on function definition DIEs so LLVM emits DW_TAG_call_site entries, letting debuggers and analysis tools track tail calls.	2026-03-25 19:52:50 +01:00
Jonathan Brouwer	0cd8de3843	Rollup merge of #153049 - Darksonn:kasan-sw-tags, r=fmease Add `-Zsanitize=kernel-hwaddress` The Linux kernel has a config option called `CONFIG_KASAN_SW_TAGS` that enables `-fsanitize=kernel-hwaddress`. This is not supported by Rust. One slightly awkward detail is that `#[sanitize(address = "off")]` applies to both `-Zsanitize=address` and `-Zsanitize=kernel-address`. Probably it was done this way because both are the same LLVM pass. I replicated this logic here for hwaddress, but it might be undesirable. Note that `#[sanitize(kernel_hwaddress = "off")]` could be supported as an annotation on statics, but since it's also missing for `#[sanitize(hwaddress = "off")]`, I did not add it. MCP: https://github.com/rust-lang/compiler-team/issues/975 Tracking issue: https://github.com/rust-lang/rust/issues/154171 cc @rcvalle @maurer @ojeda	2026-03-25 19:52:49 +01:00
Scott Young	9677d7a587	debuginfo: emit DW_TAG_call_site entries	2026-03-22 08:42:21 -04:00
sgasho	e7c268f883	Link LLVM dynamically on aarch64-apple-darwin	2026-03-22 16:06:31 +09:00
Alice Ryhl	a197752e88	Add kernel-hwaddress sanitizer Signed-off-by: Alice Ryhl <aliceryhl@google.com>	2026-03-17 20:23:59 +00:00
Flakebi	8e932ed79c	Use convergent attribute to funcs for GPU targets On targets with convergent operations, we need to add the convergent attribute to all functions that run convergent operations. Following clang, we can conservatively apply the attribute to all functions when compiling for such a target and rely on LLVM optimizing away the attribute in cases where it is not necessary. This affects the amdgpu and nvptx targets.	2026-03-17 10:51:31 +01:00
Ralf Jung	c7220f423b	rename min/maxnum intrinsics to min/maximum_number and fix their LLVM lowering	2026-03-15 14:53:00 +01:00
Josh Stone	52dfa94cdc	Update the minimum external LLVM to 21	2026-03-12 16:45:42 -07:00
Stuart Cook	cc0a60fd74	Rollup merge of #153446 - bjorn3:llvm_pre_link_thinlto, r=cuviper Always use the ThinLTO pipeline for pre-link optimizations When using cargo this was already effectively done for all dependencies as cargo passes -Clinker-plugin-lto without -Clto=fat/thin. -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one. And according to the benchmarks in [^1] there is barely any runtime performance difference between executables that used fat LTO with the fat vs ThinLTO pre-link pipeline. This also helps avoid having yet another code path if we want to support Unified LTO (that is a single bitcode file that supports being used for both fat LTO and ThinLTO when using linker plugin LTO, we already support it when rustc does LTO as ThinLTO bitcode is enough of a superset of fat LTO bitcode that it happens to work by accident if you don't explicitly have a check preventing mixing of them for the current set of LTO features that rustc exposes.) I'm currently still investigating if rustc would benefit from Unified LTO and how exactly to integrate it. [^1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774	2026-03-08 14:01:35 +11:00
Stuart Cook	4400f2f835	Rollup merge of #153202 - dpaoliello:arm64unwind, r=cuviper [win] Fix truncated unwinds for Arm64 Windows Panic backtraces on ARM64 Windows are truncated because Rust's LLVM configuration sets `NoTrapAfterNoreturn = true`, which suppresses the generation of `brk #0x1` (trap) instructions after calls to `noreturn` functions. Without this trap instruction, the return address from a `noreturn` call points past the end of the calling function into an unrelated function, causing `RtlLookupFunctionEntry` to return the wrong unwind information, which terminates the stack walk prematurely. In general, `NoTrapAfterNoreturn = true` is recommended against for Windows, since we have seen security vulnerabilities in the past where an attacker has managed to return from a noreturn function, or the function wasn't actually noreturn, resulting in executing whatever was after the call. This change disables setting `NoTrapAfterNoreturn = true` for Windows. Fixes rust-lang/rust#140489	2026-03-08 14:01:34 +11:00
bjorn3	71a31b30d9	Always use the ThinLTO pipeline for pre-link optimizations When using cargo this was already effectively done for all dependencies as cargo passes -Clinker-plugin-lto without -Clto=fat/thin. -Clinker-plugin-lto assumes that ThinLTO will be used. The ThinLTO pre-link pipeline is faster than the fat LTO one. And according to the benchmarks in [1] there is barely any runtime performance difference between executables that used fat LTO with the fat vs ThinLTO pre-link pipeline. [1]: https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774	2026-03-05 17:40:58 +00:00
Alan Egerton	f352e743b1	Use shlex instead of shell-words	2026-03-05 10:59:18 +00:00
Augie Fackler	cbc711ea01	rustc_llvm: add missing `-` to flag-comparison logic The build script here wants to sniff for `-stdlib=libc++` but was missing the leading dashes. We caught this on the Rust/LLVM HEADs builder which also uses libc++.	2026-03-04 15:48:08 -05:00
Jonathan Brouwer	6a44bbd91b	Rollup merge of #152712 - eggyal:quote-lib-paths, r=ChrisDenton Use shell-words to parse output from llvm-config llvm-config might output paths that contain spaces, in which case the naive approach of splitting on whitespace breaks; instead we ask llvm-config to quote any paths and use the [shell-words](https://crates.io/crates/shell-words) crate by @tmiasko (a new dependency) to parse the output. r? ChrisDenton Fixes rust-lang/rust#152707	2026-03-03 19:11:48 +01:00
Daniel Paoliello	614bac581b	[win] Fix truncated unwinds for Arm64 Windows	2026-02-27 14:53:09 -08:00
bjorn3	474a7168ab	Remove explicit EmitThinLTOSummary argument In favor of passing a NULL ThinLTOSummaryBufferRef. And improve type safety on the Rust side.	2026-02-21 11:47:45 +00:00
bjorn3	a086b3617e	Remove ModuleBuffer ThinBuffer duplication	2026-02-21 11:47:45 +00:00
bjorn3	a5372d1dba	Replace LLVMRustThinLTOBuffer with separate LLVMRustBuffers for bitcode and summary	2026-02-21 11:47:45 +00:00
bjorn3	8b2c10ff82	Replace LLVMRustModuleBuffer with generic LLVMRustBuffer	2026-02-21 11:47:45 +00:00
bjorn3	c51cd0e691	Deduplicate some code in LLVMRustOptimize	2026-02-20 12:19:41 +00:00
bjorn3	6366a698e3	Remove -Zemit-thin-lto flag As far as I can tell it was introduced to allow fat LTO with -Clinker-plugin-lto. Later a change was made to automatically disable ThinLTO summary generation when -Clinker-plugin-lto -Clto=fat is used, so we can safely remove it.	2026-02-20 12:19:41 +00:00
Alan Egerton	c30e20a049	Use shell-words to parse output from llvm-config llvm-config might output paths that contain spaces, in which case the naive approach of splitting on whitespace breaks; instead we ask llvm-config to quote any paths and use the shell-words crate to parse the output.	2026-02-16 23:57:17 +00:00
Manuel Drehwald	c89a89bb14	Fix multi-cgu+debug builds using autodiff by delaying autodiff till lto	2026-02-11 14:08:56 -05:00
Jonathan Brouwer	dec8d6ebcf	Rollup merge of #150780 - fzakaria:fzakaria/section-threshold, r=jackh726 Add -Z large-data-threshold This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64. When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB. This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.	2026-01-23 11:07:55 +01:00
Matthew Maurer	b639b0a4d8	llvm: Tolerate dead_on_return attribute changes The attribute now has a size parameter and sorts differently: * Explicitly omit size parameter during construction on 23+ * Tolerate alternate sorting in tests https://github.com/llvm/llvm-project/pull/171712	2026-01-21 23:39:03 +00:00
Nikita Popov	0be66603ac	Avoid passing addrspacecast to lifetime intrinsics Since LLVM 22 the alloca must be passed directly. Do this by stripping the addrspacecast if it exists.	2026-01-20 14:47:04 +01:00
Marcelo Domínguez	307a4fcdf8	Add scalar support for both host and device	2026-01-19 22:28:42 +01:00
Farid Zakaria	93f2e80f4a	Add -Z large-data-threshold This flag allows specifying the threshold size for placing static data in large data sections when using the medium code model on x86-64. When using -Ccode-model=medium, data smaller than this threshold uses RIP-relative addressing (32-bit offsets), while larger data uses absolute 64-bit addressing. This allows the compiler to generate more efficient code for smaller data while still supporting data larger than 2GB. This mirrors the -mlarge-data-threshold flag available in GCC and Clang. The default threshold is 65536 bytes (64KB) if not specified, matching LLVM's default behavior.	2026-01-07 11:57:48 -08:00
Jonathan Brouwer	d898dccc21	Rollup merge of #150511 - Sa4dUs:offload-inline, r=ZuseZ4 Allow inline calls to offload intrinsic Removes explicit insertion point handling and recovers the pointer at the end of the saved basic block. r? `@ZuseZ4` fixes: https://github.com/rust-lang/rust/issues/150413	2025-12-31 14:30:48 +01:00
Marcelo Domínguez	9d8b4cc70d	Restore builder at the end of saved bb	2025-12-31 13:10:29 +01:00
Jonathan Brouwer	122f02ad02	Rollup merge of #150394 - DKLoehr:passplugin, r=nikic Accommodate LLVM PassPlugin rename LLVM [recently moved](https://github.com/llvm/llvm-project/pull/173279) their `PassPlugin` files to a new folder. This PR updates our `PassWrapper` to point to the new location.	2025-12-29 17:17:56 +01:00
dianqk	fe075ad212	Removes the serde dependency in rustc_codegen_llvm	2025-12-28 15:52:20 +08:00
Devon Loehr	634251cba8	Accommodate upstream PassPlugin rename	2025-12-26 15:40:40 +00:00
Manuel Drehwald	dfef2e96fe	Remove the need to call clang for std::offload usages	2025-12-23 05:20:07 -08:00
sgasho	ddd5aad8a3	feat: dlopen Enzyme	2025-12-16 00:31:32 +09:00
Alina Sbirlea	ad73972e99	Fix for LLVM22 making lowering decisions dependent on RuntimeLibraryInfo. LLVM reference commit: https://github.com/llvm/llvm-project/commit/04c81a99735c04b2018eeb687e74f9860e1d0e1b.	2025-12-04 20:23:00 +00:00
Stuart Cook	2b150f2c65	Rollup merge of #147936 - Sa4dUs:offload-intrinsic, r=ZuseZ4 Offload intrinsic This PR implements the minimal mechanisms required to run a small subset of arbitrary offload kernels without relying on hardcoded names or metadata. - `offload(kernel, (..args))`: an intrinsic that generates the necessary host-side LLVM-IR code. - `rustc_offload_kernel`: a builtin attribute that marks device kernels to be handled appropriately. Example usage (pseudocode): ```rust fn kernel(x: *mut [f64; 128]) { core::intrinsics::offload(kernel_1, (x,)) } #[cfg(target_os = "linux")] extern "C" { pub fn kernel_1(array_b: *mut [f64; 128]); } #[cfg(not(target_os = "linux"))] #[rustc_offload_kernel] extern "gpu-kernel" fn kernel_1(x: *mut [f64; 128]) { unsafe { (*x)[0] = 21.0 }; } ```	2025-11-26 23:32:03 +11:00
Marcelo Domínguez	5128ce10a0	Implement offload intrinsic	2025-11-25 20:04:27 +01:00
Manuel Drehwald	5fbe5dae42	Only try to link against offload functions if llvm.enzyme is enabled	2025-11-23 00:19:53 -08:00
Manuel Drehwald	89d50591c0	Replace the first of 4 binary invocations for offload	2025-11-21 02:41:17 -08:00
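
The llvm-config parsing commits in the log above (`c30e20a049`, later switched to shlex in `f352e743b1`) fix a problem that is easy to reproduce: splitting on whitespace breaks any path that contains a space. The sketch below is only an illustration using the standard library. `split_quoted` is a hypothetical helper, not the implementation of the shell-words or shlex crates and not the code in the rustc_llvm build script, and the sample output string is made up. It handles only double quotes and backslash escapes, which is a deliberate simplification of real shell quoting rules.

```rust
/// Hypothetical helper: split a line of quoted arguments into words.
/// Only double quotes and backslash escapes are handled (simplified rules).
fn split_quoted(input: &str) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut in_word = false;
    let mut in_quotes = false;
    let mut chars = input.chars();

    while let Some(c) = chars.next() {
        match c {
            '"' => {
                // Toggle quoting; an empty "" still counts as a word.
                in_quotes = !in_quotes;
                in_word = true;
            }
            '\\' => {
                // Take the next character literally, if there is one.
                if let Some(next) = chars.next() {
                    current.push(next);
                    in_word = true;
                }
            }
            c if c.is_whitespace() && !in_quotes => {
                // Unquoted whitespace ends the current word.
                if in_word {
                    words.push(std::mem::take(&mut current));
                    in_word = false;
                }
            }
            c => {
                current.push(c);
                in_word = true;
            }
        }
    }
    if in_word {
        words.push(current);
    }
    words
}

fn main() {
    // Made-up example of quoted output containing a path with a space.
    let output = r#"-L"/opt/my llvm/lib" -lLLVM"#;

    // Naive splitting tears the quoted path into two pieces.
    let naive: Vec<&str> = output.split_whitespace().collect();
    // Quote-aware splitting keeps the path together.
    let quoted = split_quoted(output);

    println!("naive:  {:?}", naive);
    println!("quoted: {:?}", quoted);
}
```

With the sample input, the naive split yields three fragments while the quote-aware split yields two arguments, with the library path intact. A production build script would more plausibly rely on a maintained crate, as the commits above do, rather than a hand-rolled parser like this one.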


764 Commits