Remove unsound `target_feature_inline_always` feature
## Summary
- Remove `target_feature_inline_always`
- Update stdarch generators to only use `#[inline]` & regenerate stdarch.
## Why?
Succinctly; the feature relies on LLVMs `AlwaysInlinerPass()` running before LLVMs heuristic based inliner pass. Which is not a basis for sound code.
This has been discussed in [the tracking issue](https://github.com/rust-lang/rust/issues/145574).
If the ordering of the passes were to change, of which they have in the past, it is very possible we could inline functions across callsites with mismatching target features leading to unsound code. Checks proposed in; https://github.com/rust-lang/rust/pull/155426 would only take into account caller -> callee which is not enough to guard against possibly of generating unsound code if the pass ordering were to change.
There doesn't seem to be a way, presently, this this mechanism to provide soundness guarantees nor does it seem like `AlwaysInlinerPass()` is a desired feature of LLVM, which this feature relies on.
r? @RalfJung
Add Swift function call ABI
Adds an unstable `extern "Swift"` ABI behind the `abi_swift` feature gate, mapping to LLVM's `swiftcc` calling convention. This is only allowed (a) for `is_darwin_like` targets, since the [ABI is only stable for those platforms](https://www.swift.org/blog/abi-stability-and-more/) and (b) with the LLVM backend, since the other backends don't support it.
Current approaches to interoperability with Swift lower to Objective-C (or require a Swift stub exposing a C ABI), but that is an optional mapping on the Swift side that some newer Apple frameworks omit. It would be great to be able to more directly/natively be able to call into Swift code directly via its stable API (on Apple platforms at least).
Reimplements rust-lang/rust#64582 on top of current main. The main objection to the previous PR seemed to be that it needed an RFC, but there was pushback (which seems sensible to me) that an RFC could be deferred until stabilization.
I think this needs a tracking issue? Would be happy to write one up if/when there is a consensus that this will be merged.
Add support for Zprint-codegen-stats-json
Add a flag `-Zprint-codegen-stats-json=<file>` to collect and write LLVM statistics in JSON format. It makes use of the function `llvm::PrintStatisticsJSON` in LLVM.
The flag currently only obtains LLVM statistics for now, but could be used to collect more front-end statistics in the future.
Adds an unstable `extern "Swift"` ABI behind the `abi_swift` feature
gate, mapping to LLVM's `swiftcc` calling convention. Cranelift and
GCC backends fall back to the platform default since they have no
equivalent.
llvm: Use correct type for splat mask
After llvm/llvm-project#195486 , LLVM has explicit handling for null pointers in simd operations instead of using special handling based on zeroes.
This causes LLVM with asserts enabled to detect an improperly typed mask passed to splat (if the output vector does not have i32 elements), and can cause SIGSEGV in more complex cases with asserts disabled.
@rustbot label: +llvm-main
r? @durin42
After llvm/llvm-project#195486 , LLVM has explicit handling for null
pointers in simd operations instead of using special handling based on
zeroes.
This causes LLVM with asserts enabled to detect an improperly typed mask
passed to splat (if the output vector does not have i32 elements), and
can cause SIGSEGV in more complex cases with asserts disabled.
Add rlib digest to identify Rust object files
This adds a metadata entry to `rlib` archives that lists which members are Rust object files instead of relying on the filename heuristic in `looks_like_rust_object.file`. I also added a fallback to the old behavior for `rlibs` built by older compilers.
Part of https://github.com/rust-lang/rust/issues/138243.
c-variadic: gate `va_arg` on `c_variadic_experimental_arch`
tracking issue: https://github.com/rust-lang/rust/issues/44930
Just gating `...` is insufficient because we make the types available everywhere, and you could still define and export functions that used va_arg for targets where we don't want to stably support it.
r? joshtriplett
Just gating `...` is insufficient because we make the types available
everywhere, and you could still define and export functions that used
va_arg for targets where we don't want to stably support it.
Specifically:
- `HashStable` -> `StableHash` (trait)
- `HashStable` -> `StableHash` (derive)
- `HashStable_NoContext` -> `StableHash_NoContext` (derive)
Note: there are some names in `compiler/rustc_macros/src/hash_stable.rs`
that are still to be renamed, e.g. `HashStableMode`.
Part of MCP 983.
-Zembed-source: also embed external source
Hi,
I've been using `-Zembed-source` for a while and noticed that some sources are not embedded when compiling in release mode. See the minimal reproducer below for missing embedded source code.
The current implementation only emits source from the `src` field in `SourceFile`, but for multi-codegen-unit scenarios the source might only be available via the `external_src` field.
I've updated the LLVM and Cranelift backends to fall back to `external_src` if `src` is not available.
Minimal reproducer:
```console
$ cat Cargo.toml
[package]
name = "repro"
version = "0.0.1"
edition = "2021"
[profile.release]
strip = "none"
debug = true
$ cat src/lib.rs
// marker comment
pub fn foo() -> u64 { 42 }
$ cat src/main.rs
fn main() {
println!("{}", repro::foo());
}
$ RUSTFLAGS="-g -Zembed-source=yes -Zdwarf-version=5" cargo build -r
Compiling repro v0.0.1 (/tmp/repro)
Finished `release` profile [optimized + debuginfo] target(s) in 0.08s
$ # Note: Source for lib.rs is missing here:
$ llvm-dwarfdump --debug-line target/release/repro | grep "marker comment"
$ RUSTFLAGS="-g -Zembed-source=yes -Zdwarf-version=5" cargo +stage1 build -r
Finished `release` profile [optimized + debuginfo] target(s) in 0.04s
$ llvm-dwarfdump --debug-line target/release/repro | grep "marker comment"
source: "// marker comment\npub fn foo() -> u64 { 42 }\n"
```
Thanks!
Pass Session to optimize_and_codegen_fat_lto
This is necessary to fix incremental LTO in cg_gcc as well as to do some LTO refactorings I want to do. The actual fix for cg_gcc will be done on the cg_gcc repo to test it in CI.
Fix: On wasm targets, call `panic_in_cleanup` if panic occurs in cleanup
Relies on https://github.com/rust-lang/llvm-project/pull/194.
Reland of https://github.com/rust-lang/rust/pull/151771.
Previously this was not correctly implemented. Each funclet may need its own terminate block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which can have a terminate_block for each funclet. We key on the first basic block of the funclet -- in particular, this is the start block for the old case of the top level terminate function.
Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all instruction is emitted which will catch foreign exceptions. Because the new logic is only used on wasm, it seemed better to take the simpler approach seeing as they do the same thing.
- [ ] Add test for https://github.com/rust-lang/rust/issues/153948
This is necessary to fix incremental LTO in cg_gcc as well as to do some
LTO refactorings I want to do. The actual fix for cg_gcc will be done on
the cg_gcc repo to test it in CI.
This allows embedding source code even when it's only available via
previously emitted metadata files.
Without this, embedded source availability is inconsistent, especially
in release builds.
NVPTX: Drop support for old architectures and old ISAs
This is the implementation of [this MCP](https://github.com/rust-lang/compiler-team/issues/965#issuecomment-3837320262)
I believe it was said that no FCP was needed, but if that is incorrect then the FCP is anyway scheduled to finish in 2 days so it can in any case be merged then.
disable naked-dead-code-elimination test if no RET mnemonic is available
this test emit x86_64 specific ret asm instruction and should not be compiled on any other arch.
Previously this was not correctly implemented. Each funclet may need its own terminate
block, so this changes the `terminate_block` into a `terminate_blocks` `IndexVec` which
can have a terminate_block for each funclet. We key on the first basic block of the
funclet -- in particular, this is the start block for the old case of the top level
terminate function.
Rather than using a catchswitch/catchpad pair, I used a cleanuppad. The reason for the
pair is to avoid catching foreign exceptions on MSVC. On wasm, it seems that the
catchswitch/catchpad pair is optimized back into a single cleanuppad and a catch_all
instruction is emitted which will catch foreign exceptions. Because the new logic is
only used on wasm, it seemed better to take the simpler approach seeing as they do the
same thing.
Add infrastructure to query LLVM backend for specific assembly mnemonics
and use it in compiletest to conditionally run tests based on instruction
availability.
This fixes test failures with naked-dead-code-elimination which requires
the `RET` mnemonic.
Co-authored-by: Folkert de Vries <flokkievids@gmail.com>
* Implement core::arch::return_address and tests
Fix typo
Apply suggestions from code review
Wording/docs changes.
Co-authored-by: Ralf Jung <post@ralfj.de>
Change signature according to Ralf's comment
Fix call to `core::intrinsics::return_address()` according to the new signature
Add cranelift implementation for intrinsic
Change wording on `return_address!()` to be clear that returning a null pointer is best-effort.
Fix formatting of doc comment
Fix mistake in cranelift codegen
* Do not generate llvm.returnaddress on wasm
* Supress return_address test on miri
Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
Tracking issue: rust-lang/rust#135516
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
we were saying that the type is i32, but would often provide an i64.
That never failed so far, but starts failing (like, crashing LLVM) when
working with 128-bit values that are 16-byte aligned. So, we may as well
use the more robust approach now.