preserve SIMD element type information
Preserve the SIMD element type and provide it to LLVM for better optimization.
This is relevant for AArch64 types like `int16x4x2_t`, see also https://github.com/llvm/llvm-project/issues/181514. Such types are defined like so:
```rust
#[repr(simd)]
struct int16x4_t([i16; 4]);
#[repr(C)]
struct int16x4x2_t(pub int16x4_t, pub int16x4_t);
```
Previously this would be translated to the opaque `[2 x <8 x i8>]`, with this PR it is instead `[2 x <4 x i16>]`. That change is not relevant for the ABI, but using the correct type prevents bitcasts that can (indeed, do) confuse the LLVM pattern matcher.
This change will make it possible to implement the deinterleaving loads on AArch64 in a portable way (without neon-specific intrinsics), which means that e.g. Miri or the cranelift backend can run them without additional support.
discussion at [#t-compiler > loss of vector element type information](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/loss.20of.20vector.20element.20type.20information/with/584483611)
Start using pattern types in libcore
cc rust-lang/rust#135996
Replaces the innards of `NonNull` with `*const T is !null`.
This does affect LLVM's optimizations, as now reading the field preserves the metadata that the field is not null, and transmuting to another type (e.g. just a raw pointer), will also preserve that information for optimizations. This can cause LLVM opts to do more work, but it's not guaranteed to produce better machine code.
Once we also remove all uses of rustc_layout_scalar_range_start from rustc itself, we can remove the support for that attribute entirely and handle all such needs via pattern types
Use closures more consistently in `dep_graph.rs`.
This file has several methods that take a `FnOnce() -> R` closure:
- `DepGraph::with_ignore`
- `DepGraph::with_query_deserialization`
- `DepGraph::with_anon_task`
- `DepGraphData::with_anon_task_inner`
It also has two methods that take a faux closure via an `A` argument and a `fn(TyCtxt<'tcx>, A) -> R` argument:
- DepGraph::with_task
- DepGraphData::with_task
The rationale is that the faux closure exercises tight control over what state they have access to. This seems silly when (a) they are passed a `TyCtxt`, and (b) when similar nearby functions take real closures. And they are more awkward to use, e.g. requiring multiple arguments to be gathered into a tuple. This commit changes the faux closures to real closures.
r? @Zalathar
Because the things in this module aren't MIR and don't use anything
from `rustc_middle::mir`. Also, modules that use `mono` often don't use
anything else from `rustc_middle::mir`.
This file has several methods that take a `FnOnce() -> R` closure:
- `DepGraph::with_ignore`
- `DepGraph::with_query_deserialization`
- `DepGraph::with_anon_task`
- `DepGraphData::with_anon_task_inner`
It also has two methods that take a faux closure via an `A` argument and
a `fn(TyCtxt<'tcx>, A) -> R` argument:
- DepGraph::with_task
- DepGraphData::with_task
The rationale is that the faux closure exercises tight control over what
state they have access to. This seems silly when (a) they are passed a
`TyCtxt`, and (b) when similar nearby functions take real closures. And
they are more awkward to use, e.g. requiring multiple arguments to be
gathered into a tuple. This commit changes the faux closures to real
closures.
The `HashStable` and `ToStableHashKey` traits both have a type parameter
that is sometimes called `CTX` and sometimes called `HCX`. (In practice
this type parameter is always instantiated as `StableHashingContext`.)
Similarly, variables with these types are sometimes called `ctx` and
sometimes called `hcx`. This inconsistency has bugged me for some time.
The `HCX`/`hcx` form is more informative (the `H`/`h` indicates what
type of context it is) and it matches other cases like `tcx`, `dcx`,
`icx`.
Also, RFC 430 says that type parameters should have names that are
"concise UpperCamelCase, usually single uppercase letter: T". In this
case `H` feels insufficient, and `Hcx` feels better.
Therefore, this commit changes the code to use `Hcx`/`hcx` everywhere.
simd_fmin/fmax: make semantics and name consistent with scalar intrinsics
This is the SIMD version of https://github.com/rust-lang/rust/pull/153343: change the documented semantics of the SIMD float min/max intrinsics to that of the scalar intrinsics, and also make the name consistent. The overall semantic change this amounts to is that we restrict the non-determinism: the old semantics effectively mean "when one input is an SNaN, the result non-deterministically is a NaN or the other input"; the new semantics say that in this case the other input must be returned. For all other cases, old and new semantics are equivalent. This means all users of these intrinsics that were correct with the old semantics are still correct: the overall set of possible behaviors has become smaller, no new possible behaviors are being added.
In terms of providers of this API:
- Miri, GCC, and cranelift already implement the new semantics, so no changes are needed.
- LLVM is adjusted to use `minimumnum nsz` instead of `minnum`, thus giving us the new semantics.
In terms of consumers of this API:
- Portable SIMD almost certainly wants to match the scalar behavior, so this is strictly a bugfix here.
- Stdarch mostly stopped using the intrinsic, except on nvptx, where arguably the new semantics are closer to what we actually want than the old semantics (https://github.com/rust-lang/stdarch/issues/2056).
Q: Should there be an `f` in the intrinsic name to indicate that it is for floats? E.g., `simd_fminimum_number_nsz`?
Also see https://github.com/rust-lang/rust/issues/153395.
Merge `fabsf16/32/64/128` into `fabs::<F>`
Following [a small conversation on Zulip](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/Float.20intrinsics/with/521501401) (and because I'd be interested in starting to contribute on Rust), I thought I'd give a try at merging the float intrinsics :)
This PR just merges `fabsf16`, `fabsf32`, `fabsf64`, `fabsf128`, as it felt like an easy first target.
Notes:
- I'm opening the PR for one intrinsic as it's probably easier if the shift is done one intrinsic at a time, but let me know if you'd rather I do several at a time to reduce the number of PRs.
- Currently this PR increases LOCs, despite being an attempt at simplifying the intrinsics/compilers. I believe this increase is a one time thing as I had to define new functions and move some things around, and hopefully future PRs/commits will reduce overall LoCs
- `fabsf32` and `fabsf64` are `#[rustc_intrinsic_const_stable_indirect]`, while `fabsf16` and `fabsf128` aren't; because `f32`/`f64` expect the function to be const, the generic version must be made indirectly stable too. We'd need to check with T-lang this change is ok; the only other intrinsics where there is such a mismatch is `minnum`, `maxnum` and `copysign`.
- I haven't touched libm because I'm not familiar with how it works; any guidance would be welcome!
Defer codegen for the VaList Drop impl to actual uses
This allows compiling libcore with codegen backends that don't actually implement VaList like cg_clif.
Rename `target.abi` to `target.cfg_abi` and enum-ify llvm_abiname
See [Zulip](https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/De-spaghettifying.20ABI.20controls/with/578893542) for more context. Discussed a bit in https://github.com/rust-lang/rust/pull/153769#discussion_r2934399038 too.
This renames `target.abi` to `target.cfg_abi` to make it less likely that someone will use it to determine things about the actual ccABI, i.e. the calling convention used on the target. `target.abi` does not control that calling convention, it just *sometimes* informs the user about that calling convention (and also about other aspects of the ABI).
Also turn llvm_abiname into an enum to make it more natural to match on.
Cc @workingjubilee @madsmtm
It's defined in `rustc_span::source_map` which doesn't make any sense
because it has nothing to do with source maps. This commit moves it to
the crate root, a more sensible spot for something this basic.
Make `size`/`align` always correct rather than conditionally on the
`safe` field. This makes it less error prone and easier to work with for
`MaybeDangling` / potential future pointer kinds like `Aligned<_>`.
This is already CodegenResults without CrateInfo. The driver can
calculate the CrateInfo and pass it by-ref to the backend. Using
CompiledModules makes it a bit easier to move some other things out of
the backend as will be necessary for moving LTO to the link phase.
replace box_new with lower-level intrinsics
The `box_new` intrinsic is super special: during THIR construction it turns into an `ExprKind::Box` (formerly known as the `box` keyword), which then during MIR building turns into a special instruction sequence that invokes the exchange_malloc lang item (which has a name from a different time) and a special MIR statement to represent a shallowly-initialized `Box` (which raises [interesting opsem questions](https://github.com/rust-lang/rust/issues/97270)).
This PR is the n-th attempt to get rid of `box_new`. That's non-trivial because it usually causes a perf regression: replacing `box_new` by naive unsafe code will incur extra copies in debug builds, making the resulting binaries a lot slower, and will generate a lot more MIR, making compilation measurably slower. Furthermore, `vec!` is a macro, so the exact code it expands to is highly relevant for borrow checking, type inference, and temporary scopes.
To avoid those problems, this PR does its best to make the MIR almost exactly the same as what it was before. `box_new` is used in two places, `Box::new` and `vec!`:
- For `Box::new` that is fairly easy: the `move_by_value` intrinsic is basically all we need. However, to avoid the extra copy that would usually be generated for the argument of a function call, we need to special-case this intrinsic during MIR building. That's what the first commit does.
- `vec!` is a lot more tricky. As a macro, its details leak to stable code, so almost every variant I tried broke either type inference or the lifetimes of temporaries in some ui test or ended up accepting unsound code due to the borrow checker not enforcing all the constraints I hoped it would enforce. I ended up with a variant that involves a new intrinsic, `fn write_box_via_move<T>(b: Box<MaybeUninit<T>>, x: T) -> Box<MaybeUninit<T>>`, that writes a value into a `Box<MaybeUninit<T>>` and returns that box again. In exchange we can get rid of somewhat similar code in the lowering for `ExprKind::Box`, and the `exchange_malloc` lang item. (We can also get rid of `Rvalue::ShallowInitBox`; I didn't include that in this PR -- I think @cjgillot has a commit for this somewhere [around here](https://github.com/rust-lang/rust/pull/147862/commits).)
See [here](https://github.com/rust-lang/rust/pull/148190#issuecomment-3457454814) for the latest perf numbers. Most of the regressions are in deep-vector which consists entirely of an invocation of `vec!`, so any change to that macro affects this benchmark disproportionally.
This is my first time even looking at MIR building code, so I am very low confidence in that part of the patch, in particular when it comes to scopes and drops and things like that.
I also had do nerf some clippy tests because clippy gets confused by the new expansion of `vec!` so it makes fewer suggestions when `vec!` is involved.
### `vec!` FAQ
- Why does `write_box_via_move` return the `Box` again? Because we need to expand `vec!` to a bunch of method invocations without any blocks or let-statements, or else the temporary scopes (and type inference) don't work out.
- Why is `box_assume_init_into_vec_unsafe` (unsoundly!) a safe function? Because we can't use an unsafe block in `vec!` as that would necessarily also include the `$x` (due to it all being one big method invocation) and therefore interpret the user's code as being inside `unsafe`, which would be bad (and 10 years later, we still don't have safe blocks for macros like this).
- Why does `write_box_via_move` use `Box` as input/output type, and not, say, raw pointers? Because that is the only way to get the correct behavior when `$x` panics or has control effects: we need the `Box` to be dropped in that case. (As a nice side-effect this also makes the intrinsic safe, which is imported as explained in the previous bullet.)
- Can't we make it safe by having `write_box_via_move` return `Box<T>`? Yes we could, but there's no easy way for the intrinsic to convert its `Box<MaybeUninit<T>>` to a `Box<T>`. Transmuting would be unsound as the borrow checker would no longer properly enforce that lifetimes involved in a `vec!` invocation behave correctly.
- Is this macro truly cursed? Yes, yes it is.
New float tests in core are failing on clif with issues like the
following:
Undefined symbols for architecture arm64:
"_coshf128", referenced from:
__RNvMNtCshY0fR2o0hOA_3std4f128C4f1284coshCs5TKtJxXQNGL_9coretests in coretests-e38519c0cc90db54.coretests.44b6247a565e10d1-cgu.10.rcgu.o
"_exp2f128", referenced from:
__RNvMNtCshY0fR2o0hOA_3std4f128C4f1284exp2Cs5TKtJxXQNGL_9coretests in coretests-e38519c0cc90db54.coretests.44b6247a565e10d1-cgu.10.rcgu.o
...
Disable f128 math unless the symbols are known to be available, which
for now is only glibc targets. This matches the LLVM backend.
abi: add a rust-preserve-none calling convention
This is the conceptual opposite of the rust-cold calling convention and is particularly useful in combination with the new `explicit_tail_calls` feature.
For relatively tight loops implemented with tail calling (`become`) each of the function with the regular calling convention is still responsible for restoring the initial value of the preserved registers. So it is not unusual to end up with a situation where each step in the tail call loop is spilling and reloading registers, along the lines of:
foo:
push r12
; do things
pop r12
jmp next_step
This adds up quickly, especially when most of the clobberable registers are already used to pass arguments or other uses.
I was thinking of making the name of this ABI a little less LLVM-derived and more like a conceptual inverse of `rust-cold`, but could not come with a great name (`rust-cold` is itself not a great name: cold in what context? from which perspective? is it supposed to mean that the function is rarely called?)
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.