Add intrinsic for launch-sized workgroup memory on GPUs
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
Tracking issue: rust-lang/rust#135516
Workgroup memory is a memory region that is shared between all
threads in a workgroup on GPUs. Workgroup memory can be allocated
statically or after compilation, when launching a gpu-kernel.
The intrinsic added here returns the pointer to the memory that is
allocated at launch-time.
# Interface
With this change, workgroup memory can be accessed in Rust by
calling the new `gpu_launch_sized_workgroup_mem<T>() -> *mut T`
intrinsic.
It returns the pointer to workgroup memory guaranteeing that it is
aligned to at least the alignment of `T`.
The pointer is dereferencable for the size specified when launching the
current gpu-kernel (which may be the size of `T` but can also be larger
or smaller or zero).
All calls to this intrinsic return a pointer to the same address.
See the intrinsic documentation for more details.
## Alternative Interfaces
It was also considered to expose dynamic workgroup memory as extern
static variables in Rust, like they are represented in LLVM IR.
However, due to the pointer not being guaranteed to be dereferencable
(that depends on the allocated size at runtime), such a global must be
zero-sized, which makes global variables a bad fit.
# Implementation Details
Workgroup memory in amdgpu and nvptx lives in address space 3.
Workgroup memory from a launch is implemented by creating an
external global variable in address space 3. The global is declared with
size 0, as the actual size is only known at runtime. It is defined
behavior in LLVM to access an external global outside the defined size.
There is no similar way to get the allocated size of launch-sized
workgroup memory on amdgpu an nvptx, so users have to pass this
out-of-band or rely on target specific ways for now.
c-variadic: fix implementation on `avr`
tracking issue: https://github.com/rust-lang/rust/issues/44930
cc target maintainer @Patryk27
I ran into multiple issues, and although with this PR and a little harness I can run the test with qemu on avr, the implementation is perhaps not ideal.
The problem we found is that on `avr` the `c_int/c_uint` types are `i16/u16`, and this was not handled in the c-variadic checks. Luckily there is a field in the target configuration that contains the targets `c_int_width`. However, this field is not actually used in `core` at all, there the 16-bit targets are just hardcoded.
https://github.com/rust-lang/rust/blob/1500f0f47f5fe8ddcd6528f6c6c031b210b4eac5/library/core/src/ffi/primitives.rs#L174-L185
Perhaps we should expose this like endianness and pointer width?
---
Finally there are some changes to the test to make it compile with `no_std`.
Add #![unstable_removed(..)] attribute to track removed features
Adds the #![unstable_removed(..)] attribute to enable tracking removed library features.
Produce an error when a removed attribute is used.
Add a test that it works.
For https://github.com/rust-lang/rust/issues/141617
r? @jyn514
Fix ICE in `span_extend_prev_while` with multibyte characters
Fixes https://github.com/rust-lang/rust/issues/155037
The function assumed that the character found by `rfind` was always one byte wide, using a hardcoded `1` instead of `c.len_utf8()`. When a multibyte character appeared immediately before the span, this caused the resulting span to point into the middle of a UTF-8 sequence, triggering an assertion failure in `bytepos_to_file_charpos`.
Rollup of 6 pull requests
Successful merges:
- rust-lang/rust#152901 (Introduce a `#[diagnostic::on_unknown]` attribute)
- rust-lang/rust#155078 (Reject dangling attributes in where clauses)
- rust-lang/rust#154449 (Invert dependency between `rustc_errors` and `rustc_abi`.)
- rust-lang/rust#154646 (Add suggestion to `.to_owned()` used on `Cow` when borrowing)
- rust-lang/rust#154993 (compiletest: pass -Zunstable-options for unpretty and no-codegen paths)
- rust-lang/rust#155097 (Make `rustc_attr_parsing::SharedContext::emit_lint` take a `MultiSpan` instead of a `Span`)
Add suggestion to `.to_owned()` used on `Cow` when borrowing
fixesrust-lang/rust#144792
supersedes rust-lang/rust#144925 with the review comments addressed
the tests suggested from @Kivooeo from https://github.com/rust-lang/rust/pull/144925#discussion_r2252703007 didn't work entirely, because these tests failed due to error `[E0308]` mismatched types, which actually already provides a suggestion, that actually makes the code compile:
```
$ cargo check
error[E0308]: mismatched types
--> src/main.rs:3:5
|
1 | fn test_cow_suggestion() -> String {
| ------ expected `std::string::String` because of return type
2 | let os_string = std::ffi::OsString::from("test");
3 | os_string.to_string_lossy().to_owned()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `String`, found `Cow<'_, str>`
|
= note: expected struct `std::string::String`
found enum `std::borrow::Cow<'_, str>`
help: try using a conversion method
|
3 | os_string.to_string_lossy().to_owned().to_string()
| ++++++++++++
```
now this suggestion is of course not good or efficient code, but via clippy with `-Wclippy::nursery` lint group you can actually get to the correct code, so i don't think this is too much of an issue:
<details>
<summary>the clippy suggestions</summary>
```
$ cargo clippy -- -Wclippy::nursery
warning: this `to_owned` call clones the `Cow<'_, str>` itself and does not cause its contents to become owned
--> src/main.rs:3:5
|
3 | os_string.to_string_lossy().to_owned().to_string()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/beta/index.html#suspicious_to_owned
= note: `#[warn(clippy::suspicious_to_owned)]` on by default
help: depending on intent, either make the `Cow` an `Owned` variant
|
3 | os_string.to_string_lossy().into_owned().to_string()
| ++
help: or clone the `Cow` itself
|
3 - os_string.to_string_lossy().to_owned().to_string()
3 + os_string.to_string_lossy().clone().to_string()
|
$ # apply first suggestion
$ cargo c -- -Wclippy::nursery
warning: redundant clone
--> src/main.rs:3:45
|
3 | os_string.to_string_lossy().into_owned().to_string()
| ^^^^^^^^^^^^ help: remove this
|
note: this value is dropped without further use
--> src/main.rs:3:5
|
3 | os_string.to_string_lossy().into_owned().to_string()
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/beta/index.html#redundant_clone
= note: `-W clippy::redundant-clone` implied by `-W clippy::nursery`
= help: to override `-W clippy::nursery` add `#[allow(clippy::redundant_clone)]`
```
</details>
the actual error that we are looking for is error `[E0515]`, cannot return value referencing local variable, which was only present in the original issue due to automatic type inference assuming you want to return a cloned `Cow<'_, str>`, which is of course not possible. this is why i took the original test functions and turned them into closures, where it's less obvious that it's trying to return the wrong type.
r? davidtwco
(because you reviewed the last attempt)
(also, let me know if i should squash this down to one commit and add myself as the co-contributor)
Moreover, dereference `ty_layout.align` for `#[rustc_dump_layout(align)]`
to render `align: Align($N bytes)` instead of `align: AbiAlign { abi: Align($N bytes) }`
which contains the same amount of information but it more concise and legible.
This PR introduces a `#[diagnostic::on_unknown_item]` attribute that
allows crate authors to customize the error messages emitted by
unresolved imports. The main usecase for this is using this attribute as
part of a proc macro that expects a certain external module structure to
exist or certain dependencies to be there.
For me personally the motivating use-case are several derives in diesel,
that expect to refer to a `tabe` module. That is done either
implicitly (via the name of the type with the derive) or explicitly by
the user. This attribute would allow us to improve the error message in
both cases:
* For the implicit case we could explicity call out our
assumptions (turning the name into lower case, adding an `s` in the end)
+ point to the explicit variant as alternative
* For the explicit variant we would add additional notes to tell the
user why this is happening and what they should look for to fix the
problem (be more explicit about certain diesel specific assumptions of
the module structure)
I assume that similar use-cases exist for other proc-macros as well,
therefore I decided to put in the work implementing this new attribute.
I would also assume that this is likely not useful for std-lib internal
usage.
Remove `Clone` impl for `StableHashingContext`.
`HashStable::hash_stable` takes a `&mut Hcx`. In contrast, `ToStableHashKey::to_stable_hash_key` takes a `&Hcx`. But there are some places where the latter calls the former, and due to the mismatch a `clone` call is required to get a mutable `StableHashingContext`.
This commit changes `to_stable_hash_key` to instead take a `&mut Hcx`. This eliminates the mismatch, the need for the clones, and the need for the `Clone` impls.
r? @fee1-dead
Remove rustc_on_unimplemented's append_const_msg
This does nothing because it is unreachable within the compiler. It also doesn't seem that useful since all the traits it was on are `const` now.
Will make pr to rustc-dev-guide docs later.
various fixes for scalable vectors
These are a handful of patches fixing bugs with the current `#[rustc_scalable_vector]` infrastructure so that we can upstream the intrinsics to stdarch:
1. Instead of just using regular struct lowering for tuples of scalable vectors, which results in an incorrect ABI (e.g. returning indirectly), we change to using `BackendRepr::ScalableVector` which will lower to the correct type and be passed in registers.
2. Clang changed from representing tuples of scalable vectors as structs rather than as wide vectors (that is, scalable vector types where the `N` part of the `<vscale x N x ty>` type was multiplied by the number of vectors). rustc mirrored this in the initial implementation of scalable vectors that didn't land.
When our early patches used the wide vector representation, our intrinsic patches used the legacy `llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating these tuples/getting/setting the vectors, which were only supported due to LLVM's `AutoUpgrade` pass converting these intrinsics into `llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics with the wide vector representation.
With the current struct representation, Clang has special handling in codegen for generating `insertvalue`/`extractvalue` instructions for these operations, which must be replicated by rustc's codegen for our intrinsics to use.
We add a new `core::intrinsics::scalable` module (never intended to be stable, just used by the stdarch intrinsics, gated by `core_intrinsics`) and add new intrinsics which rustc lowers to the appropriate `insertvalue`/`extractvalue` instructions.
3. Add generation of debuginfo for scalable vectors, following the DWARF that Clang generates.
4. Some intrinsics need something like `simd_cast`, which will work for scalable vectors too, so this implements a new `sve_cast` intrinsic that uses the same internals as `simd_cast`. This seemed better than permitting scalable vectors to be used with the `simd_*` intrinsics more generally as I can't guarantee this would work for all of them.
This is a relatively large patch but most of it is tests, and each commit should be relatively standalone. It's a little bit easier to upstream them together to avoid needing to stack them. It's possible that some more compiler fixes will be forthcoming but it's looking like this might be all at the moment.
Depends on rust-lang/rust#153653
r? @workingjubilee (discussed on Zulip)
Abstract over the existing `simd_cast` intrinsic to implement a new
`sve_cast` intrinsic - this is better than allowing scalable vectors to
be used with all of the generic `simd_*` intrinsics.
Clang changed to representing tuples of scalable vectors as
structs rather than as wide vectors (that is, scalable vector types
where the `N` part of the `<vscale x N x ty>` type was multiplied by
the number of vectors). rustc mirrored this in the initial implementation
of scalable vectors.
Earlier versions of our patches used the wide vector representation and
our intrinsic patches used the legacy
`llvm.aarch64.sve.tuple.{create,get,set}{2,3,4}` intrinsics for creating
these tuples/getting/setting the vectors, which were only supported
due to LLVM's `AutoUpgrade` pass converting these intrinsics into
`llvm.vector.insert`. `AutoUpgrade` only supports these legacy intrinsics
with the wide vector representation.
With the current struct representation, Clang has special handling in
codegen for generating `insertvalue`/`extractvalue` instructions for
these operations, which must be replicated by rustc's codegen for our
intrinsics to use. This patch implements new intrinsics in
`core::intrinsics::scalable` (mirroring the structure of
`core::intrinsics::simd`) which rustc lowers to the appropriate
`insertvalue`/`extractvalue` instructions.
`HashStable::hash_stable` takes a `&mut Hcx`. In contrast,
`ToStableHashKey::to_stable_hash_key` takes a `&Hcx`. But there are
some places where the latter calls the former, and due to the mismatch a
`clone` call is required to get a mutable `StableHashingContext`.
This commit changes `to_stable_hash_key` to instead take a `&mut Hcx`.
This eliminates the mismatch, the need for the clones, and the need for
the `Clone` impls.
opaque_generic_const_args -> generic_const_args
This is part of a larger rework/plan of how to move forward with GCA. Please comment here, or message/ping me or @BoxyUwU if you have questions about the future of oGCA/GCA!
For ease of review, the first commit is renaming, and the second commit is removing the opaque parts.
fyi @camelid
r? @BoxyUwU
Streamline `CachingSourceMapView`
`CachingSourceMapView` is home to some pretty ugly, low-level, repetitive code. We can do better. Details in individual commits.
r? @JonathanBrouwer
`derive(HashStable_Generic)` generates impls like this:
```
impl<__CTX> HashStable<__CTX> for ExpnKind
where
__CTX: crate::HashStableContext
{
fn hash_stable(&self, hcx : &mut __CTX, __hasher: &mut StableHasher) {
...
}
}
```
This is used for crates that are upstream of `rustc_middle`.
The `crate::HashStableContext` bound means every crate that uses
`derive(HashStable_Generic)` must provide (or import) a trait
`HashStableContext` which `rustc_middle` then impls. In `rustc_span`
this trait is sensible, with three methods. In other crates, this trait
is empty, and there is the following trait hierarchy:
```
rustc_session::HashStableContext
| |
| rustc_hir::HashStableContext
| / \
rustc_ast::HashStableContext rustc_abi::HashStableContext
|
rustc_span::HashStableContext
```
All very strange and unnecessary. This commit changes
`derive(HashStable_Generic)` to use `rustc_span::HashStableContext`
instead of `crate::HashStableContext`. This eliminates the need for all
the empty `HashStableContext` traits and impls. Much better.
`CachingSourceMapView` caches information about three recent code
positions, to avoid expensive `SourceMap` lookups. The code to do this
is pretty low-level and grotty.
This commit changes it to cache information about a single code
position.
- This is much simpler, with no need to maintain entry time stamps or
file indices. It's 120 fewer lines of code overall.
- It's also faster. The cache hit rate is only marginally lower, and
scanning a single entry is faster than scanning three entries in the
cache miss case (which is still moderately common).
The commit also renames the `lines` field as `line_bounds`, which makes
it clearer that it's a range.
Currently the `cache_entry_index` uses -1 to indicate failure. But this
is Rust, not C! So this commit uses `Option`, making it impossible to
forget to check for failure, and avoiding many `as usize` casts.