Merge pull request #2825 from rust-lang/tshepang/sembr

sembr a few files
This commit is contained in:
Tshepang Mbambo
2026-04-08 09:01:44 +02:00
committed by GitHub
7 changed files with 225 additions and 148 deletions
@@ -1,8 +1,8 @@
# Code Index
rustc has a lot of important data structures.
This is an attempt to give some guidance on where to learn more
about some of the key data structures of the compiler.
Item | Kind | Short description | Chapter | Declaration
----------------|----------|-----------------------------|--------------------|-------------------
@@ -1,8 +1,9 @@
# Interpreter
The interpreter is a virtual machine for executing MIR without compiling to
machine code.
It is usually invoked via `tcx.const_eval_*` functions.
The interpreter is shared between the compiler (for compile-time function
evaluation, CTFE) and the tool [Miri](https://github.com/rust-lang/miri/), which
uses the same virtual machine to detect Undefined Behavior in (unsafe) Rust
code.
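As a concrete illustration, CTFE is what makes the following plain Rust compile and run; a minimal sketch that does not touch any internal API:

```rust
// The interpreter evaluates `len()` at compile time (CTFE),
// so `N` is a constant usable as an array length.
const fn len() -> usize {
    1 + 2
}

const N: usize = len();

fn main() {
    let arr = [0u8; N];
    assert_eq!(arr.len(), 3);
}
```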
@@ -26,7 +27,8 @@ The compiler needs to figure out the length of the array before being able to
create items that use the type (locals, constants, function arguments, ...).
To obtain the (in this case empty) parameter environment, one can call
`let param_env = tcx.param_env(length_def_id);`.
The `GlobalId` needed is
```rust,ignore
let gid = GlobalId {
@@ -36,7 +38,8 @@ let gid = GlobalId {
```
Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
the MIR of the array length expression.
The MIR will look something like this:
```mir
Foo::{{constant}}#0: usize = {
@@ -59,35 +62,43 @@ Before the evaluation, a virtual memory location (in this case essentially a
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
At the start of the evaluation, `_0` and `_1` are
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`.
This is quite
a mouthful: [`Operand`] can represent either data stored somewhere in the
[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
immediate data stored in-line.
And [`Immediate`] can either be a single
(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
or a pair of two of them.
In our case, the single scalar value is *not* (yet) initialized.
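A hypothetical, much-simplified sketch of the value shapes just described; the names mirror rustc's, but the definitions here are illustrative only:

```rust
// Simplified model of the interpreter's value representation;
// rustc's real types carry sizes, provenance, and more.
#[derive(Debug, PartialEq)]
enum ScalarMaybeUndef {
    Undef,     // not yet initialized
    Raw(u128), // a concrete integer (size elided)
}

#[derive(Debug, PartialEq)]
enum Immediate {
    Scalar(ScalarMaybeUndef),
    ScalarPair(ScalarMaybeUndef, ScalarMaybeUndef),
}

#[derive(Debug, PartialEq)]
enum Operand {
    Immediate(Immediate), // small data stored in-line
    Indirect(u64),        // data stored in interpreter memory (stand-in)
}

fn main() {
    // At the start of the evaluation, `_0` and `_1` look like this:
    let local = Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef));
    assert_eq!(
        local,
        Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))
    );
}
```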
When the initialization of `_1` is invoked, the value of the `FOO` constant is
required, and triggers another call to `tcx.const_eval_*`, which will not be shown
here.
If the evaluation of FOO is successful, `42` will be subtracted from its
value `4096` and the result stored in `_1` as
`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
Scalar::Raw { data: 0, .. })`.
The first part of the pair is the computed value,
the second part is a bool that's true if an overflow happened.
A `Scalar::Raw`
also stores the size (in bytes) of this scalar value; we are eliding that here.
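The same value-plus-overflow-flag pair is visible in surface Rust via `overflowing_sub`, which computes essentially what the checked subtraction above produces:

```rust
fn main() {
    // Mirrors the ScalarPair described above: (computed value, overflow flag).
    let (value, overflowed) = 4096u64.overflowing_sub(42);
    assert_eq!(value, 4054);
    assert!(!overflowed); // the MIR assertion checks this flag is 0
}
```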
The next statement asserts that said boolean is `0`.
In case the assertion
fails, its error message is used for reporting a compile-time error.
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
data: 4054, .. }))` is stored in the virtual memory it was allocated before the
evaluation.
`_0` always refers to that location directly.
After the evaluation is done, the return value is converted from [`Operand`] to
[`ConstValue`] by [`op_to_const`]: the former representation is geared towards
what is needed *during* const evaluation, while [`ConstValue`] is shaped by the
needs of the remaining parts of the compiler that consume the results of const
evaluation.
As part of this conversion, for types with scalar values, even if
the resulting [`Operand`] is `Indirect`, it will return an immediate
`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::Indirect`).
This makes using the result much more efficient and also more convenient, as no
@@ -107,12 +118,13 @@ the interpreter, but just use the cached result.
The interpreter's outside-facing datastructures can be found in
[rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_middle/src/mir/interpret).
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types.
A `ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
pointer), `Slice` (to represent byte slices and strings, as needed for pattern
matching) or `Indirect`, which is used for anything else and refers to a virtual
allocation.
These allocations can be accessed via the methods on `tcx.interpret_interner`.
A `Scalar` is either some `Raw` integer or a pointer;
see [the next section](#memory) for more on that.
If you are expecting a numeric result, you can use `eval_usize` (panics on
@@ -122,29 +134,38 @@ in an `Option<u64>` yielding the `Scalar` if possible.
## Memory
To support any kind of pointers, the interpreter needs to have a "virtual memory" that the
pointers can point to.
This is implemented in the [`Memory`] type.
In the simplest model, every global variable, stack variable and every dynamic
allocation corresponds to an [`Allocation`] in that memory.
(Actually using an
allocation for every MIR stack variable would be very inefficient; that's why we
have `Operand::Immediate` for stack variables that are both small and never have
their address taken.
But that is purely an optimization.)
Such an `Allocation` is basically just a sequence of `u8` storing the value of
each byte in this allocation.
(Plus some extra data, see below.) Every
`Allocation` has a globally unique `AllocId` assigned in `Memory`.
With that, a
[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and
an offset into the allocation (indicating which byte of the allocation the
pointer points to).
It may seem odd that a `Pointer` is not just an integer
address, but remember that during const evaluation, we cannot know at which
actual integer address the allocation will end up -- so we use `AllocId` as
symbolic base addresses, which means we need a separate offset.
(As an aside,
it turns out that pointers at run-time are
[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
These allocations exist so that references and raw pointers have something to
point to.
There is no global linear heap in which things are allocated, but each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size.
So if you have a
pointer to an allocation for a local variable `a`, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to a different local variable `b`.
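A toy model of this addressing scheme; illustrative only, rustc's actual `Pointer` and `AllocId` differ in detail:

```rust
// Symbolic base address + offset, as described above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct AllocId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Pointer {
    alloc_id: AllocId, // which allocation we point into
    offset: u64,       // which byte of that allocation
}

impl Pointer {
    // Pointer arithmetic only ever changes the offset;
    // the AllocId (the allocation) stays the same.
    fn wrapping_add(self, n: u64) -> Pointer {
        Pointer { offset: self.offset.wrapping_add(n), ..self }
    }
}

fn main() {
    let a = Pointer { alloc_id: AllocId(1), offset: 0 };
    let b = a.wrapping_add(8);
    assert_eq!(b.alloc_id, a.alloc_id); // still points into the same allocation
    assert_eq!(b.offset, 8);
}
```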
@@ -152,31 +173,35 @@ Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays
This, however, causes a problem when we want to store a `Pointer` into an
`Allocation`: we cannot turn it into a sequence of `u8` of the right length!
`AllocId` and offset together are twice as big as a pointer "seems" to be.
This is what the `relocation` field of `Allocation` is for: the byte offset of the
`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
out-of-band.
The two are reassembled when the `Pointer` is read from memory.
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
track of which of its bytes are initialized.
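Putting the last two paragraphs together, the storage scheme can be sketched like this; a simplified model under stated assumptions, not rustc's real `Allocation`:

```rust
use std::collections::HashMap;

// Simplified model: raw bytes plus the out-of-band data described above.
struct Allocation {
    bytes: Vec<u8>,                   // the value of each byte
    relocations: HashMap<usize, u64>, // byte offset -> AllocId, stored out-of-band
    undef_mask: Vec<bool>,            // which bytes are initialized
}

fn main() {
    // Store an 8-byte "pointer": its offset goes into `bytes`,
    // its AllocId into `relocations`.
    let ptr_offset: u64 = 16;
    let alloc = Allocation {
        bytes: ptr_offset.to_le_bytes().to_vec(),
        relocations: HashMap::from([(0, 42)]), // AllocId 42 recorded at byte 0
        undef_mask: vec![true; 8],
    };

    // Reading reassembles the (AllocId, offset) pair.
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&alloc.bytes[..8]);
    let offset = u64::from_le_bytes(buf);
    assert_eq!((alloc.relocations[&0], offset), (42, 16));
    assert!(alloc.undef_mask.iter().all(|&b| b));
}
```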
### Global memory and exotic allocations
`Memory` exists only during evaluation; it gets destroyed when the
final value of the constant is computed.
In case that constant contains any
pointers, those get "interned" and moved to a global "const eval memory" that is
part of `TyCtxt`.
These allocations stay around for the remaining computation
and get serialized into the final output (so that dependent crates can use
them).
Moreover, to also support function pointers, the global memory in `TyCtxt` can
also contain "virtual allocations": instead of an `Allocation`, these contain an
`Instance`.
That allows a `Pointer` to point to either normal data or a
function, which is needed to be able to evaluate casts from function pointers to
raw pointers.
Finally, the [`GlobalAlloc`] type used in the global memory also contains a
variant `Static` that points to a particular `const` or `static` item.
This is needed to support circular statics, where we need to have a `Pointer` to a
`static` for which we cannot yet have an `Allocation` as we do not know the
bytes of its value.
@@ -188,17 +213,19 @@ bytes of its value.
### Pointer values vs Pointer types
One common cause of confusion in the interpreter is that being a pointer *value* and having
a pointer *type* are entirely independent properties.
By "pointer value", we
refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
the interpreter's virtual memory.
This is in contrast to `Scalar::Raw`, which is just some concrete integer.
However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
does not have to have a pointer *value*: it could be obtained by casting or
transmuting an integer to a pointer.
And similarly, when casting or transmuting a reference to some
actual allocation to an integer, we end up with a pointer *value*
(`Scalar::Ptr`) at integer *type* (`usize`).
This is a problem because we
cannot meaningfully perform integer operations such as division on pointer
values.
@@ -207,30 +234,33 @@ values.
Although the main entry point to constant evaluation is the `tcx.const_eval_*`
functions, there are additional functions in
[rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html)
that allow accessing the fields of a `ConstValue` (`Indirect` or otherwise).
You should
never have to access an `Allocation` directly except for translating it to the
compilation target (at the moment just LLVM).
The interpreter starts by creating a virtual stack frame for the current constant that is
being evaluated.
There's essentially no difference between a constant and a
function with no arguments, except that constants do not allow local (named)
variables at the time of writing this guide.
A stack frame is defined by the `Frame` type in
[rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/eval_context.rs)
and contains all the local variables memory (`None` at the start of evaluation).
Each frame refers to the
evaluation of either the root constant or subsequent calls to `const fn`.
The evaluation of another constant simply calls `tcx.const_eval_*`, which produces an
entirely new and independent stack frame.
The frames are just a `Vec<Frame>`, there's no way to actually refer to a
`Frame`'s memory even if horrible shenanigans are done via unsafe code.
The only memory that can be referred to are `Allocation`s.
The interpreter now calls the `step` method (in
[rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/step.rs)
) until it either returns an error or has no further statements to execute.
Each statement will now initialize or modify the locals or the virtual memory
referred to by a local.
This might require evaluating other constants or
statics, which just recursively invokes `tcx.const_eval_*`.
@@ -1,13 +1,15 @@
# The HIR
The HIR ("High-Level Intermediate Representation") is the primary IR used
in most of rustc.
It is a compiler-friendly representation of the abstract
syntax tree (AST) that is generated after parsing, macro expansion, and name
resolution (see [Lowering](./hir/lowering.md) for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with
the exception that some of Rust's expression forms have been desugared away.
For example, `for` loops are converted into a `loop` and do not appear in
the HIR.
This makes HIR more amenable to analysis than a normal AST.
This chapter covers the main concepts of the HIR.
@@ -30,7 +32,8 @@ cargo rustc -- -Z unpretty=hir
The top-level data-structure in the HIR is the [`Crate`], which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate).
Whereas in the AST the crate
data structure basically just contains the root module, the HIR
`Crate` structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.
@@ -39,8 +42,8 @@ serve to organize the content of the crate for easier access.
For example, the contents of individual items (e.g. modules,
functions, traits, impls, etc) in the HIR are not immediately
accessible in the parents.
So, for example, if there is a module item `foo` containing a function `bar()`:
```rust
mod foo {
@@ -49,8 +52,8 @@ mod foo {
```
then in the HIR the representation of module `foo` (the [`Mod`]
struct) would only have the **`ItemId`** `I` of `bar()`.
To get the details of the function `bar()`, we would lookup `I` in the
`items` map.
[`Mod`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Mod.html
@@ -62,9 +65,11 @@ There are similar maps for things like trait items and impl items,
as well as "bodies" (explained below).
The other reason to set up the representation this way is for better
integration with incremental compilation.
This way, if you gain access
to an [`&rustc_hir::Item`] (e.g. for the mod `foo`), you do not immediately
gain access to the contents of the function `bar()`.
Instead, you only
gain access to the **id** for `bar()`, and you must invoke some
function to lookup the contents of `bar()` given its id; this gives
the compiler a chance to observe that you accessed the data for
@@ -79,23 +84,27 @@ the compiler a chance to observe that you accessed the data for
The HIR uses a bunch of different identifiers that coexist and serve different purposes.
- A [`DefId`], as the name suggests, identifies a particular definition, or top-level
item, in a given crate.
It is composed of two parts: a [`CrateNum`] which identifies
the crate the definition comes from, and a [`DefIndex`] which identifies the definition
within the crate.
Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which
makes them more stable across compilations.
- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate.
This allows us to drop the [`CrateNum`] part, and use the type system to ensure that
only local definitions are passed to functions that expect a local definition.
- A [`HirId`] uniquely identifies a node in the HIR of the current crate.
It is composed of two parts:
an `owner` and a `local_id` that is unique within the `owner`.
This combination makes for more stable values which are helpful for incremental compilation.
Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions,
but stays local to the current crate.
- A [`BodyId`] identifies a HIR [`Body`] in the current crate.
It is currently only a wrapper around a [`HirId`].
For more info about HIR bodies, please refer to the
[HIR chapter][hir-bodies].
These identifiers can be converted into one another through the `TyCtxt`.
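The relationships between these identifiers can be sketched as plain data. This is a hypothetical simplification; the real types live in `rustc_span` and `rustc_hir` and differ in detail:

```rust
// Illustrative shapes only, mirroring the descriptions above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CrateNum(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefIndex(u32);

// Identifies a definition in *any* crate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

// Known to be in the current crate, so the CrateNum can be dropped.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LocalDefId {
    local_def_index: DefIndex,
}

const LOCAL_CRATE: CrateNum = CrateNum(0);

impl LocalDefId {
    // One direction of the conversions mentioned above.
    fn to_def_id(self) -> DefId {
        DefId { krate: LOCAL_CRATE, index: self.local_def_index }
    }
}

fn main() {
    let local = LocalDefId { local_def_index: DefIndex(7) };
    let def_id = local.to_def_id();
    assert_eq!(def_id.krate, LOCAL_CRATE);
    assert_eq!(def_id.index, DefIndex(7));
}
```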
@@ -112,8 +121,8 @@ These identifiers can be converted into one another through the `TyCtxt`.
## HIR Operations
Most of the time when you are working with the HIR, you will do so via `TyCtxt`.
It contains a number of methods, defined in the `hir::map` module and
mostly prefixed with `hir_`, to convert between IDs of various kinds and to
lookup data associated with a HIR node.
@@ -126,8 +135,10 @@ You need a `LocalDefId`, rather than a `DefId`, since only local items have HIR
[local_def_id_to_hir_id]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.local_def_id_to_hir_id
Similarly, you can use [`tcx.hir_node(n)`][hir_node] to lookup the node for a
[`HirId`].
This returns an `Option<Node<'hir>>`, where [`Node`] is an enum
defined in the map.
By matching on this, you can find out what sort of
node the `HirId` referred to and also get a pointer to the data
itself. Often, you know what sort of node `n` is e.g. if you know
that `n` must be some HIR expression, you can do
@@ -148,8 +159,8 @@ calls like [`tcx.parent_hir_node(n)`][parent_hir_node].
## HIR Bodies
A [`rustc_hir::Body`] represents some kind of executable code, such as the body
of a function/closure or the definition of a constant.
Bodies are associated with an **owner**, which is typically some kind of item
(e.g. an `fn()` or `const`), but could also be a closure expression
(e.g. `|x, y| x + y`). You can use the `TyCtxt` to find the body
associated with a given def-id ([`hir_maybe_body_owned_by`]) or to find
@@ -108,19 +108,20 @@ Here is the list of passes as of <!-- date-check --> March 2023:
- `calculate-doc-coverage` calculates information used for the `--show-coverage`
flag.
- `check-doc-test-visibility` runs `doctest` visibility-related `lint`s.
This pass runs before `strip-private`,
which is why it needs to be separate from `run-lints`.
- `collect-intra-doc-links` resolves [intra-doc links](https://doc.rust-lang.org/nightly/rustdoc/write-documentation/linking-to-items-by-name.html).
- `collect-trait-impls` collects `trait` `impl`s for each item in the crate.
For example, if we define a `struct` that implements a `trait`,
this pass will note that the `struct` implements that `trait`.
- `propagate-doc-cfg` propagates `#[doc(cfg(...))]` to child items.
- `run-lints` runs some of `rustdoc`'s `lint`s, defined in `passes/lint`.
This is the last pass to run.
- `bare_urls` detects links that are not linkified, e.g., in Markdown such as
`Go to https://example.com/.` It suggests wrapping the link with angle brackets:
@@ -233,7 +234,8 @@ is complicated from two other constraints that `rustdoc` runs under:
configurations, such as `libstd` having a single package of docs that
cover all supported operating systems.
This means `rustdoc` has to be able to generate docs from `HIR`.
* Docs can inline across crates.
Since crate metadata doesn't contain `HIR`,
it must be possible to generate inlined docs from the `rustc_middle` data.
The "clean" [`AST`][ast] acts as a common output format for both input formats.
@@ -2,12 +2,13 @@
**In general, we expect every PR that fixes a bug in rustc to come accompanied
by a regression test of some kind.** This test should fail in `main` but pass
after the PR.
These tests are really useful for preventing us from repeating the
mistakes of the past.
The first thing to decide is which kind of test to add.
This will depend on the nature of the change and what you want to exercise.
Here are some rough guidelines:
- The majority of compiler tests are done with [compiletest].
- The majority of compiletest tests are [UI](ui.md) tests in the [`tests/ui`]
@@ -24,14 +25,17 @@ guidelines:
`library/${crate}tests/lib.rs`.
- If the code is part of an isolated system, and you are not testing compiler
output, consider using a [unit or integration test](intro.md#package-tests).
- Need to run rustdoc?
Prefer a `rustdoc` or `rustdoc-ui` test.
Occasionally you'll need `rustdoc-js` as well.
- Other compiletest test suites are generally used for special purposes:
- Need to run gdb or lldb?
Use the `debuginfo` test suite.
- Need to inspect LLVM IR or MIR IR?
Use the `codegen` or `mir-opt` test suites.
- Need to inspect the resulting binary in some way?
Or if all the other test suites are too limited for your purposes?
Then use `run-make`.
- Use `run-make-cargo` if you need to exercise in-tree `cargo` in conjunction
with in-tree `rustc`.
- Check out the [compiletest] chapter for more specialized test suites.
@@ -47,14 +51,16 @@ modified several years later, how can we make it easier for them?).
## UI test walkthrough
The following is a basic guide for creating a [UI test](ui.md), which is one of
the most common compiler tests.
For this tutorial, we'll be adding a test for an async error message.
### Step 1: Add a test file
The first step is to create a Rust source file somewhere in the [`tests/ui`]
tree.
When creating a test, do your best to find a good location and name (see
[Test organization](ui.md#test-organization) for more).
Since naming is the
hardest part of development, everything should be downhill from here!
Let's place our async test at `tests/ui/async-await/await-without-async.rs`:
@@ -77,19 +83,23 @@ A few things to notice about our test:
- The top should start with a short comment that [explains what the test is
for](#explanatory_comment).
- The `//@ edition:2018` comment is called a [directive](directives.md) which
provides instructions to compiletest on how to build the test.
Here we need to
set the edition for `async` to work (the default is edition 2015).
- Following that is the source of the test.
Try to keep it succinct and to the point.
This may require some effort if you are trying to minimize an example
from a bug report.
- We end this test with an empty `fn main` function.
This is because the default
for UI tests is a `bin` crate-type, and we don't want the "main not found"
error in our test.
Alternatively, you could add `#![crate_type="lib"]`.
### Step 2: Generate the expected output
The next step is to create the expected output snapshots from the compiler.
This can be done with the `--bless` option:
```sh
./x test tests/ui/async-await/await-without-async.rs --bless
```
@@ -99,8 +109,8 @@ This will build the compiler (if it hasn't already been built), compile the
test, and place the output of the compiler in a file called
`tests/ui/async-await/await-without-async.stderr`.
However, this step will fail!
You should see an error message, something like this:
> error: /rust/tests/ui/async-await/await-without-async.rs:7: unexpected
> error: '7:10: 7:16: `await` is only allowed inside `async` functions and
@@ -112,7 +122,8 @@ annotations in the source file.
### Step 3: Add error annotations
Every error needs to be annotated with a comment in the source with the text of
the error.
In this case, we can add the following comment to our test file:
```rust,ignore
fn bar() {
@@ -136,9 +147,10 @@ It should now pass, yay!
### Step 4: Review the output
Somewhat hand-in-hand with the previous step, you should inspect the `.stderr`
file that was created to see if it looks like how you expect.
If you are adding a new diagnostic message,
now would be a good time to also consider how readable the message looks overall,
particularly for people new to Rust.
Our example `tests/ui/async-await/await-without-async.stderr` file should look
like this:
@@ -161,9 +173,9 @@ You may notice some things look a little different than the regular compiler
output.
- The `$DIR` removes the path information which will differ between systems.
- The `LL` values replace the line numbers.
That helps avoid small changes in the source from triggering large diffs.
See the [Normalization](ui.md#normalization) section for more.
Around this stage, you may need to iterate over the last few steps a few times
to tweak your test, re-bless the test, and re-review the output.
@@ -171,8 +183,10 @@ to tweak your test, re-bless the test, and re-review the output.
### Step 5: Check other tests
Sometimes when adding or changing a diagnostic message, this will affect other
tests in the test suite.
The final step before posting a PR is to check if you
have affected anything else.
Running the UI suite is usually a good start:
```sh
./x test tests/ui
```
@@ -188,16 +202,18 @@ You may also need to re-bless the output with the `--bless` flag.
## Comment explaining what the test is about
The first comment of a test file should **summarize the point of the test**, and
highlight what is important about it.
If there is an issue number associated with the test, include the issue number.
This comment doesn't have to be super extensive.
Just something like the following might be enough:
"Regression test for #18060: match arms were matching in the wrong order".
These comments are very useful to others later on when your test breaks, since
they often can highlight what the problem is.
They are also useful if for some
reason the tests need to be refactored, since they let others know which parts
of the test were important.
Often a test must be rewritten because it no longer
tests what it was meant to test, and then it's useful to know what it *was*
meant to test exactly.
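As an illustration, a hypothetical test file following this convention might begin like this (the issue number and summary are taken from the example above; the test body itself is invented):

```rust
// Regression test for #18060: match arms were matching in the wrong order.

fn main() {
    let value = 5;
    // The guard arm is listed first, so it must be tried before the
    // literal arm below; matching in the wrong order would pick "literal".
    let result = match value {
        v if v > 3 => "guard",
        5 => "literal",
        _ => "other",
    };
    assert_eq!(result, "guard");
}
```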
@@ -1,20 +1,29 @@
# Having separate `Trait` and `Projection` bounds
Given `T: Foo<AssocA = u32, AssocB = i32>` where-bound, we currently lower it to a `Trait(Foo<T>)` and separate `Projection(<T as Foo>::AssocA, u32)` and `Projection(<T as Foo>::AssocB, i32)` bounds.
Why do we not represent this as a single `Trait(Foo<T>, [AssocA = u32, AssocB = i32])` bound instead?
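To ground the discussion, here is what such a bound looks like in surface Rust (a minimal invented example; `Foo` and `Concrete` are not from the compiler):

```rust
trait Foo {
    type AssocA;
    type AssocB;
}

struct Concrete;

impl Foo for Concrete {
    type AssocA = u32;
    type AssocB = i32;
}

// The single surface-level where-bound is lowered to a `Trait` bound on
// `Foo<T>` plus one `Projection` bound per associated type equality.
fn use_bound<T>(a: T::AssocA, b: T::AssocB) -> (u32, i32)
where
    T: Foo<AssocA = u32, AssocB = i32>,
{
    (a, b)
}

fn main() {
    assert_eq!(use_bound::<Concrete>(7, -3), (7, -3));
}
```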
The way we prove `Projection` bounds directly relies on proving the corresponding `Trait` bound: [old solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_trait_selection/src/traits/project.rs#L898) [new solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_next_trait_solver/src/solve/normalizes_to/mod.rs#L37-L41).
It feels like it might make more sense to just have a single implementation which checks whether a trait is implemented and returns (a way to compute) its associated types.
This is unfortunately quite difficult, as we may use a different candidate for normalization than for the corresponding trait bound.
See [alias-bound vs where-bound](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-always-consider-aliasbound-candidates) and [global where-bound vs impl](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-prefer-global-where-bounds-over-impls).
There are also some other subtle reasons for why we can't do so.
The most stupid is that, for rigid aliases,
trying to normalize them does not consider any lifetime constraints from proving the trait bound.
This is necessary due to a lack of assumptions on binders (see https://github.com/rust-lang/trait-system-refactor-initiative/issues/177) and should be fixed long term.
A separate issue is that, right now,
fetching the `type_of` associated types for `Trait` goals or in shadowed `Projection` candidates can cause query cycles for RPITIT.
See https://github.com/rust-lang/trait-system-refactor-initiative/issues/185.
There are also slight differences between candidates for some of the builtin impls; these all seem generally undesirable, and I consider them to be bugs which would be fixed if we had a unified approach here.
Finally, not having this split makes lowering where-clauses more annoying.
With the current system, having duplicate where-clauses is not an issue, and they can easily happen when elaborating super trait bounds.
We now need to make sure we merge all associated type constraints, e.g.:
```rust
trait Super {
@@ -36,4 +45,3 @@ trait Trait<'a>: Super<'a, A = i32> {}
// how to elaborate
// T: Trait<'a> + for<'b> Super<'b, B = u32>
```
@@ -24,12 +24,14 @@ Adt(&'tcx AdtDef, GenericArgs<'tcx>)
There are two parts:
- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
parameters.
In our example, this is the `MyStruct` part *without* the argument `u32`.
(Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
they are all represented using `TyKind::Adt`.)
- The [`GenericArgs`] is a list of values that are to be substituted
for the generic parameters.
In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`.
We'll dig more into generics and substitutions in a little bit.
[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
[`GenericArgs`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html
@@ -37,25 +39,29 @@ for the generic parameters. In our example of `MyStruct<u32>`, we would end up
### **`AdtDef` and `DefId`**
For every type defined in the source code, there is a unique `DefId` (see [this
chapter](../hir.md#identifiers-in-the-hir)).
This includes ADTs and generics.
In the `MyStruct<T>` definition we gave above,
there are two `DefId`s: one for `MyStruct` and one for `T`.
Notice that the code above does not generate a new `DefId` for `u32`
because it is not defined in that code (it is only referenced).
`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods.
There is essentially a one-to-one relationship between `AdtDef` and `DefId`.
You can get the `AdtDef` for a `DefId` with the [`tcx.adt_def(def_id)` query][adtdefq].
`AdtDef`s are all interned, as shown by the `'tcx` lifetime.
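As a sketch of how these pieces meet in compiler code (assuming a rustc-internal context with `tcx: TyCtxt<'tcx>` and some `ty: Ty<'tcx>` in scope; method names follow the current nightly API):

```rust,ignore
if let ty::Adt(adt_def, args) = ty.kind() {
    // `adt_def` describes the definition itself (e.g. `MyStruct`):
    // its fields, variants, and flags, but no concrete type arguments.
    // `args` holds the generic arguments, e.g. `[u32]` for `MyStruct<u32>`.
    let did = adt_def.did();
    // Because `AdtDef`s are interned, the query returns the same value.
    assert_eq!(*adt_def, tcx.adt_def(did));
}
```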
[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def
## Question: Why not substitute “inside” the `AdtDef`?
Recall that we represent a generic struct with `(AdtDef, args)`.
So why bother with this scheme?
Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the `AdtDef` where all the types are already substituted.
This seems like less of a hassle.
However, the `(AdtDef, args)` scheme has some advantages over this.
First, the `(AdtDef, args)` scheme has an efficiency win:
@@ -68,7 +74,8 @@ struct MyStruct<T> {
```
In an example like this, we can instantiate `MyStruct<A>` as `MyStruct<B>` (and so on) very cheaply,
by just replacing the one reference to `A` with `B`.
But if we eagerly instantiated all the fields,
that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
and update all of their types.
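The efficiency argument can be illustrated with a runnable analogy (this is not rustc's actual representation; all names here are invented): because the definition is referenced rather than copied, instantiation only touches the argument list.

```rust
#[derive(Debug, PartialEq)]
struct AdtLikeTy {
    // Stands in for the interned `AdtDef`: shared, never rewritten.
    def_name: &'static str,
    // Stands in for the `GenericArgs` list.
    args: Vec<&'static str>,
}

// Instantiating with different arguments swaps only the argument list;
// the (conceptually shared) definition is untouched.
fn instantiate(def_name: &'static str, args: Vec<&'static str>) -> AdtLikeTy {
    AdtLikeTy { def_name, args }
}

fn main() {
    let a = instantiate("MyStruct", vec!["A"]);
    let b = instantiate("MyStruct", vec!["B"]);
    assert_eq!(a.def_name, b.def_name); // same definition
    assert_ne!(a.args, b.args); // different arguments
}
```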
@@ -83,7 +90,9 @@ definition of that name, and not carried along “within” the type itself).
Given a generic type `MyType<A, B, …>`, we have to store the list of generic arguments for `MyType`.
In rustc this is done using [`GenericArgs`].
`GenericArgs` is a thin pointer to a slice of [`GenericArg`] representing a list of generic arguments for a generic item.
For example, given a `struct HashMap<K, V>` with two type parameters, `K` and `V`, the `GenericArgs` used to represent the type `HashMap<i32, u32>` would be represented by `&'tcx [tcx.types.i32, tcx.types.u32]`.
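As a sketch, constructing that list inside the compiler might look like the following (assuming a `tcx: TyCtxt<'tcx>` in scope; `mk_args` is the interning constructor on `TyCtxt` at the time of writing):

```rust,ignore
// Interns `[i32, u32]` as the `GenericArgs` for `HashMap<i32, u32>`.
let args = tcx.mk_args(&[tcx.types.i32.into(), tcx.types.u32.into()]);
```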
`GenericArg` is conceptually an `enum` with three variants, one for type arguments, one for const arguments and one for lifetime arguments.
In practice that is actually represented by [`GenericArgKind`], and [`GenericArg`] is a more space-efficient version that has a method to
@@ -146,7 +155,8 @@ The construct `MyStruct::<u32>::func::<bool, char>` is represented by a tuple: a
The [`ty::Generics`] type (returned by the [`generics_of`] query) contains the information of how a nested hierarchy
gets flattened down to a list, and lets you figure out which index in the `GenericArgs` list corresponds to which
generic.
The general theme of how it works is outermost to innermost (`T` before `T2` in the example), left to right
(`T2` before `T3`), but there are several complications:
- Traits have an implicit `Self` generic parameter which is the first (i.e. 0th) generic parameter. Note that `Self` doesn't mean a generic parameter in all situations, see [Res::SelfTyAlias](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfTyAlias) and [Res::SelfCtor](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfCtor).