Merge pull request #2825 from rust-lang/tshepang/sembr

sembr a few files
This commit is contained in:
Tshepang Mbambo
2026-04-08 09:01:44 +02:00
committed by GitHub
7 changed files with 225 additions and 148 deletions
@@ -1,8 +1,8 @@
# Code Index
rustc has a lot of important data structures.
This is an attempt to give some guidance on where to learn more
about some of the key data structures of the compiler.
Item | Kind | Short description | Chapter | Declaration
----------------|----------|-----------------------------|--------------------|-------------------
@@ -1,8 +1,9 @@
# Interpreter
The interpreter is a virtual machine for executing MIR without compiling to
machine code.
It is usually invoked via `tcx.const_eval_*` functions.
The interpreter is shared between the compiler (for compile-time function
evaluation, CTFE) and the tool [Miri](https://github.com/rust-lang/miri/), which
uses the same virtual machine to detect Undefined Behavior in (unsafe) Rust
code.
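As a concrete illustration, CTFE is what makes the following plain Rust compile and run; a minimal sketch that does not touch any internal API:

```rust
// The interpreter evaluates `len()` at compile time (CTFE),
// so `N` is a constant usable as an array length.
const fn len() -> usize {
    1 + 2
}

const N: usize = len();

fn main() {
    let arr = [0u8; N];
    assert_eq!(arr.len(), 3);
}
```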
@@ -26,7 +27,8 @@ The compiler needs to figure out the length of the array before being able to
create items that use the type (locals, constants, function arguments, ...).
To obtain the (in this case empty) parameter environment, one can call
`let param_env = tcx.param_env(length_def_id);`.
The `GlobalId` needed is
```rust,ignore
let gid = GlobalId {
@@ -36,7 +38,8 @@ let gid = GlobalId {
```
Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
the MIR of the array length expression.
The MIR will look something like this:
```mir
Foo::{{constant}}#0: usize = {
@@ -59,35 +62,43 @@ Before the evaluation, a virtual memory location (in this case essentially a
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
At the start of the evaluation, `_0` and `_1` are
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`.
This is quite
a mouthful: [`Operand`] can represent either data stored somewhere in the
[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
immediate data stored in-line.
And [`Immediate`] can either be a single
(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
or a pair of two of them.
In our case, the single scalar value is *not* (yet) initialized.
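A hypothetical, much-simplified sketch of the value shapes just described; the names mirror rustc's, but the definitions here are illustrative only:

```rust
// Simplified model of the interpreter's value representation;
// rustc's real types carry sizes, provenance, and more.
#[derive(Debug, PartialEq)]
enum ScalarMaybeUndef {
    Undef,     // not yet initialized
    Raw(u128), // a concrete integer (size elided)
}

#[derive(Debug, PartialEq)]
enum Immediate {
    Scalar(ScalarMaybeUndef),
    ScalarPair(ScalarMaybeUndef, ScalarMaybeUndef),
}

#[derive(Debug, PartialEq)]
enum Operand {
    Immediate(Immediate), // small data stored in-line
    Indirect(u64),        // data stored in interpreter memory (stand-in)
}

fn main() {
    // At the start of the evaluation, `_0` and `_1` look like this:
    let local = Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef));
    assert_eq!(
        local,
        Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))
    );
}
```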
When the initialization of `_1` is invoked, the value of the `FOO` constant is
required, and triggers another call to `tcx.const_eval_*`, which will not be shown
here.
If the evaluation of FOO is successful, `42` will be subtracted from its
value `4096` and the result stored in `_1` as
`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
Scalar::Raw { data: 0, .. })`.
The first part of the pair is the computed value,
the second part is a bool that's true if an overflow happened.
A `Scalar::Raw`
also stores the size (in bytes) of this scalar value; we are eliding that here.
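The same value-plus-overflow-flag pair is visible in surface Rust via `overflowing_sub`, which computes essentially what the checked subtraction above produces:

```rust
fn main() {
    // Mirrors the ScalarPair described above: (computed value, overflow flag).
    let (value, overflowed) = 4096u64.overflowing_sub(42);
    assert_eq!(value, 4054);
    assert!(!overflowed); // the MIR assertion checks this flag is 0
}
```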
The next statement asserts that said boolean is `0`.
In case the assertion
fails, its error message is used for reporting a compile-time error.
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
data: 4054, .. }))` is stored in the virtual memory it was allocated before the
evaluation.
`_0` always refers to that location directly.
After the evaluation is done, the return value is converted from [`Operand`] to
[`ConstValue`] by [`op_to_const`]: the former representation is geared towards
what is needed *during* const evaluation, while [`ConstValue`] is shaped by the
needs of the remaining parts of the compiler that consume the results of const
evaluation.
As part of this conversion, for types with scalar values, even if
the resulting [`Operand`] is `Indirect`, it will return an immediate
`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::Indirect`).
This makes using the result much more efficient and also more convenient, as no
@@ -107,12 +118,13 @@ the interpreter, but just use the cached result.
The interpreter's outside-facing datastructures can be found in
[rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_middle/src/mir/interpret).
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types.
A `ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
pointer), `Slice` (to represent byte slices and strings, as needed for pattern
matching) or `Indirect`, which is used for anything else and refers to a virtual
allocation.
These allocations can be accessed via the methods on `tcx.interpret_interner`.
A `Scalar` is either some `Raw` integer or a pointer;
see [the next section](#memory) for more on that.
If you are expecting a numeric result, you can use `eval_usize` (panics on
@@ -122,29 +134,38 @@ in an `Option<u64>` yielding the `Scalar` if possible.
## Memory
To support any kind of pointers, the interpreter needs to have a "virtual memory" that the
pointers can point to.
This is implemented in the [`Memory`] type.
In the simplest model, every global variable, stack variable and every dynamic
allocation corresponds to an [`Allocation`] in that memory.
(Actually using an
allocation for every MIR stack variable would be very inefficient; that's why we
have `Operand::Immediate` for stack variables that are both small and never have
their address taken.
But that is purely an optimization.)
Such an `Allocation` is basically just a sequence of `u8` storing the value of
each byte in this allocation.
(Plus some extra data, see below.) Every
`Allocation` has a globally unique `AllocId` assigned in `Memory`.
With that, a
[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and
an offset into the allocation (indicating which byte of the allocation the
pointer points to).
It may seem odd that a `Pointer` is not just an integer
address, but remember that during const evaluation, we cannot know at which
actual integer address the allocation will end up -- so we use `AllocId` as
symbolic base addresses, which means we need a separate offset.
(As an aside,
it turns out that pointers at run-time are
[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
These allocations exist so that references and raw pointers have something to
point to.
There is no global linear heap in which things are allocated, but each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size.
So if you have a
pointer to an allocation for a local variable `a`, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to a different local variable `b`.
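A toy model of this addressing scheme; illustrative only, rustc's actual `Pointer` and `AllocId` differ in detail:

```rust
// Symbolic base address + offset, as described above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct AllocId(u64);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Pointer {
    alloc_id: AllocId, // which allocation we point into
    offset: u64,       // which byte of that allocation
}

impl Pointer {
    // Pointer arithmetic only ever changes the offset;
    // the AllocId (the allocation) stays the same.
    fn wrapping_add(self, n: u64) -> Pointer {
        Pointer { offset: self.offset.wrapping_add(n), ..self }
    }
}

fn main() {
    let a = Pointer { alloc_id: AllocId(1), offset: 0 };
    let b = a.wrapping_add(8);
    assert_eq!(b.alloc_id, a.alloc_id); // still points into the same allocation
    assert_eq!(b.offset, 8);
}
```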
@@ -152,31 +173,35 @@ Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays
This, however, causes a problem when we want to store a `Pointer` into an
`Allocation`: we cannot turn it into a sequence of `u8` of the right length!
`AllocId` and offset together are twice as big as a pointer "seems" to be.
This is what the `relocation` field of `Allocation` is for: the byte offset of the
`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
out-of-band.
The two are reassembled when the `Pointer` is read from memory.
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
track of which of its bytes are initialized.
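Putting the last two paragraphs together, the storage scheme can be sketched like this; a simplified model under stated assumptions, not rustc's real `Allocation`:

```rust
use std::collections::HashMap;

// Simplified model: raw bytes plus the out-of-band data described above.
struct Allocation {
    bytes: Vec<u8>,                   // the value of each byte
    relocations: HashMap<usize, u64>, // byte offset -> AllocId, stored out-of-band
    undef_mask: Vec<bool>,            // which bytes are initialized
}

fn main() {
    // Store an 8-byte "pointer": its offset goes into `bytes`,
    // its AllocId into `relocations`.
    let ptr_offset: u64 = 16;
    let alloc = Allocation {
        bytes: ptr_offset.to_le_bytes().to_vec(),
        relocations: HashMap::from([(0, 42)]), // AllocId 42 recorded at byte 0
        undef_mask: vec![true; 8],
    };

    // Reading reassembles the (AllocId, offset) pair.
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&alloc.bytes[..8]);
    let offset = u64::from_le_bytes(buf);
    assert_eq!((alloc.relocations[&0], offset), (42, 16));
    assert!(alloc.undef_mask.iter().all(|&b| b));
}
```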
### Global memory and exotic allocations
`Memory` exists only during evaluation; it gets destroyed when the
final value of the constant is computed.
In case that constant contains any
pointers, those get "interned" and moved to a global "const eval memory" that is
part of `TyCtxt`.
These allocations stay around for the remaining computation
and get serialized into the final output (so that dependent crates can use
them).
Moreover, to also support function pointers, the global memory in `TyCtxt` can
also contain "virtual allocations": instead of an `Allocation`, these contain an
`Instance`.
That allows a `Pointer` to point to either normal data or a
function, which is needed to be able to evaluate casts from function pointers to
raw pointers.
Finally, the [`GlobalAlloc`] type used in the global memory also contains a
variant `Static` that points to a particular `const` or `static` item.
This is needed to support circular statics, where we need to have a `Pointer` to a
`static` for which we cannot yet have an `Allocation` as we do not know the
bytes of its value.
@@ -188,17 +213,19 @@ bytes of its value.
### Pointer values vs Pointer types
One common cause of confusion in the interpreter is that being a pointer *value* and having
a pointer *type* are entirely independent properties.
By "pointer value", we
refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
the interpreter's virtual memory.
This is in contrast to `Scalar::Raw`, which is just some concrete integer.
However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
does not have to have a pointer *value*: it could be obtained by casting or
transmuting an integer to a pointer.
And similarly, when casting or transmuting a reference to some
actual allocation to an integer, we end up with a pointer *value*
(`Scalar::Ptr`) at integer *type* (`usize`).
This is a problem because we
cannot meaningfully perform integer operations such as division on pointer
values.
@@ -207,30 +234,33 @@ values.
Although the main entry point to constant evaluation is the `tcx.const_eval_*`
functions, there are additional functions in
[rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html)
that allow accessing the fields of a `ConstValue` (`Indirect` or otherwise).
You should
never have to access an `Allocation` directly except for translating it to the
compilation target (at the moment just LLVM).
The interpreter starts by creating a virtual stack frame for the current constant that is
being evaluated.
There's essentially no difference between a constant and a
function with no arguments, except that constants do not allow local (named)
variables at the time of writing this guide.
A stack frame is defined by the `Frame` type in
[rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/eval_context.rs)
and contains all the local variables memory (`None` at the start of evaluation).
Each frame refers to the
evaluation of either the root constant or subsequent calls to `const fn`.
The evaluation of another constant simply calls `tcx.const_eval_*`, which produces an
entirely new and independent stack frame.
The frames are just a `Vec<Frame>`, there's no way to actually refer to a
`Frame`'s memory even if horrible shenanigans are done via unsafe code.
The only memory that can be referred to are `Allocation`s.
The interpreter now calls the `step` method (in
[rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/step.rs)
) until it either returns an error or has no further statements to execute.
Each statement will now initialize or modify the locals or the virtual memory
referred to by a local.
This might require evaluating other constants or
statics, which just recursively invokes `tcx.const_eval_*`.
@@ -1,13 +1,15 @@
# The HIR
The HIR ("High-Level Intermediate Representation") is the primary IR used
in most of rustc.
It is a compiler-friendly representation of the abstract
syntax tree (AST) that is generated after parsing, macro expansion, and name
resolution (see [Lowering](./hir/lowering.md) for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with
the exception that some of Rust's expression forms have been desugared away.
For example, `for` loops are converted into a `loop` and do not appear in
the HIR.
This makes HIR more amenable to analysis than a normal AST.
This chapter covers the main concepts of the HIR.
@@ -30,7 +32,8 @@ cargo rustc -- -Z unpretty=hir
The top-level data-structure in the HIR is the [`Crate`], which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate).
Whereas in the AST the crate
data structure basically just contains the root module, the HIR
`Crate` structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.
@@ -39,8 +42,8 @@ serve to organize the content of the crate for easier access.
For example, the contents of individual items (e.g. modules,
functions, traits, impls, etc) in the HIR are not immediately
accessible in the parents.
So, for example, if there is a module item `foo` containing a function `bar()`:
```rust
mod foo {
@@ -49,8 +52,8 @@ mod foo {
```
then in the HIR the representation of module `foo` (the [`Mod`]
struct) would only have the **`ItemId`** `I` of `bar()`.
To get the details of the function `bar()`, we would lookup `I` in the
`items` map.
[`Mod`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Mod.html
@@ -62,9 +65,11 @@ There are similar maps for things like trait items and impl items,
as well as "bodies" (explained below).
The other reason to set up the representation this way is for better
integration with incremental compilation.
This way, if you gain access
to an [`&rustc_hir::Item`] (e.g. for the mod `foo`), you do not immediately
gain access to the contents of the function `bar()`.
Instead, you only
gain access to the **id** for `bar()`, and you must invoke some
function to lookup the contents of `bar()` given its id; this gives
the compiler a chance to observe that you accessed the data for
@@ -79,23 +84,27 @@ the compiler a chance to observe that you accessed the data for
The HIR uses a bunch of different identifiers that coexist and serve different purposes.
- A [`DefId`], as the name suggests, identifies a particular definition, or top-level
item, in a given crate.
It is composed of two parts: a [`CrateNum`] which identifies
the crate the definition comes from, and a [`DefIndex`] which identifies the definition
within the crate.
Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which
makes them more stable across compilations.
- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate.
This allows us to drop the [`CrateNum`] part, and use the type system to ensure that
only local definitions are passed to functions that expect a local definition.
- A [`HirId`] uniquely identifies a node in the HIR of the current crate.
It is composed of two parts:
an `owner` and a `local_id` that is unique within the `owner`.
This combination makes for more stable values which are helpful for incremental compilation.
Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions,
but stays local to the current crate.
- A [`BodyId`] identifies a HIR [`Body`] in the current crate.
It is currently only a wrapper around a [`HirId`].
For more info about HIR bodies, please refer to the
[HIR chapter][hir-bodies].
These identifiers can be converted into one another through the `TyCtxt`.
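The relationships between these identifiers can be sketched as plain data. This is a hypothetical simplification; the real types live in `rustc_span` and `rustc_hir` and differ in detail:

```rust
// Illustrative shapes only, mirroring the descriptions above.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct CrateNum(u32);

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefIndex(u32);

// Identifies a definition in *any* crate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

// Known to be in the current crate, so the CrateNum can be dropped.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct LocalDefId {
    local_def_index: DefIndex,
}

const LOCAL_CRATE: CrateNum = CrateNum(0);

impl LocalDefId {
    // One direction of the conversions mentioned above.
    fn to_def_id(self) -> DefId {
        DefId { krate: LOCAL_CRATE, index: self.local_def_index }
    }
}

fn main() {
    let local = LocalDefId { local_def_index: DefIndex(7) };
    let def_id = local.to_def_id();
    assert_eq!(def_id.krate, LOCAL_CRATE);
    assert_eq!(def_id.index, DefIndex(7));
}
```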
@@ -112,8 +121,8 @@ These identifiers can be converted into one another through the `TyCtxt`.
## HIR Operations
Most of the time when you are working with the HIR, you will do so via `TyCtxt`.
It contains a number of methods, defined in the `hir::map` module and
mostly prefixed with `hir_`, to convert between IDs of various kinds and to
lookup data associated with a HIR node.
@@ -126,8 +135,10 @@ You need a `LocalDefId`, rather than a `DefId`, since only local items have HIR
[local_def_id_to_hir_id]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.local_def_id_to_hir_id
Similarly, you can use [`tcx.hir_node(n)`][hir_node] to lookup the node for a
[`HirId`].
This returns an `Option<Node<'hir>>`, where [`Node`] is an enum
defined in the map.
By matching on this, you can find out what sort of
node the `HirId` referred to and also get a pointer to the data
itself. Often, you know what sort of node `n` is e.g. if you know
that `n` must be some HIR expression, you can do
@@ -148,8 +159,8 @@ calls like [`tcx.parent_hir_node(n)`][parent_hir_node].
## HIR Bodies
A [`rustc_hir::Body`] represents some kind of executable code, such as the body
of a function/closure or the definition of a constant.
Bodies are associated with an **owner**, which is typically some kind of item
(e.g. an `fn()` or `const`), but could also be a closure expression
(e.g. `|x, y| x + y`). You can use the `TyCtxt` to find the body
associated with a given def-id ([`hir_maybe_body_owned_by`]) or to find
@@ -108,19 +108,20 @@ Here is the list of passes as of <!-- date-check --> March 2023:
- `calculate-doc-coverage` calculates information used for the `--show-coverage`
flag.
- `check-doc-test-visibility` runs `doctest` visibility-related `lint`s.
This pass runs before `strip-private`,
which is why it needs to be separate from `run-lints`.
- `collect-intra-doc-links` resolves [intra-doc links](https://doc.rust-lang.org/nightly/rustdoc/write-documentation/linking-to-items-by-name.html).
- `collect-trait-impls` collects `trait` `impl`s for each item in the crate.
For example, if we define a `struct` that implements a `trait`,
this pass will note that the `struct` implements that `trait`.
- `propagate-doc-cfg` propagates `#[doc(cfg(...))]` to child items.
- `run-lints` runs some of `rustdoc`'s `lint`s, defined in `passes/lint`.
This is the last pass to run.
- `bare_urls` detects links that are not linkified, e.g., in Markdown such as
`Go to https://example.com/.` It suggests wrapping the link with angle brackets:
@@ -233,7 +234,8 @@ is complicated from two other constraints that `rustdoc` runs under:
configurations, such as `libstd` having a single package of docs that
cover all supported operating systems.
This means `rustdoc` has to be able to generate docs from `HIR`.
* Docs can inline across crates.
Since crate metadata doesn't contain `HIR`,
it must be possible to generate inlined docs from the `rustc_middle` data.
The "clean" [`AST`][ast] acts as a common output format for both input formats.
@@ -2,12 +2,13 @@
**In general, we expect every PR that fixes a bug in rustc to come accompanied
by a regression test of some kind.** This test should fail in `main` but pass
after the PR.
These tests are really useful for preventing us from repeating the
mistakes of the past.
The first thing to decide is which kind of test to add.
This will depend on the nature of the change and what you want to exercise.
Here are some rough guidelines:
- The majority of compiler tests are done with [compiletest].
- The majority of compiletest tests are [UI](ui.md) tests in the [`tests/ui`]
@@ -24,14 +25,17 @@ guidelines:
`library/${crate}tests/lib.rs`.
- If the code is part of an isolated system, and you are not testing compiler
output, consider using a [unit or integration test](intro.md#package-tests).
- Need to run rustdoc?
Prefer a `rustdoc` or `rustdoc-ui` test.
Occasionally you'll need `rustdoc-js` as well.
- Other compiletest test suites are generally used for special purposes:
- Need to run gdb or lldb?
Use the `debuginfo` test suite.
- Need to inspect LLVM IR or MIR IR?
Use the `codegen` or `mir-opt` test suites.
- Need to inspect the resulting binary in some way?
Or if all the other test suites are too limited for your purposes?
Then use `run-make`.
- Use `run-make-cargo` if you need to exercise in-tree `cargo` in conjunction
with in-tree `rustc`.
- Check out the [compiletest] chapter for more specialized test suites.
@@ -47,14 +51,16 @@ modified several years later, how can we make it easier for them?).
## UI test walkthrough
The following is a basic guide for creating a [UI test](ui.md), which is one of
the most common compiler tests.
For this tutorial, we'll be adding a test for an async error message.
### Step 1: Add a test file
The first step is to create a Rust source file somewhere in the [`tests/ui`]
tree.
When creating a test, do your best to find a good location and name (see
[Test organization](ui.md#test-organization) for more).
Since naming is the
hardest part of development, everything should be downhill from here!
Let's place our async test at `tests/ui/async-await/await-without-async.rs`:
@@ -77,19 +83,23 @@ A few things to notice about our test:
- The top should start with a short comment that [explains what the test is
for](#explanatory_comment).
- The `//@ edition:2018` comment is called a [directive](directives.md) which
provides instructions to compiletest on how to build the test.
Here we need to
set the edition for `async` to work (the default is edition 2015).
- Following that is the source of the test.
Try to keep it succinct and to the point.
This may require some effort if you are trying to minimize an example
from a bug report.
- We end this test with an empty `fn main` function.
This is because the default
for UI tests is a `bin` crate-type, and we don't want the "main not found"
error in our test.
Alternatively, you could add `#![crate_type="lib"]`.
### Step 2: Generate the expected output
The next step is to create the expected output snapshots from the compiler.
This can be done with the `--bless` option:
```sh
./x test tests/ui/async-await/await-without-async.rs --bless
```
@@ -99,8 +109,8 @@ This will build the compiler (if it hasn't already been built), compile the
test, and place the output of the compiler in a file called
`tests/ui/async-await/await-without-async.stderr`.
However, this step will fail!
You should see an error message, something like this:
> error: /rust/tests/ui/async-await/await-without-async.rs:7: unexpected
> error: '7:10: 7:16: `await` is only allowed inside `async` functions and
@@ -112,7 +122,8 @@ annotations in the source file.
### Step 3: Add error annotations
Every error needs to be annotated with a comment in the source with the text of
the error.
In this case, we can add the following comment to our test file:
```rust,ignore
fn bar() {
@@ -136,9 +147,10 @@ It should now pass, yay!
### Step 4: Review the output
Somewhat hand-in-hand with the previous step, you should inspect the `.stderr`
file that was created to see if it looks like how you expect.
If you are adding a new diagnostic message,
now would be a good time to also consider how readable the message looks overall,
particularly for people new to Rust.
Our example `tests/ui/async-await/await-without-async.stderr` file should look
like this:
@@ -161,9 +173,9 @@ You may notice some things look a little different than the regular compiler
output.
- The `$DIR` removes the path information which will differ between systems.
- The `LL` values replace the line numbers.
That helps avoid small changes in the source from triggering large diffs.
See the [Normalization](ui.md#normalization) section for more.
Around this stage, you may need to iterate over the last few steps a few times
to tweak your test, re-bless the test, and re-review the output.
@@ -171,8 +183,10 @@ to tweak your test, re-bless the test, and re-review the output.
### Step 5: Check other tests
Sometimes when adding or changing a diagnostic message, this will affect other
tests in the test suite.
The final step before posting a PR is to check if you
have affected anything else.
Running the UI suite is usually a good start:
```sh
./x test tests/ui
```
@@ -188,16 +202,18 @@ You may also need to re-bless the output with the `--bless` flag.
## Comment explaining what the test is about
The first comment of a test file should **summarize the point of the test**, and
highlight what is important about it.
If there is an issue number associated with the test, include the issue number.
This comment doesn't have to be super extensive.
Just something like the following might be enough:
"Regression test for #18060: match arms were matching in the wrong order".
These comments are very useful to others later on when your test breaks, since
they often can highlight what the problem is.
They are also useful if for some
reason the tests need to be refactored, since they let others know which parts
of the test were important.
Often a test must be rewritten because it no longer
tests what it was meant to test, and then it's useful to know what it *was*
meant to test exactly.
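As an illustration, a hypothetical test file following this convention might begin like this (the issue number and summary are taken from the example above; the test body itself is invented):

```rust
// Regression test for #18060: match arms were matching in the wrong order.

fn main() {
    let value = 5;
    // The guard arm is listed first, so it must be tried before the
    // literal arm below; matching in the wrong order would pick "literal".
    let result = match value {
        v if v > 3 => "guard",
        5 => "literal",
        _ => "other",
    };
    assert_eq!(result, "guard");
}
```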
@@ -1,20 +1,29 @@
# Having separate `Trait` and `Projection` bounds
Given `T: Foo<AssocA = u32, AssocB = i32>` where-bound, we currently lower it to a `Trait(Foo<T>)` and separate `Projection(<T as Foo>::AssocA, u32)` and `Projection(<T as Foo>::AssocB, i32)` bounds.
Why do we not represent this as a single `Trait(Foo<T>, [AssocA = u32, AssocB = i32])` bound instead?
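To ground the discussion, here is what such a bound looks like in surface Rust (a minimal invented example; `Foo` and `Concrete` are not from the compiler):

```rust
trait Foo {
    type AssocA;
    type AssocB;
}

struct Concrete;

impl Foo for Concrete {
    type AssocA = u32;
    type AssocB = i32;
}

// The single surface-level where-bound is lowered to a `Trait` bound on
// `Foo<T>` plus one `Projection` bound per associated type equality.
fn use_bound<T>(a: T::AssocA, b: T::AssocB) -> (u32, i32)
where
    T: Foo<AssocA = u32, AssocB = i32>,
{
    (a, b)
}

fn main() {
    assert_eq!(use_bound::<Concrete>(7, -3), (7, -3));
}
```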
The way we prove `Projection` bounds directly relies on proving the corresponding `Trait` bound: [old solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_trait_selection/src/traits/project.rs#L898) [new solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_next_trait_solver/src/solve/normalizes_to/mod.rs#L37-L41).
It feels like it might make more sense to just have a single implementation which checks whether a trait is implemented and returns (a way to compute) its associated types.
This is unfortunately quite difficult, as we may use a different candidate for normalization than for the corresponding trait bound.
See [alias-bound vs where-bound](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-always-consider-aliasbound-candidates) and [global where-bound vs impl](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-prefer-global-where-bounds-over-impls).
There are also some other subtle reasons for why we can't do so.
The most stupid is that, for rigid aliases,
trying to normalize them does not consider any lifetime constraints from proving the trait bound.
This is necessary due to a lack of assumptions on binders (see https://github.com/rust-lang/trait-system-refactor-initiative/issues/177) and should be fixed long term.
A separate issue is that, right now,
fetching the `type_of` associated types for `Trait` goals or in shadowed `Projection` candidates can cause query cycles for RPITIT.
See https://github.com/rust-lang/trait-system-refactor-initiative/issues/185.
There are also slight differences between candidates for some of the builtin impls; these all seem generally undesirable, and I consider them to be bugs which would be fixed if we had a unified approach here.
Finally, not having this split makes lowering where-clauses more annoying.
With the current system, having duplicate where-clauses is not an issue, and they can easily happen when elaborating super trait bounds.
We now need to make sure we merge all associated type constraints, e.g.:
```rust
trait Super {
@@ -36,4 +45,3 @@ trait Trait<'a>: Super<'a, A = i32> {}
// how to elaborate
// T: Trait<'a> + for<'b> Super<'b, B = u32>
```
@@ -24,12 +24,14 @@ Adt(&'tcx AdtDef, GenericArgs<'tcx>)
There are two parts:
- The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type
parameters.
In our example, this is the `MyStruct` part *without* the argument `u32`.
(Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`,
they are all represented using `TyKind::Adt`.)
- The [`GenericArgs`] is a list of values that are to be substituted
for the generic parameters.
In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`.
We'll dig more into generics and substitutions in a little bit.
[adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html
[`GenericArgs`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html
@@ -37,25 +39,29 @@ for the generic parameters. In our example of `MyStruct<u32>`, we would end up
### **`AdtDef` and `DefId`**
For every type defined in the source code, there is a unique `DefId` (see [this
chapter](../hir.md#identifiers-in-the-hir)).
This includes ADTs and generics.
In the `MyStruct<T>` definition we gave above,
there are two `DefId`s: one for `MyStruct` and one for `T`.
Notice that the code above does not generate a new `DefId` for `u32`
because it is not defined in that code (it is only referenced).
`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods.
There is essentially a one-to-one relationship between `AdtDef` and `DefId`.
You can get the `AdtDef` for a `DefId` with the [`tcx.adt_def(def_id)` query][adtdefq].
`AdtDef`s are all interned, as shown by the `'tcx` lifetime.
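As a sketch of how these pieces meet in compiler code (assuming a rustc-internal context with `tcx: TyCtxt<'tcx>` and some `ty: Ty<'tcx>` in scope; method names follow the current nightly API):

```rust,ignore
if let ty::Adt(adt_def, args) = ty.kind() {
    // `adt_def` describes the definition itself (e.g. `MyStruct`):
    // its fields, variants, and flags, but no concrete type arguments.
    // `args` holds the generic arguments, e.g. `[u32]` for `MyStruct<u32>`.
    let did = adt_def.did();
    // Because `AdtDef`s are interned, the query returns the same value.
    assert_eq!(*adt_def, tcx.adt_def(did));
}
```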
[adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def
## Question: Why not substitute “inside” the `AdtDef`?
Recall that we represent a generic struct with `(AdtDef, args)`.
So why bother with this scheme?
Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the `AdtDef` where all the types are already substituted.
This seems like less of a hassle.
However, the `(AdtDef, args)` scheme has some advantages over this.
First, the `(AdtDef, args)` scheme has an efficiency win:
@@ -68,7 +74,8 @@ struct MyStruct<T> {
```
In an example like this, we can instantiate `MyStruct<A>` as `MyStruct<B>` (and so on) very cheaply,
by just replacing the one reference to `A` with `B`.
But if we eagerly instantiated all the fields,
that could be a lot more work because we might have to go through all of the fields in the `AdtDef`
and update all of their types.
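The efficiency argument can be illustrated with a runnable analogy (this is not rustc's actual representation; all names here are invented): because the definition is referenced rather than copied, instantiation only touches the argument list.

```rust
#[derive(Debug, PartialEq)]
struct AdtLikeTy {
    // Stands in for the interned `AdtDef`: shared, never rewritten.
    def_name: &'static str,
    // Stands in for the `GenericArgs` list.
    args: Vec<&'static str>,
}

// Instantiating with different arguments swaps only the argument list;
// the (conceptually shared) definition is untouched.
fn instantiate(def_name: &'static str, args: Vec<&'static str>) -> AdtLikeTy {
    AdtLikeTy { def_name, args }
}

fn main() {
    let a = instantiate("MyStruct", vec!["A"]);
    let b = instantiate("MyStruct", vec!["B"]);
    assert_eq!(a.def_name, b.def_name); // same definition
    assert_ne!(a.args, b.args); // different arguments
}
```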
@@ -83,7 +90,9 @@ definition of that name, and not carried along “within” the type itself).
Given a generic type `MyType<A, B, …>`, we have to store the list of generic arguments for `MyType`.
In rustc this is done using [`GenericArgs`].
`GenericArgs` is a thin pointer to a slice of [`GenericArg`] representing a list of generic arguments for a generic item.
For example, given a `struct HashMap<K, V>` with two type parameters, `K` and `V`, the `GenericArgs` used to represent the type `HashMap<i32, u32>` would be represented by `&'tcx [tcx.types.i32, tcx.types.u32]`.
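As a sketch, constructing that list inside the compiler might look like the following (assuming a `tcx: TyCtxt<'tcx>` in scope; `mk_args` is the interning constructor on `TyCtxt` at the time of writing):

```rust,ignore
// Interns `[i32, u32]` as the `GenericArgs` for `HashMap<i32, u32>`.
let args = tcx.mk_args(&[tcx.types.i32.into(), tcx.types.u32.into()]);
```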
`GenericArg` is conceptually an `enum` with three variants, one for type arguments, one for const arguments and one for lifetime arguments.
In practice that is actually represented by [`GenericArgKind`], and [`GenericArg`] is a more space-efficient version that has a method to
@@ -146,7 +155,8 @@ The construct `MyStruct::<u32>::func::<bool, char>` is represented by a tuple: a
The [`ty::Generics`] type (returned by the [`generics_of`] query) contains the information of how a nested hierarchy
gets flattened down to a list, and lets you figure out which index in the `GenericArgs` list corresponds to which
generic.
The general theme of how it works is outermost to innermost (`T` before `T2` in the example), left to right
(`T2` before `T3`), but there are several complications:
- Traits have an implicit `Self` generic parameter which is the first (i.e. 0th) generic parameter. Note that `Self` doesn't mean a generic parameter in all situations, see [Res::SelfTyAlias](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfTyAlias) and [Res::SelfCtor](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfCtor).