lexer: Fix span override for the first token in a string
Previously, due to peculiarities of `StringReader` construction, something like `"a b c d".parse::<TokenStream>()` gave you one non-overridden span for `a` and then three correctly overridden spans for `b`, `c` and `d`.
Now all the spans are overridden.
rustc: introduce {ast,hir}::AnonConst to consolidate so-called "embedded constants".
Previously, constants in array lengths and enum variant discriminants were "merely an expression", and had no separate ID for, e.g. type-checking or const-eval, instead reusing the expression's.
That complicated code working with bodies, because such constants were the only special case where the "owner" of the body wasn't the HIR parent, but rather the same node as the body itself.
Also, if the body happened to be a closure, we had no way to allocate a `DefId` for both the constant *and* the closure, leading to *several* bugs (mostly ICEs where type errors were expected).
This PR rectifies the situation by adding another (`{ast,hir}::AnonConst`) node around every such constant. Also, const generics are expected to rely on the new `AnonConst` nodes, as well (cc @varkor).
* fixes #48838
* fixes #50600
* fixes #50688
* fixes #50689
* obsoletes #50623
r? @nikomatsakis
Speed up the macro parser
These three commits reduce the number of allocations done by the macro parser, in some cases dramatically. For example, for a clean check build of html5ever, the number of allocations is reduced by 40%.
Here are the rustc-benchmarks that are sped up by at least 1%.
```
html5ever-check
avg: -6.6% min: -10.3% max: -4.1%
html5ever
avg: -5.2% min: -9.5% max: -2.8%
html5ever-opt
avg: -4.3% min: -9.3% max: -1.6%
crates.io-check
avg: -1.8% min: -2.9% max: -0.6%
crates.io-opt
avg: -1.0% min: -2.2% max: -0.1%
crates.io
avg: -1.1% min: -2.2% max: -0.2%
```
rustc: Disallow modules and macros in expansions
This commit feature gates generating modules and macro definitions in procedural
macro expansions. Custom derive is exempt from this check as it would be a large
retroactive breaking change (#50587). The hope is that we can stem the
bleeding and figure out a better solution here before opening up the floodgates.
The restriction here is specifically targeted at surprising hygiene results [1]
that result in non-"copy/paste" behavior. The interaction of hygiene and
procedural macros is intended to be avoided as much as possible for Macros 1.2
by saying everything is "as if you copy/pasted the code", but modules and macros
are sort of weird exceptions to this rule that aren't fully fleshed out.
[1]: https://github.com/rust-lang/rust/issues/50504#issuecomment-387734625
cc #50504
Implement edition hygiene for keywords
Determine "keywordness" of an identifier in its hygienic context.
cc https://github.com/rust-lang/rust/pull/49611
I've resurrected `proc` as an Edition-2015-only keyword for testing purposes, but it should probably be buried again. EDIT: `proc` is removed again.
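As a rough illustration of "keywordness in a hygienic context", the sketch below attaches an edition to each identifier and consults it when deciding whether the identifier is a keyword. The `Edition` enum, the `Ident` struct and `is_keyword` are hypothetical stand-ins, not the compiler's types; the only language fact relied on is that `async` and `await` are keywords in the 2018 edition but not in 2015.

```rust
/// Hypothetical model of an edition; not the compiler's own type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Edition {
    Edition2015,
    Edition2018,
}

/// Hypothetical identifier that remembers the edition of the code
/// (crate or macro definition) that produced it.
struct Ident<'a> {
    name: &'a str,
    edition: Edition,
}

/// Decides keywordness from the identifier's own edition rather than
/// from a single global setting. The keyword lists are deliberately tiny.
fn is_keyword(ident: &Ident) -> bool {
    match ident.name {
        // Keywords in every edition.
        "fn" | "let" | "struct" => true,
        // Keywords only when the identifier originates from 2018 code.
        "async" | "await" => ident.edition == Edition::Edition2018,
        _ => false,
    }
}
```

With this model, an `async` identifier produced by a 2015-edition macro stays an ordinary identifier even when the macro is expanded inside a 2018-edition crate.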
Review proc macro API 1.2
cc https://github.com/rust-lang/rust/issues/38356
Summary of applied changes:
- Documentation for proc macro API 1.2 is expanded.
- Renamed APIs: `Term` -> `Ident`, `TokenTree::Term` -> `TokenTree::Ident`, `Op` -> `Punct`, `TokenTree::Op` -> `TokenTree::Punct`, `Op::op` -> `Punct::as_char`.
- Removed APIs: `Ident::as_str`, use `Display` impl for `Ident` instead.
- New APIs (not stabilized in 1.2): `Ident::new_raw` for creating a raw identifier (I'm not sure `new_x` is a very idiomatic name, though).
- Runtime changes:
    - `Punct::new` now ensures that the input `char` is a valid punctuation character in Rust.
    - `Ident::new` ensures that the input `str` is a valid identifier in Rust.
    - Lifetimes in proc macros are now represented as two joint tokens - `Punct('\'', Spacing::Joint)` and `Ident("lifetime_name_without_quote")`, similarly to multi-character operators.
- Stabilized APIs: None yet.
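The lifetime representation described in the list above can be sketched with a simplified token model. The `Token` and `Spacing` types here are hypothetical stand-ins that only mirror the shape of the `proc_macro` types so that the example compiles outside a proc-macro crate; `lifetime_tokens` and `as_lifetime` are likewise made up for illustration.

```rust
/// Simplified stand-in for `proc_macro::Spacing`.
#[derive(Debug, Clone, PartialEq)]
enum Spacing {
    Alone,
    Joint,
}

/// Simplified stand-in for the `Punct` and `Ident` token trees.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Punct(char, Spacing),
    Ident(String),
}

/// Builds the two-token form of a lifetime: a joint `'` followed by
/// the lifetime name without the quote.
fn lifetime_tokens(name: &str) -> [Token; 2] {
    [
        Token::Punct('\'', Spacing::Joint),
        Token::Ident(name.to_string()),
    ]
}

/// Recognizes that two-token form and renders it back as `'name`.
/// A quote that is not `Joint` with the identifier is not a lifetime.
fn as_lifetime(tokens: &[Token]) -> Option<String> {
    match tokens {
        [Token::Punct('\'', Spacing::Joint), Token::Ident(name)] => {
            Some(format!("'{}", name))
        }
        _ => None,
    }
}
```

The `Joint` spacing is what ties the quote to the following identifier, in the same way that joint spacing ties together the characters of a multi-character operator.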
A bit of motivation for renaming (although it was already stated in the review comments):
- With my compiler frontend glasses on `Ident` is the single most appropriate name for this thing, *especially* if we are doing input validation on construction. `TokenTree::Ident` effectively wraps `token::Ident` or `ast::Ident + is_raw`, its meaning is "identifier" and it's already named `ident` in declarative macros.
- Regarding `Punct`, the motivation is that `Op` is actively misleading. The thing doesn't mean an operator: it's neither a subset of operators (there is non-operator punctuation in the language) nor a superset (operators can be multicharacter while this thing is always a single character). So I named it `Punct` (first proposed in [the original RFC](https://github.com/rust-lang/rfcs/pull/1566), then [by @SimonSapin](https://github.com/rust-lang/rust/issues/38356#issuecomment-276676526)). Together with input validation, it's now a subset of the ASCII punctuation character category (`u8::is_ascii_punctuation`).
It only has a single use, within code handling indented block comments.
We can replace that with the new `FileMap::col_pos()`, which computes
the col position (BytePos instead of CharPos) based on the record of the
last newline char (which we already record).
This is actually an improvement, because
`trim_whitespace_prefix_and_push_line()` was using `col`, which is a
`CharPos`, as a slice index, which is a byte/char confusion.
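To make the byte/char confusion concrete, here is a small standalone sketch. The helper names are hypothetical and only imitate the idea of computing a column from the last newline; they are not the `FileMap` API. The point is that a column counted in `char`s stops being a valid slice index as soon as the line contains a multi-byte character.

```rust
/// Column of `byte_pos` within its line, counted in bytes from the
/// most recent newline (the kind of value a slice index needs).
fn col_in_bytes(src: &str, byte_pos: usize) -> usize {
    let line_start = src[..byte_pos].rfind('\n').map_or(0, |i| i + 1);
    byte_pos - line_start
}

/// The same column, but counted in `char`s instead of bytes.
fn col_in_chars(src: &str, byte_pos: usize) -> usize {
    let line_start = src[..byte_pos].rfind('\n').map_or(0, |i| i + 1);
    src[line_start..byte_pos].chars().count()
}

/// Slices `line` starting at `col`, treating `col` as a byte index.
/// Returns `None` when `col` is out of range or falls inside a
/// multi-byte character, which is what happens when a char count is
/// passed by mistake.
fn slice_from_col(line: &str, col: usize) -> Option<&str> {
    line.get(col..)
}
```

For the source `"x\né/* c */"`, the comment starts at byte column 2 but char column 1 on its line, because `é` occupies two bytes in UTF-8. Slicing the line at 2 yields `"/* c */"`, while slicing at 1 lands inside `é` and fails.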
It's silly for a hot function like `bump()` to have such an expensive
bounds check. This patch replaces `terminator` with `end_src_index`.
Note that the `self.terminator` check in `is_eof()` wasn't necessary
because of the way `StringReader` is initialized.
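A minimal sketch of the idea, assuming a much simpler reader than the real `StringReader`: the end of input is precomputed once as a plain `usize`, so `bump()` only needs one integer comparison. The `Reader` struct and everything in it except the field name `end_src_index` are hypothetical.

```rust
/// Hypothetical, heavily simplified reader; not the real `StringReader`.
struct Reader<'a> {
    src: &'a str,
    /// Byte index of the next character to read.
    next_src_index: usize,
    /// Byte index at which reading stops, computed once up front.
    end_src_index: usize,
    /// Current character, or `None` at end of input.
    ch: Option<char>,
}

impl<'a> Reader<'a> {
    fn new(src: &'a str) -> Self {
        let mut reader = Reader {
            src,
            next_src_index: 0,
            end_src_index: src.len(),
            ch: None,
        };
        // Load the first character so `ch` is valid right away.
        reader.bump();
        reader
    }

    /// Advances by one character; the bounds check is a single
    /// `usize` comparison against the precomputed end index.
    fn bump(&mut self) {
        if self.next_src_index < self.end_src_index {
            // `next_src_index` always sits on a char boundary here.
            let c = self.src[self.next_src_index..].chars().next().unwrap();
            self.ch = Some(c);
            self.next_src_index += c.len_utf8();
        } else {
            self.ch = None;
        }
    }

    fn is_eof(&self) -> bool {
        self.ch.is_none()
    }
}
```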
- `source_text` becomes `src`, matching `FileMap::src`.
- `byte_offset()` becomes `src_index()`, which makes it clearer that
it's an index into `src`. (Likewise for variables containing
`byte_offset` in their name.) This function also now returns a `usize`
instead of a `BytePos`, because every callsite immediately converted
the `BytePos` to a `usize`.
This patch removes the "old"/"new" names in favour of "foo"/"next_foo",
which matches the field names.
It also moves the setting of `self.{ch,pos,next_pos}` in the common case
to the end, so that the meaning of "foo"/"next_foo" is consistent until
the end.
This commit fixes a hard error where the `#![feature(rust_2018_preview)]`
feature was forbidden to be mentioned when the `--edition 2018` flag was passed.
This instead silently accepts that feature gate despite it not being necessary.
It's intended that this will help ease the transition into the 2018 edition as
users will, for the time being, start off with the `rust_2018_preview` feature
and no longer immediately need to remove it.
Closes #50662
Rename the 2018 edition lint names
* `rust_2018_breakage` -> `rust_2018_compatibility` - the lint for ensuring
that your code, in the 2015 edition, is compatible with the 2018 edition's
semantics. This is required to pass *before* you enable the 2018 edition.
* `rust_2018_migration` -> `rust_2018_idioms` - the lint for writing idiomatic
code after you've already enabled the 2018 edition
Optimize string handling in lit_token().
In the common case, the string value in a string literal Token is the
same as the string value in a string literal LitKind. (The exception is
when escapes or \r are involved.) This patch takes advantage of that to
avoid calling str_lit() and re-interning the string in that case. This
speeds up incremental builds for a few of the rustc-benchmarks, the best
by 3%.
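The fast path can be sketched outside the compiler as follows. This is only an illustration of the idea, not the code in `lit_token()`: `unescape_toy` is a hypothetical, deliberately incomplete stand-in for `str_lit()`, and `Cow` stands in for reusing the already-interned string instead of creating a new one.

```rust
use std::borrow::Cow;

/// Toy stand-in for the unescaping step. It handles only `\n`, `\t`,
/// single-character escapes such as `\\` and `\"`, and "\r\n"
/// normalization; the real `str_lit()` does considerably more.
fn unescape_toy(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                // `\\`, `\"` and anything else: keep the escaped char.
                Some(other) => out.push(other),
                None => out.push('\\'),
            },
            // Normalize "\r\n" to "\n" by dropping the '\r'.
            '\r' if chars.peek() == Some(&'\n') => {}
            other => out.push(other),
        }
    }
    out
}

/// Returns the value of a string literal body. In the common case
/// (no backslash, no '\r') the input is returned as-is without
/// allocating; only the uncommon case pays for unescaping.
fn literal_value(raw: &str) -> Cow<'_, str> {
    if raw.contains('\\') || raw.contains('\r') {
        Cow::Owned(unescape_toy(raw))
    } else {
        Cow::Borrowed(raw)
    }
}
```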
Benchmarks that got a speedup of 1% or more:
```
coercions
avg: -1.1% min: -3.5% max: 0.4%
regex-check
avg: -1.2% min: -1.5% max: -0.6%
futures-check
avg: -0.9% min: -1.4% max: -0.3%
futures
avg: -0.8% min: -1.3% max: -0.3%
futures-opt
avg: -0.7% min: -1.2% max: -0.1%
regex
avg: -0.5% min: -1.2% max: -0.1%
regex-opt
avg: -0.5% min: -1.1% max: -0.1%
hyper-check
avg: -0.7% min: -1.0% max: -0.3%
```