Commit Graph

144 Commits

Author SHA1 Message Date
Ed Page df53b3dc04 test(lexer): Add frontmatter unit test 2025-07-10 10:25:29 -05:00
Ed Page 45a1e492b1 feat(lexer): Allow including frontmatter with 'tokenize' 2025-07-09 16:42:27 -05:00
klensy c76d032f01 setup CI and tidy to use typos for spellchecking and fix few typos 2025-07-03 10:51:06 +03:00
Marijn Schouten 2a5225a369 rustc_lexer: typo fix + small cleanups 2025-06-06 13:08:16 +00:00
Matthew Jasper 55f59fb0e3 Fix parsing of frontmatters with inner hyphens 2025-06-04 15:51:36 +00:00
Deadbeef 662182637e Implement RFC 3503: frontmatters
Supercedes #137193
2025-05-05 23:10:08 +08:00
Guillaume Gomez aff2bc7a88 Replace rustc_lexer/unescape with rustc-literal-escaper crate 2025-04-04 14:44:45 +02:00
Ralf Jung 20d04d8a40 Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"
This reverts commit 08dfbf49e3, reversing
changes made to 10bcdad7df.
2025-03-18 13:28:56 +01:00
Jacob Pratt 08dfbf49e3 Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu
Add `*_value` methods to proc_macro lib

This is the implementation of https://github.com/rust-lang/libs-team/issues/459.

It allows to get the actual value (unescaped) of the different string literals.

Part of https://github.com/rust-lang/rust/issues/136652.

r? libs-api
2025-03-17 05:47:48 -04:00
Nicholas Nethercote ff0a5fe975 Remove #![warn(unreachable_pub)] from all compiler/ crates.
It's no longer necessary now that `-Wunreachable_pub` is being passed.
2025-03-11 13:14:21 +11:00
许杰友 Jieyou Xu (Joe) 063ef18fdc Revert "Use workspace lints for crates in compiler/ #138084"
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).

This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.

This reverts commit 48caf81484, reversing
changes made to c6662879b2.
2025-03-10 18:12:47 +08:00
Nicholas Nethercote 8a3e03392e Remove #![warn(unreachable_pub)] from all compiler/ crates.
(Except for `rustc_codegen_cranelift`.)

It's no longer necessary now that `unreachable_pub` is in the workspace
lints.
2025-03-08 08:41:43 +11:00
Nicholas Nethercote beba32cebb Specify rust lints for compiler/ crates via Cargo.
By naming them in `[workspace.lints.rust]` in the top-level
`Cargo.toml`, and then making all `compiler/` crates inherit them with
`[lints] workspace = true`. (I omitted `rustc_codegen_{cranelift,gcc}`,
because they're a bit different.)

The advantages of this over the current approach:
- It uses a standard Cargo feature, rather than special handling in
  bootstrap. So, easier to understand, and less likely to get
  accidentally broken in the future.
- It works for proc macro crates.

It's a shame it doesn't work for rustc-specific lints, as the comments
explain.
2025-03-08 08:41:09 +11:00
Michael Goulet 12e3911d81 Greatly simplify lifetime captures in edition 2024 2025-02-22 22:24:52 +00:00
Michael Goulet 76d341fa09 Upgrade the compiler to edition 2024 2025-02-22 00:01:48 +00:00
Guillaume Gomez 94f0f2b603 Reexport literal-escaper from rustc_lexer to allow rust-analyzer to compile 2025-02-10 10:38:22 +01:00
Guillaume Gomez 49d2d5a116 Extract unescape from rustc_lexer into its own crate 2025-02-10 10:38:22 +01:00
bjorn3 1fcae03369 Rustfmt 2025-02-08 22:12:13 +00:00
gvozdvmozgu 9f469eb600 implement eat_until leveraging memchr in lexer 2025-02-05 07:03:53 -08:00
Eric Huss a97404eee3 Add test to check unicode identifier version 2024-12-09 06:23:59 -08:00
Michael Goulet b87e935407 Revert "Reject raw lifetime followed by \' as well"
This reverts commit 1990f15608.
2024-12-01 05:22:16 +00:00
Nicholas Nethercote 4cd2840f00 Clean up c_or_byte_string.
- Rename a misleading local `mk_kind` as `single_quoted`.
- Use `fn` for all three arguments, for consistency.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote 11c96cfd94 Improve strip_shebang testing.
It's currently a bit ad hoc. This commit makes it more methodical, with
pairs of match/no-match tests for all the relevant cases.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote e9a0c3c98c Remove TokenKind::InvalidPrefix.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
2024-11-19 18:06:22 +11:00
Nicholas Nethercote 2c7c3697db Improve TokenKind comments.
- Improve wording.
- Use backticks consistently for examples.
2024-11-19 18:04:01 +11:00
Nicholas Nethercote df29f9b0c3 Improve fake_ident_or_unknown_prefix.
- Rename it as `invalid_ident_or_prefix`, which matches the possible
  outputs (`InvalidIdent` or `InvalidPrefix`).
- Use the local wrapper for `is_xid_continue`, for consistency.
- Make it clear what `\u{200d}` means.
2024-11-19 18:01:43 +11:00
Michael Goulet 1990f15608 Reject raw lifetime followed by \' as well 2024-10-30 01:13:18 +00:00
Peter Jaszkowiak 321a5db7d4 Reserve guarded string literals (RFC 3593) 2024-10-08 18:21:16 -06:00
Michael Goulet c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Michael Goulet 97910580aa Add initial support for raw lifetimes 2024-09-06 10:32:48 -04:00
Michael Goulet 3b3e43a386 Format lexer 2024-09-06 10:32:48 -04:00
Michael Goulet 9aaf873396 Reserve prefix lifetimes too 2024-09-06 10:32:48 -04:00
Nicholas Nethercote 6c84c55c9f Add warn(unreachable_pub) to rustc_lexer. 2024-08-27 15:12:46 +10:00
Nicholas Nethercote 84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Nicholas Nethercote 75b164d836 Use tidy to sort crate attributes for all compiler crates.
We already do this for a number of crates, e.g. `rustc_middle`,
`rustc_span`, `rustc_metadata`, `rustc_span`, `rustc_errors`.

For the ones we don't, in many cases the attributes are a mess.
- There is no consistency about order of attribute kinds (e.g.
  `allow`/`deny`/`feature`).
- Within attribute kind groups (e.g. the `feature` attributes),
  sometimes the order is alphabetical, and sometimes there is no
  particular order.
- Sometimes the attributes of a particular kind aren't even grouped
  all together, e.g. there might be a `feature`, then an `allow`, then
  another `feature`.

This commit extends the existing sorting to all compiler crates,
increasing consistency. If any new attribute line is added there is now
only one place it can go -- no need for arbitrary decisions.

Exceptions:
- `rustc_log`, `rustc_next_trait_solver` and `rustc_type_ir_macros`,
  because they have no crate attributes.
- `rustc_codegen_gcc`, because it's quasi-external to rustc (e.g. it's
  ignored in `rustfmt.toml`).
2024-06-12 15:49:10 +10:00
Michael Scholten 3c5e88c7d1 Improved the compiler code with clippy 2024-04-24 09:41:44 +02:00
Esteban Küber 19821ad234 Properly handle emojis as literal prefix in macros
Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes.

Fix #123696.
2024-04-10 23:19:27 +00:00
Esteban Küber ea1883d7b2 Silence redundant error on char literal that was meant to be a string in 2021 edition 2024-03-17 23:35:19 +00:00
Esteban Küber 982918f493 Handle str literals written with ' lexed as lifetime
Given `'hello world'` and `'1 str', provide a structured suggestion for a valid string literal:

```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-3.rs:2:26
   |
LL |     println!('hello world');
   |                          ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("hello world");
   |              ~           ~
```
```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-1.rs:2:20
   |
LL |     println!('1 + 1');
   |                    ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("1 + 1");
   |              ~     ~
```

Fix #119685.
2024-03-17 23:35:18 +00:00
Nicholas Nethercote 0ac1195ee0 Invert diagnostic lints.
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.

This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
2024-02-06 13:12:33 +11:00
Nicholas Nethercote 6be2e5623c Use unescape_unicode for raw C string literals.
They can't contain `\x` escapes, which means they can't contain high
bytes, which means we can used `unescape_unicode` instead of
`unescape_mixed` to unescape them. This avoids unnecessary used of
`MixedUnit`.
2024-01-25 12:28:11 +11:00
Nicholas Nethercote 86f371ed59 Rename the unescaping functions.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
2024-01-25 12:28:11 +11:00
Nicholas Nethercote 5e5aa6d556 Rename and invert sense of Mode predicates.
I find it easier if they describe what's allowed, rather than what's
forbidden. Also, consistent naming makes them easier to understand.
2024-01-25 12:28:11 +11:00
Nicholas Nethercote a1c07214f0 Rework CStrUnit.
- Rename it as `MixedUnit`, because it will soon be used in more than
  just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for
  `\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
  could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.
2024-01-25 12:28:11 +11:00
Nicholas Nethercote ef1e2228cf Use from instead of into in unescaping code.
The `T` type in these functions took me some time to understand, and I
find the explicit `T` in the use of `from` makes the code easier to
read, as does the `u8` annotation in `scan_escape`.
2024-01-25 12:26:36 +11:00
Matthias Krüger a54c295665 Rollup merge of #118639 - fmease:deny-features-in-stable-rustc-crates, r=WaffleLapkin
Undeprecate lint `unstable_features` and make use of it in the compiler

See also #117937.

r? compiler
2024-01-22 16:54:56 +01:00
Nicholas Nethercote 9018d2c455 Detect NulInCStr error earlier.
By making it an `EscapeError` instead of a `LitError`. This makes it
like the other errors produced when checking string literals contents,
e.g. for invalid escape sequences or bare CR chars.

NOTE: this means these errors are issued earlier, before expansion,
which changes behaviour. It will be possible to move the check back to
the later point if desired. If that happens, it's likely that all the
string literal contents checks will be delayed together.

One nice thing about this: the old approach had some code in
`report_lit_error` to calculate the span of the nul char from a range.
This code used a hardwired `+2` to account for the `c"` at the start of
a C string literal, but this should have changed to a `+3` for raw C
string literals to account for the `cr"`, which meant that the caret in
`cr"` nul error messages was one short of where it should have been. The
new approach doesn't need any of this and avoids the off-by-one error.
2024-01-12 16:19:37 +11:00
León Orell Valerian Liehr c0a9f722c4 Undeprecate and use lint unstable_features 2023-12-20 18:16:28 +01:00
Nicholas Nethercote b900eb7317 Rename some unescaping functions.
`unescape_raw_str_or_raw_byte_str` only does checking, no unescaping.
And it also now handles C string literals.

`unescape_raw_str` is used for all the non-raw strings.
2023-12-13 14:17:50 +11:00
Nicholas Nethercote 29c5158ef5 Adjust Mode::is_unicode_escape_disallowed.
Some cases are unreachable.
2023-12-13 10:06:13 +11:00