Mirrors/rust - rust - Gitea @ Femelysm.ru

mirror of https://github.com/rust-lang/rust.git synced 2026-04-28 19:27:30 +03:00

Author	SHA1	Message	Date
Jules Bertholet	8d072616a5	Add APIs for dealing with titlecase - `char::is_cased` - `char::is_titlecase` - `char::case` - `char::to_titlecase`	2026-03-21 14:30:38 -04:00
Karl Meakin	902199b5b1	Compress case-mapping tables	2026-03-08 22:39:35 +00:00
Karl Meakin	95365cc5bf	Make `unicode_data` tests normal Instead of generating a standalone executable to test `unicode_data`, generate normal tests in `coretests`. This ensures tests are always generated, and will be run as part of the normal testsuite. Also change the generated tests to loop over lookup tables, rather than generating a separate `assert_eq!()` statement for every codepoint. The old approach produced a massive (20,000 lines plus) file which took minutes to compile!	2026-03-08 22:38:31 +00:00
Marco A L Barbosa	262426d535	Avoid index check in char::to_lowercase and char::to_uppercase	2025-12-30 16:20:19 -03:00
Jieyou Xu	4aeb297064	Revert "unicode_data refactors RUST-147622" This PR reverts RUST-147622 for several reasons: 1. The RUST-147622 PR would format the generated core library code using an arbitrary `rustfmt` picked up from `PATH`, which will cause hard-to-debug failures when the `rustfmt` used to format the generated unicode data code differs from the `rustfmt` used to format the in-tree library code. 2. Previously, the `unicode-table-generator` tests were not run under CI as part of `coretests`, and since for `x86_64-gnu-aux` job we run library `coretests` with `miri`, the generated tests unfortunately caused an unacceptably large Merge CI time regression from ~2 hours to ~3.5 hours, making it the slowest Merge CI job (and thus the new bottleneck). 3. This PR also has an unintended effect of causing a diagnostic regression (RUST-148387), though that's mostly an edge case not properly handled by `rustc` diagnostics. Given that these are three distinct causes with non-trivial fixes, I'm proposing to revert this PR to return us to baseline. This is not prejudice against relanding the changes with these issues addressed, but to alleviate time pressure to address these non-trivial issues.	2025-11-03 19:53:11 +08:00
Karl Meakin	0e6131c9aa	refactor: make `unicode_data` tests normal tests Instead of generating a standalone executable to test `unicode_data`, generate normal tests in `coretests`. This ensures tests are always generated, and will be run as part of the normal testsuite. Also change the generated tests to loop over lookup tables, rather than generating a separate `assert_eq!()` statement for every codepoint. The old approach produced a massive (20,000 lines plus) file which took minutes to compile!	2025-10-31 14:12:17 +00:00
Karl Meakin	9a80731094	refactor: make string formatting more readable To make the final output code easier to see: * Get rid of the unnecessary line-noise of `.unwrap()`ing calls to `write!()` by moving the `.unwrap()` into a macro. * Join consecutive `write!()` calls using a single multiline format string. * Replace `.push()` and `.push_str(format!())` with `write!()`. * If after doing all of the above, there is only a single `write!()` call in the function, just construct the string directly with `format!()`.	2025-10-31 14:12:14 +00:00
Karl Meakin	c8ab4279a5	refactor: format `unicode_data` Remove `#[rustfmt::skip]` from all the generated modules in `unicode_data.rs`. This means we won't have to worry so much about getting indentation and formatting right when generating code. Exempted for now some tables which would be too big when formatted by `rustfmt`.	2025-10-31 14:11:39 +00:00
Karl Meakin	bf7b05c97b	refactor: move runtime functions to core Instead of `include_str!()`ing `range_search.rs`, just make it a normal module under `core::unicode`. This means the same source code doesn't have to be checked in twice, and it plays nicer with IDEs. Also rename it to `rt` since it includes functions for searching the bitsets and case conversion tables as well as the range representation.	2025-10-31 14:11:35 +00:00
Marco Cavenati	83a99aadcf	Bump unicode printable to version 17.0.0	2025-09-09 15:01:47 +02:00
Marco Cavenati	400e687598	Bump unicode_data to version 17.0.0	2025-09-09 09:03:52 +02:00
Karl Meakin	a8c669461f	optimization: Don't include ASCII characters in Unicode tables The ASCII subset of Unicode is fixed and will never change, so we don't need to generate tables for it with every new Unicode version. This saves a few bytes of static data and speeds up `char::is_control` and `char::is_grapheme_extended` on ASCII inputs. Since the table lookup functions exported from the `unicode` module will give nonsensical errors on ASCII input (and in fact will panic in debug mode), I had to add some private wrapper methods to `char` which check for ASCII-ness first.	2025-09-07 15:21:24 +02:00
Marijn Schouten	98e10290c9	change file-is-generated doc comment to inner	2025-09-05 10:08:45 +00:00
Stuart Cook	8b790d7b00	Rollup merge of #145414 - Kmeakin:km/unicode-table-refactors, r=joshtriplett,tgross35 unicode-table-generator refactors Split off from https://github.com/rust-lang/rust/pull/145219	2025-09-03 23:08:06 +10:00
bors	0f50696801	Auto merge of #145479 - Kmeakin:km/hardcode-char-is-control, r=joboet Hard-code `char::is_control` Split off from https://github.com/rust-lang/rust/pull/145219 According to https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table. This doesn't change the generated assembly, since the lookup table is small enough that [LLVM is able to inline the whole search](https://godbolt.org/z/bG8dM37YG). But this does reduce the chance of regressions if LLVM's heuristics change in the future, and means less generated Rust code checked in to `unicode_data.rs`.	2025-08-30 14:18:21 +00:00
Karl Meakin	1bb9b151c9	refactor: Hard-code `char::is_control` According to https://www.unicode.org/policies/stability_policy.html#Property_Value, the set of codepoints in `Cc` will never change. So we can hard-code the patterns to match against instead of using a table.	2025-08-16 01:46:30 +01:00
Karl Meakin	69e1974bb0	refactor: Include size of case conversion tables Include the sizes of the `to_lowercase` and `to_uppercase` tables in the total size calculations.	2025-08-15 01:29:12 +00:00
Karl Meakin	c992245361	refactor: Include table sizes in comment at top of `unicode_data.rs` To make changes in table size obvious from git diffs	2025-08-15 01:29:12 +00:00
ltdk	c855a2f4d6	Hide docs for core::unicode	2025-08-13 12:27:44 -04:00
yukang	93db9e7ee0	Remove unnecessary parens in closure body with unused lint	2025-07-10 09:25:56 +08:00
Markus Reiter	3628a8f326	Remove unneeded parentheses.	2025-03-08 12:56:00 +01:00
Markus Reiter	90ebc24607	Use `intrinsics::assume` instead of `hint::assert_unchecked`.	2025-03-07 20:19:12 +01:00
Markus Reiter	22725588d3	Never inline `lookup_slow`.	2025-03-07 20:17:52 +01:00
Markus Reiter	34ac75be28	Add second precondition for `skip_search`.	2025-03-06 21:38:39 +01:00
Markus Reiter	222adac953	Allow optimizing out `panic_bounds_check` in Unicode checks.	2025-03-06 21:38:39 +01:00
Urgau	8e61502484	core: add `#![warn(unreachable_pub)]`	2025-01-20 18:35:32 +01:00
Jakub Beránek	536516f949	Reformat Python code with `ruff`	2024-12-04 23:03:44 +01:00
Boxy	22998f0785	update cfgs	2024-11-27 15:14:54 +00:00
Ralf Jung	eddab479fd	stabilize const_unicode_case_lookup	2024-11-12 15:13:31 +01:00
bors	cf2b370ad0	Auto merge of #132500 - RalfJung:char-is-whitespace-const, r=jhpratt make char::is_whitespace unstably const I am adding this to the existing https://github.com/rust-lang/rust/issues/132241 feature gate, since `is_digit` and `is_whitespace` seem similar enough that one can group them together.	2024-11-06 04:07:32 +00:00
Matthias Krüger	b438a5cd2a	Rollup merge of #132499 - RalfJung:unicode_data.rs, r=tgross35 unicode_data.rs: show command for generating file https://github.com/rust-lang/rust/pull/131647 made this an easily runnable tool, now we just have to mention that in the comment. :) Fixes https://github.com/rust-lang/rust/issues/131640.	2024-11-03 12:08:51 +01:00
Ralf Jung	0804815e69	make char::is_whitespace unstably const	2024-11-02 10:17:16 +01:00
Ralf Jung	720d618b5f	unicode_data.rs: show command for generating file	2024-11-02 10:06:52 +01:00
Ralf Jung	66351a6184	get rid of a whole bunch of unnecessary rustc_const_unstable attributes	2024-11-02 09:59:55 +01:00
Ralf Jung	90e4f10f6c	switch unicode-data back to 'static'	2024-10-13 11:53:06 +02:00
Matthias Krüger	4428d6f363	Rollup merge of #130101 - RalfJung:const-cleanup, r=fee1-dead some const cleanup: remove unnecessary attributes, add const-hack indications I learned that we use `FIXME(const-hack)` on top of the "const-hack" label. That seems much better since it marks the right place in the code and moves around with the code. So I went through the PRs with that label and added appropriate FIXMEs in the code. IMO this means we can then remove the label -- Cc ``@rust-lang/wg-const-eval.`` I also noticed some const stability attributes that don't do anything useful, and removed them. r? ``@fee1-dead``	2024-09-12 19:03:41 +02:00
Marcondiro	c8d9bd488a	Bump unicode printable to version 16.0.0	2024-09-10 11:13:35 +02:00
Marcondiro	bdda4ec2f5	Bump unicode_data to version 16.0.0	2024-09-10 10:50:20 +02:00
Ralf Jung	332fa6aa6e	add FIXME(const-hack)	2024-09-08 23:08:40 +02:00
Nicholas Nethercote	c5dadd0408	Use `#[rustfmt::skip]` on some `use` groups to prevent reordering. `use` declarations will be reformatted in #125443. Very rarely, there is a desire to force a group of `use` declarations together in a way that auto-formatting will break up. E.g. when you want a single comment to apply to a group. #126776 dealt with all of these in the codebase, ensuring that no comments intended for multiple `use` declarations would end up in the wrong place. But some people were unhappy with it. This commit uses `#[rustfmt::skip]` to create these custom `use` groups in an idiomatic way for a few of the cases changed in #126776. This works because rustfmt treats any `use` item annotated with `#[rustfmt::skip]` as a barrier and won't reorder other `use` items around it.	2024-07-19 13:26:48 +10:00
Nicholas Nethercote	75b6ec9800	Avoid comments that describe multiple `use` items. There are some comments describing multiple subsequent `use` items. When the big `use` reformatting happens some of these `use` items will be reordered, possibly moving them away from the comment. With this additional level of formatting it's not really feasible to have comments of this type. This commit removes them in various ways: - merging separate `use` items when appropriate; - inserting blank lines between the comment and the first `use` item; - outright deletion (for comments that are relatively low-value); - adding a separate "top-level" comment. We also entirely skip formatting for four library files that contain nothing but `pub use` re-exports, where reordering would be painful.	2024-07-17 08:02:46 +10:00
Arpad Borsos	488598c183	Add a lower bound check to `unicode-table-generator` output This adds a dedicated check for the lower bound (if it is outside of ASCII range) to the output of the `unicode-table-generator` tool. This generalized the ASCII-only fast-path, but only for the `Grapheme_Extend` property for now, as that is the only one with a lower bound outside of ASCII.	2024-04-20 10:16:45 +02:00
Marcondiro	e9870b5df3	Bump Unicode printables to version 15.1, align to unicode_data	2024-03-28 11:21:52 +01:00
Marcondiro	01fa7209d5	Bump Unicode to version 15.1.0, regenerate tables	2024-02-09 17:35:46 +01:00
Trevor Gross	22d00dcd47	Apply changes to fix python linting errors	2023-06-16 20:56:01 -04:00
Martin Gammelsæter	54f55efb9a	Use hex literal for INDEX_MASK	2023-03-21 09:59:47 +01:00
Martin Gammelsæter	355e1dda1d	Improve case mapping encoding scheme The indices are encoded as `u32`s in the range of invalid `char`s, so that we know that if any mapping fails to parse as a `char` we should use the value for lookup in the multi-table. This avoids the second binary search in cases where a multi-`char` mapping is needed. Idea from @nikic	2023-03-16 21:42:15 +01:00
Martin Gammelsæter	f9bd884385	Split unicode case LUTs in single and multi variants The majority of char case replacements are single char replacements, so storing them as [char; 3] wastes a lot of space. This commit splits the replacement tables for both `to_lower` and `to_upper` into two separate tables, one with single-character mappings and one with multi-character mappings. This reduces the binary size for programs using all of these tables with roughly 24K bytes.	2023-03-16 12:34:04 +01:00
Martin Gammelsæter	8a4eb9e3a8	Skip serializing ascii chars in case LUTs Since ascii chars are already handled by a special case in the `to_lower` and `to_upper` functions, there's no need to waste space on them in the LUTs.	2023-03-15 17:27:23 +01:00
jonathanCogan	db47071df2	Replace libstd, libcore, liballoc in line comments.	2022-12-30 14:00:42 +01:00
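
The case-mapping commits above (54f55efb9a, f9bd884385, 355e1dda1d, 54f55efb9a's ASCII skip in particular) describe an encoding that is easier to follow with a small example. The following is a minimal sketch of that idea, not the code in `unicode_data.rs`: the table names, the three sample entries, and the specific `INDEX_MASK` value are assumptions chosen for illustration. The only property the sketch relies on is that the flag bit lies above `0x10FFFF`, so a flagged value can never parse as a `char`.

```rust
// Hypothetical flag bit (an assumption for this sketch). Any value with this
// bit set is greater than 0x10FFFF, so `char::from_u32` rejects it.
const INDEX_MASK: u32 = 0x40_0000;

// Toy single-mapping table, sorted by key. Each value is either a valid
// scalar value (a one-char mapping) or INDEX_MASK | index into UPPER_MULTI.
static UPPER_SINGLE: &[(char, u32)] = &[
    ('\u{DF}', INDEX_MASK | 0),  // sharp s: multi-char mapping, index 0
    ('\u{E9}', 0xC9),            // e with acute: single-char mapping
    ('\u{149}', INDEX_MASK | 1), // n preceded by apostrophe: index 1
];

// Toy multi-mapping table, padded with '\0' to three chars per entry.
static UPPER_MULTI: &[[char; 3]] = &[
    ['S', 'S', '\0'],
    ['\u{2BC}', 'N', '\0'],
];

fn to_upper(c: char) -> [char; 3] {
    // ASCII is handled by a special case, so the tables never store it.
    if c.is_ascii() {
        return [c.to_ascii_uppercase(), '\0', '\0'];
    }
    match UPPER_SINGLE.binary_search_by(|&(key, _)| key.cmp(&c)) {
        Ok(i) => {
            let u = UPPER_SINGLE[i].1;
            match char::from_u32(u) {
                // The stored value is a valid char: use it directly.
                Some(single) => [single, '\0', '\0'],
                // Not a valid char: the low bits index the multi table,
                // so no second binary search is needed.
                None => UPPER_MULTI[(u & (INDEX_MASK - 1)) as usize],
            }
        }
        // No entry: the character maps to itself.
        Err(_) => [c, '\0', '\0'],
    }
}
```

With these toy tables, `to_upper('\u{E9}')` resolves from the single table alone, while `to_upper('\u{DF}')` fails the `char::from_u32` check and reads its three-slot entry from the multi table. Storing single mappings as one `u32` rather than `[char; 3]` is what the split in f9bd884385 is described as saving space on.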


61 Commits