Commit Graph

100 Commits

Author SHA1 Message Date
Scott McMurray 28449daa22 ascii::Char-ify the escaping code
This means that `EscapeIterInner::as_str` no longer needs unsafe code, because the type system ensures the internal buffer is only ASCII, and thus valid UTF-8.
2023-05-12 19:37:02 -07:00
Matthias Krüger ea0b6504fa Rollup merge of #111009 - scottmcm:ascii-char, r=BurntSushi
Add `ascii::Char` (ACP#179)

ACP second: https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527900570
New tracking issue: https://github.com/rust-lang/rust/issues/110998

For now this is an `enum` as `@kupiakos` [suggested](https://github.com/rust-lang/libs-team/issues/179#issuecomment-1527959724), with the variants under a different feature flag.

There's lots more things that could be added here, and place for further doc updates, but this seems like a plausible starting point PR.

I've gone through and put an `as_ascii` next to every `is_ascii`: on `u8`, `char`, `[u8]`, and `str`.

As a demonstration, made a commit updating some formatting code to use this: https://github.com/scottmcm/rust/commit/ascii-char-in-fmt (I don't want to include that in this PR, though, because that brings in perf questions that don't exist if this is just adding new unstable APIs.)
2023-05-04 19:18:21 +02:00
Scott McMurray 8c781b0906 Add the basic ascii::Char type 2023-05-03 22:09:33 -07:00
Dylan DPC f916c44aec Rollup merge of #105076 - mina86:a, r=scottmcm
Refactor core::char::EscapeDefault and co. structures

Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.

This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.

This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct.  This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.

As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once.  On 64-bit platforms, it also reduces size of some of
the structs:

    | Struct                     | Before | After |
    |----------------------------+--------+-------+
    | core::char::EscapeUnicode  |     16 |    12 |
    | core::char::EscapeDefault  |     16 |    12 |
    | core::char::EscapeDebug    |     16 |    16 |

My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators.  With this change this
will became trivial.  It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
2023-05-02 11:44:50 +05:30
Michal Nazarewicz 4d0f7e2f39 review 2023-04-30 03:59:11 +02:00
Deadbeef 76dbe29104 rm const traits in libcore 2023-04-16 06:49:27 +00:00
Michal Nazarewicz 45104397e5 Refactor core::char::EscapeDefault and co. structures
Change core::char::{EscapeUnicode, EscapeDefault and EscapeDebug}
structures from using a state machine to computing escaped sequence
upfront and during iteration just going through the characters.

This is arguably simpler since it’s easier to think about having
a buffer and start..end range to iterate over rather than thinking
about a state machine.

This also harmonises implementation of aforementioned iterators and
core::ascii::EscapeDefault struct.  This is done by introducing a new
helper EscapeIterInner struct which holds the buffer and offers simple
methods for iterating over range.

As a side effect, this probably optimises Display implementation for
those types since rather than calling write_char repeatedly, write_str
is invoked once.  On 64-bit platforms, it also reduces size of some of
the structs:

    | Struct                     | Before | After |
    |----------------------------+--------+-------+
    | core::char::EscapeUnicode  |     16 |    12 |
    | core::char::EscapeDefault  |     16 |    12 |
    | core::char::EscapeDebug    |     16 |    16 |

My ulterior motive and reason why I started looking into this is
addition of as_str method to the iterators.  With this change this
will became trivial.  It’s also going to be trivial to implement
DoubleEndedIterator if that’s ever desired.
2023-04-05 19:09:55 +02:00
bors adb4bfd25d Auto merge of #105671 - lukas-code:depreciate-char, r=scottmcm
Use associated items of `char` instead of freestanding items in `core::char`

The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
2023-02-12 11:09:06 +00:00
Tobias Bucher 77c85e9cba Remove a couple of #[doc(hidden)] pub fn and their #[feature] gates 2023-02-10 08:06:35 +01:00
Lukas Markeffsky 76e216f29b Use associated items of char instead of freestanding items in core::char 2023-01-14 11:58:41 +01:00
Pietro Albini f6762c2035 update stabilization version numbers 2022-12-28 09:18:42 -05:00
Michal Nazarewicz 28162ad970 char: µoptimise UTF-16 surrogates decoding
According to Godbolt¹, on x86_64 using binary and produces slightly
better code than using subtraction.  Readability of both is pretty
much equivalent so might just as well use the shorter option.

¹ https://rust.godbolt.org/z/9jM3ejbMx
2022-12-23 14:15:33 +01:00
Matthias Krüger 8c77da87d7 Rollup merge of #102470 - est31:stabilize_const_char_convert, r=joshtriplett
Stabilize const char convert

Split out `const_char_from_u32_unchecked` from `const_char_convert` and stabilize the rest, i.e. stabilize the following functions:

```Rust
impl char {
    pub const fn from_u32(self, i: u32) -> Option<char>;
    pub const fn from_digit(self, num: u32, radix: u32) -> Option<char>;
    pub const fn to_digit(self, radix: u32) -> Option<u32>;
}

// Available through core::char and std::char
mod char {
    pub const fn from_u32(i: u32) -> Option<char>;
    pub const fn from_digit(num: u32, radix: u32) -> Option<char>;
}
```

And put the following under the `from_u32_unchecked` const stability gate as it needs `Option::unwrap` which isn't const-stable (yet):

```Rust
impl char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}

// Available through core::char and std::char
mod char {
    pub const unsafe fn from_u32_unchecked(i: u32) -> char;
}
```

cc the tracking issue #89259 (which I'd like to keep open for `const_char_from_u32_unchecked`).
2022-11-14 19:26:15 +01:00
Sky a6372525ce Clarify the possible return values of len_utf16 2022-10-16 11:06:19 -04:00
est31 176c44c08e Stabilize const_char_convert 2022-09-29 14:26:56 +02:00
est31 12c15a2bfe Split out from_u32_unchecked from const_char_convert
It relies on the Option::unwrap function which is not const-stable (yet).
2022-09-29 14:26:24 +02:00
Akshay 591c1f25b2 introduce {char, u8}::is_ascii_octdigit 2022-09-27 11:55:13 +05:30
Pietro Albini 3975d55d98 remove cfg(bootstrap) 2022-09-26 10:14:45 +02:00
Sage Mitchell 4a3e169da7 Make char::is_lowercase and char::is_uppercase const
Implements #101400.
2022-09-04 08:07:53 -07:00
Jane Losare-Lusby bf7611d55e Move error trait into core 2022-08-22 13:28:25 -07:00
Cameron Steffen 17ddcb434b Improve primitive/std docs separation and headers 2022-08-20 16:50:29 -05:00
Vincenzo Palazzo 23bd7cbcb1 docs: remove repetition
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-08-09 21:54:05 +00:00
Vincenzo Palazzo 47a0a56c1d add more docs regarding ideographic numbers
Signed-off-by: Vincenzo Palazzo <vincenzopalazzodev@gmail.com>
2022-07-28 15:52:18 +00:00
Guillaume Gomez 53db831d62 Fix links in std/core documentation 2022-07-05 21:33:39 +02:00
Giles Cope bf02d1ea5f No need to check the assert all the time. 2022-04-16 19:30:23 +01:00
Dylan DPC c0655dec7e Rollup merge of #95566 - eduardosm:std_char_consts_and_methods, r=Mark-Simulacrum
Avoid duplication of doc comments in `std::char` constants and functions

For those consts and functions, only the summary is kept and a reference to the `char` associated const/method is included.

Additionaly, re-exported functions have been converted to function definitions that call the previously re-exported function. This makes it easier to add a deprecated attribute to these functions in the future.
2022-04-10 21:03:34 +02:00
bors f565016edd Auto merge of #95678 - pietroalbini:pa-1.62.0-bootstrap, r=Mark-Simulacrum
Bump bootstrap compiler to 1.61.0 beta

This PR bumps the bootstrap compiler to the 1.61.0 beta. The first commit changes the stage0 compiler, the second commit applies the "mechanical" changes and the third and fourth commits apply changes explained in the relevant comments.

r? `@Mark-Simulacrum`
2022-04-07 07:34:04 +00:00
Deadbeef 9a2d0e53f1 Update documentation for trim* and is_whitespace to include newlines 2022-04-06 11:03:36 +10:00
Pietro Albini 181d28bb61 trivial cfg(bootstrap) changes 2022-04-05 23:18:40 +02:00
Eduardo Sánchez Muñoz a8ff1aead8 Avoid duplication of doc comments in std::char constants and functions.
For those consts and functions, only the summary is kept and a reference to the `char` associated const/method is included.

Additionaly, re-exported functions have been converted to function definitions that call the previously re-exported function. This makes it easier to add a deprecated attribute to these functions in the future.
2022-04-01 18:36:53 +02:00
David Tolnay 4246916619 Adjust feature names that disagree on const stabilization version 2022-03-31 12:34:48 -07:00
lcnr afbecc0f68 remove now unnecessary lang items 2022-03-30 11:23:58 +02:00
David Tolnay 2ac9efbe95 Debug print char 0 as '\0' rather than '\u{0}' 2022-03-27 04:49:10 -07:00
ltdk d5803678c1 Add u16::is_utf16_surrogate 2022-03-21 22:51:32 -04:00
Dylan DPC 85056107fa Rollup merge of #87618 - GuillaumeGomez:std-char-types-doc, r=jyn514,camelid
Add missing documentation for std::char types
2022-03-11 20:29:42 +01:00
T-O-R-U-S 72a25d05bf Use implicit capture syntax in format_args
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.

A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
2022-03-10 10:23:40 -05:00
Guillaume Gomez 5b5649fd73 Add missing documentation for std::char types 2022-03-07 13:59:10 +01:00
Mario Carneiro 7c3ebec0ca fix 2022-02-17 22:14:54 -08:00
Mario Carneiro 0f14bea448 Optimize char_try_from_u32
The optimization was proposed by @falk-hueffner in https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Micro-optimizing.20char.3A.3Afrom_u32/near/272146171,  and I simplified it a bit and added an explanation of why the optimization is correct.
2022-02-17 20:27:53 -08:00
Matthias Krüger c03bf54dd1 Rollup merge of #93392 - GKFX:char-docs, r=scottmcm
Clarify documentation on char::MAX

As mentioned in https://github.com/rust-lang/rust/issues/91836#issuecomment-994106874, the documentation on `char::MAX` is not quite correct – USVs are not "only ones within a certain range", they are code points _outside_ a certain range. I have corrected this and given the actual numbers as there is no reason to hide them.
2022-01-31 06:58:32 +01:00
George Bateman 9aaf52b66a (#93392) Update char::MAX docs and core::char::MAX 2022-01-30 23:10:24 +00:00
Maybe Waffle 17cd2cd592 Fix an edge case in chat::DecodeUtf16::size_hint
There are cases, when data in the buf might or might not be an error.
2022-01-30 15:32:21 +03:00
Maybe Waffle 2c97d1012e Fix wrong assumption in DecodeUtf16::size_hint
`self.buf` can contain a surrogate, but only a leading one.
2022-01-28 12:40:59 +03:00
George Bateman 2fb617ca0f Clarify documentation on char::MAX 2022-01-27 22:13:01 +00:00
Maybe Waffle cd4245d318 Make char::DecodeUtf16::size_hist more precise
New implementation takes into account contents of `self.buf` and rounds
lower bound up instead of down.
2022-01-27 00:30:33 +03:00
Ian Douglas Scott a02639dc09 Implement TryFrom<char> for u8
Previously suggested in https://github.com/rust-lang/rfcs/issues/2854.

It makes sense to have this since `char` implements `From<u8>`. Likewise
`u32`, `u64`, and `u128` (since #79502) implement `From<char>`.
2022-01-07 12:28:47 -08:00
Matthias Krüger 60625a6ef0 Rollup merge of #88858 - spektom:to_lower_upper_rev, r=dtolnay
Allow reverse iteration of lowercase'd/uppercase'd chars

The PR implements `DoubleEndedIterator` trait for `ToLowercase` and `ToUppercase`.

This enables reverse iteration of lowercase/uppercase variants of character sequences.
One of use cases:  determining whether a char sequence is a suffix of another one.

Example:

```rust
fn endswith_ignore_case(s1: &str, s2: &str) -> bool {
    for eob in s1
        .chars()
        .flat_map(|c| c.to_lowercase())
        .rev()
        .zip_longest(s2.chars().flat_map(|c| c.to_lowercase()).rev())
    {
        match eob {
            EitherOrBoth::Both(c1, c2) => {
                if c1 != c2 {
                    return false;
                }
            }
            EitherOrBoth::Left(_) => return true,
            EitherOrBoth::Right(_) => return false,
        }
    }
    true
}
```
2021-12-23 00:28:51 +01:00
David Tolnay 417b6f354e Update stability attribute for double ended case mapping iterators 2021-12-22 10:49:51 -08:00
r00ster91 a2d78573d3 Turn all 0x1b_u8 into '\x1b' or b'\x1b' 2021-11-19 18:14:18 +01:00
Yuki Okushi 7432588e5d Rollup merge of #89258 - est31:const_char_convert, r=oli-obk
Make char conversion functions unstably const

The char conversion functions like `char::from_u32` do trivial computations and can easily be converted into const fns. Only smaller tricks are needed to avoid non-const standard library functions like `Result::ok` or `bool::then_some`.

Tracking issue: https://github.com/rust-lang/rust/issues/89259
2021-11-19 13:06:31 +09:00