rust/src/libcore/unicode
Mark Rousskov b0e121d9d5 Shrink bitset words through functional mapping
Previously, every word in the (deduplicated) bitset was stored raw -- a full
64 bits (8 bytes). Now, words that are equivalent to another word through a
specific mapping are stored separately and "mapped" back from that base word
when loading; this shrinks the table sizes significantly, as each mapped word
is stored in 2 bytes (a 4x reduction from the previous 8).
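
A minimal sketch of how such a load could work. The layout here is an
assumption made for illustration, not taken from the actual tables: each
mapped word is taken to be a (base index, mapping byte) pair, with the top
bit of the mapping byte meaning "invert" and the low bits giving a left
rotation. The names load_word and INVERT_BIT are hypothetical.

```rust
/// Hypothetical flag: top bit of the mapping byte requests inversion.
const INVERT_BIT: u8 = 1 << 7;

/// Load the word at `idx`. Indices below `canonical.len()` refer to raw
/// 8-byte words; higher indices refer to 2-byte mapped entries, which are
/// reconstructed from their base word on the fly.
fn load_word(canonical: &[u64], mapped: &[(u8, u8)], idx: usize) -> u64 {
    if idx < canonical.len() {
        return canonical[idx];
    }
    let (base_idx, mapping) = mapped[idx - canonical.len()];
    let mut word = canonical[base_idx as usize];
    if mapping & INVERT_BIT != 0 {
        word = !word;
    }
    // The remaining bits hold the rotation amount (assumed to be < 64).
    word.rotate_left(u32::from(mapping & !INVERT_BIT))
}
```

Under this assumed layout, a mapped entry costs one byte for the base index
and one byte for the mapping, which is where the 2-byte figure comes from.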

The new encoding is still potentially non-optimal: the "mapped" byte is
frequently repeated, as in practice many mapped words use the same base word.

Currently we only support two forms of mapping: rotation and inversion. Note
that these are both guaranteed to map transitively if at all, and supporting
mappings for which this is not true may require a more interesting algorithm for
choosing the optimal pairing.
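
As a rough illustration of why these two mappings compose cleanly: rotation
and inversion commute, so any chain of them reduces to "optionally invert,
then rotate by some amount". If A maps to B and B maps to C, then A maps to C
directly. The sketch below searches for such a mapping by brute force; the
Mapping type and both function names are hypothetical and not taken from the
generator.

```rust
/// Hypothetical description of a mapping: optionally invert, then rotate left.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Mapping {
    invert: bool,
    rotate: u32,
}

/// Apply `m` to `base`, producing the mapped word.
fn apply(base: u64, m: Mapping) -> u64 {
    let word = if m.invert { !base } else { base };
    word.rotate_left(m.rotate)
}

/// Try every (invert, rotate) combination and return the first one that
/// turns `base` into `target`, or None if the words are unrelated.
fn find_mapping(base: u64, target: u64) -> Option<Mapping> {
    for &invert in &[false, true] {
        for rotate in 0..64 {
            let m = Mapping { invert, rotate };
            if apply(base, m) == target {
                return Some(m);
            }
        }
    }
    None
}
```

Because the relation is transitive, any word in a group of related words can
serve as the base for the rest, so a simple greedy choice of base is enough
here. A mapping without that property would not offer this shortcut.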

Updated sizes:

Alphabetic     : 2622 bytes     (-  414 bytes)
Case_Ignorable : 1803 bytes     (-  330 bytes)
Cased          :  808 bytes     (-  126 bytes)
Cc             :   32 bytes
Grapheme_Extend: 1508 bytes     (-  252 bytes)
Lowercase      :  901 bytes     (-   84 bytes)
N              : 1064 bytes     (-  156 bytes)
Uppercase      :  838 bytes     (-   96 bytes)
White_Space    :   91 bytes     (-    6 bytes)
Total table sizes: 9667 bytes   (-1,464 bytes)
2020-03-21 11:22:00 -04:00