mirror of https://github.com/octoleo/telegram-bot-bash.git (synced 2024-11-29 10:43:53 +00:00)
doc: unify use of locale, range and utf
parent 8162695451
commit 0d678f4234
@@ -38,10 +38,10 @@ export 'LANGUAGE=en_US.UTF-8'

 #### Known locale pitfalls

-##### Missing locale C
+##### Missing C locale

-Even required by POSIX standard some systems (e.g. Manjaro Linux) has no locale `C` and `C.UTF-8` installed.
-If bashbot display a warning about missing locale you must install locale `C` and `C.UTF-8`.
+Even required by POSIX standard some systems (e.g. Manjaro Linux) has `C` and `C.UTF-8` locale not installed.
+If bashbot display a warning about missing locale you must install `C` and `C.UTF-8` locale.

 If you don't know what locales are installed on your sytsem use `locale -a` to display them.
 [Gentoo Wiki](https://wiki.gentoo.org/wiki/UTF-8).
@@ -50,9 +50,9 @@ If you don't know what locales are installed on your sytsem use `locale -a` to d
 ##### Character classes

 In ASCII times it was clear `[:lower:]` and `[a-z]` means ONLY the lowercase letters `[abcd...xyz]`.
-With introdution of locales character classes and ranges contains every character fitting the class definition.
+With introdution of localesi, character classes and ranges contains all charatcers fitting the class definition.

-This means for UTF-8 locales `[:lower:]` and `[a-z]` contains ALL lowercase letters, e.g. `á ø ü` also,
+This means with a Latin UTF-8 locale `[:lower:]` and `[a-z]` contains also e.g. `á ø ü` etc,
 see [Unicode Latin lowercase letters](https://www.fileformat.info/info/unicode/category/Ll/list.htm)

 If that's ok for your script your'e fine, but many scripts rely on the idea of ASCII ranges and may produce undesired results.
@@ -63,7 +63,7 @@ If that's ok for your script your'e fine, but many scripts rely on the idea of A
 bash
 lower="abcö"

-echo "$LC_ALL"
+echo "$LC_ALL $LC_COLLATE"
 [[ "$lower" =~ ^[a-z]+$ ]] && echo "Ups, $lower is all lower case!" || echo "OK, not lower case"

 LC_ALL="en_US.UTF-8"
@@ -80,7 +80,7 @@ There are three solutions:
 3. use `LC_COLLATE` to change behavior of all programs: `export LC_COLLATE=C`


-To work independent of language and bash settings bashbot uses solution 1. and uses own "classes" if an exact match is mandatory:
+To work independent of language and bash settings bashbot uses solution 1.: Own "ranges" if an exact match is mandatory:

 ```bash
 azazaz='abcdefghijklmnopqrstuvwxyz' # a-z :lower:
@@ -434,5 +434,5 @@ for every poll until the maximum of BASHBOT_SLEEP ms.
 #### [Prev Advanced Use](3_advanced.md)
 #### [Next Best Practice](5_practice.md)

-#### $$VERSION$$ v1.21-7-g0798f1a
+#### $$VERSION$$ v1.25-dev-1-g8162695

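The changed documentation describes listing the allowed characters explicitly (the `azazaz` variable) instead of relying on `[a-z]`, whose meaning depends on the locale. A minimal sketch of that idea follows. The helper name `is_ascii_lower` is hypothetical and not taken from bashbot, and a `case` pattern is used here instead of `[[ ... =~ ... ]]` so the snippet also runs in a plain POSIX shell:

```shell
# 'azazaz' appears in the diff above; 'is_ascii_lower' is a hypothetical helper name.
azazaz='abcdefghijklmnopqrstuvwxyz'

# Succeeds only if $1 is non-empty and every character is listed in $azazaz.
is_ascii_lower() {
	case "$1" in
		'' | *[!${azazaz}]*) return 1 ;;	# empty, or contains a character outside the list
	esac
	return 0
}

for word in "abc" "abcö"; do
	if is_ascii_lower "$word"; then
		echo "$word: ASCII lower case only"
	else
		echo "$word: contains other characters"
	fi
done
```

Because the bracket expression enumerates each character rather than using a range, the result should not depend on `LC_ALL` or `LC_COLLATE`.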