Unicode
From Reboil
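Unicode is a standard for encoding the text of the various languages of Earth. It assigns each character a numbered code point, written as U+ followed by four to six hexadecimal digits (e.g. U+00E9 for é). A single grapheme may be built from several code points, as with the combining marks below.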
Stats
- Standard: ISO 10646 Information technology — Universal coded character set (UCS)
Combining Marks
Code Point | Example | Description | Comment |
---|---|---|---|
U+0300 | à | Combining Grave Accent | |
U+0301 | á | Combining Acute Accent | |
U+0302 | â | Combining Circumflex Accent | |
U+0303 | ã | Combining Tilde | |
U+0304 | ā | Combining Macron | |
U+0305 | A̅B | Combining Overline | |
U+0306 | ă | Combining Breve | |
U+0307 | ċ | Combining Dot Above | |
U+0308 | ä | Combining Diaeresis | |
U+030A | å | Combining Ring Above | |
U+030C | ǎ | Combining Caron | |
U+0310 | m̐ | Combining Candrabindu | |
U+0323 | ạ | Combining Dot Below[1] | |
U+0327 | ç | Combining Cedilla | |
U+0328 | ǫ | Combining Ogonek | |
U+0331 | a̱ | Combining Macron Below | |
U+0332 | k̲h | Combining Low Line | |
U+035E | a͞b | Combining Double Macron | |
U+035F | k͟ha | Combining Double Macron Below | |
U+3099 | あ゙ | Combining Katakana-Hiragana Voiced Sound Mark[2] | |
U+309A | あ゚ | Combining Katakana-Hiragana Semi-Voiced Sound Mark[2] | |
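Many letters in the table above can be stored in two ways. One is a base letter followed by a combining mark. The other is a single precomposed code point. Unicode normalization converts between the two forms. The sketch below uses Python's standard `unicodedata.normalize` on "a" + U+0300. It is only an illustration; the page does not say which form its own pages store.

```python
import unicodedata

# "a" followed by U+0300 COMBINING GRAVE ACCENT: two code points.
decomposed = "a\u0300"
# NFC composes the pair into the single precomposed code point U+00E0.
composed = unicodedata.normalize("NFC", decomposed)
# NFD splits it back into base letter + combining mark.
roundtrip = unicodedata.normalize("NFD", composed)

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0061', 'U+0300']
print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E0']
print(roundtrip == decomposed)                   # True

# Not every base + mark pair has a precomposed form; NFC leaves these as two code points.
print(len(unicodedata.normalize("NFC", "x\u0300")))  # 2
```

Pairs with no precomposed form, such as "x" + U+0300, only exist as combining sequences. That is one reason combining marks are needed at all.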
Code blocks containing combining marks include:
- U+0300–U+036F: Combining Diacritical Marks
- U+1AB0–U+1AFF: Combining Diacritical Marks Extended
- U+1DC0–U+1DFF: Combining Diacritical Marks Supplement
- U+20D0–U+20FF: Combining Diacritical Marks for Symbols
- U+FE20–U+FE2F: Combining Half Marks
Source: Unicode FAQ on combining marks.
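The block list above is not exhaustive. Marks used elsewhere on this page, such as U+093C and U+3099, live in script-specific blocks. The sketch below does two things:

- It checks a character against the five listed ranges. `combining_block` and `COMBINING_BLOCKS` are hypothetical names.
- It asks the standard `unicodedata.category` for the character's general category, where "Mn" means nonspacing mark.

```python
import unicodedata
from typing import Optional

# Hypothetical lookup table copied from the block list above: (first, last, name).
COMBINING_BLOCKS = [
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0x1AB0, 0x1AFF, "Combining Diacritical Marks Extended"),
    (0x1DC0, 0x1DFF, "Combining Diacritical Marks Supplement"),
    (0x20D0, 0x20FF, "Combining Diacritical Marks for Symbols"),
    (0xFE20, 0xFE2F, "Combining Half Marks"),
]


def combining_block(ch: str) -> Optional[str]:
    """Hypothetical helper: name of the listed block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in COMBINING_BLOCKS:
        if first <= cp <= last:
            return name
    return None


# U+093C and U+3099 are nonspacing marks ("Mn") even though they
# fall outside the five blocks listed above.
for ch in ["\u0301", "\u093c", "\u3099", "a"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), combining_block(ch))
```

Checking the general category is the more general test. The block ranges are only a convenient way to browse for marks.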
Combining Diacritical Marks (Devanagari)
Code Point | Example | Description | Comment |
---|---|---|---|
U+093C | ख़ | Devanagari Sign Nukta | |
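Unicode also encodes some nukta letters as single precomposed code points, for example U+0959 DEVANAGARI LETTER KHHA. These letters are composition exclusions. Both NFD and NFC therefore turn them into the base letter plus U+093C, so the sequence ख + U+093C is the form that survives normalization. This is offered as one possible reason for note [1], not as the author's stated reasoning. The sketch checks it with Python's standard `unicodedata`.

```python
import unicodedata

kha = "\u0916"    # DEVANAGARI LETTER KHA
nukta = "\u093c"  # DEVANAGARI SIGN NUKTA
khha = "\u0959"   # DEVANAGARI LETTER KHHA (precomposed form)

print(unicodedata.name(khha))
# The precomposed letter decomposes canonically into KHA + NUKTA ...
print(unicodedata.normalize("NFD", khha) == kha + nukta)
# ... and, as a composition exclusion, NFC does not rebuild it.
print(unicodedata.normalize("NFC", khha) == kha + nukta)
```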
My frequently used code points
Code Point | Example | Description |
---|---|---|
U+00A0 | (invisible) | No-Break Space |
U+00A9 | © | Copyright Sign |
U+00B6 | ¶ | Pilcrow Sign[3] |
U+00C6 | Æ | Latin Capital Letter Ae |
U+00D0 | Ð | Latin Capital Letter Eth |
U+00DE | Þ | Latin Capital Letter Thorn |
U+00E6 | æ | Latin Small Letter Ae |
U+00F0 | ð | Latin Small Letter Eth |
U+00FE | þ | Latin Small Letter Thorn |
U+0141 | Ł | Latin Capital Letter L with Stroke |
U+0142 | ł | Latin Small Letter L with Stroke |
U+02BC | ʼ | Modifier Letter Apostrophe |
U+1D2C | ᴬ | Modifier Letter Capital A |
U+1D2E | ᴮ | Modifier Letter Capital B |
U+2013 | – | En Dash |
U+2014 | — | Em Dash |
U+2018 | ‘ | Left Single Quotation Mark |
U+2019 | ’ | Right Single Quotation Mark |
U+201C | “ | Left Double Quotation Mark |
U+201D | ” | Right Double Quotation Mark |
U+2026 | … | Horizontal Ellipsis |
U+2122 | ™ | Trade Mark Sign |
U+2126 | Ω | Ohm Sign |
U+2190 | ← | Leftwards Arrow |
U+2191 | ↑ | Upwards Arrow |
U+2192 | → | Rightwards Arrow |
U+2193 | ↓ | Downwards Arrow |
U+2194 | ↔ | Left Right Arrow |
U+2195 | ↕ | Up Down Arrow |
U+21D0 | ⇐ | Leftwards Double Arrow |
U+21D1 | ⇑ | Upwards Double Arrow |
U+21D2 | ⇒ | Rightwards Double Arrow |
U+21D3 | ⇓ | Downwards Double Arrow |
U+21D4 | ⇔ | Left Right Double Arrow |
U+25CC | ◌ | Dotted Circle |
U+263C | ☼ | White Sun with Rays[4] |
U+2661 | ♡ | White Heart Suit |
U+2665 | ♥ | Black Heart Suit |
U+2705 | ✅ | White Heavy Check Mark |
U+2728 | ✨ | Sparkles |
U+274C | ❌ | Cross Mark |
U+29B5 | ⦵ | Circle with Horizontal Bar[5] |
U+30FB | ・ | Katakana Middle Dot |
U+1F38B | 🎋 | Tanabata Tree |
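The Description columns on this page follow the official Unicode character names. Python's standard `unicodedata.name` looks those names up, which is a quick way to check a table like the one above. `describe` is a hypothetical helper name used only for this sketch.

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Hypothetical helper: list each code point in text as 'U+XXXX NAME'."""
    # unicodedata.name() accepts a default for code points that have no name.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe("\u0141\u00a9\u2013\u266b"):
    print(line)
```

The official names come back in upper case, for example "LATIN CAPITAL LETTER L WITH STROKE". The tables here use title case for readability.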
Code Point | Example | Description |
---|---|---|
U+00B0 | ° | Degree Sign |
U+00B1 | ± | Plus-Minus Sign |
U+00D7 | × | Multiplication Sign |
U+03B8 | θ | Greek Small Letter Theta |
U+2032 | ′ | Prime (e.g. to mark derivatives in calculus) |
U+2219 | ∙ | Bullet Operator (e.g. dot product) |
U+221A | √ | Square Root |
U+221D | ∝ | Proportional To |
U+221E | ∞ | Infinity |
U+222B | ∫ | Integral (e.g. in calculus) |
U+2248 | ≈ | Almost Equal To |
Code Point | Example | Description |
---|---|---|
U+2669 | ♩ | Quarter Note |
U+266A | ♪ | Eighth Note |
U+266B | ♫ | Beamed Eighth Notes |
Code Points | Code | Example | Description |
---|---|---|---|
U+1F1E6 U+1F1F7 | AR | 🇦🇷 | Argentina flag |
U+1F1E7 U+1F1F7 | BR | 🇧🇷 | Brazil flag |
U+1F1E8 U+1F1F1 | CL | 🇨🇱 | Chile flag |
U+1F1E8 U+1F1F3 | CN | 🇨🇳 | China flag |
U+1F1E9 U+1F1EA | DE | 🇩🇪 | Germany flag |
U+1F1EA U+1F1EC | EG | 🇪🇬 | Egypt flag |
U+1F1EA U+1F1F8 | ES | 🇪🇸 | Spain flag |
U+1F1EA U+1F1FA | EU | 🇪🇺 | European Union flag |
U+1F1EB U+1F1EE | FI | 🇫🇮 | Finland flag |
U+1F1EC U+1F1E7 | GB | 🇬🇧 | United Kingdom flag |
U+1F1EC U+1F1F7 | GR | 🇬🇷 | Greece flag |
U+1F1EC U+1F1F9 | GT | 🇬🇹 | Guatemala flag |
U+1F1ED U+1F1F3 | HN | 🇭🇳 | Honduras flag |
U+1F1EE U+1F1E9 | ID | 🇮🇩 | Indonesia flag |
U+1F1EE U+1F1F1 | IL | 🇮🇱 | Israel flag |
U+1F1EE U+1F1F3 | IN | 🇮🇳 | India flag |
U+1F1EE U+1F1F8 | IS | 🇮🇸 | Iceland flag |
U+1F1EE U+1F1F9 | IT | 🇮🇹 | Italy flag |
U+1F1EF U+1F1F5 | JP | 🇯🇵 | Japan flag |
U+1F1F0 U+1F1F7 | KR | 🇰🇷 | South Korea flag |
U+1F1F2 U+1F1FD | MX | 🇲🇽 | Mexico flag |
U+1F1F3 U+1F1EC | NG | 🇳🇬 | Nigeria flag |
U+1F1F3 U+1F1EE | NI | 🇳🇮 | Nicaragua flag |
U+1F1F3 U+1F1FF | NZ | 🇳🇿 | New Zealand flag |
U+1F1F5 U+1F1E6 | PA | 🇵🇦 | Panama flag |
U+1F1F5 U+1F1ED | PH | 🇵🇭 | Philippines flag |
U+1F1F5 U+1F1F8 | PS | 🇵🇸 | Palestine flag |
U+1F1F5 U+1F1F9 | PT | 🇵🇹 | Portugal flag |
U+1F1F7 U+1F1FA | RU | 🇷🇺 | Russia flag |
U+1F1F8 U+1F1E6 | SA | 🇸🇦 | Saudi Arabia flag |
U+1F1F8 U+1F1EA | SE | 🇸🇪 | Sweden flag |
U+1F1F8 U+1F1FB | SV | 🇸🇻 | El Salvador flag |
U+1F1F9 U+1F1ED | TH | 🇹🇭 | Thailand flag |
U+1F1F9 U+1F1FC | TW | 🇹🇼 | Taiwan flag |
U+1F1FA U+1F1E6 | UA | 🇺🇦 | Ukraine flag |
U+1F1FA U+1F1F8 | US | 🇺🇸 | United States flag |
U+1F1FB U+1F1EA | VE | 🇻🇪 | Venezuela flag |
U+1F1FC U+1F1F8 | WS | 🇼🇸 | Samoa flag |
U+1F1FD U+1F1F0 | XK | 🇽🇰 | Kosovo flag |
U+1F1FE U+1F1EA | YE | 🇾🇪 | Yemen flag |
U+1F1FF U+1F1E6 | ZA | 🇿🇦 | South Africa flag |
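Each flag above is a pair of Regional Indicator Symbols spelling the two-letter region code. U+1F1E6 stands for A, and the symbols run consecutively up to U+1F1FF for Z. Whether a given pair renders as a flag depends on the font and platform. The sketch below derives the code points from the letters; `flag` and `code_points` are hypothetical helper names.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(code: str) -> str:
    """Hypothetical helper: build a flag sequence from a region code like "NG"."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {code!r}")
    # Offset each letter from 'A' into the regional indicator range.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


def code_points(s: str) -> str:
    """Format a string's code points in the U+XXXX notation used above."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for region in ["NG", "TH", "WS"]:
    print(region, flag(region), code_points(flag(region)))
```

Because each letter maps to its own code point by a fixed offset, the Code Points column can be checked directly against the Code column.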
Usage
My language notes pages (e.g. Navajo notes) use combining diacritics extensively.
In TeXmacs, Unicode code points may be entered manually in the Emacs look-and-feel mode. Type C-q, a hash symbol, and the code point number in hexadecimal (e.g. C-q #29B5 will yield ⦵). In the default look-and-feel, Esc-q works in lieu of C-q.
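The number typed after the hash is hexadecimal, the same digits as in the U+ notation used in the tables above. The sketch below only mimics that interpretation and is not TeXmacs's own code. `parse_code_point` is a hypothetical helper.

```python
import re


def parse_code_point(text: str) -> str:
    """Hypothetical helper: turn '#29B5' or 'U+29B5' into the character it names."""
    m = re.fullmatch(r"(?:#|U\+)([0-9A-Fa-f]{1,6})", text.strip())
    if m is None:
        raise ValueError(f"not a code point: {text!r}")
    # The digits are hexadecimal; chr() raises ValueError above U+10FFFF.
    return chr(int(m.group(1), 16))


print(parse_code_point("#29B5"))   # ⦵
print(parse_code_point("U+00B6"))  # ¶
```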
Encoding
An explanation of how UTF-8 encodes Unicode code points as sequences of one to four bytes can be found here.
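The byte layout can be sketched by hand. Lead bits mark the sequence length (0, 110, 1110, or 11110), and each continuation byte starts with 10 and carries 6 payload bits. The function below is a hypothetical illustration of that layout, and it checks itself against Python's built-in `str.encode("utf-8")`, which is what real code should use.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:      # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ["A", "\u00e9", "\u29b5", "\U0001f38b"]:
    mine = utf8_encode(ord(ch))
    assert mine == ch.encode("utf-8")  # agrees with the built-in encoder
    print(f"U+{ord(ch):04X} -> {mine.hex(' ').upper()}")
```

For example, U+29B5 (⦵) needs three bytes, E2 A6 B5, while U+1F38B (🎋) needs four, F0 9F 8E 8B.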
History
See also
- Reference
- ISO/IEC 10646 "Information technology — Universal coded character set (UCS)"
- Universal Coded Character Set
- My language notes
- Creative Commons#Unicode Glyphs
External links
- SYMBL
- SYMBL Combining Diacritical Marks
- Video explanation of UTF-8 vs. UTF-32 and graphemes by Studying with Alex
- How do I type _?
- bk4, reboil.com
References
- [1] Baltakatei: 2023-09-11: Use U+093C DEVANAGARI SIGN NUKTA with Devanagari (e.g. Hindi).
- [2] “Dakuten and handakuten”. (2023-09-07). Wikipedia. Accessed 2023-09-07.
- [3] Baltakatei: a.k.a. “paragraph sign”.
- [4] Baltakatei: 2024-01-18: Dwarf Fortress glyph for “Dwarfbucks”.
- [5] Baltakatei: 2023-11-13: Also known as a “Plimsoll symbol”, which IUPAC recommends in chemistry contexts to indicate a standard state.
Footnotes