Internationalization (i18n)

Internationalization is often abbreviated as "i18n" (the 18 stands for the eighteen letters between the initial "i" and the final "n"). A web site or service that serves visitors from all over the world needs to be internationalized, because character sets, sort orders, calendars, and date and time formats differ from region to region. For example, a Japanese timestamp combines Chinese characters (kanji) with digits: "2021年12月23日12時34分56秒" stands for 12:34:56 on 2021/12/23 (hh:mm:ss, yyyy/MM/dd).

Time format

Language    Expression
Chinese     二〇二一年十二月二十三日 星期四 上午两点三十四分五十六秒
Japanese    2021年12月23日木曜日午前2時34分56秒
Korean      2021년 12월 23일 목요일 오전 2시 34분 56초
English     Thursday, December 23, 2021 2:34:56 am
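
As a minimal sketch of how one timestamp can be rendered in different regional styles, the Python snippet below formats the same `datetime` value in the Japanese style and in a plain numeric style. The function names `format_japanese` and `format_numeric` are hypothetical, chosen only for illustration; a real service would more likely rely on a locale library than on hand-written format strings.

```python
from datetime import datetime


def format_japanese(dt: datetime) -> str:
    """Render dt with kanji unit markers after each numeric field."""
    return f"{dt.year}年{dt.month}月{dt.day}日{dt.hour}時{dt.minute}分{dt.second}秒"


def format_numeric(dt: datetime) -> str:
    """Render dt as 'hh:mm:ss, yyyy/MM/dd' using zero-padded numbers."""
    return dt.strftime("%H:%M:%S, %Y/%m/%d")


stamp = datetime(2021, 12, 23, 12, 34, 56)
print(format_japanese(stamp))  # 2021年12月23日12時34分56秒
print(format_numeric(stamp))   # 12:34:56, 2021/12/23
```

The same instant produces two quite different strings, which is why the display format is usually chosen per visitor rather than hard-coded.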

Character Code

A Single Byte Character Set (SBCS) represents each character with 1 byte. SBCS encodings are mainly used for English and other European languages. Because 1 byte can represent at most 256 distinct characters, languages with thousands of characters, such as Chinese and Japanese, cannot be covered by a Single Byte Character Set.
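
The 256-character limit can be checked with a few lines of Python. This is only a sketch: the helper name `fits_single_byte` is hypothetical, and ISO-8859-1 (`latin-1`) is used as an arbitrary example of a single-byte encoding.

```python
# One byte has 8 bits, so it can hold 2**8 distinct values.
SINGLE_BYTE_LIMIT = 2 ** 8


def fits_single_byte(text: str, encoding: str = "latin-1") -> bool:
    """Return True if every character of text exists in the given single-byte encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(SINGLE_BYTE_LIMIT)          # 256
print(fits_single_byte("Hello"))  # True
print(fits_single_byte("日本語"))  # False
```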

Bytes  Possible values  Character set (language)
1      256              ASCII (English), ISO-8859 (European languages etc.), SJIS (Japanese katakana), UTF-8 (alphabet)
2      65,536           UTF-16 (all languages), UTF-8 (all languages), SJIS (Japanese kanji, hiragana etc.), GB2312 (Chinese)
3      16,777,216       UTF-8 (all languages)
4      4,294,967,296    UTF-32 (all languages), UTF-16 (all languages), UTF-8 (all languages)
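
To see how many bytes a given character actually occupies, the following sketch encodes a few sample characters with Python's built-in codecs. The sample characters and the helper name `byte_length` are illustrative assumptions; the `-le` codec variants are used so that no byte order mark is added to the count.

```python
SAMPLES = ["A", "é", "あ", "中", "😀"]
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]


def byte_length(char: str, encoding: str) -> int:
    """Return the number of bytes needed to store char in the given encoding."""
    return len(char.encode(encoding))


for ch in SAMPLES:
    sizes = {enc: byte_length(ch, enc) for enc in ENCODINGS}
    print(ch, sizes)

# Legacy regional encodings mix 1-byte and 2-byte characters.
print(byte_length("ｱ", "shift_jis"))   # half-width katakana: 1
print(byte_length("あ", "shift_jis"))  # hiragana: 2
print(byte_length("中", "gb2312"))     # Chinese character: 2
```

UTF-8 needs between 1 and 4 bytes depending on the character, UTF-16 needs 2 or 4, and UTF-32 always needs 4, which matches the rows in which each encoding appears.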


  • ASCII

ASCII is one of the most common single-byte codes. It includes alphabetic characters (a-z, A-Z), digits (0-9), non-printing characters (space, line feed, carriage return) and symbols (!, ", #, _, etc.).
ASCII does not use all 256 values of 8 bits; it uses only the 128 values that fit in 7 bits.
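
A short Python sketch of the 7-bit rule: every ASCII character has a code below 128. The helper names `is_seven_bit` and `describe` are hypothetical and used only for this illustration.

```python
def is_seven_bit(text: str) -> bool:
    """Return True if every code point in text is below 128 (fits in 7 bits)."""
    return all(ord(ch) < 128 for ch in text)


def describe(ch: str) -> str:
    """Return the decimal and hexadecimal code of one character, e.g. '65 41' for 'A'."""
    return f"{ord(ch)} {ord(ch):02X}"


print(describe("A"))             # 65 41
print(describe("\n"))            # 10 0A
print(is_seven_bit("Hello!"))    # True
print(is_seven_bit("Łódź"))      # False
print("Hello!".encode("ascii"))  # b'Hello!'
```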

Dec  Hex  ASCII     Dec  Hex  ASCII     Dec  Hex  ASCII
10   0A   [LF]      46   2E   .         62   3E   >
13   0D   [CR]      47   2F   /         63   3F   ?
32   20   [Space]   48   30   0         64   40   @
33   21   !         49   31   1         65   41   A
34   22   "         50   32   2         66   42   B
35   23   #         51   33   3         67   43   C
36   24   $         52   34   4         68   44   D
37   25   %         53   35   5         69   45   E
38   26   &         54   36   6         70   46   F
39   27   '         55   37   7         71   47   G
40   28   (         56   38   8         72   48   H
41   29   )         57   39   9         73   49   I
42   2A   *         58   3A   :         74   4A   J
43   2B   +         59   3B   ;         75   4B   K
44   2C   ,         60   3C   <         76   4C   L
45   2D   -         61   3D   =         77   4D   M

Dec  Hex  ASCII     Dec  Hex  ASCII     Dec  Hex  ASCII
78   4E   N         94   5E   ^         110  6E   n
79   4F   O         95   5F   _         111  6F   o
80   50   P         96   60   `         112  70   p
81   51   Q         97   61   a         113  71   q
82   52   R         98   62   b         114  72   r
83   53   S         99   63   c         115  73   s
84   54   T         100  64   d         116  74   t
85   55   U         101  65   e         117  75   u
86   56   V         102  66   f         118  76   v
87   57   W         103  67   g         119  77   w
88   58   X         104  68   h         120  78   x
89   59   Y         105  69   i         121  79   y
90   5A   Z         106  6A   j         122  7A   z
91   5B   [         107  6B   k         123  7B   {
92   5C   \         108  6C   l         124  7C   |
93   5D   ]         109  6D   m         125  7D   }
                                        126  7E   ~

  • ISO-8859-2

ISO-8859 is a family of single-byte, 8-bit character encodings. It has 15 parts, such as ISO-8859-1, ISO-8859-2, and so on. ISO-8859-2 supports the Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian.
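
As a hedged illustration, the snippet below encodes a Polish word with Python's built-in `iso8859_2` codec and then decodes the same bytes with ISO-8859-1. The sample word is an arbitrary choice; the point is that each character takes exactly one byte, and that the bytes only make sense when the reader knows which part of ISO-8859 was used.

```python
text = "Łódź"  # sample Polish word, chosen only for illustration

encoded = text.encode("iso8859_2")
hex_bytes = " ".join(f"{b:02X}" for b in encoded)

print(len(text), len(encoded))      # 4 4  (one byte per character)
print(hex_bytes)                    # A3 F3 64 BC

# The same bytes read with the wrong part of ISO-8859 give different characters.
print(encoded.decode("iso8859_2"))  # Łódź
print(encoded.decode("iso8859_1"))  # £ód¼
```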

Dec  Hex  8859-2    Dec  Hex  8859-2    Dec  Hex  8859-2
10   0A   [LF]      46   2E   .         62   3E   >
13   0D   [CR]      47   2F   /         63   3F   ?
32   20   [Space]   48   30   0         64   40   @
33   21   !         49   31   1         65   41   A
34   22   "         50   32   2         66   42   B
35   23   #         51   33   3         67   43   C
36   24   $         52   34   4         68   44   D
37   25   %         53   35   5         69   45   E
38   26   &         54   36   6         70   46   F
39   27   '         55   37   7         71   47   G
40   28   (         56   38   8         72   48   H
41   29   )         57   39   9         73   49   I
42   2A   *         58   3A   :         74   4A   J
43   2B   +         59   3B   ;         75   4B   K
44   2C   ,         60   3C   <         76   4C   L
45   2D   -         61   3D   =         77   4D   M

Dec  Hex  8859-2    Dec  Hex  8859-2    Dec  Hex  8859-2
78   4E   N         94   5E   ^         110  6E   n
79   4F   O         95   5F   _         111  6F   o
80   50   P         96   60   `         112  70   p
81   51   Q         97   61   a         113  71   q
82   52   R         98   62   b         114  72   r
83   53   S         99   63   c         115  73   s
84   54   T         100  64   d         116  74   t
85   55   U         101  65   e         117  75   u
86   56   V         102  66   f         118  76   v
87   57   W         103  67   g         119  77   w
88   58   X         104  68   h         120  78   x
89   59   Y         105  69   i         121  79   y
90   5A   Z         106  6A   j         122  7A   z
91   5B   [         107  6B   k         123  7B   {
92   5C   \         108  6C   l         124  7C   |
93   5D   ]         109  6D   m         125  7D   }

Dec  Hex  8859-2    Dec  Hex  8859-2    Dec  Hex  8859-2
160  A0   [NBSP]    176  B0   °         192  C0   Ŕ
161  A1   Ą         177  B1   ą         193  C1   Á
162  A2   ˘         178  B2   ˛         194  C2   Â
163  A3   Ł         179  B3   ł         195  C3   Ă
164  A4   ¤         180  B4   ´         196  C4   Ä
165  A5   Ľ         181  B5   ľ         197  C5   Ĺ
166  A6   Ś         182  B6   ś         198  C6   Ć
167  A7   §         183  B7   ˇ         199  C7   Ç
168  A8   ¨         184  B8   ¸         200  C8   Č
169  A9   Š         185  B9   š         201  C9   É
170  AA   Ş         186  BA   ş         202  CA   Ę
171  AB   Ť         187  BB   ť         203  CB   Ë
172  AC   Ź         188  BC   ź         204  CC   Ě
173  AD   [SHY]     189  BD   ˝         205  CD   Í
174  AE   Ž         190  BE   ž         206  CE   Î
175  AF   Ż         191  BF   ż         207  CF   Ď

Dec  Hex  8859-2    Dec  Hex  8859-2    Dec  Hex  8859-2
208  D0   Đ         224  E0   ŕ         240  F0   đ
209  D1   Ń         225  E1   á         241  F1   ń
210  D2   Ň         226  E2   â         242  F2   ň
211  D3   Ó         227  E3   ă         243  F3   ó
212  D4   Ô         228  E4   ä         244  F4   ô
213  D5   Ő         229  E5   ĺ         245  F5   ő
214  D6   Ö         230  E6   ć         246  F6   ö
215  D7   ×         231  E7   ç         247  F7   ÷
216  D8   Ř         232  E8   č         248  F8   ř
217  D9   Ů         233  E9   é         249  F9   ů
218  DA   Ú         234  EA   ę         250  FA   ú
219  DB   Ű         235  EB   ë         251  FB   ű
220  DC   Ü         236  EC   ě         252  FC   ü
221  DD   Ý         237  ED   í         253  FD   ý
222  DE   Ţ         238  EE   î         254  FE   ţ
223  DF   ß         239  EF   ď         255  FF   ˙
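
The upper half of such a table does not have to be typed by hand. As a rough sketch, assuming Python's built-in `iso8859_2` codec, the rows for byte values 160 to 255 can be generated by decoding each byte on its own. The function name `upper_half_rows` is hypothetical.

```python
def upper_half_rows(encoding: str = "iso8859_2"):
    """Yield (decimal, hex, character) for byte values 160-255 in the given encoding."""
    for value in range(0xA0, 0x100):
        char = bytes([value]).decode(encoding)
        yield value, f"{value:02X}", char


rows = list(upper_half_rows())
print(len(rows))  # 96

# Show a few rows (row 0 is the non-breaking space, so start from row 1).
for dec, hx, ch in rows[1:4]:
    print(dec, hx, ch)
```

Passing another single-byte codec name, for example `iso8859_1`, gives the corresponding rows for that part of ISO-8859.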

About: wpadmin

