Difference between revisions of "SHIP:Language Support"

From Serious Documentation

Revision as of 14:43, 22 April 2013

SHIP Multi-Language Support

SHIP natively supports multiple languages for your GUI. There are three basic elements of a multi-language GUI:

  • Fonts that support the characters required by the language
  • String/character encoding that allows you to represent text, along with functions to manipulate/convert text
  • Methods for displaying/updating/storing strings in various languages so you can choose (and display) the current GUI language.

Languages and Locales

The word "language" is often used interchangeably with "locale". In the broader sense, both terms describe not only language but also regionality, that is, local variations of language and expression. Ask a Texan trying to communicate with an Englishman if their "English" is the same language! The most common expression of a locale is a 4-character combination of the two-letter ISO 639-1 language code followed by the two-letter ISO 3166 country code. Therefore “xxYY” describes a locale with language “xx” as expressed in country/region “YY”.

For example, "enUS" refers to English as expressed in the US, as compared to "enGB", which refers to English as expressed in the United Kingdom (whose ISO 3166 country code is "GB"). Here are some common locale names:


  • enUS - English in the US
  • frFR - French in France
  • ptBR - Portuguese in Brazil
  • esMX - Spanish in Mexico
  • zhCN - Chinese in China (simplified character set)
  • zhTW - Chinese in Taiwan (traditional character set)
  • deDE - German in Germany
  • jaJP - Japanese in Japan
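
The "xxYY" convention above can be illustrated with a short Python sketch. This is only an illustration of the naming scheme, not part of SHIP or Sail; the function name `split_locale` is hypothetical.

```python
def split_locale(locale: str) -> tuple[str, str]:
    """Split a 4-character locale such as "enUS" into (language, country).

    Hypothetical helper for illustration only; it checks the xxYY shape
    (two lowercase letters followed by two uppercase letters).
    """
    if len(locale) != 4 or not locale.isascii() or not locale.isalpha():
        raise ValueError(f"expected a 4-letter locale code, got {locale!r}")
    language, country = locale[:2], locale[2:]
    if not (language.islower() and country.isupper()):
        raise ValueError(f"expected the form xxYY, got {locale!r}")
    return language, country


print(split_locale("ptBR"))  # ('pt', 'BR')
```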


The shiplanguage and shiplanguages Built-in Variables

SHIP has the built-in variables "shiplanguage" and "shiplanguages".

shiplanguage contains the 4-character code for the currently selected locale, for example "enUS". Changing shiplanguage (by assigning it some other value in a Sail script) will cause any visual text with resources attached to automatically change to the new language or, if the translation for the new language does not exist, to change to the default translation.
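
The fallback rule described above can be modelled in a few lines of Python. This is a sketch of the behavior as described, not SHIP's implementation; the `resolve_text` function and the dictionary of translations are assumptions made for illustration.

```python
def resolve_text(translations: dict[str, str], current: str, default: str) -> str:
    """Return the string for the `current` locale, or the default
    translation when no translation exists for `current`.

    Hypothetical model of the fallback rule; SHIP does this internally.
    """
    return translations.get(current, translations[default])


greeting = {"enUS": "Hello", "esMX": "Hola"}  # illustrative data
print(resolve_text(greeting, "esMX", "enUS"))  # Hola
print(resolve_text(greeting, "jaJP", "enUS"))  # Hello (falls back to default)
```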

shiplanguages is a concatenated sequence of all the locales currently expressed in text and alternate resource nodes, listed with the default locale first. For example, if the text resources in your GUI have translations for "enUS", "esMX" and "jaJP", with the default being "enUS", then shiplanguages will contain "enUSesMXjaJP".
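
Because every locale code is exactly 4 characters long, a value such as "enUSesMXjaJP" can be cut into fixed-width pieces, with the first piece being the default locale. The Python sketch below illustrates the idea; `split_locales` is a hypothetical name, and a Sail script would use Sail's own string functions instead.

```python
def split_locales(shiplanguages: str) -> list[str]:
    """Split a concatenated locale string into 4-character locale codes."""
    if len(shiplanguages) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return [shiplanguages[i:i + 4] for i in range(0, len(shiplanguages), 4)]


locales = split_locales("enUSesMXjaJP")
default_locale = locales[0]  # the default locale is listed first
print(locales)         # ['enUS', 'esMX', 'jaJP']
print(default_locale)  # enUS
```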

Codesets, Codepoints, Unicode 16 and UTF-8

Each locale drives a specific set of characters needed to express text in that locale. Because far more characters are required than were covered by the original 7-bit ASCII collection, characters are now identified with an unsigned 16-bit value called a “Unicode codepoint” (the full Unicode standard also defines codepoints above 0xFFFF, which do not fit in 16 bits). Unicode is well described at unicode.org. A collection of Unicode codepoints needed to express a locale is called a “codeset”. Note that for any given locale, the codeset is generally not a single contiguous block of codepoints. For instance, even English uses not only the basic Latin characters at the traditional ASCII codepoints 0x0020...0x007F, but also the trademark symbol (™, Unicode 0x2122), the degree symbol (°, Unicode 0x00B0), the Euro symbol (€, 0x20AC), and the copyright symbol (©, 0x00A9). Excellent codepoint and codeset resources can be found at [1], [2], and [3].

A complete mapping of codesets per locale can be quite complicated, especially when factoring in the limited resources of the target device. For example, the Japanese locale (jaJP) theoretically requires over 10,000 codepoints, which generally means megabytes of storage just for the font characters. The Serious GUI design tool, SHIPTide, can define custom codesets as well as auto-generate codesets based on the characters (codepoints) used in the specific GUI. This latter capability can ensure that only the codepoints the GUI actually requires are included on the device, minimizing memory footprint and therefore cost.
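
The idea of deriving a minimal codeset from the strings a GUI actually uses can be sketched in Python as follows. This is not how SHIPTide is implemented; the function names and the sample strings are assumptions chosen for illustration.

```python
def codepoints_used(strings):
    """Return the sorted, de-duplicated codepoints appearing in `strings`."""
    return sorted({ord(ch) for s in strings for ch in s})


def to_ranges(codepoints):
    """Collapse a sorted list of codepoints into inclusive (first, last) ranges."""
    ranges = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)  # extend the current run
        else:
            ranges.append((cp, cp))           # start a new run
    return ranges


gui_strings = ["25°C", "€5"]  # illustrative GUI text
used = codepoints_used(gui_strings)
print([f"0x{cp:04X}" for cp in used])
# ['0x0032', '0x0035', '0x0043', '0x00B0', '0x20AC']
print(to_ranges(codepoints_used(["abc", "cab"])))  # [(97, 99)]
```

Note how even two short strings produce a codeset that is not one contiguous block.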

Fonts

A font is a collection of codepoints forming a codeset, along with the visual glyph to be shown for each codepoint. Fonts are discussed at length in the Fonts area of the documentation.

UTF-8 Encoding and String/Character Functions

All strings within SHIP are encoded using UTF-8, a variable-length multi-byte encoding. The traditional 7-bit ASCII characters are represented exactly the same in UTF-8, as single bytes. Any other character is encoded as a sequence of two or more bytes, each with its high bit set: the lead byte indicates how many bytes make up the sequence, and together the bytes carry the bits of a single Unicode 16 value. It is important to recognize that "bytes" does not mean "characters". Each character may be encoded with 1, 2, 3, or even more bytes (a 16-bit codepoint needs at most 3).
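
The difference between bytes and characters is easy to see in a short Python sketch. It uses Python's standard string and bytes types rather than Sail's string functions, and `utf8_info` is a hypothetical helper name.

```python
def utf8_info(ch: str) -> tuple[int, bytes]:
    """Return (codepoint, UTF-8 bytes) for a single character."""
    return ord(ch), ch.encode("utf-8")


for ch in ["A", "°", "€"]:
    codepoint, encoded = utf8_info(ch)
    print(f"U+{codepoint:04X} -> {len(encoded)} byte(s)")
# U+0041 -> 1 byte(s)
# U+00B0 -> 2 byte(s)
# U+20AC -> 3 byte(s)

text = "25°C"
print(len(text))                  # 4 (characters)
print(len(text.encode("utf-8")))  # 5 (bytes)
```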

The SHIPTide tool performs all string editing in UTF-8 encoding. It will not be obvious to the user: text strings look normal. Cutting and pasting (for example) from Translate will work seamlessly.

There are numerous Sail functions available for manipulating strings and converting characters to codepoints (and back).