SHIP:Language Support

SHIP natively supports multiple languages for your GUI. There are three basic elements of a multi-language GUI:

  • Fonts that support the characters required by the language
  • Strings with character encodings that allow you to represent multi-language text, along with functions to manipulate/convert text
  • Methods for displaying/updating/storing strings in various languages so you can choose (and display) the current GUI language.

Languages and Locales

The word "language" is often used interchangeably with "locale". In the broader sense, both terms describe not only language but also regionality. Regionality covers local variations of language and expression -- ask a Texan trying to communicate with an Englishman if their "English" is the same language! The most common expression of a locale is a 4-character combination of the 2-letter ISO 639-1 language code followed by the 2-letter ISO 3166-1 country code. Therefore “xxYY” describes a locale with language “xx” as expressed in country/region “YY”.

For example, "enUS" refers to English as expressed in the US, as compared to "enGB", which refers to English as expressed in the United Kingdom (the ISO 3166-1 code for the United Kingdom is GB, not UK). Here are some common locale names:

  • enUS - English in the US
  • frFR - French in France
  • ptBR - Portuguese in Brazil
  • esMX - Spanish in Mexico
  • zhCN - Chinese in China (simplified character set)
  • zhTW - Chinese in Taiwan (traditional character set)
  • deDE - German in Germany
  • jaJP - Japanese in Japan
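
The "xxYY" convention above can be illustrated with a short sketch. This is plain Python rather than SAIL, and split_locale is a hypothetical helper name used only for illustration; it is not part of SHIP.

```python
from typing import Tuple


def split_locale(code: str) -> Tuple[str, str]:
    """Split a 4-character "xxYY" locale name into (language, country).

    Hypothetical helper for illustration only; not a SHIP or SAIL function.
    """
    if len(code) != 4 or not code.isalpha():
        raise ValueError(f"expected a 4-letter locale name, got {code!r}")
    language, country = code[:2], code[2:]
    # Convention: lowercase language code, uppercase country code.
    if not (language.islower() and country.isupper()):
        raise ValueError(f"expected the form 'xxYY', got {code!r}")
    return language, country


print(split_locale("ptBR"))
```

The check on letter case is a design choice of this sketch: it rejects inputs such as "PTbr" early instead of silently accepting them.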

Codepoints, UTF-8 and Unicode

In order to represent all the characters across all languages, the traditional ASCII character set is insufficient: ASCII defines only 128 characters (7 bits), and even its 8-bit extensions provide only 256. Unicode is a standard that assigns a numeric value called a "Unicode codepoint", or just "codepoint", to each character across all languages. This document uses "Unicode 16" to mean the unsigned 16-bit codepoints 0x0000...0xFFFF (Unicode's Basic Multilingual Plane); the full Unicode standard also defines codepoints above this range. The first 128 Unicode codepoints are the same as the 128 ASCII characters. Unicode is well described at unicode.org.

However, storing text strings using uncompressed/unencoded 16-bit (i.e. 2 byte) values for every character can be very inefficient, especially when these strings have a preponderance of traditional ASCII characters including numbers and punctuation that would normally only take one byte each. Therefore various character compression/encoding systems have been developed in the industry attempting to optimize the storage required by text strings better than unencoded Unicode 16.

All strings within SHIP are encoded using UTF-8. UTF-8 is a variable-length multi-byte encoding. The traditional 7-bit ASCII characters are all represented exactly the same in UTF-8, using one byte each. Every other codepoint is encoded as a sequence of two or more bytes: the high bits of the first byte indicate how many bytes are in the sequence, and each following "continuation" byte carries 6 more bits of the codepoint value. It is important to recognize that "bytes" does not mean "characters". A Unicode 16 codepoint is encoded with 1, 2, or 3 bytes; codepoints beyond the 16-bit range take 4.
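
The difference between bytes and characters can be seen in a small sketch. It is written in Python purely for illustration (it is not SAIL), and utf8_length is a hypothetical helper name.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store `text` in UTF-8."""
    return len(text.encode("utf-8"))


# One ASCII character, one 2-byte character, one 3-byte character.
for ch in ["A", "°", "€"]:
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_length(ch)} byte(s)")

# A string's character count and byte count differ once it
# contains any non-ASCII character.
label = "25°C"
print(len(label), "characters,", utf8_length(label), "bytes")
```

Here "25°C" has 4 characters but occupies 5 bytes, because the degree symbol (0x00B0) needs two bytes in UTF-8. Buffer sizes should therefore be budgeted in bytes, not characters.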

The SHIPTide tool performs all string editing in UTF-8 encoding. It will not be obvious to the user: text strings look normal. Cutting and pasting (for example) from Google Translate will work seamlessly. However, you must be careful if you (for example) copy a script from SHIPTide to your favorite text editor and then paste it back into SHIPTide. If your external text editor does not comprehend UTF-8 natively, any special non-ASCII characters in your embedded strings will be mangled.

There are numerous Sail Functions available for manipulating strings and converting characters to codepoints (and back).
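
The Sail function names are not listed in this section, so the following sketch shows the idea of character/codepoint conversion with Python built-ins (ord and chr) instead. The helper names to_codepoints and from_codepoints are hypothetical.

```python
from typing import List


def to_codepoints(text: str) -> List[int]:
    """Convert each character of `text` to its Unicode codepoint."""
    return [ord(ch) for ch in text]


def from_codepoints(codepoints: List[int]) -> str:
    """Build a string from a list of Unicode codepoints."""
    return "".join(chr(cp) for cp in codepoints)


cps = to_codepoints("A™")
print([f"0x{cp:04X}" for cp in cps])
print(from_codepoints(cps))
```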

Codesets

A collection of Unicode codepoints needed to express a locale is called a “codeset”. Note that the codeset for a locale is generally not a single contiguous block of Unicode codepoints. For instance, even English needs not only the basic Latin characters (the traditional printable ASCII codepoints 0x0020...0x007E) but also the trademark symbol (™, Unicode 0x2122), the degree symbol (°, Unicode 0x00B0), the Euro symbol (€, Unicode 0x20AC), and the copyright symbol (©, Unicode 0x00A9). Excellent codepoint and codeset resources can be found at [1], [2], and [3].

Each locale drives a specific set of required characters needed to express text in that locale.

A complete mapping of codesets per locale can be quite complicated, especially when factoring in the limited resources of the target device. For example, the Japanese locale (jaJP) theoretically requires over 10,000 codepoints, which generally requires megabytes of storage just for the font characters. The Serious GUI design tool, SHIPTide, has the ability to define custom codesets as well as auto-generate codesets based on the characters (codepoints) used in the specific GUI. This latter capability can ensure that only the minimum number of codepoints required for the GUI is included on the device, minimizing memory footprint and therefore cost.
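
How SHIPTide generates codesets internally is not described here. The general idea of deriving a minimal codeset from the strings a GUI actually uses can be sketched as follows, in Python for illustration only; both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def codeset_from_strings(strings: Iterable[str]) -> List[int]:
    """Collect the distinct codepoints used by a set of GUI strings."""
    return sorted({ord(ch) for text in strings for ch in text})


def to_ranges(codepoints: List[int]) -> List[Tuple[int, int]]:
    """Group sorted codepoints into inclusive (first, last) ranges."""
    ranges: List[Tuple[int, int]] = []
    for cp in codepoints:
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current contiguous run.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Starts a new run.
            ranges.append((cp, cp))
    return ranges


gui_strings = ["abc", "25°C", "€5"]
for first, last in to_ranges(codeset_from_strings(gui_strings)):
    print(f"0x{first:04X}...0x{last:04X}")
```

Grouping into ranges makes it easy to see that a codeset is usually several separate blocks rather than one contiguous block.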

Fonts

A font is a collection of codepoints (a codeset), along with the visual glyph to be shown for each codepoint. Fonts are discussed at length in the Fonts area of the documentation (SHIP:Fonts).


The shiplanguage and shiplanguages Built-in Variables

SHIP has the built-in variables "shiplanguage" and "shiplanguages".

shiplanguage contains the 4-character code for the currently selected locale, for example "enUS". Changing shiplanguage (by assigning it some other value in a Sail script) will cause any visual text with resources attached to automatically change to the new language or, if a translation for the new language does not exist, to the default translation.

shiplanguages is a concatenated sequence of all the locales currently expressed in text and alternate resource nodes, listed with the default locale first. For example, if the text resources in your GUI have translations for "enUS", "esMX" and "jaJP", with the default being "enUS", then shiplanguages will contain "enUSesMXjaJP".
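
Because every locale name is exactly 4 characters, a concatenated value such as "enUSesMXjaJP" can be split into fixed-width pieces. The sketch below shows this in Python for illustration (it is not SAIL); split_languages is a hypothetical helper name.

```python
from typing import List


def split_languages(shiplanguages: str) -> List[str]:
    """Split a concatenated locale list into 4-character locale names.

    The first entry is the default locale, as described above.
    """
    if len(shiplanguages) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return [shiplanguages[i:i + 4] for i in range(0, len(shiplanguages), 4)]


locales = split_languages("enUSesMXjaJP")
print("default:", locales[0])
print("all:", locales)
```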

Immediate Text vs. Text/String Resources

In the layout area, you can put immediate text in text nodes as well as SAIL scripts. In text nodes, just fill in the value property with the text string of your choice:

Immediate Layout-Area Text

Similarly, in SAIL scripts, a string constant is enclosed in quotes.

text.value = "this is immediate text in a SAIL script";

Both of these are traditionally referred to as "immediate" values. Even native English speakers may wonder at this term: "immediate" is most commonly used in reference to time (i.e. "right now"). In this case, "immediate" carries the older English meaning of "near; in very close proximity to".

Immediate text values have the disadvantage of being single-language. They also may take up redundant memory in the target system if the same text value is used in many places throughout the project.

A better way to organize, store, and manage text strings is through the use of text and string nodes in the resources section. These resources store a text string or string fragment once, and it can then be used multiple times throughout your GUI. In addition, they can hold the default language as well as many other translations.

text resources include various text formatting properties, including positioning and colors. Introduced in SHIP Version 5, string resources use less memory and render more quickly: they only contain the text string value and any translations and do not have any formatting/layout related properties. Therefore, where possible, use string resources over text resources.

Creating, Editing, and Translating Text/String Resources

SHIP Version 4

In SHIP Version 4, text with translations was shown as a parent text node within the resources area, with the translations displayed as a list of child alternate nodes.

For example, the following shows a text resource with several available translations:

Text Resource with Translations

Add a new text resource by right-clicking on the "text" grouping and selecting Add->text. Edit the value property to set the default text string for this resource. Selecting this node and right-clicking to "Add->alternate" allows you to add alternate language versions. You must specify the 4-letter language code for each translation.

text resources can also be placed in resource group nodes, allowing SAIL-based indexing of the resources -- for example through the use of the getChild() function.

SHIP Version 5

In SHIPTide Version 5, text and string resources are now available under the "Text" grouping as well as under resource group nodes. As mentioned above, string resources are less resource-intensive than text resources and should be used wherever possible.

SHIPTide V5 also includes significant new management capabilities for multi-language resources, including:

  • Import/Export of all your resources from/to Microsoft Excel
  • Table-style editing
  • Default language management
  • Language list management

See SHIPTide V5 Language and Translation Management for a complete description of these features.

Using text and string Resources

Within the GUI, similar to the way an image is attached to a box, you can attach resource text to a layout text node. In this example, the box named footer has an inner text node with a text resource attached as its object:

Attaching Resource Text to Layout Text

Automatic Multi-Language Text Display

When text or string resources are attached to layout-area text nodes (via their object properties), the string is displayed in the current system language (set by the shiplanguage variable) if possible. If a resource does not have a translation for the current shiplanguage, the default language version is displayed.

For example, consider a string resource with the default text "The Serious Human Interface™ Platform" and a single simplified Chinese translation (zhCN) of "严重的人机界面平台™". This resource may be attached in various places throughout the GUI by layout-area text nodes whose object property points at this resource. At runtime, if shiplanguage is set to "zhCN", then all displayed versions of this text will automatically be in Chinese. Any other shiplanguage value will cause the English version to be displayed.

This allows you to have "sparse translation" where not all languages and all phrases are translated.
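
The fallback rule behind sparse translation can be modeled in a few lines. This is a Python sketch of the behavior described above, not SHIP's implementation; resolve_text and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict


def resolve_text(default_text: str, translations: Dict[str, str],
                 language: str) -> str:
    """Return the translation for `language`, or the default text
    when no translation exists for that locale."""
    return translations.get(language, default_text)


# The string resource from the example above: an English default
# plus a single simplified Chinese translation.
default_text = "The Serious Human Interface™ Platform"
translations = {"zhCN": "严重的人机界面平台™"}

print(resolve_text(default_text, translations, "zhCN"))
print(resolve_text(default_text, translations, "frFR"))
```

With shiplanguage set to "frFR", no French translation exists, so the default English text is returned.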

Changing shiplanguage in a SAIL script will cause all displayed text using this attached-object mechanism to automatically change to the correct translation immediately.

Dynamic Multi-Language Text with Sail Scripting

While it is certainly simple to attach a text/string resource to a layout-area text object and have it automatically change with shiplanguage, not all displayed text has static content. Often, you may need a piece of text that changes at runtime, for example numbers and values. SAIL scripts can leverage text/string resources as well to help simplify the translation management of these dynamic text elements.

For example, displaying the current date is a common and complex translation challenge. You may want to display something like "Day Month Year" on the screen, where the month element is in the current language.

This is a combination string of 3 elements: the day, the month, and the year. Only the month needs to be translated. In this case, we can define a text resource like this:

Month Names with Translations in "sets of 3 letters" format

However, we must build up the string to display at runtime when any of the following change:

  • shiplanguage
  • the boot of the GUI
  • the current month, day, or year changes (don't forget, that GUI screen can be sitting alive for months or even years!)

So the construction in the layout area is a bit more complex, but it enables an auto-updating date string on the screen:

Script-driven Date Text with Translation
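
The logic of building the date string from a "sets of 3 letters" month resource can be sketched as follows. This is Python for illustration only, not the SAIL script from the figure. The MONTHS table is sample data assumed for this sketch, and format_date is a hypothetical helper name.

```python
from typing import Dict

# Assumed sample data: each value holds the 12 month names as
# consecutive 3-letter sets, keyed by locale.
MONTHS: Dict[str, str] = {
    "enUS": "JanFebMarAprMayJunJulAugSepOctNovDec",
    "esMX": "EneFebMarAbrMayJunJulAgoSepOctNovDic",
}
DEFAULT_LANGUAGE = "enUS"


def format_date(day: int, month: int, year: int, language: str) -> str:
    """Build a "Day Month Year" string with a translated month name."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    # Fall back to the default language when no translation exists.
    names = MONTHS.get(language, MONTHS[DEFAULT_LANGUAGE])
    # Index by characters, not bytes: month names in other languages
    # may contain multi-byte UTF-8 characters.
    start = (month - 1) * 3
    return f"{day} {names[start:start + 3]} {year}"


print(format_date(5, 8, 2024, "esMX"))
print(format_date(5, 8, 2024, "jaJP"))
```

In a real GUI this function would be re-run whenever shiplanguage changes, at boot, and whenever the day, month, or year changes, matching the list of triggers above.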

References