Foreword
The Calendaring and Scheduling Consortium (“CalConnect”) is a global non-profit organization with the aim to facilitate interoperability of collaborative technologies and tools through open standards.
CalConnect works closely with international and regional partners, the full list of which is available on our website (https://www.calconnect.org/about/liaisons-and-relationships).
The procedures used to develop this document and those intended for its further maintenance are described in the CalConnect Directives.
In particular, the different approval criteria needed for the different types of CalConnect documents should be noted. This document was drafted in accordance with the editorial rules of the CalConnect Directives.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. CalConnect shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be provided in the Introduction.
Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.
This document was prepared by Technical Committee LOCALIZATION.
Introduction
A number of international applications require the identification of written language conversion systems, including applications in terminology, lexicography, bibliography and linguistics, and especially in reverse transliteration, computational linguistics and machine pronunciation.
This document sets out the necessary procedures to maintain the registry of written language conversion systems.
The chosen term “written language conversion” is intended to refer to all types of conversions, i.e. transformations of written texts from one spelling system to another. It thus includes both script conversion (change of script: transliteration, transcription) and conversion of texts without changing the script (e.g. transcription of foreign names or words using the alphabet of a target language, change of the orthography in a language, etc.). For brevity, “written language conversion” has been shortened to “conversion” in this document where it does not cause ambiguity.
Information and documentation — Codes for written language conversion systems
1. Scope
This document provides principles for establishing codes for the representation of written language conversion systems.
The codes are devised for use in any application requiring the expression of written language conversion systems, including transliteration and romanization systems, in coded form.
2. Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639-2, International Organization for Standardization. Codes for the representation of names of languages — Part 2: Alpha-3 code. First edition. Geneva. https://www.iso.org/standard/4767.html.
ISO 639-3, International Organization for Standardization. Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages. First edition. Geneva. https://www.iso.org/standard/39534.html.
ISO 639-5, International Organization for Standardization. Codes for the representation of names of languages — Part 5: Alpha-3 code for language families and groups. First edition. Geneva. https://www.iso.org/standard/39536.html.
ISO 3166-1, International Organization for Standardization. Codes for the representation of names of countries and their subdivisions — Part 1: Country code. Fourth edition. Geneva. https://www.iso.org/standard/72482.html.
ISO 5127, International Organization for Standardization. Information and documentation — Foundation and vocabulary. Second edition. Geneva. https://www.iso.org/standard/59743.html.
ISO 8601 (all parts), International Organization for Standardization. Date and time – Representations for information interchange. First edition. 2019. Geneva. https://www.iso.org/standard/70907.html.
ISO 15924, International Organization for Standardization. Information and documentation — Codes for the representation of names of scripts. Second edition. Geneva. https://www.iso.org/standard/81905.html.
3. Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 5127 and the following apply.
3.1. script
particular graphic representation or class of representations of a set of characters used to write one or more languages
[SOURCE: ISO 5127, Clause 3.1.6.02]
3.2. spelling system
set of rules governing the orthography of a language
Note 1 to entry: Typically, a spelling system defines how the spoken form of a language is represented in writing. Several languages have undergone orthographic reforms and have therefore had more than one spelling system.
3.3. natural language
language which is or was in active use in a community of people, and the rules of which are mainly deduced from the usage
[SOURCE: ISO 5127, Clause 3.1.5.02]
3.4. character
member of a set of elements that is used for the representation, organization, or control of data
[SOURCE: ISO 5127, Clause 3.1.4.02]
3.5. written language
natural language (Clause 3.3) realized through the writing of characters (Clause 3.4)
[SOURCE: ISO 5127, Clause 3.1.5.04]
3.6. written language conversion
process whereby one spelling system (Clause 3.2) is converted into another spelling system
Note 1 to entry: This is a general term that includes script conversion but also, e.g., cases when a language changes its orthography without changing the script.
3.7. transliteration
process which consists of representing the characters of an alphabetical or syllabic system of writing by the characters of a conversion alphabet
3.8. transcription
process whereby the sounds of a given language are noted by the system of signs of a conversion language
3.9. romanization
script conversion from a non-Roman script to the Roman script (Latn) by means of transliteration (Clause 3.7), transcription (Clause 3.8) or both
[SOURCE: ISO 5127, Clause 3.1.6.14]
3.10. written language conversion system
set of rules for written language conversion (Clause 3.6)
3.11. language code
combination of characters used to represent the name of a language (ISO 5127, Clause 3.1.5.01) or languages
[SOURCE: ISO 5127, Clause 3.2.5.14]
3.12. script code
combination of characters used to represent the name of a script (Clause 3.1)
[SOURCE: ISO 15924, Clause 3.8]
3.13. conversion system code
combination of characters used in a structured way to represent a written language conversion system (Clause 3.10)
4. Conversion system codes
4.1. Structure of conversion system codes
4.1.1. General
A conversion system code shall consist of four segments:
titular segment;
source spelling system segment;
target spelling system segment;
identifying segment.
Each segment shall consist of one or more elements.
4.1.2. Construction of the conversion system code
The following rules shall be followed for the construction of a conversion system code:
The codes shall consist of elements from the following Unicode ranges:
DIGIT ZERO through DIGIT NINE (U+0030 — U+0039)
LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z (U+0041 — U+005A)
LATIN SMALL LETTER A through LATIN SMALL LETTER Z (U+0061 — U+007A)
Segments shall be separated by a single “COLON” (“:”, Unicode U+003A).
Elements within a segment shall be separated by a single “HYPHEN-MINUS” (“-”, Unicode U+002D).
A “HYPHEN-MINUS” (“-”, Unicode U+002D) is also permitted within an element (e.g. 233-3).
Characters in the elements that are not covered by the above should be omitted or substituted.
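The syntactic rules above can be checked mechanically. The following is a minimal sketch in Python, assuming that a code is well-formed when it has exactly four colon-separated segments, each made of one or more hyphen-separated runs of the permitted characters. The function name split_conversion_code is hypothetical. Because a HYPHEN-MINUS may either separate elements or occur within an element (e.g. 233-3), the sketch validates and splits segments only and does not attempt to recover element boundaries.

```python
import re

# Permitted characters: U+0030-U+0039, U+0041-U+005A, U+0061-U+007A.
_ELEMENT_CHARS = r"[0-9A-Za-z]+"
# A segment: runs of permitted characters joined by single HYPHEN-MINUS characters.
_SEGMENT = re.compile(rf"{_ELEMENT_CHARS}(?:-{_ELEMENT_CHARS})*")

SEGMENT_NAMES = ("titular", "source", "target", "identifying")


def split_conversion_code(code: str) -> dict[str, str]:
    """Split a conversion system code into its four named segments.

    Raises ValueError if the code does not follow the construction rules.
    Element boundaries inside a segment are deliberately not resolved,
    because a hyphen may also occur within an element.
    """
    segments = code.split(":")
    if len(segments) != len(SEGMENT_NAMES):
        raise ValueError(f"expected 4 segments, got {len(segments)}: {code!r}")
    for segment in segments:
        # Rejects empty segments, doubled or dangling hyphens, other characters.
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid segment {segment!r} in {code!r}")
    return dict(zip(SEGMENT_NAMES, segments))
```

For example, split_conversion_code("UN:ara-Arab:Latn:2017") yields the titular segment UN, source segment ara-Arab, target segment Latn and identifying segment 2017.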
4.1.3. Titular segment
This segment contains a reference to the conversion system authority or authorities, using identifiers from the list maintained by ISO 24229/RA (Appendix A.1). If no authority can be identified but the conversion system has a national character and/or is used by a government, the 2-letter country code from ISO 3166-1 should be used as the conversion system authority. If no conversion system authority can be identified, or its identification is not relevant, “Var” (varia) is used as the titular segment. See Clause 5 for more details.
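The order of preference for the titular segment can be expressed as a small decision function. This is a hypothetical sketch: choose_titular_segment and its parameters are illustrative, the authority identifiers themselves come from the ISO 24229/RA list (not reproduced here), and joining several authorities with a hyphen is an assumption suggested by codes such as BGN-PCGN in Clause 4.7.

```python
from typing import Optional, Sequence


def choose_titular_segment(
    authority_ids: Sequence[str],
    national_country_code: Optional[str] = None,
) -> str:
    """Return a titular segment following the preference order of 4.1.3.

    authority_ids: registered identifiers of the conversion system
        authority or authorities (hypothetical input, may be empty).
    national_country_code: ISO 3166-1 alpha-2 code, supplied only when no
        authority is identified but the system has a national character
        and/or is used by a government.
    """
    if authority_ids:
        # Assumption: several authorities become hyphen-separated elements.
        return "-".join(authority_ids)
    if national_country_code:
        # Country codes are conventionally written in uppercase.
        return national_country_code.upper()
    # No identifiable or relevant authority: "Var" (varia).
    return "Var"
```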
4.1.4. Source spelling system segment
Except as specified in Clause 4.6, a script code is a mandatory element. Language-specific spelling systems also include a language code. To cover more specific needs, the following four elements shall be used in the order given:
language code (3-letter code from ISO 639-2 or ISO 639-3, with preference given to terminological codes: where ISO 639-2 provides synonymous codes for a language, the associated ISO 639-2/T code should be used, since ISO 639-2/T codes are intended for terminology applications);
script code (4-letter code from ISO 15924);
country code (2-letter code from ISO 3166-1);
spelling system extension (an ad hoc string to refer to a non-default spelling system of a language, such as old orthography).
EXAMPLE 1
ind-Latn-pre1972 (Indonesian language using the pre-1972 orthography)
EXAMPLE 2
bos-Arab (Bosnian language using Arabic script)
EXAMPLE 3
uzb-Arab-AF (Uzbek language written in Arabic script, as used in Afghanistan)
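The element order above, together with the rule that the script code is mandatory except for abbreviated codes (Clause 4.6), can be sketched as follows. The function spelling_system_segment and its abbreviated flag are hypothetical; the same function would serve for target spelling system segments, which may use the same elements (Clause 4.1.5). Code values are not checked against the ISO 639, ISO 15924 or ISO 3166-1 code lists.

```python
from typing import Optional


def spelling_system_segment(
    language: Optional[str] = None,
    script: Optional[str] = None,
    country: Optional[str] = None,
    extension: Optional[str] = None,
    *,
    abbreviated: bool = False,
) -> str:
    """Compose a spelling system segment in the element order of 4.1.4.

    Order: language code, script code, country code, spelling system
    extension. Absent elements are skipped.
    """
    # The script code is mandatory; the only exception is an abbreviated
    # code for a language-specific spelling system (Clause 4.6).
    if not script and not (abbreviated and language):
        raise ValueError("a script code is required")
    elements = (language, script, country, extension)
    return "-".join(e for e in elements if e)
```

For instance, spelling_system_segment("ind", "Latn", extension="pre1972") produces ind-Latn-pre1972, matching Example 1.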
4.1.5. Target spelling system segment
This segment may have the same four elements as those listed in Clause 4.1.4.
4.1.6. Identifying segment
This segment serves to distinguish, by version, year of issue, etc., conversion systems that otherwise have the same scope. It may also contain elements needed to recognize the system itself if the system has an identification element of its own. The following elements may occur, in the order given:
identifying numbers, letters or other identifiers (such as a standard number, e.g. 843)
version number (e.g. v6, v4-1)
year of adoption
year of issue
method identifier (if a standard devises more than one method of conversion, this optional ad hoc identifier could be used for distinction)
If no element can be used for this segment, na (not applicable) is used instead.
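A sketch of how the identifying segment might be assembled from the optional elements listed above, falling back to na when none applies. The function identifying_segment and its parameter names are hypothetical labels for the listed elements.

```python
from typing import Optional, Union


def identifying_segment(
    identifier: Optional[str] = None,          # e.g. a standard number such as "9"
    version: Optional[str] = None,             # e.g. "v6" or "v4-1"
    year_of_adoption: Optional[int] = None,
    year_of_issue: Optional[int] = None,
    method: Optional[str] = None,              # ad hoc method identifier
) -> str:
    """Join the present elements in the order of 4.1.6, or return "na"."""
    elements: list[Union[str, int, None]] = [
        identifier, version, year_of_adoption, year_of_issue, method,
    ]
    present = [str(e) for e in elements if e not in (None, "")]
    return "-".join(present) if present else "na"
```

With these assumptions, identifying_segment("9", year_of_issue=1995) gives 9-1995, the identifying segment of ISO:Cyrl:Latn:9-1995.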
EXAMPLE
2017 is the identifying segment of the system coded as UN:ara-Arab:Latn:2017.
4.2. Requirements for new conversion system codes
Additions to the list of conversion system codes shall be made upon the request of a member of ISO 24229/AG (Appendix A.2) or of the conversion system authority that manages the system.
The ISO 24229/AG will decide on the addition on the basis of the justification given for the actual requirement for international interchange. Code elements will be allocated accordingly.
A written language conversion system is eligible for a conversion system code assignment if it fulfils at least one of the following criteria:
The system has been approved for official use at some level of government.
The system has been developed and used by educational or scientific institutions and published in a peer-reviewed scientific publication.
The system has been in substantial use.
The assignment of a conversion system code also requires demonstration of at least one of the following usage factors:
the need to identify the system in interchange;
the need to identify the system in data encoding.
NOTE Systems that are used in isolation or only temporarily do not need assigned codes.
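Read together, the two lists in this clause combine as an AND of two ORs: at least one eligibility criterion and at least one usage factor. A minimal sketch of that logic follows; the Candidate fields and is_eligible are hypothetical names, and the actual decision remains with ISO 24229/AG.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a request for a new conversion system code."""
    officially_approved: bool = False        # approved for official use by a government
    academic_and_published: bool = False     # institutional use plus peer-reviewed publication
    substantial_usage: bool = False          # in substantial use
    needed_for_interchange: bool = False     # must be identified in interchange
    needed_for_data_encoding: bool = False   # must be identified in data encoding


def is_eligible(c: Candidate) -> bool:
    # At least one eligibility criterion ...
    meets_criterion = (
        c.officially_approved or c.academic_and_published or c.substantial_usage
    )
    # ... and at least one usage factor.
    has_usage_factor = c.needed_for_interchange or c.needed_for_data_encoding
    return meets_criterion and has_usage_factor
```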
4.3. Deprecation of conversion system codes
Deprecation of a conversion system code shall be made upon the request of a member of ISO 24229/AG or of the conversion system authority that manages the system.
The ISO 24229/AG will decide whether to mark the code as deprecated on the basis of the information received. A deprecated code remains reserved for backward compatibility.
NOTE Deprecation applies only to the code representing the written language conversion system, not to the system itself. For example, deprecation may be necessary when the conversion system authority is renamed.
4.4. User assigned conversion system codes
If users need codes to represent conversion systems not included in the conversion system registry, the prefix zz can be used. It shall be placed at the beginning of the conversion system code, in the titular segment, and shall be followed by a “HYPHEN-MINUS” character (“-”, Unicode U+002D).
NOTE Users are advised that such user-assigned codes are not universally recognized and are not compatible between different entities.
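A user-assigned code can be recognized from its titular segment alone. The sketch below is hypothetical (is_user_assigned is not a registry API); it compares case-insensitively because capitalization carries no distinctive meaning (Clause 4.5), and it assumes that at least one character must follow the zz- prefix.

```python
def is_user_assigned(code: str) -> bool:
    """True if the titular segment begins with the user-assigned prefix zz-."""
    titular = code.split(":", 1)[0]
    # Capitalization is not distinctive, so "ZZ-" is treated like "zz-".
    # Assumption: the prefix must be followed by at least one character.
    return titular.lower().startswith("zz-") and len(titular) > 3
```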
4.5. Capitalization of conversion system codes
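Conversion system codes use capitalization according to the relevant standards (lowercase language codes from ISO 639, title-case script codes from ISO 15924 and uppercase country codes from ISO 3166-1), but capitalization has no distinctive meaning. For example, an all-lowercase code is an equally valid code.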
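Because capitalization is not distinctive, an implementation storing codes would normally compare them case-insensitively while keeping the registered spelling for display. A minimal in-memory sketch under that assumption follows; ConversionCodeRegistry is hypothetical and not the ISO 24229/RA registry interface. Since codes are restricted to ASCII letters, digits, COLON and HYPHEN-MINUS, str.lower() is sufficient as a normalization key.

```python
from typing import Optional


class ConversionCodeRegistry:
    """Hypothetical registry that ignores capitalization of codes."""

    def __init__(self) -> None:
        # Maps a lowercased key to (code as registered, system name).
        self._entries: dict[str, tuple[str, str]] = {}

    @staticmethod
    def _key(code: str) -> str:
        return code.lower()

    def register(self, code: str, name: str) -> None:
        self._entries[self._key(code)] = (code, name)

    def lookup(self, code: str) -> Optional[str]:
        """Return the system name for any capitalization of a code."""
        entry = self._entries.get(self._key(code))
        return entry[1] if entry else None

    def canonical(self, code: str) -> Optional[str]:
        """Return the code with the capitalization it was registered with."""
        entry = self._entries.get(self._key(code))
        return entry[0] if entry else None
```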
4.6. Abbreviated conversion system codes
Where there is user demand, abbreviated conversion system codes may additionally be registered. In an abbreviated code, the script code of a language-specific spelling system is omitted if that script can be considered the default script for the language concerned. Examples are given in Clause 4.7. Sources such as the Common Locale Data Repository (CLDR) of the Unicode Consortium should be consulted when determining default scripts for languages.
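The abbreviation rule can be sketched as a lookup of the default script followed by omission of the script element when it matches. The DEFAULT_SCRIPTS table below is a tiny hand-written illustration whose entries were chosen to reproduce the abbreviations in Clause 4.7; it is not CLDR data, and a real implementation would consult CLDR (mapping ISO 639 three-letter codes to the language identifiers CLDR uses). Keying defaults on the pair (language, country) is an assumption motivated by Example 2, where Mongolian in China is abbreviated without Mong. All names are hypothetical.

```python
from typing import Optional

# Illustrative subset only; keyed by (ISO 639 code, ISO 3166-1 code or None).
DEFAULT_SCRIPTS: dict[tuple[str, Optional[str]], str] = {
    ("ara", None): "Arab",
    ("bel", None): "Cyrl",
    ("mal", None): "Mlym",
    ("mon", "CN"): "Mong",
    ("udm", None): "Cyrl",
    ("est", None): "Latn",
}


def default_script(language: str, country: Optional[str]) -> Optional[str]:
    # Prefer a country-specific default, then fall back to the language-wide one.
    return DEFAULT_SCRIPTS.get((language, country)) or DEFAULT_SCRIPTS.get((language, None))


def abbreviated_segment(
    language: Optional[str],
    script: str,
    country: Optional[str] = None,
    extension: Optional[str] = None,
) -> str:
    """Build a spelling system segment, omitting a default script."""
    # Only language-specific spelling systems can be abbreviated.
    if language is not None and default_script(language, country) == script:
        script_element: Optional[str] = None
    else:
        script_element = script
    return "-".join(e for e in (language, script_element, country, extension) if e)
```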
4.7. Examples of conversion system codes
The examples given here are only indicative and do not guarantee that such codes will actually be registered.
EXAMPLE 1
UN:ara-Arab:Latn:2017 (possible abbreviation — UN:ara:Latn:2017; United Nations system for the romanization of Arabic, approved 2017)
EXAMPLE 2
UN:mon-Mong-CN:Latn:1977 (possible abbreviation — UN:mon-CN:Latn:1977; United Nations system for the romanization of Mongolian in China, approved 1977)
EXAMPLE 3
BGN-PCGN:zho-Hans:Latn:1979 (BGN/PCGN 1979 Agreement — Romanization of Chinese)
EXAMPLE 4
ALA-LC:mal-Mlym:Latn:2012 (possible abbreviation — ALA-LC:mal:Latn:2012; ALA-LC romanization system that transliterates the Malayalam language from Malayalam script into Latin script)
EXAMPLE 5
ISO:Cyrl:Latn:9-1995 (ISO 9:1995 for the transliteration into Latin of Cyrillic characters)
EXAMPLE 6
ICAO:Arab:Latn:2015 (ICAO rules for rendering Arabic-script names in Latin letters, issued in 2015)
EXAMPLE 7
DIN:bel-Cyrl:Latn:1460-1982 (possible abbreviation — DIN:bel:Latn:1460-1982; DIN 1460:1982 for the transliteration of Belarusian into Latin)
EXAMPLE 8
ESKT:udm-Cyrl:est-Latn:2021 (possible abbreviation — ESKT:udm:est:2021; Estonian Language Committee’s rules for rendering Udmurt names in Estonian texts, approved 2021)
EXAMPLE 9
LV:eng-Latn:lav-Latn:2006 (possible abbreviation — LV:eng:lav:2006; official instructions in Latvia on rendering English proper names in Latvian, issued in 2006)
Target spelling systems can also be language-specific. Example 8 denotes a system for representing Udmurt names in Estonian texts using the Estonian alphabet, not the Latin script as a whole.
6. Data model and attributes
6.1. Common data model and attributes
6.1.1. General
The data models in this clause shall be used by other data models specified in this document.
6.1.2. Data models
Figure 1
6.1.3. Usage of ISO 15924 code elements
iso15924Code represents code elements from ISO 15924 for reference to scripts.
6.1.4. Usage of ISO 639 code elements
iso639Code represents code elements from ISO 639-2, ISO 639-3 and ISO 639-5 for reference to languages.
6.1.5. Usage of ISO 3166 code elements
iso3166Code represents country codes from ISO 3166-1.
6.1.6. Usage of ISO 8601 expressions
iso8601Expression represents datetime expressions that conform with ISO 8601 (all parts).
6.3. Conversion system data model and attributes
6.3.1. Diagram
Figure 3
6.3.2. Written language conversion system
code
A code that identifies the written language conversion system.
name
A name that represents the written language conversion system.
authority
The conversion system authority to which this conversion system belongs.
sourceSpelling
The spelling system used in the source text.
targetSpelling
The spelling system used in the output text.
identifyingSegment
An identifier that distinguishes the written language conversion system from others with the same conversion system authority and spelling scopes.
relations
Written language conversion systems can be related to other written language conversion systems in a number of ways. For example, a written language conversion system may represent an adoption or variant of another written language conversion system.
Hierarchical structures of written language conversion systems can be constructed by means of relationships.
This element is optional.
codeStatus
An optional code that identifies the current status of the conversion system code itself.
systemStatus
An optional code that identifies the current status of the written language conversion system itself.
remarks
Any further notes.
The date of the adoption of the written language conversion system by the authority may be noted in the remarks.
A typical use case is to record the original code under which the system was identified in the scheme from which this code has been imported.
EXAMPLE
NOTE: OGC 11-122r1 code urd_Arab2Latn_ODNI_2004
6.3.3. Spelling system
scriptCode shall be present. In the case of a language-specific spelling system, languageCode is also required.
languageCode
A 3-letter code from ISO 639-2, ISO 639-3 or ISO 639-5 that identifies the language of the spelling system.
scriptCode
A 4-letter code from ISO 15924 that identifies the script of the spelling system.
countryCode
An optional 2-letter code from ISO 3166-1 that identifies the country associated with the spelling system.
extension
An optional ad hoc string to refer to a non-default spelling system of a language.
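The attributes of Clauses 6.3.2 and 6.3.3 map naturally onto record types. The sketch below covers only a subset of them and makes several assumptions: attribute names are converted to Python's snake_case (the model names are given in comments), the authority is reduced to its titular-segment identifier, code formats are checked only by length and character class rather than against the actual code lists, and the unabbreviated code is derived from the other attributes instead of being stored as a separate code attribute.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


def _check(value: str, pattern: str, name: str) -> None:
    """Raise ValueError unless value fully matches pattern."""
    if not re.fullmatch(pattern, value):
        raise ValueError(f"invalid {name}: {value!r}")


@dataclass(frozen=True)
class SpellingSystem:
    """Spelling system (6.3.3)."""

    script_code: str                     # scriptCode: ISO 15924, required
    language_code: Optional[str] = None  # languageCode: ISO 639-2/-3/-5
    country_code: Optional[str] = None   # countryCode: ISO 3166-1 alpha-2
    extension: Optional[str] = None      # extension: ad hoc, e.g. "pre1972"

    def __post_init__(self) -> None:
        # Capitalization is not distinctive (4.5), so both cases are accepted.
        _check(self.script_code, r"[A-Za-z]{4}", "scriptCode")
        if self.language_code is not None:
            _check(self.language_code, r"[A-Za-z]{3}", "languageCode")
        if self.country_code is not None:
            _check(self.country_code, r"[A-Za-z]{2}", "countryCode")
        if self.extension is not None:
            _check(self.extension, r"[0-9A-Za-z]+(?:-[0-9A-Za-z]+)*", "extension")

    def to_segment(self) -> str:
        """Render the elements in the order given in 4.1.4."""
        parts = (self.language_code, self.script_code, self.country_code, self.extension)
        return "-".join(p for p in parts if p)


@dataclass
class WrittenLanguageConversionSystem:
    """Subset of the written language conversion system model (6.3.2)."""

    name: str                            # name
    authority: str                       # authority, as its titular identifier
    source_spelling: SpellingSystem      # sourceSpelling
    target_spelling: SpellingSystem      # targetSpelling
    identifying_segment: str = "na"      # identifyingSegment
    remarks: list[str] = field(default_factory=list)  # remarks

    def full_code(self) -> str:
        """Unabbreviated conversion system code built from the attributes."""
        return ":".join((
            self.authority,
            self.source_spelling.to_segment(),
            self.target_spelling.to_segment(),
            self.identifying_segment,
        ))
```

Usage: WrittenLanguageConversionSystem("United Nations system for the romanization of Arabic", "UN", SpellingSystem("Arab", language_code="ara"), SpellingSystem("Latn"), "2017").full_code() yields UN:ara-Arab:Latn:2017.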
6.3.4. Conversion system relation
targetSystem
The conversion system that is the target of this relation.
type
One or more types of relation that the conversion system has with the target conversion system.
6.3.5. Conversion system code status
Examples of system code statuses:
- preferred
The current system code is marked as “preferred”.
- deprecated
The current system code is marked as “deprecated”.
NOTE The deprecation marker in no way indicates deprecation of the system itself.
EXAMPLE
When a conversion system code has been changed, for example because the corresponding conversion system authority has been renamed, the old code can be considered “deprecated” in favour of the new code. The conversion system itself remains unchanged.
6.3.6. Conversion system status
Examples of system statuses:
- former
The current system is marked as “former”.
- current
The current system is marked as “current”.
- inactive
The current system is marked as “inactive”.
EXAMPLE
When a conversion system has itself been deprecated, it can be considered “inactive”.
6.3.7. Conversion system relation type
Examples of relation types:
- basedOn
The current system is based on the target system. The conversion process inherits certain attributes from the target system.
EXAMPLE 1
ALA-LC:jpn-Hrkt:Latn:1997 is based on Var:jpn-Hrkt:Latn:Hepburn-1886.
- basisFor
The target system is based on the current system. It can be thought of as the inverse of basedOn.
EXAMPLE 2
Var:jpn-Hrkt:Latn:Hepburn-1886 is the basis for ALA-LC:jpn-Hrkt:Latn:1997, BGN:jpn-Hrkt:Latn:1930, BGN-PCGN:jpn-Hrkt:Latn:1976 and BGN-PCGN:jpn-Hrkt:Latn:2017.
- aliasOf
The current system is an alias of the target system. The conversion processes are identical.
- adoptedFrom
The current system is adopted from the target system. The conversion processes may not be identical.
- supersedes
The current system supersedes the target system.
- supersededBy
The current system is superseded by the target system.
- relatedTo
The current system is related to the target system.
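Because basisFor is described as the inverse of basedOn, a registry holding relations in one direction can derive the other. The sketch below is hypothetical: the enumeration values reuse the relation type names above, but only basedOn/basisFor (stated as inverses) and supersedes/supersededBy (defined as mirror images) are given inverses, since no inverse is defined for the other types.

```python
from enum import Enum
from typing import Iterable, Optional


class RelationType(Enum):
    BASED_ON = "basedOn"
    BASIS_FOR = "basisFor"
    ALIAS_OF = "aliasOf"
    ADOPTED_FROM = "adoptedFrom"
    SUPERSEDES = "supersedes"
    SUPERSEDED_BY = "supersededBy"
    RELATED_TO = "relatedTo"


# Only the pairs whose definitions mirror each other.
_INVERSES = {
    RelationType.BASED_ON: RelationType.BASIS_FOR,
    RelationType.BASIS_FOR: RelationType.BASED_ON,
    RelationType.SUPERSEDES: RelationType.SUPERSEDED_BY,
    RelationType.SUPERSEDED_BY: RelationType.SUPERSEDES,
}

Triple = tuple[str, RelationType, str]  # (current system, type, target system)


def inverse(rel: RelationType) -> Optional[RelationType]:
    return _INVERSES.get(rel)


def derive_inverse_relations(relations: Iterable[Triple]) -> set[Triple]:
    """Return the given relations plus the inverse relations they imply."""
    triples = list(relations)  # materialize so the input can be iterated twice
    result = set(triples)
    for current, rel, target in triples:
        inv = inverse(rel)
        if inv is not None:
            result.add((target, inv, current))
    return result
```

Applied to Example 1, the basedOn relation of ALA-LC:jpn-Hrkt:Latn:1997 yields the basisFor relation of Var:jpn-Hrkt:Latn:Hepburn-1886 given in Example 2.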
Appendix A
(normative)
Registration Authority
A.2. Advisory group (ISO 24229/AG)
A.2.1. Principles
To increase transparency and to ensure that the operations of the registration authority (ISO 24229/RA) are carried out in accordance with the guidelines provided in this document, the registration authority will appoint an advisory group of at least 4 and at most 12 members.
The advisory group will consist of experts in script conversion and other types of written language conversion, information technology and library management. The group will also include a representative of TC 46 and may include representatives of organizations interested in using the conversion system codes.
A.2.2. Consensus phase and voting procedure
As a rule, the advisory group will make its decisions by consensus. If a vote is needed, a decision is approved when more than two-thirds of the members vote for it.
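The “more than two-thirds” threshold is strict, which matters for small groups. A minimal sketch, assuming the fraction is taken over all advisory group members rather than over the votes cast; vote_passes is a hypothetical name. Integer arithmetic (3 × votes > 2 × members) avoids floating-point rounding. With 12 members, 8 votes in favour are exactly two-thirds and therefore insufficient, so 9 are needed; with 4 members, 3 are needed.

```python
def vote_passes(votes_for: int, members: int) -> bool:
    """True when strictly more than two-thirds of all members vote in favour."""
    # The advisory group has between 4 and 12 members (A.2.1).
    if not 4 <= members <= 12:
        raise ValueError(f"advisory group size out of range: {members}")
    if not 0 <= votes_for <= members:
        raise ValueError(f"invalid number of votes: {votes_for}")
    # votes_for / members > 2 / 3, rearranged to stay in integers.
    return 3 * votes_for > 2 * members
```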
A.5. Reservation of code elements
A.5.1. Introduction
Some code elements managed by ISO 24229/RA are reserved:
for a limited period when their reservation is the result of the deprecation (Clause 4.3) or the alteration (Appendix A.4.1) of an entry;
for an indeterminate period when the reservation is the result of the application of international law or of exceptional requests (Appendix A.5.3).
A.5.2. Period of non-allocation
Code elements that the ISO 24229/AG has altered or deleted should not be reallocated indefinitely.
A.5.3. Exceptional reservations
Code elements may be reserved, in exceptional cases, for conversion system authorities and written language conversion systems which the ISO 24229/AG has decided not to include in the lists maintained by ISO 24229/RA, but for which an interchange or encoding requirement exists.
A.5.4. Reallocation
Before reallocating a former code element or a formerly reserved code element, the ISO 24229/AG shall consult, as appropriate, the authority or agency on whose behalf the code element was reserved, and consideration shall be given to difficulties which can arise from the reallocation.
A.5.5. List of reserved code elements
A list of reserved code elements is kept by the ISO 24229/RA.
A.6. Advice regarding use of code elements
The ISO 24229/AG is available for consultation and assistance on the use of codes for conversion system authorities and written language conversion systems.
Bibliography
[1] ISO 9:1995, International Organization for Standardization. Information and documentation — Transliteration of Cyrillic characters into Latin characters — Slavic and non-Slavic languages. Second edition. 1995. Geneva. https://www.iso.org/standard/3589.html.
[2] ISO 233-3, International Organization for Standardization. Information and documentation — Transliteration of Arabic characters into Latin characters — Part 3: Persian language — Transliteration. Second edition. Geneva. https://www.iso.org/standard/78514.html.
[3] ISO 639-1, International Organization for Standardization. Codes for the representation of names of languages — Part 1: Alpha-2 code. First edition. Geneva. https://www.iso.org/standard/22109.html.
[4] ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts. The Library of Congress. 1997. https://www.loc.gov/catdir/cpso/roman.html.
[5] BGN/PCGN Romanization Systems. National Geospatial-Intelligence Agency. http://geonames.nga.mil/gns/html/romanization.html.
[6] Giles, Herbert A. A Chinese-English Dictionary. Second edition, revised. 1912.
[7] DIN 1460:1982, Umschrift kyrillischer Alphabete slawischer Sprachen (Conversion of Cyrillic alphabets of Slavic languages). 1982-04.
[8] UNGEGN Working Group on Romanization Systems. Report on the Current Status of United Nations Romanization Systems for Geographical Names. United Nations Group of Experts on Geographical Names (UNGEGN): Working Group on Romanization Systems. https://www.eki.ee/wgrs/.
[9] Fifth United Nations Conference on the Standardization of Geographical Names, Montreal, 1987-08-18 to 1987-08-31. Vol. I. Report of the Conference, pp. 40-41.
[10] Unicode Transliteration Guidelines. Available from: https://cldr.unicode.org/index/cldr-spec/transliteration-guidelines.