Interscript

Interoperable script conversion systems

The live demo supports 276 transliteration systems.

Introduction

This repository contains interoperable transliteration schemes from:

  • ALA-LC

  • BGN/PCGN

  • ICAO

  • ISO

  • UN (by UNGEGN)

  • Many, many other script conversion system authorities.

The goal is to make these transliteration schemes interoperable, so that their output can be compared for quality.

Demonstration

These transliteration systems are used in the demo:

bgnpcgn-rus-Cyrl-Latn-1947

BGN/PCGN Romanization of Russian

iso-rus-Cyrl-Latn-9-1995

ISO 9 Romanization of Russian

icao-rus-Cyrl-Latn-9303

ICAO MRZ Romanization of Russian

bas-rus-Cyrl-Latn-2017-bss

Bulgarian Academy of Sciences Streamlined System for Russian

interscript screencast

Installation

Prerequisites

Interscript depends on Ruby. Once Ruby is installed, the rest is easy. Note that the gem command below will not work until Interscript v2 is released; until then, use the Git repository method that follows it.

gem install interscript -v "~>2.0"

You can also download a local copy of this Git repository, e.g. for development purposes:

git clone https://github.com/interscript/lcs
cd lcs/ruby
bundle install

Additional prerequisites for Thai systems

If you want to use the Thai transliteration systems, you will need to install some additional dependencies. Please consult: Usage with Secryst.

Usage

Assume you have a file ready in the source script like this:

cat <<EOT > rus-Cyrl.txt
Эх, тройка! птица тройка, кто тебя выдумал? знать, у бойкого народа ты
могла только родиться, в той земле, что не любит шутить, а
ровнем-гладнем разметнулась на полсвета, да и ступай считать версты,
пока не зарябит тебе в очи. И не хитрый, кажись, дорожный снаряд, не
железным схвачен винтом, а наскоро живьём с одним топором да долотом
снарядил и собрал тебя ярославский расторопный мужик. Не в немецких
ботфортах ямщик: борода да рукавицы, и сидит чёрт знает на чём; а
привстал, да замахнулся, да затянул песню — кони вихрем, спицы в
колесах смешались в один гладкий круг, только дрогнула дорога, да
вскрикнул в испуге остановившийся пешеход — и вон она понеслась,
понеслась, понеслась!

Н.В. Гоголь
EOT

You can run interscript on this text using different transliteration systems.

interscript rus-Cyrl.txt \
  --system=bgnpcgn-rus-Cyrl-Latn-1947 \
  --output=bgnpcgn-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=iso-rus-Cyrl-Latn-9-1995 \
  --output=iso-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=icao-rus-Cyrl-Latn-9303 \
  --output=icao-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=bas-rus-Cyrl-Latn-2017-bss \
  --output=bas-rus-Latn.txt

It is then easy to see the exact differences in rendering between the systems.

diff bgnpcgn-rus-Latn.txt bas-rus-Latn.txt
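
The four invocations above differ only in the system code, so they can be generated from a list. The following is a minimal Ruby sketch that uses only the `--system` and `--output` options shown above. The helper names `output_name` and `interscript_command` are hypothetical and not part of Interscript, and the output file naming simply mirrors the convention used in this section.

```ruby
# System codes used in the examples above.
SYSTEMS = %w[
  bgnpcgn-rus-Cyrl-Latn-1947
  iso-rus-Cyrl-Latn-9-1995
  icao-rus-Cyrl-Latn-9303
  bas-rus-Cyrl-Latn-2017-bss
].freeze

# Hypothetical helper: derive "authority-language-targetscript.txt",
# the naming convention used for the output files in this section.
def output_name(system_code)
  authority, language, _source, target = system_code.split("-", 5)
  "#{authority}-#{language}-#{target}.txt"
end

# Hypothetical helper: build the argument list for one CLI call.
def interscript_command(input, system_code)
  ["interscript", input,
   "--system=#{system_code}",
   "--output=#{output_name(system_code)}"]
end

SYSTEMS.each do |system_code|
  command = interscript_command("rus-Cyrl.txt", system_code)
  # Print the command; the arguments contain no shell-special characters.
  puts command.join(" ")
  # Pass --run to this script to actually execute each command.
  system(*command) if ARGV.include?("--run")
end
```

Run without arguments, the script only prints the commands, which makes it easy to check them before executing anything.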

If you use Interscript from the Git repository, run the following command in place of interscript:

# Ensure you are in your Git repository root path
ruby/bin/interscript rus-Cyrl.txt \
  --system=bas-rus-Cyrl-Latn-2017-bss \
  --output=bas-rus-Latn.txt

ISCS system codes

In accordance with ISO/CD 24229, the system code identifying a script conversion system has the following components, shown here for bgnpcgn-rus-Cyrl-Latn-1947:

bgnpcgn

the authority identifier

rus

the ISO 639-{1,2,3,5} code of the language that this system applies to (for ISO 639-2, use the (T) code)

Cyrl

an ISO 15924 script code, identifying the source script

Latn

an ISO 15924 script code, identifying the target script

1947

an identifier, unique within the authority, that identifies this system
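
As a rough illustration of the layout above, a system code can be split into its five components with a few lines of Ruby. The `SystemCode` struct and the `parse_system_code` helper are hypothetical names introduced here and are not part of the Interscript gem. The format checks only reflect the general shape of ISO 639 and ISO 15924 codes and do not verify that a code is actually registered.

```ruby
# Hypothetical container for the five components of a system code.
SystemCode = Struct.new(:authority, :language, :source_script,
                        :target_script, :id)

# Hypothetical helper: split a system code into its components.
def parse_system_code(code)
  # Limit the split to five fields so that hyphens inside the trailing
  # identifier (as in "9-1995" or "2017-bss") are preserved.
  parts = code.split("-", 5)
  unless parts.size == 5 && parts.none?(&:empty?)
    raise ArgumentError, "expected 5 components in #{code.inspect}"
  end

  authority, language, source, target, id = parts

  # ISO 639 language codes are two or three lowercase letters.
  unless language.match?(/\A[a-z]{2,3}\z/)
    raise ArgumentError, "unexpected language code: #{language.inspect}"
  end

  # ISO 15924 script codes are four letters with an initial capital.
  [source, target].each do |script|
    unless script.match?(/\A[A-Z][a-z]{3}\z/)
      raise ArgumentError, "unexpected script code: #{script.inspect}"
    end
  end

  SystemCode.new(authority, language, source, target, id)
end

p parse_system_code("iso-rus-Cyrl-Latn-9-1995").to_a
# prints ["iso", "rus", "Cyrl", "Latn", "9-1995"]
```

Splitting with a limit of five, rather than on every hyphen, is what keeps multi-part identifiers such as `9-1995` intact.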

Covered languages

Currently the schemes cover Cyrillic, Armenian, Greek, Arabic and Hebrew.

Samples to play with

References

Reference documents are located at the interscript-references repository. Some specifications that have distribution limitations may not be reproduced there.

Statistics

Amharic (5), Ancient Greek (to 1453) (1), Arabic (9), Armenian (1), Assamese (5), Azerbaijani (6), Baluchi (1), Bashkir (1), Belarusian (11), Bengali (5), Bulgarian (9), Chechen (2), Chinese (4), Dari (6), Dhivehi (4), Domung (1), Faroese (2), Geez (1), Georgian (8), German (1), Gujarati (5), Hebrew (1), Hindi (8), Icelandic (2), Japanese (7), Kannada (5), Kazakh (2), Kirghiz (2), Korean (15), Kurdish (1), Macedonian (7), Malayalam (4), Marathi (6), Modern Greek (1453-) (13), Mongolian (11), Mulam (1), Nepali (macrolanguage) (6), Northern Sami (1), Oriya (macrolanguage) (5), Pali (6), Panjabi (4), Persian (2), Persian (4), Pinjarup (1), Pushto (2), Romanian (1), Russian (13), Rusyn (1), Sanskrit (4), Serbian (7), Sinhala (3), Tajik (2), Tamil (5), Tamnim Citak (1), Tatar (2), Telugu (5), Thai (1), Turkmen (1), Uighur (1), Ukrainian (12), Urdu (3), Uzbek (3), Yue Chinese (2), undefined (1), undefined (4)

Copyright

Ribose© 2020. All rights reserved.