Interscript

Interoperable script conversion systems

Interscript supports a number of programming languages. This page covers the Ruby implementation.

Installation

Prerequisites

Interscript depends on Ruby. Once Ruby is installed, the gem can be installed with the command below. Note that this command will not work until Interscript v2 is released; until then, please use the Git repository method described further down.

gem install interscript -v "~>2.0"

You can also clone this Git repository locally, e.g. for development purposes:

git clone https://github.com/interscript/lcs
cd lcs/ruby
bundle install

Additional prerequisites for Thai systems

To transliterate using the Thai systems, you need to install some additional dependencies. Please consult the "Usage with Secryst" section.

Usage

Assume you have a file ready in the source script like this:

cat <<EOT > rus-Cyrl.txt
Эх, тройка! птица тройка, кто тебя выдумал? знать, у бойкого народа ты
могла только родиться, в той земле, что не любит шутить, а
ровнем-гладнем разметнулась на полсвета, да и ступай считать версты,
пока не зарябит тебе в очи. И не хитрый, кажись, дорожный снаряд, не
железным схвачен винтом, а наскоро живьём с одним топором да долотом
снарядил и собрал тебя ярославский расторопный мужик. Не в немецких
ботфортах ямщик: борода да рукавицы, и сидит чёрт знает на чём; а
привстал, да замахнулся, да затянул песню — кони вихрем, спицы в
колесах смешались в один гладкий круг, только дрогнула дорога, да
вскрикнул в испуге остановившийся пешеход — и вон она понеслась,
понеслась, понеслась!

Н.В. Гоголь
EOT

You can run interscript on this text using different transliteration systems.

interscript rus-Cyrl.txt \
  --system=bgnpcgn-rus-Cyrl-Latn-1947 \
  --output=bgnpcgn-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=iso-rus-Cyrl-Latn-9-1995 \
  --output=iso-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=icao-rus-Cyrl-Latn-9303 \
  --output=icao-rus-Latn.txt

interscript rus-Cyrl.txt \
  --system=bas-rus-Cyrl-Latn-2017-bss \
  --output=bas-rus-Latn.txt

It is then easy to see the exact differences in rendering between the systems.

diff bgnpcgn-rus-Latn.txt bas-rus-Latn.txt
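The four invocations above differ only in the system name and the output file, so they can be generated in a loop. The following is a minimal Ruby sketch of that idea. The helpers output_name and interscript_argv are hypothetical names introduced here, and the output file naming simply mirrors the file names used in the examples above; it assumes system names of the form authority-language-source-target-id.

```ruby
# Systems taken from the command-line examples above.
SYSTEMS = %w[
  bgnpcgn-rus-Cyrl-Latn-1947
  iso-rus-Cyrl-Latn-9-1995
  icao-rus-Cyrl-Latn-9303
  bas-rus-Cyrl-Latn-2017-bss
].freeze

# Hypothetical helper: derives "<authority>-<lang>-<target script>.txt"
# from a system name, matching the output file names used above.
def output_name(system_name)
  authority, lang, _source, target, _id = system_name.split("-", 5)
  "#{authority}-#{lang}-#{target}.txt"
end

# Hypothetical helper: builds the argument list for one interscript run.
def interscript_argv(input_file, system_name)
  [
    "interscript",
    input_file,
    "--system=#{system_name}",
    "--output=#{output_name(system_name)}"
  ]
end

commands = SYSTEMS.map { |name| interscript_argv("rus-Cyrl.txt", name) }

# Print the commands for inspection.
commands.each { |argv| puts argv.join(" ") }

# To actually run them (requires the interscript executable on PATH):
#   commands.each { |argv| system(*argv) }
```

Building argument arrays rather than one shell string means the commands can be passed to Kernel#system without any shell quoting concerns.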

If you use Interscript from the Git repository, you would call the following command instead of interscript:

# Ensure you are in your Git repository root path
ruby/bin/interscript rus-Cyrl.txt \
  --system=bas-rus-Cyrl-Latn-2017-bss \
  --output=bas-rus-Latn.txt

Integration with Ruby Applications

Interscript can be used as a Ruby gem and integrated into other Ruby applications.

Gemfile

You need to make sure your Gemfile contains the following lines:

source "https://rubygems.org"

gem "interscript", "~>2.0"

Requiring

If your codebase does not call Bundler.require, you will need to add the following line:

require "interscript"

Listing all available maps

To list all available maps, run the following code:

maps = Interscript.maps

maps is an array containing the names of all Interscript maps.
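Because the result is a plain array of names, ordinary Ruby array methods can narrow it down. The sketch below is illustrative only: it uses a hard-coded stand-in list (taken from map names mentioned on this page) so that it runs without the gem, and the helper parse_map_name is a hypothetical name. It also assumes the authority-language-source-target-id naming pattern seen in the examples, which may not hold for every map.

```ruby
# Stand-in for Interscript.maps so the sketch runs without the gem.
# In an application, use `names = Interscript.maps` instead.
names = %w[
  bgnpcgn-rus-Cyrl-Latn-1947
  iso-rus-Cyrl-Latn-9-1995
  icao-rus-Cyrl-Latn-9303
  bas-rus-Cyrl-Latn-2017-bss
  var-swe-Latn-Latn-2021
  var-ara-Arab-Arab-rababa
]

# Hypothetical helper: splits "<authority>-<lang>-<source>-<target>-<id>".
# The limit of 5 keeps identifiers such as "9-1995" in one piece.
def parse_map_name(name)
  authority, lang, source, target, id = name.split("-", 5)
  { authority: authority, lang: lang, source: source, target: target, id: id }
end

# Select every map that romanizes Russian.
russian_to_latin = names.select do |name|
  parts = parse_map_name(name)
  parts[:lang] == "rus" && parts[:target] == "Latn"
end

puts russian_to_latin
```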

Transliterating text

To transliterate text using a given map, such as bas-rus-Cyrl-Latn-2017-bss, run:

cache = {}
input = "Хелло"
output = Interscript.transliterate("bas-rus-Cyrl-Latn-2017-bss",
                                   input,
                                   cache)

Keep the cache variable and reuse it across calls for performance reasons. The cache argument is optional, but supplying it is recommended.
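One way to make sure the same cache is reused is to hold it in a small wrapper object. The following is a sketch rather than part of Interscript: CachedTransliterator and its backend parameter are hypothetical names. The default backend forwards to Interscript.transliterate with the three-argument form shown above; the demonstration swaps in a stand-in backend so the example runs without the gem.

```ruby
# Hypothetical wrapper that keeps one cache hash alive across calls.
class CachedTransliterator
  # `backend` is any callable taking (map_name, text, cache). The default
  # forwards to Interscript.transliterate and is only evaluated when called,
  # so `require "interscript"` must have run before the first call.
  def initialize(map_name, backend: nil)
    @map_name = map_name
    @cache = {}
    @backend = backend ||
               ->(map, text, cache) { Interscript.transliterate(map, text, cache) }
  end

  # Transliterates `text`, always passing the same cache object.
  def call(text)
    @backend.call(@map_name, text, @cache)
  end
end

# Stand-in backend for demonstration only: it memoizes results in the cache
# it is given and counts how often it has to do the "expensive" work.
computed = 0
fake_backend = lambda do |_map, text, cache|
  cache[text] ||= begin
    computed += 1
    text.upcase
  end
end

translit = CachedTransliterator.new("bas-rus-Cyrl-Latn-2017-bss",
                                    backend: fake_backend)
translit.call("privet")
translit.call("privet")
puts computed
```

In an application, omit the backend argument and call the wrapper wherever Interscript.transliterate would otherwise be called with a fresh hash.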

Using the Ruby compiler

If performance is of utmost importance for your application and you are willing to trade a slightly longer loading time for much faster transliteration, you can use Interscript::Compiler::Ruby instead of Interscript::Interpreter (which is used by default).

require "interscript/compiler/ruby"

cache = {}
input = "Хелло"
output = Interscript.transliterate("bas-rus-Cyrl-Latn-2017-bss",
                                   input,
                                   cache,
                                   compiler: Interscript::Compiler::Ruby)

Transliterating in reverse

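To transliterate a string in reverse using a map whose name has the form bas-rus-Cyrl-Latn-2017-bss, swap Cyrl and Latn in the name (giving bas-rus-Latn-Cyrl-2017-bss).

To transliterate a string in reverse using a map whose name has the form var-swe-Latn-Latn-2021, where both scripts are the same, append -reverse to its name (giving var-swe-Latn-Latn-2021-reverse).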

Please note: this only works in the Ruby implementation. Other implementations depend on the Ruby implementation for compilation. For those, you need to compile the map using the Ruby implementation, giving the map name in the reversed form described above.
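The naming rule for reverse maps can be expressed as a small helper. This is a sketch under the assumption that map names follow the authority-language-source-target-id pattern seen in the examples; reverse_map_name is a hypothetical name, not part of the Interscript API, and it only builds the name without checking that such a map can actually be reversed.

```ruby
# Hypothetical helper implementing the naming rule described above.
def reverse_map_name(name)
  authority, lang, source, target, id = name.split("-", 5)
  if source == target
    # Same script on both sides: swapping would change nothing,
    # so the reverse map is addressed with a "-reverse" suffix.
    "#{name}-reverse"
  else
    # Different scripts: swap the source and target script codes.
    [authority, lang, target, source, id].compact.join("-")
  end
end

puts reverse_map_name("bas-rus-Cyrl-Latn-2017-bss")
puts reverse_map_name("var-swe-Latn-Latn-2021")
```

The resulting name is then passed to Interscript.transliterate in place of the original map name.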

Usage with Rababa

Rababa is an Arabic diacritization library that uses machine learning to predict missing diacritics. It is well integrated with Interscript.

Using it standalone

Run: gem install rababa

Integration with Ruby Applications

In your Gemfile, add:

source "https://rubygems.org"

gem "rababa"

Usage inside maps

stage {
  rababa config: "200"
}

As of now, Rababa is usable only by the Ruby implementation.

Usage from command line

interscript input.txt \
  --system=var-ara-Arab-Arab-rababa \
  --output=output.txt

Usage with Secryst

Secryst is a seq2seq transformer suited for transliteration, written in Ruby. Its installation is somewhat involved, so you should consult its own installation guide (on GitHub). Interscript does not use Secryst unless you have installed it.

Using it standalone

Installing Secryst is enough. Be sure to consult its installation guide mentioned above.

Integration with Ruby Applications

In your Gemfile, add:

source "https://rubygems.org"

gem "secryst"

Create a Secrystfile next to your Gemfile containing the following line for each model you want to use in your application. Please consult our Secrystfile for the models required by all the Secryst maps.

model "model-name"

Usage inside maps

stage {
  # ... sub "a", "b" ...
  secryst model: "model-name"
  # ... sub "c", "d" ...
}

As of now, Secryst is usable only by the Ruby implementation.