Spectrum Technology Platform

  • 1.  About transliteration

    Employee
    Posted 03-01-2019 09:41
    Edited by Eric Hubert 03-01-2019 09:56
    The Spectrum Transliterator component relies on ICU4J, which stands for International Components for Unicode for Java. It is a mature, widely used Java library that provides Unicode and globalization support based on the ICU specifications.

    ICU Wikipedia
    The ICU Project

    With the Spectrum Transliterator component, you can override the default transliteration mapping in the stage options by passing a specific TransliteratorID. For instance, removing diacritics for better data matching is easy: pass the value "Latin-ASCII" (a value that is not exposed in the stage). Transliterator identifiers can be Basic, Filtered, or Compound, as explained here: http://userguide.icu-project.org/transforms/general
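
    To get a feel for what removing diacritics does to a value before matching, here is a minimal sketch that uses only the JDK (java.text.Normalizer), not ICU4J and not Spectrum itself. It decomposes each character and drops the combining marks, which only approximates "Latin-ASCII": letters with no decomposition, such as "ß" or "ø", are left unchanged here. The class and method names are hypothetical. With ICU4J on the classpath, the equivalent call should be Transliterator.getInstance("Latin-ASCII").transliterate(input).

```java
import java.text.Normalizer;

public class StripDiacritics {

    // Approximation of accent removal using the JDK only (assumption:
    // this is NOT the ICU "Latin-ASCII" transform, just a stand-in).
    // 1. Normalize to NFD so that an accented letter becomes a base
    //    letter followed by one or more combining marks.
    // 2. Remove every character in the Unicode "Mark" category.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file pure ASCII:
        // \u00e9 = e acute, \u00e8 = e grave, \u00fc = u diaeresis, \u00e7 = c cedilla
        String name = "H\u00e9l\u00e8ne M\u00fcller, Besan\u00e7on";
        System.out.println(stripDiacritics(name)); // prints "Helene Muller, Besancon"
    }
}
```

    Values that differ only by accents then compare as equal, which is the property that helps data matching.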

    ------------------------------
    Eric Hubert
    PreSales Engineer
    Pitney Bowes Software France SAS
    Levallois Perret
    ------------------------------