Language codes issue #63

Description

@Kungs-Fr

Since this is a language-processing library, it probably makes more sense to use language codes (ISO 639-1, ISO 639-2, or ISO 639-3) than country codes (ISO 3166-1).

  • The Belarusian language code is be, not by (ISO 3166-1)

  • The Montenegrin language code is cnr (ISO 639-2), not me (ISO 3166-1)

  • The Serbian language code is sr, not rs (ISO 3166-1)

  • The Tajik language code is tg, not tj (ISO 3166-1)

  • The Ukrainian language code is uk, not ua (ISO 3166-1)

  • Use case: If I want to programmatically display the list of supported languages, I need to get the supported codes from cyrtranslit.supported() and then convert each ISO 3166 country code to an ISO 639 language code, so that I get the name of the language and not the name of the country.

  • Use case: I am using the langdetect library to detect the language of the text to transliterate, and I pass the detected language code (ISO 639) as a parameter to cyrtranslit. This does not work for the languages listed above, because cyrtranslit uses country codes.

  • Finally, I would recommend using only ISO 639-2 (not ISO 639-1), since Montenegrin does not have an ISO 639-1 code.
    In that case, the list of supported languages would be:

    • Belarusian : bel
    • Bulgarian : bul
    • Greek : ell
    • Montenegrin : cnr
    • Macedonian : mkd
    • Mongolian : mon
    • Russian : rus
    • Serbian : srp
    • Tajik : tgk
    • Ukrainian : ukr
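
Until the codes change, the mismatch described in the two use cases can be bridged on the caller's side. The following is a minimal sketch, not part of cyrtranslit: the names LANGUAGES, to_cyrtranslit_code, and language_name are hypothetical, and the table covers only the five languages named in this issue. Any other code is passed through unchanged, on the assumption that it is already what cyrtranslit expects.

```python
# Hypothetical caller-side shim; none of these names exist in cyrtranslit.
# Key: the country code this issue says cyrtranslit currently uses.
# Value: (ISO 639-1 code or None, ISO 639-2 code, language name).
LANGUAGES = {
    "by": ("be", "bel", "Belarusian"),
    "me": (None, "cnr", "Montenegrin"),  # no ISO 639-1 code exists
    "rs": ("sr", "srp", "Serbian"),
    "tj": ("tg", "tgk", "Tajik"),
    "ua": ("uk", "ukr", "Ukrainian"),
}

# Reverse lookup: ISO 639 code -> code currently used by cyrtranslit.
_ISO639_TO_CURRENT = {}
for current, (alpha2, alpha3, _name) in LANGUAGES.items():
    if alpha2 is not None:
        _ISO639_TO_CURRENT[alpha2] = current
    _ISO639_TO_CURRENT[alpha3] = current


def to_cyrtranslit_code(lang_code):
    """Map an ISO 639 code to the current cyrtranslit code.

    Codes not in the table are returned unchanged (lowercased).
    """
    code = lang_code.lower()
    return _ISO639_TO_CURRENT.get(code, code)


def language_name(current_code):
    """Return the language name for a current cyrtranslit code,
    or the code itself if it is not in the table."""
    entry = LANGUAGES.get(current_code)
    return entry[2] if entry else current_code
```

With this in place, the second use case would read roughly as cyrtranslit.to_latin(text, to_cyrtranslit_code(langdetect.detect(text))), and the first as a loop calling language_name on each entry of cyrtranslit.supported().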
