Language codes issue

Since cyrtranslit is a language-processing library, it probably makes more sense to identify languages by language codes (ISO 639-1, ISO 639-2, ISO 639-3) than by country codes (ISO 3166-1).

- **Belarusian** language code is **be** not **by** (ISO 3166-1)
- **Montenegrin** language code is **cnr** (ISO 639-2) not **me** (ISO 3166-1)
- **Serbian** language code is **sr** not **rs** (ISO 3166-1)
- **Tajik** language code is **tg** not **tj** (ISO 3166-1)
- **Ukrainian** language code is **uk** not **ua** (ISO 3166-1)
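
As a stopgap, the mismatches listed above could be bridged on the caller's side with a small lookup table. The sketch below is only an illustration: `ISO_639_TO_CURRENT` and `to_current_code` are hypothetical names, not part of cyrtranslit, and it assumes that codes not listed above (for example `ru` or `bg`) are already accepted unchanged.

```python
# Hypothetical caller-side shim, not part of cyrtranslit.
# Keys are ISO 639 language codes; values are the ISO 3166-1 country codes
# that, per the list above, are currently used in their place.
ISO_639_TO_CURRENT = {
    "be": "by",   # Belarusian
    "cnr": "me",  # Montenegrin (ISO 639-2; there is no ISO 639-1 code)
    "sr": "rs",   # Serbian
    "tg": "tj",   # Tajik
    "uk": "ua",   # Ukrainian
}


def to_current_code(language_code: str) -> str:
    """Translate an ISO 639 language code into the code currently expected.

    Codes without an entry in the table are returned unchanged, on the
    assumption that the language code and the country code coincide.
    """
    code = language_code.strip().lower()
    return ISO_639_TO_CURRENT.get(code, code)


if __name__ == "__main__":
    for detected in ("uk", "sr", "ru"):
        print(detected, "->", to_current_code(detected))
```

A shim like this has to be maintained by every caller, which is the main argument for having the library accept language codes directly.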

- Use case: to display the list of supported languages programmatically, I have to get the supported codes from cyrtranslit.supported() and then convert each ISO 3166 country code to its ISO 639 language code, so that I can show the name of the language rather than the name of the country.

- Use case: I use the langdetect library to detect the language of the text to transliterate, then pass the detected language code (ISO 639) to cyrtranslit as a parameter. This does not work for the languages listed above because cyrtranslit uses country codes.

- Finally, I would recommend using only ISO 639-2 (not ISO 639-1), since **Montenegrin** has no ISO 639-1 code.
  In that case, the list of supported languages would be:
  - **Belarusian** : bel
  - **Bulgarian** : bul
  - **Greek** : ell
  - **Montenegrin** : cnr
  - **Macedonian** : mkd
  - **Mongolian** : mon
  - **Russian** : rus
  - **Serbian** : srp
  - **Tajik** : tgk
  - **Ukrainian** : ukr
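
To illustrate the first use case under this proposal, here is a minimal sketch of how a caller could turn ISO 639-2 codes into display names. The table simply restates the list above; `PROPOSED_LANGUAGES` and `describe_supported` are hypothetical names, and the input list stands in for whatever a future `supported()` call would return.

```python
# Hypothetical table restating the proposed ISO 639-2 codes from the list above.
PROPOSED_LANGUAGES = {
    "bel": "Belarusian",
    "bul": "Bulgarian",
    "ell": "Greek",
    "cnr": "Montenegrin",
    "mkd": "Macedonian",
    "mon": "Mongolian",
    "rus": "Russian",
    "srp": "Serbian",
    "tgk": "Tajik",
    "ukr": "Ukrainian",
}


def describe_supported(codes):
    """Return 'Name (code)' labels, sorted by language name.

    Unknown codes are kept and labelled as such instead of raising, so a
    newly added language does not break the display.
    """
    labels = []
    for code in codes:
        name = PROPOSED_LANGUAGES.get(code, "Unknown language")
        labels.append(f"{name} ({code})")
    return sorted(labels)


if __name__ == "__main__":
    # Stand-in for the codes a language-code-based supported() might return.
    for label in describe_supported(["ukr", "cnr", "bel"]):
        print(label)
```

With language codes as keys, no intermediate country-to-language conversion is needed before looking up the name.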

Language codes issue #63
