cff-version: 1.2.0
title: "NORI: A Translation-Universals Benchmark for Norwegian LLMs"
message: "If you use this benchmark in academic work, please cite it as below."
type: software
authors:
  - given-names: Einar
    family-names: Holt
    affiliation: tenki
repository-code: "https://github.com/tenki-labs/nori"
url: "https://github.com/tenki-labs/nori"
abstract: >-
  NORI (NORwegian Idiomatic) is a reproducible benchmark that measures how
  native an LLM's Norwegian output reads. It operationalizes
  the five translation-universals categories from Toury (1995), Baker (1996),
  and Mauranen and Kujamäki (2004) as concrete measurements: explicitation,
  normalization, simplification, levelling out, and source-language interference.
  Each axis is normalized to [0, 1] against a reference distribution computed
  on 1,500 Norwegian Wikipedia articles plus eight verified Norwegian-language
  works from Project Gutenberg (Hamsun ×3, Ibsen ×3, Undset, Wergeland).
  v2.1.1 restored the Gutenberg branch with byte-verified IDs after v2.1.0
  withdrew the original five IDs (which had resolved to unrelated English,
  French, and Greek texts). The composite NORI score is the mean of the
  five axes.
keywords:
  - norwegian
  - bokmaal
  - llm
  - benchmark
  - translation-studies
  - low-resource-nlp
  - tenki
license: MIT
version: "2.1.1"
date-released: 2026-05-27
preferred-citation:
  type: software
  authors:
    - given-names: Einar
      family-names: Holt
  title: "NORI: A Translation-Universals Benchmark for Norwegian LLMs"
  year: 2026
  url: "https://github.com/tenki-labs/nori"
  version: "2.1.1"