cff-version: 1.2.0
title: "NORI: A Translation-Universals Benchmark for Norwegian LLMs"
message: "If you use this benchmark in academic work, please cite it as below."
type: software
authors:
  - given-names: Einar
    family-names: Holt
    affiliation: tenki
repository-code: "https://github.com/tenki-labs/nori"
url: "https://github.com/tenki-labs/nori"
abstract: >-
  NORI (NORwegian Idiomatic) is a reproducible benchmark that measures how
  natively Norwegian an LLM's Norwegian output actually is. It operationalizes
  the five translation-universals categories from Toury (1995), Baker (1996),
  and Mauranen and Kujamäki (2004) as concrete measurements: explicitation,
  normalization, simplification, levelling out, and source-language interference.
  Each axis is normalized to [0, 1] against a reference distribution computed
  on 1,500 Norwegian Wikipedia articles plus eight verified Norwegian-language
  works from Project Gutenberg (Hamsun ×3, Ibsen ×3, Undset, Wergeland).
  v2.1.1 restored the Gutenberg branch with byte-verified IDs after v2.1.0
  withdrew the original five IDs (which had resolved to unrelated English,
  French, and Greek texts). The composite NORI score is the mean of the
  five axes.
keywords:
  - norwegian
  - bokmaal
  - llm
  - benchmark
  - translation-studies
  - low-resource-nlp
  - tenki
license: MIT
version: "2.1.1"
date-released: 2026-05-27
preferred-citation:
  type: software
  authors:
    - given-names: Einar
      family-names: Holt
  title: "NORI: A Translation-Universals Benchmark for Norwegian LLMs"
  year: 2026
  url: "https://github.com/tenki-labs/nori"
  version: "2.1.1"