Persian Swear Dataset: a list of inappropriate and offensive Persian words that you can use in production to filter unwanted content from text.
Updated Mar 26, 2026 - C#
A list of Romanian NLP Datasets
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages (https://afrisenti-semeval.github.io/)
A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.
Measure how understandable a German text is.
Dataset for web-scale information extraction.
Dataset accompanying the paper "A New Dataset and Methodology for Malicious URL Classification"
Dataset with annotation of Russian-language poems
Persian Slang Words (dataset)
Free news datasets from Newsdata.io for ML, NLP, and sentiment analysis - Business, Sports, Entertainment, Health, COVID, Politics, Tech & more.
CSV extraction of Kamus Besar Bahasa Indonesia (KBBI) v6.1.0. Over 194,000 research-ready entries with full metadata (Meanings, Examples, Etymology, and Classes).
Persian SMS dataset
Multi-Perspective Sarcasm Explanation Dataset with Human
A meticulously curated, AI-enriched dataset of 1000 essential TOEFL words with academic context
CC0 corpus of 1,354 ancient Greek authors and 4,053 works — Homer through late antiquity. Philosophy, history, drama, lyric, medicine, mathematics, rhetoric, and the fragmentary traditions, in clean Unicode Greek. PDF, Markdown, plain text, and JSON. Data store for Eulogikon (https://eulogikon.org). AI training permitted.
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
ELNER-DZ: A Dataset for Named Entity Recognition and Linking in Algerian Arabic Dialect (Darija)
A conservative release candidate of cleaned Chinese legal texts for legal NLP, RAG prototypes, corpus cleaning, and training-data preparation.
Bilingual (Chinese-English) corpus of international treaties and conventions in structured JSON
All the resources needed to establish an Islamic AI: a curated PDF library and a custom-developed persona