gungorkaya-eng/toefl-essential-vocabulary-dataset

🎓 TOEFL Essential Vocabulary Dataset (AI-Enriched)

A meticulously curated, AI-enriched dataset of 1,000 high-frequency academic words essential for the TOEFL iBT, IELTS, and advanced English comprehension.

🌟 Why This Dataset?

Most open-source vocabulary lists provide only a word and a simple definition. This dataset is built for deeper vocabulary study and NLP applications. Each entry includes:

  • Academic Theme: The specific field (e.g., Biology, Sociology, Geology) where the word frequently appears.
  • Exact Synonyms: Hand-picked synonyms curated specifically for academic reading comprehension.
  • Contextual Example: A high-quality, TOEFL-level sentence demonstrating real-world academic usage.
  • Difficulty Level: Rated from 3 (Intermediate) to 5 (Advanced).
  • POS & Definitions: Precise Part of Speech tagging and accurate English definitions.

📂 File Formats Included

  • toefl_essential_vocabulary.json: Full structured data (Best for Web & Mobile Apps).
  • toefl_essential_vocabulary.csv: Tabular data (Best for Pandas, Data Science, and Kaggle).
  • toefl_essential_vocabulary.txt: Plain word list (Best for quick array imports).
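
As a rough illustration of how the three files might be read, here is a minimal Python sketch using only the standard library. It rests on assumptions this README does not confirm: that the JSON file is a top-level list of entry objects, that the CSV file has a header row, and that the TXT file holds one word per line. The function names are hypothetical helpers, not part of the dataset.

```python
import csv
import json
from pathlib import Path


def load_json(path):
    """Load the JSON file (assumed: a top-level list of entry objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_csv(path):
    """Load the CSV file as a list of dicts (assumed: first row is a header)."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))


def load_word_list(path):
    """Load the TXT file (assumed: one word per line), skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With the files in the working directory, a call such as `load_json("toefl_essential_vocabulary.json")` would return the entries, provided the layout assumptions above hold. Note that `csv.DictReader` returns every value as a string, so numeric fields would need converting.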

💻 Sample Data (JSON)

{
  "word": "proliferation",
  "pos": "noun",
  "difficulty": 5,
  "theme": "Biology",
  "synonyms": ["multiplication", "expansion"],
  "definition_en": "Rapid increase in numbers.",
  "example_sentence": "The proliferation of invasive plant species severely threatens the delicate balance of the local ecosystem."
}
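
The entry above could be modeled and filtered along these lines. This is a sketch only: the `VocabEntry` class and the `parse_entry` and `filter_entries` helpers are hypothetical names introduced for illustration, and the field names are taken from the sample entry on the assumption that every entry shares them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class VocabEntry:
    word: str
    pos: str
    difficulty: int
    theme: str
    synonyms: Tuple[str, ...]
    definition_en: str
    example_sentence: str


def parse_entry(raw: dict) -> VocabEntry:
    """Convert one raw dict into a VocabEntry, checking the difficulty range."""
    difficulty = int(raw["difficulty"])
    # Difficulty is documented as running from 3 (Intermediate) to 5 (Advanced).
    if not 3 <= difficulty <= 5:
        raise ValueError(f"unexpected difficulty {difficulty} for {raw['word']!r}")
    return VocabEntry(
        word=raw["word"],
        pos=raw["pos"],
        difficulty=difficulty,
        theme=raw["theme"],
        synonyms=tuple(raw["synonyms"]),
        definition_en=raw["definition_en"],
        example_sentence=raw["example_sentence"],
    )


def filter_entries(
    entries: Iterable[VocabEntry],
    theme: Optional[str] = None,
    min_difficulty: int = 3,
) -> List[VocabEntry]:
    """Keep entries at or above min_difficulty, optionally limited to one theme."""
    return [
        e for e in entries
        if e.difficulty >= min_difficulty and (theme is None or e.theme == theme)
    ]


# The sample entry shown above, used here as test data.
SAMPLE = {
    "word": "proliferation",
    "pos": "noun",
    "difficulty": 5,
    "theme": "Biology",
    "synonyms": ["multiplication", "expansion"],
    "definition_en": "Rapid increase in numbers.",
    "example_sentence": "The proliferation of invasive plant species severely "
                        "threatens the delicate balance of the local ecosystem.",
}

entry = parse_entry(SAMPLE)
hard_biology = filter_entries([entry], theme="Biology", min_difficulty=5)
```

Storing synonyms as a tuple keeps each entry immutable and hashable, which is convenient when deduplicating or building sets of entries.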

🚀 Get the Full Version & Multilingual App

This repository contains the core 1,000-word dataset.

If you are a student preparing for exams, you can access the full 1,650-word database, which features interactive flashcards, multiple academic sentences for each word, and complete translations in Turkish, German, and Spanish, at:

👉 https://wordlevel.net

⚖️ License & Attribution

This dataset is completely free and open-source under the MIT License. You can use it in your apps, research, or LLM training pipelines.

Attribution Requirement: If you use this dataset in a public repository, website, or research paper, you must provide a clickable do-follow backlink to https://wordlevel.net as the original source of the data.
