Releases: LAiSER-Software/extract-module

v1.0.0 — Main Launch

19 Jun 18:17

First stable release of LAiSER for extraction and alignment workflows.

v0.5.0 — Knowledge & Task Pipeline

07 May 05:01
a412792

What's New

Knowledge & Task Pipeline

  • Add Knowledge and Task FAISS indexes alongside existing Skills index
  • Add prebuilt taxonomy indexes for Knowledge and Tasks
  • Add pipeline build scripts for Knowledge and Task indexes
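
These notes do not describe how the indexes are built. As a rough illustration, assume every taxonomy entry (a knowledge item, skill, or task statement) has already been embedded as a vector. A flat inner-product index over L2-normalized vectors then ranks entries by cosine similarity, which is how FAISS's IndexFlatIP behaves when vectors are normalized before they are added. The class below is a hypothetical, dependency-free stand-in rather than LAiSER's build script, and the toy 2-D vectors are made up.

```python
import math


class FlatIPIndex:
    """Brute-force inner-product index over L2-normalized vectors.

    The inner product of unit vectors equals their cosine similarity.
    Hypothetical stand-in for a FAISS flat index; not LAiSER's implementation.
    """

    def __init__(self):
        self.vectors = []  # unit-length embedding vectors
        self.labels = []   # taxonomy label stored at the same position

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else list(vec)

    def add(self, label, vec):
        self.labels.append(label)
        self.vectors.append(self._normalize(vec))

    def search(self, query, k=1):
        """Return up to k (label, position, score) tuples, best match first."""
        q = self._normalize(query)
        scored = [
            (label, pos, sum(a * b for a, b in zip(q, vec)))
            for pos, (label, vec) in enumerate(zip(self.labels, self.vectors))
        ]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:k]


# One index per taxonomy, mirroring the Knowledge / Skills / Tasks split.
indexes = {name: FlatIPIndex() for name in ("knowledge", "skills", "tasks")}
indexes["skills"].add("data analysis", [1.0, 0.0])   # toy 2-D "embeddings"
indexes["skills"].add("public speaking", [0.0, 1.0])
print(indexes["skills"].search([0.9, 0.1], k=1))     # best match: "data analysis"
```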

KST Extraction API

  • Generalize AlignmentService to support all three index types (Knowledge,
    Skills, Tasks)
  • Add concept extraction API for Knowledge, Skills, Tasks
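
One way to picture a single alignment service that covers all three index types is a class that validates the requested type and then searches only that taxonomy. AlignmentServiceSketch, its align() signature, and the sample data are hypothetical; the real AlignmentService interface is not documented in these notes.

```python
import math

VALID_TYPES = ("knowledge", "skills", "tasks")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class AlignmentServiceSketch:
    """Hypothetical service aligning an embedding against one of three taxonomies."""

    def __init__(self, taxonomies):
        # taxonomies: {"knowledge" | "skills" | "tasks": [(label, vector), ...]}
        unknown = set(taxonomies) - set(VALID_TYPES)
        if unknown:
            raise ValueError(f"unknown index types: {sorted(unknown)}")
        self.taxonomies = taxonomies

    def align(self, embedding, index_type, top_k=3):
        """Rank entries of the chosen taxonomy by cosine similarity."""
        if index_type not in VALID_TYPES:
            raise ValueError(f"index_type must be one of {VALID_TYPES}")
        entries = self.taxonomies.get(index_type, [])
        ranked = sorted(
            ({"label": label, "score": cosine(embedding, vec)} for label, vec in entries),
            key=lambda r: r["score"],
            reverse=True,
        )
        return ranked[:top_k]


service = AlignmentServiceSketch({
    "tasks": [("prepare budget reports", [1.0, 0.2]), ("train new staff", [0.1, 1.0])],
})
print(service.align([1.0, 0.0], "tasks", top_k=1))
```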

Other

  • Update Gemini integration
  • Add Task Statements workbook

v0.4.1

31 Mar 02:46

Add source_url to taxonomy alignment output

v0.4.0.1: Streamlined Index Pipeline and Multi-Cloud LLM Support

17 Mar 08:31
57759d7

Release Notes

Highlights

This release introduces a streamlined index pipeline and adds multi-cloud LLM support.

What's New

Streamlined Index Pipeline

The index pipeline has been simplified to make extraction and indexing workflows more efficient and easier to manage.

Multi-Cloud LLM Support

Support for multi-cloud LLM integration has been added, enabling the module to work across different cloud-based model providers.
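
The notes do not name the supported providers or the interface used. A common pattern for this kind of support is a small provider registry behind a shared generate() method, so calling code stays the same whichever cloud backend is configured. Everything below (LLMProvider, register_provider, get_provider, EchoProvider) is a hypothetical sketch; EchoProvider is an offline stand-in where a real provider would call its vendor's SDK.

```python
from typing import Callable, Dict, Protocol


class LLMProvider(Protocol):
    """Interface every provider must satisfy (hypothetical)."""

    def generate(self, prompt: str) -> str: ...


_PROVIDERS: Dict[str, Callable[[], LLMProvider]] = {}


def register_provider(name: str):
    """Class decorator that records a provider class under a short name."""
    def decorator(cls):
        _PROVIDERS[name] = cls
        return cls
    return decorator


def get_provider(name: str) -> LLMProvider:
    """Instantiate the provider registered under `name`."""
    factory = _PROVIDERS.get(name)
    if factory is None:
        raise ValueError(f"unknown provider {name!r}; registered: {sorted(_PROVIDERS)}")
    return factory()


@register_provider("echo")
class EchoProvider:
    """Offline stand-in; a real provider would call its cloud SDK here."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


llm = get_provider("echo")
print(llm.generate("List the skills in this job posting."))
# echo: List the skills in this job posting.
```

Keeping provider selection behind a name lookup means switching clouds becomes a configuration change rather than a code change.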

Release v0.3.2: Architectural Refactor, Enhanced Platform Support, and Bug Fixes

17 Dec 07:56
1aa16db

🚀 Development Update

✨ Key Features & Improvements

  • Architectural Refactoring: Implemented a layered architecture with improved separation of concerns
  • LLM Models: Added llm_models subpackage for better organization
  • Documentation: Updated contributor guidelines with Conventional Commit standards

🐛 Bug Fixes

  • Fixed dtype inconsistencies with GPTQ quantization by switching to AWQ quantization
  • Resolved the data_type error for Gemma2 models in vLLM (these models are no longer supported)
  • Updated input exceptions to prevent unnecessary warnings
  • Improved input data format to support user-preferred labels
  • Enhanced LLM response parsing
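
The notes do not say what changed in the parser. As an illustration of the general problem, LLM replies often wrap JSON in prose or code fences, so a tolerant parser tries several candidate substrings before giving up. parse_llm_json() is a hypothetical helper, not LAiSER's implementation.

```python
import json
import re

# Matches a fenced block (three backticks, optional "json" tag); `{3} avoids a literal fence.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_llm_json(response: str):
    """Best-effort extraction of a JSON value from a free-form LLM reply.

    Tries, in order: the whole reply, the first fenced block, then the widest
    bracket/brace spans (whichever opens first is tried first).
    Returns None instead of raising when nothing parses.
    """
    candidates = [response.strip()]
    fenced = FENCE_RE.search(response)
    if fenced:
        candidates.append(fenced.group(1).strip())
    spans = []
    for open_ch, close_ch in (("[", "]"), ("{", "}")):
        start, end = response.find(open_ch), response.rfind(close_ch)
        if start != -1 and end > start:
            spans.append((start, response[start:end + 1]))
    candidates.extend(text for _, text in sorted(spans))
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None


reply = 'Sure! Here are the skills:\n["SQL", "data visualization"]\nLet me know if you need more.'
print(parse_llm_json(reply))  # ['SQL', 'data visualization']
```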

📚 Documentation Updates

  • Added Semantic Versioning convention
  • Updated README and contributor guides
  • Added refactoring guide

🔧 Technical Changes

  • Updated project configuration in pyproject.toml
  • Enhanced the main application entry point in __init__.py
  • Improved error handling and data access patterns
  • Implemented the Service layer to separate business logic
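
The sketch below illustrates the layering idea in general terms: the repository only fetches records, while the service applies business rules such as a score threshold and de-duplication. The class names, the 0.5 threshold, and the sample data are hypothetical; the actual LAiSER layers are not shown in these notes.

```python
from typing import Dict, List, Tuple


class InMemorySkillRepository:
    """Data-access layer: only knows how to fetch stored records."""

    def __init__(self, records: Dict[str, List[Tuple[str, float]]]):
        self._records = records  # job id -> [(skill label, similarity score), ...]

    def candidates_for(self, job_id: str) -> List[Tuple[str, float]]:
        return self._records.get(job_id, [])


class SkillService:
    """Service layer: business rules (threshold, de-duplication), no storage details."""

    def __init__(self, repo, min_score: float = 0.5):
        self.repo = repo
        self.min_score = min_score

    def skills_for(self, job_id: str) -> List[str]:
        seen, result = set(), []
        # Highest score first, so the best-scoring copy of a duplicate wins.
        for label, score in sorted(self.repo.candidates_for(job_id), key=lambda c: -c[1]):
            if score >= self.min_score and label not in seen:
                seen.add(label)
                result.append(label)
        return result


repo = InMemorySkillRepository(
    {"job-1": [("SQL", 0.82), ("Excel", 0.41), ("SQL", 0.77), ("Python", 0.65)]}
)
print(SkillService(repo).skills_for("job-1"))  # ['SQL', 'Python']
```

Because the service only depends on candidates_for(), the in-memory repository could be swapped for one backed by files or a database without touching the business rules.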

Release v0.3.0: LLM-Driven Skill Extraction & Taxonomy Mapping

02 Oct 20:14
145f613

LAiSER Extract Module v0.3 🚀

🚀 Development Update

✨ Key Features & Improvements

1. LLM-Driven Skill Extraction

  • Replaces the previous approach of embedding raw job descriptions.
  • Skills are now explicitly extracted from cleaned job descriptions using structured LLM prompts.
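
The release does not include the prompt itself. Below is a minimal sketch of what a structured prompt can mean, assuming the model is asked to answer only with a JSON object holding a skills list; build_extraction_prompt(), its wording, and the schema are hypothetical.

```python
import json


def build_extraction_prompt(job_description: str, max_skills: int = 10) -> str:
    """Hypothetical structured prompt; LAiSER's actual prompt text is not shown in these notes."""
    schema = {"skills": ["<short skill phrase>"]}  # fixed answer shape the model must follow
    return (
        "You are extracting skills from a job description.\n"
        f"Return at most {max_skills} skills as JSON matching this schema, with no other text:\n"
        f"{json.dumps(schema)}\n\n"
        f"Job description:\n{job_description.strip()}"
    )


print(build_extraction_prompt("  Analyze sales data with SQL and build dashboards.  ", max_skills=5))
```

Pinning the answer to a fixed JSON shape is what makes the reply machine-parseable and the extracted skills auditable.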

2. Preprocessing Layer Added

  • Job descriptions are cleaned before extraction to remove irrelevant or misleading content (PII, HR/legal text, fluff, etc.).
  • Ensures that alignment happens only on meaningful skill signals.
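
A simplified cleaning step might mask contact details and drop boilerplate sentences before any text reaches the LLM. The regular expressions and the BOILERPLATE_MARKERS phrases below are illustrative assumptions and far from exhaustive; they are not LAiSER's preprocessing rules.

```python
import re

# Hypothetical patterns; a production cleaner would need a much broader set.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
BOILERPLATE_MARKERS = ("equal opportunity employer", "reasonable accommodation")


def clean_job_description(text: str) -> str:
    """Mask e-mail addresses and phone numbers, then drop boilerplate sentences."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept = [s for s in sentences if not any(m in s.lower() for m in BOILERPLATE_MARKERS)]
    return " ".join(kept).strip()


raw = ("Build ETL pipelines in Python. Contact jane.doe@example.com or 555-123-4567. "
       "We are an equal opportunity employer.")
print(clean_job_description(raw))
# Build ETL pipelines in Python. Contact [EMAIL] or [PHONE].
```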

3. Testing Pipeline Introduced

  • Multiple test files added to validate core package functionality.
  • Strengthens reliability and long-term maintainability.

📌 Notes

This release marks the transition from a doc-embedding-only approach to a hybrid pipeline (Preprocess → LLM Extract → FAISS Align), producing cleaner, more interpretable, and auditable outputs.
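
The three-stage flow can be sketched as plain function composition, with each raw skill kept next to its taxonomy match so every output row can be traced back to what the LLM produced. run_pipeline() and the lambda stand-ins for each stage are hypothetical.

```python
from typing import Callable, Dict, List


def run_pipeline(
    description: str,
    preprocess: Callable[[str], str],
    extract: Callable[[str], List[str]],
    align: Callable[[str], Dict[str, object]],
) -> List[Dict[str, object]]:
    """Preprocess -> LLM extract -> align, one output row per extracted skill."""
    cleaned = preprocess(description)
    rows = []
    for raw_skill in extract(cleaned):
        # Keep the LLM's raw phrase beside its taxonomy match for auditability.
        rows.append({"raw_skill": raw_skill, **align(raw_skill)})
    return rows


# Toy stand-ins for the three stages; none of these are LAiSER functions.
rows = run_pipeline(
    "Apply now! Must know SQL and Tableau.",
    preprocess=lambda text: text.replace("Apply now!", "").strip(),
    extract=lambda text: [s for s in ("SQL", "Tableau") if s in text],
    align=lambda skill: {"taxonomy_label": skill.lower(), "score": 1.0},
)
print(rows)
```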

Skill-Extraction v0.2.2 (ESCO + KSA)

12 Jun 17:57
106bf7d

Release v0.2.2 · Skill-Extraction Refactor (“ESCO + KSA”)

⚡ PR #101

✨ Highlights

  1. Taxonomy-aware skill extraction
    • Integrates a FAISS index of the ESCO skills taxonomy.
    • Skill_Extractor.get_top_esco_skills() now returns {Skill, index, score}, enabling deterministic Skill Tag values (ESCO.<index>).

  2. KSA enrichment with vLLM
    • New helper get_ksa_details() generates Knowledge Required and Task Abilities lists for each skill.
    • Automatically invoked when a GPU/vLLM backend is available.

  3. Unified output schema
    • The extractor returns a tidy DataFrame with seven columns: Research ID, Description, Raw Skill, Knowledge Required, Task Abilities, Skill Tag, Correlation Coefficient.
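
The sketch below shows how one output row could be assembled from a {Skill, index, score} match, including the deterministic ESCO.<index> tag. Mapping score to the Correlation Coefficient column, the shape of the ksa argument, to_row() itself, and the sample values (including index 1234) are assumptions for illustration.

```python
SEVEN_COLUMNS = [
    "Research ID", "Description", "Raw Skill", "Knowledge Required",
    "Task Abilities", "Skill Tag", "Correlation Coefficient",
]


def to_row(research_id, description, raw_skill, match, ksa=None):
    """Assemble one output row (hypothetical helper).

    match: dict with keys Skill, index, score
    ksa:   optional {"knowledge": [...], "tasks": [...]}; empty lists when no GPU/vLLM ran
    """
    ksa = ksa or {"knowledge": [], "tasks": []}
    return {
        "Research ID": research_id,
        "Description": description,
        "Raw Skill": raw_skill,
        "Knowledge Required": ksa["knowledge"],
        "Task Abilities": ksa["tasks"],
        "Skill Tag": f"ESCO.{match['index']}",      # deterministic tag from the ESCO index
        "Correlation Coefficient": match["score"],  # assumed to carry the similarity score
    }


row = to_row("job-001", "Analyze weekly sales data in SQL.", "data analysis",
             {"Skill": "analyse data", "index": 1234, "score": 0.87})
print(row["Skill Tag"])  # ESCO.1234
# With pandas available: pandas.DataFrame([row], columns=SEVEN_COLUMNS)
```

Because the tag is derived from the index rather than from the LLM's wording, the same ESCO entry always yields the same Skill Tag.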


🔧 Detailed Changes

utils.py
  • get_top_esco_skills() enhanced to include the ESCO index and similarity score.
llm_methods.py
  • Added get_ksa_details() plus supporting imports.
skill_extractor.py
  • Ensured self.index is always defined.
  • build_faiss_index_esco() / load_faiss_index_esco() are now instance methods that store the index under laiser/input.
  • New taxonomy-first pipeline inserted at the top of extractor(); legacy alignment kept as a fallback.
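
A taxonomy-first path with a legacy fallback can be sketched as follows, assuming the fallback should run both when the new path raises and when it returns no rows. extract_with_fallback() and the stage callables are hypothetical; the real extractor() logic is not shown here.

```python
import logging
from typing import Callable, List, Optional


def extract_with_fallback(
    text: str,
    taxonomy_first: Callable[[str], Optional[List[dict]]],
    legacy_align: Callable[[str], List[dict]],
) -> List[dict]:
    """Try the taxonomy-first path; use legacy alignment if it fails or finds nothing."""
    try:
        rows = taxonomy_first(text)
    except Exception as exc:  # e.g. the ESCO index could not be loaded
        logging.warning("taxonomy-first path failed (%r); using legacy alignment", exc)
        rows = None
    return rows if rows else legacy_align(text)


def broken_taxonomy_path(_text):
    raise FileNotFoundError("ESCO index not found")


print(extract_with_fallback("job text", broken_taxonomy_path, lambda t: [{"skill": "legacy"}]))
# [{'skill': 'legacy'}]
```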

⚠️ Deprecated / To Be Removed

  • align_skills() and align_KSAs() will be dropped in v0.3 once consumers migrate to the new output format.

🚧 Known Issues / Roadmap

  1. JSON parsing in get_ksa_details() needs additional resilience checks.
  2. LLM calls are still executed per skill; batching will come in v0.3.
  3. Duplicate import json lines remain in llm_methods.py.
  4. Consider CPU-only fallback for KSA generation.
  5. Persistence of the ESCO vector index should move to a cloud vector DB.
  6. vLLM is not currently supported on macOS (MPS).
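
For item 2, a batched call pattern could look like the sketch below: skills are grouped into fixed-size chunks and each chunk goes to the model in a single request. generate_batch is a hypothetical callable that returns one result per input, in order, and batch_size=8 is an arbitrary default. On Python 3.12+, itertools.batched provides the chunking step (it yields tuples rather than lists).

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ksa_for_skills(skills, generate_batch, batch_size=8):
    """Call the LLM once per batch instead of once per skill."""
    results = []
    for chunk in batched(skills, batch_size):
        outputs = generate_batch(chunk)
        # Guard against the model silently dropping or merging items.
        if len(outputs) != len(chunk):
            raise RuntimeError("LLM returned a different number of results than prompts")
        results.extend(outputs)
    return results


calls = []


def fake_generate(chunk):
    """Offline stand-in for a batched LLM call; records each batch size."""
    calls.append(len(chunk))
    return [f"KSA for {s}" for s in chunk]


out = ksa_for_skills([f"skill-{i}" for i in range(10)], fake_generate, batch_size=4)
print(calls)  # [4, 4, 2]
```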

⬆️ Upgrade Notes

pip install -U laiser==0.2.2 

No changes to input parameters are required, but downstream code should read the new seven-column schema.
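
Downstream code could fail fast if the extractor output lacks any of the seven columns. check_v022_schema() is a hypothetical helper; it accepts any iterable of column names, such as a pandas DataFrame's columns attribute.

```python
EXPECTED_COLUMNS = (
    "Research ID", "Description", "Raw Skill", "Knowledge Required",
    "Task Abilities", "Skill Tag", "Correlation Coefficient",
)


def check_v022_schema(columns) -> None:
    """Raise ValueError if any v0.2.2 column is missing; extra columns are allowed."""
    present = set(columns)
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"extractor output is missing columns: {missing}")


check_v022_schema(list(EXPECTED_COLUMNS) + ["extra"])  # passes silently
try:
    check_v022_schema(["Research ID", "Raw Skill"])
except ValueError as err:
    print(err)
```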


Next up

0.2 → 0.3

  • Add batching and drop deprecated APIs; bug fixes will increment the patch version.
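
The versioning rule above can be expressed as a small helper. bump() is hypothetical and assumes three-part version strings; it treats breaking changes before 1.0 as minor bumps, matching the 0.2 → 0.3 plan, and as major bumps afterwards, as Semantic Versioning specifies.

```python
def bump(version: str, change: str) -> str:
    """Bump a MAJOR.MINOR.PATCH string (an optional leading "v" is ignored).

    change: "fix" -> patch; "feature" -> minor;
            "breaking" -> minor while MAJOR is 0, otherwise major.
    """
    major, minor, patch = (int(p) for p in version.lstrip("v").split("."))
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    if change == "feature" or (change == "breaking" and major == 0):
        return f"{major}.{minor + 1}.0"
    if change == "breaking":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("0.2.2", "breaking"))  # 0.3.0
print(bump("0.3.0", "fix"))       # 0.3.1
print(bump("1.0.0", "breaking"))  # 2.0.0
```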