Releases: LAiSER-Software/extract-module
v1.0.0 — Main Launch
v0.5.0 — Knowledge & Task Pipeline
What's New
Knowledge & Task Pipeline
- Add Knowledge and Task FAISS indexes alongside existing Skills index
- Add prebuilt taxonomy indexes for Knowledge and Tasks
- Add pipeline build scripts for Knowledge and Task indexes
KST Extraction API
- Generalize AlignmentService to support all three index types (Knowledge, Skills, Tasks)
- Add concept extraction API for Knowledge, Skills, Tasks
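To illustrate the idea of one alignment service backed by a separate index per concept type, here is a minimal sketch. It is not the module's code: `ToyAlignmentService`, `align()`, and the taxonomy labels are hypothetical, and a bag-of-words cosine similarity stands in for the FAISS vector search so the example runs without extra dependencies.

```python
import math
from collections import Counter

INDEX_TYPES = ("knowledge", "skills", "tasks")


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyAlignmentService:
    """Hypothetical stand-in: one service, one index per concept type."""

    def __init__(self, taxonomies):
        unknown = set(taxonomies) - set(INDEX_TYPES)
        if unknown:
            raise ValueError(f"unknown index types: {sorted(unknown)}")
        # Each "index" is a list of (label, vector) pairs.
        # In a FAISS-backed service these vectors would live in the index.
        self._indexes = {
            kind: [(label, embed(label)) for label in labels]
            for kind, labels in taxonomies.items()
        }

    def align(self, concept, index_type, top_k=1):
        """Return the top_k (label, score) pairs from the chosen index."""
        if index_type not in self._indexes:
            raise KeyError(index_type)
        query = embed(concept)
        scored = [(label, cosine(query, vec)) for label, vec in self._indexes[index_type]]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


service = ToyAlignmentService({
    "knowledge": ["statistics", "network security"],
    "skills": ["data analysis", "project management"],
    "tasks": ["prepare financial reports", "train machine learning models"],
})
best_skill = service.align("analysis of data", "skills")[0][0]
best_task = service.align("train models", "tasks")[0][0]
```

The point of the design is that callers pick the index by name, so adding a concept type means registering another index rather than writing another service.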
Other
- Update Gemini integration
- Add Task Statements workbook
v0.4.1
v0.4.0.1: Streamlined Index Pipeline and Multi-Cloud LLM Support
Release Notes
Highlights
This release introduces a streamlined index pipeline and adds multi-cloud LLM support.
What's New
Streamlined Index Pipeline
The index pipeline has been simplified to make extraction and indexing workflows more efficient and easier to manage.
Multi-Cloud LLM Support
Support for multi-cloud LLM integration has been added, enabling the module to work across different cloud-based model providers.
Release v0.3.2: Architectural Refactor, Enhanced Platform Support, and Bug Fixes
🚀 Development Update
✨ Key Features & Improvements
- Architectural Refactoring: Implemented layered architecture with improved separation of concerns
- LLM Models: Added `llm_models` subpackage for better organization
- Documentation: Updated contributor guidelines with Conventional Commit standards
🐛 Bug Fixes
- Fixed `dtype` inconsistencies with GPTQ quantization, switched to AWQ quantization
- Resolved `data_type error` for Gemma2 models in vLLM, no longer supported
- Updated input exceptions to prevent unnecessary warnings
- Improved input data format to support user preferred labels
- Enhanced LLM response parsing
📚 Documentation Updates
- Added Semantic Versioning convention
- Updated README and contributor guides
- Added refactoring guide
🔧 Technical Changes
- Updated project configuration in `pyproject.toml`
- Enhanced main application entry point at `__init__.py`
- Improved error handling and data access patterns
- Implemented the Service layer to separate business logic
Release v0.3.0: LLM-Driven Skill Extraction & Taxonomy Mapping
LAiSER Extract Module v0.3 🚀
🚀 Development Update
✨ Key Features & Improvements
1. LLM-Driven Skill Extraction
- Replaces the old raw description embedding approach.
- Skills are now explicitly extracted from cleaned job descriptions using structured LLM prompts.
2. Preprocessing Layer Added
- Job descriptions are cleaned before extraction to remove irrelevant or misleading content (PII, HR/legal text, fluff, etc.).
- Ensures that alignment happens only on meaningful skill signals.
3. Testing Pipeline Introduced
- Multiple test files added to validate core package functionality.
- Strengthens reliability and long-term maintainability.
📂 Release Resources
- Source code: GitHub Repository
- Python package: PyPI
📌 Notes
This release marks the transition from a doc-embedding-only approach to a hybrid pipeline (Preprocess → LLM Extract → FAISS Align), producing cleaner, more interpretable, and auditable outputs.
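The three stages named above (Preprocess → LLM Extract → FAISS Align) can be sketched roughly as follows. This is an assumption-laden illustration, not the package's implementation: every function name is hypothetical, the LLM is an injected callable replaced here by a canned stub, and word overlap stands in for the FAISS similarity search.

```python
import re


def preprocess(description):
    """Drop lines that look like HR/legal boilerplate and mask e-mail addresses."""
    boilerplate = ("equal opportunity", "benefits", "apply now")
    kept = [
        line.strip()
        for line in description.splitlines()
        if line.strip() and not any(marker in line.lower() for marker in boilerplate)
    ]
    return re.sub(r"\S+@\S+", "[email]", " ".join(kept))


def extract_skills(cleaned_text, llm):
    """Ask the injected LLM for a comma-separated skill list and split it."""
    prompt = f"List the skills in this job description, comma-separated:\n{cleaned_text}"
    return [s.strip() for s in llm(prompt).split(",") if s.strip()]


def align(skills, taxonomy):
    """Map each raw skill to the taxonomy label sharing the most words with it."""
    def overlap(a, b):
        return len(set(a.lower().split()) & set(b.lower().split()))
    return {skill: max(taxonomy, key=lambda label: overlap(skill, label)) for skill in skills}


def run_pipeline(description, llm, taxonomy):
    """Chain the three stages: clean, extract, then align."""
    return align(extract_skills(preprocess(description), llm), taxonomy)


def fake_llm(prompt):
    # Stand-in for a real model call; always returns the same canned answer.
    return "python programming, data analysis"


posting = """We need Python programming and data analysis experience.
We are an equal opportunity employer.
Contact jobs@example.com to apply."""
taxonomy = ["programming in Python", "analysis of data", "welding"]
result = run_pipeline(posting, fake_llm, taxonomy)
```

Because each stage takes and returns plain values, the intermediate results (cleaned text, raw skills, aligned labels) can be inspected separately, which is what makes this kind of pipeline auditable.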
Skill-Extraction v0.2.2 (ESCO + KSA)
Release v0.2.2 · Skill-Extraction Refactor (“ESCO + KSA”)
⚡PR⚡#101
✨ Highlights
- Taxonomy-aware skill extraction
  - Integrates a FAISS index of the ESCO skills taxonomy.
  - `Skill_Extractor.get_top_esco_skills()` now returns `{Skill, index, score}`, enabling deterministic `Skill Tag` values (`ESCO.<index>`).
- KSA enrichment with vLLM
  - New helper `get_ksa_details()` generates Knowledge Required and Task Abilities lists for each skill.
  - Automatically invoked when a GPU/vLLM backend is available.
- Unified output schema
  - The extractor returns a tidy DataFrame with seven columns: Research ID, Description, Raw Skill, Knowledge Required, Task Abilities, Skill Tag, Correlation Coefficient.
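As a rough illustration of the seven-column schema and the `ESCO.<index>` tag format, the sketch below builds one output row as a plain dict (used instead of a DataFrame to avoid dependencies). The helper `make_row()`, the sample values, and the way the `{Skill, index, score}` fields are mapped onto the columns are assumptions for illustration only.

```python
COLUMNS = [
    "Research ID",
    "Description",
    "Raw Skill",
    "Knowledge Required",
    "Task Abilities",
    "Skill Tag",
    "Correlation Coefficient",
]


def make_row(research_id, description, match, knowledge=(), tasks=()):
    """Build one output row from a {Skill, index, score} match (assumed mapping)."""
    return {
        "Research ID": research_id,
        "Description": description,
        "Raw Skill": match["Skill"],
        "Knowledge Required": list(knowledge),
        "Task Abilities": list(tasks),
        # Deterministic tag derived from the taxonomy index.
        "Skill Tag": f"ESCO.{match['index']}",
        "Correlation Coefficient": match["score"],
    }


# Hypothetical match, shaped like the {Skill, index, score} records in the notes.
match = {"Skill": "data analysis", "index": 1234, "score": 0.87}
row = make_row("job-001", "Analyse sales data.", match, knowledge=["statistics"])
```

Downstream code that validates `list(row) == COLUMNS` before processing will fail fast if the schema changes again in a later release.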
🔧 Detailed Changes
| Area | Description |
|---|---|
| utils.py | get_top_esco_skills() enhanced to include ESCO index and similarity score. |
| llm_methods.py | Added get_ksa_details() plus supporting imports. |
| skill_extractor.py | • Ensured self.index is always defined. • build_faiss_index_esco() / load_faiss_index_esco() are now instance methods storing the index under laiser/input. • New taxonomy-first pipeline inserted at the top of extractor(); legacy alignment kept for fallback. |
⚠️ Deprecated / To Be Removed
`align_skills()` and `align_KSAs()` will be dropped in v0.3 once consumers migrate to the new output format.
🚧 Known Issues / Roadmap
- JSON parsing in `get_ksa_details()` needs additional resilience checks.
- LLM calls are still executed per skill; batching will come in v0.3.
- Duplicate `import json` lines remain in `llm_methods.py`.
- Consider CPU-only fallback for KSA generation.
- Persistence of the ESCO vector index should move to a cloud vector DB.
- vLLM is not yet supported on MPS/macOS.
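One possible shape for the JSON resilience checks mentioned in the first item is sketched below. It is only a sketch under assumptions: `parse_llm_json()` is a hypothetical helper, not part of `get_ksa_details()`, and the `knowledge` key in the sample reply is invented. It scans an LLM reply for the first decodable JSON object and falls back to a default instead of raising.

```python
import json


def parse_llm_json(raw, default=None):
    """Return the first JSON object found in an LLM reply, or `default`."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores any trailing text after it.
            obj, _ = decoder.raw_decode(raw, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # Try the next opening brace if this one did not decode.
        start = raw.find("{", start + 1)
    return default


reply = 'Sure, here is the result: {"knowledge": ["statistics"]} Let me know if you need more.'
parsed = parse_llm_json(reply)
fallback = parse_llm_json("no structured output", default={})
```

Returning a default keeps a single malformed reply from aborting a whole extraction run; whether to log or retry in that case is a separate decision.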
⬆️ Upgrade Notes
`pip install -U laiser==0.2.2`
No changes to input parameters are required, but downstream code should read the new seven-column schema.
Next up
0.2 → 0.3
- Add batching and drop deprecated APIs; increment the patch version for bug fixes.