This Python package converts sentences into tokens and passes the tokens through a model to obtain sentence embeddings. It is designed to take input in DataLoader format.
First, within your environment, install the package.
pip install git+https://github.com/stair-lab/embedder.git
In your script, import the module:
from embed_text_package.embed_text_v2 import Embedder
Then you can initialize an embedder, load the model, and call it:
NOTE: the load() function loads both the model and the embedder.
model_name = "<HF_repo>/<HF_model>"
embdr = Embedder()
embdr.load(model_name)
emb = embdr.get_embeddings(dataloader, model_name, cols_to_be_embded)
Where:
dataloader is of type DataLoader,
model_name is of type str,
cols_to_be_embded is of type list and should contain the names of the columns of the dataloader's dataset that are to be embedded.
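To make the role of cols_to_be_embded concrete, here is a minimal pure-Python sketch of column selection over batches. It assumes that each batch maps a column name to a list of strings; this layout, the sample data, and the helper collect_sentences are hypothetical illustrations, not part of the package, whose internals may differ.

```python
# Hypothetical stand-in for a dataloader: an iterable of batches, where each
# batch maps a column name to a list of strings. This layout is an assumption
# made for illustration, not taken from the package source.
batches = [
    {
        "question": ["What is 2+2?", "Who wrote Hamlet?"],
        "answer": ["4", "Shakespeare"],
        "id": ["q1", "q2"],
    },
    {
        "question": ["What is the capital of France?"],
        "answer": ["Paris"],
        "id": ["q3"],
    },
]

# Only these columns are selected for embedding; "id" is ignored.
cols_to_be_embded = ["question", "answer"]


def collect_sentences(batches, cols):
    """Gather, per selected column, every sentence across all batches.

    Hypothetical helper: it only shows which text would be handed to a model.
    """
    sentences = {col: [] for col in cols}
    for batch in batches:
        for col in cols:
            sentences[col].extend(batch[col])
    return sentences


sentences = collect_sentences(batches, cols_to_be_embded)
for col, texts in sentences.items():
    print(col, len(texts))
```

In this sketch, columns not listed in cols_to_be_embded (here "id") never reach the model, so identifiers and labels can stay in the dataset without being embedded.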
To run the tests, first install pytest within your environment:
pip install pytest
Then, cd to the main folder of the package ("embedder") and run:
pytest