We introduce a novel framework, ConceptDrift, that models the hypothesis generation task as a sequence of temporal graphlets and simultaneously encodes spatial, temporal, and semantic change. Unlike existing approaches that treat these dimensions independently, ConceptDrift is the first to provide a holistic understanding of concept evolution by integrating them into a unified framework. Grounded in the theories of the Distributional Hypothesis and Conceptual Change, our method adapts these principles to the unique challenges of large-scale biomedical literature. We conduct extensive experiments across multiple datasets and demonstrate that ConceptDrift consistently outperforms state-of-the-art baselines in generating accurate and meaningful hypotheses. Our framework shows immediate practical benefits for web-based literature mining tools in life sciences and biomedicine, offering more robust and predictive feature representations.
- This repository holds the virology, neurology, and immunology datasets along with the implementation of ConceptDrift.
Use Git-LFS to ensure data.zip is downloaded. Please unzip data.zip in the project directory. You should have a directory named data in the project directory.
Our temporal dynamic graphs are stored as Torch-Geometric TemporalData objects. You can find pickle files holding the datasets, along with mappings from node IDs to MeSH terms, in the data/{dataset} folders. The BioBERT embeddings for the terms are also provided in the data/{dataset} folders.
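As a minimal sketch of how the pickled files in one dataset folder could be inspected, the helper below loads every pickle it finds and keys the result by file name. The function name `load_pickles` and the `.pkl`/`.pickle` extensions are assumptions for illustration, since the exact file names are not listed here. Unpickling a TemporalData object also requires torch_geometric to be importable in the active environment, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_pickles(dataset_dir):
    """Load every pickle file in dataset_dir and return {file name: object}.

    The accepted extensions are an assumption; adjust them to match the
    files actually shipped in data/{dataset}.
    """
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"{dataset_dir} not found; has data.zip been unzipped?"
        )

    loaded = {}
    for path in sorted(dataset_dir.iterdir()):
        if path.suffix in {".pkl", ".pickle"}:
            with path.open("rb") as handle:
                loaded[path.name] = pickle.load(handle)
    return loaded
```

Calling `load_pickles("data/virology")` from the project directory and printing `type(obj)` for each entry is one quick way to see which file holds the graph and which holds the node-id mapping.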
To ensure optimal performance and compatibility, your system should meet the following requirements:
- Python and Library Versions:
  - Python: Version 3.10 or higher
  - PyTorch: Version 2.0 or higher
  - Torch Geometric: Version 2.5.2 or higher
  - CUDA: Version 11.8
- Hardware Compatibility:
  - ConceptDrift has been tested under the following conditions:
    - CPU Systems: Single CPU with at least 32GB of RAM.
    - GPU Systems: Nvidia A40, A6000, or A100 GPU for accelerated computation.
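The version requirements above can be checked with a short script before training. This is only a sketch: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical and not part of the repository, and the check relies on the `__version__` attribute that both `torch` and `torch_geometric` expose.

```python
import importlib
import re
import sys

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": (2, 0), "torch_geometric": (2, 5, 2)}


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare versions after zero-padding both to the same length."""
    have = parse_version(installed)
    width = max(len(have), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(have) >= pad(minimum)


def check_environment():
    """Return {name: bool} saying whether each requirement is satisfied."""
    report = {"python": sys.version_info >= (3, 10)}
    for module_name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = False  # not installed
            continue
        report[module_name] = meets_minimum(module.__version__, minimum)
    return report
```

Printing `check_environment()` inside the activated environment shows which requirements are unmet. On GPU systems, `torch.version.cuda` reports the CUDA version PyTorch was built against, and `torch.cuda.is_available()` reports whether a GPU is visible.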
To ensure ConceptDrift runs properly on your system, please follow these steps:
- To replicate our environment with all the necessary packages, install the packages in requirements.txt in a virtual environment (e.g. a Conda or pip environment) before running our code.
- Edit src/config.ini to include the file path to the data folder.
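After editing the config file, it can help to confirm that the path you entered resolves to a real directory. The sketch below makes no assumption about the section or key names in src/config.ini: it lists every entry and flags whether its value is an existing directory. The function name `report_config_paths` is hypothetical.

```python
import configparser
from pathlib import Path


def report_config_paths(config_file):
    """Return {(section, key): (value, is_existing_dir)} for an INI file."""
    # Interpolation is disabled so that '%' characters in paths are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_file):
        raise FileNotFoundError(f"could not read {config_file}")

    report = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            is_dir = Path(value).expanduser().is_dir()
            report[(section, key)] = (value, is_dir)
    return report
```

Running `report_config_paths("src/config.ini")` from the project directory should show the data folder entry paired with `True` once the path is set correctly.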
- To train ConceptDrift, activate your environment with the necessary packages and go to the src folder.
- Execute python train.py --dataset {dataset} to train on the virology, neurology, or immunology dataset.
Hyperparameters can be adjusted with the following command line arguments: --batch_size (default is 200), --max_epochs (default is 2), and --lr (default is 0.0001).
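To make the argument types and defaults concrete, here is a sketch of an argparse parser that mirrors the documented options. It illustrates the interface described above and is not the actual contents of train.py, which may define these arguments differently. The dataset choices are the three datasets this repository holds.

```python
import argparse

DATASETS = ("virology", "neurology", "immunology")


def build_parser():
    """Build a parser mirroring the documented train.py options."""
    parser = argparse.ArgumentParser(description="Train ConceptDrift (sketch)")
    parser.add_argument("--dataset", choices=DATASETS, required=True)
    parser.add_argument("--batch_size", type=int, default=200)
    parser.add_argument("--max_epochs", type=int, default=2)
    parser.add_argument("--lr", type=float, default=0.0001)
    return parser
```

Under these assumptions, a command such as python train.py --dataset immunology --batch_size 100 --lr 0.001 would override the batch size and learning rate while keeping the default of 2 epochs.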
