Domain Adaptation DANN Retrieval

A domain-adversarial retrieval study on whether reducing source-target domain gap preserves or damages embedding neighborhood structure.

This repository compares a source-only retrieval baseline against a Domain-Adversarial Neural Network under controlled source-target shift. The goal is to test whether domain-invariant representations actually improve retrieval, or whether adversarial alignment can reduce domain gap while disturbing useful nearest-neighbor structure.

The key idea:

Making domains look similar is not enough.
For retrieval, the embedding space must still preserve useful nearest-neighbor structure.

Core Question

Can domain-adversarial training reduce source-target representation shift without damaging retrieval neighborhood structure?

Why This Repo Exists

Domain adaptation is often evaluated with classification accuracy or domain confusion.

Retrieval is different.

For retrieval, a representation must preserve local geometry: samples from the same semantic group should remain close, even after source-target alignment.

This repository evaluates that tradeoff directly by comparing:

a source-only retrieval baseline
a DANN model with gradient reversal and domain classification

The goal is not only to ask whether DANN reduces domain gap, but whether that reduction actually helps retrieval.

Domain-Adaptation Visualizations

The plots below summarize how DANN changes retrieval performance, domain alignment, and source-target embedding spread.

Figure panels: Retrieval comparison, Domain alignment, Embedding variance.

Each panel links to the full-resolution figure.

Panel	What to notice
Retrieval comparison	Baseline and DANN are compared using source-to-target Recall@1, Recall@10, and Recall@50 under moderate and strong domain shift.
Domain alignment	DANN reduces the domain centroid distance, but domain confusion alone does not guarantee better retrieval. The dashed line at 0.5 marks chance-level domain accuracy: source vs target is a binary prediction task with the same number of samples per domain, so a classifier at 0.5 is guessing randomly. Values closer to 0.5 indicate stronger domain confusion, but they should be read together with the retrieval metrics.
Embedding variance	Source and target embedding variance show whether the learned representation remains spread out or becomes compressed after adversarial alignment.

These figures are qualitative diagnostics. The main conclusions are based on the quantitative results in experiments/results_table.csv.
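
The two alignment diagnostics named in the panels can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the centroid distance is the Euclidean distance between the mean source and mean target embedding, and that embedding variance is the per-dimension variance averaged over dimensions. The function names and toy vectors are hypothetical; the repository's src/metrics.py may define these quantities differently.

```python
import math

def centroid(embeddings):
    """Mean vector of a list of equal-length embedding vectors."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(vec[d] for vec in embeddings) / n for d in range(dim)]

def centroid_distance(source, target):
    """Euclidean distance between the source and target centroids."""
    return math.dist(centroid(source), centroid(target))

def mean_variance(embeddings):
    """Per-dimension variance around the centroid, averaged over dimensions."""
    mu = centroid(embeddings)
    n = len(embeddings)
    per_dim = [
        sum((vec[d] - mu[d]) ** 2 for vec in embeddings) / n
        for d in range(len(mu))
    ]
    return sum(per_dim) / len(per_dim)

# Toy embeddings (made up): a spread-out source and a tightly packed target.
source = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
target = [[3.9, 4.9], [4.1, 4.9], [3.9, 5.1], [4.1, 5.1]]

gap = centroid_distance(source, target)   # distance between (1, 1) and (4, 5)
src_var = mean_variance(source)           # large: source embeddings are spread out
tgt_var = mean_variance(target)           # small: target embeddings are compressed

print(f"centroid distance={gap:.3f}  source var={src_var:.3f}  target var={tgt_var:.3f}")
```

A small centroid distance combined with a collapsing variance is the pattern to watch for: the domains look aligned, but the embedding space may have lost the spread that nearest-neighbor retrieval relies on.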

Experiment Design

The benchmark uses controlled synthetic CXR-like image domains.

These are not clinical images. They are controlled source-target image distributions designed to isolate domain-shift behavior before moving to authorized real datasets.

Two domain-shift levels are tested:

Shift level	Meaning
moderate	target domain has moderate contrast, noise, artifact, and latent-distribution shift
strong	target domain has stronger visual and latent-distribution shift
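
To make the two shift levels concrete, here is a minimal sketch of a contrast-plus-noise shift applied to a tiny grayscale image. It is an illustration only: the function `shift_image`, the `SHIFT_LEVELS` dictionary, and every numeric setting are hypothetical, and the generator in src/make_domain_data.py may use different transformations and values (it also applies artifact and latent-distribution shift, which are not modeled here).

```python
import random

def shift_image(image, contrast, noise_std, seed=0):
    """Scale contrast around mid-gray, add Gaussian noise, clip to [0, 1]."""
    rng = random.Random(seed)
    shifted = []
    for row in image:
        new_row = []
        for pixel in row:
            value = 0.5 + contrast * (pixel - 0.5)   # contrast < 1 flattens the image
            value += rng.gauss(0.0, noise_std)        # additive pixel noise
            new_row.append(min(1.0, max(0.0, value)))
        shifted.append(new_row)
    return shifted

# Hypothetical settings chosen for illustration, not taken from the repository.
SHIFT_LEVELS = {
    "moderate": {"contrast": 0.8, "noise_std": 0.02},
    "strong": {"contrast": 0.5, "noise_std": 0.08},
}

source_image = [[0.25, 0.75], [0.5, 1.0]]
target_moderate = shift_image(source_image, **SHIFT_LEVELS["moderate"])
target_strong = shift_image(source_image, **SHIFT_LEVELS["strong"])

# With noise switched off, only the deterministic contrast change remains.
contrast_only = shift_image(source_image, contrast=0.5, noise_std=0.0)
print(contrast_only)
```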

Each setting compares:

Model	Description
Baseline	supervised contrastive retrieval model without adversarial domain alignment
DANN	same retrieval model with gradient reversal and a domain classifier
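
The difference between the two models can be written as an objective. The following is a sketch under the standard DANN formulation, not a statement of what src/losses.py implements: it assumes the DANN model adds a weighted domain-classification loss to the supervised contrastive loss, and that the gradient reversal layer scales and negates the domain gradient before it reaches the encoder parameters $\theta_f$. The symbols $\lambda_{\text{domain}}$ and $\lambda_{\text{grl}}$ are assumed to correspond to the `--lambda-domain` and `--lambda-grl` flags used in Quick Start.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{supcon}} + \lambda_{\text{domain}} \, \mathcal{L}_{\text{domain}}

% Effective update direction for the encoder after gradient reversal:
g_{\theta_f}
  = \frac{\partial \mathcal{L}_{\text{supcon}}}{\partial \theta_f}
  - \lambda_{\text{domain}} \, \lambda_{\text{grl}} \,
    \frac{\partial \mathcal{L}_{\text{domain}}}{\partial \theta_f}
```

Under these assumptions the baseline corresponds to $\lambda_{\text{domain}} = 0$. The domain classifier itself still descends on $\mathcal{L}_{\text{domain}}$, while the encoder is pushed to make that loss larger, which is what drives domain accuracy toward chance.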

Method

The model uses:

Component	Role
CNN encoder	maps images into an embedding space
Supervised contrastive loss	preserves group-based retrieval structure
Domain classifier	predicts source vs target domain
Gradient reversal layer	encourages domain-invariant representations
Retrieval metrics	evaluate whether domain alignment preserves neighborhood structure
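
A gradient reversal layer is small enough to show in full. The sketch below is a common PyTorch formulation, assuming a scalar reversal strength; the names `GradReverse` and `DomainHead`, the hidden size, and the single-logit output are illustrative choices and are not taken from src/model.py.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambda_grl):
        ctx.lambda_grl = lambda_grl
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input; lambda_grl is a plain float, so None.
        return -ctx.lambda_grl * grad_output, None

class DomainHead(nn.Module):
    """Hypothetical domain classifier: one logit for source (0) vs target (1)."""

    def __init__(self, embed_dim, hidden_dim=64, lambda_grl=1.0):
        super().__init__()
        self.lambda_grl = lambda_grl
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, embeddings):
        reversed_emb = GradReverse.apply(embeddings, self.lambda_grl)
        return self.net(reversed_emb).squeeze(-1)

# Check the reversal: the gradient of sum(x) is +1 everywhere,
# so after reversal with strength 0.5 it becomes -0.5.
x = torch.ones(4, 3, requires_grad=True)
GradReverse.apply(x, 0.5).sum().backward()
grl_grad = x.grad.clone()

head = DomainHead(embed_dim=3)
logits = head(torch.randn(4, 3))
```

Because the reversal happens only in the backward pass, the domain head is trained normally to tell the domains apart, while the encoder upstream receives the opposite signal and is pushed toward embeddings the head cannot separate.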

Result Snapshot

The current benchmark uses:

30,000 samples per domain
60,000 total rows
48,000 training rows
12,000 validation rows
image size 96
batch size 512
10 epochs
RTX 4090 with mixed precision
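
The row counts above are consistent with each other. The short calculation below reproduces them, assuming an 80/20 train/validation split over the combined source and target rows and one optimizer step per batch with the last partial batch kept; neither assumption is stated in the repository text.

```python
import math

samples_per_domain = 30_000
num_domains = 2              # source and target
batch_size = 512
epochs = 10

total_rows = samples_per_domain * num_domains
train_rows = total_rows * 8 // 10                        # assumed 80/20 split
val_rows = total_rows - train_rows
steps_per_epoch = math.ceil(train_rows / batch_size)     # assumes the partial batch is kept
total_steps = steps_per_epoch * epochs

print(total_rows, train_rows, val_rows, steps_per_epoch, total_steps)
```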

Documentation

Full results:

Main observation

DANN reduced source-target domain gap in both moderate and strong shift settings, but retrieval improvement was mixed.

Shift	DANN effect
Moderate	slightly reduced domain gap, but also slightly reduced retrieval performance
Strong	reduced domain gap clearly, improved Recall@1, but slightly reduced Recall@10 and Recall@50

This supports the main conclusion:

Domain invariance alone is not sufficient for retrieval.
Retrieval-specific metrics are needed to verify whether aligned representations remain useful.

Compute Setup

Experiments were run with CUDA-enabled PyTorch on:

NVIDIA GeForce RTX 4090
CUDA 12.8
Mixed precision enabled

The training script records:

epoch time
peak CUDA memory
batch size
training loss
retrieval Recall@K
Lift@K over random retrieval
domain classifier accuracy
source-target centroid distance
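
The two retrieval metrics in this list can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the code in src/metrics.py: a query counts as a hit at K if any of its K nearest target items (by cosine similarity) shares its group, and Lift@K is taken to be Recall@K divided by the expected Recall@K of a uniformly random ranking. All function names and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recall_at_k(queries, query_groups, gallery, gallery_groups, k):
    """Fraction of queries with at least one same-group item in their top K."""
    hits = 0
    for q, group in zip(queries, query_groups):
        ranked = sorted(range(len(gallery)),
                        key=lambda i: cosine(q, gallery[i]), reverse=True)
        if any(gallery_groups[i] == group for i in ranked[:k]):
            hits += 1
    return hits / len(queries)

def random_recall_at_k(query_groups, gallery_groups, k):
    """Expected Recall@K when the gallery is ranked uniformly at random."""
    n = len(gallery_groups)
    total = 0.0
    for group in query_groups:
        relevant = gallery_groups.count(group)
        # P(no relevant item among K draws without replacement)
        miss = math.comb(n - relevant, k) / math.comb(n, k)
        total += 1.0 - miss
    return total / len(query_groups)

def lift_at_k(queries, query_groups, gallery, gallery_groups, k):
    """Recall@K relative to the random-ranking baseline."""
    observed = recall_at_k(queries, query_groups, gallery, gallery_groups, k)
    return observed / random_recall_at_k(query_groups, gallery_groups, k)

# Toy source queries and target gallery (made up for illustration).
queries = [[1.0, 0.0], [0.0, 1.0]]
query_groups = [0, 1]
gallery = [[1.0, 0.1], [0.1, 1.0], [0.2, 1.0], [1.0, 1.0]]
gallery_groups = [0, 0, 1, 1]

r1 = recall_at_k(queries, query_groups, gallery, gallery_groups, k=1)
r2 = recall_at_k(queries, query_groups, gallery, gallery_groups, k=2)
lift1 = lift_at_k(queries, query_groups, gallery, gallery_groups, k=1)
lift2 = lift_at_k(queries, query_groups, gallery, gallery_groups, k=2)
print(r1, r2, lift1, lift2)
```

In the toy data the second query's nearest target item belongs to the wrong group, so it misses at K=1 and hits at K=2. A lift of 1.0 means the ranking is no better than chance, which is why lift is a useful companion to raw recall when the number of same-group items per query is large.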

Quick Start

Install dependencies:

pip install -r requirements.txt

Generate controlled source-target domain data:

python src/make_domain_data.py --shift-level moderate --n-samples 30000 --n-groups 300 --image-size 96

Train the baseline model:

python src/train.py --model-type baseline --epochs 10 --batch-size 512 --image-size 96 --num-workers 0 --output-dir outputs/moderate_baseline --checkpoint-dir checkpoints/moderate_baseline

Train the DANN model:

python src/train.py --model-type dann --epochs 10 --batch-size 512 --image-size 96 --num-workers 0 --lambda-domain 0.25 --lambda-grl 1.0 --output-dir outputs/moderate_dann --checkpoint-dir checkpoints/moderate_dann

Collect results:

python src/collect_results.py

Repository Structure

domain-adaptation-dann-retrieval/
│
├── src/
│   ├── make_domain_data.py
│   ├── dataset.py
│   ├── model.py
│   ├── losses.py
│   ├── metrics.py
│   ├── train.py
│   └── collect_results.py
│
├── docs/
│   └── project_framing.md
│
├── experiments/
│   ├── results_table.csv
│   └── results_summary.md
│
├── requirements.txt
└── README.md

Generated folders are intentionally ignored:

data_generated/
outputs/
checkpoints/

Related Work

This project is grounded in domain-adversarial representation learning and medical image domain adaptation.

The central method reference is DANN, which introduced gradient reversal for learning features that remain discriminative for the main task while becoming difficult to classify by domain.

Medical image domain adaptation surveys motivate the broader problem: models can degrade when data shifts across scanners, institutions, acquisition protocols, preprocessing pipelines, or patient populations.

This repository extends that motivation into a retrieval-specific diagnostic setting, where the goal is not only domain invariance but preservation of neighborhood geometry.

References

DANN: Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-Adversarial Training of Neural Networks. JMLR, 2016.
Medical image domain adaptation survey: Hao Guan and Mingxia Liu. Domain Adaptation for Medical Image Analysis: A Survey. IEEE Transactions on Biomedical Engineering, 2022.
Deep UDA medical imaging review: S. Kumari et al. Deep Learning for Unsupervised Domain Adaptation in Medical Imaging: Recent Advancements and Future Perspectives. Computers in Biology and Medicine, 2024.
CXR domain adaptation example: Pierre Thiam et al. Unsupervised Domain Adaptation for the Detection of Cardiomegaly in Chest X-ray Images. Frontiers in Artificial Intelligence, 2023.
Contrastive domain adaptation: Yuexiang Feng et al. Contrastive Domain Adaptation with Consistency Match for Medical Image Analysis. Medical Image Analysis, 2023.

Scope

This repository uses controlled synthetic CXR-like source-target images.

The results should be interpreted as evidence about domain-adversarial retrieval behavior under known domain shift, not as clinical validation.

The main value of this benchmark is diagnostic: it shows that reducing domain gap does not automatically guarantee better retrieval, so retrieval-specific metrics must be evaluated directly.

A natural next step is to test the same baseline-vs-DANN comparison on authorized real medical imaging datasets and tune the adversarial weight across multiple values.
