# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a Python script from a template
# before PyInstaller builds the exe, to inject the date and other information into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, when collaborators work on different platforms and some dependencies are
# platform-specific or lack cross-platform support, pipenv may install dependencies that
# don't work, or fail to install all needed dependencies.
#Pipfile.lock
# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# Committing it is especially recommended for binary packages to ensure reproducibility; the
# lock file is more commonly ignored for libraries.
#uv.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# Committing it is especially recommended for binary packages to ensure reproducibility; the
# lock file is more commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/
# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/
# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/
# Ruff stuff:
.ruff_cache/
# PyPI configuration file
.pypirc
# Cursor
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data;
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/
# -----------------------------
# Custom: Newsletter Project
# -----------------------------
# OS/editor cruft
.DS_Store
# Local public-release audit scratch files
xxx_*.md
# Raw data (original newsletter HTML files)
data_raw
# Local database backups; tracked historical manifests/files are unaffected.
data/supabase_backup/
# Large or intermediate processed files
*.csv
*.xlsx
# Optional: temporary working folders
data/tmp/
# Jupyter checkpoints at any depth (the template's .ipynb_checkpoints rule above already matches these)
**/.ipynb_checkpoints/
# -----------------------------
# AM1 Project Specific
# -----------------------------
# Model artifacts
*.pkl
analysis/models/
# MLflow
mlruns/
# Data: ignore dataset file contents under data/, but TRACK the code/apps/READMEs
# that now live there (labelling_app, document_check_app, link_resolver, sync scripts).
# (*.csv is already ignored globally above.)
data/**/*.parquet
data/**/*.feather
data/**/*.xlsx
data/**/*.xls
data/**/*.npz
data/**/*.faiss
data/tmp/
# Internal docs (not pushed to GitHub)
docs/internal/
# Dashboard snapshots (data files)
pipelines/nmf_baseline/dashboard/streamlit_nmf/snapshots/
# Credentials
credentials.json
# --- BERTopic artefacts (see ARTIFACTS.md) ---
# Embeddings: ~165MB, rebuildable via pipelines/bertopic_epoch/training/build_embeddings.py -> ignore.
pipelines/bertopic_epoch/models/embeddings/
# Regenerated analysis outputs -> ignore (canonical = frozen models + crosswalk).
pipelines/bertopic_epoch/outputs/
# ...EXCEPT the category scheme: it is a RUNTIME dependency of the inference API (the
# topic->category map read by inference.py and baked into the Docker image), so track it.
# Git cannot re-include a file whose parent directory is excluded, so the three rules below
# re-include outputs/, ignore everything directly inside it, then re-include this one file.
# The last matching pattern wins, so the final rule also overrides the global *.csv rule above.
!pipelines/bertopic_epoch/outputs/
pipelines/bertopic_epoch/outputs/*
!pipelines/bertopic_epoch/outputs/final_topics_category_scheme.csv
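# A quick way to confirm the re-include took effect (a sketch, assuming it is run from the repo root):
#   git check-ignore -v --no-index pipelines/bertopic_epoch/outputs/final_topics_category_scheme.csv
# With -v, Git prints the last matching pattern; one starting with "!" means the path is NOT ignored.
# --no-index is needed because check-ignore otherwise skips files that are already tracked.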
# Frozen v1 topic_models (~19MB) are NOT ignored: commit them (load-bearing;
# topic IDs are referenced by docs/methods/category_scheme.csv).
# Local pre-restructure backups (never commit)
_backup_pre_restructure/
# MLflow local tracking (file-based)
experiments/mlruns/
# scraping pipeline data outputs
ingestion/data/
x_dll/
# PDFs (working docs, not part of the corpus)
*.pdf
# private company/commercialisation strategy — do not commit
planning/x_commercialisation_strategy.md
# Local caches written under experiments/ (*.pyc is already covered by *.py[codz] above)
experiments/.numba_cache/
experiments/.matplotlib_cache/
# Guard: stray model tree if a notebook mis-resolves ROOT
pipelines/bertopic_epoch/notebooks/models/
# personal planning scratch (not for repo)
xxx_plan.md
# assessment working scratch (not for repo)
docs/assessment/business_metrics.md
# PSD assessment write-up + working drafts (personal, not for repo)
AM1 NOW/
docs/assessment/L6_MLengineer_PSD.docx
# editor/agent/OS junk
.claude/
.codex/
.agents/
.vscode/
**/.DS_Store
# Assessment working docs (kept local, not tracked)
X_ASSESSMENT/AIE PORTFOLIO QUESTIONS .docx
X_ASSESSMENT/L6_MLengineer_PSD_Draft_LY.docx
# Paper drafts (kept local, not tracked)
EMPIRICAL FLAGSHIP/
# Country briefs (kept local, not tracked)
country_briefs/
docs/briefs/
# EPOCH dashboard (old marketing-framed build; kept local, not tracked pending rebuild)
dashboard/epoch/
# TRACE dashboard WIP (kept local, not tracked)
July dashboard update/