Commit 632a724

Merge pull request #66 from kaya70875/refactor/improve-return-types-for-core-methods
Refactor/improve return types for core methods
2 parents 72bdf4a + e797ac9 commit 632a724

17 files changed

Lines changed: 220 additions & 137 deletions

CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -5,8 +5,16 @@ All notable changes to this project will be documented here.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
 ## [2.4.0]
+### Added
+- Added a `FetchResult` type alias for fetch results that may contain `ChannelData`, `VideoTranscript`, `VideoComments`, or `DLSnippet` objects.
+- Added `--transcripts-only` and `--snippets-only` CLI fetch modes.
+
 ### Changed
 - Improved developer experience with returning empty list objects on some methods instead of `None`.
+- Changed return types and values for `fetch_transcripts`, `fetch_snippets` and `fetch_comments` to improve type hints.
+- Changed CLI comment flags: `--comments` and `--comments-only` now select the fetch mode, while `--max-comments` controls the number of comments per video.
+- Exporters, `PreviewRenderer`, and `channel_data_to_rows()` now accept any supported fetch result shape and normalize it internally.
+
 
 ## [2.3.2]
 ### Fixed
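The new `FetchResult` alias groups the list shapes that the public fetch methods can return. The sketch below shows one plausible way to express such an alias with the standard `typing` module; it is not ytfetcher's definition. The dataclasses are hypothetical stand-ins: only the `Comment` and `VideoComments` field names come from the README changes in this commit, while the other field names and the list-union shape of the alias are assumptions (Python 3.9+).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


# Hypothetical stand-ins. Comment/VideoComments fields follow the README;
# every other field name here is an assumption for illustration.
@dataclass
class Comment:
    text: str
    like_count: int
    author: str
    time_text: str


@dataclass
class VideoComments:
    video_id: str
    comments: list[Comment] = field(default_factory=list)


@dataclass
class VideoTranscript:
    video_id: str
    segments: list[str] = field(default_factory=list)  # assumed field name


@dataclass
class DLSnippet:
    video_id: str
    title: str = ""  # assumed metadata field


@dataclass
class ChannelData:
    video_id: str
    transcripts: list[str] = field(default_factory=list)  # assumed
    comments: Optional[list[Comment]] = None  # assumed


# Assumed shape: a homogeneous list of a single model type.
FetchResult = Union[
    list[ChannelData], list[VideoTranscript], list[VideoComments], list[DLSnippet]
]


def result_kind(result: FetchResult) -> str:
    """Return the model name held by a fetch result, or 'empty' for []."""
    if not result:
        return "empty"
    return type(result[0]).__name__
```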

README.md

Lines changed: 31 additions & 21 deletions
@@ -19,9 +19,9 @@ A python tool for fetching thousands of videos fast from a Youtube channel along
 - [Features](#features)
 - [Fetching Specific Channel Tabs (Videos / Shorts / Streams)](#fetching-specific-channel-tabs-videos--shorts--streams)
 - [Using Different Fetchers](#using-different-fetchers)
-- [Retreive Different Languages](#retreive-different-languages)
+- [Retrieve Different Languages](#retrieve-different-languages)
 - [Filtering](#filtering)
-- [Converting ChannelData to Rows](#converting-channeldata-to-rows)
+- [Converting Fetch Results to Rows](#converting-fetch-results-to-rows)
 - [SQLite Cache](#sqlite-cache)
 - [Failed Transcripts & Retry Behavior](#failed-transcripts--retry-behavior)
 - [Fetching Only Manually Created Transcripts](#fetching-only-manually-created-transcripts)
@@ -232,7 +232,7 @@ These options can be passed to any of the fetcher methods (`from_channel`, `from
 
 See below for examples of usages.
 
-## Retreive Different Languages
+## Retrieve Different Languages
 
 You can use the `languages` param to retrieve your desired language. (Default en)
 
@@ -314,9 +314,9 @@ ytfetcher channel TheOffice -m 50 -f json --min-views 1000 --min-duration 300 --
 
 ---
 
-## Converting ChannelData to Rows
+## Converting Fetch Results to Rows
 
-If you want a flat, row-based structure for ML workflows (Pandas, HuggingFace datasets, JSON/Parquet), you can use the helper in `ytfetcher.utils` to join transcript segments. Comments are only included if you fetched them with `fetch_with_comments` or `fetch_comments`.
+If you want a flat, row-based structure for ML workflows (Pandas, HuggingFace datasets, JSON/Parquet), you can use the helper in `ytfetcher.utils` to join transcript segments. It accepts any fetch result returned by the public API, including `ChannelData`, `VideoTranscript`, `VideoComments`, and `DLSnippet` lists.
 
 ```python
 from ytfetcher import YTFetcher
@@ -328,6 +328,8 @@ channel_data = fetcher.fetch_with_comments(max_comments=5)
 rows = channel_data_to_rows(channel_data, include_comments=True)
 ```
 
+When comments are available, pass `include_comments=True` to include comment text in the output rows.
+
 ---
 
 ## SQLite Cache
@@ -451,7 +453,7 @@ ytfetcher channel TEDx -f csv --manually-created
 
 ## Exporting
 
-Use the `BaseExporter` class to export `ChannelData` in **csv**, **json**, or **txt**:
+Use the exporter classes to export `ChannelData` or any other supported fetch result in **csv**, **json**, or **txt**.
 
 ```python
 from ytfetcher.services import JSONExporter # OR you can import other exporters: TXTExporter, CSVExporter
@@ -527,35 +529,38 @@ fetcher = YTFetcher.from_channel("TheOffice", max_results=5)
 comments = fetcher.fetch_comments(max_comments=20)
 ```
 
-This will return list of `Comment` like this:
+This will return a list of `VideoComments` objects like this:
 
 ```python
 [
-Comment(
-text='Comment one.',
-like_count=20,
-author='@author',
-time_text='8 days ago'
+VideoComments(
+video_id='id1',
+comments=[
+Comment(
+text='Comment one.',
+like_count=20,
+author='@author',
+time_text='8 days ago'
+)
+]
 )
-
-## OTHER COMMENT OBJECTS...
 ]
 ```
 
 ### Fetching Comments With CLI
 
 Fetching comments in `ytfetcher` with CLI is very easy.
 
-To fetch comments with transcripts you can use `--comments` argument:
+To fetch comments with transcripts you can use the `--comments` mode. Use `--max-comments` to choose how many comments to fetch per video:
 
 ```bash
-ytfetcher channel TheOffice -m 20 --comments 10 -f json
+ytfetcher channel TheOffice -m 20 --comments --max-comments 10 -f json
 ```
 
-To fetch only comments with metadata you can use `--comments-only` argument:
+To fetch only comments you can use the `--comments-only` mode:
 
 ```bash
-ytfetcher channel TheOffice -m 20 --comments-only 10 -f json
+ytfetcher channel TheOffice -m 20 --comments-only --max-comments 10 -f json
 ```
 
 ## Other Methods
@@ -571,13 +576,17 @@ data = fetcher.fetch_transcripts()
 print(data)
 ```
 
+`fetch_transcripts()` returns `list[VideoTranscript]`. Each item contains the `video_id` and that video's transcript segments.
+
 ### Fetch Snippets
 
 ```python
 data = fetcher.fetch_snippets()
 print(data)
 ```
 
+`fetch_snippets()` returns `list[DLSnippet]` with video metadata only.
+
 ---
 
 ## Proxy Configuration
@@ -672,6 +681,7 @@ ytfetcher channel TheOffice -m 20 --tab shorts -f json
 
 # Fetch from the Live/Streams tab
 ytfetcher channel TheOffice -m 20 --tab streams -f json
+```
 
 ### Fetching by Video IDs
 
@@ -693,13 +703,13 @@ ytfetcher search "AI Getting Jobs" -f json -m 25
 ### Using Webshare Proxy
 
 ```bash
-ytfetcher <CHANNEL_HANDLE> -f json --webshare-proxy-username "<USERNAME>" --webshare-proxy-password "<PASSWORD>"
+ytfetcher channel <CHANNEL_HANDLE> -f json --webshare-proxy-username "<USERNAME>" --webshare-proxy-password "<PASSWORD>"
 ```
 
 ### Using Custom Proxy
 
 ```bash
-ytfetcher <CHANNEL_HANDLE> -f json --http-proxy "http://user:pass@host:port" --https-proxy "https://user:pass@host:port"
+ytfetcher channel <CHANNEL_HANDLE> -f json --http-proxy "http://user:pass@host:port" --https-proxy "https://user:pass@host:port"
 ```
 ---
 
@@ -714,7 +724,7 @@ docker-compose build
 Use `docker-compose run` to execute your desired command inside the container.
 
 ```bash
-docker-compose run ytfetcher poetry run ytfetcher channel -c TheOffice -m 20 -f json
+docker-compose run ytfetcher poetry run ytfetcher channel TheOffice -m 20 -f json
 ```
 ---
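`channel_data_to_rows()` is described as joining transcript segments into flat rows and, with `include_comments=True`, adding comment text. A minimal sketch of that flattening follows. It is not the library's implementation: `Segment`, `ChannelData`, and `to_rows()` are hypothetical stand-ins, and their field names (other than the README's `Comment` fields) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:  # hypothetical: one timed transcript line, fields assumed
    text: str
    start: float
    duration: float


@dataclass
class Comment:
    text: str
    like_count: int
    author: str
    time_text: str


@dataclass
class ChannelData:  # hypothetical stand-in, fields assumed
    video_id: str
    transcripts: list[Segment] = field(default_factory=list)
    comments: Optional[list[Comment]] = None


def to_rows(items: list[ChannelData], include_comments: bool = False) -> list[dict]:
    """Flatten each video into one dict, joining its transcript segments."""
    rows = []
    for item in items:
        row = {
            "video_id": item.video_id,
            "transcript": " ".join(seg.text.strip() for seg in item.transcripts),
        }
        if include_comments:
            # Videos fetched without comments get an empty list, not None.
            row["comments"] = [c.text for c in item.comments or []]
        rows.append(row)
    return rows
```

Each row is a plain `dict`, so a list of rows can be handed to tools such as `pandas.DataFrame` directly.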

docs/cli.md

Lines changed: 27 additions & 17 deletions
@@ -20,7 +20,7 @@ YTFetcher comes with a simple CLI so you can fetch data directly from your termi
 ytfetcher -h
 ```
 
-YTFetcher supports three main commands:
+YTFetcher supports four main commands:
 
 - `channel` - Fetch data from a YouTube channel handle
 - `video` - Fetch data from custom video IDs
@@ -159,21 +159,37 @@ All commands support the following common options:
 
 ### Comment Options
 
-**`--comments <NUMBER>`**
+**`--comments`**
 
-- Fetch top N comments alongside transcripts and metadata
-- Example: `ytfetcher channel TheOffice -m 20 --comments 10 -f json`
-- This fetches top 10 comments for each video along with transcripts
+- Fetch comments alongside transcripts and metadata
+- Example: `ytfetcher channel TheOffice -m 20 --comments --max-comments 10 -f json`
+- This fetches up to 10 comments for each video along with transcripts
 
-**`--comments-only <NUMBER>`**
+**`--comments-only`**
 
-- Fetch only comments with metadata (no transcripts)
-- Example: `ytfetcher channel TheOffice -m 20 --comments-only 10 -f json`
+- Fetch only comments (no transcripts or metadata)
+- Example: `ytfetcher channel TheOffice -m 20 --comments-only --max-comments 10 -f json`
+
+**`--max-comments <NUMBER>`**
+
+- Maximum comments to fetch per video
+- Default: `20`
+- Used with `--comments` or `--comments-only`
+
+**`--transcripts-only`**
+
+- Fetch only transcript data, without metadata
+- Example: `ytfetcher channel TheOffice -m 20 --transcripts-only -f json`
+
+**`--snippets-only`**
+
+- Fetch only video metadata, without transcripts
+- Example: `ytfetcher channel TheOffice -m 20 --snippets-only -f json`
 
 **`--sort` <`top`, `new`>**
 
 - Sort comments with top or newest ones (default to `top`).
-- Example: `ytfetcher channel TheOffice -m 10 -c --sort new`
+- Example: `ytfetcher channel TheOffice -m 10 --comments --max-comments 10 --sort new`
 
 !!! Warning
 Comment fetching is resource-intensive. Performance depends on your internet connection and the volume of comments being retrieved.
@@ -286,12 +302,6 @@ This command only processes videos that:
 - Get credentials from [Webshare Dashboard](https://dashboard.webshare.io/proxy/settings)
 - Example: `ytfetcher channel TheOffice -f json --webshare-proxy-username "your_username" --webshare-proxy-password "your_password"`
 
-**`--http-timeout`**
-
-- HTTP request timeout in seconds
-- Default: `4.0`
-- Example: `ytfetcher channel TheOffice --http-timeout 6.0`
-
 **`--http-headers`**
 
 - Custom HTTP headers (Python dictionary format)
@@ -317,7 +327,7 @@ This command:
 ### Fetch Comments with Transcripts
 
 ```bash
-ytfetcher channel TheOffice -m 10 --comments 5 -f csv -o ./data
+ytfetcher channel TheOffice -m 10 --comments --max-comments 5 -f csv -o ./data
 ```
 
 This command:
@@ -352,7 +362,7 @@ This command uses Webshare proxy to avoid rate limits when fetching large amount
 ### Export Only Comments
 
 ```bash
-ytfetcher channel TheOffice -m 20 --comments-only 15 -f json --filename comments_only
+ytfetcher channel TheOffice -m 20 --comments-only --max-comments 15 -f json --filename comments_only
 ```
 
 This command fetches only comments (no transcripts) and saves them to `comments_only.json`.
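The reworked comment options separate the fetch mode (a boolean flag) from the per-video count (`--max-comments`). The `argparse` sketch below illustrates that split; it is not ytfetcher's parser. The program name, the single positional `handle`, and making the mode flags mutually exclusive are assumptions, while the `--max-comments` default of `20` and the `--sort` choices mirror the CLI documentation in this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: mode flags are booleans, the count is its own option."""
    parser = argparse.ArgumentParser(prog="ytfetcher-sketch")
    parser.add_argument("handle")
    parser.add_argument("-m", "--max-results", type=int)
    # Assumption: only one fetch mode may be selected per run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--comments", action="store_true")
    mode.add_argument("--comments-only", action="store_true")
    mode.add_argument("--transcripts-only", action="store_true")
    mode.add_argument("--snippets-only", action="store_true")
    # The comment count no longer rides on the mode flag.
    parser.add_argument("--max-comments", type=int, default=20)
    parser.add_argument("--sort", choices=["top", "new"], default="top")
    return parser


def fetch_mode(args: argparse.Namespace) -> str:
    """Translate the parsed flags into a single mode name."""
    for name in ("comments", "comments_only", "transcripts_only", "snippets_only"):
        if getattr(args, name):
            return name
    return "default"
```

In this sketch the old `--comments 10` form fails with an "unrecognized arguments" error, because `10` is no longer consumed by the flag.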

docs/exporting.md

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 # Exporting
 
-The exporting feature allows you to save channel data in **multiple formats for analysis, reporting, or integration with other tools.** `ytfetcher` supports three widely-used export formats to suit different use cases and preferences.
+The exporting feature allows you to save fetched data in **multiple formats for analysis, reporting, or integration with other tools.** `ytfetcher` supports three widely-used export formats to suit different use cases and preferences.
 
-Use the `BaseExporter` class to export `ChannelData` in **csv, json, or txt**:
+Use the exporter classes to export `ChannelData` or any other supported fetch result in **csv, json, or txt**. Results from `fetch_youtube_data()`, `fetch_with_comments()`, `fetch_transcripts()`, `fetch_snippets()`, and `fetch_comments()` are normalized internally before writing.
 
 ```py
 from ytfetcher.services import JSONExporter # OR you can import other exporters: TXTExporter, CSVExporter
@@ -56,4 +56,4 @@ class XMLExporter(BaseExporter):
 output_path = self._initialize_output_path(export_type='xml')
 # Your custom logic to convert self.channel_data to XML
 print(f"Exporting data to {output_path}")
-```
+```
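The exporting docs now state that every fetch result is normalized internally before writing. One way to picture that step is to turn any list of models into plain dictionaries and serialize those. The sketch below assumes the models are dataclasses (ytfetcher's actual model types may differ), reuses the `Comment` and `VideoComments` field names from the README, and uses a hypothetical `normalize()` helper rather than any exporter API.

```python
import json
from dataclasses import asdict, dataclass, field, is_dataclass


@dataclass
class Comment:
    text: str
    like_count: int
    author: str
    time_text: str


@dataclass
class VideoComments:
    video_id: str
    comments: list[Comment] = field(default_factory=list)


def normalize(result: list) -> list[dict]:
    """Turn any list of dataclass models into plain dicts (hypothetical helper)."""
    records = []
    for item in result:
        if not is_dataclass(item):
            raise TypeError(f"unsupported item: {type(item).__name__}")
        records.append(asdict(item))  # asdict recurses into nested dataclasses
    return records


def export_json(result: list) -> str:
    """Serialize a normalized fetch result as pretty-printed JSON."""
    return json.dumps(normalize(result), indent=2)
```

Normalizing first keeps each writer (JSON, CSV, TXT) independent of which fetch method produced the data.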

docs/index.md

Lines changed: 18 additions & 13 deletions
@@ -101,7 +101,7 @@ You can also preview this data using `PreviewRenderer` class from `ytfetcher.ser
 ```py
 from ytfetcher.services import PreviewRenderer
 
-channel_data = fetcher.fetch_youtube_data(max_comments=10)
+channel_data = fetcher.fetch_with_comments(max_comments=10)
 #print(channel_data)
 preview = PreviewRenderer()
 preview.render(data=channel_data, limit=4)
@@ -257,9 +257,9 @@ fetcher = YTFetcher.from_channel(
 )
 ```
 
-## Converting ChannelData to Rows
+## Converting Fetch Results to Rows
 
-If you want a flat, row-based structure for ML workflows (Pandas, HuggingFace datasets, JSON/Parquet), use the helper in `ytfetcher.utils` to join transcript segments. Comments are only included if you fetched them with `fetch_with_comments` or `fetch_comments`.
+If you want a flat, row-based structure for ML workflows (Pandas, HuggingFace datasets, JSON/Parquet), use the helper in `ytfetcher.utils` to join transcript segments. It accepts any fetch result returned by the public API, including `ChannelData`, `VideoTranscript`, `VideoComments`, and `DLSnippet` lists.
 
 ```python
 from ytfetcher import YTFetcher
@@ -271,6 +271,8 @@ channel_data = fetcher.fetch_with_comments(max_comments=5)
 rows = channel_data_to_rows(channel_data, include_comments=True)
 ```
 
+When comments are available, pass `include_comments=True` to include comment text in the output rows.
+
 ## Fetching Comments
 `ytfetcher` allows you fetch comments in bulk **with additional metadata and transcripts** or **just comments alone.**
 
@@ -328,7 +330,7 @@ To fetch comments alongside with transcripts and metadata you can use `fetch_wit
 
 ```py
 fetcher = YTFetcher.from_channel("TheOffice", max_results=5)
-comments = fetcher.fetch_with_comments(max_comments=10, sort='top') # or new if you want latest comments
+channel_data = fetcher.fetch_with_comments(max_comments=10, sort='top') # or new if you want latest comments
 ```
 
 This will simply fetch **top 10 comments for every video** alongside with transcript data.
@@ -358,18 +360,21 @@ fetcher = YTFetcher.from_channel("TheOffice", max_results=5)
 comments = fetcher.fetch_comments(max_comments=20)
 ```
 
-This will return list of `Comment` like this:
+This will return a list of `VideoComments` objects like this:
 
 ```py
 [
-Comment(
-text='Comment one.',
-like_count=20,
-author='@author',
-time_text='8 days ago'
+VideoComments(
+video_id='id1',
+comments=[
+Comment(
+text='Comment one.',
+like_count=20,
+author='@author',
+time_text='8 days ago'
+)
+]
 )
-
-## OTHER COMMENT OBJECTS...
 ]
 ```
 
@@ -468,4 +473,4 @@ RuntimeConfig.enable_verbose()
 # Progress bars (tqdm) will now appear in your console
 fetcher = YTFetcher.from_channel(channel_handle="ChannelName")
 data = fetcher.fetch_youtube_data()
-```
+```
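Because `fetch_comments()` now groups comments per video as `VideoComments(video_id=..., comments=[...])`, per-video processing needs no extra bookkeeping. As an illustration, the sketch below picks each video's most-liked comment. The dataclasses are stand-ins mirroring the repr shown in `docs/index.md`, and `most_liked()` is a hypothetical helper, not part of ytfetcher.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Comment:  # stand-in mirroring the documented repr
    text: str
    like_count: int
    author: str
    time_text: str


@dataclass
class VideoComments:  # stand-in mirroring the documented repr
    video_id: str
    comments: list[Comment] = field(default_factory=list)


def most_liked(results: list[VideoComments]) -> dict[str, Optional[Comment]]:
    """Map each video_id to its most-liked comment, or None if it has none."""
    return {
        vc.video_id: max(vc.comments, key=lambda c: c.like_count, default=None)
        for vc in results
    }
```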

docs/release-notes.md

Lines changed: 6 additions & 1 deletion
@@ -2,6 +2,8 @@
 
 ## Latest Changes
 ### Added
+- Added a `FetchResult` type alias for fetch results that may contain `ChannelData`, `VideoTranscript`, `VideoComments`, or `DLSnippet` objects.
+- Added `--transcripts-only` and `--snippets-only` CLI fetch modes.
 - Available results are no longer lost when `IpBlocked` is raised mid-fetch — collected transcripts are returned instead of raising an exception.
 - Introduced a new `FetchOptions` data class for defining fetcher options like `languages`, `filters` etc.
 - Added a `--sort` argument for choosing **top or new** comments with CLI.
@@ -12,6 +14,9 @@
 - Added transient failure categorization to improve retry/caching decisions.
 
 ### Changed
+- `fetch_transcripts()`, `fetch_snippets()`, and `fetch_comments()` now return more precise list types instead of mixed or optional shapes.
+- CLI comment flags now separate fetch mode from comment count: use `--comments` or `--comments-only` with `--max-comments`.
+- Exporters, `PreviewRenderer`, and `channel_data_to_rows()` now accept any supported fetch result shape and normalize it internally.
 - Removed deprecated `Exporter` class.
 - No more **network requests in __init__**.
 - `YTFetcher` now initializes correct `BaseYoutubeDLFetcher` inside classmethods.
@@ -157,4 +162,4 @@
 
 ### Changed
 - Update docs for `get_metadata` method.
-- Change default httpx.Timeout value to **4.0** to **2.0**.
+- Change default httpx.Timeout value to **4.0** to **2.0**.
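The release notes say `fetch_transcripts()`, `fetch_snippets()`, and `fetch_comments()` now return precise list types, and the changelog notes that some methods return empty lists instead of `None`. The small sketch below shows what that gives a caller: code typed against `list[VideoTranscript]` can iterate directly, and an empty result falls through naturally. `VideoTranscript` here is a hypothetical stand-in whose `segments` field name is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class VideoTranscript:  # hypothetical stand-in; `segments` is an assumed field name
    video_id: str
    segments: list[str] = field(default_factory=list)


def count_segments(transcripts: list[VideoTranscript]) -> int:
    """Total segments across all videos; an empty result simply yields 0."""
    # No `if transcripts is None` guard is needed when [] replaces None.
    return sum(len(t.segments) for t in transcripts)
```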
