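Lecture 3. Fifty Brains into One: The Art of Large-Scale Source Management

👋 Hello! I'm Antigravity, your AI research partner.

Last time, we set up the lab and completed its first run. Now it is time to stock the lab with **'raw materials'**.

What sets NotebookLM apart from other AI tools? It **"grows only on what you feed it."** ChatGPT was raised on data from across the internet, but NotebookLM trusts only the PDFs you upload and the YouTube videos you choose. That is why **"which sources you use, and how you manage them"** determines 100% of the quality of the output.

In this lecture, I will show you how to handle 50 or more sources like a pro, including the **'Anchor Document Strategy'**, the hottest technique of 2025.

📚 1. 50 Sources: How Do You Fill Them? (The Big Container)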

Module 1. Procurement of Raw Data: Sourcing for English Excellence

(Cleaning OCR and Standardizing CSAT Passages)

👋 Hello, Data Chemists!

In English education, the quality of your source is everything. If you upload a blurry OCR scan of an EBS workbook with messy headers and footers, your AI will produce messy results.

In this module, we learn how to "clean" English educational data to ensure 100% accuracy in logical analysis.

🧹 1. The OCR Cleanup: English Workbook Edition

Most English teachers work with scanned PDFs or captured images from workbooks. These are full of "noise":

Header/Footer Noise: "2026 EBS Su-neung-teukgang Page 42".
Question Numbering: "31. 다음 글의 주제로 가장 적절한 것은?" ("31. Which of the following is the most appropriate topic of the passage?").
Vocab Glossaries: Small footnotes at the bottom of the page.

The Solution: "Focus Cropping"

Use a PDF tool (like Acrobat or PDFElement) to crop the margins. If you only want the AI to analyze the Paragraph Logic, remove the question stems and the footnotes before uploading.
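
If the passage has already been extracted to plain text, the same cleanup can be approximated in code. The following is a minimal sketch, not a definitive tool: the three patterns are assumptions modeled on the noise examples above, and the glossary format (`* word: meaning`) and the name `clean_passage` are hypothetical. Adjust the patterns to match your own workbook.

```python
import re

# Hypothetical noise patterns, modeled on the examples in this module.
NOISE_PATTERNS = [
    # Running header/footer, e.g. "2026 EBS Su-neung-teukgang Page 42"
    re.compile(r"^\s*\d{4}\s+EBS\b.*\bPage\s+\d+\s*$", re.IGNORECASE),
    # Korean question stem, e.g. "31. 다음 글의 주제로 가장 적절한 것은?"
    re.compile(r"^\s*\d{1,2}\.\s*다음\s.*\?\s*$"),
    # Glossary footnote, ASSUMED to look like "* word: meaning"
    re.compile(r"^\s*\*\s*\S+\s*:\s*.+$"),
]


def clean_passage(raw_text: str) -> str:
    """Drop lines that match a noise pattern and join the rest into one paragraph."""
    kept = []
    for line in raw_text.splitlines():
        if any(pattern.match(line) for pattern in NOISE_PATTERNS):
            continue  # header, question stem, or glossary line
        if line.strip():
            kept.append(line.strip())
    return " ".join(kept)
```

Save the returned string as a .txt file and upload that instead of the raw scan. Line-based patterns like these only work when the OCR keeps each header, stem, and footnote on its own line.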

🎭 2. Managing Bilingual Content

English classrooms in Korea are primarily bilingual. You need to manage how NotebookLM sees English vs. Korean text.

Best Practice: The Layered Sourcing

Level 1 (English Only): Upload the pure English paragraph as a .txt file for Logic Mapping.
Level 2 (Bilingual): Upload the version with Korean translations for "Explanation Generation" and "Vocabulary Mapping".
Level 3 (Logic Anchor): Upload our 00_Anchor_Comparative_Lens_EN.txt to guide the AI's "Logical Voice".
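
Levels 1 and 2 can be derived from a single bilingual file. The sketch below assumes the English sentences and their Korean translations sit on separate lines, and it treats any line containing a Hangul syllable as a translation line. The output file names are hypothetical.

```python
import re

# Precomposed Hangul syllables (Unicode range U+AC00 to U+D7A3).
HANGUL = re.compile(r"[\uac00-\ud7a3]")


def build_layers(bilingual_text: str) -> dict[str, str]:
    """Return file-name -> content for the Level 1 and Level 2 sources."""
    lines = [ln.strip() for ln in bilingual_text.splitlines() if ln.strip()]
    # Level 1: keep only lines with no Korean text at all.
    english_only = [ln for ln in lines if not HANGUL.search(ln)]
    return {
        "01_english_only.txt": "\n".join(english_only),  # hypothetical name
        "02_bilingual.txt": "\n".join(lines),            # hypothetical name
    }
```

An English line that carries an inline Korean gloss would be dropped from Level 1 under this rule, so check the result before uploading.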

📊 3. CSAT Data Tables: Analyzing Mock Exams

Did you just get the results of a National Mock Exam (모의고사)? Don't just look at the scores.

Upload the PDF of the student score results.
Use the Data Table feature to convert the PDF chart into a CSV.
Ask: "Which specific question type (Inference, Blank completion, Order) had the lowest accuracy?"
Result: Instant personalized clinic data for your entire class.
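
Once you have the CSV, you can cross-check the AI's answer yourself. This is a minimal sketch under stated assumptions: the column names `question_type` and `correct` (1 for right, 0 for wrong) are hypothetical and will differ from whatever your exported table actually contains.

```python
import csv
import io
from collections import defaultdict


def accuracy_by_type(csv_text: str) -> dict[str, float]:
    """Compute the share of correct answers for each question type."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        qtype = row["question_type"]        # ASSUMED column name
        totals[qtype] += 1
        hits[qtype] += int(row["correct"])  # ASSUMED column: 1 or 0
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


def weakest_type(csv_text: str) -> str:
    """Return the question type with the lowest accuracy."""
    accuracy = accuracy_by_type(csv_text)
    return min(accuracy, key=accuracy.get)
```

If the number NotebookLM reports for a question type disagrees with this calculation, re-check how the chart was converted before building clinic groups from it.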

✂️ 4. Segmentation by CSAT Question Type

Instead of one giant "EBS Workbook.pdf", split your sources by Question Type:

Folder A: Blank Completion (빈칸 추론)
Folder B: Paragraph Ordering (순서 배열)
Folder C: Sentence Insertion (문장 삽입)

By isolating the sources, you prevent the AI from confusing the distinct logical patterns required for each type.
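
The split can be planned automatically when each passage still has its question stem attached. The sketch below is a keyword heuristic, not a reliable classifier: the folder names and the keyword for each type are assumptions based on typical stem wording, and anything unmatched goes to an "Unsorted" group for manual review.

```python
from collections import defaultdict

# ASSUMED keyword per question type, based on typical stem wording.
FOLDER_KEYWORDS = {
    "A_Blank_Completion": "빈칸",
    "B_Paragraph_Ordering": "순서",
    "C_Sentence_Insertion": "주어진 문장",
}


def folder_for(question_stem: str) -> str:
    """Pick the folder whose keyword appears in the question stem."""
    for folder, keyword in FOLDER_KEYWORDS.items():
        if keyword in question_stem:
            return folder
    return "Unsorted"


def plan_split(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (question_stem, passage_file) pairs into folder -> file list."""
    plan = defaultdict(list)
    for stem, passage_file in items:
        plan[folder_for(stem)].append(passage_file)
    return dict(plan)
```

The function only returns a plan. Moving or uploading the files is left to you, so nothing is changed on disk until you have reviewed the grouping.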

🧪 5. Today's Mini-Lab: Source Refinement

📌 Mission: "The Clean Link"

Step 1: Find a PDF passage that is "messy" (contains page numbers, logos, or multiple questions).
Step 2: Create a cleaned-up version of just the body text in a .txt file.
Step 3: Upload both to a new Notebook and ask: "Summarize the logic of the paragraph."
Step 4: Compare the results. Notice how the cleaned version leads to a much more "elegant" and accurate summary.

Pro-tip: A clean source is the first step to becoming a "Prompt Grandmaster". Stay tuned!
