Agent 1 Repo: https://github.com/veeuu/Transcript-Agents.git
AIzaSyDk7XgAtnEIC7-O2X9vJh8grOdXotJPhAk
Supabase is now a "Golden Standard" for AI applications due to its pg_vector integration.
- Semantic Search: Instead of searching for exact keywords, you search zfor "meaning" (e.g., searching "chilly weather" and finding results for "winter").
- RAG (Retrieval-Augmented Generation): You store your documentation as "vector embeddings" in Postgres. When a user asks an AI agent a question, Supabase finds the relevant text to provide context to the LLM.
- HNSW Indexing: High-speed indexing for vector data, allowing you to perform similarity searches across millions of rows in milliseconds.
When you need custom server-side logic that shouldn't live in the database or the frontend, you use Edge Functions.
-
Serverless TypeScript: Written in Deno/TypeScript and deployed globally to the "edge" (close to your users) for near-zero latency.
-
Integrations: Perfect for handling Stripe Webhooks, sending emails via Resend, or calling AI models (OpenAI/Hugging Face).
-
MCP (Model Context Protocol): In 2026, Supabase acts as an MCP server, allowing AI coding tools to "read" your database schema and help you write queries or migrations with perfect context
-
Use CaseUse Case How Supabase is Used SaaS Platforms Auth for roles, RLS for data isolation, and Stripe integration via Edge Functions. AI Knowledge Bases Storing PDF data as vectors in pg_vectorto build custom "Chat with your data" bots.Collaborative Tools Realtime Presence for "who is editing" and Broadcast for live cursor tracking. Mobile Apps Using the Supabase CLI for offline-first local development and Auth for biometric login. E-commerce Database triggers to update inventory and Edge Functions to handle complex checkout logic.
| Component | Tool Choice | Why? |
|---|---|---|
| Brain (LLM) | Gemini Pro | You already pay for it. Use the Google AI Studio API (Free Tier is very generous for development). |
| Orchestration | LangChain | Open-source. It handles the "chains" of logic between your data and the LLM. |
| Database/Vector | Supabase | Free tier includes pgvector. This is where you store your transcript "embeddings" for the chatbot to search. |
| Embeddings | Hugging Face | Instead of paying for API-based embeddings, run a model like all-MiniLM-L6-v2 locally via Hugging Face for free. |
| Web Scraping | Crawl4AI / Firecrawl | These are new open-source libraries that turn websites into clean Markdown, which is perfect for LLMs. |
| Frontend/Hosting | Vercel / Netlify | Better than Hostinger for "Frontend + Serverless Functions" (React/Next.js). Their free tiers are world-class. |
| Backend Hosting | Hostinger | Use this for your "Always-on" scrapers or Python scripts if you already own a plan. |
If you find Hostinger or Hugging Face too limited for certain tasks, look at these:
- Groq: If you need speed for simple tasks (like formatting raw text before sending it to Gemini), Groq’s free tier for Llama 3 is incredibly fast.
- n8n (Self-hosted): A low-code automation tool. You can host it on a cheap $5 VPS or even your local machine to automate the "Scrape -> Summarize -> Save to Supabase" workflow without writing much code.
- Resend: For the "Alerts" feature. Their free tier allows 3,000 emails per month—perfect for sending regulatory updates to yourself.
- Upstash: If you need a tiny, fast Redis cache or "Vector" database with a very simple free tier.
- Ingestion: Use a Python script (on Hostinger/Local) to scrape SEBI and News sites.
- Processing: Send that text to Gemini Pro via LangChain to "clean" it into a structured JSON format (Competitor, Tech Stack, Profit, etc.).
- Storage: * Store the structured facts in a standard Supabase Table.
- Turn the text into "Vectors" (using Hugging Face models) and store them in Supabase pgvector.
- Interaction: Your dashboard (on Vercel) talks to Supabase. When you ask a question, the system searches the vectors, finds the relevant SEBI rule or transcript snippet, and sends it to Gemini to "answer" your prompt.
*Meet - 1*
Problem Definition
-
Transcript Insights: A system to read your raw transcripts and automatically find key patterns and insights.
-
Market Research: Extracting competitor details like their tech, reviews, market presence, and financial profits.
-
News and SEBI Guidelines: Automated scraping to constantly pull and analyze the latest relevant news and regulatory updates.
-
Dashboard: A simple web interface with a chatbot so you can ask questions and get instant answers from all this compiled data.
-
Script
I need to be:
- clearer
- sharper
- more executable
What You Should NOT Do Be ruthless here:
Don’t build full scraping infra Don’t try real-time pipelines Don’t add authentication, scaling, etc. Don’t over-design UI
So when you speak, say things like:
- “This replaces manual research time by X”
- “This becomes a daily decision tool”
- “This scales to multiple companies”
Things we need to work on
- The Interface
- Transcript (Audio, Text) insights (sales, product, investor calls).
- Market and competitor research.
- News, SEBI, and Gov guidelines tracking.
- The Chatbot Experience
Any on of this features:
“Insight Confidence Score”- Show how reliable each insight is “Timeline of events from news” - Don’t do all three.
System must feel like
- It collects information automatically
- It organizes it
- It answers questions instantly
- Show working flow
- Keep it simple but real
- Focus on use case clarity
-
Initially prototype should do ONLY:
Very 1st stageInput: Give it -
- A company name OR
- A transcript
Processing: Extract -
- Key insights
- Competitor mentions
- Important signals
Output:
- Clean dashboard
- Chatbot that answers:
- “Summarize this”
- “Who are competitors?”
- “What are risks?
-
Flow Should Look Like
Step 1: Data Ingestion
- Upload transcript OR Type company name
Step 2: AI Processing
- Use:
- Gemini / LLM
- Simple scraping (SerpAPI or mock data)
Step 3: Structured Output
- Insights
- Competitors
- News summary
Step 4: Chat Interface
- “Ask anything about this data”
-
Architecture (Keep It Practical)
Don’t over-engineer. Show something that can scale later.
Backend
- Python (FastAPI)
- LangChain or simple LLM calls
- Basic scraper (BeautifulSoup / API)
Storage
- MongoDB or even JSON (prototype is fine)
Frontend
- Simple React / Next.js OR even Streamlit
-
Architecture you should present
3.1 Data ingestion layer
- Transcripts: Upload PDF/Doc/Zoom export → convert to text → segment by speaker and time.
- Competitors: Crawler for:
- Landing pages, pricing, product docs.
- Play Store/App Store reviews.
- News & SEBI:
- Scheduled scrapers for:
- Financial news tagged to their domain.
- SEBI circulars/press releases relevant to fintech, LODR, PMS, fraud, compliance.
- Scheduled scrapers for:
Describe this as “Connectors” so later you can add RBI, MCA, etc.
3.2 Normalization & enrichment
- Clean + deduplicate.
- Tag each document with metadata:
- Type: transcript, competitor, news, SEBI.
- Date, source, company, product, geography.
- For transcripts: auto‑add:
- “Call type” (sales, support, product).
- Sentiment per speaker/segment.
- Detected entities: product names, features, risk terms.
3.3 Indexing & retrieval (RAG core)
- Chunk text with structure (per section, per Q/A, per SEBI clause).
- Store in vector DB + keyword index.
- Implement domain‑aware retrieval:
- If user asks “SEBI guidelines on X”, prefer SEBI data.
- If “What are customers complaining about?”, prefer transcripts + reviews.
- Show that this is similar to finance‑specific RAG systems like FinSage or RavenPack’s RAG, which combine multi‑modal preprocessing and domain‑special retrieval for financial documents.
3.4 Reasoning & chatbot layer
- LLM agent orchestrator:
- Classifies query: “transcript insight”, “competitor”, “regulatory/news”, or “mixed”.
- Calls appropriate tools:
- Transcript analyzer.
- Competitor summarizer.
- Regulatory/news monitor.
- Synthesizes answer with citations to actual paragraphs, SEBI sections, etc.
Mention that future versions can add rule‑based checks for SEBI/RBI obligations (e.g., LODR Reg. 30 disclosures, AML/KYC patterns, etc.)
*Meet - 2*
Input: Audio, Text From this. we need to build an logic to
- Extract the content → 2. Structure it first in proper formatting → 3. Extract relevant information from that → 4. Find Problems, decisions, improvements. → 5. Final conclusion of talk, What we should do next. ⇒ Anything useful for decision making.
⇒ And this should be stored in rag and later should be accessible from Chat box.
Input
{Company_1} or {company1, company2, ..} ; Domain
-
Input
Input Type Data Field Example / Description Primary Target Target_Sectore.g., "WealthTech: Fractional Real Estate Investment in India." Competitor List Competitor_EntitiesList of names or URLs (e.g., "PropShare," "hBits," "Strata"). Specific Sources Data_Source_ScopeWeb, SEBI Circulars, G2 Reviews, PlayStore, or LinkedIn Job Posts. Search Parameters Intelligence_KPIs"Tech Stack," "Revenue Growth," "Regulatory Penalties," "User Complaints." Time Horizon Analysis_Periode.g., "Last 2 quarters" or "Historical 3 years." Regulatory Filter Compliance_Frameworke.g., "SEBI (Investment Advisers) Regulations, 2013."
Output
First we need to ⇒ 1. Find domain, 2. Firmo, 3. Techno, 4. Revenue_details, 5. Higher Management Peoples.
-
Output
Output Component Format Industrial Value Competitor Matrix JSON / TableSide-by-side comparison of features, pricing, and tech stack. Sentiment Analysis Score (0-100)Aggregated "Health Score" based on recent news and user reviews. Regulatory Risk Flag Alert LevelsHighlights if a competitor was recently fined or if new SEBI rules affect them. Product Gap Analysis Bullet Points"Competitor A has X, but users complain about Y. Build Z to win." Financial Pulse EstimatesFunding status, valuation trends, and estimated burn rate from news. Executive Summary MarkdownA 3-sentence "TL;DR" for quick decision-making. -
Example Workflow (Industrial Scenario)
The Input:
"Analyze Jupiter and Fi in the Neobanking sector. Focus on their UPI plugin tech, recent SEBI/RBI compliance news, and user reviews regarding customer support from the last 90 days."
The Agent's Internal Logic (via LangChain + Gemini):
- Search: Scrapes SEBI/RBI news portals for "Jupiter" and "Fi."
- Scrape: Pulls recent 1-star and 5-star reviews from PlayStore/AppStore.
- Analyze: Gemini reads the "Raw Transcripts" (per your notes) of their latest earnings calls or interviews.
- Vectorize: Stores the key findings in Supabase pgvector for future chat queries.
The Desired Output (Dashboard View):
- Feature Gap: "Jupiter has launched 'Community Labs'; Fi is still focused on 'Savings Rules'."
- Compliance Alert: "RBI's new digital lending guidelines may require both to update their KYC workflow by Q3."
- Sentiment: "Jupiter sentiment is down 12% due to recent app lag reports."
So we have a target section (Query)
⇒ For that we need to scrape the latest news. ⇒ We need to scrape Big Impact news in the past years.
We need to figure out the government Policies and Regularization in that specific section and schemes working and operating in that feild.
-
The Fintech Regulatory Input & Output Table
Category Proper Input (Sources) Desired Output (Intelligence) SEBI Circulars Official RSS: sebi.gov.in/sebirss.xml. Focus on "Investment Advisers" and "Research Analysts" sections.Impact Score: A 1–10 rating of how this rule affects your specific business model. RBI Notifications RBI Notification Page: Scrape rbi.org.indaily. Specifically track "Digital Lending" and "Payment Systems."Compliance Checklist: A bulleted list of "Must-Do" actions for your engineering/legal team. Niche News The Ken / Medianama / Entrackr: High-signal sources that explain why a policy changed. Competitor Response: How other Fintechs (e.g., Razorpay, PhonePe) are reacting to the news. Gov Guidelines MeitY & Ministry of Finance: Track the DPDP Act (Data Privacy) and GST council updates. Strategic Summary: A 3-sentence "TL;DR" for the CEO on whether to pivot or proceed. Social Sentiment Twitter (X) / LinkedIn: Track keywords like "RBI Ban" or "SEBI Fine" + [Competitor Name]. Early Warning: Detection of outages or regulatory "raids" before they hit mainstream news. -
Automating the "SEBI/RBI Pipeline"
Since you are using Gemini Pro and LangChain, you should build a two-stage pipeline for these documents:
Stage A: The "Watcher" (Low Cost)
- Tool: Use a simple Python script on Hostinger/Railway to monitor the SEBI RSS feed.
- Logic: If a new title contains keywords (e.g., "KYC," "UPI," "Lending"), trigger the next stage.
Stage B: The "Analyzer" (High Power)
- Tool: Send the PDF/URL to Gemini Pro.
- Prompt: "You are a Senior Fintech Compliance Officer. Read this new SEBI circular and identify: 1. Deadline for compliance. 2. Specific technical changes required in our database. 3. Risk level for a wealth-tech startup."
-
Industrial-Grade Fintech Sources (2026 List)
For your agent to be "expert-level," feed it these specific high-quality inputs:
- ET BFSI (Fintech Section): Best for industry-wide shifts and executive movements.
- Finshots (Policy): Excellent for understanding complex financial jargon in "plain English" (perfect for pre-processing text for your LLM).
- Medianama: The "gold standard" for digital policy, privacy laws, and payment regulation in India.
- RBI’s Master Directions: Unlike "Notifications," these are living documents. Your agent should check the "Updated as of" date to ensure your Supabase data is current.
- DigiSahamati (AA Ecosystem): If your product uses Account Aggregators, tracking this site is mandatory for technical schemas.
-
Critical Technical Tip for Fintech Data
The "Context Window" Advantage: Regulatory documents are often 50+ pages long. Instead of "chunking" them into small pieces (traditional RAG), use Gemini Pro’s 1M+ context window. Feed the entire Master Direction into the prompt.
Why? In Fintech, a footnote on page 42 can contradict a headline on page 1. Small "chunks" of data will miss this context; Gemini’s long-context will see it.
The "Invisible" Logic for your Dashboard: In your Supabase schema, create a table called
regulatory_alerts:source_url: Link to SEBI/RBI.raw_text_summary: Gemini-generated TL;DR.action_required: Boolean (True/False).affected_module: (e.g., "KYC Module," "Payment Gateway").
Does this give you a clear enough roadmap for the "News & SEBI" part of your dashboard? Or should we look into the specific Python libraries for PDF scraping of government sites?
⇒ Our chat bot should have access to internet and our db. Should priorities db result and use that to answer the query.
They want to extract all the reviews of target company,
- Play store / App store.
- Google Reviews (below company page is available)
- Trustpilot : https://www.trustpilot.com/
-
List of site for Reviews.
Name Website USP (1 to 3 words) One line details G2 g2.com Verified SaaS Leader The most comprehensive source for B2B software rankings and verified user feedback. Capterra capterra.com Visual Side-by-Side Owned by Gartner; excellent for comparing specific software features and pricing models. TrustRadius trustradius.com Long-form Depth Known for highly detailed, "no-fluff" reviews often written by enterprise decision-makers. Gartner Peer Insights gartner.com/reviews Enterprise Rigor Features strictly vetted reviews from IT professionals in large-scale organizations. SourceForge sourceforge.net Open Source Focus The go-to for technical reviews on open-source projects and developer-centric tools. GetApp getapp.com SMB Integration Focus Focuses on how apps integrate with other tools, vital for understanding a competitor's ecosystem.
Name Website USP (1 to 3 words) One line details Glassdoor glassdoor.com Salary & Culture Provides internal insights into company management, salary benchmarks, and stability. Blind teamblind.com Anonymous Raw Truth A platform for verified employees to discuss internal issues like layoffs or tech debt. Indeed indeed.com/companies High-Volume Feedback Best for non-tech industries (manufacturing, retail) with millions of worker reviews.
Name Website USP (1 to 3 words) One line details Clutch.co clutch.co Verified Service B2B The gold standard for reviewing agencies, consultancies, and IT service providers. BBB bbb.org Ethical & Legal Tracks formal consumer complaints and a company's legal/ethical standing in the US/Canada. Manta manta.com Small Business SEO A massive directory for small-to-mid-sized businesses with localized reviews. Sitejabber sitejabber.com E-commerce Trust Focuses on the "buyer experience" for online businesses and consumer services.
Check for hosting and setup - Try Mac, Windows and other hosting sites. Check for tech stack - need more AI, ML and RAG.
About App Hosting:
- Supabase
- Hostinger
RAG
- RAG with Supabase - https://supabase.com/docs/guides/ai/langchain
Tech and Skills
Frontend - React, Vue
Backend - Python,
Database - MySQL, VectorDB
News Scraper - https://github.com/odaysec/NewsCrap | https://github.com/NikolaiT/GoogleScraper | https://github.com/pratikpv/google_news_scraper_and_sentiment_analyzer.git | https://apify.com/glitch_404/ultimate-news-scraper/input-schema |
Reddit Scraper - https://apify.com/trudax/reddit-scraper-lite | https://console.apify.com/actors/3XedXIRBcjfKrnsDJ | https://console.apify.com/actors/TwqHBuZZPHJxiQrTU/input
Finance
Company Info - https://apify.com/saswave/crunchbase-company-organization-scraper/pricing |
Website Scraper - https://github.com/odaysec/ResearchHub.git | https://github.com/tasos-py/Search-Engines-Scraper.git | https://github.com/apify/crawlee-python | https://apify.com/apify/web-scraper/api/python |
Google Trend Scraper - https://apify.com/apify/google-trends-scraper
Company Employees Scraper for LinkedIn - https://console.apify.com/actors/cIdqlEvw6afc1do1p/input
Feature Extraction - https://huggingface.co/microsoft/harrier-oss-v1-0.6b Text Summaraization - https://huggingface.co/models?pipeline_tag=summarization
-
Insights from P
Build a system of AI agents that replaces the work of:
- a product research intern
- a market research analyst
- a founder’s ops assistant
The system should help answer:
“What are the most important user problems in retail investing, and what should we build next?”
The system should consist of 5 core agents working in a pipeline:
- Research Ingestion Agent
- Insight Extraction Agent
- Research Synthesis Agent
- Product Brief Agent
- Founder Copilot (Query Interface)
This is a closed-loop system:
Data → Insights → Hypotheses → Product Ideas → Founder Decisions
The system must follow these principles:
All outputs must follow predefined formats (schemas).
No free-form summaries.
The system should filter and prioritize high-signal inputs only.
Every output should help answer:
“What should we build?”
Build a working v1. Avoid over-engineering.
Continuously collect and structure relevant external and internal data.
Track 5–8 key competitors (Liquide.Life, Stockgro.com, StockEdge.com, Univest.in, Trackk.in, ValueResearchStocks.com.)
- Year Founded - google
- Names of Founders - google
- HQ - google
- Available on Web / Mobile / Both - google
- Funding Raised
- No of Users -google, play and app store
- Annual Revenue -google
- Key positioning/messaging - website
- Revenue model / pricing - website
- Differentiators, What users like them for -
- User complaints (from reviews) - google, reddit, play and app store
- Strategic moves (partnerships, expansion) -website, google
- New features launched - website
Track selected high-signal sources (will identify a list and share):
- YouTube (finance/investing creators) - Input as Yt Video or Yt channel link ⇒ If video then do full analysis, and if channel then do live option so when even we have new video then analyze it and with channel so last 10 videos option and if clicked on any then show the analysis of it.
- Reddit (investing communities) - Sub reddit - if Sub reddit then fetch past 20 conversations and analysis them and then keep live tab on if found new then fetch that. Reddit post - if post then run an entire analysis
- App Store / Play Store reviews - Here we will have direct link of app - or app name, then search and fetech the data, entire details like: Company, App Name, Downloads, Rating, Reviews, Images if any, Description, etc.
Filter by:
- high engagement
- relevant keywords (confusion, comparison, problems)
- Meeting transcripts -
- Founder notes -
- Product discussions -
- Title
- 2 line summary
- Major Decision
- Problem
- Possible solution pitched
- Tone - positive and negative
- Timeline - of discussion
- Improvement - For next Call.
Each input should be converted into structured entries:
Source Type: (Competitor / User / Internal)
Entity: (e.g., Groww / Reddit / Customer Interview)
Signal Type: (Feature / Complaint / Trend / Insight)
Content: (raw extracted text)
Timestamp: (date)
A clean, structured repository of raw signals.
Convert raw signals into structured problems and patterns.
Structured signals from Agent 1.
- Identify problems mentioned -
- Group similar signals - Which points as came, positive and negative points
- Count frequency (approximate is fine) - similar problems come in to account. - account
- Identify user type (beginner, trader, long-term investor) -
Problem:
Clear description of user issue
Evidence:
Examples from inputs
Frequency:
Low / Medium / High
User Type:
Beginner / Intermediate / Advanced
Source Mix:
(Competitor / Reddit / YouTube / Internal)
A list of validated user problems, not summaries.
Convert problems into higher-level insights and hypotheses.
- Problems from Agent 2
- Competitor signals
- Internal notes
- Identify patterns across problems
- Connect related issues
- Identify root causes
Insight:
Core underlying issue
Supporting Problems:
List of related problems
Evidence:
Why this insight is valid
Implication:
What this means for product
Clear, high-quality product insights.
Convert insights into actionable product ideas.
Insights from Agent 3.
- Identify opportunity areas
- Translate into product features
- Define user flows
Feature Name:
Problem:
What user issue this solves
Why It Matters:
Impact on user
Solution:
High-level description
User Flow:
Step-by-step usage
Expected Impact:
(e.g., reduces research time, increases confidence)
Clear, buildable product ideas.
Provide a simple interface for querying the system.
The system should answer questions like:
- What are the top 3 user problems this week?
- What insights are emerging?
- What should we build next?
- What are competitors doing differently?
Responses should include:
- direct answer
- supporting evidence
- confidence level
Enable fast, high-quality founder decision-making.
- Ingestion Agent collects signals
- Insight Agent extracts problems
- Synthesis Agent generates insights
- Product Agent creates ideas
- Copilot answers queries
The system should store:
- raw signals
- structured problems
- insights
- product ideas
Data should be:
- searchable
- retrievable
- organized by time and source
The system should generate a weekly report:
- Top 5 User Problems
- Key Emerging Insights
- Competitor Movements
- 3 Product Opportunities
The system is successful if:
- It produces clear, structured outputs
- It surfaces non-obvious insights
- It reduces founder research time
- It generates actionable product ideas
- No complex ML models
- No prediction systems
- No personalization
- No over-engineered architecture
Focus on:
👉 useful outputs, not technical complexity
A working system where:
You can ask:
“What should we build next?”
And receive a structured, evidence-backed answer.
The system will be evaluated on:
- clarity of outputs
- usefulness of insights
- quality of structuring
- completeness of pipeline
- simplicity and reliability
This is not a data collection system.
This is a decision intelligence system.
Focus on helping the founder think better and move faster.