Skip to content

Latest commit

 

History

History
518 lines (340 loc) · 35.1 KB

File metadata and controls

518 lines (340 loc) · 35.1 KB

below is the Original requirement.

Here's the company: https://www.sourceclub.io/ https://www.linkedin.com/company/sourceclub/

i applied for this postion: https://jobs.gusto.com/postings/source-club-llc-head-of-technology-operations-systems-ai-key-role-fast-growth-100-remote-45d12920-c96e-4a6a-ab51-7d4d310026b3

Hey Abhi, Awesome! Here is the Case Study link: https://www.notion.so/SourceClub-Hiring-Step-2-Case-Study-36605af7a9088120bac6dea5be2e2e3e. Kind regards, Cristina Duarte.

inside that case study there are videos that demonstrate how they do the savings analysis and other tasks in manual way. must  review those. before starting on anything https://powerful-submarine-795.notion.site/Savings-Analysis-Loom-Links-21405af7a9088244a90881a2323b0d93

for assignment 3: heres the project queue: https://powerful-submarine-795.notion.site/Source-Club-Project-Queue-36605af7a90881a19106e6116ad92082?pvs=25

i will work on assignment 4 once all the 3 assignments are done.

go ahead and propose what to do, how to do and also do it on your own and hand it over to me so we can review once everything is done. a code is not necesary but nice to have for any or all of these assignments. a highlevel architecture is good and tools to use to do it etc.

my ask to claudecode is:

  • I want to see if i can build a prototype in local machine and push it github to share it with recruiter on instructions to install and run this.
  • go thru the full videos collection to understand how the businesses run today take some snapshots and create some files basically test files of these client files with the same data and add more simulated data in order to use those as sample input files to process and generate a UI to drop these files and the rest of the process has to be done by our application either using all alarms or some sort of linear procedure structure architecture keep the architecture very lean. This is a minimum viable product that we are building or even a prototype or proof of concept you could say goal is to just show that we can do this then save the scale, resilient and secure architecture approaches to propose as part of the original implementation, which is production Ready.
  • assignment 2 Tracy see what you can do in order to create a mock data and show that as part of your feature in the POC that we are building in order to show that we established that integration
  • for assignment three go to the list of the product backlog or future proposals think out loud think out big and propose a few more along with the ones that's been mentioned there. Also explain why we have to pick one over the other in terms of priority complexity to implement and the timeline implement.
  • once we're done with all this, we'll get to task four which is creating a video and a demo of all this we can talk more on that later downline once we have accomplished assignment 12 and three. Below is the response from some of the alarms. I'm not convinced that that is the right approach but they might have a point so to consider what they're proposing but come back with a farm proof of concept approach that can be running local that can be demonstrated to solve a business case.

suggestions by different LLMS:

gemini: the current code(POC) in this project was generated by gemini pro and enough information was added inside case_study folder in regards to this on May 21st 2026

perplexity:

I’d treat this as a deliverable pack, not just advice: Assignment 1 should be a pragmatic human-in-the-loop matching system; Assignment 2 should recommend a custom Stripe↔HubSpot sync with a canonical company/location map; Assignment 3 should prioritize the few projects that unblock revenue, service visibility, and future health scoring. That matches both the job brief and the case-study prompts, which explicitly emphasize shipping real systems, automating savings analysis first, and sequencing 3–5 high-leverage initiatives in the first 90 days.

What they actually want

SourceClub is a small, AI-forward dental buying group that says its core value prop is lower dental supply costs via negotiated pricing, flat-fee membership, and a centralized pricing/ordering experience, with 500+ member locations and the savings analysis as the proof step before closing a customer. The role is not pure engineering; it owns RevOps, systems, AI workflows, data layer quality, and cross-functional execution with one technical builder, and the company explicitly names the savings analysis pipeline as top priority in both the job post and the case study.

Assignment 1

The manual process today is to take a prospect’s purchase history and compare line items against SourceClub pricing, but the hard part is product matching because SKUs, descriptions, pack sizes, and units differ across files and suppliers. The training page also confirms the workflow has two parts—collecting purchase history and running the savings analysis—and points to Loom videos for the current manual process, so you should review those before finalizing any deliverable language or demo claims.

Proposed system

Layer What to do Tools
Intake Accept CSV/XLSX/PDF export, identify supplier/source, normalize headers N8N/Make, Python, Claude/OpenAI extraction
Canonicalization Standardize SKU, manufacturer, UOM, pack size, qty, unit price, desc Python, Pandas, regex rules
Match engine 3-stage matching: deterministic SKU/rules, semantic candidate retrieval, LLM adjudication Postgres/pgvector, embeddings, LLM
Human review Route low-confidence or high-dollar mismatches to analyst queue Retool/Notion/Airtable UI
Output Savings report, line-item audit trail, confidence score, “unmatched” bucket HubSpot attachment, PDF/CSV export

Recommended logic

  1. Deterministic match first: exact supplier SKU, manufacturer SKU, normalized UPC/NDC-style IDs, exact pack-size/UOM combos.
  2. Candidate retrieval second: embed cleaned description + manufacturer + pack/UOM, retrieve top (k) SourceClub catalog candidates.
  3. LLM judge third: compare prospect line to candidate set, force structured JSON output with matched SKU, rationale, normalized unit economics, and confidence band.
  4. Human review only for low-confidence lines, large-spend lines, or pack/UOM conflicts.

Confidence policy

Confidence Action
High Auto-accept into report
Medium Include but flag “review suggested”
Low Send to manual review queue
High dollars + medium confidence Force human review

Architecture

  • Source of truth tables: prospect_uploads, normalized_lines, catalog_items, match_candidates, final_matches, review_queue, analysis_runs.
  • Event flow: upload → parse → normalize → match → review exceptions → generate report → push summary to HubSpot company/deal.
  • KPI set: match rate, straight-through-processing %, analyst minutes saved, savings-report turnaround time, precision on audited matches.

What I’d hand them

  • A one-page architecture diagram.
  • A sample schema.
  • A scoring rubric.
  • A short “Phase 1 / Phase 2” rollout:
    • Phase 1: semi-automated, human-in-loop, 70–85% straight-through.
    • Phase 2: learned synonym library + feedback loop from reviewer corrections.
    • Phase 3: near-real-time self-serve upload inside customer/prospect workflow.

High-level architecture sketch

Upload -> Parser/OCR -> Line Normalizer -> Rules Match
                                   -> Candidate Retrieval (vector + keyword)
                                   -> LLM Match Judge
                                   -> Confidence Scorer
                     -> High confidence -> Savings Calculator -> Report Generator -> HubSpot
                     -> Low confidence  -> Review Queue ------> Approved Match ----> HubSpot

Assignment 2

They want billing status, subscription health, and revenue visible directly on the HubSpot company record, but Stripe uses subscription-per-location while HubSpot is organized around company records, creating a many-locations-to-one-company mapping problem. The project queue also labels Stripe↔HubSpot name matching as foundational and “Next,” because it unlocks billing reporting, dashboards, and health scoring.

Option Pros Cons Verdict
Native integration only Fastest, low setup Weak for custom entity mapping and multi-location logic No
Middleware only (Zapier/Make) Cheap, quick Brittle for canonical mapping, backfills, auditability Maybe for MVP
Custom sync + small middleware Best control, handles company/location/subscription model, auditable More setup Pick this

Pick: custom sync with canonical data model.

Model

  • HubSpot Company = parent dental group/customer.
  • HubSpot custom object or associated records = locations.
  • Stripe Customer = parent bill-to entity.
  • Stripe Subscription = one location membership.

Core fields to write into HubSpot

  • Billing status
  • Active subscription count
  • MRR / ARR
  • Failed payment flag
  • Last invoice paid date
  • Dunning / past due status
  • Subscription-by-location rollup.

Implementation

  1. Build a canonical mapping table between Stripe customer, Stripe subscription, HubSpot company, HubSpot location.
  2. Run one-time backfill + fuzzy/manual reconciliation for current records.
  3. Add Stripe webhooks for customer.updated, subscription.*, invoice.*, payment_intent.*.
  4. Push normalized billing snapshot to HubSpot custom properties nightly plus near-real-time webhook updates.
  5. Exception queue for unmapped or ambiguous entities.

Why this wins

  • It fits their multi-location billing reality better than native tools.
  • It creates the data spine needed for later dashboards and health scoring.
  • It’s durable; middleware can still be used for lightweight alerts, but the mapping logic stays in one owned service.

Assignment 3

Your first 90 days should follow dependencies, not just urgency. The queue itself tells you the top priority is Automate Savings Analysis, while Customer Service consolidation into HubSpot is also marked highest priority, and Stripe↔HubSpot matching is called foundational for downstream reporting and health scoring.

Order Project Why first
1 2.1 Automate Savings Analysis Direct revenue bottleneck, explicitly highest priority, repeated in job post and case study.
2 1.1 Stripe ↔ HubSpot Name Matching Foundational data layer for billing visibility and revenue reporting.
3 3.1 Consolidate Customer Service into HubSpot Current service requests are fragmented; ticketing/queue gives operational control fast.
4 3.8 Post-Onboarding Drip Campaign Already “Next,” quick win, improves activation and early engagement.
5 1.2 ZenOne Data Integration Enables customer health score and lifecycle automation later; backbone project.

Why not others first

  • Unified business dashboard is useful but premature if billing/customer/order data is not clean yet.
  • Customer health score depends heavily on ZenOne integration, so doing the score before the data pipe is backwards.
  • AI enablement is good, but broad audits before fixing the core revenue/service workflows would read unfocused.

What to submit

You should send one concise doc with three sections plus one appendix: Assignment 1 architecture + prototype concept, Assignment 2 options + chosen design, Assignment 3 prioritized roadmap, and appendix with assumptions/open questions. For Assignment 4, record a 3–5 minute walkthrough showing the architecture, the prioritized roadmap, and why you chose practical human-in-loop systems over “full AI magic,” because the case explicitly says a rough working thing beats a polished description and wants to hear your reasoning.

Raw take: the strongest answer is not “fancy AI.” It is “I can reduce a real bottleneck, create a trustworthy data layer, and sequence the next 90 days like an operator,” which is exactly how the role is written.

Would you like me to draft the actual submission doc in polished form next?


heres the content from all the notion links provided for this assignment :

SourceClub Hiring - Step 2: Case Study

Head of AI Powered Operations, Systems, & RevOps

Introduction

Background

Congratulations! You have been selected for the next step in the Source Club hiring process. This means you have highly impressed us and we think so far you are a great fit for our company. We are very excited you made it this far.

The next step in our process is a Source Club Case Study. This is essentially an 'audition' for the job and will give you great insight into some of the things this job entails, as well as providing us with an example of your skillset. We encourage you to give this your best effort and let your aptitude, creativity, and critical thinking shine through.

Pay

Upon submitting the case study, we will compensate you $50. The pay is nominal but this amount is to show you this is a real case study. The purpose of this is to give you a chance to show your skills while also understanding the position better. We will not be using any of your work, but we think it's important to show you we are serious about all candidates and appreciate your time.

We will send your payment via PayPal. If you do not have a PayPal or Venmo. If you do not include a valid PayPal or Venmo address when you submit, we won't be able to process your payment.

Next Steps

  • Top performers on the case study move on to the final step — interviews with our CEO and our VP of Growth.
  • This case study is projected to take 4–6 hours. Please timebox your effort. We are not looking for a polished consulting deliverable — we are looking for how you think and what you can actually build. A rough working thing beats a beautiful description of one.

Background

This is a real-life scenario for the job you will be performing. Source Club is a dental Group Purchasing Organization — we negotiate bulk pricing with suppliers on behalf of hundreds of dental practices, charge a flat monthly subscription, take zero vendor kickbacks, and pass the savings straight to our members. The average member saves $10,000–$30,000 a year on the exact same products they were already buying.

We are a small team — around seven people. The role you are interviewing for owns the systems, automations, data, and AI workflows that let a team this size run like a much bigger one. Every assignment below is real work currently sitting in our project queue.

One note before you start: we have intentionally left these assignments a little open-ended. Working autonomously, making reasonable assumptions, and figuring things out is a core part of this job. Where something is ambiguous, make a call, state your assumption, and keep moving. There are no trick questions.

Assignments

Assignment 1 — “Savings Analysis” Automation

The situation. This is our single biggest bottleneck. When a prospect is considering Source Club, we run a "savings analysis": we take the prospect's “purchase history” — an export of everything they currently buy from their dental supplier — and compare each line item against Source Club's negotiated pricing to show them their total potential savings. Today our founder does this by hand. It takes about 10 minutes per analysis, 20–40 times a month — 5 to 7+ hours a month of manual work — and it is the step we most want to automate.

The hard part is matching. The prospect's file and our pricing catalog rarely describe the same product the same way: different manufacturer SKUs, different descriptions, different pack sizes, different units of measure. A human can eyeball "is this the same box of gloves?" Software has to be taught to.

Here are videos and resources the team has given you to explain the process: Savings Analysis Loom Links

Your task:

  • Design the system that automates the savings analysis. How would you architect it end to end? What does the pipeline look like? Where does AI do the work, and where does a human stay in the loop? How do you handle the line items that don't cleanly match?
  • If you can, build as much of a working proof-of-concept as you reasonably can in the time you have time. The more you can give us an idea of how this would work, the better. Even working prototype on a slice of the data than a polished document describing one. Use whatever tools you would actually use on the job — Claude, spreadsheets, scripts, no-code tools, anything.
  • Deliver: your architecture and approach, whatever you built (links or files), and a short note on what you would do next with more time.

Suggested time: 2–3 hours.

Assignment 2 — Systems Architecture: Connecting Stripe and HubSpot

The situation. We bill through Stripe and run our CRM in HubSpot. One subscription in Stripe equals one dental practice location; a multi-location company is a single customer with several subscriptions. Right now the company and subscription names in Stripe don't match our HubSpot records — so nobody on the team can open a company in HubSpot and see its billing status without manually cross-referencing Stripe. We want subscription status, billing health, and revenue to show directly on the HubSpot company record.

Your task:

  • Map at least two or three different ways this could be solved — for example a native integration, a middleware tool, a custom build. The options are your call.
  • Pick one. Justify it: why this approach over the others, what it costs, and why it will be successful long term.

Suggested time: 30 min

Assignment 3 — Prioritizing Upcoming Projects

The situation. A core part of this role is owning Source Club's project queue — deciding what gets built, in what order, and why. We always have more good ideas than capacity to run them, so the real skill is reading a list like the one below and seeing where the leverage actually is. Take a look, then tell us how you'd prioritize it: if this were your first 90 days in the seat, which 3–5 projects would you tackle first, in what order, and why? We're less interested in a "right answer" than in how you reason about impact, sequencing, and tradeoffs.

Source Club — Project Queue

Suggested time: 30 Min

Assignment 4 — Video Walkthrough (3–5 minutes)

Record a short video, no longer than 5 minutes, walking us through your answers to Assignments 1, 2, and 3 — what you decided and why. Screen-share your work as you go. Part of this assignment is figuring out the simplest way to record and share it (Loom or similar). We want to hear what you think.

Delivery Directions

Overview

  • Deliver the assignments in whatever format best answers them — a doc, a spreadsheet, slides, a working prototype, code, your call. It should be organized, accurate, and clear. Your work will be reviewed by Source Club's leadership team, including our CEO.
  • You will need to make some assumptions. Info won’t be perfect. Fill in the gaps. Working autonomously and figuring things out is exactly the quality we are looking for — though we also realize we may have left out a detail or two.
  • This is a realistic situation, so the more realistic you make your work, the better.
  • If you have any questions or hit an issue, email jpuhl@sourceclub.io
  • Projected time to spend on this case study is 3-5 hours

Delivery

  • Please return the completed case study within 2 business days of receiving it — the earlier the better.

  • Email your deliverables (links or attachments) to: jpuhl@sourceclub.io AND cduarte@sourceclub.io

  • Make sure any links are set so that anyone with the link can view.

  • In your email, please include:

    • Links to your deliverables for Assignments 1, 2, and 3
    • A link to your video
    • Your PayPal or Venmo address (for the $50 — see Pay above)
    • How long this took you, in hours
    • Any comments or feedback you have for us
    • Feel free to attach any portfolio, GitHub, or prior-work links that represent you and your work

    ----------------------------_

Source Club — Project Queue:

A real, lightly-trimmed snapshot of Source Club's current project queue (May 2026). Team member names have been replaced with role labels; priority and status markers are kept as-is. This is the queue referenced in Assignment 3.

Category 1 — Infrastructure & Data Systems

  • 1.1 Stripe ↔ HubSpot Name Matching — company and subscription names in Stripe must match HubSpot records exactly; complex because of multi-location companies; foundational for billing reporting, dashboards, and health scoring. Status: Next.
  • 1.2 ZenOne Data Integration — ZenOne is the platform members order through; pull all ordering and pricing data into one place to see what each customer orders. Data backbone for the customer health score and lifecycle outreach. Status: Idea.
  • 1.3 Unified Business Data Dashboard — pull HubSpot, Google Analytics, Google Ads, and Facebook Ads into one dashboard; full funnel from marketing spend to closed revenue. Status: Idea.
  • 1.4 Notion Project Management System — build a Notion PM system with automated project, task, and weekly reporting rolled up automatically. Status: Idea.

Category 2 — Sales Pipeline

  • 2.1 Automate Savings Analysis — HIGHEST PRIORITY — ~10 min per analysis × 20–40/month of manual work; take a prospect's purchase history, compare against Source Club pricing, generate the savings report. Status: Not started.
  • 2.2 Sales Pipeline Automation (HubSpot) — automate 2–3 pipeline stages; verify purchase-history request emails; automate post-close document creation. Status: Partially started.
  • 2.3 Communication Tools Integration — install phone/SMS capability into HubSpot for sales; evaluate texting and calling tools. Status: Not started.
  • 2.4 Automate PandaDoc Creation, Signing & Payment — agreements are prepared and sent manually today; automate generation and sending and make member payment entry frictionless. Status: Not started.
  • 2.5 Improve Sales Presentations — rework the discovery-call and savings-analysis-call decks; refine messaging and objection handling. Status: Not started.

Category 3 — Customer Success

  • 3.1 Consolidate Customer Service into HubSpot — HIGHEST PRIORITY — service requests arrive scattered across email, phone, and text with no ticketing system or queue; move it all into HubSpot with a proper ticket system. Status: Not started.
  • 3.2 Onboarding Automation — rework the HubSpot onboarding pipeline; increase automation including scheduler integration; date properties for metrics. Status: Phase 1 complete.
  • 3.3 HubSpot Onboarding Calendar Integration — connect the onboarding-call calendar to HubSpot Meetings so reminders fire and dates auto-populate. Status: Idea.
  • 3.4 CS KPI & Dashboard — one HubSpot dashboard for a full daily read on customer-success performance. Status: Idea.
  • 3.5 Customer Health Score — score each customer on behavior and engagement after onboarding; depends heavily on the ZenOne data integration (1.3). Status: Discussion.
  • 3.6 45/90-Day Check-in & Lifecycle Engagement — structured 45- and 90-day touchpoints; flag "missed savings" opportunities; AI-assisted to keep it low-lift. Status: Discussion.
  • 3.7 Customer Referral & Advocacy Program — systematized referral ask at ~day 60–90; needs a referral landing page built. Status: Draft for review.
  • 3.8 Post-Onboarding Drip Campaign — a drip outreach sequence in the first two weeks after onboarding. Status: Next.
  • 3.9 Member Termination Process — define a clean termination process across billing, CRM, the ordering platform, and suppliers. Status: Idea.
  • 3.10 Customer Service SLA & Onboarding Metrics — define response, resolution, and escalation standards; track onboarding metrics. Status: Not started.
  • 3.11 Educational Product Offerings — scalable, self-serve educational content for members. Status: Discussion.
  • 3.12 Quotes Database — a reference database of quotes the customer-success team can pull from. Status: Idea.
  • 3.13 Monthly Vendor Roster — maintain a current monthly vendor/supplier roster as a repeatable deliverable. Status: Next.

Category 4 — Company-Wide AI Enablement

  • 4.1 Company AI Audit & Enablement Program — interview everyone in the company to find the biggest opportunities to automate, then build them; teach the team to use AI day to day. Status: Idea.

Savings Analysis Loom Links

Jake Notes

  • Two parts
    • Collecting Purchase History
    • Running Savings Analysis

Purchase History Cleanup Training

These videos are 8 months old. Many concepts are the same. Likely there are some data differences.

Overview Loom Video - Where to find SA Looms

LINK to Savings Analysis Videos

How to build Initial SA with Benco

Savings Analysis Training

https://www.loom.com/share/76e49312c83a4778b3ad10628c662aac


here's the data i curated from all the videos that are provided as training:

Here’s the best consolidated packet from the Looms you provided. It is enough to define a credible POC architecture and LLM build brief, but two videos still lack transcript/summary text in the accessible page content, so Darby/Henry-Schein-cleaning details remain partial. loom

Extracted process

The workflow is clearly split into two phases: supplier-specific data acquisition/cleanup and savings-analysis transformation/comparison. The overview video points users to the “Creating Initial Savings Analysis” SOP, notes Henry Schein process changes, and says the work is spreadsheet-heavy enough that extra screen space helps. loom

Benco retrieval requires logging in, navigating to order history, generating a detailed purchase analysis for the past 12 months, setting both date fields to null, checking accuracy, exporting to Excel, and renaming the file. Henry Schein retrieval requires logging in, generating a detailed purchased-items report, exporting to Excel, and saving the file in the customer folder with the supplier name in the filename; the sample account is Demit Dental with about $33,000 spend. us.dental-tribune

Video-by-video

Video What is confirmed Client / data
Overview of Savings Analysis and Data Extraction Techniques Orientation only; use SOP section “Creating Initial Savings Analysis,” multiple Looms, spreadsheet workflow, Henry Schein process changed. loom No sample client named. loom
How to Pull Benco Purchase History for Savings Analysis Pull Benco order history, generate detailed purchase analysis for last 12 months, set both date fields to null, verify, export Excel, rename file. us.dental-tribune Auburn Dental; Benco is main supplier. us.dental-tribune
How to Retrieve Purchase History from Henry Shun Pull Henry Schein purchase history, generate detailed items report, export Excel, save with supplier in filename. ftc Demit Dental; total spend about $33,000. ftc
How to do Base86 into SA using Vlookups Take customer info using Auburn Dental Group example, transpose SA data into Base86 template, upload product file, generate report, use VLOOKUP to populate Savings Analysis. facebook Auburn Dental Group; Base86 template/report. facebook
How to Clean Darby Purchase History(req 2 Vlookups) Confirmed only that Darby cleaning exists and requires two VLOOKUPs; transcript/summary text was not exposed here. youtube No client/data exposed. youtube
Henry Schein: Pull Purchase History + Clean the Data Confirms a combined Henry Schein retrieval + cleaning workflow exists, but transcript/summary text was not exposed here. investor.henryschein No extra exposed client/data beyond title. investor.henryschein

POC design

The real system should not start with fuzzy LLM matching. It should start with deterministic adapters per supplier, because the training materials show supplier-specific retrieval/export formats and spreadsheet transformations before data ever reaches the analysis stage. ftc

Recommended pipeline:

  1. Upload supplier export and select or auto-detect supplier.
  2. Run supplier adapter: Benco, Henry Schein, Darby, Base86. facebook
  3. Normalize into canonical schema.
  4. Deduplicate / consolidate equivalent products.
  5. Enrich manufacturer SKU where missing.
  6. Match to Source Club pricing catalog.
  7. Calculate current spend, projected spend, and estimated savings.
  8. Send low-confidence rows to review queue.

Claude brief

Build a POC for Source Club “Savings Analysis” automation.

Business goal:
Automate the manual workflow that compares a prospect’s supplier purchase history against Source Club negotiated pricing to estimate savings.

Confirmed from process videos:
- Workflow is spreadsheet-heavy and currently manual.
- There is an SOP section called “Creating Initial Savings Analysis.”
- Supplier-specific ingestion exists before the common savings-analysis stage.
- Henry Schein workflow changed recently.

Supplier adapters confirmed:
1. Benco adapter
   - Log in and access order history.
   - Generate detailed purchase analysis for past 12 months.
   - Ensure both date fields are null.
   - Verify report accuracy.
   - Export Excel and rename file.
   - Example client: Auburn Dental.
2. Henry Schein adapter
   - Log in and generate detailed purchased-items report.
   - Export Excel.
   - Save file in customer folder with supplier name in filename.
   - Example client: Demit Dental.
   - Example total spend: ~$33,000.
3. Base86 adapter
   - Use Savings Analysis data and transpose it into Base86 template.
   - Upload product file into Base86.
   - Generate report.
   - Use VLOOKUP-style lookup to populate the Savings Analysis output.
   - Example client: Auburn Dental Group.
4. Darby adapter
   - Cleaning process requires two VLOOKUP-style joins.

System architecture:
- Frontend:
  - file upload
  - supplier selector / auto-detection
  - review queue for exceptions
  - export button
- Backend:
  1. supplier adapter layer
  2. spreadsheet cleanup layer
  3. canonical schema mapper
  4. deduplication / clustering engine
  5. SKU enrichment layer
  6. pricing/catalog matching engine
  7. savings calculator
  8. human-review queue
  9. exporter

Canonical schema:
- customer_name
- supplier_name
- raw_description
- supplier_sku
- manufacturer_name
- manufacturer_sku
- quantity
- unit_price
- annual_spend
- report_period
- normalized_description
- normalized_pack_size
- normalized_uom
- cluster_id
- matched_catalog_sku
- matched_catalog_description
- matched_price
- estimated_savings
- confidence_score
- review_status

Business rules:
- Supplier-specific parsing happens first.
- Use deterministic spreadsheet ETL before AI.
- Use lookup-table logic where known (Base86, Darby).
- Only use AI/LLM for ambiguous normalization or missing manufacturer SKU enrichment.
- Keep confidence scores and an audit trail.
- Human reviews low-confidence rows and overrides mappings.

POC scope:
- Support Benco and Henry Schein first.
- Mock Darby/Base86 transformations with configurable lookup tables.
- Produce final Excel/CSV output with:
  - cleaned input
  - matched items
  - unmatched items
  - savings summary

Sample datasets to fabricate:
- Auburn Dental / Benco
- Demit Dental / Henry Schein
- Auburn Dental Group / Base86

Important caveat:
Some training videos confirm the existence of cleaning steps but do not expose full transcript text here, so some exact formulas/columns should be treated as assumptions and clearly marked in the implementation.

Honest gaps

What you still do not have from the accessible transcript data is the exact Darby cleaning logic, exact Henry Schein cleaning logic, exact SA workbook columns, exact Base86 file layout, and the exact VLOOKUP keys/tables. Those details were not exposed in the page content for the Darby and Henry Schein clean videos. youtube

If you use this in an interview/homework context, present it as: “I reverse-engineered the workflow from available Loom metadata and would validate remaining transform details from the underlying spreadsheets/transcripts.” That is accurate and defensible. investor.henryschein