Skip to content

Latest commit

 

History

History
550 lines (430 loc) · 15.7 KB

File metadata and controls

550 lines (430 loc) · 15.7 KB

Stateless Architecture: Zero Customer Data Storage

Problem with SQL Storage

  • Data at rest is an attack surface
  • Breach = user data exposed
  • GDPR/CCPA deletion requirements = data management headaches
  • Compliance certifications required
  • Backup/recovery complexity

Solution: Stateless Architecture

Don't store customer data at all. Only store what's absolutely necessary.


🎯 Option 1: Session-Only (Recommended for MVP)

Store: Only the user's Plaid access token in secure session Not Stored: Transactions, balances, credit scores, personal info

How It Works

# User logs in with Plaid Link
# Plaid returns: access_token
# Your app: Store ONLY in encrypted session/cookie

# Customer visits dashboard
# Your app: Fetch fresh data from Plaid on-demand
# Display live data
# Discard after response

Implementation

from flask import session
from cryptography.fernet import Fernet

class StatelessPlaidClient:
    """No database. Session-only token storage."""
    
    def __init__(self, encryption_key):
        self.cipher = Fernet(encryption_key)
    
    def store_token_in_session(self, access_token):
        """Encrypt token and store in browser cookie (httpOnly)"""
        encrypted = self.cipher.encrypt(access_token.encode())
        session['plaid_token'] = encrypted.hex()
        session.permanent = True
        session.modified = True
    
    def get_token_from_session(self):
        """Retrieve and decrypt token from session"""
        if 'plaid_token' not in session:
            raise ValueError("No Plaid token in session")
        
        encrypted = bytes.fromhex(session['plaid_token'])
        token = self.cipher.decrypt(encrypted).decode()
        return token
    
    def get_dashboard_data(self):
        """Fetch fresh data on-demand, don't store"""
        token = self.get_token_from_session()
        
        # Fetch from Plaid (live data)
        accounts = plaid_client.accounts_get(access_token=token)
        transactions = plaid_client.transactions_get(
            access_token=token,
            start_date='2024-09-13',
            end_date='2024-12-13'
        )
        balances = plaid_client.accounts_balance_get(access_token=token)
        
        # Return to browser
        # Data is NOT stored in database
        # Discarded after response
        return {
            'accounts': accounts.to_dict(),
            'transactions': transactions.to_dict(),
            'balances': balances.to_dict()
        }

# Flask setup
app.secret_key = os.getenv('FLASK_SECRET_KEY')
app.config['SESSION_COOKIE_HTTPONLY'] = True  # Can't be accessed by JavaScript
app.config['SESSION_COOKIE_SECURE'] = True    # HTTPS only
app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(hours=24)

@app.route('/')
def dashboard():
    stateless = StatelessPlaidClient(os.getenv('ENCRYPTION_KEY'))
    data = stateless.get_dashboard_data()  # Fresh from Plaid
    return render_template('dashboard.html', data=data)

Pros ✅

  • Zero data breach risk - Nothing stored in database
  • GDPR compliant - No customer data to delete (it never existed)
  • No compliance certifications needed - You're not a data processor
  • Simple deployment - No database required
  • Cost-effective - No database server
  • Always fresh data - Real-time from Plaid

Cons ❌

  • Plaid API calls on every page load - Slower (200-500ms per request)
  • Rate limiting risk - Plaid has request limits
  • No offline mode - Can't show data if Plaid is down
  • No historical analysis - Can't show trends over time
  • Session expires - User must re-authenticate

Best For

  • MVP/Proof of concept
  • Low-traffic apps (<1000 users)
  • When you want zero data liability

🎯 Option 2: Plaid as Your Database (Best for Production)

Store: Only Plaid access token in secure session Not Stored: Transactions, balances, credit scores Why: Let Plaid store the data - they're compliance-certified

How It Works

# Plaid Link => access_token
# Store token in session (encrypted, httpOnly cookie)
# Every request => fetch fresh from Plaid
# Plaid handles compliance, backups, security

Architecture Diagram

User Browser
    ↓
Session (encrypted cookie)
    ├─ plaid_access_token (encrypted)
    └─ refresh_token (encrypted)
    ↓
Plaid API (your "database")
    ├─ Transactions (Plaid stores)
    ├─ Accounts (Plaid stores)
    ├─ Balances (Plaid stores)
    └─ Credit data (Plaid stores)

Your Database = EMPTY (or only stores subscription/billing)

Implementation

class PlaidAsDatabase:
    """Plaid is our database. We only store access tokens."""
    
    def __init__(self):
        self.plaid_client = plaid.ApiClient(
            configuration=plaid.Configuration(
                host=plaid.Environment.Sandbox,
                api_key=os.getenv('PLAID_SECRET')
            )
        )
    
    def get_transactions(self, access_token, days=90):
        """Query Plaid (your database)"""
        start_date = (datetime.now() - timedelta(days=days)).date()
        end_date = datetime.now().date()
        
        response = self.plaid_client.transactions_get(
            access_token=access_token,
            start_date=start_date,
            end_date=end_date
        )
        
        # Don't store in your database
        # Return directly to frontend
        return response.to_dict()
    
    def get_spending_by_category(self, access_token):
        """Aggregate from Plaid, don't store"""
        transactions = self.get_transactions(access_token)
        
        # Aggregate in memory (don't store)
        by_category = {}
        for txn in transactions['transactions']:
            category = txn['personal_finance_category']['primary']
            if category not in by_category:
                by_category[category] = 0
            by_category[category] += abs(txn['amount'])
        
        return by_category
    
    def get_credit_score(self, access_token):
        """Query Plaid, don't store"""
        response = self.plaid_client.credit_report_get(
            access_token=access_token
        )
        
        score = response['credit_report']['scores'][0]['value']
        return score

# Flask routes - no database queries
@app.route('/api/dashboard')
@require_auth
def get_dashboard():
    token = session['plaid_token']  # From encrypted cookie
    
    plaid_db = PlaidAsDatabase()
    
    return jsonify({
        'transactions': plaid_db.get_transactions(token, days=30),
        'spending': plaid_db.get_spending_by_category(token),
        'credit_score': plaid_db.get_credit_score(token)
    })

Pros ✅

  • Zero data breach risk - Nothing in your database
  • GDPR/CCPA effortless - No customer data to manage
  • Live data always - Real-time from Plaid
  • Plaid handles compliance - SOC 2 Type II certified
  • Plaid handles backups - Not your responsibility
  • Plaid handles security - Enterprise security

Cons ❌

  • Plaid dependency - If Plaid is down, app is down
  • Rate limiting - Plaid has API quotas per user
  • Slower - API call per request (~200-500ms)
  • No historical trends - Can't show "spending over 6 months"
  • Costs scale - More users = more Plaid API calls

Best For

  • Production apps where data liability is concern
  • Companies without data engineering resources
  • When Plaid uptime is acceptable
  • Budget-conscious startups

🎯 Option 3: Minimal Cache (Best Balance)

Store: Access token + last 7 days of transactions (metadata only) Not Stored: Personal info, SSN, credit scores Benefit: Better UX + minimal compliance burden

How It Works

class MinimalCacheArchitecture:
    """Cache only transaction metadata, not personal data"""
    
    def __init__(self):
        self.redis = redis.Redis()  # In-memory cache
    
    def sync_transactions(self, user_id, access_token):
        """Sync only transaction METADATA to Redis (TTL: 7 days)"""
        
        # Fetch from Plaid
        transactions = plaid_client.transactions_get(access_token)
        
        # Store ONLY metadata (no sensitive data)
        cache_data = [
            {
                'id': txn['transaction_id'],
                'date': txn['date'],
                'amount': txn['amount'],
                'category': txn['personal_finance_category']['primary'],
                'merchant': txn['merchant_name']
                # NOT stored: name, address, SSN, credit info
            }
            for txn in transactions['transactions']
        ]
        
        # Store in Redis with 7-day expiry
        self.redis.setex(
            f"user:{user_id}:transactions",
            timedelta(days=7),
            json.dumps(cache_data)
        )
    
    def get_dashboard(self, user_id, access_token):
        """Serve from cache, skip sensitive data"""
        
        # Try cache first
        cached = self.redis.get(f"user:{user_id}:transactions")
        if cached:
            return json.loads(cached)  # Fast, cached response
        
        # Cache miss: sync from Plaid
        self.sync_transactions(user_id, access_token)
        return self.get_dashboard(user_id, access_token)

# No SQL database needed, just Redis cache

Pros ✅

  • Minimal data liability - Only transaction metadata
  • Faster UX - Served from Redis cache
  • Still GDPR-friendly - 7-day auto-deletion
  • Scales better - Fewer Plaid API calls
  • Simple compliance - No PII stored

Cons ❌

  • Still need infrastructure - Redis server required
  • Cache invalidation - "Is cache stale?" complexity
  • Eventual consistency - Data lags behind Plaid by up to 7 days

Best For

  • Scaling from MVP to production
  • Apps that need better performance
  • When you want SOME caching without full database

🎯 Option 4: Hybrid (My Recommendation)

Layer 1: httpOnly encrypted session cookie

  • Plaid access token
  • User ID
  • Session metadata

Layer 2: Redis cache (optional, 7-day TTL)

  • Transaction metadata only (no PII)
  • Spending aggregates (calculated on-the-fly)
  • User preferences

Layer 3: Postgres (minimal)

  • Only: user ID, email, subscription status
  • NO financial data
  • NO personal data

Why This Works

User logs in
    ↓
Session cookie stores: access_token
    ↓
Check Redis for cached transactions
    ├─ Hit: Return from cache (fast)
    └─ Miss: Fetch from Plaid + cache in Redis
    ↓
Postgres stores: { user_id, email, subscription }
    (Just enough for billing/auth)

Data liability: MINIMAL
Performance: FAST
Compliance: TRIVIAL (no PII to manage)

Code Example

from flask import session
import redis
from sqlalchemy import Column, String
from datetime import timedelta

# Minimal Postgres model
class User(db.Model):
    id = Column(UUID, primary_key=True)
    email = Column(String(255), unique=True)  # Only email
    subscription = Column(String(50))         # premium/free
    created_at = Column(DateTime)
    # NO credit data, NO transactions, NO personal info

# Session (encrypted cookie)
session['plaid_token']      # Plaid access token
session['user_id']          # User ID
session.cookie_secure = True
session.cookie_httponly = True

# Redis cache (7-day TTL)
redis.setex(
    f"cache:txns:{user_id}",
    timedelta(days=7),
    transaction_data_json
)

# Hybrid dashboard
@app.route('/dashboard')
def dashboard():
    user_id = session['user_id']
    access_token = session['plaid_token']
    
    # Try cache first
    cached = redis.get(f"cache:txns:{user_id}")
    if cached:
        return render_template('dashboard.html', 
                             transactions=json.loads(cached))
    
    # Cache miss: fetch from Plaid
    txns = plaid_client.transactions_get(access_token)
    
    # Cache for 7 days
    redis.setex(f"cache:txns:{user_id}", 
                timedelta(days=7),
                json.dumps(txns))
    
    return render_template('dashboard.html', transactions=txns)

# Compliance is trivial
# User deletion: Just clear their session + Redis cache
@app.route('/api/delete-me', methods=['DELETE'])
def delete_account():
    user_id = session['user_id']
    
    User.query.filter_by(id=user_id).delete()  # Delete email only
    redis.delete(f"cache:txns:{user_id}")      # Clear cache
    session.clear()                             # Clear session
    
    db.commit()
    return {'status': 'deleted'}

📊 Comparison Table

Feature Session Only Plaid as DB Minimal Cache Hybrid
Data Breach Risk Zero ✅ Zero ✅ Very Low ✅ Very Low ✅
GDPR Compliance Trivial Trivial Easy Easy
API Calls/User High High Low ✅ Low ✅
Performance Slow Slow Fast ✅ Fast ✅
Cost Low Medium Medium Medium
Complexity Low ✅ Low ✅ Medium Medium
Infrastructure Nothing Nothing Redis Redis + Postgres
Can Show Trends No No Depends Yes ✅
Offline Mode No No 7 days 7 days

🚀 Recommended Path

For MVP (First 3 months)

Use Option 1: Session-Only

- No database
- No infrastructure
- Deploy to Render free tier
- Focus on features
- Acceptable performance for low traffic

For Production (3-6 months)

Switch to Option 4: Hybrid

- Redis cache for performance
- Minimal Postgres for billing/auth
- Still minimal compliance burden
- Scales to 10k+ users

For Scale (6+ months)

Add real database only if:

  • You need 6+ months of historical data
  • You're offering trend analysis
  • You're willing to invest in compliance
  • User data is a business asset

🔐 Security Wins

With stateless/minimal storage:

❌ Data breach = user info NOT exposed
   (Only encrypted token was stored)

❌ Ransomware = attacker gets NOTHING valuable
   (No customer data to exfiltrate)

❌ Disgruntled employee = can't steal customer data
   (They can steal code, not data)

❌ SQL injection = no customer data exposed
   (Attacker gets subscription status, not transactions)

✅ GDPR requests = "We don't store your data"
   (Literally nothing to export/delete)

📋 Decision Framework

Choose Option 1 (Session-Only) if:

  • Building MVP
  • <1000 users
  • Want zero infrastructure
  • Can tolerate 200-500ms latency
  • Acceptable Plaid API limits

Choose Option 4 (Hybrid) if:

  • Ready for production
  • Need better performance
  • Want historical data (but not 6+ months)
  • Have some infrastructure budget
  • Need faster GDPR responses

Choose traditional database only if:

  • Offering complex analytics
  • Need 6+ months of history
  • Building ML features
  • Willing to invest in compliance
  • Data is your product

⚡ Implementation Priority

  1. Week 1: Refactor to session-only storage

    • Remove user data storage from app.py
    • Keep only: user_id, plaid_token in session
    • Update templates to request fresh data
  2. Week 2: Add Redis caching (optional)

    • Cache transaction metadata only
    • 7-day TTL
    • No PII/sensitive data
  3. Week 3: Simplify database

    • Keep only: User (id, email, subscription)
    • Move everything else to session/cache
  4. Week 4: Document compliance

    • "We don't store customer financial data"
    • "Data is encrypted in transit"
    • "GDPR: No data to export/delete"

💡 Bottom Line

You can eliminate ~95% of your data compliance burden by not storing customer data.

It seems counterintuitive, but:

  • Storing less data = fewer security concerns
  • Fewer security concerns = easier compliance
  • Easier compliance = less expensive to operate
  • Less expensive = more profit margin

The question isn't "How do we securely store customer data?" The question is "Do we need to store customer data at all?"

Often the answer is: No.