Successfully implemented a flexible, interface-based image generation system that supports multiple AI providers. Teachers can now choose between Azure OpenAI DALL-E 3 and Ideogram AI for generating educational images.
Created a clean separation between the image generation interface and provider implementations:
┌─────────────────────────────────┐
│   ImageGenerationService        │ ← Main service (facade)
│   - Manages caching             │
│   - Routes to correct provider  │
└─────────────────────────────────┘
                 ↓
┌─────────────────────────────────┐
│   ImageGenerator Interface      │
│   - GenerateImage()             │
│   - GetProviderName()           │
│   - IsConfigured()              │
└─────────────────────────────────┘
          ↙                ↘
┌──────────────────┐  ┌──────────────────┐
│ AzureOpenAI      │  │ Ideogram         │
│ Generator        │  │ Generator        │
└──────────────────┘  └──────────────────┘
Core types:
- ImageGenerator interface
- ImageCacheManager interface
- ImageGeneratorOptions struct
- GeneratedImageResult struct

File-based cache manager:
- Implements ImageCacheManager
- MD5-based caching
- Local file storage
- Cache statistics and management

Azure OpenAI generator:
- Implements ImageGenerator
- DALL-E 3 integration
- Educational prompt engineering
- Size/quality/style options

Ideogram generator:
- Implements ImageGenerator
- Ideogram AI V_2 integration
- RESTful API client
- Safety filtering

ImageGenerationService:
- Provider selection based on config
- Unified caching layer
- Backward compatibility
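
As a rough illustration of the interface-based design, the core types might look like the Go sketch below. The three method names come from the diagram; the struct fields, the mockGenerator type, and the describe helper are assumptions made for this example and may not match the project's actual definitions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ImageGeneratorOptions carries the request parameters. The fields mirror
// the usage examples in this document; the real struct may differ.
type ImageGeneratorOptions struct {
	Word        string
	Translation string
	Size        string
	Quality     string
	Style       string
}

// GeneratedImageResult is an assumed result shape (remote URL plus local copy).
type GeneratedImageResult struct {
	ImageURL  string
	LocalPath string
	Provider  string
}

// ImageGenerator is the contract every provider implements.
type ImageGenerator interface {
	GenerateImage(ctx context.Context, opts ImageGeneratorOptions) (*GeneratedImageResult, error)
	GetProviderName() string
	IsConfigured() bool
}

// mockGenerator is a hypothetical provider used only to exercise the interface.
type mockGenerator struct{ apiKey string }

func (m *mockGenerator) GenerateImage(ctx context.Context, opts ImageGeneratorOptions) (*GeneratedImageResult, error) {
	if !m.IsConfigured() {
		return nil, errors.New("mock provider is not configured")
	}
	return &GeneratedImageResult{
		ImageURL: "https://example.invalid/" + opts.Word + ".png",
		Provider: m.GetProviderName(),
	}, nil
}

func (m *mockGenerator) GetProviderName() string { return "Mock" }
func (m *mockGenerator) IsConfigured() bool      { return m.apiKey != "" }

// Compile-time check that mockGenerator satisfies the interface.
var _ ImageGenerator = (*mockGenerator)(nil)

// describe calls any provider through the interface and summarizes the result.
func describe(gen ImageGenerator, word string) (string, error) {
	res, err := gen.GenerateImage(context.Background(), ImageGeneratorOptions{Word: word})
	if err != nil {
		return "", err
	}
	return res.Provider + " " + res.ImageURL, nil
}

func main() {
	out, err := describe(&mockGenerator{apiKey: "test-key"}, "apple")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because callers depend only on ImageGenerator, a new provider can be swapped in without touching code such as describe.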
Added to backend/config/config.go:
// Ideogram Configuration
IdeogramAPIKey string
// Image Generation Provider
ImageGenerationProvider string // "azure" or "ideogram"

# Choose provider
IMAGE_GENERATION_PROVIDER=azure # or "ideogram"
# Azure OpenAI (if using Azure)
AZURE_OPENAI_KEY=your_key
AZURE_OPENAI_ENDPOINT=https://...
AZURE_OPENAI_DEPLOYMENT=dall-e-3
# Ideogram (if using Ideogram)
IDEOGRAM_API_KEY=your_key
# Caching (works with both)
IMAGE_CACHE_ENABLED=true

| Feature | Azure DALL-E 3 | Ideogram AI |
|---|---|---|
| Setup Complexity | High (Azure resource + deployment) | Low (API key only) |
| Image Style | Cartoon, artistic | Realistic, design |
| Text Rendering | Moderate | Excellent |
| Speed | 10-30 sec | 5-15 sec |
| Cost | $0.04-$0.12/image | $0.08/image |
| Best For | Cartoons, playful scenes | Realistic images, labels |
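
The environment variables listed before the comparison table could be read into a config struct along the lines of the sketch below. IdeogramAPIKey and ImageGenerationProvider are the field names given in this document; the other field names, the loadImageConfig function, and the choice to treat an unset or invalid IMAGE_CACHE_ENABLED as false are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ImageConfig groups the image-related settings. Only IdeogramAPIKey and
// ImageGenerationProvider are named in the document; the rest are assumed.
type ImageConfig struct {
	AzureOpenAIKey          string
	AzureOpenAIEndpoint     string
	AzureOpenAIDeployment   string
	IdeogramAPIKey          string
	ImageGenerationProvider string // "azure" or "ideogram"
	ImageCacheEnabled       bool
}

// loadImageConfig reads settings through getenv so it can be tested
// without modifying the process environment.
func loadImageConfig(getenv func(string) string) ImageConfig {
	provider := strings.ToLower(strings.TrimSpace(getenv("IMAGE_GENERATION_PROVIDER")))
	if provider == "" {
		provider = "azure" // the document treats Azure as the default provider
	}
	// ParseBool accepts "true", "1", "false", "0" and similar spellings.
	cacheEnabled, err := strconv.ParseBool(getenv("IMAGE_CACHE_ENABLED"))
	if err != nil {
		cacheEnabled = false // assumption: unset or invalid means disabled
	}
	return ImageConfig{
		AzureOpenAIKey:          getenv("AZURE_OPENAI_KEY"),
		AzureOpenAIEndpoint:     getenv("AZURE_OPENAI_ENDPOINT"),
		AzureOpenAIDeployment:   getenv("AZURE_OPENAI_DEPLOYMENT"),
		IdeogramAPIKey:          getenv("IDEOGRAM_API_KEY"),
		ImageGenerationProvider: provider,
		ImageCacheEnabled:       cacheEnabled,
	}
}

func main() {
	cfg := loadImageConfig(os.Getenv)
	fmt.Printf("provider=%s cache=%t\n", cfg.ImageGenerationProvider, cfg.ImageCacheEnabled)
}
```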
Provider flexibility:
- Clean interface for adding new providers
- Easy switching via configuration
- No code changes needed to switch providers

Caching:
- Works with all providers
- MD5-based cache keys
- Automatic download and storage
- Cache stats and management

Educational prompts:
- Child-safe content (ages 4-12)
- Bright, colorful, cheerful style
- Clear visual representation
- Provider-specific optimization

Error handling:
- Graceful degradation
- Clear error messages
- Provider-specific diagnostics

Backward compatibility:
- Existing code continues to work
- Type aliases for smooth migration
- Same API surface
1. User sets IMAGE_GENERATION_PROVIDER in .env
2. NewImageGenerationService() reads config
3. Switch statement creates appropriate generator:
   - "ideogram" → NewIdeogramGenerator()
   - "azure" or default → NewAzureOpenAIGenerator()
4. Service wraps generator + cache
5. All requests go through unified interface
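
The selection steps above can be sketched as a small factory. This is a simplified illustration, not the project's NewImageGenerationService: the providerConfig fields, the trimmed-down generator interface, and the decision to return an error when the chosen provider lacks credentials are all assumptions.

```go
package main

import "fmt"

// providerConfig holds only the settings this sketch needs (names assumed).
type providerConfig struct {
	AzureKey       string
	AzureEndpoint  string
	IdeogramAPIKey string
}

// generator is a reduced version of the ImageGenerator interface.
type generator interface {
	GetProviderName() string
	IsConfigured() bool
}

type azureGenerator struct{ key, endpoint string }

func (g *azureGenerator) GetProviderName() string { return "Azure OpenAI" }
func (g *azureGenerator) IsConfigured() bool      { return g.key != "" && g.endpoint != "" }

type ideogramGenerator struct{ key string }

func (g *ideogramGenerator) GetProviderName() string { return "Ideogram" }
func (g *ideogramGenerator) IsConfigured() bool      { return g.key != "" }

// newGenerator maps the configured provider name to a concrete generator.
// "ideogram" selects Ideogram; "azure" and any other value fall back to Azure.
func newGenerator(provider string, cfg providerConfig) (generator, error) {
	var gen generator
	switch provider {
	case "ideogram":
		gen = &ideogramGenerator{key: cfg.IdeogramAPIKey}
	default:
		gen = &azureGenerator{key: cfg.AzureKey, endpoint: cfg.AzureEndpoint}
	}
	if !gen.IsConfigured() {
		return nil, fmt.Errorf("provider %q is selected but not configured", gen.GetProviderName())
	}
	return gen, nil
}

func main() {
	gen, err := newGenerator("ideogram", providerConfig{IdeogramAPIKey: "test-key"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using", gen.GetProviderName())
}
```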
1. Request comes in with word + translation
2. Check if provider is configured
3. Build cache key from prompt
4. Check cache (if enabled)
   ├─ Hit: Return cached image
   └─ Miss: Continue
5. Call provider-specific GenerateImage()
6. Download and cache result
7. Return image URL + local path
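
The request flow above can be sketched as a cache-aside wrapper around the provider call. To keep the example self-contained, an in-memory map stands in for the file-based cache, the provider is a plain function, and the download step is omitted; buildPrompt, the result fields, and the service struct are hypothetical names, not the project's API.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"errors"
	"fmt"
)

// result is an assumed response shape: remote URL plus local cache path.
type result struct {
	ImageURL  string
	LocalPath string
	FromCache bool
}

// memoryCache stands in for the file-based cache manager.
type memoryCache map[string]result

type service struct {
	cache        memoryCache
	cacheEnabled bool
	configured   bool
	generate     func(prompt string) (string, error) // provider call, returns an image URL
}

// buildPrompt is a placeholder for the educational prompt template.
func buildPrompt(word, translation string) string {
	return "child-friendly illustration of " + word + " (" + translation + ")"
}

// cacheKey derives an MD5-based key from the prompt.
func cacheKey(prompt string) string {
	sum := md5.Sum([]byte(prompt))
	return hex.EncodeToString(sum[:])
}

func (s *service) GenerateImage(word, translation string) (result, error) {
	if !s.configured {
		return result{}, errors.New("image provider is not configured")
	}
	key := cacheKey(buildPrompt(word, translation))
	if s.cacheEnabled {
		if hit, ok := s.cache[key]; ok {
			hit.FromCache = true
			return hit, nil // cache hit: no provider call, no cost
		}
	}
	url, err := s.generate(buildPrompt(word, translation))
	if err != nil {
		return result{}, fmt.Errorf("generate image: %w", err)
	}
	res := result{ImageURL: url, LocalPath: "image-cache/" + key + ".png"}
	if s.cacheEnabled {
		s.cache[key] = res // a real implementation would also download the file here
	}
	return res, nil
}

func main() {
	svc := &service{
		cache:        memoryCache{},
		cacheEnabled: true,
		configured:   true,
		generate:     func(string) (string, error) { return "https://example.invalid/apple.png", nil },
	}
	for i := 0; i < 2; i++ {
		res, err := svc.GenerateImage("apple", "蘋果")
		fmt.Println(res.FromCache, res.LocalPath, err)
	}
}
```

The second call in main is served from the cache, which is where the cost and latency savings described in this document come from.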
// Frontend code works the same
const result = await imageGenerationService.generateImage({
  word: "apple",
  translation: "蘋果",
  size: "1024x1024",
  quality: "standard",
  style: "vivid"
});

// Service automatically uses configured provider
result, err := service.GenerateImage(ctx, opts)
// Could be Azure or Ideogram - transparent to caller!

To add a new provider (e.g., Stability AI, Midjourney):
// services/stability_generator.go
import (
	"context"
	"errors"
)

type StabilityGenerator struct {
	apiKey string
}

func (g *StabilityGenerator) GenerateImage(ctx context.Context, opts ImageGeneratorOptions) (*GeneratedImageResult, error) {
	// Implementation goes here; return an error until it is written.
	return nil, errors.New("stability generator: not implemented")
}

func (g *StabilityGenerator) GetProviderName() string {
	return "Stability AI"
}

func (g *StabilityGenerator) IsConfigured() bool {
	return g.apiKey != ""
}

// config/config.go
StabilityAPIKey string

// services/image_generation_service.go
case "stability":
	gen, err := NewStabilityGenerator()
	// ...

IMAGE_GENERATION_PROVIDER=stability
STABILITY_API_KEY=your_key

Done! No changes to handlers, routes, or frontend needed.
IMAGE_GENERATION_PROVIDER=azure
go run main.go
# Generate an image

IMAGE_GENERATION_PROVIDER=ideogram
go run main.go
# Generate an image

# Generate with Azure
IMAGE_GENERATION_PROVIDER=azure
# Click "Generate Image" - uses Azure
# Switch to Ideogram
IMAGE_GENERATION_PROVIDER=ideogram
# Restart backend
# Click "Generate Image" - uses Ideogram
# Compare results!

- docs/AZURE_OPENAI_SETUP.md - Azure DALL-E 3 setup
- docs/IDEOGRAM_SETUP.md - Ideogram AI setup
- docs/IMAGE_GENERATION_ERROR_FIX.md - Troubleshooting
- docs/TEACHER_IMAGE_GENERATION_GUIDE.md - Teacher guide
- services/image_generator_interface.go - Interface documentation
- Code comments in all provider implementations
| Scenario | First Request | Subsequent Requests |
|---|---|---|
| Azure (no cache) | 15-30 sec, $0.04-$0.12 | 15-30 sec, $0.04-$0.12 |
| Azure (with cache) | 15-30 sec, $0.04-$0.12 | <100ms, $0.00 |
| Ideogram (no cache) | 5-15 sec, $0.08 | 5-15 sec, $0.08 |
| Ideogram (with cache) | 5-15 sec, $0.08 | <100ms, $0.00 |
- Location: backend/uploads/image-cache/
- Format: PNG files with MD5 hash names
- Average size: 200-500KB per image
- 1000 images ≈ 200-500MB storage
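
A minimal sketch of how a cached file name and the storage estimate could be derived. It assumes the MD5 hash is taken over the prompt text and that 1 MB = 1000 KB; cachePath and storageRangeMB are hypothetical helpers, not functions from the codebase.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// cacheDir is the cache location given in this document.
const cacheDir = "backend/uploads/image-cache"

// cachePath returns the PNG path for a prompt: <cacheDir>/<md5 hex>.png.
func cachePath(prompt string) string {
	sum := md5.Sum([]byte(prompt))
	return filepath.Join(cacheDir, hex.EncodeToString(sum[:])+".png")
}

// storageRangeMB estimates disk usage from the 200-500 KB per-image average.
func storageRangeMB(images int) (low, high float64) {
	return float64(images) * 200 / 1000, float64(images) * 500 / 1000
}

func main() {
	fmt.Println(cachePath("child-friendly illustration of apple"))
	low, high := storageRangeMB(1000)
	fmt.Printf("1000 images: %.0f-%.0f MB\n", low, high)
}
```

Identical prompts always map to the same file name, which is what lets a repeated request be answered from disk.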
Without Caching:
- Azure DALL-E 3: $40-$120
- Ideogram: $80
With Caching (one-time generation):
- Azure DALL-E 3: $40-$120 (first time), then $0
- Ideogram: $80 (first time), then $0
Best Practice: Pre-generate images for common words during setup, with caching enabled.
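
The totals above are consistent with 1,000 images at the per-image prices in the comparison table ($0.04-$0.12 for Azure, $0.08 for Ideogram); the image count is inferred here, not stated. The sketch below reproduces that arithmetic in integer cents and shows how repeated requests change the picture. The pricing type and estimateCents function are illustrative names only.

```go
package main

import "fmt"

// pricing holds a per-image price range in cents, taken from the comparison table.
type pricing struct {
	name      string
	lowCents  int
	highCents int
}

var (
	azureDallE3 = pricing{name: "Azure DALL-E 3", lowCents: 4, highCents: 12}
	ideogram    = pricing{name: "Ideogram", lowCents: 8, highCents: 8}
)

// estimateCents returns the total cost range in cents. With caching, each
// unique image is paid for once; without it, every request is billed.
func estimateCents(p pricing, uniqueImages, requestsPerImage int, cacheEnabled bool) (low, high int) {
	paid := uniqueImages * requestsPerImage
	if cacheEnabled {
		paid = uniqueImages
	}
	return paid * p.lowCents, paid * p.highCents
}

func main() {
	for _, p := range []pricing{azureDallE3, ideogram} {
		for _, cached := range []bool{false, true} {
			low, high := estimateCents(p, 1000, 3, cached)
			fmt.Printf("%s cache=%t: $%.2f-$%.2f\n", p.name, cached, float64(low)/100, float64(high)/100)
		}
	}
}
```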
- Never commit .env to Git
- Use environment variables in production
- Rotate keys regularly
- Azure: Built-in content filters
- Ideogram: Safety checks + negative prompts
- Both: Educational prompt templates
- Teacher and Admin only
- JWT authentication required
- Rate limiting recommended (future enhancement)
More Providers:
- Stability AI
- Midjourney (when API available)
- Replicate models

Advanced Features:
- Image variations
- Style mixing
- Batch optimization
- Custom prompt templates per provider

Analytics:
- Provider performance tracking
- Cost analytics per provider
- Quality ratings from teachers

UI Enhancements:
- Provider selection in UI
- Side-by-side comparison
- Regenerate with different provider
The refactored system is backward compatible. Existing code works without changes:
// Old way (still works)
opts := ImageGenerationOptions{
	Word:        "apple",
	Translation: "蘋果",
}

// New way (same result)
opts := ImageGeneratorOptions{
	Word:        "apple",
	Translation: "蘋果",
}

Type aliases ensure a smooth transition.
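
For readers unfamiliar with Go type aliases, here is a minimal sketch of how the old name can be kept alive. It assumes the alias is declared next to the new struct; the two-field struct and the describe helper are simplifications for illustration.

```go
package main

import "fmt"

// ImageGeneratorOptions is the new name (fields reduced for this sketch).
type ImageGeneratorOptions struct {
	Word        string
	Translation string
}

// ImageGenerationOptions is the legacy name. The "=" makes it an alias,
// so both names refer to the exact same type rather than two distinct ones.
type ImageGenerationOptions = ImageGeneratorOptions

// describe accepts the new type; values declared with the old name also work.
func describe(opts ImageGeneratorOptions) string {
	return opts.Word + " / " + opts.Translation
}

func main() {
	oldOpts := ImageGenerationOptions{Word: "apple", Translation: "蘋果"}
	newOpts := ImageGeneratorOptions{Word: "apple", Translation: "蘋果"}
	fmt.Println(describe(oldOpts), oldOpts == newOpts)
}
```

Because an alias introduces no new type, no conversions are needed at call sites, which is what allows existing callers to compile unchanged.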
- Interface-based architecture for multiple providers
- Azure OpenAI DALL-E 3 provider (refactored)
- Ideogram AI provider (new)
- File-based cache manager (extracted)
- Provider selection via configuration
- Comprehensive documentation
- Backward compatibility
- Flexibility: Easy to switch providers or add new ones
- Cost optimization: Choose provider based on budget
- Quality options: Pick best provider for content type
- Future-proof: Interface-based for easy extensions
- Maintainability: Clean separation of concerns
# In .env, choose your provider:
IMAGE_GENERATION_PROVIDER=azure # Use Azure DALL-E 3
# OR
IMAGE_GENERATION_PROVIDER=ideogram # Use Ideogram AI

- Configure desired provider in .env
- Compare providers for your use case
- Enable caching for cost savings
- Pre-generate common vocabulary images
Implementation Date: January 2025
Version: 2.0 (Multi-Provider)
Providers Supported: Azure OpenAI DALL-E 3, Ideogram AI
Architecture: Interface-based, extensible, cached