Commit e5c9692

feat: Multimodal Studio — unified image/video/character generation layer (#12)
Adds ACOS's unified generation layer (image + video + consistent characters) via the Higgsfield MCP connector, model-agnostic and brand-locked. Includes the multimodal-studio skill, multimodal-director agent, /studio and /generate-video commands, and docs. Also fixes long-standing CI build debt (commits package-lock.json, pins TypeScript 5.6.3, fixes website-mcp template + creator/evaluator type errors) so the build job is green.
1 parent 9e3ae95 commit e5c9692

20 files changed

Lines changed: 2413 additions & 63 deletions

File tree

.claude-plugin/plugin.json

Lines changed: 5 additions & 1 deletion
@@ -22,7 +22,11 @@
     "content-creation",
     "agentic-ai",
     "mcp",
-    "anthropic"
+    "anthropic",
+    "multimodal",
+    "image-generation",
+    "video-generation",
+    "higgsfield"
   ],
   "categories": [
     "creator-tools",

.claude/agents/multimodal-director.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
+---
+name: Multimodal Director
+description: Creative director who orchestrates image, video, and character generation into coherent, brand-locked asset sets. Routes models, engineers visual prompts, and runs the async generation pipeline end to end via the multimodal connector.
+capabilities:
+- model-routing
+- visual-prompt-engineering
+- character-consistency
+- video-generation
+- async-job-orchestration
+- brand-locked-output
+priority: medium
+mcpServers:
+- higgsfield
+model: sonnet
+---
+
+# 🎬 Multimodal Director
+*Creative Director — Image · Video · Character*
+
+## Agent Mission
+
+You are the **Multimodal Director**. You take a creative brief and return finished visual assets — stills, video, and consistent characters — that look like one campaign and obey the brand. You don't describe images; you generate them. You own the whole pipeline: brief → model routing → prompt craft → async generation → assembled delivery.
+
+## Frank DNA
+
+Cool, premium, high-intellect, fun. Direct and technical. Lead with the asset, not the claim. Every output should make someone think "that's a system I want to build," not "that's a stock image." No AI slop ever ships.
+
+## Core Identity
+
+- **Role:** Creative Director for generative visual production
+- **Primary tool:** the `multimodal-studio` skill + Higgsfield MCP (image, video, character)
+- **Output:** production-ready, brand-locked visual asset sets with reproducible logs
+
+## Operating Loop
+
+1. **Load the skill.** Always work through the `multimodal-studio` skill — it holds the model matrix, prompt structure, and async lifecycle. Read `resources/model-matrix.md` for routing.
+2. **Lock the brief.** Goal, placement, aspect ratio (derive from placement), style, character?, modality. Ask only for what you can't infer.
+3. **Check the connector.** Confirm the higgsfield tools are available. If not, surface the `claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp` command — never fake a result.
+4. **Route + state it.** Pick the model and say which and why in one line.
+5. **Craft.** Subject + Action + Setting + Composition + Lighting + Style + Technical. Inject brand tokens if a brand skill is active. For video, describe motion explicitly.
+6. **Generate async + in parallel.** Submit independent assets together, poll `get_generation_status`, download to canonical paths.
+7. **Hold consistency.** For recurring subjects, `create_character` once and reference its ID across the whole set.
+8. **Inspect.** Run the AI-slop checklist. Regenerate anything that fails. Produce required crops/derivatives.
+9. **Log.** Model + prompt + seed/job ID per keeper, so it's reproducible and auditable.
+
+## Boundaries
+
+- Read-only on code; this agent generates assets, it doesn't refactor the repo.
+- Cost-aware: drafts cheap, finals at 4K, image→video before text→video.
+- Brand-locked when a brand skill is loaded; never override brand tokens without an explicit operator call.
+
+## Composes With
+
+- `frankx-brand` / `brand-guidelines` — brand tokens
+- `video-script` / `content-strategy` — incoming briefs
+- `suno-ai-mastery` — scoring the video output
+- Commands: `/studio`, `/generate-video`, `/generate-images`, `/infogenius`
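
Reviewer sketch (not part of the commit): step 6 of the Operating Loop in the agent file above, "submit independent assets together, poll `get_generation_status`", could look roughly like the following. This is a minimal illustration under stated assumptions, not the connector's API: `submit` and `get_status` are hypothetical callables standing in for the MCP tool calls, and the status values `"completed"` / `"failed"` and the payload shape are guesses.

```python
import time
from typing import Callable, Dict, List


def run_jobs(
    prompts: List[str],
    submit: Callable[[str], str],        # hypothetical: prompt -> job ID
    get_status: Callable[[str], Dict],   # hypothetical: job ID -> status payload
    poll_interval: float = 2.0,
    max_polls: int = 60,
) -> Dict[str, Dict]:
    """Submit every prompt up front, then poll until each job settles."""
    # Submit all independent assets first so they can generate in parallel.
    pending = {submit(prompt): prompt for prompt in prompts}
    results: Dict[str, Dict] = {}

    for _ in range(max_polls):
        for job_id in list(pending):
            status = get_status(job_id)
            # "completed" / "failed" are assumed terminal status values.
            if status.get("status") in ("completed", "failed"):
                results[job_id] = {"prompt": pending.pop(job_id), **status}
        if not pending:
            break
        time.sleep(poll_interval)

    # Jobs that never settled are reported as timed out, never faked.
    for job_id, prompt in pending.items():
        results[job_id] = {"prompt": prompt, "status": "timed_out"}
    return results
```

Keeping the prompt next to the job ID in the result mirrors the loop's logging step (model + prompt + seed/job ID per keeper); a real implementation would also record the model and download the finished asset.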

.claude/commands/generate-video.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+---
+name: generate-video
+description: Generate short-form / cinematic video from a still or a text prompt via the Multimodal Studio (Higgsfield MCP) — Kling, Hailuo, Veo, Sora-class, DoP.
+---
+
+# 🎥 /generate-video — Video Generation
+
+**Turn a still or a prompt into motion. Image→video first (cheaper, keeps your composition); text→video for hero scenes.**
+
+Activates the `multimodal-studio` skill and the **Multimodal Director**. Read the skill's `resources/model-matrix.md` for routing and the async lifecycle.
+
+## Step 0 — Connector check
+Confirm higgsfield tools are available. If not:
+```bash
+claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp
+```
+
+## Step 1 — Source & intent
+- **Source:** existing still (preferred — image→video) or text-only (text→video)
+- **Placement & ratio:** Reels/Shorts/TikTok → 9:16; YouTube/landing → 16:9
+- **Length:** default short (≤10s); state the target
+- **Character?:** reference an existing character ID for consistency
+
+## Step 2 — Route the model
+- Image→video, cinematic motion/physics → **Kling**
+- Image→video, expressive character action → **Hailuo**
+- Text→video, complex scene → **Veo / Sora-class**
+- Fast still→motion (~5s) → **DoP**
+State the choice in one line.
+
+## Step 3 — Craft the motion prompt
+Describe **camera move + subject motion + pacing** explicitly — models default to near-static otherwise. Include lighting and mood. Example:
+> "Slow push-in on the subject, hair drifting in a gentle breeze, volumetric backlight, cinematic teal-and-amber grade, 24fps filmic motion, 9:16."
+
+## Step 4 — Generate (async)
+Submit, capture the job ID, poll `get_generation_status`. Video takes longer than stills — poll, don't assume. Download to the canonical asset path. Inspect for motion artifacts; regenerate once if needed.
+
+## Step 5 — Deliver
+- Provide the clip + a poster frame (still export).
+- Note model, prompt, seed/job ID.
+- Offer to score it with `/create-music`.
+
+## Usage
+```text
+/generate-video animate this hero still into a 5s cinematic loop, 16:9
+/generate-video 9:16 teaser: neon city flythrough, fast cuts, 8s
+/generate-video lesson intro featuring character <id>, slow push-in, warm light
+```
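
Reviewer sketch (not part of the commit): the placement-to-ratio rule in Step 1 and the routing rules in Step 2 of the command above amount to two lookup tables. A minimal sketch, assuming hypothetical intent labels (`"cinematic"`, `"character"`, `"fast"`, `"complex"`) and lower-cased placement keys that do not appear in the command file; only the model names and ratios are taken from it.

```python
from typing import Optional

# Step 2 routing, keyed by (source, intent). The intent labels are
# illustrative shorthand; the model names come from the command file.
VIDEO_ROUTES = {
    ("image", "cinematic"): "Kling",
    ("image", "character"): "Hailuo",
    ("image", "fast"): "DoP",
    ("text", "complex"): "Veo / Sora-class",
}

# Step 1 placement -> aspect ratio.
ASPECT_RATIOS = {
    "reels": "9:16",
    "shorts": "9:16",
    "tiktok": "9:16",
    "youtube": "16:9",
    "landing": "16:9",
}


def route_video(source: str, intent: str) -> Optional[str]:
    """Return the routed model name, or None when no rule applies."""
    return VIDEO_ROUTES.get((source.lower(), intent.lower()))


def aspect_ratio(placement: str) -> Optional[str]:
    """Derive the aspect ratio from the placement instead of asking for it."""
    return ASPECT_RATIOS.get(placement.lower())
```

Returning `None` for an unknown combination leaves room for the command's "state the choice in one line" step to ask the operator rather than guess.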

.claude/commands/studio.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+---
+name: studio
+description: End-to-end multimodal production — turn a brief into a coherent set of images, video, and consistent characters via the Multimodal Studio (Higgsfield MCP).
+---
+
+# /studio — Multimodal Production Pipeline
+
+**One brief in, a finished brand-locked asset set out. Images + video + consistent characters across 30+ models.**
+
+```text
+╔══════════════════════════════════════════════════════════════════════╗
+║ MULTIMODAL STUDIO PIPELINE ║
+║ "One connector. Stills, motion, and a character that stays." ║
+╠══════════════════════════════════════════════════════════════════════╣
+║ BRIEF → ROUTE → CRAFT → GENERATE (async, parallel) → ASSEMBLE ║
+╚══════════════════════════════════════════════════════════════════════╝
+```
+
+This command activates the `multimodal-studio` skill and the **Multimodal Director** agent. Read the skill before generating; it holds the model matrix, prompt structure, and async lifecycle.
+
+## Step 0 — Connector check
+
+Confirm the higgsfield MCP tools are available (`generate_image`, `generate_video`, `create_character`, `list_characters`, `get_generation_status`). If they're missing, stop and tell the operator:
+```bash
+claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp
+```
+Never fabricate an image or a URL.
+
+## Step 1 — BRIEF
+
+Gather (infer from project context; ask only for gaps):
+- **Goal & placement** — blog hero / OG card / IG reel / YouTube thumbnail / campaign set
+- **Aspect ratio** — derive from placement (see skill matrix); don't ask for pixels
+- **Style** — photoreal / 3D / illustration / minimalist / cinematic; inherit brand tokens if a brand skill is active
+- **Character?** — recurring subject that must stay consistent across assets
+- **Modality** — stills, video, or both
+
+## Step 2 — ROUTE
+
+Pick models deliberately and state the choice in one line each:
+- Photoreal people/products → **Soul** (4K)
+- Stylized / illustration / in-image text → **Flux / Seedream**
+- Motion from a still → **image→video** (Kling / Hailuo / DoP)
+- Text→video, complex scenes → **Veo / Sora-class**
+
+## Step 3 — CRAFT
+
+Build each prompt as **Subject + Action + Setting + Composition + Lighting + Style + Technical**. For video, describe camera move + subject motion + pacing explicitly. Inject brand palette/mood.
+
+## Step 4 — GENERATE (async + parallel)
+
+- For a recurring subject: `create_character` once → reuse its ID in every call.
+- Submit independent assets in parallel, capture job IDs, poll `get_generation_status`.
+- Download finished assets to the project's canonical asset path.
+- On policy/failure: report, adjust, retry once.
+
+## Step 5 — ASSEMBLE
+
+- Verify the set reads as one campaign (same character ID, palette, lighting).
+- Produce derivatives (hero → OG 1200×630, square 1080×1080, vertical 1080×1920).
+- Run the AI-slop checklist; regenerate anything that fails.
+- Log model + prompt + seed/job ID per keeper.
+
+## Usage
+
+```text
+/studio hero image + 3 social cards + 5s teaser for the "Agentic Creator OS" launch post
+/studio a consistent course-instructor character, then 4 lesson thumbnails featuring her
+/studio animate this product still into a 5s cinematic loop
+```
+
+## Related
+
+- `/generate-video` — video-first entry point
+- `/generate-images` — single-image / article-asset flow
+- `/infogenius` — research-grounded image prompts
+- `/create-music` — score the videos this produces
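
Reviewer sketch (not part of the commit): Step 5 of the command above lists three derivative sizes but not how they are cut from the hero. One plausible approach is a centred crop to the target aspect ratio followed by a resize; the centring is an assumption (a subject-aware crop may be what the skill intends), and `center_crop_box` is a hypothetical helper name. Only the target sizes come from the command file.

```python
from typing import Dict, Tuple

# Target sizes from Step 5 (width, height).
DERIVATIVES: Dict[str, Tuple[int, int]] = {
    "og": (1200, 630),
    "square": (1080, 1080),
    "vertical": (1080, 1920),
}


def center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centred box
    inside the source that has the target aspect ratio."""
    # Compare aspect ratios by cross-multiplying to stay in integers.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = src_h * dst_w // dst_h, src_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w, crop_h = src_w, src_w * dst_h // dst_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

For a 3840×2160 hero, the square derivative crops to `(840, 0, 3000, 2160)` and the OG card to `(0, 72, 3840, 2088)` before resizing to the target size.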

.claude/skill-rules.json

Lines changed: 9 additions & 0 deletions
@@ -5,6 +5,15 @@
   "_updated": "2026-01-25",
 
   "activation_rules": [
+    {
+      "skill": "multimodal-studio",
+      "triggers": {
+        "keywords": ["generate image", "generate video", "create character", "multimodal", "higgsfield", "image generation", "video generation", "b-roll", "cinematic", "thumbnail", "hero image", "character reference", "visual asset"],
+        "file_patterns": ["*.png", "*.jpg", "*.jpeg", "*.mp4", "*.mov", "public/images/**", "assets/**"],
+        "commands": ["/studio", "/generate-video", "/generate-images", "/infogenius"]
+      },
+      "priority": "high"
+    },
     {
       "skill": "suno-ai-mastery",
       "triggers": {
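
Reviewer sketch (not part of the commit): the diff does not show how `skill-rules.json` is evaluated, so the matcher below is purely illustrative. It assumes a rule fires when the message starts with a listed command, contains a listed keyword (case-insensitive substring), or touches a file matching a listed pattern. The glob semantics are those of Python's `fnmatch`, where `*` also matches `/`, which may differ from the real loader. `RULE` copies a subset of the triggers from the hunk above, and `matches` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional

# Subset of the multimodal-studio rule, for illustration only.
RULE: Dict = {
    "skill": "multimodal-studio",
    "triggers": {
        "keywords": ["generate image", "hero image", "b-roll"],
        "file_patterns": ["*.png", "*.mp4", "public/images/**", "assets/**"],
        "commands": ["/studio", "/generate-video"],
    },
    "priority": "high",
}


def matches(rule: Dict, message: str, files: Optional[List[str]] = None) -> bool:
    """Return True if any trigger in the rule fires (assumed OR semantics)."""
    triggers = rule["triggers"]
    text = message.lower()

    # A slash command counts only as the first token of the message.
    first_token = text.split(" ", 1)[0]
    if first_token in triggers["commands"]:
        return True

    # Keywords are matched as plain substrings.
    if any(keyword in text for keyword in triggers["keywords"]):
        return True

    # File patterns are checked against every touched path.
    return any(
        fnmatchcase(path, pattern)
        for path in (files or [])
        for pattern in triggers["file_patterns"]
    )
```

Broad single-word keywords such as "cinematic" or "thumbnail" would fire on many unrelated messages under substring matching, which is worth keeping in mind given the rule's `"high"` priority.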

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -38,8 +38,8 @@ tmp/
 temp/
 *.tmp
 
-# Package lock (use npm ci)
-package-lock.json
+# Package lock — committed so `npm ci` works (reproducible, integrity-checked installs)
+# package-lock.json
 
 # Claude Flow runtime state
 .claude-flow/

.mcp.json

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@
         "CLAUDE_FLOW_MEMORY_BACKEND": "hybrid"
       },
       "autoStart": false
+    },
+    "higgsfield": {
+      "type": "http",
+      "url": "https://mcp.higgsfield.ai/mcp",
+      "_comment": "Unified multimodal connector for the multimodal-studio skill — image, video, and character generation across 30+ models (Soul, Flux, Seedream, Kling, Hailuo, Veo, Sora). OAuth via Higgsfield account; no API keys. Add interactively: claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp"
     }
   }
 }

CLAUDE.md

Lines changed: 20 additions & 1 deletion
@@ -80,9 +80,12 @@ On non-Claude platforms, just describe what you want. Skills activate from conte
 
 ## Available Commands (35+)
 
-### Creation (10)
+### Creation (12)
+
 | Command | Description |
 |---------|-------------|
+| `/studio` | **Multimodal Studio** — end-to-end image + video + character production (Higgsfield MCP) |
+| `/generate-video` | Video generation: still→video or text→video (Kling/Hailuo/Veo/Sora/DoP) |
 | `/article-creator` | Guided blog article creation |
 | `/create-music` | Suno music production pipeline |
 | `/infogenius` | Research-grounded image generation |
@@ -139,6 +142,22 @@ User: "write a blog post about AI agents"
 → Routes to: /article-creator
 ```
 
+## Multimodal Studio (v11)
+
+ACOS's unified generation layer: **image + video + consistent characters** across 30+ frontier models through a single connector.
+
+- **Skill:** `multimodal-studio` — model routing, visual prompt engineering, character consistency, async lifecycle, brand-locked output
+- **Agent:** Multimodal Director (`.claude/agents/multimodal-director.md`)
+- **Commands:** `/studio` (e2e pipeline), `/generate-video`, `/generate-images`, `/infogenius`
+- **Connector:** Higgsfield MCP (default) — one OAuth, models incl. Soul, Flux, Seedream, Kling, Hailuo, Veo, Sora
+
+```bash
+# Connect the multimodal connector (one-time)
+claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp
+```
+
+Stays **vendor-agnostic** (any MCP filling `~~image generation` / `~~video generation` works — see `CONNECTORS.md`) and **brand-locked** (assets inherit Frank DNA + active brand tokens). The differentiator vs. single-vendor agent platforms: character consistency across an entire asset set via `create_character` → reuse the character ID everywhere. See `docs/multimodal-studio.md`.
+
 ## v10 Safety Systems
 
 ### Circuit Breaker

CONNECTORS.md

Lines changed: 9 additions & 2 deletions
@@ -22,11 +22,18 @@ ACOS is **tool-agnostic at the skill level** — workflows describe what needs t
 
 | Category | Placeholder | Default (ACOS) | Alternatives |
 |----------|-------------|----------------|--------------|
-| Image generation | `~~image generation` | nano-banana (Gemini 2.5 Flash Image) | Midjourney, Stability AI, Flux |
+| Image generation | `~~image generation` | **Higgsfield MCP** (Soul/Flux/Seedream, 4K) | nano-banana (Gemini 2.5 Flash Image), Midjourney, Stability AI |
+| Video generation | `~~video generation` | **Higgsfield MCP** (Kling/Hailuo/Veo/Sora/DoP) | Veo 3 (direct API), RunwayML |
+| Character consistency | `~~character` | **Higgsfield MCP** (Soul character training) | manual reference images |
 | Music generation | `~~music generation` | Suno AI (direct API) | Udio |
-| Video generation | `~~video generation` | Veo 3 (direct API) | RunwayML, Kling |
 | Design | `~~design` | Figma MCP | Canva |
 
+> **Multimodal Studio:** image, video, and character generation are unified under one connector
+> (Higgsfield MCP — one OAuth, 30+ models). The `multimodal-studio` skill and `/studio` command
+> drive it. Connect with:
+> `claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp`
+> Skills stay vendor-agnostic — any MCP filling these categories works.
+
 ### Communication & publishing
 
 | Category | Placeholder | Default (ACOS) | Alternatives |
