Vision-to-Code Workflows
Vision-to-code tools generate implementation code directly from visual specifications. This guide covers workflows for mockup-to-code, screenshot-to-code, and video-to-code transformations while maintaining quality, accessibility, and IP compliance.
Before implementing vision-to-code workflows, review PRD-STD-018: Multi-Modal AI Governance for mandatory requirements.
Capabilities Overview
What Vision-to-Code Can Do
| Input Type | Output | Tools |
|---|---|---|
| UI Mockups | React/Vue/Angular components | Kimi, Gemini |
| Screenshots | HTML/CSS reproduction | Kimi, Gemini, Claude |
| Wireframes | Structured layout code | Kimi, Gemini |
| Video Walkthroughs | Full page/application | Kimi |
| Design Systems | Component library | Kimi, Gemini |
| Hand Sketches | Digital implementation | Kimi |
Tool Capabilities Comparison
| Capability | Kimi K2.5 | Gemini 2.5 | Claude 4.5 |
|---|---|---|---|
| Native multimodal | Yes | Yes | Limited |
| Video understanding | Yes | Yes | No |
| Autonomous visual debugging | Yes | No | No |
| Code from wireframes | Yes | Yes | Limited |
| Responsive generation | Yes | Yes | Yes |
| Animation extraction | Yes | Limited | No |
Core Workflows
Workflow 1: Mockup to Component
Use Case: Convert Figma/Sketch designs to React components
Step 1: Input Preparation
✓ Export mockup as PNG (high resolution, 2x if possible)
✓ Export individual assets (icons, images) separately
✓ Note typography specifications
✓ Document color palette (hex codes)
✓ Verify design system tokens if applicable
Step 2: Prompt Engineering
## Context
Design System: Material-UI v5
Tech Stack: React, TypeScript, Tailwind CSS
Target: Reusable component
## Visual Input
[Attach mockup.png]
## Requirements
- Implement as React functional component
- Use TypeScript with proper interfaces
- Match visual design precisely (pixel-perfect)
- Implement responsive breakpoints: sm, md, lg
- Ensure WCAG 2.1 AA accessibility
- Add Storybook stories
- Include unit tests with React Testing Library
## Constraints
- Use design system components where available
- Don't include hardcoded colors (use theme)
- Optimize images for web
- Add proper ARIA labels
## Output Format
1. Component file (tsx)
2. Styles (tailwind classes)
3. Types definition
4. Storybook story
5. Test file
Step 3: Generation
kimi vision --input mockup.png --prompt workflow.md
Step 4: Verification Checklist
| Check | Method | Pass Criteria |
|---|---|---|
| Visual fidelity | Pixel-by-pixel comparison | <2px deviation |
| Responsive | Resize browser | Breakpoints correct |
| Accessibility | axe DevTools | 0 violations |
| Typography | Inspect element | Font, size, weight match |
| Colors | Color picker | Hex values match design |
| Spacing | Measure tool | Margin/padding match |
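The pixel-by-pixel comparison in the first row can be approximated with a small helper. This is a minimal sketch, not the method any particular tool uses: `mismatchRatio` and `channelTolerance` are hypothetical names, and both inputs are assumed to be already-decoded RGBA buffers of identical dimensions.

```typescript
// Sketch: fraction of pixels that differ between two RGBA buffers.
// Assumption: both buffers hold 4 bytes per pixel and have the same size.
function mismatchRatio(
  expected: Uint8Array,
  actual: Uint8Array,
  channelTolerance: number = 0
): number {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error("Buffers must be equal-length RGBA data");
  }
  const pixelCount = expected.length / 4;
  if (pixelCount === 0) {
    return 0;
  }
  let mismatched = 0;
  for (let p = 0; p < pixelCount; p++) {
    const base = p * 4;
    for (let c = 0; c < 4; c++) {
      if (Math.abs(expected[base + c] - actual[base + c]) > channelTolerance) {
        mismatched++;
        break; // count each pixel at most once
      }
    }
  }
  return mismatched / pixelCount;
}
```

A ratio of 0 means the images are identical within the tolerance. What ratio counts as a pass is a team decision and is not defined by this guide.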
Step 5: Refinement
# If visual diff shows issues
kimi vision --input mockup.png --input current-implementation.png \
--prompt "Fix: button padding should be 16px not 12px"
Workflow 2: Screenshot to Code
Use Case: Reproduce existing UI (competitor analysis, legacy system migration)
⚠️ IP Warning: Only use for systems you own or have explicit permission to reproduce.
Step 1: Input Preparation
✓ Screenshot target UI (no PII or sensitive data)
✓ Capture at multiple viewport sizes
✓ Note interactive states (hover, active)
✓ Document color scheme
✓ Identify fonts used
Step 2: Generation with IP Safety
## Task
Create a dashboard layout with the following visual structure:
[Attach screenshot for structural reference only]
## IP Compliance
- Do NOT copy specific icons - use generic equivalents
- Do NOT reproduce proprietary graphics
- Do NOT use exact color values from screenshot
- Use original content only
- Structure reference only, not visual copying
## Output
Clean implementation using:
- Heroicons for icons
- Standard Tailwind color palette
- System fonts or licensed fonts only
Step 3: Legal Review Checkpoint
- No copyrighted images reproduced
- No trademarked logos included
- Color scheme sufficiently distinct
- Typography uses licensed fonts
- Layout structure is generic pattern
Workflow 3: Video to Application
Use Case: Reconstruct full application from video walkthrough
Step 1: Video Preparation
✓ Extract keyframes at state changes
✓ Document navigation flows
✓ Note animation timings
✓ Identify data flows
✓ Timestamp important interactions
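Extracting keyframes at state changes can be sketched as a filter over per-frame change scores. The example below is illustrative only: `FrameSample`, `changeScore`, and `selectKeyframes` are hypothetical names, and the score is assumed to come from some separate frame-differencing step that this guide does not specify.

```typescript
interface FrameSample {
  timestampSec: number;
  // Hypothetical score in [0, 1]: how much this frame differs from the previous one.
  changeScore: number;
}

// Returns the timestamps worth keeping: the first frame, plus every frame whose
// change score reaches the threshold and that is at least minGapSec after the
// previously kept frame.
function selectKeyframes(
  samples: FrameSample[],
  threshold: number,
  minGapSec: number
): number[] {
  const picked: number[] = [];
  let lastPicked = -Infinity;
  for (let i = 0; i < samples.length; i++) {
    const sample = samples[i];
    const isCandidate = i === 0 || sample.changeScore >= threshold;
    if (isCandidate && sample.timestampSec - lastPicked >= minGapSec) {
      picked.push(sample.timestampSec);
      lastPicked = sample.timestampSec;
    }
  }
  return picked;
}
```

The minimum gap keeps a single animated transition from producing a burst of near-duplicate keyframes.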
Step 2: Staged Generation
# Stage 1: Sample one frame per second as candidate keyframes
mkdir -p keyframes  # ffmpeg does not create the output directory
ffmpeg -i walkthrough.mp4 -vf "fps=1,scale=1920:-1" keyframes/%04d.png
# Stage 2: Generate page structure from keyframes
kimi vision --input keyframes/ --mode thinking \
"Identify all pages and navigation structure"
# Stage 3: Implement each page
for frame in keyframes/*.png; do
kimi vision --input "$frame" \
--prompt "Implement this page in Next.js"
done
# Stage 4: Connect navigation
kimi --mode agent \
"Wire up all pages with proper routing"
Step 3: Animation Reconstruction
## Animation Specifications
From video analysis:
- Page transition: 300ms ease-in-out
- Button hover: scale(1.05), 150ms
- Modal open: fade + slide up, 250ms
- Loading skeleton: pulse animation, 1.5s loop
## Implementation
Use Framer Motion for React animations matching these specifications.
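The timings above could be captured as plain objects shaped like Framer Motion props. This is a sketch under stated assumptions: the object names are hypothetical, and the slide distance and pulse opacity values are placeholders because the specification does not give them.

```typescript
// Framer Motion expresses durations in seconds, so milliseconds are converted.
const msToSeconds = (ms: number): number => ms / 1000;

// Page transition: 300ms ease-in-out
const pageTransition = { duration: msToSeconds(300), ease: "easeInOut" as const };

// Button hover: scale(1.05), 150ms
const buttonHover = {
  whileHover: { scale: 1.05 },
  transition: { duration: msToSeconds(150) },
};

// Modal open: fade + slide up, 250ms
// Assumption: the 16px slide distance is a placeholder, not part of the spec.
const modalOpen = {
  initial: { opacity: 0, y: 16 },
  animate: { opacity: 1, y: 0 },
  transition: { duration: msToSeconds(250) },
};

// Loading skeleton: pulse animation, 1.5s loop
// Assumption: "pulse" is modeled as an opacity loop with placeholder values.
const skeletonPulse = {
  animate: { opacity: [1, 0.5, 1] },
  transition: { duration: 1.5, repeat: Infinity },
};
```

In a component these would typically be spread onto a motion element, for example `<motion.div {...modalOpen} />`. Keeping them in one module makes the extracted timings easy to review against the video.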
Workflow 4: Design System Generation
Use Case: Generate component library from design system documentation
Step 1: Design System Documentation
✓ Component gallery image
✓ Token specifications (colors, typography, spacing)
✓ Usage examples
✓ Do/don't guidelines
Step 2: Token Extraction
kimi vision --input design-system.png \
--prompt "Extract all design tokens: colors, typography, spacing, shadows"
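Once tokens are extracted, it helps to store them in a typed structure instead of free text. The sketch below assumes a hypothetical `DesignTokens` shape and turns it into CSS custom properties. The naming scheme (`--color-*`, `--space-*`, and so on) is an illustration, not something this guide or any tool prescribes.

```typescript
interface DesignTokens {
  colors: { [name: string]: string };    // e.g. primary -> "#3B82F6"
  spacing: { [name: string]: number };   // pixels
  fontSizes: { [name: string]: number }; // pixels
  shadows: { [name: string]: string };   // raw CSS box-shadow values
}

// Appends one CSS custom property per entry, sorted by name for stable output.
function appendVariables(
  lines: string[],
  prefix: string,
  values: { [name: string]: string | number },
  unit: string
): void {
  const names = Object.keys(values).sort();
  for (let i = 0; i < names.length; i++) {
    lines.push("  --" + prefix + "-" + names[i] + ": " + values[names[i]] + unit + ";");
  }
}

function tokensToCssVariables(tokens: DesignTokens): string {
  const lines: string[] = [":root {"];
  appendVariables(lines, "color", tokens.colors, "");
  appendVariables(lines, "space", tokens.spacing, "px");
  appendVariables(lines, "font-size", tokens.fontSizes, "px");
  appendVariables(lines, "shadow", tokens.shadows, "");
  lines.push("}");
  return lines.join("\n");
}
```

Reviewing the generated variables against the design system image is still necessary, since values extracted from an image can be wrong.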
Step 3: Component Generation
# Generate base components
kimi vision --input button-examples.png \
--prompt "Generate Button component with all variants"
kimi vision --input input-examples.png \
--prompt "Generate Input component with all states"
# Continue for all components...
Step 4: Documentation Generation
kimi --mode agent \
--input components/ \
--prompt "Generate Storybook documentation for all components"
Quality Assurance
Visual Diff Testing
// Using Playwright for visual regression
import { test, expect } from '@playwright/test';

test('component matches design', async ({ page }) => {
  await page.goto('/component');
  await expect(page).toHaveScreenshot('component.png', {
    maxDiffPixelRatio: 0.001 // allow up to 0.1% of pixels to differ
  });
});
Accessibility Audit
# Automated accessibility check (axe CLI from the @axe-core/cli package)
npx @axe-core/cli http://localhost:3000 --save axe-results.json
# Manual checklist
✓ Color contrast ratio ≥ 4.5:1 for normal text (≥ 3:1 for large text)
✓ Focus indicators visible
✓ ARIA labels present
✓ Keyboard navigation works
✓ Screen reader compatible
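The contrast item in the checklist can be verified in code. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names are hypothetical, and only 6-digit hex colors are handled.

```typescript
// Parses "#RRGGBB" (leading "#" optional) into [r, g, b] in the 0-255 range.
function parseHexColor(hex: string): number[] {
  const match = /^#?([0-9a-fA-F]{6})$/.exec(hex.trim());
  if (!match) {
    throw new Error('Expected a 6-digit hex color, got "' + hex + '"');
  }
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// Relative luminance as defined by WCAG 2.1.
function relativeLuminance(hex: string): number {
  const channels = parseHexColor(hex);
  const linear: number[] = [];
  for (let i = 0; i < 3; i++) {
    const c = channels[i] / 255;
    linear.push(c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4));
  }
  return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2];
}

// Contrast ratio between two colors, from 1 (identical) up to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsContrastAA(
  foreground: string,
  background: string,
  largeText: boolean = false
): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

For example, `#3B82F6` text on a white background passes only as large text, which is the kind of issue that is easy to miss when a generated component merely looks right.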
Responsive Verification
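# Test at multiple viewports.
# The Playwright CLI has no viewport flag; this assumes playwright.config.ts
# reads VIEWPORT_WIDTH and sets use.viewport to that width with a height of 800.
for width in 375 768 1024 1440 1920; do
  VIEWPORT_WIDTH="$width" npx playwright test
done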
Common Pitfalls and Solutions
Pitfall 1: Hardcoded Values
Problem: Generated code uses hardcoded colors/sizes instead of design tokens.
Solution:
## Constraint (add to prompt)
- Use design system tokens ONLY
- Reference theme.colors.primary not #3B82F6
- Use spacing scale: 4, 8, 12, 16, 24, 32, 48
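A prompt constraint alone does not guarantee compliance, so a quick scan of the generated source can help. The sketch below is a heuristic with hypothetical function names: it flags hex color literals and checks pixel values against the spacing scale listed above. It can produce false positives, for example on URL fragments that happen to look like hex colors.

```typescript
interface ColorFinding {
  line: number;  // 1-based line number in the scanned source
  value: string; // the literal that was found, e.g. "#3B82F6"
}

// Flags hex color literals (#RGB, #RGBA, #RRGGBB, #RRGGBBAA) in source text.
function findHardcodedColors(source: string): ColorFinding[] {
  const findings: ColorFinding[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const pattern = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
    let match = pattern.exec(lines[i]);
    while (match !== null) {
      findings.push({ line: i + 1, value: match[0] });
      match = pattern.exec(lines[i]);
    }
  }
  return findings;
}

// Spacing scale taken from the constraint above.
const SPACING_SCALE_PX: number[] = [4, 8, 12, 16, 24, 32, 48];

function isOnSpacingScale(px: number): boolean {
  return SPACING_SCALE_PX.indexOf(px) !== -1;
}
```

Running this over generated files before review turns the token constraint into something that can fail a build instead of relying on the model to follow the prompt.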
Pitfall 2: Missing Responsive Behavior
Problem: Component works at one size only.
Solution:
## Responsive Requirements
- Mobile (<640px): Stack layout, full-width buttons
- Tablet (640-1024px): Side-by-side layout
- Desktop (>1024px): Full layout with max-width container
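The three tiers above can be encoded so that test viewports are checked for coverage. This is a small sketch with hypothetical names. It follows the ranges exactly as written, so a width of 1024px counts as tablet.

```typescript
type LayoutTier = "mobile" | "tablet" | "desktop";

// Maps a viewport width to the tier ranges listed in the prompt.
function layoutTierFor(widthPx: number): LayoutTier {
  if (widthPx < 640) {
    return "mobile";
  }
  if (widthPx <= 1024) {
    return "tablet";
  }
  return "desktop";
}

// Returns the tiers that no test viewport exercises.
function uncoveredTiers(viewportWidths: number[]): LayoutTier[] {
  const all: LayoutTier[] = ["mobile", "tablet", "desktop"];
  const missing: LayoutTier[] = [];
  for (let i = 0; i < all.length; i++) {
    let covered = false;
    for (let j = 0; j < viewportWidths.length; j++) {
      if (layoutTierFor(viewportWidths[j]) === all[i]) {
        covered = true;
      }
    }
    if (!covered) {
      missing.push(all[i]);
    }
  }
  return missing;
}
```

With the viewport widths used elsewhere in this guide (375, 768, 1024, 1440, 1920), every tier is exercised at least once.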
Pitfall 3: Accessibility Oversight
Problem: Missing ARIA labels, poor contrast, no keyboard support.
Solution:
## Accessibility Requirements
- All interactive elements keyboard accessible
- ARIA labels for icon-only buttons
- Color contrast WCAG AA compliant
- Focus management for modals
- Screen reader announcements for state changes
Pitfall 4: Asset Management
Problem: Generated code references missing images.
Solution:
## Asset Handling
- Use placeholder images from placehold.co
- Mark image sources with TODO comments
- Provide image dimensions for layout stability
- Use Next.js Image component with proper sizing
IP and Legal Compliance
Pre-Generation Checklist
- Do you own the visual design or hold a license to use it?
- Are fonts properly licensed?
- Are images stock or original?
- Does reproduction stay within fair use?
- Is this for competitive analysis (verify legality)?
Post-Generation Checklist
- No copyrighted images in output
- No trademarked logos reproduced
- Color scheme distinct enough
- Typography uses licensed fonts only
- Documentation of visual source
Documentation Template
## Visual Source Documentation
- **Source:** [Figma file / Screenshot / Video]
- **License:** [Owned / Licensed / Public domain]
- **Assets:** [List of extracted assets and licenses]
- **Attribution:** [If required]
- **Generated:** [Date, Tool version]
- **Reviewed by:** [Name, Role]
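Teams that want the template filled in consistently could generate it from a typed record. The sketch below assumes a hypothetical `VisualSourceRecord` shape that mirrors the template fields. It is one possible encoding, not a format mandated by the standards referenced in this guide.

```typescript
interface VisualSourceRecord {
  source: string;                                   // e.g. "Figma file"
  license: "Owned" | "Licensed" | "Public domain";
  assets: string[];                                 // extracted assets and their licenses
  attribution?: string;                             // only if required
  generatedOn: string;                              // date, e.g. "2025-01-31"
  toolVersion: string;
  reviewedBy: string;                               // name and role
}

// Renders the record in the same layout as the documentation template.
function renderSourceRecord(record: VisualSourceRecord): string {
  const assets = record.assets.length > 0 ? record.assets.join("; ") : "None";
  return [
    "## Visual Source Documentation",
    "- **Source:** " + record.source,
    "- **License:** " + record.license,
    "- **Assets:** " + assets,
    "- **Attribution:** " + (record.attribution ? record.attribution : "Not required"),
    "- **Generated:** " + record.generatedOn + ", " + record.toolVersion,
    "- **Reviewed by:** " + record.reviewedBy
  ].join("\n");
}
```

Because the license field is a closed union, an unreviewed or unknown license status cannot be recorded without a type error, which nudges the review to happen before the record is written.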
Tool-Specific Tips
Kimi K2.5
Strengths: Native multimodal, autonomous visual debugging
Best Practices:
# Use Thinking mode for complex designs
kimi --mode thinking vision --input mockup.png
# Enable visual debugging for refinement
kimi --mode agent --visual-debug \
--input mockup.png --input draft-implementation.png \
"Fix visual discrepancies"
# Use Agent Swarm for design system
kimi --mode swarm \
--input component-gallery.png \
"Generate all components in parallel"
Gemini 2.5
Strengths: 1M-token context window, Google ecosystem integration
Best Practices:
# Include entire design system in context
gemini vision --input design-system.pdf \
--context tokens 1000000 \
"Generate components following this system"
Claude (Limited Vision)
Strengths: Reasoning about visual content
Best Practices:
# Use for analysis rather than generation
claude vision --input mockup.png \
"Describe the layout structure, color palette, and components"
# Then use text-based generation
claude "Implement the described component"
Cost Optimization
Resolution Strategy
| Use Case | Resolution | Reason |
|---|---|---|
| Layout structure | 1024px width | Sufficient for structure, lower cost |
| Component details | 1920px width | Need fine details for pixel-matching |
| Color extraction | Original | Accurate color sampling |
| Typography | 2x resolution | Sharp text for font identification |
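The table can be turned into a small planning helper that computes target dimensions before an image is sent. This is a sketch with hypothetical names. It assumes the source size is the 1x design size, never upscales raster images for the layout and component cases, and treats the typography row as a request for a 2x export from the design tool.

```typescript
type VisionUseCase = "layout" | "component" | "color" | "typography";

interface ImageSize {
  width: number;
  height: number;
}

// Scales down to maxWidth while preserving aspect ratio; never upscales.
function scaleToWidth(source: ImageSize, maxWidth: number): ImageSize {
  if (source.width <= maxWidth) {
    return { width: source.width, height: source.height };
  }
  const factor = maxWidth / source.width;
  return { width: maxWidth, height: Math.round(source.height * factor) };
}

// Target dimensions per use case, following the resolution table.
function planImageSize(useCase: VisionUseCase, source: ImageSize): ImageSize {
  if (useCase === "layout") {
    return scaleToWidth(source, 1024);
  }
  if (useCase === "component") {
    return scaleToWidth(source, 1920);
  }
  if (useCase === "typography") {
    // Assumption: source is the 1x design size, so this is the 2x export size.
    return { width: source.width * 2, height: source.height * 2 };
  }
  // "color": keep the original so sampled colors are not altered by resampling.
  return { width: source.width, height: source.height };
}
```

For a 2880x1800 export, the layout case yields 1024x640, which is usually enough to read structure while sending far fewer pixels.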
Token Budgeting
# Estimate vision token usage
kimi vision --estimate --input mockup.png
# Output: ~5000 vision tokens
# Batch similar components
kimi vision --batch component-mockups/ \
--shared-prompt "Generate React components"
Integration with AEEF Standards
PRD-STD-001: Prompt Engineering
Vision prompts follow the CRAFT framework:
- Context: Design system, tech stack
- Requirements: Visual fidelity, responsive, accessible
- Assumptions: Asset availability, license status
- Format: Component files, stories, tests
- Tests: Visual diff, accessibility audit
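The five CRAFT sections map naturally onto a typed prompt builder. The sketch below is an illustration with hypothetical names (`CraftPrompt`, `renderCraftPrompt`). The section headings reuse the `##` style of the prompt templates in this guide, and rejecting empty sections is an added assumption, not a stated requirement of PRD-STD-001.

```typescript
interface CraftPrompt {
  context: string[];
  requirements: string[];
  assumptions: string[];
  format: string[];
  tests: string[];
}

// Renders one "## Title" section followed by a bullet per item.
function renderSection(title: string, items: string[]): string {
  if (items.length === 0) {
    throw new Error('CRAFT section "' + title + '" must not be empty');
  }
  const lines: string[] = ["## " + title];
  for (let i = 0; i < items.length; i++) {
    lines.push("- " + items[i]);
  }
  return lines.join("\n");
}

// Assembles the five sections in CRAFT order, separated by blank lines.
function renderCraftPrompt(prompt: CraftPrompt): string {
  return [
    renderSection("Context", prompt.context),
    renderSection("Requirements", prompt.requirements),
    renderSection("Assumptions", prompt.assumptions),
    renderSection("Format", prompt.format),
    renderSection("Tests", prompt.tests)
  ].join("\n\n");
}
```

Building prompts this way makes a missing section a visible error at authoring time instead of a silent gap in the request sent to the model.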
PRD-STD-002: Code Review
Vision-generated code requires:
- Visual diff verification
- Designer review for fidelity
- Accessibility audit
- IP clearance documentation
PRD-STD-018: Multi-Modal Governance
Mandatory compliance:
- Visual input provenance logged
- IP clearance verified
- Accessibility requirements met
- Audit trail maintained