Vision-to-Code Workflows

Vision-to-code tools generate implementation code directly from visual specifications. This guide covers workflows for mockup-to-code, screenshot-to-code, and video-to-code transformations, and shows how to maintain quality, accessibility, and IP compliance along the way.

Prerequisites

Before implementing vision-to-code workflows, review PRD-STD-018: Multi-Modal AI Governance for mandatory requirements.

Capabilities Overview

What Vision-to-Code Can Do

Input Type	Output	Tools
UI Mockups	React/Vue/Angular components	Kimi, Gemini
Screenshots	HTML/CSS reproduction	Kimi, Gemini, Claude
Wireframes	Structured layout code	Kimi, Gemini
Video Walkthroughs	Full page/application	Kimi
Design Systems	Component library	Kimi, Gemini
Hand Sketches	Digital implementation	Kimi

Tool Capabilities Comparison

Capability	Kimi K2.5	Gemini 2.5	Claude 4.5
Native multimodal	Yes	Yes	Limited
Video understanding	Yes	Yes	No
Autonomous visual debugging	Yes	No	No
Code from wireframes	Yes	Yes	Limited
Responsive generation	Yes	Yes	Yes
Animation extraction	Yes	Limited	No

Core Workflows

Workflow 1: Mockup to Component

Use Case: Convert Figma/Sketch designs to React components

Step 1: Input Preparation

✓ Export mockup as PNG (high resolution, 2x if possible)
✓ Export individual assets (icons, images) separately
✓ Note typography specifications
✓ Document color palette (hex codes)
✓ Verify design system tokens if applicable

Step 2: Prompt Engineering

## Context
Design System: Material-UI v5
Tech Stack: React, TypeScript, Tailwind CSS
Target: Reusable component

## Visual Input
[Attach mockup.png]

## Requirements
- Implement as React functional component
- Use TypeScript with proper interfaces
- Match visual design precisely (pixel-perfect)
- Implement responsive breakpoints: sm, md, lg
- Ensure WCAG 2.1 AA accessibility
- Add Storybook stories
- Include unit tests with React Testing Library

## Constraints
- Use design system components where available
- Don't include hardcoded colors (use theme)
- Optimize images for web
- Add proper ARIA labels

## Output Format
1. Component file (tsx)
2. Styles (tailwind classes)
3. Types definition
4. Storybook story
5. Test file
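The template above can also be assembled programmatically so that every request carries the same sections in the same order. The following is a minimal sketch under stated assumptions: `VisionPromptSpec`, `renderSection`, and `buildVisionPrompt` are hypothetical names introduced for illustration, not part of any tool's API.

```typescript
// Hypothetical shape for a vision-to-code prompt; field names are illustrative.
interface VisionPromptSpec {
  context: Array<{ label: string; value: string }>; // e.g. design system, tech stack
  visualInput: string;                               // file name of the attached mockup
  requirements: string[];
  constraints: string[];
  outputFormat: string[];
}

// Render one "## Heading" section followed by its lines.
function renderSection(heading: string, lines: string[]): string {
  return [`## ${heading}`].concat(lines).join("\n");
}

// Assemble the sections in the same order as the template.
function buildVisionPrompt(spec: VisionPromptSpec): string {
  return [
    renderSection("Context", spec.context.map((c) => `${c.label}: ${c.value}`)),
    renderSection("Visual Input", [`[Attach ${spec.visualInput}]`]),
    renderSection("Requirements", spec.requirements.map((r) => `- ${r}`)),
    renderSection("Constraints", spec.constraints.map((c) => `- ${c}`)),
    renderSection("Output Format", spec.outputFormat.map((o, i) => `${i + 1}. ${o}`)),
  ].join("\n\n");
}

const mockupPrompt = buildVisionPrompt({
  context: [
    { label: "Design System", value: "Material-UI v5" },
    { label: "Tech Stack", value: "React, TypeScript, Tailwind CSS" },
  ],
  visualInput: "mockup.png",
  requirements: ["Implement as React functional component", "Ensure WCAG 2.1 AA accessibility"],
  constraints: ["Don't include hardcoded colors (use theme)"],
  outputFormat: ["Component file (tsx)", "Test file"],
});
console.log(mockupPrompt);
```

Keeping the prompt as data makes it easier to review changes to requirements and constraints in version control alongside the generated code.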

Step 3: Generation

kimi vision --input mockup.png --prompt workflow.md

Step 4: Verification Checklist

Check	Method	Pass Criteria
Visual fidelity	Pixel-by-pixel comparison	<2px deviation
Responsive	Resize browser	Breakpoints correct
Accessibility	axe DevTools	0 violations
Typography	Inspect element	Font, size, weight match
Colors	Color picker	Hex values match design
Spacing	Measure tool	Margin/padding match

Step 5: Refinement

# If visual diff shows issues
kimi vision --input mockup.png --input current-implementation.png \
  --prompt "Fix: button padding should be 16px not 12px"

Workflow 2: Screenshot to Code

Use Case: Reproduce existing UI (competitor analysis, legacy system migration)

⚠️ IP Warning: Only use for systems you own or have explicit permission to reproduce.

Step 1: Input Preparation

✓ Screenshot target UI (no PII or sensitive data)
✓ Capture at multiple viewport sizes
✓ Note interactive states (hover, active)
✓ Document color scheme
✓ Identify fonts used

Step 2: Generation with IP Safety

## Task
Create a dashboard layout with the following visual structure:
[Attach screenshot for structural reference only]

## IP Compliance
- Do NOT copy specific icons - use generic equivalents
- Do NOT reproduce proprietary graphics
- Do NOT use exact color values from screenshot
- Use original content only
- Structure reference only, not visual copying

## Output
Clean implementation using:
- Heroicons for icons
- Standard Tailwind color palette
- System fonts or licensed fonts only

Step 3: Legal Review Checkpoint

No copyrighted images reproduced
No trademarked logos included
Color scheme sufficiently distinct
Typography uses licensed fonts
Layout structure is generic pattern

Workflow 3: Video to Application

Use Case: Reconstruct full application from video walkthrough

Step 1: Video Preparation

✓ Extract keyframes at state changes
✓ Document navigation flows
✓ Note animation timings
✓ Identify data flows
✓ Timestamp important interactions

Step 2: Staged Generation

# Stage 1: Sample one frame per second, scaled to 1920px wide
mkdir -p keyframes  # ffmpeg does not create the output directory
ffmpeg -i walkthrough.mp4 -vf "fps=1,scale=1920:-1" keyframes/%04d.png

# Stage 2: Generate page structure from keyframes
kimi vision --input keyframes/ --mode thinking \
  "Identify all pages and navigation structure"

# Stage 3: Implement each page
for frame in keyframes/*.png; do
  # Quote the path so file names with spaces are passed as one argument
  kimi vision --input "$frame" \
    --prompt "Implement this page in Next.js"
done

# Stage 4: Connect navigation
kimi --mode agent \
  "Wire up all pages with proper routing"

Step 3: Animation Reconstruction

## Animation Specifications
From video analysis:
- Page transition: 300ms ease-in-out
- Button hover: scale(1.05), 150ms
- Modal open: fade + slide up, 250ms
- Loading skeleton: pulse animation, 1.5s loop

## Implementation
Use Framer Motion for React animations matching these specifications.
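One way to carry these timings into code is to keep them as data and derive Framer Motion props from them. The sketch below uses plain objects only, so it runs without the library installed. Framer Motion expresses durations in seconds, hence the conversion. The names `animationSpecMs` and `motionPresets`, the 16px slide distance, and the pulse opacity values are assumptions made for illustration; the video specification gives none of them.

```typescript
// Timings from the specification list, in milliseconds.
const animationSpecMs = {
  pageTransition: 300,
  buttonHover: 150,
  modalOpen: 250,
  skeletonPulse: 1500,
};

// Framer Motion durations are in seconds, so convert in one place.
function toSeconds(ms: number): number {
  return ms / 1000;
}

// Plain objects in the shape Framer Motion accepts as component props.
const motionPresets = {
  pageTransition: {
    transition: { duration: toSeconds(animationSpecMs.pageTransition), ease: "easeInOut" },
  },
  buttonHover: {
    whileHover: { scale: 1.05 },
    transition: { duration: toSeconds(animationSpecMs.buttonHover) },
  },
  modalOpen: {
    initial: { opacity: 0, y: 16 }, // assumption: 16px slide distance
    animate: { opacity: 1, y: 0 },
    transition: { duration: toSeconds(animationSpecMs.modalOpen) },
  },
  skeletonPulse: {
    animate: { opacity: [1, 0.5, 1] }, // assumption: pulse between full and half opacity
    transition: { duration: toSeconds(animationSpecMs.skeletonPulse), repeat: Infinity },
  },
};

console.log(JSON.stringify(motionPresets.buttonHover));
```

In a React component these presets would typically be spread onto a motion element, for example `<motion.button {...motionPresets.buttonHover} />`.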

Workflow 4: Design System Generation

Use Case: Generate component library from design system documentation

Step 1: Design System Documentation

✓ Component gallery image
✓ Token specifications (colors, typography, spacing)
✓ Usage examples
✓ Do/don't guidelines

Step 2: Token Extraction

kimi vision --input design-system.png \
  --prompt "Extract all design tokens: colors, typography, spacing, shadows"

Step 3: Component Generation

# Generate base components
kimi vision --input button-examples.png \
  --prompt "Generate Button component with all variants"

kimi vision --input input-examples.png \
  --prompt "Generate Input component with all states"

# Continue for all components...

Step 4: Documentation Generation

kimi --mode agent \
  --input components/ \
  --prompt "Generate Storybook documentation for all components"

Quality Assurance

Visual Diff Testing

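// Using Playwright for visual regression
import { test, expect } from '@playwright/test';

test('component matches design', async ({ page }) => {
  await page.goto('/component'); // relative URL: requires use.baseURL in the Playwright config
  await expect(page).toHaveScreenshot('component.png', {
    // `threshold` is a per-pixel color tolerance (0 to 1), not a percentage of pixels
    maxDiffPixelRatio: 0.001 // allow up to 0.1% of pixels to differ
  });
});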
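To make the pixel-difference allowance concrete, the sketch below computes the fraction of differing pixels between two RGBA buffers. It is a simplified illustration, not Playwright's comparison algorithm, which uses a perceptual color difference. The function name `diffPixelRatio` and the `channelTolerance` parameter are hypothetical.

```typescript
// Fraction of pixels that differ between two RGBA buffers of identical size.
// channelTolerance is a hypothetical per-channel slack (0-255) for anti-aliasing noise.
function diffPixelRatio(
  expected: Uint8ClampedArray,
  actual: Uint8ClampedArray,
  channelTolerance: number = 0
): number {
  if (expected.length !== actual.length || expected.length % 4 !== 0) {
    throw new Error("Buffers must hold RGBA data of the same dimensions");
  }
  const pixelCount = expected.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let i = 0; i < expected.length; i += 4) {
    // A pixel counts as different if any of its four channels exceeds the tolerance.
    for (let c = 0; c < 4; c++) {
      if (Math.abs(expected[i + c] - actual[i + c]) > channelTolerance) {
        differing++;
        break;
      }
    }
  }
  return differing / pixelCount;
}

// A 2x2 image where one of four pixels changed from white to red.
const designPixels = new Uint8ClampedArray([
  255, 255, 255, 255,  255, 255, 255, 255,
  255, 255, 255, 255,  255, 255, 255, 255,
]);
const renderedPixels = new Uint8ClampedArray([
  255, 255, 255, 255,  255, 0, 0, 255,
  255, 255, 255, 255,  255, 255, 255, 255,
]);
console.log(diffPixelRatio(designPixels, renderedPixels)); // 0.25
```

A 0.1% allowance corresponds to a returned ratio of at most 0.001.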

Accessibility Audit

# Automated accessibility check (axe CLI from the @axe-core/cli package)
npx @axe-core/cli http://localhost:3000 --save axe-results.json

# Manual checklist
✓ Color contrast ratio ≥ 4.5:1 for normal text (≥ 3:1 for large text)
✓ Focus indicators visible
✓ ARIA labels present
✓ Keyboard navigation works
✓ Screen reader compatible
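The contrast item in the checklist can be verified numerically. The sketch below implements the WCAG 2.1 relative-luminance and contrast-ratio formulas for `#RRGGBB` colors. The function names are illustrative, and the `largeText` flag applies the 3:1 AA threshold that WCAG allows for large text.

```typescript
// Parse "#RRGGBB" into [r, g, b] with each channel in 0-255.
function parseHexColor(hex: string): [number, number, number] {
  const match = /^#([0-9a-fA-F]{6})$/.exec(hex);
  if (!match) throw new Error(`Expected #RRGGBB, got "${hex}"`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// WCAG 2.1 relative luminance of an sRGB color.
function relativeLuminance(rgb: [number, number, number]): number {
  const linear = rgb.map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2];
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const l1 = relativeLuminance(parseHexColor(foreground));
  const l2 = relativeLuminance(parseHexColor(background));
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText: boolean = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#FFFFFF").toFixed(2)); // 21.00
console.log(meetsAA("#767676", "#FFFFFF")); // true
console.log(meetsAA("#777777", "#FFFFFF")); // false
```

Automated tools such as axe already run this check on rendered pages; a helper like this is mainly useful for validating palette choices before generation.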

Responsive Verification

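# Test at multiple viewports
# `playwright test` has no --viewport option. Pass the width through an
# environment variable (VIEWPORT_WIDTH is a name chosen here) and read it in
# playwright.config.ts under use.viewport, with a fixed height of 800.
for width in 375 768 1024 1440 1920; do
  VIEWPORT_WIDTH="$width" npx playwright test
done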

Common Pitfalls and Solutions

Pitfall 1: Hardcoded Values

Problem: Generated code uses hardcoded colors/sizes instead of design tokens.

Solution:

## Constraint (add to prompt)
- Use design system tokens ONLY
- Reference theme.colors.primary not #3B82F6
- Use spacing scale: 4, 8, 12, 16, 24, 32, 48
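Prompt constraints are not always honored, so it can help to scan generated files for leftover color literals before review. The following is a heuristic sketch: `findHardcodedColors` is a hypothetical helper, and the pattern will also flag non-color strings that happen to look like hex literals (for example an anchor such as `#fade`).

```typescript
interface HardcodedColorHit {
  line: number;  // 1-based line number in the scanned source
  value: string; // the literal that was found
}

// Matches #RGB, #RGBA, #RRGGBB and #RRGGBBAA literals.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;

function findHardcodedColors(source: string): HardcodedColorHit[] {
  const hits: HardcodedColorHit[] = [];
  source.split("\n").forEach((text, index) => {
    const matches = text.match(HEX_COLOR);
    if (matches) {
      matches.forEach((value) => hits.push({ line: index + 1, value }));
    }
  });
  return hits;
}

const generatedSnippet = [
  "const primary = { color: '#3B82F6' };",
  "const themed = { color: theme.colors.primary };",
  "const border = '1px solid #fff';",
].join("\n");

console.log(findHardcodedColors(generatedSnippet));
```

For the snippet above the scan reports `#3B82F6` on line 1 and `#fff` on line 3, while the theme reference on line 2 passes.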

Pitfall 2: Missing Responsive Behavior

Problem: Component works at one size only.

Solution:

## Responsive Requirements
- Mobile (<640px): Stack layout, full-width buttons
- Tablet (640-1024px): Side-by-side layout
- Desktop (>1024px): Full layout with max-width container
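The three ranges above can be expressed as a small classifier, which is handy for labeling screenshots or choosing expectations in tests. This is a sketch that follows the ranges exactly as written in the prompt, so 1024px counts as tablet. `LayoutTier` and `layoutTierFor` are illustrative names.

```typescript
type LayoutTier = "mobile" | "tablet" | "desktop";

// Map a viewport width in CSS pixels to the layout tier from the requirements.
function layoutTierFor(widthPx: number): LayoutTier {
  if (widthPx < 640) return "mobile";   // stack layout, full-width buttons
  if (widthPx <= 1024) return "tablet"; // side-by-side layout
  return "desktop";                     // full layout with max-width container
}

// Widths used in the Responsive Verification loop.
const viewportWidths = [375, 768, 1024, 1440, 1920];
viewportWidths.forEach((width) => {
  console.log(`${width}px -> ${layoutTierFor(width)}`);
});
```

Note that frameworks may place the boundary differently (Tailwind's `lg` breakpoint starts at 1024px), so the comparison operators should match whatever the project's CSS actually uses.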

Pitfall 3: Accessibility Oversight

Problem: Missing ARIA labels, poor contrast, no keyboard support.

Solution:

## Accessibility Requirements
- All interactive elements keyboard accessible
- ARIA labels for icon-only buttons
- Color contrast WCAG AA compliant
- Focus management for modals
- Screen reader announcements for state changes

Pitfall 4: Asset Management

Problem: Generated code references missing images.

Solution:

## Asset Handling
- Use placeholder images from placehold.co
- Mark image sources with TODO comments
- Provide image dimensions for layout stability
- Use Next.js Image component with proper sizing
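As an illustration of the first three points, the sketch below produces a placeholder descriptor with explicit dimensions and a TODO marker. `placeholderImage` and `PlaceholderImage` are hypothetical names; the URL follows the `https://placehold.co/{width}x{height}` pattern.

```typescript
interface PlaceholderImage {
  src: string;
  width: number;
  height: number;
  todo: string; // reminder naming the real asset that must replace the placeholder
}

// Build a placehold.co descriptor with explicit dimensions for layout stability.
function placeholderImage(assetName: string, width: number, height: number): PlaceholderImage {
  const isPositiveInt = (n: number) => n > 0 && Math.floor(n) === n;
  if (!isPositiveInt(width) || !isPositiveInt(height)) {
    throw new Error("width and height must be positive integers");
  }
  return {
    src: `https://placehold.co/${width}x${height}`,
    width,
    height,
    todo: `TODO: replace placeholder with ${assetName}`,
  };
}

const heroPlaceholder = placeholderImage("hero-banner.png", 1200, 600);
console.log(heroPlaceholder.src);  // https://placehold.co/1200x600
console.log(heroPlaceholder.todo); // TODO: replace placeholder with hero-banner.png
```

The `src`, `width`, and `height` fields map directly onto the props of an image component, and searching the codebase for the TODO text lists every asset still outstanding.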

IP and Legal Compliance

Pre-Generation Checklist

Do you own the visual design or have license?
Are fonts properly licensed?
Are images stock or original?
Does reproduction stay within fair use?
Is this for competitive analysis (verify legality)?

Post-Generation Checklist

No copyrighted images in output
No trademarked logos reproduced
Color scheme distinct enough
Typography uses licensed fonts only
Documentation of visual source

Documentation Template

## Visual Source Documentation
- **Source:** [Figma file / Screenshot / Video]
- **License:** [Owned / Licensed / Public domain]
- **Assets:** [List of extracted assets and licenses]
- **Attribution:** [If required]
- **Generated:** [Date, Tool version]
- **Reviewed by:** [Name, Role]
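If the template is stored as structured data, a completeness check can run before merge. The sketch below mirrors the template fields; `VisualSourceRecord` and `missingProvenanceFields` are hypothetical names, and treating attribution as optional follows the "If required" note in the template.

```typescript
interface VisualSourceRecord {
  source: string;                                    // Figma file, screenshot, or video
  license: "Owned" | "Licensed" | "Public domain";
  assets: string[];                                  // extracted assets and their licenses
  attribution?: string;                              // only if required
  generated: string;                                 // date and tool version
  reviewedBy: string;                                // name and role
}

// Return the names of required fields that are absent or blank.
function missingProvenanceFields(record: Partial<VisualSourceRecord>): string[] {
  const missing: string[] = [];
  const isBlank = (value: string | undefined) => !value || value.trim() === "";

  if (isBlank(record.source)) missing.push("source");
  if (!record.license) missing.push("license");
  if (!record.assets) missing.push("assets"); // an empty list is allowed, an absent one is not
  if (isBlank(record.generated)) missing.push("generated");
  if (isBlank(record.reviewedBy)) missing.push("reviewedBy");
  return missing;
}

const draftRecord: Partial<VisualSourceRecord> = {
  source: "Figma file",
  license: "Owned",
  assets: [],
};
console.log(missingProvenanceFields(draftRecord)); // [ 'generated', 'reviewedBy' ]
```

A check like this confirms only that the record is filled in; whether the stated license is accurate still needs human review.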

Tool-Specific Tips

Kimi K2.5

Strengths: Native multimodal, autonomous visual debugging

Best Practices:

# Use Thinking mode for complex designs
kimi --mode thinking vision --input mockup.png

# Enable visual debugging for refinement
kimi --mode agent --visual-debug \
  --input mockup.png --input draft-implementation.png \
  "Fix visual discrepancies"

# Use Agent Swarm for design system
kimi --mode swarm \
  --input component-gallery.png \
  "Generate all components in parallel"

Gemini 2.5

Strengths: 1M context, Google ecosystem integration

Best Practices:

# Include entire design system in context
gemini vision --input design-system.pdf \
  --context tokens 1000000 \
  "Generate components following this system"

Claude (Limited Vision)

Strengths: Reasoning about visual content

Best Practices:

# Use for analysis rather than generation
claude vision --input mockup.png \
  "Describe the layout structure, color palette, and components"

# Then use text-based generation
claude "Implement the described component"

Cost Optimization

Resolution Strategy

Use Case	Resolution	Reason
Layout structure	1024px width	Sufficient for structure, lower cost
Component details	1920px width	Need fine details for pixel-matching
Color extraction	Original	Accurate color sampling
Typography	2x resolution	Sharp text for font identification
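For the two fixed-width rows in the table, the target dimensions follow from a simple aspect-ratio calculation. The sketch below is illustrative: `PixelSize` and `scaleToWidth` are hypothetical names, and the function deliberately never upscales, since enlarging an image adds cost without adding detail.

```typescript
interface PixelSize {
  width: number;
  height: number;
}

// Scale down to targetWidth while preserving aspect ratio; never upscale.
function scaleToWidth(size: PixelSize, targetWidth: number): PixelSize {
  if (size.width <= 0 || size.height <= 0 || targetWidth <= 0) {
    throw new Error("Dimensions must be positive");
  }
  if (size.width <= targetWidth) {
    return { width: size.width, height: size.height };
  }
  return {
    width: targetWidth,
    height: Math.round((size.height * targetWidth) / size.width),
  };
}

const exportedMockup: PixelSize = { width: 2880, height: 1800 };
console.log(scaleToWidth(exportedMockup, 1024)); // layout structure: 1024 x 640
console.log(scaleToWidth(exportedMockup, 1920)); // component details: 1920 x 1200
```

The color-extraction and typography rows are not covered here because they call for the original or a 2x export rather than a fixed width.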

Token Budgeting

# Estimate vision token usage
kimi vision --estimate --input mockup.png
# Output: ~5000 vision tokens

# Batch similar components
kimi vision --batch component-mockups/ \
  --shared-prompt "Generate React components"

Integration with AEEF Standards

PRD-STD-001: Prompt Engineering

Vision prompts follow the CRAFT framework:

Context: Design system, tech stack
Requirements: Visual fidelity, responsive, accessible
Assumptions: Asset availability, license status
Format: Component files, stories, tests
Tests: Visual diff, accessibility audit

PRD-STD-002: Code Review

Vision-generated code requires:

Visual diff verification
Designer review for fidelity
Accessibility audit
IP clearance documentation

PRD-STD-018: Multi-Modal Governance

Mandatory compliance:

Visual input provenance logged
IP clearance verified
Accessibility requirements met
Audit trail maintained
