
Image-to-Video Generation

Last updated February 22, 2026

Animate a static image into a video. The image you provide becomes the video content itself: you're adding motion to that exact scene.

This is different from reference-to-video, where reference images show the model what characters look like, but the video is a completely new scene.

Google's Veo models support image-to-video generation, animating a starting image into a video.

Models:

  • google/veo-3.1-generate-001: Latest model with audio
  • google/veo-3.1-fast-generate-001: Fast generation
  • google/veo-3.0-generate-001: Previous generation, 1080p max
  • google/veo-3.0-fast-generate-001: Faster generation, lower quality

Parameters:

  • prompt.image (string, required): URL or base64-encoded image to animate
  • prompt.text (string, optional): Description of the motion or animation
  • duration (4 | 6 | 8, optional): Video length in seconds. Defaults to 8
  • resolution (string, optional): '720p' or '1080p'. Defaults to '720p'
  • providerOptions.vertex.generateAudio (boolean, optional): Generate audio alongside the video
  • providerOptions.vertex.resizeMode ('pad' | 'crop', optional): How to resize the image to fit video dimensions. Defaults to 'pad'
  • providerOptions.vertex.enhancePrompt (boolean, optional): Use Gemini to enhance prompts. Defaults to true
  • providerOptions.vertex.negativePrompt (string, optional): What to discourage in the generated video
  • providerOptions.vertex.personGeneration ('dont_allow' | 'allow_adult' | 'allow_all', optional): Whether to allow person generation. Defaults to 'allow_adult'
  • providerOptions.vertex.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000
  • providerOptions.vertex.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes)
veo-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'google/veo-3.1-generate-001',
  prompt: {
    image: 'https://example.com/landscape.png',
    text: 'Camera slowly pans across the scene as clouds drift by',
  },
  resolution: '1080p',
  providerOptions: {
    vertex: {
      resizeMode: 'crop',
      generateAudio: true,
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

KlingAI's image-to-video models animate images with standard or professional quality modes.

Models:

  • klingai/kling-v3.0-i2v: Multi-shot generation, 15s clips, enhanced consistency
  • klingai/kling-v2.6-i2v: Audio-visual co-generation, cinematic motion
  • klingai/kling-v2.5-turbo-i2v: Faster generation, lower cost

Parameters:

  • prompt.image (string | Buffer, required): The image to animate. See image requirements below.
  • prompt.text (string, optional): Description of the motion. Max 2500 characters.
  • duration (number, optional): Video length in seconds. 5 or 10 for v2.x, 3-15 for v3.0. Defaults to 5.
  • providerOptions.klingai.mode ('std' | 'pro', optional): 'std' for standard quality, 'pro' for professional quality. Defaults to 'std'.
  • providerOptions.klingai.negativePrompt (string, optional): What to avoid in the video. Max 2500 characters.
  • providerOptions.klingai.cfgScale (number, optional): Prompt adherence (0-1). Higher = stricter. Defaults to 0.5. Not supported on v2.x.
  • providerOptions.klingai.sound ('on' | 'off', optional): Generate audio. Defaults to 'off'. Requires v2.6+.
  • providerOptions.klingai.voiceList (array, optional): Voice IDs for speech. Max 2 voices. Requires v3.0+ with sound: 'on'. Cannot coexist with elementList. See voice generation.
  • providerOptions.klingai.multiShot (boolean, optional): Enable multi-shot generation. Requires v3.0+. See multi-shot.
  • providerOptions.klingai.elementList (array, optional): Reference elements for element control. Up to 3 elements. Requires v3.0+. Cannot coexist with voiceList.
  • providerOptions.klingai.watermarkInfo (object, optional): Set { enabled: true } to generate a watermarked result.
  • providerOptions.klingai.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000.
  • providerOptions.klingai.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes).

The input image (prompt.image) must meet these requirements:

  • Formats: .jpg, .jpeg, .png
  • File size: 10MB or less
  • Dimensions: Minimum 300px
  • Aspect ratio: Between 1:2.5 and 2.5:1
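These limits can be checked up front before submitting a request. A minimal sketch (the helper and its name are ours; obtaining the actual width, height, and file size from the image is left to the caller):

```typescript
// Validate an input image against KlingAI's documented limits.
// width and height are in pixels, sizeBytes in bytes.
function checkKlingImage(width: number, height: number, sizeBytes: number): string[] {
  const problems: string[] = [];
  if (sizeBytes > 10 * 1024 * 1024) problems.push('file exceeds 10MB');
  if (Math.min(width, height) < 300) problems.push('dimension below 300px');
  const ratio = width / height;
  if (ratio < 1 / 2.5 || ratio > 2.5) problems.push('aspect ratio outside 1:2.5 to 2.5:1');
  return problems; // empty array means the image passes all checks
}
```

An empty result means the image satisfies every requirement above.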

When using base64 encoding, submit only the raw base64 string without any prefix:

// Correct
const image = 'iVBORw0KGgoAAAANSUhEUgAAAAUA...';
 
// Incorrect - do not include data: prefix
const image = 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA...';
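With a local file, Node's Buffer API produces exactly this raw form, since toString('base64') never adds a data: prefix. A short sketch (the helper name is ours):

```typescript
// Convert raw image bytes to the bare base64 string KlingAI expects.
function rawBase64(data: Buffer): string {
  return data.toString('base64');
}

// Usage with a local file:
// import fs from 'node:fs';
// const image = rawBase64(fs.readFileSync('cat.png'));
```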
klingai-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  providerOptions: {
    klingai: {
      mode: 'std',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Generate a video that transitions between a starting and ending image. The model interpolates the motion between the two frames.

Parameters:

  • prompt.image (string | Buffer, required): The first frame (starting image).
  • providerOptions.klingai.imageTail (string | Buffer, required): The last frame (ending image). Same format requirements as prompt.image.

When using imageTail, the following features are mutually exclusive and cannot be combined:

  • First/last frame (image + imageTail)
  • Motion brush (dynamicMasks / staticMask)
  • Camera control (cameraControl)
first-last-frame.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const firstFrame = fs.readFileSync('start.png');
const lastFrame = fs.readFileSync('end.png');
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: firstFrame,
    text: 'Smooth transition between the two scenes',
  },
  providerOptions: {
    klingai: {
      imageTail: lastFrame,
      mode: 'pro',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Add speech to your video using voice IDs. Requires v2.6+ models with sound: 'on'.

Reference voices in your prompt using <<<voice_1>>> syntax, where the number matches the order in voiceList:

voice-generation.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/person.png',
    text: 'The person<<<voice_1>>> says: "Hello, welcome to my channel"',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      sound: 'on',
      voiceList: [{ voiceId: 'your_voice_id' }],
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

You can use up to 2 voices per video. Voice IDs come from KlingAI's voice customization API or system preset voices.
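Because the marker number is positional, prompts and voiceList can drift out of sync. A small helper can generate the markers (this helper is ours, not part of the SDK, and the voice IDs shown are placeholders):

```typescript
// Build the positional marker for the nth voiceList entry (1-based),
// matching KlingAI's <<<voice_n>>> prompt syntax.
function voiceTag(n: number): string {
  if (n < 1 || n > 2) throw new Error('KlingAI allows at most 2 voices per video');
  return `<<<voice_${n}>>>`;
}

// Placeholder voice IDs; real IDs come from KlingAI's voice APIs.
const voiceList = [{ voiceId: 'host_voice_id' }, { voiceId: 'guest_voice_id' }];
const text = `The host${voiceTag(1)} greets the guest${voiceTag(2)} warmly`;
```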

Control camera movement during video generation. This is mutually exclusive with first/last frame and motion brush features.

Parameters:

  • providerOptions.klingai.cameraControl.type (string, required): Camera movement type. See options below.
  • providerOptions.klingai.cameraControl.config (object, optional): Movement configuration. Required when type is 'simple'.

Camera movement types:

  • 'simple': Basic movement with one axis. Config required.
  • 'down_back': Camera descends and moves backward.
  • 'forward_up': Camera moves forward and tilts up.
  • 'right_turn_forward': Rotate right, then move forward.
  • 'left_turn_forward': Rotate left, then move forward.

Simple camera config options (use only one, set others to 0):

  • horizontal [-10, 10]: Camera translation along the x-axis. Negative = left.
  • vertical [-10, 10]: Camera translation along the y-axis. Negative = down.
  • pan [-10, 10]: Camera rotation around the y-axis. Negative = left.
  • tilt [-10, 10]: Camera rotation around the x-axis. Negative = down.
  • roll [-10, 10]: Camera rotation around the z-axis. Negative = counter-clockwise.
  • zoom [-10, 10]: Focal length change. Negative = narrower FOV.
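Since a 'simple' movement uses exactly one axis with the rest at 0, a tiny builder avoids setting two axes by accident. A sketch (this function is ours, not an SDK helper):

```typescript
type CameraAxis = 'horizontal' | 'vertical' | 'pan' | 'tilt' | 'roll' | 'zoom';

// Return a 'simple' camera config with one axis set and all others zeroed,
// as KlingAI requires.
function simpleCameraConfig(axis: CameraAxis, value: number) {
  if (value < -10 || value > 10) throw new Error('camera values must be within [-10, 10]');
  const config = { horizontal: 0, vertical: 0, pan: 0, tilt: 0, roll: 0, zoom: 0 };
  config[axis] = value;
  return config;
}
```

The result plugs directly into cameraControl.config, as in the example below.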
camera-control.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/landscape.png',
    text: 'A serene mountain landscape',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      cameraControl: {
        type: 'simple',
        config: {
          zoom: 5,
          horizontal: 0,
          vertical: 0,
          pan: 0,
          tilt: 0,
          roll: 0,
        },
      },
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Use mask images to control which parts of the image move and how. This is mutually exclusive with the first/last frame and camera control features.

Parameters:

  • providerOptions.klingai.staticMask (string, optional): Mask image for areas that should remain static.
  • providerOptions.klingai.dynamicMasks (array, optional): Array of dynamic mask configurations (up to 6).
  • providerOptions.klingai.dynamicMasks[].mask (string, required): Mask image for areas that should move.
  • providerOptions.klingai.dynamicMasks[].trajectories (array, required): Motion path coordinates. 2-77 points for a 5s video.

Mask requirements:

  • Same format as input image (.jpg, .jpeg, .png)
  • Aspect ratio must match the input image
  • All masks (staticMask and dynamicMasks[].mask) must have identical resolution

Trajectory coordinates use the bottom-left corner of the image as origin. More points create more accurate paths.
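Most image editors and libraries report coordinates from the top-left corner, so points usually need flipping before being sent as trajectories. A sketch, assuming you know the image height in pixels:

```typescript
interface Point { x: number; y: number; }

// Convert points from a top-left origin (common in image tooling) to the
// bottom-left origin that KlingAI trajectories use.
function toBottomLeftOrigin(points: Point[], imageHeight: number): Point[] {
  return points.map(({ x, y }) => ({ x, y: imageHeight - y }));
}
```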

motion-brush.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v2.6-i2v',
  prompt: {
    image: 'https://example.com/scene.png',
    text: 'A ball bouncing across the scene',
  },
  providerOptions: {
    klingai: {
      mode: 'std',
      dynamicMasks: [
        {
          mask: 'https://example.com/ball-mask.png',
          trajectories: [
            { x: 100, y: 200 },
            { x: 200, y: 300 },
            { x: 300, y: 200 },
            { x: 400, y: 300 },
          ],
        },
      ],
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Generate videos with multiple storyboard shots, combining a start frame image with per-shot prompts. Requires Kling v3.0+ models.

Parameters:

  • providerOptions.klingai.multiShot (boolean, required): Set to true to enable multi-shot generation.
  • providerOptions.klingai.shotType (string, optional): Set to 'customize' for custom shot durations.
  • providerOptions.klingai.multiPrompt (array, required): Array of shot configurations.
  • providerOptions.klingai.multiPrompt[].index (number, required): Shot order (starting from 1).
  • providerOptions.klingai.multiPrompt[].prompt (string, required): Text description for this shot.
  • providerOptions.klingai.multiPrompt[].duration (string, required): Duration in seconds for this shot.
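In the example below the per-shot durations (4 + 3 + 3) add up to the top-level duration of 10; whether that is a hard requirement is not stated here, so treat it as inferred. A small validator (ours, not an SDK helper) that checks shot ordering and reports the combined length:

```typescript
interface Shot { index: number; prompt: string; duration: string; }

// Verify shot indices run 1..n in order and return the total length in seconds.
function totalShotDuration(shots: Shot[]): number {
  shots.forEach((shot, i) => {
    if (shot.index !== i + 1) {
      throw new Error('shot indices must start at 1 and be sequential');
    }
  });
  return shots.reduce((sum, shot) => sum + Number(shot.duration), 0);
}
```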
multi-shot-i2v.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'klingai/kling-v3.0-i2v',
  prompt: {
    image: 'https://example.com/start-frame.png',
    text: '',
  },
  aspectRatio: '16:9',
  duration: 10,
  providerOptions: {
    klingai: {
      mode: 'pro',
      multiShot: true,
      shotType: 'customize',
      multiPrompt: [
        {
          index: 1,
          prompt: 'The character looks up at the sky.',
          duration: '4',
        },
        {
          index: 2,
          prompt: 'A bird flies across the frame.',
          duration: '3',
        },
        {
          index: 3,
          prompt: 'The character smiles and waves.',
          duration: '3',
        },
      ],
      sound: 'on',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Wan offers image-to-video with standard and flash variants. Both support audio generation. Wan requires image URLs (not buffers). Use Vercel Blob to host local images.

Models:

  • alibaba/wan-v2.6-i2v: Standard model with audio
  • alibaba/wan-v2.6-i2v-flash: Fast generation

Parameters:

  • prompt.image (string, required): URL of the image to animate (URLs only, not buffers)
  • prompt.text (string, required): Description of the motion or animation
  • resolution (string, optional): '1280x720' or '1920x1080'
  • duration (number, optional): 2-15 seconds
  • providerOptions.alibaba.audio (boolean, optional): Generate audio. Standard models default to true; flash models default to false
  • providerOptions.alibaba.negativePrompt (string, optional): What to avoid in the video. Max 500 characters
  • providerOptions.alibaba.audioUrl (string, optional): URL to an audio file for audio-video sync (WAV/MP3, 3-30s, max 15MB)
  • providerOptions.alibaba.watermark (boolean, optional): Add a watermark to the video. Defaults to false
  • providerOptions.alibaba.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000
  • providerOptions.alibaba.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes)
wan-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-i2v-flash',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat waves hello and smiles',
  },
  duration: 5,
  providerOptions: {
    alibaba: {
      audio: true,
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Grok Imagine Video (by xAI) can animate images into videos. The output defaults to the input image's aspect ratio. If you specify aspectRatio, it will override this and stretch the image to the desired ratio.

Model:

  • xai/grok-imagine-video: 1-15s duration, 480p or 720p resolution

Parameters:

  • prompt.image (string, required): URL of the image to animate
  • prompt.text (string, optional): Description of the motion or animation
  • duration (number, optional): Video length in seconds (1-15)
  • aspectRatio (string, optional): Override the input image's aspect ratio (stretches the image)
  • providerOptions.xai.resolution ('480p' | '720p', optional): Video resolution. Defaults to '480p'
  • providerOptions.xai.pollIntervalMs (number, optional): How often to check task status. Defaults to 5000
  • providerOptions.xai.pollTimeoutMs (number, optional): Maximum wait time. Defaults to 600000 (10 minutes)
grok-image-to-video.ts
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'xai/grok-imagine-video',
  prompt: {
    image: 'https://example.com/cat.png',
    text: 'The cat slowly turns its head and blinks',
  },
  duration: 5,
  providerOptions: {
    xai: {
      pollTimeoutMs: 600000,
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

