Reference-to-Video

Reference-to-Video Generation

Last updated March 14, 2026

Generate a completely new video scene featuring characters from reference media.

This is different from image-to-video, which animates an existing image. With reference-to-video, the reference images only show the model what your characters look like. They don't become the video content. Instead, your prompt describes a completely new scene, and the model generates that scene from scratch with your characters in it.

For example, you could provide photos of a cat and a dog, then prompt "character1 and character2 have a conversation in a cafe." The model creates that cafe scene from scratch, using the reference images only to understand what the characters look like.

Wan

Wan's reference-to-video models can incorporate multiple characters from reference media into a generated video. References must be URLs (use Vercel Blob for local files). Use character1, character2, etc. in your prompt to refer to each reference.

Wan models

Model	Description
`alibaba/wan-v2.6-r2v`	Standard model with audio
`alibaba/wan-v2.6-r2v-flash`	Fast generation

Wan parameters

Parameter	Type	Required	Description
`prompt`	`string`	Yes	Scene description using `character1`, `character2`, etc. to reference each character
`resolution`	`string`	No	`'1280x720'` or `'1920x1080'`
`duration`	`number`	No	2-10 seconds
`providerOptions.alibaba.referenceUrls`	`string[]`	Yes	Array of URLs to reference images or videos. The first URL maps to `character1`, the second to `character2`, and so on. Supports 0-5 images and 0-3 videos, max 5 total
`providerOptions.alibaba.audio`	`boolean`	No	Generate audio. Standard models default to `true`, flash models default to `false`
`providerOptions.alibaba.negativePrompt`	`string`	No	What to avoid in the video. Max 500 characters
`providerOptions.alibaba.shotType`	`'single'` \| `'multi'`	No	`'single'` for continuous shot. `'multi'` for multiple camera angles
`providerOptions.alibaba.watermark`	`boolean`	No	Add watermark to the video. Defaults to `false`
`providerOptions.alibaba.pollIntervalMs`	`number`	No	How often to check task status. Defaults to `5000`
`providerOptions.alibaba.pollTimeoutMs`	`number`	No	Maximum wait time. Defaults to `600000` (10 minutes)

Wan example

reference-to-video.ts

import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
 
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-r2v',
  prompt: 'character1 and character2 have a friendly conversation in a cozy cafe',
  resolution: '1920x1080',
  duration: 4,
  providerOptions: {
    alibaba: {
      // References can be images or videos
      referenceUrls: [
        'https://example.com/cat.png',
        'https://example.com/dog.png',
      ],
      shotType: 'single',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Wan example with Vercel Blob

If you have local files, upload them to Vercel Blob first:

reference-to-video-with-blob.ts

import { experimental_generateVideo as generateVideo } from 'ai';
import { put } from '@vercel/blob';
import fs from 'node:fs';
 
const catImage = fs.readFileSync('./cat.png');
const { url: catUrl } = await put('cat.png', catImage, { access: 'public' });
 
const dogImage = fs.readFileSync('./dog.png');
const { url: dogUrl } = await put('dog.png', dogImage, { access: 'public' });
 
const result = await generateVideo({
  model: 'alibaba/wan-v2.6-r2v',
  prompt: 'character1 and character2 play together in a sunny garden',
  resolution: '1280x720',
  duration: 4,
  providerOptions: {
    alibaba: {
      referenceUrls: [catUrl, dogUrl],
      shotType: 'single',
    },
  },
});
 
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);

Image-to-Video

Motion Control

Was this helpful?

AI Cloud

Core Platform

Security

Company

Learn

Open Source

Use Cases

Tools

Users