Reference-to-Video Generation
Generate a completely new video scene featuring characters from reference media.
This is different from image-to-video, which animates an existing image. With reference-to-video, the reference images only show the model what your characters look like. They don't become the video content. Instead, your prompt describes a completely new scene, and the model generates that scene from scratch with your characters in it.
For example, you could provide photos of a cat and a dog, then prompt "character1 and character2 have a conversation in a cafe." The model creates that cafe scene from scratch, using the reference images only to understand what the characters look like.
Wan's reference-to-video models can incorporate multiple characters from reference media into a generated video. References must be URLs (use Vercel Blob for local files). Use character1, character2, etc. in your prompt to refer to each reference.
| Model | Description |
|---|---|
alibaba/wan-v2.6-r2v | Standard model with audio |
alibaba/wan-v2.6-r2v-flash | Fast generation |
| Parameter | Type | Required | Description |
|---|---|---|---|
prompt | string | Yes | Scene description using character1, character2, etc. to reference each character |
resolution | string | No | '1280x720' or '1920x1080' |
duration | number | No | 2-10 seconds |
providerOptions.alibaba.referenceUrls | string[] | Yes | Array of URLs to reference images or videos. The first URL maps to character1, the second to character2, and so on. Supports 0-5 images and 0-3 videos, max 5 total |
providerOptions.alibaba.audio | boolean | No | Generate audio. Standard models default to true, flash models default to false |
providerOptions.alibaba.negativePrompt | string | No | What to avoid in the video. Max 500 characters |
providerOptions.alibaba.shotType | 'single' | 'multi' | No | 'single' for continuous shot. 'multi' for multiple camera angles |
providerOptions.alibaba.watermark | boolean | No | Add watermark to the video. Defaults to false |
providerOptions.alibaba.pollIntervalMs | number | No | How often to check task status. Defaults to 5000 |
providerOptions.alibaba.pollTimeoutMs | number | No | Maximum wait time. Defaults to 600000 (10 minutes) |
import { experimental_generateVideo as generateVideo } from 'ai';
import fs from 'node:fs';
const result = await generateVideo({
model: 'alibaba/wan-v2.6-r2v',
prompt: 'character1 and character2 have a friendly conversation in a cozy cafe',
resolution: '1920x1080',
duration: 4,
providerOptions: {
alibaba: {
// References can be images or videos
referenceUrls: [
'https://example.com/cat.png',
'https://example.com/dog.png',
],
shotType: 'single',
},
},
});
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);If you have local files, upload them to Vercel Blob first:
import { experimental_generateVideo as generateVideo } from 'ai';
import { put } from '@vercel/blob';
import fs from 'node:fs';
const catImage = fs.readFileSync('./cat.png');
const { url: catUrl } = await put('cat.png', catImage, { access: 'public' });
const dogImage = fs.readFileSync('./dog.png');
const { url: dogUrl } = await put('dog.png', dogImage, { access: 'public' });
const result = await generateVideo({
model: 'alibaba/wan-v2.6-r2v',
prompt: 'character1 and character2 play together in a sunny garden',
resolution: '1280x720',
duration: 4,
providerOptions: {
alibaba: {
referenceUrls: [catUrl, dogUrl],
shotType: 'single',
},
},
});
fs.writeFileSync('output.mp4', result.videos[0].uint8Array);Was this helpful?