| Model | Total Score | I2V Score | Quality Score | CM | I2V SC | I2V BC | SC | BC | MS | DD | AQ | IQ | TF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DynamiCrafter-256 | 84.35 | 91.29 | 77.42 | 22.18 | 95.40 | 96.22 | 94.60 | 98.30 | 97.82 | 38.69 | 59.40 | 62.29 | 97.03 |
| SEINE | 82.12 | 88.60 | 75.64 | 15.91 | 93.45 | 94.21 | 93.94 | 97.01 | 96.20 | 24.55 | 56.55 | 70.52 | 95.07 |
| SEINE-512x320* | 83.49 | 89.62 | 77.37 | 23.36 | 94.85 | 94.02 | 94.20 | 97.26 | 96.68 | 34.31 | 58.42 | 70.97 | 96.72 |
| SparseCtrl | 80.34 | 85.13 | 75.54 | 25.82 | 88.39 | 92.46 | 85.08 | 93.81 | 94.25 | 81.95 | 49.88 | 69.35 | 91.78 |
| ConsistI2V* | 83.30 | 90.38 | 76.22 | 33.60 | 94.69 | 94.57 | 95.27 | 98.28 | 97.38 | 18.62 | 59.00 | 66.92 | 97.56 |
| FrameBridge-VideoCrafter | 85.37 | 92.83 | 77.92 | 30.72 | 96.24 | 97.25 | 94.63 | 98.92 | 98.51 | 35.77 | 59.38 | 63.28 | 98.01 |
| FrameBridge-CogVideoX | 85.93 | 95.22 | 76.65 | 92.06 | 95.42 | 97.13 | 93.60 | 98.62 | 97.57 | 48.29 | 54.28 | 60.00 | 96.61 |
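
In the table above, the Total Score column appears to equal the unweighted mean of the I2V Score and Quality Score columns, to within rounding of the last digit. This relationship is inferred from the tabulated numbers and is not stated in the source. The sketch below checks it on the values copied from the table; the helper name `mean_total` is hypothetical.

```python
# (I2V Score, Quality Score, reported Total Score), copied from the table above.
rows = {
    "DynamiCrafter-256": (91.29, 77.42, 84.35),
    "SEINE": (88.60, 75.64, 82.12),
    "SEINE-512x320*": (89.62, 77.37, 83.49),
    "SparseCtrl": (85.13, 75.54, 80.34),
    "ConsistI2V*": (90.38, 76.22, 83.30),
    "FrameBridge-VideoCrafter": (92.83, 77.92, 85.37),
    "FrameBridge-CogVideoX": (95.22, 76.65, 85.93),
}


def mean_total(i2v: float, quality: float) -> float:
    """Equal-weight mean of the two sub-scores (assumed aggregation, inferred from the table)."""
    return (i2v + quality) / 2


# Track the largest gap between the recomputed mean and the reported total.
max_gap = 0.0
for model, (i2v, quality, total) in rows.items():
    recomputed = mean_total(i2v, quality)
    max_gap = max(max_gap, abs(recomputed - total))
    print(f"{model:26s} mean={recomputed:.3f} reported={total:.2f}")

print(f"largest gap: {max_gap:.4f}")
```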

| Model | DD |
|---|---|
| DynamiCrafter-256 | 38.69 |
| FrameBridge-VideoCrafter | 35.77 |
| FrameBridge-VideoCrafter-FrameStride | 46.26 |
| FrameBridge-VideoCrafter-NoisyCondition | 48.62 |

| Model | Base Model | FVD ↓ | CLIPSIM ↑ | PIC ↑ |
|---|---|---|---|---|
| Diffusion (i.e., DynamiCrafter) | VideoCrafter | 192 | 0.2245 | 0.6131 |
| FrameBridge w.o.SAF | VideoCrafter | 299 | 0.2246 | 0.5559 |
| FrameBridge w.SAF | VideoCrafter | 99 | 0.2250 | 0.6963 |

| Model | Base Model | FVD ↓ | CLIPSIM ↑ | PIC ↑ |
|---|---|---|---|---|
| Diffusion | CogVideoX | 118 | 0.2250 | 0.7659 |
| FrameBridge w.o.SAF | CogVideoX | 178 | 0.2250 | 0.7104 |
| FrameBridge w.SAF | CogVideoX | 107 | 0.2250 | 0.7731 |

| Model | FVD ↓ | CLIPSIM ↑ | PIC ↑ | VBench-I2V Total Score ↑ |
|---|---|---|---|---|
| Diffusion | 171.25 | 0.2251 | 0.7231 | 82.98 |
| FrameBridge | 147.73 | 0.2251 | 0.7586 | 84.01 |

[Figure: I2V comparison from one static image. Text condition: "a bridge that is in the middle of a river, camera zooms out". Panels: FrameBridge-CogVideoX (DD=48.29, Total Score=85.93); DynamiCrafter (DD=38.69, Total Score=84.35); ConsistI2V (DD=18.62, Total Score=83.30); SparseCtrl (DD=81.95, Total Score=80.34).]


[Figure: I2V comparison from static images. Text conditions: "leaves blown off by the wind" and "a blue car driving down a dirt road near train tracks". Panels for each: FrameBridge-VideoCrafter; DynamiCrafter; FrameBridge-VideoCrafter-MI.]


[Figure: I2V samples, each shown as a condition image next to the I2V result. Sample 1: "camera zoom-in, close look at the apple". Sample 2: "fireworks in the night sky over a city". Sample 3: "bird flying off the tree". Sample 4: "a castle on top of a hill covered in snow, camera pans left". Sample 5: "the table is rotating". Sample 6: "a great white shark swimming in the ocean".]


| Model | MSR-VTT FVD ↓ | MSR-VTT CLIPSIM ↑ | MSR-VTT PIC ↑ | UCF-101 FVD ↓ | UCF-101 IS ↑ | UCF-101 PIC ↑ |
|---|---|---|---|---|---|---|
| Coupling Flow-Matching, σ = 0 | 1047 | 0.2249 | 0.4484 | 2066 | 14.87 | 0.4275 |
| Coupling Flow-Matching, σ = 1 | 110 | 0.2249 | 0.6936 | 342 | 40.81 | 0.6419 |
| Vanilla Flow-Matching | 204 | 0.2249 | 0.5701 | 370 | 36.27 | 0.6070 |
| Diffusion (i.e., DynamiCrafter) | 192 | 0.2245 | 0.6163 | 485 | 29.46 | 0.6266 |
| FrameBridge | 99 | 0.2250 | 0.6963 | 312 | 39.89 | 0.6697 |