Table 1. Qualitative comparison between FrameBridge-CogVideoX and other diffusion-based I2V baselines. The names of the pre-trained video diffusion models are given in brackets.
Static Image: Text Condition:
a hot-air balloon flying over a desert landscape
FrameBridge-CogVideoX (CogVideoX-2B): DynamiCrafter (VideoCrafter1):
ConsistI2V (Inflated Stable Diffusion 2.1-Base): SEINE (LaVie):
Table 2. Qualitative comparison between FrameBridge-CogVideoX and other diffusion-based I2V baselines. The names of the pre-trained video diffusion models are given in brackets.
Static Image: Text Condition:
a blue fishing boat is navigating in the ocean next to a cruise ship
FrameBridge-CogVideoX (CogVideoX-2B): DynamiCrafter (VideoCrafter1):
ConsistI2V (Inflated Stable Diffusion 2.1-Base): SEINE (LaVie):
Table 3. Qualitative comparison between FrameBridge-CogVideoX and other diffusion-based I2V baselines. The names of the pre-trained video diffusion models are given in brackets.
Static Image: Text Condition:
a bridge that is in the middle of a river, camera zooms out
FrameBridge-CogVideoX (CogVideoX-2B): DynamiCrafter (VideoCrafter1):
ConsistI2V (Inflated Stable Diffusion 2.1-Base): SEINE (LaVie):
Table 4. Qualitative comparison between FrameBridge-CogVideoX and other diffusion-based I2V baselines. The names of the pre-trained video diffusion models are given in brackets.
Static Image: Text Condition:
a rabbit is playing the guitar
FrameBridge-CogVideoX (CogVideoX-2B): DynamiCrafter (VideoCrafter1):
ConsistI2V (Inflated Stable Diffusion 2.1-Base): SEINE (LaVie):
Table 5. Qualitative comparison between FrameBridge-CogVideoX and other diffusion-based I2V baselines. The names of the pre-trained video diffusion models are given in brackets.
Static Image: Text Condition:
a building that is sitting on the side of a pond, camera pans right
FrameBridge-CogVideoX (CogVideoX-2B): DynamiCrafter (VideoCrafter1):
ConsistI2V (Inflated Stable Diffusion 2.1-Base): SEINE (LaVie):