V2C qualitative results: ours vs. DUSt3R variants
JOG3R can reconstruct 3D cameras from both real and generated videos.
The estimated cameras are better than those of pretrained DUSt3R and on par with those of fine-tuned DUSt3R on RealEstate10k.
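As a rough illustration of how two camera trajectories can be compared numerically, the sketch below computes a mean relative rotation error between an estimated and a reference trajectory. This is an assumption made for illustration, not the evaluation protocol of the paper: the function names are hypothetical, poses are assumed to be 4x4 camera-to-world matrices, and only rotation is scored.

```python
import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def relative_pose(T_i, T_j):
    """Pose of camera j expressed in the frame of camera i (4x4 camera-to-world)."""
    return np.linalg.inv(T_i) @ T_j


def mean_relative_rotation_error(est, ref):
    """Mean rotation error over consecutive-frame relative poses.

    est, ref: arrays of shape (N, 4, 4). Hypothetical helper, not from the paper.
    """
    errors = []
    for i in range(len(est) - 1):
        rel_est = relative_pose(est[i], est[i + 1])
        rel_ref = relative_pose(ref[i], ref[i + 1])
        errors.append(rotation_angle_deg(rel_est[:3, :3], rel_ref[:3, :3]))
    return float(np.mean(errors))
```

Scoring relative poses between consecutive frames, rather than absolute poses, keeps the result unchanged when the two trajectories are expressed in different world coordinate frames.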
Input: real videos (Figure 4 in the main paper)
| input video | ground truth camera trajectory | our camera trajectory | pretrained DUSt3R's trajectory | fine-tuned DUSt3R's trajectory |
|---|---|---|---|---|
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
| | ![]() | ![]() | ![]() | ![]() |
Input: generated videos (no ground truth available)
| a basketball court in the backyard of a house | our camera trajectory | pretrained DUSt3R's trajectory | fine-tuned DUSt3R's trajectory | from-scratch trained DUSt3R's trajectory |
|---|---|---|---|---|
| | ![]() | ![]() | ![]() | ![]() |
| a modern home with glass walls and patio furniture | our camera trajectory | pretrained DUSt3R's trajectory | fine-tuned DUSt3R's trajectory | from-scratch trained DUSt3R's trajectory |
| | ![]() | ![]() | ![]() | ![]() |
T2V & T2V+C qualitative results
All videos in this section are generated by JOG3R (Figure 6 in the main paper).
We additionally generate camera paths (T2V+C) and confirm that they are nearly identical
to the paths estimated by running V2C on the generated videos (T2V->V2C) (Figure 5 in the main paper).
For each pair of frames, we visualize only 10 correspondences to avoid clutter.
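The page does not say how the 10 displayed correspondences are chosen. One simple option, sketched below as an assumption, is to draw a uniform random subset of the matched point pairs while keeping each pair intact; `subsample_correspondences` and its parameters are hypothetical names used only for illustration.

```python
import numpy as np


def subsample_correspondences(pts_a, pts_b, k=10, seed=0):
    """Pick at most k matched point pairs, keeping the pairing intact.

    pts_a, pts_b: arrays of shape (N, 2); row i of pts_a matches row i of pts_b.
    Uniform random sampling is an assumption, not the method used on this page.
    """
    pts_a = np.asarray(pts_a)
    pts_b = np.asarray(pts_b)
    if pts_a.shape != pts_b.shape:
        raise ValueError("pts_a and pts_b must have the same shape")
    n = len(pts_a)
    rng = np.random.default_rng(seed)
    # Sample indices once and apply them to both arrays so matches stay aligned.
    idx = np.sort(rng.choice(n, size=min(k, n), replace=False))
    return pts_a[idx], pts_b[idx]
```

Sampling a single index set and applying it to both arrays is what preserves the correspondence; sampling each array independently would break the matches.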
| a backyard with steps leading up to a blue house | correspondences from T2V+C ![]() | correspondences from T2V->V2C ![]() |
|---|---|---|
| | camera poses from T2V+C ![]() | camera poses from T2V->V2C ![]() |
| a basketball court in the backyard of a house | correspondences from T2V+C ![]() | correspondences from T2V->V2C ![]() |
| | camera poses from T2V+C ![]() | camera poses from T2V->V2C ![]() |
| a patio with chairs and tables in front of a house | correspondences from T2V+C ![]() | correspondences from T2V->V2C ![]() |
| | camera poses from T2V+C ![]() | camera poses from T2V->V2C ![]() |
| a view of a kitchen and living room in a new home | correspondences from T2V+C ![]() | correspondences from T2V->V2C ![]() |
| | camera poses from T2V+C ![]() | camera poses from T2V->V2C ![]() |
| a dining room table with chairs and a view of the water | correspondences from T2V+C ![]() | correspondences from T2V->V2C ![]() |
| | camera poses from T2V+C ![]() | camera poses from T2V->V2C ![]() |