VROOM: Visual Reconstruction over Onboard Multiview

Yajat Yadav+ Varun Bharadwaj+ Tanish Baranwal+
Jathin Korrapati+
1 UC Berkeley (+: equal contribution)


A sample segment of the race, and the corresponding track reconstruction + car motion recovered by our method.

We introduce VROOM, a system for reconstructing 3D models of Formula 1 circuits using only onboard camera footage from racecars. Leveraging video data from the 2023 Monaco Grand Prix, we address challenges in the videos such as high-speed motion and sharp perspective changes from a single camera.

[Paper]      [Code]     [BibTeX]

Interactive 4D Visualization

Explore our 4D reconstruction results of VROOM on various track mini-sectors.

Left Click Drag with left click to rotate view
Scroll Wheel Scroll to zoom in/out
Right Click Drag with right click to move view
W S Moving forward and backward
A D Moving left and right
Q E Moving upward and downward

Abstract

We introduce VROOM, a system for reconstructing 3D models of Formula 1 circuits using only onboard camera footage from racecars. Leveraging video data from the 2023 Monaco Grand Prix, we address challenges in the videos such as high-speed motion and sharp perspective changes from a single camera. Our pipeline utilizes different methods such as DROID-SLAM, AnyCam, and MonST3R, and combines preprocessing techniques such as different forms of masking, temporal chunking, and resolution scaling to account for dynamic motion and computational constraints. We show VROOM is able to partially recover the track and vehicle trajectories in complex environments. These findings indicate the feasibility of using onboard video for scalable 4D reconstruction across multiple agents in real-world settings.

Preprocessing Methods

To make our method work with the long F1 videos, we implement several preprocessing steps that improve efficiency while maintaining reconstruction quality.

Masking

Since the car remains fixed with respect to the camera throughout the videos, its static appearance often interferes with the reconstruction. We try multiple masking methods to address this issue.
Left: Masking just the car fails to solve the issue. Right: Masking the bottom half of the frame works best.
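The bottom-half masking can be sketched as follows; this is a minimal illustration assuming frames are H x W x 3 NumPy arrays, and the mask convention (1 for kept pixels, 0 for discarded) is our own, not necessarily the one MonST3R expects:

```python
import numpy as np

def bottom_half_mask(height, width, keep_fraction=0.5):
    """Binary mask that keeps the top of the frame and zeros out the
    bottom rows, where the car body stays fixed relative to the
    onboard camera."""
    mask = np.ones((height, width), dtype=np.uint8)
    cutoff = int(height * keep_fraction)
    mask[cutoff:, :] = 0  # discard the bottom of the frame
    return mask

def apply_mask(frame, mask):
    """Zero out masked pixels in an H x W x 3 frame."""
    return frame * mask[:, :, None]
```

In practice the mask would be passed to the reconstruction backend rather than multiplied into the pixels, but the geometry of the cut is the same.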

Next, in addition to downsampling the resolution and FPS, we also strategically chunk the video on straight segments of the race. Since our method processes the video chunk by chunk, splitting on the straights ensures that each turn is reconstructed with the highest possible accuracy.

(Top) First Video Chunk; (Bottom) Second Video Chunk. As we see, the two overlap in a straight segment of the race.
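The chunking described above can be sketched as a pure index computation; the straight-segment frame indices are assumed to be chosen by hand, and the overlap length and sampling stride here are illustrative parameters, not the values we used:

```python
def chunk_on_straights(num_frames, straight_frames, overlap=10, stride=2):
    """Split frame indices [0, num_frames) into chunks whose boundaries
    fall inside straight segments, with `overlap` extra frames shared
    between consecutive chunks and every `stride`-th frame kept
    (FPS downsampling)."""
    boundaries = [0] + sorted(straight_frames) + [num_frames]
    chunks = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        # extend each chunk past its boundary so neighbours overlap
        ext_end = min(end + overlap, num_frames)
        chunks.append(list(range(start, ext_end, stride)))
    return chunks
```

The shared frames at each boundary are exactly what the later stitching step uses to bring the per-chunk reconstructions into one reference frame.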

Chunk-wise Point Cloud and Camera Extrinsics

We utilize and extend the MonST3R repository to learn the point cloud and camera parameters for each chunk.

Dynamic global point cloud and camera pose estimation

This figure, from the MonST3R paper, illustrates the system we use. For each chunk, a window slides across the video to build a graph with edges between pairs of frames. For each pair, optical flow is computed using an off-the-shelf method, and the point cloud is estimated by MonST3R. These intermediates feed a global bundle adjustment that optimizes the point cloud and camera parameters. After repeating this process for each chunk, we stitch the chunks together using the overlapping frames: this correspondence lets us solve for the transformation matrix between frame i's extrinsics and frame (i+1)'s extrinsics. This allows us to stitch together the camera trajectories and combine all the point clouds in one global reference frame.
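The stitching step can be sketched as follows under simplifying assumptions we are making here: extrinsics are 4x4 camera-to-world matrices, and a single shared frame is used to solve for the rigid transform (with several overlapping frames one would instead fit the transform in a least-squares sense):

```python
import numpy as np

def align_chunks(overlap_pose_a, overlap_pose_b, poses_b, points_b):
    """Align chunk B to chunk A's reference frame via one shared frame.

    overlap_pose_a / overlap_pose_b: 4x4 camera-to-world poses of the
        same physical frame, expressed in chunk A's and chunk B's frames.
    poses_b: list of 4x4 camera-to-world poses from chunk B.
    points_b: (N, 3) point cloud from chunk B.
    """
    # T maps chunk-B world coordinates into chunk-A world coordinates
    T = overlap_pose_a @ np.linalg.inv(overlap_pose_b)
    aligned_poses = [T @ P for P in poses_b]
    pts_h = np.hstack([points_b, np.ones((len(points_b), 1))])  # homogeneous
    aligned_points = (pts_h @ T.T)[:, :3]
    return aligned_poses, aligned_points
```

Applying this chunk by chunk accumulates every trajectory and point cloud into the first chunk's coordinate frame.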

Results - Turn Reconstructions

Predicted vs Ground Truth Comparisons

Turn | Predicted | Ground Truth
Turn 1 | Prediction 1 | Ground Truth 1
Turn 8 | Prediction 2 | Ground Truth 2
Turn 15 + 16 (failure case) | Prediction 3 | Ground Truth 3
Turn 8 | Prediction 4 | Ground Truth 4

BibTeX

@article{yadav2024vroom,
  author = {Yadav, Yajat and Bharadwaj, Varun and Korrapati, Jathin and Baranwal, Tanish},
  title = {VROOM: Visual Reconstruction over Onboard Multiview},
  year = {2025},
  note = {Unpublished manuscript, for a class project},
  institution = {University of California, Berkeley},
  url = {http://varun-bharadwaj.github.io/vroom},
}

Acknowledgements: We borrow this template from the MonST3R project page. Similarly, the interactive 4D visualization is inspired by the visualizations presented by MonST3R.