Data Collection
The VLAI L1 is built for large-scale teleoperation data collection. Its one-click recording pipeline, dual-arm synchronization, and built-in VR teleop make it the fastest path from robot delivery to a training-ready dataset.
Bimanual VR Teleoperation Recording
Connect and verify all systems
rc connect --device l1 --host 192.168.1.45
rc status # check: arms, base, cameras, battery all green
Move L1 to recording position
Drive the L1 to the task workspace using WASD in the browser panel. Set the lift height for the task (e.g., 130 cm for table-top manipulation). Park it and lock the wheels.
rc teleop --device l1 # open browser panel
# Drive to position, then lock:
python -c "from roboticscenter import L1; r=L1('192.168.1.45'); r.connect(); r.base.lock_wheels(); r.disconnect()"
Set up the task scene and cameras
Place task objects in consistent starting positions. Verify camera views in the browser panel — both wrist cameras (Developer Max) and any external cameras should cover the task workspace.
Start recording session via CLI
rc record \
--device l1 \
--task "Pick up the bottle and pour into the glass" \
--num_episodes 50 \
--output ~/datasets/l1-pour-v1 \
--teleop_mode vr # or: browser, leader_arms
# Press ENTER in VR to start each episode, ENTER again to end
Review episodes
rc replay \
--dataset ~/datasets/l1-pour-v1 \
--episode 0
The viewer shows all camera streams and joint-state time series on a synchronized timeline. Delete poor episodes before pushing.
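Dropped camera frames usually show up as gaps in per-frame timestamps. If you export timestamps for a stream, a gap check is easy to script; the function below is an illustrative sketch (the frame rate, tolerance, and data layout are assumptions, not part of the rc toolchain):

```python
# Flag points where consecutive camera frames are further apart than
# `tolerance` frame periods. Synthetic timestamps stand in for whatever
# your export actually produces.

def find_frame_gaps(timestamps, fps=30.0, tolerance=2.0):
    """Return (index, gap_seconds) pairs where frames are too far apart."""
    period = 1.0 / fps
    gaps = []
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt > tolerance * period:
            gaps.append((i, dt))
    return gaps

# A 30 fps stream with a dropped chunk between frames 3 and 4:
ts = [0.000, 0.033, 0.067, 0.100, 0.300, 0.333]
print(find_frame_gaps(ts))  # one ~0.2 s gap reported at index 4
```

Episodes with large gaps in any stream are good candidates for deletion before pushing.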
Push to HuggingFace Hub
huggingface-cli login
rc push_dataset \
--dataset ~/datasets/l1-pour-v1 \
--repo_id your-username/l1-pour-v1
L1 Dataset Schema
The L1 recording pipeline produces a multi-modal dataset covering both arm states, the mobile base, all camera streams, and optional language annotations.
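Inspect a real recording to see the exact schema; the per-step layout implied by the field names on this page looks roughly like the sketch below. Only the observation.*_state keys appear elsewhere in this doc — the array dimensions and camera key names are assumptions for illustration:

```python
# Illustrative per-timestep record. The observation.*_state keys are the ones
# referenced in the Quality Checklist; dimensions and camera key names are
# guesses, so confirm them against a real episode.
step = {
    "observation.left_arm_state":  [0.0] * 8,        # assumed: 7 joints + gripper
    "observation.right_arm_state": [0.0] * 8,
    "observation.base_state":      [0.0, 0.0, 0.0],  # assumed: x, y, heading
    "observation.images.left_wrist":  "left_wrist/000000.jpg",   # hypothetical key
    "observation.images.right_wrist": "right_wrist/000000.jpg",  # hypothetical key
    "action": [0.0] * 16,   # matches the 16-DOF bimanual action head
    "task": "Pick up the bottle and pour into the glass",
}
print(len(step), "fields per step")
```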
Quality Checklist
The L1's VR teleop can introduce unique data quality issues around latency and bimanual coordination. Run through this before pushing to the Hub.
1. VR latency was below 50 ms during recording. Check the latency monitor in the browser panel during recording. Above 50 ms, the operator's hand movements lag the robot's actions, creating a causal mismatch in the dataset. Re-record on a lower-latency WiFi channel if needed.
2. Both arms moved as intended (no single-arm episodes). For bimanual tasks, verify both arms show significant motion in observation.left_arm_state and observation.right_arm_state. Single-arm-dominant episodes may indicate the operator favored one hand.
3. Mobile base was stationary during arm manipulation. Unless you are recording mobile manipulation tasks, observation.base_state should be nearly constant within each episode. Base movement during manipulation causes the workspace to shift relative to the cameras.
4. All camera streams present for the full episode. The L1's WiFi bandwidth may drop frames under load. Run rc validate_dataset --dataset ~/datasets/l1-pour-v1 to check for missing frames across all camera streams.
5. Language instruction matches what was demonstrated. The language instruction is set before recording starts. If the operator improvised a different approach (e.g., used one arm instead of two), update the instruction or delete the episode.
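Checklist items 2 and 3 are easy to automate once you have the state arrays as NumPy arrays. A minimal sketch, assuming per-episode arrays of shape (T, D); the thresholds are illustrative and should be tuned for your units and task:

```python
import numpy as np

def check_bimanual_and_base(left_arm, right_arm, base,
                            min_arm_motion=0.05, max_base_motion=0.01):
    """Return (both_arms_moved, base_stationary) for one episode.

    Motion is the largest per-dimension range over the episode.
    Thresholds are illustrative; tune for your units and task.
    """
    def motion(x):
        return float(np.max(x.max(axis=0) - x.min(axis=0)))
    both_arms_moved = (motion(left_arm) > min_arm_motion
                       and motion(right_arm) > min_arm_motion)
    base_stationary = motion(base) <= max_base_motion
    return both_arms_moved, base_stationary

# Synthetic episode: both arms sweep 0.5 rad while the base never moves.
T = 100
left = np.linspace(0.0, 0.5, T)[:, None] * np.ones((1, 8))
right = left.copy()
base = np.zeros((T, 3))
print(check_bimanual_and_base(left, right, base))  # (True, True)
```

An episode where one arm stays still fails the first check; an episode where the base drifts fails the second.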
Training a VLA from Your Dataset
Once your dataset is on HuggingFace Hub, fine-tune a VLA with the L1 action space.
Fine-tune OpenVLA on L1 data
pip install roboticscenter[vla]
# --action_space l1_bimanual registers the 16-DOF bimanual action head
python -m roboticscenter.scripts.finetune_vla \
--model openvla/openvla-7b \
--dataset your-username/l1-pour-v1 \
--action_space l1_bimanual \
--epochs 50 \
--output_dir outputs/openvla-l1-pour
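The l1_bimanual action space is 16-dimensional, but how those 16 values are ordered is not documented here. A plausible split — assumed to be 7 joints plus 1 gripper per arm, left arm first — would look like this; verify the ordering against your recorded dataset before relying on it:

```python
def split_bimanual_action(action):
    """Split a 16-dim l1_bimanual action into per-arm commands.

    Assumed layout: [left 7 joints, left gripper, right 7 joints, right gripper].
    This ordering is a guess; confirm it against a real recording.
    """
    assert len(action) == 16, "l1_bimanual actions are 16-DOF"
    left, right = action[:8], action[8:]
    return {
        "left_joints": left[:7],  "left_gripper": left[7],
        "right_joints": right[:7], "right_gripper": right[7],
    }

cmd = split_bimanual_action(list(range(16)))
print(cmd["left_gripper"], cmd["right_gripper"])  # 7 15
```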
Deploy fine-tuned VLA on-device (Developer Pro/Max)
rc deploy vla \
--model outputs/openvla-l1-pour \
--quantize int4 \
--device l1 \
--host 192.168.1.45
# Run the policy:
rc run policy \
--task "Pick up the bottle and pour into the glass" \
--max_steps 100
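Conceptually, rc run policy is a closed-loop observe-predict-act cycle bounded by --max_steps. A self-contained sketch of that control loop, with a stub policy and robot (all names here are illustrative, not the rc API):

```python
def run_policy(policy, robot, task, max_steps=100):
    """Step the policy until it signals done or max_steps is reached."""
    obs = robot.observe()
    for step in range(max_steps):
        action, done = policy(obs, task)
        obs = robot.act(action)
        if done:
            return step + 1
    return max_steps

class StubRobot:
    """Stand-in for the real robot interface (illustrative only)."""
    def observe(self): return {"t": 0}
    def act(self, action): return {"t": action}

def make_stub_policy(stop_at=10):
    """Stub policy that declares done after `stop_at` calls."""
    n = {"v": 0}
    def policy(obs, task):
        n["v"] += 1
        return n["v"], n["v"] >= stop_at
    return policy

print(run_policy(make_stub_policy(10), StubRobot(), "pour", max_steps=100))  # 10
```

With a real policy, the done signal would come from the model or a task-success detector, and --max_steps acts as the safety cap.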