
A general-purpose humanoid program centered on a Vision-Language-Action model (“Helix”) and real-world pilots, with a clear bias toward manipulation, long-horizon task execution, and commercial deployments.
Figure’s humanoid program is a useful contrast to Tesla Optimus because it begins from a different systems thesis: humanoids become valuable when they can execute multi-step tasks in the real world with minimal reprogramming. Figure’s pathway to that outcome is not “make the best body first,” but “build a generalist brain that can connect perception, language, and actions,” then iterate the hardware toward mass deployability.
The centerpiece of that thesis is Helix, a Vision-Language-Action (VLA) model that Figure describes as a unified stack for perception, language understanding, and learned control. In practical terms, Helix is intended to turn high-level instructions (natural language) and sensory input (vision + robot state) into reliable low-level actions (hands, arms, posture) without the old robotics burden of writing a new behavior pipeline for every task.
Hardware matters, but Figure’s story is clearly software-forward: a humanoid is a vessel for an embodied AI system that must generalize across object variety and environment messiness. The most credible evidence that Figure understands this is their emphasis on pilots where the robot must perform boring work: parts handling, packaging/logistics, and structured factory tasks. The goal is not to win a mobility contest. The goal is to win a reliability contest.
Figure 02 was the “workhorse generation” used to learn what breaks in real deployments. Figure 03 (introduced in October 2025) positions itself as a redesign meant for scale—explicitly “designed for Helix, the home, and the world at scale,” with emphasis on high-volume manufacturing. The big systems question is whether Helix can earn its keep outside curated demonstrations: can it reduce integration cost enough that customers treat the robot as a deployable labor unit rather than a custom engineering project?
| Spec | Details |
|---|---|
| Robot owner | Figure AI |
| Primary models | Figure 02 (industrial pilot generation), Figure 03 (scale redesign) |
| Helix | Vision-Language-Action model (VLA) intended to unify perception + language + control |
| Figure 02 height / weight | ~170 cm / ~70 kg (reported by BMW and Figure communications) |
| Payload | ~20 kg (commonly stated in BMW and Figure materials) |
| Runtime | ~5 hours (commonly reported in Figure 02 coverage and specs roundups) |
| Speed | ~1.2 m/s (commonly reported in Figure 02 coverage and specs roundups) |
| Vision | Six onboard RGB cameras (stated in Figure 02 release materials) |
| Hands | Human-scale hands; Figure materials reference 16 DoF hands and “human-equivalent strength” |
| Compute | Onboard compute increased vs the prior generation (Figure has described roughly 3x inference gains; NVIDIA-related coverage mentions added compute modules) |
| Deployment focus | Manufacturing + logistics, with a long-term “general-purpose” positioning including home (Figure 03 intro) |
| Public spec reliability | Medium-high (multiple primary sources describe consistent headline specs; audited performance metrics remain limited) |
Helix as the center of gravity. Figure describes Helix as a generalist VLA model that unifies perception, language understanding, and learned control. In practice, this means Helix is intended to map from “what the robot sees + what it’s asked to do” into “what the robot does,” including manipulation primitives and task sequences. The system-level promise is that you can deploy the same robot into new workflows with less traditional engineering glue: fewer handcrafted state machines and fewer one-off perception stacks per customer.
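That "less engineering glue" claim can be made concrete with a sketch. Nothing below is Figure's actual API; the class and field names (`Observation`, `Action`, `VLAPolicy`, `control_loop`) are hypothetical, illustrating only the shape of a single generalist policy that maps vision + language + robot state to actions, so the same loop serves every task instead of a per-task state machine.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    rgb_frames: list       # per-camera images (e.g., six RGB feeds)
    joint_positions: list  # proprioceptive robot state
    instruction: str       # natural-language task command


@dataclass
class Action:
    joint_targets: list    # low-level targets for hands, arms, posture


class VLAPolicy:
    """Hypothetical VLA interface: one model, pixels + words in, actions out."""

    def act(self, obs: Observation) -> Action:
        raise NotImplementedError


def control_loop(policy, get_observation, send_action, steps=1000):
    """One generic loop replaces per-task pipelines: swapping the
    instruction inside the Observation is the whole 'reprogramming' step."""
    for _ in range(steps):
        obs = get_observation()
        send_action(policy.act(obs))
```

The design point is that task variety lives in the `instruction` field and the learned weights, not in customer-specific code.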
Perception-to-action, not just perception. Many robotics programs can identify objects and estimate poses, yet still struggle to act reliably. Figure’s framing is that perception alone isn’t the differentiator; the differentiator is pixels-to-actions that survive real-world variability. The decision to publicly brand Helix as VLA rather than “LLM on a robot” is important—VLA signals that control is first-class, not an afterthought.
Manipulation as the commercial wedge. The economics of humanoids are dominated by hands. If you can reliably pick, place, route, triage, and kit objects, you can monetize quickly in logistics and manufacturing. Figure 02’s published emphasis—six RGB cameras, higher onboard inference, and human-scale hands—fits that thesis. It’s also why Figure’s demonstrations tend to look like work: parts handling and package manipulation rather than stunts.
Hardware iteration shaped by pilots. Figure’s own BMW narratives frame Figure 02 as a learning platform: every intervention logged, every hour on the line shaping the next design. That’s the right kind of pain. Robots do not become commercial by winning a benchmark; they become commercial by surviving a production environment without constant babysitting. Figure 03 is explicitly positioned as the generation that incorporates those lessons and aims for manufacturing at scale.
The real test: recovery. For readers, the most predictive capability is not “task completion” but “safe recovery.” When a grasp slips, when a part is missing, when a tote is placed wrong—does the robot retry, re-plan, and continue safely? Helix’s long-horizon promise is meaningful only if it includes robust recovery behaviors that minimize escalations to humans.
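The retry-then-escalate behavior described above can be stated as a small wrapper. This is an illustrative sketch, not Figure's controller; `attempt_skill` and `safe_stop` are assumed callbacks standing in for a learned skill and a bounded safe-stop routine.

```python
def execute_with_recovery(step, attempt_skill, safe_stop, max_retries=2):
    """Try a skill; on repeated failure, stop safely and flag a human.

    The commercial metric this encodes: escalations per shift, not
    raw task-completion rate.
    """
    for _attempt in range(max_retries + 1):
        if attempt_skill(step):   # e.g., re-grasp after a slip
            return "done"
    safe_stop(step)               # freeze in a safe posture, log the failure
    return "escalated"
```

A policy that retries twice and then escalates cleanly is worth more in production than one with a higher first-try success rate but unsafe failure modes.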
| Priority | Pick | Why | Tradeoff to accept |
|---|---|---|---|
| Best first environment | Factory workcells + structured aisles | Constrained object sets, repeatable lighting, clear throughput KPIs, and controllable safety zones. | Early success may overfit to curated conditions; general-purpose claims must be validated with variability. |
| Highest ROI task class | Parts handling, kitting, and staging | High repetition, measurable errors, and strong human labor substitution potential. | Small error rates still create rework; reliability must be “boring.” |
| Fastest learning loop | Logistics triage + package manipulation | High-volume object interaction builds the manipulation dataset quickly, sharpening Helix’s pixels-to-actions loop. | Object variety increases edge cases; safety and recovery become more important than speed demos. |
| Long-term bet | General-purpose assistant across industries | If Helix reduces integration cost, the robot can be deployed into many workflows with less bespoke engineering. | Generalization across environments is the hardest problem; the last 10% dominates time and cost. |
Figure’s most credible path to scale is work that is repetitive enough to train and validate, yet variable enough to reward a generalist VLA approach. Below are representative workloads that match what Figure publicly emphasizes: manipulation-heavy tasks with clear KPIs and frequent opportunities to learn from interventions.
| Scenario | Environment | Required capability | What success looks like | Readiness signal |
|---|---|---|---|---|
| Kitting + staging | Factory workcell | Pick/place, object ID, gentle handling, consistent cadence | Correct parts placed in correct bins, repeatedly, with low rework | High ROI and clean metrics; perfect for early commercial validation |
| Package triage | Logistics line | Fast grasp selection, routing decisions, recovery on slips | Routes packages to correct lanes; recovers from mis-grasps safely | Builds a rich manipulation dataset; exposes long-tail edge cases early |
| Parts loading / unloading | Production aisle | Safe navigation, stable carrying, collision avoidance | Moves parts without drops or unsafe proximity events across a shift | Measures “boring reliability,” the real commercial threshold |
| Assisted assembly support | Station-based | Two-hand coordination, compliance, precision placement | Holds and positions components while humans do the final fastening | Great on-ramp where autonomy can grow without high risk |
| Reset + replenishment | Shift change | Navigation + manipulation + inventory awareness | Returns tools, refills bins, clears clutter, restores “known state” | Often underhyped; creates measurable operational value |
Figure’s Helix-centric narrative makes one thing especially important: you must separate “language interface” from “safety authority.” Natural language can be the operator interface, but the robot’s execution must remain bounded by force limits, motion constraints, and strict stop/recover behaviors.
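One way to see the separation is as an authorization layer: the language/VLA output is treated as an untrusted proposal, and a fixed layer clamps it to hard limits before anything moves. The limits and command fields below are invented for illustration, not Figure's safety architecture.

```python
# Assumed plant-specific hard bounds; in practice these come from the
# safety case, not from any learned model.
FORCE_LIMIT_N = 50.0
SPEED_LIMIT_MS = 0.5

def authorize(command: dict) -> dict:
    """The model proposes; this layer disposes.

    Whatever force or speed the language-driven planner requests,
    execution never exceeds the hard limits.
    """
    return {
        "target": command["target"],
        "force": min(command.get("force", FORCE_LIMIT_N), FORCE_LIMIT_N),
        "speed": min(command.get("speed", SPEED_LIMIT_MS), SPEED_LIMIT_MS),
    }
```

The key property is that no natural-language input can widen the bounds; the interface can only request work inside them.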
Give Helix an early world that is repeatable: fixed lighting, known storage locations, limited object variety, and clear “safe lanes.” This isn’t cheating; it’s how you bootstrap reliable policies.
Most robots look competent in ideal conditions. The product story begins when disruptions happen and the robot stays safe while continuing the job.
If a human must help occasionally, treat it as data collection, not failure. The question is whether supervision cost drops over time as Helix improves.
Figure is betting that a unified Vision-Language-Action model (Helix) can reduce per-task programming and enable more general behavior from the same robot across different workflows. The commercial upside is lower integration cost and faster deployment.
“LLM on a robot” often means language without control. A VLA framing highlights that the model must produce actions reliably, not only generate words. For real work, control and recovery behavior matter more than conversational fluency.
A factory pilot doesn’t prove mass readiness, but it does prove contact with real constraints: cycle time, safety procedures, uptime, and human workflows. The strongest signal is whether the program publishes boring metrics: hours run, intervention rates, and failure modes.
Audited operational metrics: mean time between failures, shift-level uptime, unsupervised completion rates, and cost per deployed hour (including maintenance and supervision). The more those numbers stabilize, the more the robot becomes a product.
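The metrics named above reduce to simple ratios over fleet logs. The function below is an illustrative calculation over assumed log aggregates, not real Figure data; the input names (`run_hours`, `failures`, `supervised_hours`, `cost_total`) are hypothetical.

```python
def fleet_metrics(run_hours: float, failures: int,
                  supervised_hours: float, cost_total: float) -> dict:
    """Compute the 'boring metrics' that signal product maturity."""
    mtbf = run_hours / failures if failures else float("inf")
    unsupervised_rate = 1.0 - supervised_hours / run_hours
    cost_per_hour = cost_total / run_hours  # include maintenance + supervision
    return {
        "mtbf_h": mtbf,
        "unsupervised_rate": unsupervised_rate,
        "cost_per_deployed_hour": cost_per_hour,
    }
```

Stabilizing MTBF and a falling cost per deployed hour, shift over shift, is what "labor unit rather than engineering project" looks like in the data.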
Prefer primary vendor posts and manufacturing partners for headline specs and dated milestones. Add audited metrics as they become available.
Medium-high. Headline specs and milestones are corroborated by Figure and BMW communications, and Helix/03 announcements are primary sources. What remains less visible (as with most humanoids) is audited reliability: intervention rates, MTBF, long-horizon autonomy success rates, and true total cost per deployed hour.