Figure

A Helix-native humanoid optimized for practical work: manipulation, logistics, and factory learning loops.

Figure (Figure 02 / Figure 03) (humanoid)

A general-purpose humanoid program centered on a Vision-Language-Action model (“Helix”) and real-world pilots, with a clear bias toward manipulation, long-horizon task execution, and commercial deployments.

Multimodal (VLA) · Helix-first control · Factory + logistics pilots · Dexterous hands focus · Commercial-first · Confidence: medium-high

Figure’s humanoid program is a useful contrast to Tesla Optimus because it begins from a different systems thesis: humanoids become valuable when they can execute multi-step tasks in the real world with minimal reprogramming. Figure’s pathway to that outcome is not “make the best body first,” but “build a generalist brain that can connect perception, language, and actions,” then iterate the hardware toward mass deployability.

The centerpiece of that thesis is Helix, a Vision-Language-Action (VLA) model that Figure describes as a unified stack for perception, language understanding, and learned control. In practical terms, Helix is intended to turn high-level instructions (natural language) and sensory input (vision + robot state) into reliable low-level actions (hands, arms, posture) without the old robotics burden of writing a new behavior pipeline for every task.

Hardware matters, but Figure’s story is clearly software-forward: a humanoid is a vessel for an embodied AI system that must generalize across object variety and environment messiness. The most credible evidence that Figure understands this is their emphasis on pilots where the robot must perform boring work: parts handling, packaging/logistics, and structured factory tasks. The goal is not to win a mobility contest. The goal is to win a reliability contest.

Figure 02 was the "workhorse generation" used to learn what breaks in real deployments. Figure 03 (introduced in October 2025) is positioned as a redesign for scale, explicitly "designed for Helix, the home, and the world at scale," with an emphasis on high-volume manufacturing. The big systems question is whether Helix can earn its keep outside curated demonstrations: can it reduce integration cost enough that customers treat the robot as a deployable labor unit rather than a custom engineering project?


Scoreboard (key metrics)

Autonomy maturity

64 / 100

Helix is built for long-horizon task execution; pilots suggest meaningful autonomy in constrained settings, but full “shift-level” independence remains the bar.

Manipulation dexterity

72 / 100

Figure 02 emphasized human-scale hands and object handling; the program’s practical differentiation is “hands + judgment,” not acrobatics.

Locomotion robustness

60 / 100

Credible walking and stability for factories and warehouses; less oriented toward extreme terrain or dynamic athletics.

Learning scalability

78 / 100

Helix is explicitly intended to reduce per-task programming and improve generalization; gains depend on continuous real-world logging + iteration.

Deployment practicality

74 / 100

BMW pilots and logistics demonstrations indicate real integration effort and real constraints—good signals for a commercial trajectory.

Manufacturing scalability

76 / 100

Figure 03 is positioned as the scale redesign; whether it reaches high volume depends on supply chain execution and field reliability.

Technical specifications (public signals)

Robot owner: Figure AI
Primary models: Figure 02 (industrial pilot generation), Figure 03 (scale redesign)
Helix: Vision-Language-Action (VLA) model intended to unify perception + language + control
Figure 02 height / weight: ~170 cm / ~70 kg (reported in BMW and Figure communications)
Payload: ~20 kg (commonly stated in BMW and Figure materials)
Runtime: ~5 hours (commonly reported in Figure 02 coverage and spec roundups)
Speed: ~1.2 m/s (commonly reported in Figure 02 coverage and spec roundups)
Vision: Six onboard RGB cameras (stated in Figure 02 release materials)
Hands: Human-scale hands; Figure materials reference 16-DoF hands and "human-equivalent strength"
Compute: Increased onboard compute vs. the prior generation (Figure described "3x inference" improvements; Nvidia reporting references added modules)
Deployment focus: Manufacturing + logistics, with long-term "general-purpose" positioning that includes the home (Figure 03 introduction)
Public spec reliability: Medium-high (multiple primary sources describe consistent headline specs; audited performance metrics remain limited)

What stands out beyond the scoreboard

Where Figure wins
  • Helix-first strategy: A VLA-centered stack aims to reduce the “robot integration tax” by letting one learned system handle diverse tasks with fewer bespoke pipelines.
  • Hands and manipulation emphasis: Figure 02’s story is not agility; it’s object handling. That’s the pathway to real commercial value.
  • Pilot credibility: BMW testing and production-line trial narratives are the kind of real-world friction that improves product truth fast.
  • Hardware iteration discipline: Figure 02 appears treated as the “learn-to-ship” generation; Figure 03 is positioned as the scale redesign informed by deployments.
  • Language as an interface: If Helix meaningfully supports natural-language instruction-to-action, it can shorten onboarding and broaden deployable tasks.
Where costs sneak up
  • Generalization debt: “General-purpose” usually hides a long tail of edge cases—unexpected objects, clutter, lighting shifts, and workflow variance.
  • Shift-level reliability: The demo problem is “do the task once.” The product problem is “do it 3,000 times with safe recovery.”
  • Supervision economics: Assisted autonomy is powerful for data collection, but too much human oversight can erase business value.
  • Maintenance + fleet ops: Commercial robots become operations businesses: parts, downtime, remote monitoring, updates, and on-site service.
  • Safety validation burden: Working near humans introduces certification and procedural layers that often slow rollouts more than engineering teams expect.

System-level breakdown (how the robot works)

Helix as the center of gravity. Figure describes Helix as a generalist VLA model that unifies perception, language understanding, and learned control. In practice, this means Helix is intended to map from “what the robot sees + what it’s asked to do” into “what the robot does,” including manipulation primitives and task sequences. The system-level promise is that you can deploy the same robot into new workflows with less traditional engineering glue: fewer handcrafted state machines and fewer one-off perception stacks per customer.
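
As a systems sketch, that contract can be written as a single policy interface. Everything below is hypothetical (Figure has not published Helix's API); the point it illustrates is that deploying a new task changes the instruction string, not the code:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Observation:
    rgb_frames: List[bytes]    # e.g. frames from the onboard cameras
    joint_state: List[float]   # proprioception: joint positions

@dataclass
class Action:
    joint_targets: List[float]  # low-level command for arms/hands/posture

class VLAPolicy:
    """One learned model maps (observation, instruction) -> action."""
    def act(self, obs: Observation, instruction: str) -> Action:
        raise NotImplementedError  # a real system runs model inference here

def control_loop(policy: VLAPolicy,
                 get_obs: Callable[[], Observation],
                 send_action: Callable[[Action], None],
                 instruction: str, steps: int) -> None:
    # The same loop serves every task; only `instruction` varies.
    for _ in range(steps):
        send_action(policy.act(get_obs(), instruction))
```

Nothing in this loop is task-specific: swapping "pick the part" for "stage the tote" requires no new perception stack or state machine, only a policy good enough to honor the new instruction. That is the "less engineering glue" claim in miniature.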

Perception-to-action, not just perception. Many robotics programs can identify objects and estimate poses, yet still struggle to act reliably. Figure’s framing is that perception alone isn’t the differentiator; the differentiator is pixels-to-actions that survive real-world variability. The decision to publicly brand Helix as VLA rather than “LLM on a robot” is important—VLA signals that control is first-class, not an afterthought.

Manipulation as the commercial wedge. The economics of humanoids are dominated by hands. If you can reliably pick, place, route, triage, and kit objects, you can monetize quickly in logistics and manufacturing. Figure 02’s published emphasis—six RGB cameras, higher onboard inference, and human-scale hands—fits that thesis. It’s also why Figure’s demonstrations tend to look like work: parts handling and package manipulation rather than stunts.

Hardware iteration shaped by pilots. Figure’s own BMW narratives frame Figure 02 as a learning platform: every intervention logged, every hour on the line shaping the next design. That’s the right kind of pain. Robots do not become commercial by winning a benchmark; they become commercial by surviving a production environment without constant babysitting. Figure 03 is explicitly positioned as the generation that incorporates those lessons and aims for manufacturing at scale.

The real test: recovery. For readers, the most predictive capability is not “task completion” but “safe recovery.” When a grasp slips, when a part is missing, when a tote is placed wrong—does the robot retry, re-plan, and continue safely? Helix’s long-horizon promise is meaningful only if it includes robust recovery behaviors that minimize escalations to humans.
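
The retry / re-plan / escalate behavior described above can be sketched as a small supervisor. The hooks (`attempt`, `recover`, `escalate`) are hypothetical stand-ins for real robot skills:

```python
def run_with_recovery(attempt, recover, escalate, max_retries=3):
    """Safe-recovery supervisor: retry the task, run a recovery
    behavior between failures, and escalate to a human only after
    retries are exhausted. Returns True on success."""
    for _ in range(max_retries):
        if attempt():   # e.g. one grasp/place cycle
            return True
        recover()       # e.g. back off, re-localize the part
    escalate()          # stop in a safe posture and alert a human
    return False
```

The commercially interesting metric is how often `escalate` fires per shift, not how impressive `attempt` looks when it succeeds.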


Deployment pick (where Figure makes the most sense first)

Best first environment: Factory workcells + structured aisles
  Why: Constrained object sets, repeatable lighting, clear throughput KPIs, and controllable safety zones.
  Tradeoff to accept: Early success may overfit to curated conditions; general-purpose claims must be validated with variability.

Highest ROI task class: Parts handling, kitting, and staging
  Why: High repetition, measurable errors, and strong human-labor substitution potential.
  Tradeoff to accept: Small error rates still create rework; reliability must be "boring."

Fastest learning loop: Logistics triage + package manipulation
  Why: High-volume object interaction builds the manipulation dataset quickly, sharpening Helix's pixels-to-actions loop.
  Tradeoff to accept: Object variety increases edge cases; safety and recovery become more important than speed demos.

Long-term bet: General-purpose assistant across industries
  Why: If Helix reduces integration cost, the robot can be deployed into many workflows with less bespoke engineering.
  Tradeoff to accept: Generalization across environments is the hardest problem; the last 10% dominates time and cost.

Real workloads table (what “useful” looks like)

Figure’s most credible path to scale is work that is repetitive enough to train and validate, yet variable enough to reward a generalist VLA approach. Below are representative workloads that match what Figure publicly emphasizes: manipulation-heavy tasks with clear KPIs and frequent opportunities to learn from interventions.

Kitting + staging (factory workcell)
  Required capability: Pick/place, object ID, gentle handling, consistent cadence.
  What success looks like: Correct parts placed in correct bins, repeatedly, with low rework.
  Readiness signal: High ROI and clean metrics; perfect for early commercial validation.

Package triage (logistics line)
  Required capability: Fast grasp selection, routing decisions, recovery on slips.
  What success looks like: Routes packages to correct lanes; recovers from mis-grasps safely.
  Readiness signal: Builds a rich manipulation dataset; exposes long-tail edge cases early.

Parts loading / unloading (production aisle)
  Required capability: Safe navigation, stable carrying, collision avoidance.
  What success looks like: Moves parts without drops or unsafe proximity events across a shift.
  Readiness signal: Measures "boring reliability," the real commercial threshold.

Assisted assembly support (station-based)
  Required capability: Two-hand coordination, compliance, precision placement.
  What success looks like: Holds/positions components; humans do final fastening early.
  Readiness signal: Great on-ramp where autonomy can grow without high risk.

Reset + replenishment (shift change)
  Required capability: Navigation + manipulation + inventory awareness.
  What success looks like: Returns tools, refills bins, clears clutter, restores a "known state."
  Readiness signal: Often underhyped; creates measurable operational value.

How to control risk (a practical rollout playbook)

Figure’s Helix-centric narrative makes one thing especially important: you must separate “language interface” from “safety authority.” Natural language can be the operator interface, but the robot’s execution must remain bounded by force limits, motion constraints, and strict stop/recover behaviors.
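
One minimal way to encode that separation (the limit values below are made up, standing in for a real certified safety layer): the language/VLA layer proposes commands, and a hard envelope bounds them before actuation.

```python
def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

class SafetyEnvelope:
    """Hard bounds that no language instruction can override.
    Limits here are illustrative, not published Figure numbers."""
    def __init__(self, max_force_n: float = 50.0, max_speed_mps: float = 1.0):
        self.max_force_n = max_force_n
        self.max_speed_mps = max_speed_mps
        self.stopped = False

    def filter(self, cmd: dict) -> dict:
        """Bound every proposed command; a stop zeroes everything."""
        if self.stopped:
            return {"force_n": 0.0, "speed_mps": 0.0}
        return {"force_n": clamp(cmd["force_n"], 0.0, self.max_force_n),
                "speed_mps": clamp(cmd["speed_mps"], 0.0, self.max_speed_mps)}

    def emergency_stop(self) -> None:
        self.stopped = True
```

The design point is that `filter` sits between the model and the motors, so a confused instruction degrades into a clamped motion, never an unbounded one.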

Constrain the environment to accelerate Helix learning

Give Helix an early world that is repeatable: fixed lighting, known storage locations, limited object variety, and clear “safe lanes.” This isn’t cheating; it’s how you bootstrap reliable policies.

  • Standardize object presentation (bins, fixtures, totes) for the first deployments.
  • Instrument failures: slips, drops, near-collisions, and timeouts.
  • Promote “recoveries” to first-class training signals, not only successes.
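A sketch of that instrumentation, with hypothetical event names; the design choice worth copying is that recoveries are recorded as first-class events alongside failures and successes:

```python
import time
from collections import Counter

class EventLog:
    """Failure/recovery instrumentation for early deployments."""
    KINDS = {"slip", "drop", "near_collision", "timeout",
             "recovery", "success"}

    def __init__(self):
        self.events = []

    def record(self, task: str, kind: str, detail: str = "") -> None:
        assert kind in self.KINDS, f"unknown event kind: {kind}"
        self.events.append({"t": time.time(), "task": task,
                            "kind": kind, "detail": detail})

    def summary(self) -> Counter:
        # Event counts by kind: the raw material for the next model iteration.
        return Counter(e["kind"] for e in self.events)
```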
Make recovery the KPI that gates expansion

Most robots look competent in ideal conditions. The product story begins when disruptions happen and the robot stays safe while continuing the job.

  • Track: retries per task, time-to-recover, and escalation frequency.
  • Define a safe “give up” posture: stop, back off, alert, wait.
  • Use escalation logs to target the next model iteration.
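Those three KPIs can be computed from per-episode records and used directly as the expansion gate. The schema and thresholds below are assumptions for illustration, not published targets:

```python
from statistics import mean

def recovery_kpis(episodes):
    """KPIs from episode dicts with keys 'retries' (int),
    'recover_seconds' (list of float), 'escalated' (bool)."""
    recover_times = [s for e in episodes for s in e["recover_seconds"]]
    return {
        "retries_per_task": mean(e["retries"] for e in episodes),
        "mean_time_to_recover_s": mean(recover_times) if recover_times else 0.0,
        "escalation_rate": sum(e["escalated"] for e in episodes) / len(episodes),
    }

def gate_expansion(kpis, max_escalation_rate=0.02, max_retries=1.5):
    # Expand the robot's scope only when recovery metrics clear the bar.
    return (kpis["escalation_rate"] <= max_escalation_rate
            and kpis["retries_per_task"] <= max_retries)
```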
Use assisted autonomy to build the dataset you need

If a human must help occasionally, treat it as data collection, not failure. The question is whether supervision cost drops over time as Helix improves.

  • Log every human intervention and annotate the reason.
  • Turn interventions into training targets: “what the robot should have done.”
  • Keep a hard ceiling on supervision minutes per hour.
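A sketch of that bookkeeping (field names and the 6-minute default are assumptions): every assist is logged with a reason, convertible into a training target, and counted against a hard supervision budget.

```python
class InterventionLedger:
    """Treat every human assist as data, and enforce a hard ceiling
    on supervision minutes per robot-hour."""
    def __init__(self, max_supervision_min_per_hour: float = 6.0):
        self.ceiling = max_supervision_min_per_hour
        self.records = []  # tuples of (task, reason, minutes)

    def log(self, task: str, reason: str, minutes: float) -> None:
        self.records.append((task, reason, minutes))

    def supervision_minutes(self) -> float:
        return sum(m for _, _, m in self.records)

    def within_budget(self, robot_hours: float) -> bool:
        return self.supervision_minutes() <= self.ceiling * robot_hours

    def training_targets(self):
        # Each intervention becomes a labeled example of what the
        # robot should have done.
        return [{"task": t, "label": r} for t, r, _ in self.records]
```

The success criterion is monotonic: `supervision_minutes` per robot-hour should fall release over release, or the learning loop is not paying for itself.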

Status timeline

  • 2024: Figure 02 introduced; BMW Plant Spartanburg testing announced and described in BMW press materials.
  • Feb 2025: Helix announced as a generalist Vision-Language-Action model for Figure robots.
  • 2025: Figure publishes logistics-focused Helix demonstrations and broader commercialization narratives.
  • Oct 9, 2025: Figure 03 introduced; described as a ground-up redesign for Helix and manufacturing at scale.
  • Late 2025: Multiple sources describe an extended BMW trial and production-line contributions; details vary by reporting.

FAQ

What is Figure’s core bet vs other humanoids?

Figure is betting that a unified Vision-Language-Action model (Helix) can reduce per-task programming and enable more general behavior from the same robot across different workflows. The commercial upside is lower integration cost and faster deployment.

Why does Figure emphasize Helix instead of “LLM on a robot”?

“LLM on a robot” often means language without control. A VLA framing highlights that the model must produce actions reliably, not only generate words. For real work, control and recovery behavior matter more than conversational fluency.

Is Figure more “real” than other humanoids because of BMW?

A factory pilot doesn’t prove mass readiness, but it does prove contact with real constraints: cycle time, safety procedures, uptime, and human workflows. The strongest signal is whether the program publishes boring metrics: hours run, intervention rates, and failure modes.

What would most increase confidence in Figure’s trajectory?

Audited operational metrics: mean time between failures, shift-level uptime, unsupervised completion rates, and cost per deployed hour (including maintenance and supervision). The more those numbers stabilize, the more the robot becomes a product.



Verification confidence

Medium-high. Headline specs and milestones are corroborated by Figure and BMW communications, and Helix/03 announcements are primary sources. What remains less visible (as with most humanoids) is audited reliability: intervention rates, MTBF, long-horizon autonomy success rates, and true total cost per deployed hour.


