Robots often have a starring role in science fiction, with depictions ranging from the sarcastic Marvin, who claims to be 50,000 times more intelligent than humans in The Hitchhiker’s Guide to the Galaxy, to the heroic R2-D2, who battles Imperial troops in the Star Wars movies. Robots are also stars in the workplace, although the tasks they complete—everything from packaging to picking stock—are less colorful than battling stormtroopers. Over the past five decades, robots have transformed sectors ranging from automotive, where they are used in press, body weld, and paint shops, to retail, where they can roll through store aisles to monitor stock levels.
While robots have traditionally been highly specialized and designed to complete a single task, a new generation of general-purpose robots is emerging. Equipped with more sophisticated capabilities than their predecessors, these robots can complete diverse, unrelated tasks across different settings.1 Recent technological advances have fueled interest in general-purpose robotics, as noted in our other article, “Will embodied AI create robotic coworkers?” Challenges remain, however, for both robotic hardware and software.
The robot paradox
Robots far surpass humans in speed and accuracy when analyzing complex data, but they often struggle to navigate the physical world. This contrast is at the heart of Moravec’s paradox—the observation that computers excel at high-level cognitive tasks, such as arithmetic, but have difficulty with physical tasks that humans find simple, such as picking a blueberry without damaging it.
Traditionally, these physical limitations have forced programmers to create painstaking, step-by-step instructions specifying the movements of each robot actuator in sequence. To understand the workload, imagine having to write a program for blueberry picking that tells each muscle in your arm, wrist, hand, and fingers how far, how fast, and with what force to move (all based on multimodal, real-time feedback from the nervous system). The program would also have to adjust for the myriad ways in which blueberries could differ in shape, softness, and position. Since the picker operates in a constantly changing environment, the program would then need to account for dynamic elements, such as how branches might shift during picking.
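The scale of that hand-coding burden can be sketched in a few lines. The joint names, angles, and force values below are purely hypothetical; a real controller would need thousands of such steps, plus branches for every variation in berry shape, softness, and position.

```python
# A purely illustrative, hypothetical hard-coded pick sequence.
# Every angle, speed, and force value must be specified in advance,
# and none of it adapts if the berry or branch moves.
PICK_SEQUENCE = [
    ("shoulder_pitch", {"angle_deg": 35.0, "speed_deg_s": 20.0}),
    ("elbow_flex",     {"angle_deg": 60.0, "speed_deg_s": 15.0}),
    ("wrist_rotate",   {"angle_deg": -10.0, "speed_deg_s": 10.0}),
    ("finger_close",   {"force_newtons": 0.4}),  # too much force crushes the berry
    ("shoulder_pitch", {"angle_deg": 20.0, "speed_deg_s": 20.0}),
]

def run_sequence(sequence):
    """Replay the fixed commands, one actuator at a time."""
    log = []
    for actuator, command in sequence:
        log.append(f"{actuator}: {command}")
    return log

for step in run_sequence(PICK_SEQUENCE):
    print(step)
```

Even this toy sequence illustrates the core weakness: the robot replays commands blindly, with no way to sense or recover when the world deviates from the script.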
Few general-purpose robots have been deployed at scale, partly because of the intensive programming required to direct multiple physical tasks. But advances in software are now making general-purpose robots viable by enabling them to learn, adapt, and act in real time, without human intervention. Simultaneously, hardware improvements are optimizing robot dexterity, sensing, and power.
Major advances in software
AI has helped enable the leap from specialized robots focused on a single function, such as spot welding, to general-purpose robots. When applied to robotics, this technology is typically called “physical AI” or “embodied AI” to distinguish it from the generative AI programs used to create new content, such as videos, art, and fiction.
Embodied AI complements two other technologies that researchers have long applied to robotics: classic AI and traditional machine learning.2 Embodied AI takes traditional machine learning a step further by allowing robots to analyze input, including data from sensors, and then adjust or refine their physical motion, positioning, and behavior in real time.
Foundation models are the most critical embodied AI tool in the growth of general-purpose robotics. After being trained on vast amounts of data, these models can identify patterns that allow them to perform multiple tasks. The extent to which foundation models can optimize robot capabilities varies (Exhibit 1). With perception, for instance, they may greatly enhance vision and haptic (that is, touch) capabilities, but they do little to improve a robot’s ability to determine its proximity to other objects. Most foundation models already allow robots to analyze and understand visual cues, such as patterns essential for object recognition. Researchers are also developing multimodal foundation models that allow robots to perform actions based on both visual inputs and language inputs, such as spoken commands.
In addition to the growing sophistication of foundation models, other advances have enhanced software for general-purpose robotics:
- Behavioral cloning. Robots can now learn by watching humans—either live or in videos—in different contexts. The visual input is incorporated into models that allow the robots to copy the humans’ actions, even in complex, multistep activities. This form of imitation learning is particularly well suited to training for mobility and dexterity.
- Reasoning models. As their name suggests, these large language models focus on problem solving, logical inference, and deduction. (Like foundation models, they are a form of AI.) Reasoning models break problems or questions down into parts before considering various solutions and then narrowing them down. Researchers often use foundation models to build new reasoning models.
- Enhanced computer vision and perception. Robots are equipped with multiple sensors, including gyroscopes that detect movement, cameras that analyze light and visual inputs, and tactile sensors that monitor pressure and identify textures. The introduction of additional sensing capabilities, including those that can interpret the amount of force an object exerts, will enable many more robotic functions.
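At its core, the behavioral cloning described above is supervised learning on recorded demonstrations: each observation a human saw is paired with the action the human took, and a model is fit to reproduce that mapping. The sketch below is a deliberately minimal, hypothetical illustration using a one-parameter linear policy trained by gradient descent; real systems learn far richer mappings from video and sensor data.

```python
# Minimal behavioral-cloning sketch (hypothetical numbers):
# demonstrations pair an observation (e.g., object offset in meters)
# with the expert's action (e.g., a gripper velocity command).
demonstrations = [(0.10, 0.05), (0.20, 0.10), (0.40, 0.20), (0.80, 0.40)]

weight = 0.0  # policy: action = weight * observation
learning_rate = 0.5

# Fit the policy by minimizing squared error against the expert actions.
for _ in range(200):
    for obs, expert_action in demonstrations:
        error = weight * obs - expert_action
        weight -= learning_rate * error * obs  # gradient of 0.5 * error**2

# The cloned policy now imitates the demonstrator on unseen inputs.
print(round(weight, 3))        # converges near 0.5
print(round(weight * 0.6, 3))  # predicted action for a new observation
```

The key property carries over to the full-scale version: the robot never needs an explicit rule for the task, only enough demonstrations to generalize from.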
Consider how these advances could make a general-purpose robot more useful in a manufacturing environment. Currently, factories use escapements and fixtures to hold objects in the right orientation during tooling. But if robots have embodied AI capabilities, they can use sensors to “see” the object and immediately adjust its orientation. While today’s multipurpose robots are programmed to make adjustments only within controlled parameters, future general-purpose robots may be capable of self-programming, allowing them to adapt automatically based on experience in multiple settings. Such robots will be able to undertake many actions beyond a narrow set of predefined behaviors.
To get a sense of the complexity of the software in a general-purpose robot, consider the range of techniques used in programming one (Exhibit 2). For instance, perception requires reinforcement learning, foundation models, and convolutional neural networks (a type of deep learning algorithm that excels at processing image and video data).
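The convolutional neural networks mentioned above work by sliding a small filter across an image to detect local patterns such as edges. The following is a minimal, self-contained sketch of that one operation, using a made-up 4x4 “image”; production networks stack many such filters and learn their values from data.

```python
# Tiny 2D convolution: slide a 2x2 edge-detecting kernel over a 4x4 image.
# Pixel values are hypothetical brightness levels (0 = dark, 9 = bright).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [[-1, 1],
          [-1, 1]]  # responds at dark-to-bright vertical edges

def convolve(img, ker):
    kh, kw = len(ker), len(ker[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            total = sum(img[i + di][j + dj] * ker[di][dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        out.append(row)
    return out

# The output peaks exactly where the dark-to-bright boundary sits.
print(convolve(image, kernel))
```

The same principle, repeated across millions of learned filters, is what lets a robot’s vision system pick out objects, edges, and textures from raw camera data.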
Additional areas for software improvement
Despite the big leaps in technology, researchers must still overcome some software challenges to optimize how general-purpose robots learn from experience, adapt with minimal latency, and complete multiple tasks under unpredictable conditions. Many problems arise because the foundation models that govern robot activities require billions of data points, and researchers must build many more simulation environments to collect all the necessary information.
Given the gaps in current foundation models, some general-purpose robots still struggle to navigate unstructured or changing environments. This deficiency may not become obvious until they move from simulations, where their environment is relatively stable, to the real world, where employees might inadvertently disorient robots by moving worktables, leaving empty boxes on shelves, or mistakenly putting screws in a bin meant for bolts. For the same reason, performance accuracy may drop if a general-purpose robot moves from its typical location to a new environment. If researchers can coordinate robot functions more seamlessly—for instance, connecting locomotion to object manipulation—the error rate might decrease.
There is also room for improvement in processing input from sensors, including those that can “see” objects, haptics that analyze input related to touch, and auditory systems that can hear noise and recognize voice commands. What’s more, robots have difficulty simultaneously analyzing input from multiple sensors and considering the input data as a whole. As one example, robots cannot cook an omelet, since doing so requires them to consider visual and haptic data simultaneously and to handle multiple objects, including frying pans, spatulas, and eggs.3
Better software could enhance multi-robot coordination, which may become increasingly common. If numerous general-purpose robots work on the same task or related activities, such as stocking shelves or performing search-and-rescue operations, they must be aware of other robots around them to avoid collisions or other complications.
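The coordination requirement above can be reduced to a simple invariant: no two robots’ planned positions may come within a shared safety radius. The fleet names, positions, and radius below are hypothetical; real coordination layers add velocity, timing, and replanning on top of this kind of pairwise check.

```python
import math
from itertools import combinations

SAFETY_RADIUS_M = 1.5  # hypothetical minimum separation between robots

# Planned next positions (x, y) for a small fleet, in meters.
planned = {"bot_a": (0.0, 0.0), "bot_b": (1.0, 0.5), "bot_c": (5.0, 5.0)}

def conflicts(positions, radius):
    """Return every pair of robots whose planned positions are too close."""
    pairs = []
    for (name1, p1), (name2, p2) in combinations(positions.items(), 2):
        if math.dist(p1, p2) < radius:
            pairs.append((name1, name2))
    return pairs

# bot_a and bot_b are about 1.12 m apart, inside the 1.5 m radius,
# so one of them must replan before moving.
print(conflicts(planned, SAFETY_RADIUS_M))
```

Note that the check scales with the square of fleet size, which is one reason coordinating many robots in a shared space remains a software challenge.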
Groundbreaking advances—but persistent challenges—in hardware
Advances in edge computing, which encompasses both hardware and software, have reduced latency in robots and allowed real-time decision making. Simultaneously, other hardware improvements have enhanced robotic dexterity, sensing, and power. Additional improvements are essential, however. For instance, researchers are still attempting to build a robot hand with flexible fingers and greater dexterity that will make it easier to grip irregularly shaped objects.4
Challenges also persist related to power sources. Although batteries allow robots to move untethered, runtime on a single charge is relatively short (about 3 to 5 hours for humanoid robots). General-purpose robots may require frequent recharging, especially if they engage in power-hungry activities, such as lifting heavy objects or high-torque motion. Slow charging speed—a problem that has taken center stage with the growth of electric vehicles—could also prolong downtime in robots, reducing productivity.
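The productivity cost of recharging is easy to quantify. Using hypothetical round numbers (a 4-hour runtime from the middle of the range above and an assumed 1-hour recharge), a robot is available at most 80 percent of the time:

```python
# Hypothetical duty-cycle arithmetic for a battery-powered robot.
runtime_hours = 4.0  # work per full charge (midpoint of the 3-5 hour range)
charge_hours = 1.0   # assumed recharge time

availability = runtime_hours / (runtime_hours + charge_hours)
print(f"{availability:.0%}")  # 80%

# Faster charging lifts availability directly: halving charge time
# to 30 minutes raises it to about 89%.
print(f"{runtime_hours / (runtime_hours + 0.5):.0%}")
```

This is why charging speed, not just battery capacity, shapes how productive a deployed robot can be.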
Another improvement opportunity relates to form. Researchers often prefer to use the smallest components possible to minimize robot size, but this approach can increase costs and reduce performance. Striking the right balance among size, cost, and performance is still proving difficult.
The software and hardware challenges ahead may seem daunting, but recent advances suggest that general-purpose robots could take workplace automation to new heights. The only questions are how quickly progress will occur and whether companies will embrace change rapidly or hesitate. Given the promise of general-purpose robots, companies that act early may be best poised to capture value.