Researchers have successfully jailbroken AI-powered robots, getting them to carry out actions that their safety and ethical protocols would normally block. The team, from Penn Engineering, achieved a 100% jailbreak rate with their algorithm, RoboPAIR, which bypassed the safeguards on three different AI robotic systems within a few days. Robots controlled by large language models (LLMs) normally refuse prompts that request harmful actions, but under RoboPAIR they complied with tasks such as detonating a bomb, blocking emergency exits, causing collisions, and knocking shelves onto people.

The study used three systems: Clearpath Robotics' Jackal, NVIDIA's Dolphins LLM, and Unitree's Go2. The researchers also found that the robots were vulnerable to other forms of manipulation, such as repeating a request the robot had previously refused, but with fewer situational details.

Before releasing their findings publicly, the researchers shared the results with leading AI companies and with the manufacturers of the robots used in the study. Alexander Robey, one of the authors, emphasized the importance of identifying weaknesses in AI systems and of using red-teaming practices to improve their safety. Based on the findings, he called for a reevaluation of how AI is integrated into physical robots and systems.