Ken Goldberg on Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks

14 Aug, 2021

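Reinforcement learning (RL) has shown impressive success at learning complex, long-horizon tasks in high-dimensional environments. However, unconstrained exploration can lead to unsafe behavior and demands extensive interaction with the environment. One promising remedy, requiring that the agent can always return to states from which task success is assured, has worked well in low-dimensional settings but is difficult to enforce when the state is high-dimensional, as with images. Ken Goldberg, Albert Wilcox, Ashwin Balakrishna, Brijen Thananjeyan, and Joseph E. Gonzalez address this issue in their new paper, LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks. The paper presents Latent Space Safe Sets (LS3), a safe reinforcement learning method that uses suboptimal demonstrations and a learned dynamics model to keep exploration near a learned Safe Set where task completion is likely. The authors evaluate LS3 on four domains, including a challenging sequential pushing task in simulation and a physical cable routing task, to measure how efficiently it learns while satisfying constraints.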

From the abstract:

Reinforcement learning (RL) algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for safe learning in dynamically uncertain environments is requiring that the agent can robustly return to states where task success (and therefore safety) can be guaranteed. While this approach has been successful in low-dimensions, enforcing this constraint in environments with high-dimensional state spaces, such as images, is challenging. We present Latent Space Safe Sets (LS3), which extends this strategy to iterative, long-horizon tasks with image observations by using suboptimal demonstrations and a learned dynamics model to restrict exploration to the neighborhood of a learned Safe Set where task completion is likely. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-ss for code and supplementary material.
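To give a rough feel for the idea of restricting exploration to the neighborhood of a safe set, here is a minimal toy sketch in Python. It is not the authors' implementation: the 2-D "latent" state, the additive dynamics, the distance-based safe-set check, the circular obstacle, and the random-shooting planner are all hypothetical stand-ins chosen for illustration. In the paper these pieces are learned from image observations; here they are hand-written so the example runs with only the standard library.

```python
import math
import random

# All names below are hypothetical toy stand-ins, not the LS3 code.

def toy_dynamics(z, a):
    """Stand-in for a learned latent dynamics model: z' = z + a."""
    return (z[0] + a[0], z[1] + a[1])

def in_safe_set(z, successes, radius):
    """Stand-in for a learned safe set: z is near a state from a prior success."""
    return any(math.dist(z, s) <= radius for s in successes)

def violates_constraint(z):
    """Toy constraint: a circular obstacle centred at (0.5, 0.5)."""
    return math.dist(z, (0.5, 0.5)) < 0.2

def plan(z0, goal, successes, horizon=5, n_samples=500, radius=0.15, rng=None):
    """Random-shooting planner: sample action sequences, keep only those that
    avoid the obstacle and end inside the safe set, return the cheapest one."""
    rng = rng or random.Random(0)
    best_cost, best_actions = math.inf, None
    for _ in range(n_samples):
        actions = [(rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
                   for _ in range(horizon)]
        z, cost, feasible = z0, 0.0, True
        for a in actions:
            z = toy_dynamics(z, a)
            if violates_constraint(z):
                feasible = False
                break
            cost += math.dist(z, goal)  # cumulative distance to the goal
        # Reject plans whose final predicted state leaves the safe set.
        if feasible and in_safe_set(z, successes, radius) and cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions

# States visited by earlier successful attempts (here: a straight line).
successes = [(i / 10, 0.0) for i in range(11)]
actions = plan((0.0, 0.0), (1.0, 0.0), successes)
```

The key line is the feasibility check: a candidate plan is accepted only if its predicted final state lands near states from which the task has already been completed. With an empty list of prior successes, `plan` returns `None`, which mirrors why the approach relies on demonstrations to get started.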

Read the entire paper here!