Ken Goldberg in the International Journal of Robotics Research

28 Feb, 2022

BCNM faculty member Ken Goldberg was featured twice in volume 40 of the International Journal of Robotics Research. "Dynamic regret convergence analysis and an adaptive regularization algorithm for on-policy robot imitation learning" was co-authored by Jonathan N. Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, and, of course, Ken Goldberg.

From the abstract:

On-policy imitation learning algorithms such as DAgger evolve a robot control policy by executing it, measuring performance (loss), obtaining corrective feedback from a supervisor, and generating the next policy. As the loss between iterations can vary unpredictably, a fundamental question is under what conditions this process will eventually achieve a converged policy. If one assumes the underlying trajectory distribution is static (stationary), it is possible to prove convergence for DAgger. However, in more realistic models for robotics, the underlying trajectory distribution is dynamic because it is a function of the policy. Recent results show it is possible to prove convergence of DAgger when a regularity condition on the rate of change of the trajectory distributions is satisfied. In this article, we reframe this result using dynamic regret theory from the field of online optimization and show that dynamic regret can be applied to any on-policy algorithm to analyze its convergence and optimality. These results inspire a new algorithm, Adaptive On-Policy Regularization (AOR), that ensures the conditions for convergence. We present simulation results with cart–pole balancing and locomotion benchmarks that suggest AOR can significantly decrease dynamic regret and chattering as the robot learns. To the best of the authors’ knowledge, this is the first application of dynamic regret theory to imitation learning.

Read the full paper here!
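
To make the on-policy loop in the abstract concrete, here is a minimal Python sketch of a DAgger-style iteration in which a regularization weight on the policy update is adapted between iterations. The toy scalar system, the linear supervisor, and the `lam` adaptation rule are all illustrative assumptions for this sketch, not the paper's AOR algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def supervisor(x):
    """Expert feedback: a stabilizing linear controller for the toy system."""
    return -1.5 * x

def rollout(theta, x0=1.0, horizon=20):
    """Execute the current policy u = theta * x on the scalar system
    x' = x + u + noise, returning the visited states."""
    xs, x = [], x0
    for _ in range(horizon):
        xs.append(x)
        x = x + theta * x + 0.01 * rng.normal()
    return np.array(xs)

theta = 0.0  # current policy parameter
lam = 1.0    # regularization weight, adapted each iteration

for k in range(30):
    xs = rollout(theta)   # execute the current policy (on-policy)
    us = supervisor(xs)   # corrective labels from the supervisor
    # Regularized least squares: fit the supervisor labels while penalizing
    # movement away from the just-executed policy, which limits how fast
    # the induced trajectory distribution can shift between iterations.
    theta_new = (xs @ us + lam * theta) / (xs @ xs + lam)
    shift = abs(theta_new - theta)
    # Illustrative adaptation rule (not the paper's exact AOR update):
    # increase regularization while the policy is still moving quickly.
    lam = min(10.0, lam * 1.5) if shift > 0.05 else max(0.1, lam / 1.5)
    theta = theta_new
    print(f"iter {k:2d}  theta={theta:+.3f}  lam={lam:5.2f}  shift={shift:.4f}")
```

Run as-is, the policy settles near the supervisor's gain while the per-iteration shift shrinks, which is the kind of damped, non-chattering convergence the dynamic regret analysis is after.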

"Sequential robot imitation learning from observations" was published along with Ajay Kumar Tanwani, Andy Yan, Jonathan Lee, Sylvain Calinon, and Ken Goldberg.

From the abstract:

This paper presents a framework to learn the sequential structure in the demonstrations for robot imitation learning. We first present a family of task-parameterized hidden semi-Markov models that extracts invariant segments (also called sub-goals or options) from demonstrated trajectories, and optimally follows the sampled sequence of states from the model with a linear quadratic tracking controller. We then extend the concept to learning invariant segments from visual observations that are sequenced together for robot imitation. We present Motion2Vec that learns a deep embedding space by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together while being pushed away from randomly sampled images of other segments, and a time contrastive loss is used to preserve the temporal ordering of the images.

Read the full paper here!
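
To give a concrete feel for the Siamese metric-learning objective described above, here is a minimal PyTorch sketch that pairs a triplet loss over action segments with a simple time-contrastive term. The `Embedder` network, the random stand-in features, and the exact loss formulation are assumptions for illustration; they are not the Motion2Vec architecture or its published losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedder(nn.Module):
    """Stand-in for the Siamese encoder; the paper embeds video frames
    with a deep network, here we use a small MLP over feature vectors."""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, x):
        # Unit-normalize so distances live on the hypersphere.
        return F.normalize(self.net(x), dim=-1)

def metric_losses(model, anchor, positive, negative, later_frame, margin=0.2):
    """Triplet-style segment loss plus an illustrative time-contrastive term.

    anchor/positive come from the same action segment, negative from a
    randomly sampled other segment; later_frame is a temporally later
    image from the anchor's segment."""
    za, zp, zn, zl = map(model, (anchor, positive, negative, later_frame))
    # Pull same-segment pairs together, push other segments away.
    seg_loss = F.triplet_margin_loss(za, zp, zn, margin=margin)
    # Time-contrastive term: the temporally nearer frame should sit
    # closer to the anchor than the later frame, preserving ordering.
    d_near = (za - zp).norm(dim=-1)
    d_later = (za - zl).norm(dim=-1)
    time_loss = F.relu(d_near - d_later + margin).mean()
    return seg_loss + time_loss

# Toy usage with random "frame features" in place of real video frames.
model = Embedder()
batch = [torch.randn(8, 128) for _ in range(4)]
loss = metric_losses(model, *batch)
loss.backward()
print(f"combined metric loss: {loss.item():.4f}")
```

Unit-normalizing the embeddings keeps every pairwise distance bounded, so a single margin value is meaningful for both the segment term and the temporal term.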