25 Jan, 2024


Grouping is inherently ambiguous due to the multiple levels of granularity in which one can decompose a scene -- should the wheels of an excavator be considered separate or part of the whole?

That is the question Ken Goldberg, Professor of Industrial Engineering and Operations Research, William S. Floyd Jr. Distinguished Chair in Engineering at UC Berkeley, and Chief Scientist at Ambi Robotics, focuses on in a new research paper.

Chung Min Kim, Mingxuan Wu, Justin Kerr, Ken Goldberg, Matthew Tancik, and Angjoo Kanazawa present Group Anything with Radiance Fields (GARField), an approach for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. To do this, they embrace group ambiguity through physical scale: by optimizing a scale-conditioned 3D affinity feature field, a point in the world can belong to different groups of different sizes. The group optimizes this field from a set of 2D masks provided by Segment Anything (SAM) in a way that respects coarse-to-fine hierarchy, using scale to consistently fuse conflicting masks from different viewpoints. From this field they derive a hierarchy of possible groupings via automatic tree construction or user interaction. The team evaluates GARField on a variety of in-the-wild scenes and finds that it effectively extracts groups at many levels: clusters of objects, objects, and various subparts. GARField inherently represents multi-view consistent groupings and produces higher fidelity groups than the input SAM masks. GARField’s hierarchical grouping could have exciting downstream applications such as 3D asset extraction or dynamic scene understanding.
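
To make the idea of scale-dependent grouping concrete, here is a toy Python sketch. It is not GARField's method: there is no radiance field, no learned affinity features, and no SAM masks. As a stand-in, it assumes plain Euclidean distance between a few hypothetical 2D points and merges any points closer than a chosen scale. The function name `group_at_scale` and the point coordinates are made up for illustration. It only shows how the same point can land in a small group at a fine scale and a larger group at a coarse scale.

```python
from itertools import combinations
from math import dist


def group_at_scale(points, scale):
    """Toy grouping: points within `scale` of each other share a group.

    Illustrative stand-in only; GARField uses a learned, scale-conditioned
    affinity field rather than raw distances.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lies within the chosen scale.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= scale:
            parent[find(i)] = find(j)

    # Collect point indices by representative.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical scene: two tight pairs ("wheels") that sit close together
# ("one vehicle"), plus one far-away point (a separate object).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (10.0, 0.0)]

fine = group_at_scale(points, 0.5)    # small scale: parts stay separate
coarse = group_at_scale(points, 1.5)  # larger scale: parts merge into a whole

print(fine)
print(coarse)
```

At the small scale the two pairs remain separate groups, while at the larger scale they merge into one group and the distant point still stands alone. That mirrors the excavator question above: whether the wheels are separate or part of the whole depends on the scale at which you ask.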

See the project paper here!

See the project website here!