Tool Talk — Mario Krell, "Field Studies with Multimedia Big Data"
Mario Krell invites researchers to learn about a new framework that harnesses user-generated big data to inform their scholarship.
As people increasingly share all kinds of data about the world, large corpora of publicly available data have emerged. The Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset comprises 99.2 million images and nearly 800,000 videos from Flickr. To help scientists use this data in their studies, Mario's team proposes a new framework that extracts the required information in a format usable by researchers who are not experts in big data processing. In this talk, Mario will discuss examples from the natural and social science literature (environmental change, language and gesture communication, motion training for robotics, and more) of studies that could be piloted, supplemented, or even conducted entirely with YFCC100M data, and he will propose ideas for some entirely new studies. He will also share some of the challenges of building such a framework, such as cleaning unlabeled and noisily labeled data and constructing an intuitive interface that supports complex search queries at large scale.
Mario invites researchers to join him in a discussion of how this new tool could be useful to them, and he welcomes collaborators as he develops the framework through successive iterations.