Research Dispatches: Joyce Lee on Auditory Data Representation

27 May, 2018

This year, Joyce Lee was selected to work with Michelle Carney on her project “Auditory Representations of Data.”

In her own words:

With the rise of both virtual assistants and software-embedded devices, audio-first interactions are becoming more prevalent in daily life. However, there are not yet industry standards for communicating the droves of data generated by ubiquitous computing via sound experiences – particularly through emergent smart speaker interfaces. We imagine a future in which we are able to explore data by ear: how might Alexa enable us to understand complex datasets? Would it be possible to hear weather forecasts, stock market prices, or the status of our IoT-connected home – by sound alone?

Literature Review and Analysis

We first conducted an in-depth literature review of prior work related to data sonification, shaping our analysis around visual analogs of “auditory graphs” (e.g., histograms, scatter plots, and pie charts). Spanning disciplines including human-computer interaction, accessibility, music, and art, the selected papers varied widely: some showed a more creative intent, aiming to evoke a response through memorable, musical designs, while others focused on accurate, scientific representations. Observed differences in the data sonification literature manifested in two primary areas:

1. the rigor – or absence – of the experimental procedure, and

2. the quality of stimuli used: whether researchers used abstract MIDI sounds or audio mapped to semantic meaning (e.g. using the sound of rain to communicate precipitation). Papers also demonstrated differences in whether the researchers mapped sounds to real or simulated data.

For our framework, we thus plotted the selected papers along two dimensions: objective vs. subjective sonification approach on the vertical axis, and abstract vs. functional data on the horizontal axis.

VUI Development and Usability Testing

A major difficulty we encountered while reviewing the literature, however, was that much of it is decades old and the audio files associated with the papers are unavailable. We therefore reproduced three different sonification methods using the audio software Ableton Live and Audacity:

1. an audio choropleth map proposed by Zhao et al. to represent population data by state from the 2010 Census,

2. an audio line graph inspired by Brown and Brewster to represent employment data by age group from the 2015 American Community Survey (ACS), and

3. an audio pie chart designed by Franklin and Roberts to represent employment data (again from the ACS), but by education level.

Using the prototyping software Storyline, we built these auditory data representations into our own voice user interface (VUI), an Alexa skill called Tally Ho.

With this VUI, we conducted in-person usability testing to evaluate the potential of auditory data exploration via a contemporary, conversational interface. We recruited a five-person sample that would demonstrate a range of familiarity with smart speakers, music, U.S. geography, and census data. During each moderated session, the participant was presented with each of the three audio representations in a randomized order, then asked follow-up questions about initial impressions, perceived difficulty, and user expectations.

Results and Future Work

Our prototype used pitch, timbre, and rhythm to represent data points, category differences, and overall trends: users were able to hear these distinctions and interpret them mostly correctly after hearing a scripted explanation from the VUI. Our results suggest that users generally enjoyed the experience of hearing data – finding it “cool,” “fun,” and even “powerful” – but also had difficulty remembering key insights as passive listeners.
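As a rough illustration of how pitch can stand in for data points, the sketch below linearly maps each value in a series to a MIDI note number and its equal-tempered frequency. This is not the method used to build the prototype, which was produced in Ableton Live and Audacity; the function names, the note range (MIDI 48 to 84), and the sample series are all assumptions made for illustration.

```python
def value_to_midi(value, lo, hi, midi_lo=48, midi_hi=84):
    """Linearly map a value in [lo, hi] to a MIDI note in [midi_lo, midi_hi].

    The default note range (C3 to C6) is an assumption, not a value from the study.
    """
    if hi == lo:
        # A flat series has no range to scale; fall back to the middle note.
        return (midi_lo + midi_hi) // 2
    frac = (value - lo) / (hi - lo)
    return round(midi_lo + frac * (midi_hi - midi_lo))


def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = MIDI 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sonify(values, midi_lo=48, midi_hi=84):
    """Return a (midi_note, frequency_hz) pair for each value, higher value = higher pitch."""
    lo, hi = min(values), max(values)
    notes = [value_to_midi(v, lo, hi, midi_lo, midi_hi) for v in values]
    return [(n, midi_to_hz(n)) for n in notes]


if __name__ == "__main__":
    # Made-up series for illustration only; not data from the study.
    series = [10, 20, 40, 30]
    for note, hz in sonify(series):
        print(note, round(hz, 1))
```

A linear value-to-pitch mapping is only one possible design. Timbre and rhythm, which the prototype also used, would need separate mappings that this sketch does not attempt.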

Given that most people rely primarily on sight, it is unsurprising that VUIs tend to require a greater cognitive load and more training than traditional visual interfaces. Participants who already owned smart speakers mainly used them to perform simple tasks like playing music and setting alarms: understanding sonified data was an entirely new type of experience. One way to overcome this novelty factor would be to conduct a longitudinal study to assess changes in both performance and enjoyment of the experience over repeated interactions.

With the growth of conversational voice user interfaces – in tandem with the rise of data quantifying everything around us – we are optimistic about the way forward for best practices in auditory data representation. As Ritter and Hermann suggest, humans are “capable [of] detect[ing] very subtle patterns in acoustic sounds, which is exemplified to an impressive degree in the field of music, or in medicine, where the stethoscope still provides very valuable guidance to the physician.” Continuing to develop interaction patterns for auditory data exploration will benefit not only those who are visually impaired or limited in numeracy skills, but also those who are curious about making sense of data through alternative means – ultimately improving accessibility of information for all.