Lyman Dispatch: Anushah Hossain

10 Sep, 2020

Anushah Hossain received the 2020 Lyman Fellowship for her dissertation on a multi-lingual internet. The Peter Lyman Graduate Fellowship in new media, established in the memory of esteemed UC Berkeley Professor Peter Lyman, provides a stipend to a UC Berkeley Ph.D. candidate to support the writing of his or her Ph.D. dissertation on a topic related to new media. The fellowship is supported by donations from Professor Barrie Thorne, Sage Publications, and many individual friends and faculty.

Here's what Hossain said about the experience:

In late 2000, an email was sent out to the Unicode Consortium mailing list, asking about the missing Bangla character khanda ta. Unicode was an international standards body based out of Silicon Valley, whose mission it was to create a single scheme for use by every computer to represent every known script. It had launched in 1991 and, by the release of its third edition in 1999, had come to support every commercially viable and national script and had achieved buy-in from governments and major technology companies.

When the email was sent regarding khanda ta, major Indic scripts such as Bangla had already been encoded in the Standard. But users were beginning to ask for adjustments. From their perspective, it appeared characters were missing, or certain encodings were prone to error. These challenges were taking place over public Consortium mailing lists and open meetings, and originated frequently from Indic-language users. Khanda ta was one of the earliest such challenges, and it remains in popular memory amongst many of the people I interviewed during my Lyman fellowship research on Bangla-language computing this summer.

Part of what makes Indic scripts prone to discussion is that they are complex scripts, meaning the shape of a character -- how it appears on a page or screen -- depends on the context in which that character is being used. The early discussion around khanda ta, for example, was about whether it was truly a unique character, requiring its own spot in the Standard, or whether it was an alternate form of an already-encoded character. A later point of discussion was how the character would ultimately be encoded: with its own single “codepoint” or with a sequence of them. Though Unicode usually leaned towards sequences, experiences with Indic characters such as khanda ta eventually taught the Consortium to revise its stance. Sequencing characters from complex scripts proved difficult to implement further up the computing stack.
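To make the distinction between a single codepoint and a sequence concrete, here is a minimal Python sketch. The specific code points are not given in this dispatch. They are an assumption drawn from the published Unicode code charts, where khanda ta appears at U+09CE and the older representation is the sequence ta, hasanta (virama), zero width joiner. The helper name `describe` is hypothetical.

```python
import unicodedata

# Assumption: code points taken from the published Unicode charts,
# not from the dispatch itself.
KHANDA_TA_SINGLE = "\u09CE"                # one dedicated code point
KHANDA_TA_SEQUENCE = "\u09A4\u09CD\u200D"  # ta + hasanta (virama) + zero width joiner


def describe(text):
    """Return a (hex code point, Unicode name) pair for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


single = describe(KHANDA_TA_SINGLE)
sequence = describe(KHANDA_TA_SEQUENCE)

for label, parts in (("single", single), ("sequence", sequence)):
    print(label, parts)

# The two spellings are distinct strings, and normalization does not
# merge them, so software has to handle each form explicitly.
same_after_nfc = (
    unicodedata.normalize("NFC", KHANDA_TA_SEQUENCE) == KHANDA_TA_SINGLE
)
print("equal after NFC:", same_after_nfc)
```

A program that searches, sorts, or renders Bangla text sees one character in the first case and three in the second, which hints at why sequences can be harder to support further up the stack.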

The debate over khanda ta played out over four years, and involved members from the Unicode Consortium; Bangla-computing activists from Bangladesh, West Bengal, and the diaspora; and occasional interlocutors from universities, language academies, and industry. The earliest rumblings had come from grassroots organizations and digital pioneers who had taken on the mission of Bangla-computing and were beginning to advocate for changes when their national governments would not. While the character was an example of technical redundancy and mob mentality for its detractors, it came to represent the need for preservation and completeness of the Bengali identity in the digital environment for its proponents. Eventually, khanda ta was encoded with a single codepoint, a singular victory for Bangla-computing volunteers, but it persists as a meme and a mistake in the minds of some Unicode members.

As I work on my dissertation on the history and politics of Bangla-language computing, I am beginning to disentangle the various axes along which tensions often strike: between programmer and user, between grassroots organizations and governments, between Bangladesh and India, and India and the West. The story of khanda ta serves as an entry to two chapters of my dissertation that will touch on these topics. The first is on the language politics of the subcontinent, from the late colonial period through independence, and how they manifest today in digital institutions such as Unicode. The second will focus on the khanda ta episode as an exploration of how (and whose) values are embedded in code. I apply a new media lens to this second chapter in particular, referencing Winner’s (1980) notion of the politics of technical artifacts, and Anderson’s (1983) notion of print media promulgating national identity.

I’ve been lucky to be able to pursue this research from my home in California amidst a pandemic. Though I had a trip planned to Bangladesh this summer to interview some of my subjects in person, much of my data collection has been able to proceed without interruption through virtual interviews and observation as meetings move online. I have been able to access online sources including Unicode encoding proposals and the technical Standard itself, Bangla-computing mailing lists, and archived snapshots of blogs and forums. I’m grateful such materials exist online and that the Lyman fellowship supports scholarship regardless of the location from which it is conducted.

This project is ostensibly about Bangladesh, the Bangla language, and its script, but this case bears similarity to those of many post-colonial states that have sought to establish their national identity in the digital environment. My hope is that the final product from this work will prove useful not only to area studies scholars, but also to those in New Media and beyond contemplating the relationship between language, identity, and information technologies.