Sharing ASL data online FAIRly with CARE the ASL way – MoLo and O5S5 projects

By Julie A. Hochgesang, Professor of Linguistics, Gallaudet University
Presentation for Cornell ASL Lecture Series
September 28, 2023

Thank you, Brenda Schertz and the Cornell Department of Linguistics and The Cornell Linguistics Circle, for inviting me to present! Emily Shaw and Miako Villanueva, long-time colleagues, are my interpreters.

Detailed visual description including slide content here: https://bit.ly/408qMd2

Introduction
Context
MoLo and O5S5
Sharing data from MoLo and O5S5
References and Acknowledgments
Bio

Abstract: As a deaf linguist in North America, I have recently been documenting the language use of the ASL communities in North America. In my presentation, I discuss how language documenters share their data publicly, drawing upon the Austin Principles of Data Citation, the FAIR and CARE guidelines, and practices specific to signed language researchers. I also present findings from a recent survey we conducted with the ASL communities about sharing ASL data online.

I focus on two current documentation projects - “Motivated Look at Indicating Verbs in ASL (MoLo)” and “Documenting the experiences of the ASL communities in the time of COVID-19 (O5S5)” which I am currently preparing to share as open access. While I describe the projects and showcase some of the data, I specifically highlight the data statements I am creating for both of them and reflect on what it means to publicly share ASL videos online.
What follows are notes for this presentation. I presented using mmhmm, which doesn't let me share presentation notes easily, so I've created this webpage instead. Each block corresponds to one or two "slides" of my presentation.

To cite this presentation: 
Hochgesang, J.A. (2023). Sharing ASL data online FAIRly with CARE the ASL way - MoLo and O5S5 projects. figshare. Presentation. https://doi.org/10.6084/m9.figshare.24205566. 

Introduction

Screenshot of Julie's title slide with ASL signs on top "share" and "care" with title below "Sharing ASL data online FAIRly with CARE the ASL way - MoLo and O5S5 projects" by Julie A. Hochgesang. Below are two pictures with several rows and columns showing participants from the projects Julie's discussing today. Bottom has "Cornell -2023" and a logo for Gallaudet with "department of linguistics". Background is black. Text is white.

For this presentation, I discuss how I’m preparing data from two documentation projects for open access. The two projects are “Motivated Look at Indicating Verbs in ASL (MoLo)” and “O5S5: Documenting the experiences of the ASL communities in the time of Covid-19”. I present with my long-time collaborators Emily Shaw and Miako Villanueva as my ASL interpreters. Part of this discussion includes an introduction, the relevant context, a brief overview of the two projects and how I’m sharing the data as open access, including creating data statements for both projects.

One thing I like to do as a deaf linguist is make my content as multimodal and multilingual as possible, since that reflects who I am and how I do my work. In presentation slides, we might expect to see written English. Those slides are what gets shared, and they usually become the lasting representation of the presentation. Since I’m talking about my work with the ASL communities, that representation being predominantly English doesn’t sit right with me, so I’m also incorporating ASL as much as possible – usually by way of photos such as the ASL signs for “share” and “care”/“cherish” at the top, since those two concepts are in my title and are two of my main themes for today.

I’ve been documenting signing communities for over twenty years now. Although I’ve been signing and interacting with deaf signing communities my entire life, I’d say I got my start with language documentation when I was a Peace Corps Volunteer in Kenya, East Africa in 2002-2004. I helped with the Kenyan Sign Language CD dictionary and worked directly with deaf Kenyans to share information about their language. Then I went to Gallaudet for graduate school in linguistics and have been involved in documentation projects from then on – the Bilingual Bimodal Binational (BiBiBi) project, Field Methods courses at Gallaudet in the linguistics department (both as a student and as a teacher), Philadelphia Signs Project, Haitian Sign Language Documentation Project (LSHDoP), ASL Signbank (as a documentation tool), and, more recently, Motivated Look at Indicating Verbs in ASL (MoLo) and Documenting the Experiences of the ASL Communities in the Time of Covid-19 (O5S5). I’ll be focusing on the last two projects for this presentation.

Context

As I think about sharing the data – primarily ASL videos – from the MoLo and O5S5 projects, I draw upon the open access guidelines FAIR and CARE, which help us consider how to make data open and what it means to do so. I also consult the Austin Principles of Data Citation, a set of principles for linguists to consider as they share primary data (the data that we observe) in a citable way. One way I’m sharing the data is through data statements (Bender et al., 2021).

I also consider the ethics of working with signed language communities (Harris et al., 2009; Hochgesang and Palfreyman, 2022). As deaf linguist Hou (2017) notes, “For the most part, the documentation of sign languages and the social lives of deaf and hearing signers… has been conducted by hearing researchers…” (p. 339). Joseph Hill, another deaf linguist, says “They have insights as deaf linguists that have been informed by their biosocial and linguistic experiences as deaf people” (shared with permission, see Hochgesang (2019)). Some of my practices are similar to or have been inspired by Mobile Deaf. I too think about how to “showcase the data itself” (Moriarty 2020).

During the summer of 2023, I worked with two undergraduates through Gallaudet’s REU to create and disseminate a survey about sharing ASL data online. We collected information about respondents’ experiences accessing ASL videos online and what they valued, and we tested a few ways we could share our own project data. We got over 60 responses. Two key takeaways from this survey that I want to share here are: 1) our respondents primarily use social media to access ASL videos, with video sharing platforms like YouTube as a close second; and 2) most of the respondents preferred the ability to “share” (rather than “comment” or “like” functions).

MoLo and O5S5

I have described MoLo and O5S5 a bit more in a presentation I gave earlier this year. But I’ll also do a bit of sharing here, especially video-wise, to give you a sense of the data, since this is an ASL lecture series.

MoLo is a corpus-based project inspired by the BSL corpus project. MoLo documents the language experiences of different members of the ASL communities and the videos were collected over Zoom. Data collection was active from 2019 to 2021 and processing and analysis are ongoing.

O5S5 is a documentation project that was the focus of one Field Methods class (fall of 2021) in which we documented the experiences of the ASL communities in the time of Covid-19. It was a fitting project for us because we were still living through a pandemic and dealing with all of the issues that came with it. We collected over 50 texts – interviews, conversations and narratives – both on Zoom and in person. Some participants are masked; others are physically distancing.

More information about each project is available here. Below you’ll see one systems prompt from MoLo (not yet captioned) and one narrative from O5S5.

MoLo systems prompt
One narrative from O5S5

Both projects use GUDA (Gallaudet University Documentation of ASL) protocols for digital organization, annotation and sharing. For digital organization, we use Google Workspace (or Google Drive). For annotation, we use ELAN, ASL Signbank and SLAAASh annotation conventions. Finally, for sharing, we use Google Sites (MoLo under GUDA; O5S5), YouTube as well as more citable sites like Figshare and OSF.

Sharing data from MoLo and O5S5

Here’s how I’ve shared data from these projects. I’ve been using Google Sites to create websites about both projects (O5S5 and MoLo). I’ve also made good use of YouTube.

I’ve incorporated aspects of FAIR, CARE and ethics of working with signed language communities. I also will explore the data statement (Bender et al., 2021) we are currently developing for O5S5.

I’ve spent a lot of time thinking about different aspects of all of this – collecting the data, organizing it, making it machine-readable (adding annotations or textual representation), and sharing it. And while I have been trained as a linguist to share these products in the traditional academic manner, I have also been developing my craft in which I “showcase the data” (Moriarty 2020) to be shared with the signing communities themselves.

As stated above, I’ve done the work (annotation, online sharing, etc.) to make our content accessible, interoperable and reusable (the AIR in FAIR). To make the work findable (the F in FAIR), consistent and useful metadata needs to be provided with each video that is publicly shared. I also wanted to consider CARE principles that honor the ASL communities. Drawing on my familiarity with current practices in sharing ASL videos, both research and community-wise, along with careful reflection on how and what to say, I have created metadata descriptions to use when sharing via YouTube. I show templates in the figure below.

Screenshots of textual documents formatted in different subsections with spacing and headings marked by *title*

Original text available here:
https://www.dropbox.com/scl/fi/biuyub87wddedqfkukkf4/O5S5-ASL-video-Youtube-description.rtf?rlkey=dxdopk99mskjpkdntb3yani24&dl=0

and 

https://www.dropbox.com/scl/fi/vo6430lsm0b946i602y8u/MoLo-ASL-video-YouTube-description.rtf?rlkey=qabw3c1skus8oev0jzdghjxrc&dl=0

Here are two live examples, one from each project – Lauren Ridloff’s narrative for O5S5 and Louise Applegate and Emily Sidansky’s system prompt for MoLo. (Please note we have explicit and informed consent to share their real names.) Some information listed in the description is typically expected metadata such as titles, dates and name(s) of participants. Other information is meant to make the data accessible, such as visual descriptions. The other elements that I have been more deliberate about are the introductions, in which we remind viewers that our data has been collected from real people and that their language use is to be respected since there is no one right way to use language. We also provide citation information for people to use for citing and helping others find our videos, along with hashtags and links to our project documentation. For the O5S5 videos, we have listed the participants of the narratives as creators of these videos, honoring the “A” or “authority to control” in CARE. Finally, we provide licensing terms so people know how they can use these videos.
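The description elements listed above could also be assembled programmatically when many videos need consistent metadata. The following is a minimal Python sketch under stated assumptions, not the projects' actual templates: the `VideoRecord` fields, the section names, the introduction wording and the `build_description` helper are all hypothetical, and the `*title*` heading style simply imitates the formatting visible in the template screenshots.

```python
from dataclasses import dataclass, field


@dataclass
class VideoRecord:
    """Hypothetical metadata record for one publicly shared ASL video."""
    title: str
    recorded: str                  # date of recording, as free text
    participants: list[str]        # names shared only with informed consent
    visual_description: str
    citation: str
    license_terms: str
    project_url: str
    hashtags: list[str] = field(default_factory=list)


# Assumed wording; the real introductions are written by the project team.
INTRO = (
    "This video shows real people using their language. "
    "There is no one right way to use language; please be respectful."
)


def build_description(video: VideoRecord) -> str:
    """Assemble a plain-text description with *heading* markers."""
    sections = [
        ("Introduction", INTRO),
        ("Title", video.title),
        ("Date", video.recorded),
        ("Participants", ", ".join(video.participants)),
        ("Visual description", video.visual_description),
        ("How to cite", video.citation),
        ("License", video.license_terms),
        ("Project documentation", video.project_url),
    ]
    # Refuse to publish a description with blank sections, so that
    # every shared video carries the same set of metadata.
    missing = [name for name, body in sections if not body.strip()]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    blocks = [f"*{name}*\n{body}" for name, body in sections]
    if video.hashtags:
        blocks.append(" ".join(f"#{tag}" for tag in video.hashtags))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder values only; none of this is real project data.
    example = VideoRecord(
        title="Example narrative",
        recorded="2021",
        participants=["Participant A"],
        visual_description="A signer sits facing the camera.",
        citation="Example citation text.",
        license_terms="Example license terms.",
        project_url="https://example.org/project",
        hashtags=["ASL"],
    )
    print(build_description(example))
```

Raising an error on a blank section is one possible design choice: it trades convenience for consistency, which matters for findability when descriptions are the main metadata viewers see.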

Data statements (Bender et al., 2021) are “essential information about the characteristics of datasets” such as “curation rationale”, demographics of participants and annotators, and technical details about data collection and processing that normally don’t get included in traditional research publications. While I’ve followed best practices for signed language corpora and documentation (e.g., Berez-Kroeker et al., 2022; Fenlon and Hochgesang, 2022), data statements are new to signed language documentation (Schulder et al., 2021), especially for ASL data – either written or signed. I’ve been developing both a written English and signed ASL version of the data statement for both projects. I invited one of the participants of the O5S5 ASL project, Lauren Ridloff, also a well-known Deaf actress and dear friend of mine, to conduct the interview with support from Emily Shaw. It is a deep dive into how and why we collected data, how we prepared it for sharing and archiving, and all the decisions along the way. I plan to share this on our O5S5 ASL website as both an entire video (3 hours!) and as separate “chapters” on a single web page.
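A data statement draft can be tracked as a simple checklist while it is being written. The sketch below is illustrative only and assumes a hypothetical `DataStatementDraft` structure whose four fields mirror just the elements named above; the full schema in Bender et al. (2021) has more sections than are shown here.

```python
from dataclasses import dataclass, fields


@dataclass
class DataStatementDraft:
    """Hypothetical draft; fields mirror only the elements named above."""
    curation_rationale: str = ""
    participant_demographics: str = ""
    annotator_demographics: str = ""
    collection_and_processing: str = ""


def unfinished_sections(draft: DataStatementDraft) -> list[str]:
    """Return the names of sections that are still blank, in field order."""
    return [
        f.name for f in fields(draft)
        if not getattr(draft, f.name).strip()
    ]


if __name__ == "__main__":
    # Placeholder text; not taken from either project's data statement.
    draft = DataStatementDraft(curation_rationale="Why these videos were collected.")
    print(unfinished_sections(draft))
```

The same checklist could be kept once for the written English version and once for the signed ASL version, so that the two stay in step.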

Screenshot of a website with large brown banner with centered white text "O5S5 ASL Data Statement". Side banner has menu bar for O5S5 ASL website. Body of the webpage shows images of 4 embedded videos with subsection headings "preview of all O5S5 ASL videos", "Introduction to O5S5 ASL Data Statement", "Goal of Data Statement", "Why do a Data Statement in ASL"?. Julie and or Lauren are featured in most of these.
O5S5 ASL Data Statement on the O5S5 ASL project website (not yet live)

In this presentation, I’ve discussed a bit about how I’ve shared data from our projects, especially with guidance from FAIR, CARE and our own ASL communities. Thinking about how to share our data – actually our stories, our language, our lives – requires careful consideration and ongoing reflection. Thank you for letting me do so with you all.

References and Acknowledgments

References
Bender, E., Friedman, B., & McMillan-Major, A. (2021). Data Statements: A Guide for Writing Data Statements for Natural Language Processing (Version 2). Tech Policy Lab, University of Washington. https://techpolicylab.uw.edu/wp-content/uploads/2021/11/Data_Statements_Guide_V2.pdf

Berez-Kroeker, A. L., McDonnell, B., Koller, E., & Collister, L. B. (Eds.). (2022). The Open Handbook of Linguistic Data Management. MIT Press. https://doi.org/10.7551/mitpress/12200.001.0001

Carroll, S. R., Garba, I., Figueroa-Rodríguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43. https://doi.org/10.5334/dsj-2020-043

Fenlon, J., & Hochgesang, J. A. (Eds.). (2022). Signed Language Corpora. Gallaudet University Press.

Harris, R., Holmes, H. M., & Mertens, D. M. (2009). Research Ethics in Sign Language Communities. Sign Language Studies, 9(2), 104–131. https://doi.org/10.1353/sls.0.0011

Hochgesang, J. A. (2019, December 6). Sign Language Description: A Deaf Retrospective and Application of Best Practices from Language Documentation [Opening keynote presentation]. The 8th Meeting of Signed and Spoken Language Linguistics, National Museum of Ethnology, Minpaku, Osaka, Japan. https://doi.org/10.6084/m9.figshare.13393427.v1

Hochgesang, J. (2023). Documenting the ASL communities: MoLo and O5S5 Projects. figshare. https://doi.org/10.6084/m9.figshare.22652689.v2

Hochgesang, J., Bates, M., Clark, A., Davis, K., Dunham, M., Hamilton, L., Kadar, S., Kim, Y., Martínez Castiblanco, J. A., Maucere, G., Newman, T., & Simmons, H. (2021). O5S5: Documenting the experiences of the ASL Communities in the time of COVID-19 (Version 2). figshare. https://doi.org/10.6084/m9.figshare.16983517.v2

Hochgesang, J.A., Crasborn, O. & Lillo-Martin, D. (2017-2023). ASL Signbank. New Haven, CT: Haskins Lab, Yale University. https://aslsignbank.haskins.yale.edu/

Hochgesang, J. A., Lepic, R., Dudis, P., Shaw, E., & Villanueva, M. (2022, September 27). Motivated look at indicating verbs in ASL (MoLo). Theoretical Issues in Sign Language Research 14, Osaka, Japan. Open Science Framework. https://doi.org/10.17605/OSF.IO/VJP6W

Hochgesang, J. A., Lepic, R., & Shaw, E. (2023). W(h)ither the ASL corpus?: Considering trends in signed corpus development. In E. Wehrmeyer (Ed.), Gaining ground in sign language corpus linguistics (pp. 287–308). John Benjamins. https://doi.org/10.1075/scl.108.11hoc

Hochgesang, J. A., & Palfreyman, N. (2022). Sign language corpora and the ethics of working with sign language communities. In J. Fenlon & J. A. Hochgesang (Eds.), Signed Language Corpora (pp. 158–195). Gallaudet University Press.

Hou, L. Y.-S. (2017). Negotiating Language Practices and Language Ideologies in Fieldwork: A Reflexive Meta-Documentation. In A. Kusters, M. De Meulder, & D. O’Brien (Eds.), Innovations in Deaf Studies: The Role of Deaf Scholars (pp. 339–360). Oxford University Press.

Moriarty, E. (2020). Filmmaking in a Linguistic Ethnography of Deaf Tourist Encounters. Sign Language Studies, 20(4), 572–594. https://doi.org/10.1353/sls.2020.0019

Schulder, M., Blanck, D., Hanke, T., Hofmann, I., Hong, S.-E., Jeziorski, O., König, L., König, S., Konrad, R., Langer, G., Nishio, R., & Rathmann, C. (2021). Data Statement for the Public DGS Corpus. Universität Hamburg. https://doi.org/10.25592/UHHFDM.9700

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18

Acknowledgments
Gratitude to all of our participants who shared their experiences with us. Deep appreciation for the work of MoLo research assistants – Donovan Catt, Chanika Dorsey, Paul Gabriola, Meagan Sietsema, LeeAnn Tang, and Cody Willow. Also see Hochgesang et al. (2021) reference for O5S5 research team members.

Funding for MoLo was provided by Gallaudet’s Priority Research Fund grant 2019-2022; funding for O5S5 was provided by the Gallaudet University School of Language, Education and Culture (2021-2022).

During the summer of 2023, I worked with two undergraduate students through the Gallaudet REU – Molly Clements and Stephanie Patterson. The survey cited in this presentation about sharing ASL data is based on the work they did. This material is based upon work supported by the National Science Foundation under Grant No. 2131524, “DASS: Designing Accountable Artificial Intelligence Services for People with Diverse Sensory Abilities”.

The research reported here was supported in part by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under award number R01DC013578 and award number R01DC000183. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The ASL Signbank was developed at Radboud University by Onno Crasborn, Wessel Stoop, Micha Hulsbosch, and Susan Even.

Resources

And I’m excited to announce that Signed Language Corpora (Gallaudet University Press) edited by Jordan Fenlon and myself with foreword by Trevor Johnston is out now!

The open-access Open Handbook of Linguistic Data Management is now available and an incredible resource along with its companion site. There are a few chapters related to signed languages, including my own.

Bio: Julie A. Hochgesang (/ˈhoʊkˌsæŋ/) is a professor of Linguistics at Gallaudet University. She is a deaf* linguist who specializes in phonetics and phonology of signed languages, fieldwork, documentation, and corpora of signed languages, and ethics of working with signed language communities. Professor Hochgesang also works towards making linguistics accessible to the communities, especially the ASL communities, sharing multimodal products via social media and digital repositories. She has contributed to ongoing efforts to create accessible collections for the ASL communities, most notably as active maintainer of the ASL Signbank. Her most recent ASL documentation projects include the “Philadelphia Signs Project”, “Motivated Look at Indicating Verbs in ASL (MoLo)”, “Gallaudet University Documentation of ASL (GUDA)”, and “Documenting the Experiences of the ASL communities in the time of COVID-19 (O5S5)” (the ASL name O5S5 is derived from ASL variants for “Document COVID”). The work she does is because of the ASL communities and she considers herself a member of these communities.

*white, sighted, hearing family, early signer, cisgender