Crowdsourcing for Hispanic Linguistics: Amazon’s Mechanical Turk as a source of Spanish data

Iván Ortega-Santos

doi:10.7557/1.8.1.4670

Authors

Iván Ortega-Santos, University of Memphis

DOI:

https://doi.org/10.7557/1.8.1.4670

Keywords:

Amazon’s Mechanical Turk, crowdsourcing, Spanish Linguistics, data collection, research planning

Abstract

Within the field of Linguistics, Amazon’s Mechanical Turk, a crowdsourcing marketplace specializing in computer-based Human Intelligence Tasks, has been praised as a cost-efficient source of data for English and other major languages. Spanish is a good candidate due to its presence within the US and beyond. Still, detailed information concerning the linguistic and demographic profile of Spanish-speaking ‘Turkers’ is missing, making it difficult for researchers to evaluate whether the Mechanical Turk provides the right environment for their tasks. This paper addresses this gap in our knowledge by developing the first detailed study of the presence of Spanish-speaking workers, focusing on factors relevant for research planning, namely, (socio)linguistically relevant variables and information concerning work habits. The results show that this platform provides access to a fairly active participant pool of both L1 and L2 Spanish speakers as well as bilinguals. A brief introduction to how Amazon’s Mechanical Turk works and an overview of Hispanic Linguistics projects that have so far used the Mechanical Turk successfully are included.