Crowdsourcing and Transcription – all for one or one for all?

As more and more sources are digitised, it can be overwhelming for one person, or even one dedicated group of people, to digitise thousands of individual documents alone. Transcribe Bentham and Old Weather are two projects in which historians have opened their sources to the public for transcription via crowdsourcing. The Oxford English Dictionary defines crowdsourcing as “The practice of obtaining information or services by soliciting input from a large number of people, typically via the Internet and often without offering compensation”, which highlights the first thing to note about the participants in these transcription projects: they are volunteers. 1 These ‘citizen historians’ spend their time looking over photographs of documents and transcribing them into text, so that the documents can be turned into searchable archives. For the owners of the projects this keeps costs to a minimum, as they do not have to pay for expensive transcription methods such as Optical Character Recognition. Much like the Galaxy Zoo project with its 100 million images, each document can also be transcribed several times by different volunteers in order to filter out individual users' inaccuracies. 2
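To make the idea of repeated transcription concrete, here is a minimal sketch in Python of how several volunteers' versions of the same line might be reconciled by a simple majority vote. This is an illustration only, not the method either project actually uses: the function name `consensus_line`, the `min_agreement` threshold and the whitespace and case normalisation are all my own assumptions.

```python
from collections import Counter


def consensus_line(transcriptions, min_agreement=0.5):
    """Return the most common version of a line, or None if no clear majority.

    Hypothetical helper: `transcriptions` is a list of strings, one per
    volunteer, and `min_agreement` is an assumed threshold (a fraction).
    """
    if not transcriptions:
        return None
    # Collapse whitespace and ignore case so trivial differences
    # between volunteers do not split the vote.
    normalised = [" ".join(t.split()).lower() for t in transcriptions]
    winner, votes = Counter(normalised).most_common(1)[0]
    # Accept the winner only if more than min_agreement of volunteers agree.
    if votes / len(normalised) > min_agreement:
        return winner
    return None
```

Called with three versions of one log entry, for example `consensus_line(["Wind NW 4", "wind  NW 4", "Wind NE 4"])`, the sketch keeps the reading that two of the three volunteers share. A line with no majority returns `None` and could be passed to an editor for checking.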

When analysing the outcome of these projects it is important to understand who their audiences are and how the projects aim to use the archive once transcription is complete. The aim of Transcribe Bentham is to ‘create an authoritative scholarly edition of the Collected Works’ of the philosopher and reformer Jeremy Bentham. 3 As these works are owned by University College London, use of the searchable archive after transcription will be limited to academics who have access to it through a university. This may discourage ‘citizen historians’ from taking part, especially amateurs who will not be able to see the fruits of their labour, although it need not do so. Many amateur historians who take part in these projects do so for the greater purpose of digitising important documents, rather than because they have any specific use for them. The Old Weather project, a collaboration between a group of organisations including the National Archives and the Met Office, involves volunteers transcribing ships’ logs from the mid-19th century to help historians track past ship movements and create a fluid history of crews and voyages. The project ranks its volunteers as different positions on a ship, from Officer to Captain, according to the number of logs they have helped transcribe. This can act as a motivating factor, with citizen historians competing to move up the leaderboard.

My main criticism when using both systems was their technological complexity, more so for Old Weather. Transcribe Bentham used basic HTML-style markup to give more accurate transcriptions, and provided small icons that inserted the tags automatically, so that users did not need to know how to write them. For Old Weather, however, the transcription method was much more complicated, relying on bulky highlighting and drop-down menus which quickly cluttered the page and left the user unable to see some of the text they were trying to transcribe. This does, however, have something to do with the type of data being transcribed: Transcribe Bentham consists of handwritten letters and manuscripts, whereas Old Weather deals with tables and ships' logs. The layout of the latter is much harder for a user to reproduce, which is why the archive specifies a set amount of information that it wishes to extract from each log. When crowdsourcing transcription, two questions must therefore be taken into account: what does the archive want to gain from this information, and how is it going to be used once the transcription is complete?

What is important to note about transcription as a potential area of growth in digital history is that these projects are often just traditional forms of research and historical documentation carried out on a larger scale. As Causer and Wallace state, the technology is used mainly to speed up these traditional methods rather than to revolutionise them. Furthermore, it was found that only 6% of historians used these forms of digital history in their research, and only a similar proportion were willing to use software to speed up familiar techniques, rather than to adopt new or more complicated software that would yield different results. 4 5 These figures suggest that Transcribe Bentham and Old Weather succeed because they speed up methods historians already trust, but they also highlight how these projects do not push the boundaries of building better digital history tools.

  1. ‘Definition of “crowdsourcing”’, Oxford English Dictionary, http://www.oed.com/view/Entry/376403?redirectedFrom=crowdsourcing; accessed 22/03/15.
  2. M. J. Raddick, A. S. Szalay, J. Vandenberg, G. Bracey, P. L. Gay, C. Lintott, P. Murray and K. Schawinski, ‘Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers’, Astronomy Education Review, Vol. 9 No. 1 (2010).
  3. Tim Causer and Valerie Wallace, ‘Building a Volunteer Community: Results and Findings from Transcribe Bentham’, Digital Humanities Quarterly, Vol. 6 No. 2 (2012).
  4. Summit on Digital Tools for the Humanities, A Report on the Summit on Digital Tools, University of Virginia, 28–30 September 2005 (2006).
  5. Fred Gibbs and Trevor Owens, ‘Building Better Digital Humanities Tools: Towards broader audiences and user-centred designs’, Digital Humanities Quarterly, Vol. 6 No. 2 (2012).