The limits of social media big data

A new book chapter by Rob Kitchin has been published in The Sage Handbook of Social Media Research Methods, edited by Luke Sloan and Anabel Quan-Haase. The chapter is titled ‘Big data – hype or revolution’ and provides a general introduction to big data, new epistemologies and data analytics, with the latter part focusing on social media data. The text below is a sample taken from a section titled ‘The limits of social media big data’.

The discussion so far has argued that there is something qualitatively different about big data compared to small data, and that it opens up new epistemological possibilities, some of which have more value than others. In general terms, it has been intimated that big data represents a revolution in measurement that will inevitably lead to a revolution in how academic research is conducted; that big data studies will replace small data ones. However, this is unlikely to be the case for a number of reasons.

Whilst small data may be limited in volume and velocity, they have a long history of development across science, state agencies, non-governmental organizations and business, with established methodologies and modes of analysis, and a record of producing meaningful answers. Small data studies can be much more finely tailored to answer specific research questions and to explore in detail and in-depth the varied, contextual, rational and irrational ways in which people interact and make sense of the world, and how processes work. Small data can focus on specific cases and tell individual, nuanced and contextual stories.

Big data are often repurposed to try to answer questions for which they were never designed. For example, geotagged Twitter data have not been produced to provide answers with respect to the geographical concentration of language groups in a city and the processes driving such spatial autocorrelation. We should perhaps not be surprised, then, that they provide only a surface snapshot, albeit an interesting one, rather than deep, penetrating insights into the geographies of race, language, agglomeration and segregation in particular locales. Moreover, big data might seek to be exhaustive, but as with all data they are both a representation and a sample. What data are captured is shaped by: the field of view/sampling frame (where data capture devices are deployed and what their settings/parameters are; who uses a space or media, e.g., who belongs to Facebook); the technology and platform used (different surveys, sensors, lenses, textual prompts, layouts, etc. all produce variances and biases in what data are generated); the context in which data are generated (unfolding events mean data are always situated with respect to circumstance); the data ontology employed (how the data are calibrated and classified); and the regulatory environment with respect to privacy, data protection and security (Kitchin, 2013, 2014a). Further, big data generally capture what is easy to ensnare – data that are openly expressed (what is typed, swiped, scanned, sensed, etc.; people’s actions and behaviours; the movement of things) – as well as data that are the ‘exhaust’, a by-product, of the primary task/output.

Small data studies then mine gold from working a narrow seam, whereas big data studies seek to extract nuggets through open-pit mining, scooping up and sieving huge tracts of land. These two approaches of narrow versus open mining have consequences with respect to data quality, fidelity and lineage. Given the limited sample sizes of small data, data quality (how clean (error and gap free), objective (bias free) and consistent (few discrepancies) the data are), veracity (the authenticity of the data and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to), and lineage (documentation that establishes provenance and fitness for use) are of paramount importance (Lauriault, 2012). In contrast, it has been argued by some that big data studies do not need the same standards of data quality, veracity and lineage because the exhaustive nature of the dataset removes sampling biases and more than compensates for any errors, gaps or inconsistencies in the data, or weaknesses in fidelity (Mayer-Schonberger and Cukier, 2013). The argument for such a view is that ‘with less error from sampling we can accept more measurement error’ (p. 13) and ‘tolerate inexactitude’ (p. 16).
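The quality dimensions discussed above can be made concrete with a small illustration. The following Python sketch (using hypothetical toy records, not data from the chapter) computes simple indicators for the kinds of problems described – gaps, implausible values and duplicates:

```python
from collections import Counter

# Hypothetical toy records for illustration only.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},   # a gap (missing value)
    {"id": 3, "age": 210},    # an implausible value (inconsistency)
    {"id": 3, "age": 210},    # a duplicate record
]

n = len(records)
# Count records with a missing 'age' field.
missing = sum(1 for r in records if r["age"] is None)
# Count records whose 'age' falls outside a plausible human range.
implausible = sum(
    1 for r in records if r["age"] is not None and not 0 <= r["age"] <= 120
)
# Count surplus records sharing an id with an earlier record.
duplicates = n - len({r["id"] for r in records})

print(f"missing: {missing}/{n}, implausible: {implausible}/{n}, duplicates: {duplicates}")
```

Checks of this kind are routine in small data studies; the argument reported above is that big data studies often skip them on the assumption that volume compensates.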

Nonetheless, the warning ‘garbage in, garbage out’ still holds. The data can be biased due to the demographic being sampled (e.g., not everybody uses Twitter), or the data might be gamed or faked through false accounts or hacking (e.g., there are hundreds of thousands of fake Twitter accounts seeking to influence trending and direct clickstream trails) (Bollier, 2010; Crampton et al., 2012). Moreover, the technologies being used and their working parameters can affect the nature of the data. For example, which posts on social media are most read or shared is strongly affected by ranking algorithms, not simply interest (Baym, 2013). Similarly, APIs structure what data are extracted, for example, in Twitter only capturing specific hashtags associated with an event rather than all relevant tweets (Bruns, 2013), with González-Bailón et al. (2012) finding that different methods of accessing Twitter data – search APIs versus streaming APIs – produced quite different sets of results. As a consequence, there is no guarantee that two teams of researchers attempting to gather the same data at the same time will end up with identical datasets (Bruns, 2013). Further, the choice of which metadata and variables are generated, and which are ignored, paints a particular picture (Graham, 2012). With respect to fidelity, there are question marks over the extent to which social media posts really represent people’s views and the faith that should be placed in them. Manovich (2011: 6) warns that ‘[p]eoples’ posts, tweets, uploaded photographs, comments, and other types of online participation are not transparent windows into their selves; instead, they are often carefully curated and systematically managed’.
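The point that two teams gathering the ‘same’ data end up with different datasets can be illustrated with a simple simulation. The Python sketch below is hypothetical (it does not use Twitter’s actual API): it mimics two teams each drawing an independent c. 1 percent sample from the same stream of posts, whose samples, and hence estimates of a hashtag’s prevalence, will generally differ:

```python
import random
from collections import Counter

def simulate_stream(n_posts=100_000, seed=0):
    """Simulate a stream where a minority hashtag appears in ~2% of posts."""
    rng = random.Random(seed)
    return ["#rare" if rng.random() < 0.02 else "#common" for _ in range(n_posts)]

def sample_stream(stream, rate, seed):
    """Draw an independent random sample of the stream at the given rate."""
    rng = random.Random(seed)
    return [post for post in stream if rng.random() < rate]

stream = simulate_stream()
team_a = sample_stream(stream, 0.01, seed=1)  # one team's c. 1% sample
team_b = sample_stream(stream, 0.01, seed=2)  # another team's c. 1% sample

share_a = Counter(team_a)["#rare"] / len(team_a)
share_b = Counter(team_b)["#rare"] / len(team_b)
print(f"Team A estimate of #rare share: {share_a:.4f}")
print(f"Team B estimate of #rare share: {share_b:.4f}")
```

Both teams sampled the same stream at the same rate, yet their datasets and estimates diverge purely through the sampling mechanism, before any of the platform biases discussed above are even considered.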

There are also issues of access to both small and big data. Small data produced by academia, public institutions, non-governmental organizations and private entities can be restricted in access, limited in use to defined personnel, or available for a fee or under license. Increasingly, however, public institution and academic data are becoming more open. Big data are, with a few exceptions such as satellite imagery and national security and policing, mainly produced by the private sector. Access is usually restricted behind paywalls and proprietary licensing, limited to ensure competitive advantage and to leverage income through their sale or licensing (CIPPIC, 2006). Indeed, it is something of a paradox that only a handful of entities are drowning in the data deluge (boyd and Crawford, 2012), and companies such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms are under no obligation to share freely the data they collect through their operations. In some cases, a limited amount of the data might be made available to researchers or the public through Application Programming Interfaces (APIs). For example, Twitter allows a few companies to access its firehose (stream of data) for a fee for commercial purposes (and has the latitude to dictate terms with respect to what can be done with such data), but with a handful of exceptions researchers are restricted to a ‘gardenhose’ (c. 10 percent of public tweets), a ‘spritzer’ (c. one percent of public tweets), or to different subsets of content (‘white-listed’ accounts), with private and protected tweets excluded in all cases (boyd and Crawford, 2012). The worry is that the insights that privately owned and commercially sold big data can provide will be limited to a privileged set of academic researchers whose findings cannot be replicated or validated (Lazer et al., 2009).

Given the relative strengths and limitations of big and small data, it is fair to say that small data studies will continue to be an important element of the research landscape, despite the benefits that might accrue from using big data such as social media data. However, small data studies will come under increasing pressure to utilize new archiving technologies and to be scaled up within digital data infrastructures, so that they are preserved for future generations, become accessible for re-use and combination with other small and big data, and yield more value and insight through the application of big data analytics.

Rob Kitchin

New paper: Algorhythmic governance: Regulating the ‘heartbeat’ of a city using the Internet of Things

Claudio Coletta and Rob Kitchin have published a new Programmable City working paper (No. 22) – Algorhythmic governance: Regulating the ‘heartbeat’ of a city using the Internet of Things – which is due to be delivered at the Algorithms in Culture workshop at the University of California Berkeley, 1-2 December 2016.

It can be downloaded from: OSF, ResearchGate, Academia

Abstract

To date, research examining the socio-spatial effects of smart city technologies has charted how they are reconfiguring the production of space, spatiality and mobility, and how urban space is governed, but has paid little attention to how the temporality of cities is being reshaped by systems and infrastructure that capture, process and act on real-time data. In this paper, we map out the ways in which city-scale Internet of Things infrastructures, and their associated networks of sensors, meters, transponders, actuators and algorithms, are used to measure, monitor and regulate the polymorphic temporal rhythms of urban life. Drawing on Lefebvre (1992[2004]) and subsequent research, we employ rhythmanalysis in conjunction with Miyazaki’s (2012, 2013a/b) notion of ‘algorhythm’ and nascent work on algorithmic governance to develop a concept of ‘algorhythmic governance’. We then use this framing to make sense of two empirical case studies: a traffic management system and sound monitoring and modelling. Our analysis reveals: (1) how smart city technologies computationally perform rhythmanalysis and undertake rhythm-work that intervenes in space-time processes; (2) three distinct forms of algorhythmic governance, varying on the basis of adaptiveness, immediacy of action, and whether humans are in-, on-, of-, or off-the-loop; and (3) a number of factors that shape how algorhythmic governance works in practice.

Key words: algorhythm, algorithmic governance, rhythmanalysis, Internet of Things, smart cities, time geography

New paper on frictions in civic hacking

Drawing on postcolonial technoscience and particularly the notion of ‘frictions’, Sung-Yueh Perng and Rob Kitchin analyse how solutions are worked up, challenged and changed in civic hacking events. The paper is published in Social & Cultural Geography and is entitled Solutions and frictions in civic hacking: collaboratively designing and building wait time predictions for an immigration office. There are still eprints available for free via the link: http://www.tandfonline.com/eprint/SSWBCcCech3hezdgIFZp/full. For more details about the paper, the abstract is pasted below.

Abstract: Smart and data-driven technologies seek to create urban environments and systems that can operate efficiently and effortlessly. Yet, the design and implementation of such technical solutions are full of frictions, producing unanticipated consequences and generating turbulence that forecloses the creation of friction-free city solutions. In this paper, we examine the development of solutions for wait time predictions in the context of civic hacking to argue that a focus on frictions is important for establishing a critical understanding of innovation for urban everyday life. The empirical study adopted an ethnographically informed mobile methods approach to follow how frictions emerge and linger in the design and production of queue predictions developed through the civic hacking initiative, Code for Ireland. In so doing, the paper charts how solutions have to be worked up and strategies re-negotiated when a shared motivation meets different data sources, technical expertise, frames of understanding, urban imaginaries and organisational practices; and how solutions are contingently stabilised in technological, motivational, spatiotemporal and organisational specificities rather than unfolding in a smooth, linear, progressive trajectory.

New paper in Geoforum – The praxis and politics of building urban dashboards

Rob Kitchin, Sophia Maalsen and Gavin McArdle have a new paper published in Geoforum titled ‘The praxis and politics of building urban dashboards’. It is open access via this link until early December.

Abstract: This paper critically reflects on the building of the Dublin Dashboard – a website built by two of the authors that provides citizens, planners, policy makers and companies with an extensive set of data and interactive visualizations about Dublin City, including real-time information – from the perspective of critical data studies. The analysis draws upon participant observation, ethnography, and an archive of correspondence to unpack the building of the dashboard and the emergent politics of data and design. Our findings reveal four main observations. First, a dashboard is a complex socio-technical assemblage of actors and actants that work materially and discursively within a set of social and economic constraints, existing technologies and systems, and power geometries to assemble, produce and maintain the website. Second, the production and maintenance of a dashboard unfolds contextually, contingently and relationally through transduction. Third, the praxis and politics of creating a dashboard has wider recursive effects: just as building the dashboard was shaped by the wider institutional landscape, producing the system inflected that landscape. Fourth, the data, configuration, tools, and modes of presentation of a dashboard produce a particularised set of spatial knowledges about the city. We conclude that rather than frame dashboard development in purely technical terms, it is important to openly recognize their contested and negotiated politics and praxis.

Big data and the city

A special issue of ‘Built Environment’ – Big Data and the City – edited by Mike Batty has just been published and includes a paper by Gavin McArdle and Rob Kitchin on improving the veracity of open and real-time urban data.  Full details of contents below:

  • Editorial: Big Data, Cities and Herodotus by MICHAEL BATTY
  • Big Data and the City by MICHAEL BATTY
  • From Origins to Destinations: The Past, Present and Future of Visualizing Flow Maps by MATTHEW CLAUDEL, TILL NAGEL, and CARLO RATTI
  • Towards a Better Understanding of Cities Using Mobility Data by MAXIME LENORMAND and JOSÉ J. RAMASCO
  • Finding Pearls in London’s Oysters by JON READES, CHEN ZHONG, ED MANLEY, RICHARD MILTON and MICHAEL BATTY
  • A Classification of Multidimensional Open Data for Urban Morphology by ALEXANDROS ALEXIOU, ALEX SINGLETON, and PAUL A. LONGLEY
  • User-Generated Big Data and Urban Morphology by A.T. CROOKS, A. CROITORU, A. JENKINS, R. MAHABIR, P. AGOURIS and A. STEFANIDIS
  • Sensing Spatiotemporal Patterns in Urban Areas: Analytics and Visualizations Using the Integrated Multimedia City Data Platform by PIYUSHIMITA (VONU) THAKURIAH, KATARZYNA SILA-NOWICKA, and JORGE GONZALEZ PAULE
  • Playful Cities: Crowdsourcing Urban Happiness with Web Games by DANIELE QUERCIA
  • Big Data for Healthy Cities: Using Location-Aware Technologies, Open Data and 3D Urban Models to Design Healthier Built Environment by HARVEY J. MILLER and KRISTIN TOLLE
  • Improving the Veracity of Open and Real-Time Urban Data by GAVIN MCARDLE and ROB KITCHIN
  • Wise Cities: ‘Old’ Big Data and ‘Slow’ Real Time by FABIO CARRERA
  • Collecting and Visualizing Real-Time Urban Data Through City Dashboards by STEVEN GRAY, OLIVER O’BRIEN and STEPHAN HÜGEL

New paper: Urban data and city dashboards: Six key issues

Rob Kitchin and Gavin McArdle have published a new Programmable City working paper (no. 21) – Urban data and city dashboards: Six key issues – on SocArXiv today. It is a pre-print of a chapter that will be published in Kitchin, R., Lauriault, T.P. and McArdle, G. (eds) (forthcoming) Data and the City. Routledge, London.

Abstract

This chapter considers the relationship between data and the city by critically examining six key issues with respect to city dashboards: epistemology, scope and access, veracity and validity, usability and literacy, use and utility, and ethics. While city dashboards provide useful tools for evaluating and managing urban services, understanding and formulating policy, and creating public knowledge and counter-narratives, our analysis reveals a number of conceptual and practical shortcomings. In order for city dashboards to reach their full potential we advocate a number of related shifts in thinking and praxes, and forward an agenda for addressing the issues we highlight. Our analysis is informed by our endeavours in building the Dublin Dashboard.

Key words: dashboards, cities, access, epistemology, ethics, open data, scope, usability, utility, veracity, validity