Author Archives: Rob Kitchin

The limits of social media big data

A new book chapter by Rob Kitchin has been published in The Sage Handbook of Social Media Research Methods edited by Luke Sloan and Anabel Quan-Haase. The chapter is titled ‘Big data – hype or revolution’ and provides a general introduction to big data, new epistemologies and data analytics, with the latter part focusing on social media data. The text below is a sample taken from a section titled ‘The limits of social media big data’.

The discussion so far has argued that there is something qualitatively different about big data from small data and that it opens up new epistemological possibilities, some of which have more value than others. In general terms, it has been intimated that big data does represent a revolution in measurement that will inevitably lead to a revolution in how academic research is conducted; that big data studies will replace small data ones. However, this is unlikely to be the case for a number of reasons.

Whilst small data may be limited in volume and velocity, they have a long history of development across science, state agencies, non-governmental organizations and business, with established methodologies and modes of analysis, and a record of producing meaningful answers. Small data studies can be much more finely tailored to answer specific research questions and to explore in detail and in-depth the varied, contextual, rational and irrational ways in which people interact and make sense of the world, and how processes work. Small data can focus on specific cases and tell individual, nuanced and contextual stories.

Big data is often being repurposed to try to answer questions for which it was never designed. For example, geotagged Twitter data have not been produced to provide answers with respect to the geographical concentration of language groups in a city and the processes driving such spatial autocorrelation. We should perhaps not be surprised then that it only provides a surface snapshot, albeit an interesting snapshot, rather than deep penetrating insights into the geographies of race, language, agglomeration and segregation in particular locales. Moreover, big data might seek to be exhaustive, but as with all data they are both a representation and a sample. What data are captured is shaped by: the field of view/sampling frame (where data capture devices are deployed and what their settings/parameters are; who uses a space or media, e.g., who belongs to Facebook); the technology and platform used (different surveys, sensors, lenses, textual prompts, layout, etc. all produce variances and biases in what data are generated); the context in which data are generated (unfolding events mean data are always situated with respect to circumstance); the data ontology employed (how the data are calibrated and classified); and the regulatory environment with respect to privacy, data protection and security (Kitchin, 2013, 2014a). Further, big data generally capture what is easy to ensnare – data that are openly expressed (what is typed, swiped, scanned, sensed, etc.; people’s actions and behaviours; the movement of things) – as well as data that are the ‘exhaust’, a by-product, of the primary task/output.

Small data studies then mine gold from working a narrow seam, whereas big data studies seek to extract nuggets through open-pit mining, scooping up and sieving huge tracts of land. These two approaches of narrow versus open mining have consequences with respect to data quality, fidelity and lineage. Given the limited sample sizes of small data, data quality – how clean (error and gap free), objective (bias free) and consistent (few discrepancies) the data are; veracity – the authenticity of the data and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to; and lineage – documentation that establishes provenance and fit for use; are of paramount importance (Lauriault, 2012). In contrast, it has been argued by some that big data studies do not need the same standards of data quality, veracity and lineage because the exhaustive nature of the dataset removes sampling biases and more than compensates for any errors or gaps or inconsistencies in the data or weakness in fidelity (Mayer-Schonberger and Cukier, 2013). The argument for such a view is that ‘with less error from sampling we can accept more measurement error’ (p.13) and ‘tolerate inexactitude’ (p. 16).

Nonetheless, the warning ‘garbage in, garbage out’ still holds. The data can be biased due to the demographic being sampled (e.g., not everybody uses Twitter) or the data might be gamed or faked through false accounts or hacking (e.g., there are hundreds of thousands of fake Twitter accounts seeking to influence trending and direct clickstream trails) (Bollier, 2010; Crampton et al., 2012). Moreover, the technologies being used and their working parameters can affect the nature of the data. For example, which posts on social media are most read or shared are strongly affected by ranking algorithms, not simply interest (Baym, 2013). Similarly, APIs structure what data are extracted, for example, in Twitter only capturing specific hashtags associated with an event rather than all relevant tweets (Bruns, 2013), with González-Bailón et al. (2012) finding that different methods of accessing Twitter data – search APIs versus streaming APIs – produced quite different sets of results. As a consequence, there is no guarantee that two teams of researchers attempting to gather the same data at the same time will end up with identical datasets (Bruns, 2013). Further, the choice of metadata and variables that are being generated and which ones are being ignored paint a particular picture (Graham, 2012). With respect to fidelity there are question marks as to the extent to which social media posts really represent people’s views and the faith that should be placed on them. Manovich (2011: 6) warns that ‘[p]eoples’ posts, tweets, uploaded photographs, comments, and other types of online participation are not transparent windows into their selves; instead, they are often carefully curated and systematically managed’.

There are also issues of access to both small and big data. Small data produced by academia, public institutions, non-governmental organizations and private entities can be restricted in access, limited in use to defined personnel, or available for a fee or under license. Increasingly, however, public institution and academic data are becoming more open. Big data are, with a few exceptions such as satellite imagery and national security and policing, mainly produced by the private sector. Access is usually restricted behind pay walls and proprietary licensing, limited to ensure competitive advantage and to leverage income through their sale or licensing (CIPPIC, 2006). Indeed, it is somewhat of a paradox that only a handful of entities are drowning in the data deluge (boyd and Crawford, 2012) and companies such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms are under no obligations to share freely the data they collect through their operations. In some cases, a limited amount of the data might be made available to researchers or the public through Application Programming Interfaces (APIs). For example, Twitter allows a few companies to access its firehose (stream of data) for a fee for commercial purposes (and have the latitude to dictate terms with respect to what can be done with such data), but with a handful of exceptions researchers are restricted to a ‘gardenhose’ (c. 10 percent of public tweets), a ‘spritzer’ (c. one percent of public tweets), or to different subsets of content (‘white-listed’ accounts), with private and protected tweets excluded in all cases (boyd and Crawford, 2012). The worry is that the insights that privately owned and commercially sold big data can provide will be limited to a privileged set of academic researchers whose findings cannot be replicated or validated (Lazer et al., 2009).

Given the relative strengths and limitations of big and small data it is fair to say that small data studies will continue to be an important element of the research landscape, despite the benefits that might accrue from using big data such as social media data. However, it should be noted that small data studies will increasingly come under pressure to utilize the new archiving technologies, being scaled-up within digital data infrastructures in order that they are preserved for future generations, become accessible to re-use and combination with other small and big data, and more value and insight can be extracted from them through the application of big data analytics.

Rob Kitchin

New paper: Algorhythmic governance: Regulating the ‘heartbeat’ of a city using the Internet of Things

Claudio Coletta and Rob Kitchin have published a new Programmable City working paper (No. 22) – Algorhythmic governance: Regulating the ‘heartbeat’ of a city using the Internet of Things – which is due to be delivered at the Algorithms in Culture workshop at the University of California, Berkeley, 1-2 December 2016.

It can be downloaded from: OSF, ResearchGate, Academia

Abstract

To date, research examining the socio-spatial effects of smart city technologies has charted how they are reconfiguring the production of space, spatiality and mobility, and how urban space is governed, but has paid little attention to how the temporality of cities is being reshaped by systems and infrastructure that capture, process and act on real-time data. In this paper, we map out the ways in which city-scale Internet of Things infrastructures, and their associated networks of sensors, meters, transponders, actuators and algorithms, are used to measure, monitor and regulate the polymorphic temporal rhythms of urban life. Drawing on Lefebvre (1992[2004]), and subsequent research, we employ rhythmanalysis in conjunction with Miyazaki’s (2012, 2013a/b) notion of ‘algorhythm’ and nascent work on algorithmic governance, to develop a concept of ‘algorhythmic governance’. We then use this framing to make sense of two empirical case studies: a traffic management system and sound monitoring and modelling. Our analysis reveals: (1) how smart city technologies computationally perform rhythmanalysis and undertake rhythm-work that intervenes in space-time processes; (2) three distinct forms of algorhythmic governance, varying on the basis of adaptiveness, immediacy of action, and whether humans are in-, on- or off-the-loop; and (3) a number of factors that shape how algorhythmic governance works in practice.

Key words: algorhythm, algorithmic governance, rhythmanalysis, Internet of Things, smart cities, time geography

 

 

New paper: The ethics of smart cities and urban science

A new paper by Rob Kitchin has been published in Philosophical Transactions A titled ‘The ethics of smart cities and urban science’ in a special issue on ‘The ethical impact of data science’.

Abstract

Software-enabled technologies and urban big data have become essential to the functioning of cities. Consequently, urban operational governance and city services are becoming highly responsive to a form of data-driven urbanism that is the key mode of production for smart cities. At the heart of data-driven urbanism is a computational understanding of city systems that reduces urban life to logic and calculative rules and procedures, which is underpinned by an instrumental rationality and realist epistemology. This rationality and epistemology are informed by and sustain urban science and urban informatics, which seek to make cities more knowable and controllable. This paper examines the forms, practices and ethics of smart cities and urban science, paying particular attention to: instrumental rationality and realist epistemology; privacy, datafication, dataveillance and geosurveillance; and data uses, such as social sorting and anticipatory governance. It argues that smart city initiatives and urban science need to be re-cast in three ways: a re-orientation in how cities are conceived; a reconfiguring of the underlying epistemology to openly recognize the contingent and relational nature of urban systems, processes and science; and the adoption of ethical principles designed to realize benefits of smart cities and urban science while reducing pernicious effects.

The paper is behind a paywall, so if you don’t have access and you’re interested in reading it, email Rob (rob.kitchin@nuim.ie) and he’ll send you a copy.

Smart Docklands in a word, and smart city bingo

A couple of weeks ago I published a list of words that members of the Smart Dublin Advisory Network felt represented qualities they hoped Smart Dublin would fulfil.  At a recent meeting about a proposed Smart Docklands initiative attendees were asked to perform the same task – use one word to describe a desirable quality for the area/initiative.  Here is that list of aspiration words:

Co-creation                  Innovation                  Collaboration
Best practice               Showcase                    Testbed
Quality of Life             Community                 Engagement
Smart energy              Telecoms                     Internet of Things
Data                              Open                            Bright
Intelligent                    Optimized                    Autonomous system
Sustainability              Safety                           Resource efficient
Industry                       Startups                       Opportunity
Alignment                    Integrated                   Deploy and forget
Electricity                     Battery                         Energy
Connectivity                Smart mobility

While there is some overlap in the lists, it’s interesting to note the differences between the aspirations expressed at the two meetings.

Here are the words in the Smart Dublin list that are not in the Smart Docklands one:

Networking, Collaborative, Cooperation, Sharing, People, Well-being, Accessible, Diversity, Insight, Problem-solving, Strategic, Joined-up, Agile, Transformative, Future-proofing, International, Socio-technical, Curiosity, Easy

And here are the words in the Smart Docklands list not in the Smart Dublin one:

Co-creation, Best practice, Showcase, Smart energy, Telecoms, Internet of Things, Open, Bright, Intelligent, Optimized, Autonomous system, Resource efficient, Industry, Opportunity, Alignment, Deploy and forget, Electricity, Battery, Energy, Smart mobility

And here is the overlap:

Innovation, Collaboration, Testbed, Quality of Life, Community, Engagement, Data, Sustainability, Safety, Startups, Integrated, Connectivity
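
For anyone wanting to repeat this kind of comparison with lists from other cities, the three groupings above can be sketched with set operations. This is a minimal Python sketch and not how the lists in this post were compiled: the helper names `normalize` and `compare` are hypothetical, and the matching rule (ignore case, hyphens and spaces) is an assumption.

```python
import re

def normalize(term: str) -> str:
    """Lowercase and drop hyphens/spaces so 'Quality-of-life' matches 'Quality of Life'."""
    return re.sub(r"[\s\-]+", "", term.lower())

def compare(a, b):
    """Return (only in a, only in b, in both), matching on normalized spelling."""
    a_map = {normalize(t): t for t in a}
    b_map = {normalize(t): t for t in b}
    only_a = sorted(a_map[k] for k in a_map.keys() - b_map.keys())
    only_b = sorted(b_map[k] for k in b_map.keys() - a_map.keys())
    both = sorted(a_map[k] for k in a_map.keys() & b_map.keys())
    return only_a, only_b, both

# Words as they appear in the two posts.
smart_dublin = [
    "Connectivity", "Networking", "Integrated", "Collaborative", "Cooperation",
    "Sharing", "People", "Community", "Engagement", "Well-being", "Safe",
    "Quality-of-life", "Accessible", "Sustainable", "Diversity", "Data",
    "Insight", "Problem-solving", "Strategic", "Joined-up", "Agile",
    "Transformative", "Future-proofing", "International", "Innovation",
    "Start-ups", "Testing", "Socio-technical", "Curiosity", "Easy",
]
smart_docklands = [
    "Co-creation", "Innovation", "Collaboration", "Best practice", "Showcase",
    "Testbed", "Quality of Life", "Community", "Engagement", "Smart energy",
    "Telecoms", "Internet of Things", "Data", "Open", "Bright", "Intelligent",
    "Optimized", "Autonomous system", "Sustainability", "Safety",
    "Resource efficient", "Industry", "Startups", "Opportunity", "Alignment",
    "Integrated", "Deploy and forget", "Electricity", "Battery", "Energy",
    "Connectivity", "Smart mobility",
]

only_dublin, only_docklands, both = compare(smart_dublin, smart_docklands)
print("Only Smart Dublin:", ", ".join(only_dublin))
print("Only Smart Docklands:", ", ".join(only_docklands))
print("Overlap:", ", ".join(both))
```

Matching on spelling alone finds eight shared words rather than the twelve listed above, because it cannot pair near-synonyms such as Safe/Safety, Sustainable/Sustainability or Testing/Testbed. Pairing those needs a judgment call (or a hand-made synonym table), which is worth bearing in mind when comparing lists across cities.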

Perhaps not surprisingly, the Smart Docklands list has more economic aspirations, but it does still contain ambitions concerning community, engagement, quality of life and sustainability. Adding the two lists together, I sense, provides a kind of ‘smart city bingo’ – a full house of smart city goals.

Thanks to Jamie Cudden and Réka Pétercsák for compiling and sending the Smart Docklands list to me.

Rob Kitchin

Post advertised: Postdoc on ProgCity project

We are seeking a postdoctoral researcher (14-month contract) to join the Programmable City project. The researcher will critically examine:

  • the political economy of smart city technologies and initiatives; the creation of smart city markets; the inter-relation of urban (re)development and smart city initiatives; the relationship between vendors, business lobby groups, economic development agencies, and city administrations; financialization and new business models; and/or,
  • the relationship between the political geography of city administration, governance arrangements, and smart city initiatives; political and legal geographies of testbed urbanism and smart city initiatives; smart city technologies and governmentality.

There will be some latitude to negotiate with the principal investigator the exact focus of the research undertaken. While some of the research will require primary fieldwork (Dublin/Boston), it is anticipated it will also involve the secondary analysis of data already generated by the project.

More details on the post and how to apply can be found on the university HR website.  Closing date: 5th December.

Smart Dublin – in one word

The first Smart Dublin Advisory Network meeting took place on the 12th October in the Mansion House. The plan is for the network to meet every six months to help guide the work of Smart Dublin as it develops and implements its strategy and programmes. The first meeting mainly focused on introducing Smart Dublin and undertaking some initial workshop exercises to brainstorm initial ideas, gather feedback and do some preliminary backcasting. The first task was a quick introduction and for each person to say in one word a quality they hoped Smart Dublin would fulfil. Here’s a list of those aspirational words – which I have grouped into triplets – a list against which to judge over the next few years how successful Smart Dublin has been.

Connectivity              Networking              Integrated
Collaborative            Cooperation             Sharing
People                       Community              Engagement
Well-being                 Safe                           Quality-of-life
Accessible                 Sustainable              Diversity
Data                           Insight                       Problem-solving
Strategic                    Joined-up                  Agile
Transformative        Future-proofing       International
Innovation                Start-ups                   Testing
Socio-technical        Curiosity                    Easy

Interestingly, efficiency, economy and open – which are three of the four key terms that have to date underpinned Smart Dublin’s work (along with engagement) – were not suggested. Personally, I think it’s a fascinating list in terms of what it prioritizes as key attributes of a successful smart city and it would be interesting to compare this list to other lists produced by stakeholder groups in other cities.  A brief post about the advisory board meeting and the Smart Dublin showcase that followed its first meeting can be found here.

Rob Kitchin