Tag Archives: big data

New paper in Big Data and Society: What makes big data, big data?

Rob Kitchin and Gavin McArdle have a new paper – What makes big data, big data? Exploring the ontological characteristics of 26 datasets – published in Big Data and Society.

Abstract

Big Data has been variously defined in the literature. In the main, definitions suggest that Big Data possess a suite of key traits: volume, velocity and variety (the 3Vs), but also exhaustivity, resolution, indexicality, relationality, extensionality and scalability. However, these definitions lack ontological clarity, with the term acting as an amorphous, catch-all label for a wide selection of data. In this paper, we consider the question ‘what makes Big Data, Big Data?’, applying Kitchin’s taxonomy of seven Big Data traits to 26 datasets drawn from seven domains, each of which is considered in the literature to constitute Big Data. The results demonstrate that only a handful of datasets possess all seven traits, and some do not possess either volume and/or variety. Instead, there are multiple forms of Big Data. Our analysis reveals that the key definitional boundary markers are the traits of velocity and exhaustivity. We contend that Big Data as an analytical category needs to be unpacked, with the genus of Big Data further delineated and its various species identified. It is only through such ontological work that we will gain conceptual clarity about what constitutes Big Data, formulate how best to make sense of it, and identify how it might be best used to make sense of the world.
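To make the idea of profiling datasets against a trait taxonomy more concrete, here is a minimal sketch in Python. It is not the authors' method or data: the grouping of the named characteristics into seven traits is an assumption made for illustration, and the dataset names and trait assignments are entirely hypothetical.

```python
from typing import Dict, Iterable

# Assumed grouping of the characteristics named in the abstract into seven traits.
TRAITS = (
    "volume",
    "velocity",
    "variety",
    "exhaustivity",
    "resolution_indexicality",
    "relationality",
    "extensionality_scalability",
)


def profile(traits_present: Iterable[str]) -> Dict[str, bool]:
    """Return a trait -> bool profile, rejecting unknown trait names."""
    present = set(traits_present)
    unknown = present - set(TRAITS)
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    return {trait: trait in present for trait in TRAITS}


def has_all_traits(p: Dict[str, bool]) -> bool:
    """True only if every trait in the taxonomy is present."""
    return all(p.values())


def meets_boundary_markers(p: Dict[str, bool]) -> bool:
    """Check the two traits the abstract names as key boundary markers."""
    return p["velocity"] and p["exhaustivity"]


# Hypothetical datasets; the trait assignments are invented for illustration.
datasets = {
    "hypothetical_sensor_feed": profile(TRAITS),
    "hypothetical_transaction_log": profile(
        ["velocity", "exhaustivity", "resolution_indexicality", "relationality"]
    ),
    "hypothetical_sample_survey": profile(["variety", "relationality"]),
}

for name, p in datasets.items():
    print(name, has_all_traits(p), meets_boundary_markers(p))
```

In this sketch a dataset can lack volume or variety and still satisfy the velocity and exhaustivity check, which mirrors the abstract's point that there are multiple forms of Big Data.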

The paper is available for download as a PDF here.

Robinson Crusoe dreams of big data

I’ve just come across a very nice passage from Michel Tournier’s 1967 novel, Friday; or, The Other Island (a retelling of Daniel Defoe’s Robinson Crusoe), which seems to capture perfectly the desire of big data projects and which I thought worth sharing:

I demand, I insist, that everything around me shall henceforth be measured, tested, certified, mathematical, and rational. One of my tasks must be to make a full survey of the island, its distances and its contours, and incorporate all these details in an accurate surveyor’s map. I should like every plant to be labeled, every bird to be ringed, every animal to be branded. I shall not be content until this opaque and impenetrable place, filled with secret ferments and malignant stirrings, has been transformed into a calculated design, visible and intelligible to its very depths!

I discovered it in Anne Galloway and Matthew Ward’s piece: Locative Media as Socialising and Spatialising Practices: Learning from Archaeology.

New paper: Locative media and data-driven computing experiments

Sung-Yueh Perng, Rob Kitchin and Leighton Evans have published a new paper entitled ‘Locative media and data-driven computing experiments’, available as Programmable City Working Paper 16 on SSRN.

Abstract

Over the past two decades urban social life has undergone a rapid and pervasive geocoding, becoming mediated, augmented and anticipated by location-sensitive technologies and services that generate and utilise big, personal, locative data. The production of these data has prompted the development of exploratory data-driven computing experiments that seek to find ways to extract value and insight from them. These projects often start from the data, rather than from a question or theory, and try to imagine and identify their potential utility. In this paper, we explore the desires and mechanics of data-driven computing experiments. We demonstrate how both locative media data and computing experiments are ‘staged’ to create new values and computing techniques, which in turn are used to try and derive possible futures that are ridden with unintended consequences. We argue that using computing experiments to imagine potential urban futures produces effects that often have little to do with creating new urban practices. Instead, these experiments promote big data science and the prospect that data produced for one purpose can be recast for another, and act as alternative mechanisms of envisioning urban futures.

Keywords: Data analytics, computing experiments, locative media, location-based social network (LBSN), staging, urban future, critical data studies

The paper is available for download here.

Data and the City workshop, Session 4 videos

We’re back again with another set of papers from The Programmable City’s Data and the City workshop. These videos formed a wonderful opening session for the second day of the event.

Data Models and the City

Service Oriented Design and Polyglot Binding for Efficient Sharing and Analysing of Data in Cities

Pouria Amirian, Big Data Project Manager and Data Science Research Associate, University of Oxford

Abstract
Nowadays, successful and efficient management of a city depends on how data are collected, shared and transferred within and between various organizations in the city, and on how data analytics are used to extract actionable insights for decision making. Since each organization uses different platforms, operating systems and software for these tasks, data sharing mechanisms should be provided as platform-independent services. These platform-independent services can then be utilized by various users for different purposes: for example, for research by universities, for business purposes by industry and commercial companies, for improving existing services by the city council and related organizations, and even for facilitating communication between people and policy makers. Platform independence is a necessary quality of services for providing interoperability from a technical point of view. Interoperability at various levels is an important requirement and vision for public services, and it is well defined in initiatives such as the European Interoperability Framework (EIF) and many national interoperability frameworks. Based on these frameworks, the exchange of data is an ultimate enabler for sharing information and knowledge between organizations.

In addition to platform independence, in order to make the services as resourceful as possible, they need to be designed based on certain principles. The principles for designing services depend on the type of applications and users of those services. This paper first describes the concept of service orientation and then explains three different approaches for sharing data and analysis in a city. Finally, the paper suggests an architecture (Organizational Service Layer) to implement polyglot binding for flexible, scalable and interoperable implementation of services in a city.
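As a rough illustration of the general idea of exposing the same data through more than one binding, the following Python sketch serves one record as either JSON or XML depending on the requested media type. This is only one reading of "polyglot binding" and says nothing about the Organizational Service Layer itself, whose design is not described in the abstract; the function names, the record fields and the choice of formats are all assumptions.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def to_json(record: Dict[str, object]) -> str:
    """Encode a flat record as JSON with keys in sorted order."""
    return json.dumps(record, sort_keys=True)


def to_xml(record: Dict[str, object]) -> str:
    """Encode a flat record as a simple XML element with one child per key."""
    root = ET.Element("record")
    for key in sorted(record):
        child = ET.SubElement(root, key)
        child.text = str(record[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical registry mapping a media type to an encoder (one per binding).
BINDINGS: Dict[str, Callable[[Dict[str, object]], str]] = {
    "application/json": to_json,
    "application/xml": to_xml,
}


def serve(record: Dict[str, object], media_type: str) -> str:
    """Return the record in the representation the client asked for."""
    try:
        encode = BINDINGS[media_type]
    except KeyError:
        raise ValueError(f"unsupported binding: {media_type}") from None
    return encode(record)


# Hypothetical record, e.g. a passenger count at a transit stop.
reading = {"station": "A", "count": 3}
print(serve(reading, "application/json"))
print(serve(reading, "application/xml"))
```

Keeping the encoders in a registry means a new consumer format can be added without touching the data source, which is the kind of flexibility the abstract attributes to platform-independent services.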

Data About Cities: Redefining Big, Recasting Small

Michael Batty, Professor, Centre for Advanced Spatial Analysis (CASA), University College London

Abstract
In this paper, we argue that the development of data with respect to its use in understanding and planning cities is intimately bound up with the development of methods for manipulating such data, in particular digital computation. We argue that although data volumes have dramatically increased as has their variety in urban contexts largely due to the development of micro devices that enable all kinds of human and physical phenomena to be sensed in real time, big data is not peculiar to contemporary times. It essentially goes back to basic notions of how we deal with relationships and functions in cities that relate to interactions. Big data is thus generated by concatenating smaller data sets and in particular if we change our focus from locations to interactions and flows, then data has faced the challenges of bigness for many years. This should make us more careful about defining what is ‘big data’ and to illustrate these points, we first look at traditional interaction patterns – flows of traffic in cities and show some of the problems of searching for pattern in such data. We then augment this discussion of big data by examining much more routine travel data which is sensed from using smart cards for fare-charging and relating this to questions of matching demand and supply in the context of understanding the routine operation of transit. This gives us some sense of the variety of big data and the challenges that are increasingly necessary in dealing with this kind of data in the face of advances in digital computation.
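The point about shifting focus from locations to interactions can be shown with a little arithmetic: n locations give rise to up to n(n-1) directed origin-destination flows, so even a modest set of zones produces a large interaction dataset. The Python sketch below is an illustration under that assumption, not something taken from the paper; the zone labels and trips are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_possible_flows(n_locations: int, time_slices: int = 1) -> int:
    """Number of directed origin-destination pairs (excluding self-flows),
    multiplied by the number of time slices observed."""
    if n_locations < 0 or time_slices < 0:
        raise ValueError("counts must be non-negative")
    return n_locations * (n_locations - 1) * time_slices


def od_matrix(trips: Iterable[Tuple[str, str]]) -> Counter:
    """Aggregate individual trips into a sparse origin-destination count table."""
    return Counter(trips)


# 1,000 zones observed hourly for one day already implies ~24 million cells.
print(count_possible_flows(1000, time_slices=24))

# Hypothetical smart-card style trips as (origin, destination) pairs.
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]
flows = od_matrix(trips)
print(flows[("A", "B")])
```

The quadratic growth in possible flows is why interaction data could be "big" long before real-time sensing, even when the underlying set of locations is small.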

Putting Out Data Fires; life with the OpenStreetMap DWG

Jo Walsh, Registers of Scotland

Abstract
OpenStreetMap is a collaborative map of the world, being made on a voluntary basis, and the Data Working Group is its dispute resolution service. Edit wars and tagging conflicts are not frequent, and are often dealt with on a community basis, but when they escalate unbearably, someone calls in the DWG. The DWG operates simultaneously as a kind of police force and as the social work arm of the voluntary fire service for OpenStreetMap. I have had the honour of serving on the DWG since November 2014, and will discuss how consideration of several cases of active conflict in different cities worldwide sheds some light on the different forces at work in putting together a collaborative map, and the ways in which people are personally affected. The tone of the paper will owe a little to Bruno Latour’s classic infrastructure detective story, “Aramis”.

New paper: The diverse nature of big data

Rob Kitchin and Gavin McArdle have published a new paper entitled ‘The diverse nature of big data’, available as Programmable City Working Paper 15 on SSRN.

Abstract:  Big data has been variously defined in the literature. In the main, definitions suggest that big data are those that possess a suite of key traits: volume, velocity and variety (the 3Vs), but also exhaustivity, resolution, indexicality, relationality, extensionality and scalability. However, these definitions lack ontological clarity, with the term acting as an amorphous, catch-all label for a wide selection of data. In this paper, we consider the question ‘what makes big data, big data?’, applying Kitchin’s (2013, 2014) taxonomy of seven big data traits to 26 datasets drawn from seven domains, each of which is considered in the literature to constitute big data. The results demonstrate that only a handful of datasets possess all seven traits, and some do not possess either volume and/or variety. Instead, there are multiple forms of big data. Our analysis reveals that the key definitional boundary markers are the traits of velocity and exhaustivity. We contend that big data as an analytical category needs to be unpacked, with the genus of big data further delineated and its various species identified. It is only through such ontological work that we will gain conceptual clarity about what constitutes big data, formulate how best to make sense of it, and identify how it might be best used to make sense of the world.

Key words: big data, ontology, taxonomy, types, characteristics

Download paper

Data and the City workshop, Session 1 videos

Thank you to everyone who attended our 2015 workshop Data and the City in early September. It was a fantastic event. Over the next few days we will make the video recordings of the presentations available online.

Here is the introduction to the workshop and the first session, Critically Framing Data.

Opening talk

Data-driven, networked urbanism

Rob Kitchin, NIRSA, Maynooth University

Abstract
For as long as data have been generated about cities various kinds of data-informed urbanism have been occurring. In this paper, I argue that a new era is presently unfolding wherein data-informed urbanism is increasingly being complemented and replaced by data-driven, networked urbanism. Cities are becoming ever more instrumented and networked, their systems interlinked and integrated, and vast troves of big urban data are being generated and used to manage and control urban life in real-time. Data-driven, networked urbanism, I contend, is the key mode of production for what have widely been termed smart cities. In this paper I provide a critical overview of data-driven, networked urbanism and smart cities focusing in particular on the relationship between data and the city (rather than network infrastructure or computational or urban issues), and critically examine a number of urban data issues including: the politics of urban data; data ownership, data control, data coverage and access; data security and data integrity; data protection and privacy, dataveillance, and data uses such as social sorting and anticipatory governance; and technical data issues such as data quality, veracity of data models and data analytics, and data integration and interoperability. I conclude that whilst data-driven, networked urbanism purports to produce a commonsensical, pragmatic, neutral, apolitical, evidence-based form of responsive urban governance, it is nonetheless selective, crafted, flawed, normative and politically-inflected. Consequently, whilst data-driven, networked urbanism provides a set of solutions for urban problems, it does so within limitations and in the service of particular interests.

Session 1: Critically Framing Data

Provenance and Possibility: Critically Framing Data

Jim Thatcher, Assistant Professor, Division of Urban Studies, University of Washington – Tacoma

Abstract
‘Big data’s’ boosters present a mythology wherein it is perpetually new, pushing ever-forwards towards bigger and better representations of the world. Similarly, the vision of the “smart city” is inevitably an ahistorical imaginary of data-intense urban planning and coordination. Data sources, actually existing data, appear in the literature as uncritical, pre-existing, and decontextualized representations of the world to be exploited in service to a techno-utopian urban. The sources of data, its provenance, recede into a technical issue: one in a litany of hurdles to be overcome through austere, computational methodologies. Such technical approaches to provenance efface the intentionality of data creators. They leave out the inscription of meaning that goes into data objects as socio-technical, emergent indicators at urban scales and instead seat them as objective reality. Using a critical data studies approach, this paper connects and contextualizes the conditions of data’s production with its potential uses in the world. Data provenance, the where, how, and from whom data is produced, is intrinsically linked to how it comes to (re)present the world. This exploration serves as the first step in developing a schema that includes mobile devices, municipal services, and other categories and ties that to the historical ideologies through which these technologies emerged in urban contexts.

Where are data citizens?

Evelyn Ruppert, Professor, Department of Sociology, Goldsmiths, University of London

Abstract
If we increasingly know, experience and enact cities through data then we need to understand who are the subjects of that data and the space of relations they occupy. The development of the Internet of Things (IoT) means phones, watches, dishwashers, fridges, cars, and many other devices are always already connected to the Internet and generating enormous volumes of data about movements, locations, activities, interests, encounters, and private and public relationships. It also means that conduct is being governed through myriad arrangements and conventions of the Internet. What does this mean for how data subjects become data citizens? If indeed through the act of making claims data subjects become citizens how do we understand the spaces of this becoming? Challenging a separation between ‘real’ space and ‘virtual’ space, I define cyberspace as a space of social struggles: a space of transactions and interactions between and among bodies acting through the Internet. How these struggles are part-and-parcel of the constitution of the programmable city is the critical framing that I take up in this paper.

Unfortunately no video recording of this paper was made. Audio should be made available shortly.

Data cultures, power and the city

Jo Bates, Lecturer in Information Politics and Policy, Information School, University of Sheffield

Abstract
How might we come to know a city through data? As citizens, policy makers, academics and businesses turn increasingly to data analytics in an effort to gain insight into and manage cities, this paper argues that rather than seeking to find the truth of cities in their data, we might better illuminate the flows of power and influence in the contemporary urban environment through close critical examination of these emerging, intersecting local data cultures and practices.

What value and relevance do local data cultures see in emergent data practices? How do they come to influence and shape them? What are their hopes, aspirations, concerns and fears? What tensions and struggles are emerging? Where and how do these local practices intersect with one another? How are they embedded within and responding to developments in national and transnational data practices, infrastructures and flows?

Through focusing on the complex and contested “assemblages” of political, economic, social and cultural processes that data production and flow are embedded in, and recognising local data practices as specific articulations of social relations situated within time and space, what can we learn about our cities and how they are situated within the global flows of capital and power in the early 21st century?

This paper will begin to address these questions using illustrative examples drawn from empirical research findings.

The remainder of the videos will be released on our website in coming weeks. If you just can’t wait, you can check them out now on our Vimeo account.