Rob Kitchin’s paper ‘Big data, new epistemologies and paradigm shifts’ has been published in the first issue of a new journal, Big Data and Society, published by Sage. The paper is open access and can be downloaded by clicking here. A video abstract is below. The paper is also accompanied by a blog post ‘Is big data going to radically transform how knowledge is produced across all disciplines of the academy?’ on the Big Data and Society blog.
Abstract
This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.
Rob Kitchin presented a Bellwether Lecture at the Oxford Internet Institute on February 28th entitled ‘The Real Time City? Big Data and Smart Urbanism’. The OII has just uploaded the webcast of the full lecture. The written version of the paper was recently published in GeoJournal (visit GeoJournal website or Download). Kitchin, R. (2014) The real-time city? Big data and smart urbanism. GeoJournal 79(1): 1-14.
An extended version of a Programmable City working paper, with two new sections, has been published in GeoJournal (visit GeoJournal website or Download).
Kitchin, R. (2014) The real-time city? Big data and smart urbanism. GeoJournal 79(1): 1-14.
‘Smart cities’ is a term that has gained traction in academia, business and government to describe cities that, on the one hand, are increasingly composed of and monitored by pervasive and ubiquitous computing and, on the other, whose economy and governance are being driven by innovation, creativity and entrepreneurship, enacted by smart people. This paper focuses on the former and, drawing on a number of examples, details how cities are being instrumented with digital devices and infrastructure that produce ‘big data’. Such data, smart city advocates argue, enable real-time analysis of city life and new modes of urban governance, and provide the raw material for envisioning and enacting more efficient, sustainable, competitive, productive, open and transparent cities. The final section of the paper provides a critical reflection on the implications of big data and smart urbanism, examining five emerging concerns: the politics of big urban data; technocratic governance and city development; corporatisation of city governance and technological lock-ins; buggy, brittle and hackable cities; and the panoptic city.
Below is the first draft of a 1,000-word entry on Big Data by Rob Kitchin for the forthcoming International Encyclopedia of Geography, to be published by Wiley-Blackwell and the Association of American Geographers. It sets out a definition of big data, how they are produced and analyzed, and some of their pros and cons. Feedback on the content would be welcome.
Abstract
Big data consist of huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis, much of which are spatially and temporally referenced. The generation, processing and storing of such data have been enabled by advances in computing and networking. Insight and value can be extracted from them using new data analytics, including forms of machine learning and visualisation. Proponents contend that big data are reshaping how knowledge is produced, business conducted, and governance enacted. However, there are concerns regarding access to big data, their quality and veracity, and the ethical implications of their use with respect to dataveillance, social sorting, security, and control creep.
Keywords: big data, analytics, visualisation, ethics
Defining big data
The etymology of ‘big data’ can be traced to the mid-1990s, when the term was first used to refer to the handling and analysis of massive datasets (Diebold 2012). It is only since 2008, however, that the term has gained traction, becoming a business and industry buzzword. Like many rapidly emerging concepts, big data has been variously defined, but most commentators agree that it differs from what might be termed ‘small data’ with respect to its traits of volume, velocity and variety (Zikopoulos et al., 2012). Traditionally, data have been produced in tightly controlled ways using sampling techniques that limit their scope, temporality and size. Even very large datasets, such as national censuses, have generally been restricted to 30-40 questions and are carried out only once every ten years. Advances in computing hardware, software and networking have, however, enabled much wider scope for producing, processing, analyzing and storing massive amounts of diverse data on a continuous basis. Moreover, big data generation strives to be: exhaustive, capturing entire populations or systems (n=all); fine-grained in resolution and uniquely indexical in identification; relational in nature, containing common fields that enable the conjoining of different data sets; and flexible, holding the traits of extensionality (new fields can be added easily) and scalability (the data can expand in size rapidly) (boyd and Crawford 2012; Kitchin 2013; Marz and Warren 2012; Mayer-Schonberger and Cukier 2013). Big data thus comprise huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis. For example, in 2012 Wal-Mart was generating more than 2.5 petabytes (2⁵⁰ bytes) of data relating to more than 1 million customer transactions every hour (Open Data Center Alliance 2012), and Facebook was processing 2.5 billion pieces of content (links, comments, etc.), 2.7 billion ‘Like’ actions and 300 million photo uploads per day (Constine 2012). Such big data, their proponents argue, enable new forms of knowledge that produce disruptive innovations with respect to how business is conducted and governance enacted. Given that much big data are georeferenced, they hold much promise for new kinds of spatial analysis and modelling.
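As a minimal sketch of the relational and extensional traits described above, the following hypothetical Python example uses pandas; the datasets, field names (customer_id, postcode, etc.) and values are invented for illustration and are not drawn from the entry itself.

```python
import pandas as pd

# Hypothetical retail transactions: uniquely indexical (customer_id),
# fine-grained (timestamped items) and produced on a continuous basis.
transactions = pd.DataFrame({
    "customer_id": [101, 102, 101, 103],
    "timestamp": pd.to_datetime([
        "2014-03-01 09:15", "2014-03-01 09:16",
        "2014-03-01 09:20", "2014-03-01 09:21"]),
    "item": ["milk", "bread", "coffee", "milk"],
    "price": [0.89, 1.20, 2.50, 0.89],
})

# A second, independently produced dataset sharing a common field.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "postcode": ["D01", "D02", "W23"],
})

# Relationality: the common field allows the two datasets to be conjoined.
linked = transactions.merge(customers, on="customer_id", how="left")

# Extensionality: new fields can be added easily as new data become available.
linked["weekday"] = linked["timestamp"].dt.day_name()

print(linked)
```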
Sources of big data
Big data are produced in three broad ways: through directed, automated and volunteered systems (Kitchin 2013). Directed systems are controlled by a human operator and include CCTV, spatial video and LiDAR scans. Automated systems capture data automatically as an inherent function of the technology and include: the recording of retail purchases at the point of sale; transactions and interactions across digital networks (e.g., sending emails, internet banking); the use of digital devices such as mobile phones that record and communicate the history of their own utilisation; clickstream data that record navigation through a website or app; measurements from sensors embedded in objects or environments; the scanning of machine-readable objects such as transponders and barcodes; and machine-to-machine interactions across the internet. Volunteered systems rely on users to gift data through uploads and interactions and include engaging in social media (e.g., posting comments, observations and photos to social networking sites such as Facebook) and the crowdsourcing of data, wherein users generate data and then contribute them to a common platform (e.g., uploading GPS traces to OpenStreetMap).
Analyzing big data
Given their volume, variety and velocity, big data present significant analytical challenges to which traditional methods, designed to extract insights from scarce and static data, are not well suited. The solution has been the development of a new suite of data analytics rooted in research on artificial intelligence and expert systems, and of new forms of data visualisation and visual analytics, both of which rely on high-powered computing. Data analytics employ machine learning, in which computer algorithms iteratively evolve an understanding of a dataset, automatically recognizing complex patterns, constructing models that explain and predict those patterns, and optimizing outcomes (Han et al. 2011). Moreover, since different approaches have their strengths and weaknesses depending on the type of problem and data, an ensemble approach can be employed that builds multiple solutions using a variety of techniques to model and predict the same phenomena. As such, it becomes possible to apply hundreds of different algorithms to a dataset to ensure that the most illuminating insights are produced. Given the enormous volumes and velocity of big data, visualisation has proven a popular way both of making sense of data and of communicating that sense. Visualisation methods seek to reveal the structure, patterns and trends of variables and their interconnections. Tens of thousands of data points can be plotted to reveal a structure that is otherwise hidden (e.g., mapping trends across millions of tweets to see how they vary across people and places), or the real-time dynamics of a phenomenon can be monitored using graphic and spatial interfaces (e.g., the flow of traffic across a city).
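To make the ensemble idea concrete, here is a minimal, hypothetical sketch in Python using scikit-learn (a library not discussed in the entry); the data are synthetic and the choice of constituent models is arbitrary, but it shows how several techniques can be applied to the same problem and their predictions combined.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a large behavioural dataset.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Three different techniques applied to the same prediction problem.
models = [
    ("logistic", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
]

# An ensemble combines their predictions by majority vote.
ensemble = VotingClassifier(estimators=models, voting="hard")

for name, model in models + [("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))
```

In practice an ensemble might combine many more models and far larger datasets, with visual analytics then used to explore the resulting predictions; the sketch only illustrates the principle.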
Pros and cons of big data
The hype surrounding big data exists for good reason. Big data offer the possibility of shifting from data-scarce to data-rich studies of all aspects of the world; from narrow to exhaustive samples; from static snapshots to dynamic vistas; from coarse aggregations to high resolutions; and from relatively simple models to complex, sophisticated simulations and predictions (Kitchin 2013). Moreover, big data consist of both qualitative and quantitative data, most of which are spatially and temporally referenced. Big data provide greater breadth, depth, scale and timeliness, and are inherently longitudinal in nature, enabling researchers to gain greater insight into a variety of systems. For businesses and government, such data hold the promise of increased productivity, competitiveness, efficiency, effectiveness, utility, sustainability and securitisation, and the potential to better manage organisations, leverage value and produce capital, govern people, and create better places (Kitchin 2014).
Big data are not without negative issues, however. For example, most big data are generated by private corporations such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms, none of which are under any obligation to share freely the data they generate. As such, access to such data is at present limited. There are also concerns with respect to how clean (error and gap free), objective (bias free) and consistent (containing few discrepancies) the data are, as well as their veracity: the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to. Further, big data raise a number of ethical questions concerning the extent to which they facilitate dataveillance (surveillance through data records), infringe on privacy and other human rights, enable social sorting (providing differential treatment with regard to services), pose security concerns with regard to identity theft, and enable control creep, wherein data generated for one purpose are used for another (Kitchin 2014).
References
boyd, D. and Crawford, K. (2012) Critical questions for big data. Information, Communication and Society 15(5): 662-679
Constine, J. (2012) How Big Is Facebook’s Data? 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day, 22 August 2012, http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/ (last accessed 28 January 2013)
Diebold, F. (2012) A personal perspective on the origin(s) and development of ‘big data’: The phenomenon, the term, and the discipline. http://www.ssc.upenn.edu/~fdiebold/papers/paper112/Diebold_Big_Data.pdf (last accessed 5th February 2013)
Han, J., Kamber, M. and Pei, J. (2011) Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann, Waltham, MA.
Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262-267
Kitchin, R. (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. Sage, London.
Marz, N. and Warren, J. (2012) Big Data: Principles and Best Practices of Scalable Realtime Data Systems. MEAP edition. Manning, Westhampton, NJ.
Mayer-Schonberger, V. and Cukier, K. (2013) Big Data: A Revolution that will Change How We Live, Work and Think. John Murray, London.
Open Data Center Alliance (2012) Big Data Consumer Guide. Open Data Center Alliance. http://www.opendatacenteralliance.org/docs/Big_Data_Consumer_Guide_Rev1.0.pdf (last accessed 11 February 2013)
Zikopoulos, P.C., Eaton, C., deRoos, D., Deutsch, T. and Lapis, G. (2012) Understanding Big Data. McGraw Hill, New York.
The first Programmable City Working Paper has been published on SSRN, written by Rob Kitchin and Tracey P. Lauriault, and concerns the relationship between small and big data, the scaling-up of small data into data infrastructures, and how to conceptualize and make sense of such infrastructures.
Small data, data infrastructures and big data
Abstract
The production of academic knowledge has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions. It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds. This approach is presently being challenged by the development of big data. Small data studies will, however, continue to be important in the future because of their utility in answering targeted queries. Nevertheless, small data are being made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and re-use, and open them up to combination with big data and analysis using big data analytics. This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples. The final section provides a framework for conceptualizing and making sense of data and data infrastructures.
Key words: big data, small data, data infrastructures, data politics, spatial data infrastructures, cyber-infrastructures, epistemology
3. Experiences as a producer, consumer and observer of open data (Slides, Bio), by Dr Peter Mooney, Environmental Protection Agency (EPA)-funded Research Fellow at the Department of Computer Science, NUIM