Monthly Archives: January 2014

Podcast: Experiences as a producer, consumer and observer of open data

Talk presented by Dr Peter Mooney (Environmental Protection Agency (EPA)-funded Research Fellow at the Department of Computer Science, NUIM) at the first Programmable City Seminar.

Open data and evidence-informed decision making
Wednesday, November 13, 16:00-18:00

Speakers discuss making data accessible to the public and the challenges they face in their multiple roles as citizens, public servants and researchers.

Topics include: citizen science, civic engagement, open data portals, apps, hackathons, and crowdsourcing versus authoritative processes

Podcast: Open Government Data: The Fingal Story

Talk presented by Dominic Byrne (Head of Information Technology with Fingal County Council) at the first Programmable City Seminar.

Open data and evidence-informed decision making
Wednesday, November 13, 16:00-18:00

Speakers discuss making data accessible to the public and the challenges they face in their multiple roles as citizens, public servants and researchers.

Topics include: citizen science, civic engagement, open data portals, apps, hackathons, and crowdsourcing versus authoritative processes

Podcast: An Open Data Story

Talk presented by Dr Tracey P. Lauriault (Programmable City Project, NUIM) at the first Programmable City Seminar.

Open data and evidence-informed decision making
Wednesday, November 13, 16:00-18:00

Speakers discuss making data accessible to the public and the challenges they face in their multiple roles as citizens, public servants and researchers.

Topics include: citizen science, civic engagement, open data portals, apps, hackathons, and crowdsourcing versus authoritative processes

Seminar 3: Sustainable Connected Cities and the London Living Labs Project

All! Seminar 2, Coding Play/Crafting Code in the City, takes place this Wednesday at 16:00 in the John Hume Board Room, and already we are getting ready for Seminar 3.

The Programmable City Project is happy to welcome Dr David Prendergast who will discuss Sustainable Connected Cities and the London Living Labs Project.

Time: 16:00-18:00, Wednesday, 19 February 2014

Venue: Room 2.31, 2nd Floor, Iontas Building, North Campus, NUI Maynooth

Abstract: Cities offer many opportunities to innovate with technologies, from the infrastructures that underlie the sewers to computing in the cloud. How, though, can we integrate the technological, economic and social needs of cities in ways that are sustainable and human-centred? How do we inform, develop and evaluate systems and services that enhance the quality of city life for diverse publics? This talk discusses the approach taken by the Intel Collaborative Research Institute for Sustainable Connected Cities and provides an overview of key projects, including the ambitious London Living Labs programme conducted in association with the UK Future Cities Catapult.

Bio: Dr David Prendergast is a social anthropologist and a Principal Investigator in the Intel Collaborative Research Institute for Sustainable Connected Cities with Imperial College and University College London. He also holds the position of Visiting Professor of Healthcare Innovation at Trinity College Dublin. His research over the last fifteen years has focused on later life-course transitions, and he has authored a number of books and articles on ageing, health, technology and social relationships. During his career David has been involved in several major research projects, including: a multi-year ethnography of intergenerational relationships and family change in South Korea; the provision of paid home care services in Ireland; a three-year ESRC study into death, dying and bereavement in England and Scotland; and Intel’s Global Ageing Project, which explored the expectations and experiences of growing older around the world. After receiving his PhD from Cambridge University, Dr Prendergast held research posts at the University of Sheffield and Trinity College Dublin.

Big data: draft of encyclopedia entry

Below is the first draft of a 1,000-word entry on Big Data by Rob Kitchin for the forthcoming International Encyclopedia of Geography, to be published by Wiley-Blackwell and the Association of American Geographers. It sets out a definition of big data, how they are produced and analyzed, and some of their pros and cons. Feedback on the content would be welcome.

Abstract

Big data consist of huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis, much of which are spatially and temporally referenced.  The generation, processing and storing of such data have been enabled by advances in computing and networking.  Insight and value can be extracted from them using new data analytics, including forms of machine learning and visualisation.  Proponents contend that big data are reshaping how knowledge is produced, business conducted, and governance enacted.  However, there are concerns regarding access to big data, their quality and veracity, and the ethical implications of their use with respect to dataveillance, social sorting, security, and control creep.

Keywords: big data, analytics, visualisation, ethics

Defining big data

The etymology of ‘big data’ can be traced to the mid-1990s, when it was first used to refer to the handling and analysis of massive datasets (Diebold 2012). It is only since 2008, however, that the term has gained traction, becoming a business and industry buzzword. Like many rapidly emerging concepts, big data has been variously defined, but most commentators agree that it differs from what might be termed ‘small data’ with respect to its traits of volume, velocity and variety (Zikopoulos et al., 2012). Traditionally, data have been produced in tightly controlled ways using sampling techniques that limit their scope, temporality and size. Even very large datasets, such as national censuses, have generally been restricted to 30-40 questions and are carried out only once every ten years. Advances in computing hardware, software and networking have, however, enabled much wider scope for producing, processing, analyzing and storing massive amounts of diverse data on a continuous basis. Moreover, big data generation strives to be: exhaustive, capturing entire populations or systems (n=all); fine-grained in resolution and uniquely indexical in identification; relational in nature, containing common fields that enable the conjoining of different datasets; and flexible, holding the traits of extensionality (new fields can be added easily) and scalability (they can expand in size rapidly) (boyd and Crawford 2012; Kitchin 2013; Marz and Warren 2012; Mayer-Schönberger and Cukier 2013). Big data thus comprise huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis. For example, in 2012 Wal-Mart was generating more than 2.5 petabytes (2⁵⁰ bytes) of data relating to more than 1 million customer transactions every hour (Open Data Center Alliance 2012), and Facebook was processing 2.5 billion pieces of content (links, comments, etc.), 2.7 billion ‘Like’ actions and 300 million photo uploads per day (Constine 2012). Such big data, their proponents argue, enable new forms of knowledge that produce disruptive innovations with respect to how business is conducted and governance enacted. Given that much big data are georeferenced, they hold much promise for new kinds of spatial analysis and modelling.
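
To make the relational and extensional traits described above concrete, here is a minimal Python sketch (using the pandas library; the datasets and field names are invented purely for illustration) that conjoins two datasets on a common unique identifier and then extends the result with a new field:

```python
import pandas as pd

# Two hypothetical fine-grained datasets sharing a unique identifier.
transactions = pd.DataFrame({
    "customer_id": [101, 102, 101],
    "amount": [25.50, 9.99, 112.00],
    "timestamp": pd.to_datetime(
        ["2014-01-05 09:12", "2014-01-05 09:14", "2014-01-06 17:40"]
    ),
})
stores = pd.DataFrame({
    "customer_id": [101, 102],
    "store_lat": [53.38, 53.35],
    "store_lon": [-6.60, -6.26],
})

# Relationality: a common field enables the conjoining of different datasets.
linked = transactions.merge(stores, on="customer_id")

# Extensionality: new fields can be added easily to the linked data.
linked["weekday"] = linked["timestamp"].dt.day_name()
print(linked)
```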

Sources of big data

Big data are produced in three broad ways: through directed, automated and volunteered systems (Kitchin 2013). Directed systems are controlled by a human operator and include CCTV, spatial video and LiDAR scans. Automated systems capture data automatically as an inherent function of the technology and include: the recording of retail purchases at the point of sale; transactions and interactions across digital networks (e.g., sending emails, internet banking); the use of digital devices, such as mobile phones, that record and communicate the history of their own utilisation; clickstream data that record navigation through a website or app; measurements from sensors embedded in objects or environments; the scanning of machine-readable objects such as transponders and barcodes; and machine-to-machine interactions across the internet. Volunteered systems rely on users to gift data through uploads and interactions, and include engaging in social media (e.g., posting comments, observations and photos to social networking sites such as Facebook) and the crowdsourcing of data, wherein users generate data and then contribute them to a common platform (e.g., uploading GPS traces to OpenStreetMap).
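
As a minimal sketch of the automated category, a clickstream record of the kind described above might be captured as a timestamped, uniquely indexed event; the schema below is an assumption for illustration, not that of any particular system:

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class ClickEvent:
    """One automatically captured clickstream record."""
    session_id: str  # identifies the visit
    url: str         # page or app screen navigated to
    referrer: str    # where the user came from
    # Captured as an inherent function of the technology, with no
    # deliberate action by the user beyond browsing itself.
    timestamp: float = field(default_factory=time.time)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique index

# Every navigation generates a record as a side effect of use.
event = ClickEvent(session_id="abc123", url="/products/42", referrer="/search?q=lamp")
print(event)
```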

Analyzing big data

Given their volume, variety and velocity, big data present significant analytical challenges to which traditional methods, designed to extract insights from scarce and static data, are not well suited. The solution has been the development of a new suite of data analytics rooted in research on artificial intelligence and expert systems, and of new forms of data visualisation and visual analytics, both of which rely on high-powered computing. Data analytics seek to produce machine learning that iteratively evolves an understanding of datasets using computer algorithms, automatically recognizing complex patterns, constructing models that explain and predict such patterns, and optimizing outcomes (Han et al. 2011). Moreover, since different approaches have strengths and weaknesses depending on the type of problem and data, an ensemble approach can be employed that builds multiple solutions using a variety of techniques to model and predict the same phenomenon. As such, it becomes possible to apply hundreds of different algorithms to a dataset to ensure that the most illuminating insights are produced. Given the enormous volumes and velocity of big data, visualisation has proven a popular way both of making sense of data and of communicating that sense. Visualisation methods seek to reveal the structure, patterns and trends of variables and their interconnections. Tens of thousands of data points can be plotted to reveal a structure that is otherwise hidden (e.g., mapping trends across millions of tweets to see how they vary across people and places), or the real-time dynamics of a phenomenon can be monitored using graphic and spatial interfaces (e.g., the flow of traffic across a city).
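
As a hypothetical illustration of the ensemble approach described above, the following Python sketch (using scikit-learn and synthetic data, neither of which the entry itself mentions) builds models of the same phenomenon with three different techniques and combines their predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data standing in for a large behavioural dataset.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ensemble: model the same phenomenon with several different techniques,
# since each has its own strengths and weaknesses...
ensemble = VotingClassifier(estimators=[
    ("logit", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("bayes", GaussianNB()),
])

# ...and combine their predictions by majority vote.
ensemble.fit(X_train, y_train)
print("held-out accuracy:", ensemble.score(X_test, y_test))
```

In practice far more algorithms and far larger datasets would be involved, but the principle of pooling diverse models of the same phenomenon is the same.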

Pros and cons of big data

The hype surrounding big data exists for good reason. Big data offer the possibility of shifting from data-scarce to data-rich studies of all aspects of the world; from narrow to exhaustive samples; from static snapshots to dynamic vistas; from coarse aggregations to high resolutions; and from relatively simple models to complex, sophisticated simulations and predictions (Kitchin 2013). Moreover, big data consist of both qualitative and quantitative data, most of which are spatially and temporally referenced. Big data provide greater breadth, depth, scale and timeliness, and are inherently longitudinal in nature. They enable researchers to gain greater insights into various systems. For businesses and government, such data hold the promise of increased productivity, competitiveness, efficiency, effectiveness, utility, sustainability and securitisation, and the potential to better manage organisations, leverage value and produce capital, govern people, and create better places (Kitchin 2014).

Big data are not without negative issues, however. For example, most big data are generated by private corporations such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms, none of which are under any obligation to share freely the data they generate. As such, access to such data is at present limited. There are also concerns with respect to how clean (error and gap free), objective (bias free) and consistent (containing few discrepancies) the data are, as well as their veracity: the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to. Further, big data raise a number of ethical questions concerning the extent to which they facilitate dataveillance (surveillance through data records), infringe on privacy and other human rights, enable social sorting (the provision of differential treatment and services), pose security concerns with regard to identity theft, and enable control creep, wherein data generated for one purpose are used for another (Kitchin 2014).

References

boyd, D. and Crawford, K. (2012) Critical questions for big data.  Information, Communication and Society 15(5): 662-679

Constine, J. (2012) How Big Is Facebook’s Data? 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day, 22 August 2012, http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/ (last accessed 28 January 2013)

Diebold, F. (2012) A personal perspective on the origin(s) and development of ‘big data’: The phenomenon, the term, and the discipline. http://www.ssc.upenn.edu/~fdiebold/papers/paper112/Diebold_Big_Data.pdf (last accessed 5 February 2013)

Han, J., Kamber, M. and Pei, J. (2011) Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann, Waltham, MA.

Kitchin, R. (2013) Big data and human geography: Opportunities, challenges and risks. Dialogues in Human Geography 3(3): 262-267

Kitchin, R. (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences.  Sage, London.

Marz, N. and Warren, J. (2012) Big Data: Principles and Best Practices of Scalable Realtime Data Systems. MEAP edition. Manning, Westhampton, NJ.

Mayer-Schönberger, V. and Cukier, K. (2013) Big Data: A Revolution that will Change How We Live, Work and Think. John Murray, London.

Open Data Center Alliance (2012) Big Data Consumer Guide.  Open Data Center Alliance. http://www.opendatacenteralliance.org/docs/Big_Data_Consumer_Guide_Rev1.0.pdf (last accessed 11 February 2013)

Zikopoulos, P.C., Eaton, C., deRoos, D., Deutsch, T. and Lapis, G. (2012) Understanding Big Data. McGraw-Hill, New York.

New paper: Small data, data infrastructures and big data

The first Programmable City Working Paper has been published on SSRN, written by Rob Kitchin and Tracey P. Lauriault, and concerns the relationship between small and big data, the scaling-up of small data into data infrastructures, and how to conceptualize and make sense of such infrastructures.

Small data, data infrastructures and big data

Abstract
The production of academic knowledge has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions.  It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds.  This approach is presently being challenged by the development of big data.  Small data studies will, however, continue to be important in the future because of their utility in answering targeted queries.  Nevertheless, small data are being made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and re-use, and open them up to combination with big data and analysis using big data analytics.  This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples.  The final section provides a framework for conceptualizing and making sense of data and data infrastructures.

Key words: big data, small data, data infrastructures, data politics, spatial data infrastructures, cyber-infrastructures, epistemology

Download the paper