Tag Archives: big data

Seminar 4: Citizens, Data, Virtual Reality and the Internet of Things – Revisiting the City

Hi everyone,

For our next seminar, we have invited Dr Andy Hudson-Smith to discuss Citizens, Data, Virtual Reality and the Internet of Things!

Time: 16:00 – 18:00, Wednesday, 2 April, 2014
Venue: Room 2.31, 2nd Floor Iontas Building, North Campus NUI Maynooth (Map)

Abstract
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few (IBM, 2013). Compared to traditional data sources, this data can be defined as ‘big’. Cities and urban environments are the main sources of big data: every minute, 100,000 tweets are sent globally, Google receives 2,000,000 search requests, and users share 684,478 pieces of content on Facebook (Mashable, 2012). An increasing amount of this data stream is geolocated, from check-ins via Foursquare through to tweets and searches via Google Now. The data that cities and individuals emit can be collected and viewed to make the data city visible, aiding our understanding not only of how urban systems operate but also opening up the possibility of a real-time view of the city at large (Hudson-Smith, 2013). The talk explores systems such as The City Dashboard (http://www.citydashboard.org) and the rise of the Internet of Things (IoT) in terms of data collection, visualization and analysis. Joining these up marks a move towards the Smart City, and innovations in IoT and augmented reality point towards the creation of a ‘Smart Citizen’ and ‘the Quantified Self’, and ultimately a Smart City.

IBM (2013), Big Data at the Speed of Business, http://www-01.ibm.com/software/data/bigdata/
Mashable (2012), How Much Data is Created Every Minute, http://mashable.com/2012/06/22/data-created-every-minute/
Hudson-Smith (2013) – Tagging and Tracking, Architectural Design, 01, 2014, High Definition, Zero Tolerance in Design and Production.

Speaker bio
Dr Andrew Hudson-Smith is Director of the Centre for Advanced Spatial Analysis (CASA) at The Bartlett, University College London. Andy is a Reader in Digital Urban Systems and Editor-in-Chief of the Future Internet journal. He is also an elected Fellow of the Royal Society of Arts, a member of the Greater London Authority Smart London Board, and Course Founder of the MRes in Advanced Spatial Analysis and Visualisation and the MSc in Smart Cities at University College London.


The real-time city? Big data and smart urbanism

An extended version of a Programmable City working paper, with two new sections, has been published in GeoJournal (visit GeoJournal website or Download).

Kitchin, R. (2014) The real-time city? Big data and smart urbanism.  GeoJournal 79(1): 1-14.

‘Smart cities’ is a term that has gained traction in academia, business and government to describe cities that, on the one hand, are increasingly composed of and monitored by pervasive and ubiquitous computing and, on the other, whose economy and governance are being driven by innovation, creativity and entrepreneurship, enacted by smart people. This paper focuses on the former and, drawing on a number of examples, details how cities are being instrumented with digital devices and infrastructure that produce ‘big data’. Such data, smart city advocates argue, enable real-time analysis of city life and new modes of urban governance, and provide the raw material for envisioning and enacting more efficient, sustainable, competitive, productive, open and transparent cities. The final section of the paper provides a critical reflection on the implications of big data and smart urbanism, examining five emerging concerns: the politics of big urban data, technocratic governance and city development, corporatisation of city governance and technological lock-ins, buggy, brittle and hackable cities, and the panoptic city.

Big data: draft of encyclopedia entry

Below is the first draft of a 1000 word entry on Big Data by Rob Kitchin for the forthcoming International Encyclopedia of Geography to be published by Wiley-Blackwell and the Association of American Geographers.  It sets out a definition of big data, how they are produced and analyzed, and some of their pros and cons.  Feedback on the content would be welcome.

Abstract

Big data consist of huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis, much of which are spatially and temporally referenced.  The generation, processing and storing of such data have been enabled by advances in computing and networking.  Insight and value can be extracted from them using new data analytics, including forms of machine learning and visualisation.  Proponents contend that big data are reshaping how knowledge is produced, business conducted, and governance enacted.  However, there are concerns regarding access to big data, their quality and veracity, and the ethical implications of their use with respect to dataveillance, social sorting, security, and control creep.

Keywords: big data, analytics, visualisation, ethics

Defining big data

The etymology of ‘big data’ can be traced to the mid-1990s, when the term was first used to refer to the handling and analysis of massive datasets (Diebold 2012).  It is only since 2008, however, that the term has gained traction, becoming a business and industry buzzword.  Like many rapidly emerging concepts, big data has been variously defined, but most commentators agree that it differs from what might be termed ‘small data’ with respect to its traits of volume, velocity and variety (Zikopoulos et al., 2012).  Traditionally, data have been produced in tightly controlled ways using sampling techniques that limit their scope, temporality and size.  Even very large datasets, such as national censuses, have generally been restricted to 30-40 questions and are carried out only once every ten years.  Advances in computing hardware, software and networking have, however, enabled much wider scope for producing, processing, analyzing and storing massive amounts of diverse data on a continuous basis.  Moreover, big data generation strives to be: exhaustive, capturing entire populations or systems (n=all); fine-grained in resolution and uniquely indexical in identification; relational in nature, containing common fields that enable the conjoining of different datasets; and flexible, holding the traits of extensionality (new fields can be added easily) and scalability (the data can expand in size rapidly) (boyd and Crawford 2012; Kitchin 2013; Marz and Warren 2012; Mayer-Schonberger and Cukier 2013).  Big data thus comprise huge volumes of diverse, fine-grained, interlocking data produced on a dynamic basis.  For example, in 2012 Wal-Mart was generating more than 2.5 petabytes (a petabyte is 2^50 bytes) of data relating to more than 1 million customer transactions every hour (Open Data Center Alliance 2012), and Facebook was processing 2.5 billion pieces of content (links, comments, etc.), 2.7 billion ‘Like’ actions and 300 million photo uploads per day (Constine 2012).
Such big data, its proponents argue, enable new forms of knowledge that produce disruptive innovations with respect to how business is conducted and governance enacted.  Given that much big data are georeferenced, they hold much promise for new kinds of spatial analysis and modelling.
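For a sense of scale, the volumes cited above can be converted into raw byte counts with a line of arithmetic. The snippet below is purely illustrative, assuming the binary convention in which a petabyte is 2^50 bytes:

```python
# Illustrative arithmetic for the data volumes quoted above.
PETABYTE = 2 ** 50  # bytes in one petabyte (binary convention)

# Wal-Mart (2012): more than 2.5 petabytes of transaction data per hour.
walmart_bytes_per_hour = 2.5 * PETABYTE
print(f"{walmart_bytes_per_hour:.0f} bytes/hour")  # 2814749767106560 bytes/hour
```

At roughly 2.8 quadrillion bytes per hour, the contrast with a once-a-decade census of 30-40 questions is stark.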

Sources of big data

Big data are produced in three broad ways: through directed, automated and volunteered systems (Kitchin 2013).  Directed systems are controlled by a human operator and include CCTV, spatial video and LiDAR scans.  Automated systems automatically capture data as an inherent function of the technology and include: the recording of retail purchases at the point of sale; transactions and interactions across digital networks (e.g., sending emails, internet banking); the use of digital devices such as mobile phones that record and communicate the history of their own utilisation; clickstream data that records navigation through a website or app; measurements from sensors embedded into objects or environments; the scanning of machine-readable objects such as transponders and barcodes; and machine to machine interactions across the internet.  Volunteered systems rely on users to gift data through uploads and interactions and include engaging in social media (e.g., posting comments, observations, photos to social networking sites such as Facebook) and the crowdsourcing of data wherein users generate data and then contribute them into a common platform (e.g., uploading GPS-traces into OpenStreetMap).

Analyzing big data

Given their volume, variety and velocity, big data present significant analytical challenges to which traditional methods, designed to extract insights from scarce and static data, are not well suited.  The solution has been the development of a new suite of data analytics rooted in research on artificial intelligence and expert systems, together with new forms of data visualisation and visual analytics, both of which rely on high-powered computing.  Data analytics employ machine learning to iteratively evolve an understanding of datasets, using computer algorithms to automatically recognize complex patterns, construct models that explain and predict those patterns, and optimize outcomes (Han et al. 2011).  Moreover, since different approaches have strengths and weaknesses depending on the type of problem and data, an ensemble approach can be employed that builds multiple solutions using a variety of techniques to model and predict the same phenomena.  As such, it becomes possible to apply hundreds of different algorithms to a dataset to ensure that the most illuminating insights are produced.  Given the enormous volumes and velocity of big data, visualisation has proven a popular way of both making sense of data and communicating that sense.  Visualisation methods seek to reveal the structure, patterns and trends of variables and their interconnections.  Tens of thousands of data points can be plotted to reveal a structure that is otherwise hidden (e.g., mapping trends across millions of tweets to see how they vary across people and places), or the real-time dynamics of a phenomenon can be monitored using graphic and spatial interfaces (e.g., the flow of traffic across a city).
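The ensemble idea can be sketched in a few lines of Python. The toy ‘models’ below are hypothetical rules of thumb standing in for real learned classifiers (they are not drawn from any of the cited works); the point is simply that a majority vote combines several weak, differently biased predictors into one answer:

```python
from collections import Counter

# Three toy 'models', each a simple rule of thumb; hypothetical stand-ins
# for classifiers trained with different techniques in a real ensemble.
def model_threshold(x):
    return 1 if x > 5 else 0        # predicts 1 for large values

def model_parity(x):
    return x % 2                    # predicts from parity (deliberately weak)

def model_range(x):
    return 1 if 3 < x < 20 else 0   # predicts 1 inside a band

def ensemble_predict(x, models):
    """Majority vote across the ensemble's individual predictions."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

models = [model_threshold, model_parity, model_range]
print(ensemble_predict(7, models))  # all three vote 1 -> 1
print(ensemble_predict(4, models))  # two of three vote 0 -> 0
```

Real ensembles combine far richer models (decision trees, regressions, neural networks) over millions of records, but the voting logic is the same.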

Pros and cons of big data

The hype surrounding big data is for good reason.  Big data offer the possibility of shifting from data-scarce to data-rich studies of all aspects of the world; from narrow to exhaustive samples; from static snapshots to dynamic vistas; from coarse aggregations to high resolutions; and from relatively simple models to complex, sophisticated simulations and predictions (Kitchin 2013).  Moreover, big data consist of both qualitative and quantitative data, most of which are spatially and temporally referenced.  Big data provide greater breadth, depth, scale and timeliness, and are inherently longitudinal in nature.  They enable researchers to gain greater insights into various systems.  For businesses and government, such data hold the promise of increased productivity, competitiveness, efficiency, effectiveness, utility, sustainability and securitisation, and the potential to better manage organisations, leverage value and produce capital, govern people, and create better places (Kitchin 2014).

Big data are not without negative issues, however.  For example, most big data are generated by private corporations such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms, none of whom are under any obligation to share freely the data they generate.  As such, access to such data is at present limited.  There are also concerns with respect to how clean (error and gap free), objective (bias free) and consistent (few discrepancies) the data are, as well as their veracity and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to.  Further, big data raise a number of ethical questions concerning the extent to which they facilitate dataveillance (surveillance through data records), infringe on privacy and other human rights, enable social sorting (providing differential treatment and access to services), pose security concerns with regard to identity theft, and enable control creep, wherein data generated for one purpose are used for another (Kitchin 2014).

References

boyd, D. and Crawford, K. (2012) Critical questions for big data.  Information, Communication and Society 15(5): 662-679

Constine, J. (2012) How Big Is Facebook’s Data? 2.5 Billion Pieces Of Content And 500+ Terabytes Ingested Every Day, 22 August 2012, http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/ (last accessed 28 January 2013)

Diebold, F. (2012) A personal perspective on the origin(s) and development of ‘big data’: The phenomenon, the term, and the discipline.  http://www.ssc.upenn.edu/~fdiebold/papers/paper112/Diebold_Big_Data.pdf (last accessed 5th February 2013)

Han, J., Kamber, M. and Pei, J. (2011) Data Mining: Concepts and Techniques. 3rd edition. Morgan Kaufmann, Waltham, MA.

Kitchin, R. (2013)  Big data and human geography: Opportunities, challenges and risks.  Dialogues in Human Geography 3(3) 262–267

Kitchin, R. (2014) The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences.  Sage, London.

Marz, N. and Warren, J. (2012) Big Data: Principles and Best Practices of Scalable Realtime Data Systems.  MEAP edition.  Manning, Westhampton, NJ.

Mayer-Schonberger, V. and Cukier, K. (2013) Big Data: A Revolution that will Change How We Live, Work and Think.  John Murray, London.

Open Data Center Alliance (2012) Big Data Consumer Guide.  Open Data Center Alliance. http://www.opendatacenteralliance.org/docs/Big_Data_Consumer_Guide_Rev1.0.pdf (last accessed 11 February 2013)

Zikopoulos, P.C., Eaton, C., deRoos, D., Deutsch, T. and Lapis, G. (2012) Understanding Big Data.  McGraw Hill, New York.

New paper: Small data, data infrastructures and big data

The first Programmable City Working Paper has been published on SSRN, written by Rob Kitchin and Tracey P. Lauriault, and concerns the relationship between small and big data, the scaling-up of small data into data infrastructures, and how to conceptualize and make sense of such infrastructures.

Small data, data infrastructures and big data

Abstract
The production of academic knowledge has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions.  It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds.  This approach is presently being challenged by the development of big data.  Small data studies will, however, continue to be important in the future because of their utility in answering targeted queries.  Nevertheless, small data are being made more big data-like through the development of new data infrastructures that pool, scale and link small data in order to create larger datasets, encourage sharing and re-use, and open them up to combination with big data and analysis using big data analytics.  This paper examines the logic and value of small data studies, their relationship to emerging big data and data science, and the implications of scaling small data into data infrastructures, with a focus on spatial data examples.  The final section provides a framework for conceptualizing and making sense of data and data infrastructures.

Key words: big data, small data, data infrastructures, data politics, spatial data infrastructures, cyber-infrastructures, epistemology

Download the paper

Big data and human geography forum

A forum on big data and human geography has just been published in Dialogues in Human Geography 3(3), November 2013.  It includes a paper by Rob Kitchin on the opportunities, challenges and risks of big data to geographic scholarship.  Here’s a full list of contributions:

Mark Graham and Taylor Shelton: Geography and the future of big data, big data and the future of geography, pp. 255-261

Rob Kitchin: Big data and human geography: Opportunities, challenges and risks, pp. 262-267

Evelyn Ruppert: Rethinking empirical social sciences, pp. 268-273

Michael Batty: Big data, smart cities and city planning, pp. 274-279

Michael F Goodchild: The quality of big (geo)data, pp. 280-284

Sean P Gorman: The danger of a big data episteme and the need to evolve geographic information systems, pp. 285-291

Sandra González-Bailón: Big data and the fabric of human geography, pp. 292-296

Trevor J Barnes: Big data, little history, pp. 297-302

Putting public data to work and putting the I back in IT

The Silicon Republic is “Ireland’s No 1 resource for technology news”.  Along with delivering news about technology innovation in Ireland, they orchestrate fantastic industry events to amplify emerging discussions of importance within the Irish indigenous and foreign technology sectors.  Tracey P. Lauriault has attended three of their events to date, and participated in their latest, the Irish Data Forum.  Their events feature a cross-section of industry, academic and public sector experts discussing trends, issues and innovations.  The discussions are frank, and audience participation is skillfully moderated by Ann O’Dea, CEO and editor-at-large of the publication.

The Irish Data Forum, “Putting the ‘I’ back in ‘IT’”, discussed cloud, big data, data analytics, open data, data science and public data.  It also examined the data revolution and how Ireland can be at its heart.

Below is a selection of media from the event.

Panel 2 Part 1 Video

Panel 2 Part 2 Video