A new paper, ‘Improving the Veracity of Open & Real-Time Urban Data’, has been published by Gavin McArdle and Rob Kitchin as Programmable City Working Paper 13. The paper has been prepared for the Data and the City workshop to be held at Maynooth University, Aug 31st-Sept 1st.
Abstract
Within the context of the smart city, data are an integral part of the digital economy and are used as input for decision making, policy formation, and to inform citizens, city managers and commercial organisations. Reflecting on our experience of developing real-world software applications which rely heavily on urban data, this article critically examines the veracity of such data (their authenticity and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to) and how they can be assessed in the absence of quality reports from data providers. While data quality needs to be considered at all aspects of the data lifecycle and in the development and use of applications, open data are often provided ‘as-is’ with no guarantees about their veracity, continuity or lineage (documentation that establishes provenance and fit for use). This allows data providers to share data with undocumented errors, absences, and biases. If left unchecked these data quality issues can propagate through multiple systems and lead to poor smart city applications and unreliable ‘evidence-based’ decisions. This leads to a danger that open government data portals will come to be seen as untrusted, unverified and uncurated data-dumps by users and critics. Drawing on our own experiences we highlight the process we used to detect and handle errors. This work highlights the necessary janitorial role carried out by data scientists and developers to ensure that data are cleaned, parsed, validated and transformed for use. This important process requires effort, knowledge, skill and time and is often hidden in the resulting application and is not shared with other data users. 
In this paper, we propose that rather than lose this knowledge, in the absence of data providers documenting them in metadata and user guides, data portals should provide a crowdsourcing mechanism to generate and record user observations and fixes for improving the quality of urban data and open government portals.
Key words: open data, big data, realtime data, veracity, quality, fidelity, metadata, urban data, transport, smart cities