At the recent Conference of the Association of American Geographers held in Tampa, April 8-12, I was asked to be a discussant on a set of three sessions concerning geographers’ engagement with big data. The first session was a general intro panel on big data from a geographical perspective, the second consisted of a dozen or so short lightning talks (no more than 5 mins each) about each speaker’s on-going research, and the third presented some demos of practical approaches researchers are taking to harvesting, curating and sharing big geo-data.
Rather than focus my discussion on the individual comments, papers and demos, I reflected more broadly on the presentations. I felt they had been overly focused on one particular kind of big data, namely social media (with a little crowdsourcing thrown in), and had approached it from a standpoint that was either overly technical or quite narrowly conceived in conceptual terms. My argument was that we need to help develop, along with other social science disciplines, critical data studies (a term borrowed from Craig Dalton and Jim Thatcher) that fully appreciate and uncover the complex assemblages that produce, circulate, share/sell and utilise data in diverse ways, and that recognize the politics of data and the diverse work they do in the world. This also requires a critical examination of the ontology of big data and its varieties, which extend well beyond social media to include various forms of digital and automated surveillance, techno-social systems of work, exhaust from digital devices, sensors, scanners, the internet of things, interaction and transactional data, sousveillance, and various modes of volunteered data. It further requires a thorough consideration of the technical and organisational shortcomings of big data, its associated politics and ethics, and its consequences for the epistemologies, methodologies and practices of academia and of various domains of everyday life. I concluded with a call for more synoptic, conceptual and normative analysis of big data, as well as detailed empirical research that examines all aspects of big data assemblages. In other words, I was advocating a more holistic and critical analysis of big data. Given the speed at which the age of big data is coming into being, such analyses are, in my view, very much needed to make sense of the changes occurring.
For another reflection on the sessions see Mark Graham’s comments on Zero Geography.
At the ProgCity open data event I was talking to Denis Parfenov and Flora Fleischer from Open Knowledge Foundation Ireland about some of our AIRO experiences at trying to leverage data out of government departments and to push forward an open data agenda in Ireland. I said I would share a slide I first presented a few years ago about the ostrich attitude to data and evidence-informed policy within many government departments and the arguments used against opening and sharing data and providing open access analysis tools and training.
These arguments were presented to me in one government department in a sequence as I rebutted each assertion. The last two capture perfectly the way that Ireland works politically and institutionally.
1. We don’t need open data – officials on the ground know what’s going on in their area/domain.
2. Anybody who knows what they’re doing in government can access, process and interpret relevant data.
3. The data is too sensitive to share and might be used in ways for which it wasn’t intended.
4. The potential gains with respect to increased understanding, and more effective and efficient government, are over-stated, and it’s not financially viable.
5. Just because you have data it doesn’t mean it’ll get used; that’s not how Irish policy is made.
6. Even if it shapes policy, policy is not implemented or enforced in Ireland.
The whole line of reasoning basically leads to this: what is the point of opening data when it won’t make a blind bit of difference to how the country is run, and when it’s only going to create additional work and be an annoyance? Given that the government has signed up to the Open Government Partnership and to providing open data, it’ll be interesting to see to what extent these attitudes persist amongst agencies and politicians (who equally dislike hard evidence as an inconvenience to gombeen politics). Today, the government announced that postcode data will not be open, which is not a great start.
It is clear from recent attempts to make freedom of information requests more difficult that, with a few exceptions within some units, the government is interested in open data not so much for the purposes of transparency, reform and accountability, but rather in the hope that the data might be leveraged economically to create apps, new data products and jobs. If that can be done in a way that does not get in the way of normal political and policy business, I suspect they’d be delighted.
Hopefully, the open data movement will not get stifled by the sixth point: even if we have an open data policy, it does not mean it’ll get implemented, or if it does, it’ll take a long time and be limited. To go back to postcodes, the postcode project has been in the pipeline for at least ten years, with endless false starts and consultations. There is no reason why open data should take that long, but it wouldn’t surprise me if it did.
I’ve been a long-time supporter of open data and of providing analytic tools to citizens to enable evidence-informed participation in public debate. Since 2006, when it was initially established as the Cross-Border Regional Research Observatory, I have been PI on the All-Island Research Observatory (www.airo.ie), a project that provides access to various government datasets in the Republic of Ireland, Northern Ireland and Europe, along with interactive mapping and graphing tools. The core project team of Justin Gleeson, Aoife Dowling and Eoghan McCarthy have worked hard to leverage datasets out of various agencies and negotiate more favourable licensing terms, add value and insight to these datasets, promote data journalism through collaboration with the Irish Times and Irish Examiner, and provide open access to a couple of thousand datasets through the AIRO datastore.
The arguments concerning the benefits of open data are now reasonably well established. They include the contentions that open data lead to increased transparency and accountability with respect to public bodies and services; increase the efficiency and productivity of agencies and enhance their governance; promote public participation in decision making and social innovation; and foster economic innovation and job and wealth creation (Pollock 2006; Huijboom and Van den Broek 2011; Janssen 2012; Yiu 2012).
What is less well examined are the potential problems affecting, and negative consequences of, open data initiatives. Consequently, as a provocation for Wednesday’s (Nov 13th, 4-6pm) Programmable City open data event I thought it might be useful to outline four critiques of open data, each of which deserves and demands critical attention: open data lacks a sustainable financial model; promotes a politics of the benign and empowers the empowered; lacks utility and usability; and facilitates the neoliberalisation and marketisation of public services. These critiques do not suggest abandoning the move towards opening data, but contend that open data initiatives need to be much more mindful of what data are being made open, how data are made available, how they are being used, and how they are being funded.
Funding and sustainability
Because, to date, attention has been largely focused on the supply side of accessing data and creating open data initiatives, insufficient attention has been paid to the economics of creating sustainably funded initiatives. Data might be non-rivalrous in nature, meaning that they can be distributed at marginal cost, but the initial copy needs to be paid for, along with on-going data management and customer service (Pollock 2006). As such, open data might well be a free resource for end-users, but their production and curation are certainly not without significant cost (especially with respect to appropriate technologies and skilled staffing). In many cases, the data being opened have to date been a major source of revenue for organisations and, in the case of companies, a source of competitive advantage. A key question, therefore, is how open data projects can be funded sustainably in the absence of a direct revenue stream.
A number of different models have been suggested (see Ferro and Osella 2013), but it is generally acknowledged that securing a stable financial base is best achieved by direct government subvention. Here, it is argued that such a subvention will be offset by two factors. First, open data will produce diverse consumer surplus value, generating significant public goods which are worth the investment of public expenditure. Second, open data will lead to new innovative products that will create new markets, which in turn will produce additional corporate revenue and tax receipts (Pollock 2008). These tax receipts, the argument runs, will exceed the additional government costs of opening the data. This may well be the case with high-value datasets such as mapping and transport data, but it is much less likely with most other datasets.
de Vries et al. (2011) reported that the average app developer made only $3,000 per year from app sales, with 80 percent of paid Android apps being downloaded fewer than 100 times. In addition, they noted that even successful apps, such as MyCityWay, which had been downloaded 40 million times, were not yet generating profits. Instead, venture capitalists are investing in projects with potential whilst a sustainable business model is sought. Given austerity and cutbacks across governments, finding the necessary funds to open data is a challenge. And yet, reductions or fluctuations in the financial base of open data services are likely to lead to a decline in data quality, responsiveness, innovation and general performance (Pollock 2008). At present, the jury is still out on whether opening up all public sector data is economically viable and sustainable, especially in the short term.
Politics of the benign and empowering the empowered
Another consequence of focusing on gaining access to the data is that the politics of the data themselves are ignored: what the data reveal, how they are used, and for whose interests (Shah 2013). The open data movement largely seeks to present an image of being politically benign and commonsensical, promoting a belief that opening up data is inherently a good thing in and of itself because it democratises data. For others, making data accessible is just one element of openness. Just as important are what the data consist of, how they can be used, and how they can create a more just and equitable society. If open data merely serve the interests of capital by opening public data for commercial re-use, further empowering those who are already empowered and disenfranchising others, then they have failed to make society more democratic and open (Gurstein 2011; Shah 2013).
Implicit in most discussions of open data is the assumption that the data are neutral and objective in nature and that everyone has the potential to access and use them (Gurstein 2011; Johnson 2013). However, neither assumption holds. With respect to the data themselves, as Johnson (2013) contends, a high degree of social privilege and social values are embedded in public sector data. These shape what data are generated, relating to whom and to what (especially within domains that function as disciplinary systems, such as social welfare and law enforcement), and whose interests are represented within a data set and whose are excluded. As such, value structures are inherent in data sets, and these subsequently shape analysis and interpretation and work to propagate injustices and reinforce dominant interests.
Citizens have differential access to the hardware and software required to download and process open data sets, as well as varying levels of the skills required to analyze, contextualize and interpret the data (Gurstein 2011). And even if some groups have the ability to make compelling sense of the data, they do not necessarily have the contacts needed to gain a public voice and influence a debate, or the political skill to take on a well-resourced and savvy opponent. As such, claims about the democratic potential of open data have been overly optimistic, with most users being those with high degrees of technical knowledge and an established political profile (McClean 2011). Indeed, open data can work to further empower the empowered and to reproduce and deepen power imbalances (Gurstein 2011). An oft-cited example of the latter is the digitization of land records in Karnataka, India. There, an open data project promoted as a ‘pro-poor’ initiative worked to actively disenfranchise the poor by enabling those with financial resources and skills to access previously restricted data and to appropriate the lands of the poor (Gurstein 2011; Slee 2012; Donovan 2012). Far from aiding all citizens, in this case open data facilitated a change in land rights and a transfer of wealth from poor to rich. In other words, opening data does not mean an inherent democratization of data. Indeed, open data can function as a tool of disciplinary power (Johnson 2013).
Utility and usability
In a study of a number of different open data projects, Helbig et al. (2012) reported that many are too technically focused amounting to “little more than websites linked to miscellaneous data files, with no attention to the usability, quality of the content, or consequences of its use.” The result is a set of open data sites that operate more as data holdings or data dumps, lacking the qualities expected in a well organised and run data infrastructure such as clean, high quality, validated and interoperable data that comply with data standards and have appropriate metadata and full record sets (associated documentation); preservation, backup and auditing policies; re-use, privacy and ethics policies; administrative arrangements, management organisation and governance mechanisms; and financial stability and a long term plan of development and sustainability. Many sites also lack appropriate tools and contextual materials to support data analysis. Moreover, the data sets released are often low-hanging fruit, consisting of those that are easy to release and contain non-sensitive data that has relatively low utility. In contrast, data that might be more difficult and demanding to make open, due to issues of sensitivity or because they require more management work to comply with data protection laws, often remain closed (Chignard 2013).
Part of the issue is that many open data sites have been rough-and-ready responses to an emerging phenomenon. They have been built by enthusiasts and organisations who have little experience of data archiving or of the contextual use of the data being opened. They have been supported and promoted by hackathons and data dives, which reproduce many of these issues. As Gordon-McKeon (2013) and Porway (2013) contend, these events, which invite coders and other interested parties to build apps using open data, can do as much harm as good. Whilst they do focus attention on the data and are good for networking, those doing the coding often have little deep contextual knowledge of what the data refer to. They also belong to a particular demographic that is not reflective of wider society (e.g., young, educated and tech-orientated), and believe that deep structural problems can be resolved by technological solutions. In other words, the resulting projects are “built by a micro-community of casual volunteers, not by people with a deep stake in seeing the project succeed” (Gordon-McKeon 2013). Further, hackathon-created solutions often remain at version 1.0, with little post-event follow-up, maintenance or development.
Because of these various teething issues, many sites have failed to create the virtuous cycle assumed by the open data movement, in which the release of more and more data sets, in more formats, produces growing use and therefore the release of more data. Instead, Helbig et al. (2013) note that many sites have low and declining traffic. This is because they do not encourage use or facilitate users, and are limited by other factors such as data management practices, agency effort and internal politics. After an initial spark of interest, data use drops quite markedly as the limitations of the data are revealed and users struggle to work out how the data might be profitably analyzed and used. McClean (2011), for example, notes that analysis arising from open data has had limited impact on political debates. With respect to COINS (government financial data in the UK), McClean concludes that after “a brief flurry of media interest in mid-2010, in the immediate aftermath of the release, … reports explicitly mentioning COINS are now extremely rare and those members of the press who were most interested obtaining access to it report that it has not proved particularly useful as a driver of journalism.” Where data are released periodically (e.g., quarterly or annually), usage tends to be cyclical and often tied to specific projects (such as consultancy reports) rather than following a more consistent pattern. In such cases, Helbig et al. (2012) observed that a set of negative or balancing feedback loops slowed the supply and use of data, further decreasing usage. Thus, after some initial ‘quick wins’, the danger is that the cycle turns from virtuous to vicious and the rationale for central government funding of such initiatives is undermined, with funding in due course cut.
Neoliberalisation and marketisation of public services
As Jo Bates (2012) argues, “open initiatives such as OGD [open government data] emerge into a historical process, not a neutral terrain.” As with all political initiatives, the politics of open data are not simply commonsensical or neutral, but rather are underpinned by political and economic ideology. The open data movement is diverse and made up of a range of constituencies with different agendas and aims, and is not driven by any one party. However, Bates makes the case that the open data movement, in the UK at least, had little political traction until big business started to actively campaign for open data, and open government initiatives started to fit into programmes of forced austerity and the marketisation of public services. For her, political parties and business have appropriated the open data movement on “behalf of dominant capitalist interests under the guise of a ‘Transparency Agenda’” (Bates 2012).
In other words, the real agenda of businesses interested in open data is to gain access to expensively produced data at no cost, and thus to a heavily subsidised infrastructure from which they can leverage profit. At the same time, this removes the public sector from the marketplace and weakens its position as the producer of such data. Indeed, because the income from data and data services disappears when data are opened (which is especially acute in trading funds, where data production and management were largely funded by fees with some public subsidy), public sector bodies are more likely to be forced to outsource such services to the private sector on a competitive basis, or to cede data production to the private sector and then procure the data (Gurstein 2013). Here, data services and data derived from public data have to be purchased back by the data creator. At the same time, the data literacy of the organisation is hollowed out. Moreover, because open data often concern a body’s own activities, especially when supplemented by key performance indicators, they facilitate public sector reform and reorganisation that promote a neoliberal, New Public Management ethos and private sector interests (McClean 2011; Longo 2011).
Such processes, Bates (2013) argues, are part of a deliberate political strategy to open up the “provision of almost all public services to competition from private and third sector providers”, with open data about public services enabling “service users to make informed choices within a market for public services based on data-driven applications produced by a range of commercial and non-commercial developers” (original emphasis). In such cases, the transparency agenda promoted by politicians and businesses is merely a rhetorical device. If either party were genuinely interested in transparency, it would be equally supportive of the right to information movement (freedom of information), of the work of whistleblowers (Janssen 2012), and of loosening the shackles of intellectual property rights more broadly (Shah 2013). Instead, governments and businesses are generally resistant to all of these.
Conclusion
Open data initiatives hold much promise and value. They are radically altering access to publicly produced data and making new kinds of analysis possible. They are creating new forms of transparency and accountability, fostering new forms of social participation and evidence-informed modes of governance, and promoting innovation and wealth generation. At the same time, much more critical attention needs to be paid to how open data projects are developing as complex socio-technical systems with diverse stakeholders and agendas. To date, efforts have concentrated on the political and technical work of establishing open data projects, and not enough on studying these discursive and material moves and their consequences. As a result, we lack detailed case studies of open data projects in action, the assemblages surrounding and shaping them, and the messy, contingent and relational ways in which they unfold. It is only through such studies that a more complete picture of open data will emerge. Such a picture would reveal both the positives and negatives of these projects, and would provide answers to more normative questions concerning how they should be implemented and to what ends.
This post is a modified extract from a forthcoming book by Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences (Sage, London).
References
Bates, J. (2012) “This is what modern deregulation looks like”: Co-optation and contestation in the shaping of the UK’s Open Government Data Initiative. The Journal of Community Informatics 8(2). http://www.ci-journal.net/index.php/ciej/article/view/845/916 (last accessed 6 February 2013)
de Vries, M., Kapff, L., Negreiro Achiaga, M., Wauters, P., Osimo, D., Foley, P., Szkuta, K., O’Connor, J., and Whitehouse, D. (2011) Pricing of Public Sector Information Study (POPSIS). http://epsiplatform.eu/sites/default/files/models.pdf (last accessed 11 August 2013)
Donovan, K. (2012) Seeing like a slum: Towards open, deliberative development. Georgetown Journal of International Affairs 13(1): 97-104.
Ferro, E. and Osella, M. (2013) Eight Business Model Archetypes for PSI Re-Use. “Open Data on the Web” Workshop, 23rd-24th April 2013, Google Campus, Shoreditch, London. http://www.w3.org/2013/04/odw/odw13_submission_27.pdf (last accessed 13 August 2013)
Gordon-McKeon, S. (2013) Hacking the hackathon. Shaunagm.net, 10 October. http://www.shaunagm.net/blog/2013/10/hacking-the-hackathon/ (last accessed 21 October 2013)
Johnson, J.A. (2013) From open data to information justice. Paper presented at the Annual Conference of the Midwest Political Science Association April 13, 2013, Chicago, Illinois. http://papers.ssrn.com/abstract=2241092 (last accessed 16 August 2013)
McClean, T. (2011) Not with a bang but a whimper: The politics of accountability and open data in the UK. Paper prepared for the American Political Science Association Annual Meeting. Seattle, Washington, 1-4 September 2011. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1899790 (last accessed 19 August 2013)