
Living well with data: Practicing slow computing

I had the pleasure of being a panellist for the Association of Internet Researchers (AoIR) plenary session yesterday, along with Helen Kennedy and Seeta Peña Gangadharan. I thought I’d share the text of my short presentation here.

In my time I want to focus on one part of the session descriptor, namely ‘how can we live well with data, rather than just survive.’ This is an issue that I’ve been giving some thought to and discuss at length in a recent book, Slow Computing: Why We Need Balanced Digital Lives, written with Alistair Fraser (available for the duration of the conference for £7 using the code SC20 at BUP website). Rather than simply critique how digital society is unfolding, we wanted to set out practical and political interventions which can be performed individually or collectively that push back against the negative aspects of living digital lives; that allow people to claim and assert some level of control and advocate for a different kind of digital world. In short, how can people live a ‘digital good life’, or as we put it in the book, ‘experience the joy and benefits of computing, but in a way that asserts individual and collective autonomy?’

In the book, we focus our attention on how our everyday lives have been transformed by digital technologies in two key respects.

The first is with respect to time and how networked devices have ushered in an era of network or instantaneous time, where people are always and everywhere available, encounters are organized on-the-fly, tasks become interleaved and multiply, there is working-time drift, and individuals can feel they are tied to a digital leash that leaves them harried and anxious. Technologies are altering the pace, tempo and temporal organization of digital life in ways that are not always to our benefit or well-being.

Our second concern is with respect to data and how increased datafication and dataveillance are enabling companies and states to profile, socially sort, target, nudge and manage us, and how excessive data extraction is fuelling the growth of data capitalism and reshaping governance, governmentality and citizenship in ways that erode civil liberties. It is these issues, and living well with data, that I concentrate on here.

There is no doubt that the era of ubiquitous computing and big data has resulted in excessive data extraction. Many digital technologies and services practise ‘over-privileging’; that is, seeking permission to access more data and device functions than they need for their operation alone. This has eroded privacy, created new predictive privacy harms, expanded data markets and the ways in which companies can accumulate profit by leveraging value from personal data, and underpins new forms of technocratic, algorithmic, predictive and anticipatory governance. There is significant data power – expressed through data capitalism and the state’s use of data – that reproduces structural inequalities unevenly across people (related to class, gender, race, ethnicity, disability, etc.) and places (well-off and poorer neighbourhoods, regions, global north/south).

The question is what to do about this? Our answer is what we term ‘slow computing’, a term that draws on notions at the heart of the slow living movement – well-being, enjoyment, patience, quality, sovereignty, authenticity, responsibility, and sustainability. Slowness is about enacting a different kind of society, in our case both in relation to time and data. It is about using devices and apps without feeling harassed, stressed, coerced, or exploited; and it is about challenging and transforming iniquitous and exploitative structural relations.

Conceptually, our argument is underpinned by the concepts of an ethics of digital care, data justice, and time and data sovereignty. Rooted in the ideas and ideals of feminism, an ethics of digital care promotes moral action at the individual and collective level to ensure personal wellbeing and aid for others. It recognizes that we are bound within webs of responsibility, obligations and duties and advocates acting reciprocally and non-reciprocally to tackle data injustices. Data justice draws much of its moral argument from the ideas of social justice, seeks fair treatment of people through data-driven processes, and challenges data power in various ways, including data activism. Data sovereignty, rooted in the work of indigenous scholars, is the idea that we should retain some degree of authority, power and control over the data that relates to us, that we should also have a say in the mechanisms by which those data are extracted, and that other entities, such as companies and states, should recognize that sovereignty as legitimate.

Using these ideas we set out individual and collective tactics – both practical and political – for asserting data sovereignty and expressing an ethics of digital care. At an individual level, this includes various means to curate digital lives, use open source alternatives, step away from technologies, and obfuscate. At a collective level, it includes political campaigning and lobbying, placing pressure on companies, creating data commons, undertaking counter-data actions, producing open source, privacy enhancement tech and civic tech.

At the same time, we recognize that different groups of people have varying opportunities to practise slow computing; to live well with data. The ability to exert data sovereignty varies by class, gender, race, ethnicity, etc. Poorer and more marginalized populations are more often the focus of data power and are least able to resist and push back. This is why a collective ethics of digital care is vital: to seek data justice for all.

Of course, we’re not the only folks thinking about this, with much of the work concerning data ethics, data justice and data activism seeking to envisage a different kind of digital society and push back against the worst excesses of dataveillance and data capitalism. However, there is much more theoretical, empirical, advocacy and activist work needed within and beyond the academy if we are to live well with data.

Will CovidTracker Ireland work?

The coronavirus pandemic has posed enormous challenges for governments seeking to delay, contain and mitigate its effects. Along with measures within health services, a range of disruptive public health tactics have been adopted to try and limit the spread of the virus and flatten the curve, including social distancing, self-isolation, forbidding social gatherings, limiting travel, enforced quarantining, and lockdowns. Across a number of countries these measures are being supplemented by a range of digital technologies designed to improve their efficiency and effectiveness by harnessing fine-grained, real-time big data. In general, the technologies being developed and rolled-out fall into four types: contact tracing, quarantine enforcement/movement permission, pattern and flow modelling, and symptom tracking. The Irish government is pursuing two of these – contact tracing and symptom tracking – merged into a single app ‘CovidTracker Ireland’. In this short essay, I outline what is known about the Irish approach to developing this app and assess whether it will work effectively in practice.

CovidTracker Ireland

On March 29th 2020 the Health Service Executive (HSE) announced that it hoped to launch a Covid-19 contact tracing app within a matter of days. Few details were given about the proposed app functionality or architecture, other than that it would mimic other tracing apps, such as Singapore’s TraceTogether, using Bluetooth connections to record proximate devices and thus possible contacts, together with additional features for reporting well-being. The HSE made it clear that it would be an opt-in rather than a compulsory initiative, that the app would respect privacy and GDPR, being produced in consultation with the Data Protection Commission, and that it would be time-limited to the coronavirus response. It was not stated who would develop the app beyond it being described as a ‘cross-government’ effort.

On April 10th, the HSE revealed more details through a response to questions from Broadsheet.ie, stating that the now named CovidTracker Ireland App will:

  • “help the health service with its efforts in contact tracing for people who are confirmed cases;
  • allow a user to record how well they are feeling, or track their symptoms every day;
  • provide links to advice if the user has symptoms or is feeling unwell;
  • give the user up-to-date information about the virus in Ireland.”

Further, they reiterated that the app ‘will be designed in a way that maximises privacy as well as maximising value for public health. Privacy-by-design is a core principle underpinning the design of the CovidTracker Ireland App – which will operate on a voluntary and fully opt-in basis.’ There was no mention of the approach being taken; however the use of the HSE logo on the PEPP-PT (Pan-European Privacy-Preserving Proximity Tracing) website indicates that it has adopted that architecture, an initiative that claims seven countries are using their approach, with reportedly another 40 countries involved in discussions.

As of April 22nd the CovidTracker Ireland app is under development, with HSE stating on April 17th that it was being tested with a target of launching by early May when it is planned that some government restrictions will be lifted.

Critique and concerns

From the date it was announced, concerns have been expressed about CovidTracker Ireland, particularly by representatives of Digital Rights Ireland and the Irish Council for Civil Liberties. A key issue has been the lack of transparency and openness in the approach being taken. An app will simply be launched for use without any published details of the approach and architecture being adopted, consultation with stakeholders, piloting by members of the public, or external feedback and assessment.

There are concerns that a centralized, rather than decentralized, approach will be taken, and there is no indication that the underlying code will be open for scrutiny, if not by the public, at least by experts. It is not clear if the app is being developed in-house, or if it has been contracted out to a third-party developer and whether the associated contract includes clauses concerning data ownership, re-use and sale, and intellectual property. There are no details about where data will be stored, who will have access to them, how they will be distributed, or how they will be acted upon. There is unease as to whether the app will be fully compliant with GDPR and fully protect privacy, especially given that a Data Protection Impact Assessment (DPIA), which is legally required before launch, has seemingly not yet been undertaken. Such a DPIA would allow independent experts to assess, validate and provide feedback and advice.

Critics are also concerned that CovidTracker Ireland merges the tasks of contact tracing and symptom tracking which have been pursued separately in other jurisdictions. Here, two sets of personal information are being tied together: proximate contacts and health measures. This poses a larger potential privacy problem if they are not adequately protected. Moreover, critics are worried that CovidTracker Ireland might become a ‘super app’, which extends its original ambition and goals. Here, the app might enable control creep, wherein it starts to be employed beyond its intended uses such as quarantine enforcement/movement permission. For example, Antoin O’Lachtnain of Digital Rights Ireland has speculated that we might eventually end up with an app to monitor covid-19 status that is “mandatory but not compulsory for people who deal with the public or work in a shared space.”

As Simon McGarr argues, the failure to adequately engage with these critiques and to be open and transparent means that “the launch of the app will inevitably be marred by immediately being the subject of questions and misinformation that could have been avoided by simply overcoming the State’s institutional impulse for secrecy.”

Internationally, there is scepticism concerning the method being used for app-based contact tracing and whether the critical conditions needed for successful deployment exist. Bluetooth does not have sufficient resolution to determine two metres or less proximity and using a timeframe to denote significant encounters potentially excludes fleeting, but meaningful contacts. There are also concerns with respect to representativeness (for example, 28% of people do not own a smartphone in Ireland), data quality, reliability, duping and spoofing, and rule-sets and parameters. The technical limitations are likely to lead to sizeable gaps and a large number of false positives that might produce an unmanageable signal-to-noise ratio, leading to unnecessary self-isolation measures and potentially overloading the testing system.
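The resolution problem can be illustrated with a toy calculation. The sketch below inverts the standard log-distance path-loss model to show how a few dBm of signal noise translates into metres of distance error; all parameter values here are illustrative assumptions, not figures from any actual tracing app.

```python
def estimate_distance(rssi_dbm, tx_power_dbm=-59, path_loss_exp=2.0):
    """Invert the log-distance path-loss model: rssi = tx_power - 10*n*log10(d).

    tx_power_dbm is the assumed received signal strength at 1 m;
    path_loss_exp is the assumed environment-dependent attenuation exponent.
    """
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exp))

# With these (assumed) parameters a device at 2 m produces roughly -65 dBm,
# but a few dBm of noise (pockets, bodies, walls) swings the estimate widely.
for noise_dbm in (-4, 0, 4):
    measured = -65 + noise_dbm
    print(f"measured {measured} dBm -> estimated {estimate_distance(measured):.1f} m")
```

With just ±4 dBm of noise, a true 2 m contact is estimated at anywhere between roughly 1.3 m and 3.2 m, straddling the two-metre threshold in both directions.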

There is a concern that app-based contact tracing is being rushed to mass roll-out without it being demonstrated that it is fit-for-purpose. Moreover, the app will only be effective in practice if: there is a program of extensive testing to confirm that a person has the virus and if tracing is required; and 60% of the population participate to ensure reach across those who have been in close contact (c.80% of smartphone users). The symptom tracking relies on self-reporting, which lacks rigour and, as testing has shown, a large proportion of the population who were tested because they were experiencing symptoms returned negative. This is likely to lead to a large number of false positives and it is doubtful that these data should guide contact tracing.
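The participation figures above can be checked with back-of-the-envelope arithmetic. This is a sketch only: the 28% non-ownership figure is the one cited earlier, and squaring the uptake rate assumes both parties in a contact adopt the app independently.

```python
smartphone_share = 1 - 0.28        # 72% of people own a smartphone (figure cited above)
target_population_uptake = 0.60    # participation level suggested for effective tracing

# 60% of the whole population corresponds to what share of smartphone owners?
uptake_among_owners = target_population_uptake / smartphone_share
print(f"{uptake_among_owners:.0%} of smartphone users")   # consistent with the c.80% cited

# Both parties to an encounter must run the app for the contact to be logged,
# so even at target uptake only a fraction of contacts is captured.
contact_coverage = target_population_uptake ** 2
print(f"{contact_coverage:.0%} of contacts captured at best")
```

So even if the ambitious 60% target were met, at best around a third of contact events would involve two app users.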

At present, while Ireland is ramping up its testing capability towards 100,000 tests a week, it might need to increase that further. The Edmond J. Safra Center for Ethics at Harvard University suggest that in the United States: “We need to deliver 5 million tests per day by early June to deliver a safe social reopening. This number will need to increase over time (ideally by late July) to 20 million a day to fully remobilize the economy. We acknowledge that even this number may not be high enough to protect public health.” The equivalent rate for Ireland would be 300,000 tests per day. In Singapore, only 12% of people have registered to use the TraceTogether app, which raises doubts as to whether 60% of the population in Ireland will participate, especially since the public are primed to be sceptical given media coverage about the app have raised issues of privacy, data security and data usage.
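Scaling the Harvard figures by population shows where the 300,000-per-day equivalent comes from. The population totals used here are rough assumptions (about 4.9 million for Ireland, 330 million for the US), so the results are order-of-magnitude only.

```python
us_population = 330e6   # approximate US population (assumption)
ie_population = 4.9e6   # approximate Irish population (assumption)

# Scale each US testing recommendation to Ireland pro rata by population.
for us_tests_per_day in (5e6, 20e6):
    ie_equivalent = us_tests_per_day * ie_population / us_population
    print(f"US {us_tests_per_day/1e6:.0f}M/day -> Ireland ~{ie_equivalent:,.0f}/day")
```

The 300,000-per-day figure corresponds to the upper (full remobilization) recommendation of 20 million US tests per day; the lower recommendation of 5 million per day would still imply roughly 75,000 tests per day for Ireland.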

Will CovidTracker Ireland work and what needs to happen?

There is unanimous agreement that contact tracing is a cornerstone measure for tackling pandemics. Assuming that the privacy and data protection issues can be adequately dealt with, it would be good to think that CovidTracker Ireland will make a difference to containing the coronavirus and stopping any additional waves of infection.

However, there are reasons to doubt that app-based contact tracing and symptom tracking will make the kind of impact hoped for unless:

  • its technical approach is sound and civil liberties protected;
  • there is testing at sufficient scale that potential cases, including false ones, are dealt with quickly;
  • the government can persuade people to participate in large numbers.

The government might also have to supply smartphones to those that do not own them, as was done in Taiwan. Persuading people to participate will be a particular challenge since the government is not being sufficiently transparent at present in explaining the approach being taken, the app’s intended technical specification, how it will operate in practice, its procedures for oversight, and how it will protect civil liberties.

It is essential that the government follow the guidance of the European Data Protection Board that recommends that strong measures are put in place to protect privacy, data minimization is practised, the source code is published and regularly reviewed, there is clear oversight and accountability, and there is purpose limitation that stops control creep.

If implemented poorly, the app could have a profound chilling effect on public trust and public health measures that might be counterproductive. As a consequence, the Ada Lovelace Institute, a leading UK centre for artificial intelligence research, is advising governments to be cautious, ethical and transparent in their use of app-based contact tracing. Ireland might do well to heed their advice.

Rob Kitchin

Using digital technologies to tackle the spread of the coronavirus: Panacea or folly?

Update: a revised version of this working paper has now been published as open access in Space and Polity.

A new paper by Rob Kitchin (Programmable City Working Paper 44) examines whether digital technologies will be effective in tackling the spread of the coronavirus, considers their potential negative costs vis-a-vis civil liberties, citizenship, and surveillance capitalism (see table below), and details what needs to happen.

PDF of working paper (PDF of revised version in Space and Polity)

Using digital technologies to tackle the spread of the coronavirus: Panacea or folly?

Abstract
Digital technology solutions for contact tracing, quarantine enforcement (digital fences) and movement permission (digital leashes), and social distancing/movement monitoring have been proposed and rolled-out to aid the containment and delay phases of the coronavirus and mitigate against second and third waves of infections. In this essay, I examine numerous examples of deployed and planned technology solutions from around the world, assess their technical and practical feasibility and potential to make an impact, and explore the dangers of tech-led approaches vis-a-vis civil liberties, citizenship, and surveillance capitalism. I make the case that the proffered solutions for contact tracing and quarantining and movement permissions are unlikely to be effective and pose a number of troubling consequences, wherein the supposed benefits will not outweigh potential negative costs. If these concerns are to be ignored and the technologies deployed, I argue that they need to be accompanied by mass testing and certification, and require careful and transparent use for public health only, utilizing a privacy-by-design approach with an expiration date, proper oversight, due processes, and data minimization that forbids data sharing, repurposing and monetization.

Keywords: coronavirus; COVID-19; surveillance; governmentality; citizenship; civil liberties; contact tracing; quarantine; movement; technological solutionism; spatial sorting; social sorting; privacy; control creep; data minimization; surveillance capitalism; ethics; data justice.

[Table: coronavirus tech issues]

Smart spaces and smart citizens?

I attended the Smart Cities and Regions Summit in Croke Park, Dublin, today and took part in the ‘smart spaces and smart citizens?’ panel. We were asked to produce short opening statements and I thought I’d share mine here.

I’m going to discuss smart citizens by considering Dublin as a smart city. To start, I want to ask you a set of questions which I’d like you to respond to by raising a hand. Don’t be shy; this requires participation.

How many of you have a good idea as to what Smart Dublin is and what it does?

How many of you feel you have a good sense of smart city developments taking place in Dublin?

Would you be able to tell me much about the 100+ smart city projects that are taking place in the city in conjunction with Smart Dublin and its four local authority partners?

Would you be able to tell me much about the extent to which these projects engage with citizens?

Or how the technologies used impact citizens, either in direct or implicit ways?

Or whether Smart Dublin and the four local authorities have a guiding set of principles or a programme for citizen engagement or smart citizens?

You’re all people interested in smart cities. You’re here because it relates to your work in some way. You have a vested interest in knowing about smart cities.

Do you think that citizens in Dublin know about these projects, which might be taking place in their locality?

Do you think that they have sufficient knowledge to be able to judge, in an informed way, a project’s merits?

Do you think they have an active voice in these projects’ conception, their deployment, the work that they do? In how any data generated are processed, analysed, shared, stored, and value extracted, etc.?

Do local politicians – citizen representatives – know about them? And do they have an active voice in smart city development in Dublin?

This panel is titled ‘Smart spaces and smart citizens’.

What is difficult to see in most smart city initiatives is the ‘smart citizen’ element. It seems that what is implied by ‘smart citizen’ is simply being a person living in a city where smart city technology is deployed, or being a person that uses networked digital technology as part of everyday life.

To create a smart citizen, all a state body or company apparently needs to do is say people should be at the heart of things, or enact a form of stewardship (deliver a service on behalf of citizens) and civic paternalism (decide what’s best for citizens), rather than citizens being meaningfully involved in the vision and development of the smart city.

In our own research concerning networked urbanism and smart cities from a social sciences perspective we have been interested in exploring these kinds of questions, and how the citizen fits into the smart city. It’s a central concern in our latest book published next month, ‘The Right to the Smart City’, which explores the smart city in relation to notions of citizenship and social justice.

What our research shows is that citizens can be varyingly positioned, and perform very different roles, in the smart city depending on the type of initiative.


It is perhaps no surprise then that citizens in numerous jurisdictions have started to push back against the more technocratic, top-down, marketised versions of the smart city – the on-going protests in Toronto over the Sidewalk Labs waterfront development being a prominent example. Instead, they demand more inclusive, empowering and democratic visions, with Barcelona’s notion of technological sovereignty often providing inspiration (see my recent piece comparing Toronto and Barcelona and links to articles and organisation websites).

It is difficult to argue that we are enabling ‘smart citizens’ if they are not informed, consulted or involved in the development and roll-out of smart city initiatives. As such, if we are truly interested in creating smart citizens then we need to make a meaningful move beyond the dominant tropes of stewardship and civic paternalism to approach smart cities in a smarter way.

For a fuller discussion see the opening and closing chapters of The Right to the Smart City, which are available as open access versions.

Kitchin, R., Cardullo, P. and di Feliciantonio, C. (2018) Citizenship, Social Justice and the Right to the Smart City. Pre-print Chapter 1 in The Right to the Smart City edited by Cardullo, P., di Feliciantonio, C. and Kitchin, R. Emerald, Bingley.

Kitchin, R. (2018) Towards a genuinely humanizing smart urbanism. Pre-print Chapter 14  in The Right to the Smart City edited by Cardullo, P., di Feliciantonio, C. and Kitchin, R. Emerald, Bingley.

Rob Kitchin

Queering code/space: special section of GPC

There is a new special section of Gender, Place and Culture on queer theory and software studies and the queering of code/space, edited by Dan Cockayne and Lizzie Richardson. I’ve not had anything to do with the issue other than to referee one paper. It’s nice, though, to see the code/space concept being re-worked with queer theory and software studies, and the digital being thought about with respect to sexuality and space, because in many ways that is its origin, and the section’s publication provides the opportunity to share a short anecdote about our initial thinking.

Martin Dodge and I started our work on the first code/space paper in 2002/3. At that time I was finishing an ESRC-funded project on homophobic violence in Northern Ireland and had been using Michel Foucault, Judith Butler, Gillian Rose and queer theory in general to frame this material. One of the papers I drafted at the time with Karen Lysaght was titled ‘Queering Belfast: Some thoughts on the sexing of space’, which was published as a working paper. Our initial working of code/space was rooted in this work, with the term ‘code/space’ echoing Foucault’s power/knowledge (in terms of being a dyadic relationship). Martin then discovered Adrian Mackenzie’s use of transduction and technicity (borrowed from Simondon), which was also ontogenetic in conception and more centrally focused on technology. I seem to remember us trying to blend performativity and transduction together, then moving to favour transduction. It’s nice to see those ideas now coming together in productive ways. Check out what is a fascinating set of papers.

Rob Kitchin

 

The limits of social media big data

A new book chapter by Rob Kitchin has been published in The Sage Handbook of Social Media Research Methods, edited by Luke Sloan and Anabel Quan-Haase. The chapter is titled ‘Big data – hype or revolution’ and provides a general introduction to big data, new epistemologies and data analytics, with the latter part focusing on social media data. The text below is a sample taken from a section titled ‘The limits of social media big data’.

The discussion so far has argued that there is something qualitatively different about big data from small data and that it opens up new epistemological possibilities, some of which have more value than others. In general terms, it has been intimated that big data does represent a revolution in measurement that will inevitably lead to a revolution in how academic research is conducted; that big data studies will replace small data ones. However, this is unlikely to be the case for a number of reasons.

Whilst small data may be limited in volume and velocity, they have a long history of development across science, state agencies, non-governmental organizations and business, with established methodologies and modes of analysis, and a record of producing meaningful answers. Small data studies can be much more finely tailored to answer specific research questions and to explore in detail and in-depth the varied, contextual, rational and irrational ways in which people interact and make sense of the world, and how processes work. Small data can focus on specific cases and tell individual, nuanced and contextual stories.

Big data is often being repurposed to try and answer questions for which it was never designed. For example, geotagged Twitter data have not been produced to provide answers with respect to the geographical concentration of language groups in a city and the processes driving such spatial autocorrelation. We should perhaps not be surprised then that it only provides a surface snapshot, albeit an interesting snapshot, rather than deep penetrating insights into the geographies of race, language, agglomeration and segregation in particular locales. Moreover, big data might seek to be exhaustive, but as with all data they are both a representation and a sample. What data are captured is shaped by: the field of view/sampling frame (where data capture devices are deployed and what their settings/parameters are; who uses a space or media, e.g., who belongs to Facebook); the technology and platform used (different surveys, sensors, lens, textual prompts, layout, etc. all produce variances and biases in what data are generated); the context in which data are generated (unfolding events mean data are always situated with respect to circumstance); the data ontology employed (how the data are calibrated and classified); and the regulatory environment with respect to privacy, data protection and security (Kitchin, 2013, 2014a). Further, big data generally capture what is easy to ensnare – data that are openly expressed (what is typed, swiped, scanned, sensed, etc.; people’s actions and behaviours; the movement of things) – as well as data that are the ‘exhaust’, a by-product, of the primary task/output.

Small data studies then mine gold from working a narrow seam, whereas big data studies seek to extract nuggets through open-pit mining, scooping up and sieving huge tracts of land. These two approaches of narrow versus open mining have consequences with respect to data quality, fidelity and lineage. Given the limited sample sizes of small data, data quality – how clean (error and gap free), objective (bias free) and consistent (few discrepancies) the data are; veracity – the authenticity of the data and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to; and lineage – documentation that establishes provenance and fit for use; are of paramount importance (Lauriault, 2012). In contrast, it has been argued by some that big data studies do not need the same standards of data quality, veracity and lineage because the exhaustive nature of the dataset removes sampling biases and more than compensates for any errors or gaps or inconsistencies in the data or weakness in fidelity (Mayer-Schonberger and Cukier, 2013). The argument for such a view is that ‘with less error from sampling we can accept more measurement error’ (p.13) and ‘tolerate inexactitude’ (p. 16).

Nonetheless, the warning ‘garbage in, garbage out’ still holds. The data can be biased due to the demographic being sampled (e.g., not everybody uses Twitter) or the data might be gamed or faked through false accounts or hacking (e.g., there are hundreds of thousands of fake Twitter accounts seeking to influence trending and direct clickstream trails) (Bollier, 2010; Crampton et al., 2012). Moreover, the technologies being used and their working parameters can affect the nature of the data. For example, which posts on social media are most read or shared is strongly affected by ranking algorithms, not simply interest (Baym, 2013). Similarly, APIs structure what data are extracted, for example, in Twitter only capturing specific hashtags associated with an event rather than all relevant tweets (Bruns, 2013), with González-Bailón et al. (2012) finding that different methods of accessing Twitter data – search APIs versus streaming APIs – produced quite different sets of results. As a consequence, there is no guarantee that two teams of researchers attempting to gather the same data at the same time will end up with identical datasets (Bruns, 2013). Further, the choice of metadata and variables that are being generated, and which ones are being ignored, paints a particular picture (Graham, 2012). With respect to fidelity there are question marks as to the extent to which social media posts really represent people’s views and the faith that should be placed in them. Manovich (2011: 6) warns that ‘[p]eoples’ posts, tweets, uploaded photographs, comments, and other types of online participation are not transparent windows into their selves; instead, they are often carefully curated and systematically managed’.
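The point that two teams sampling the ‘same’ stream can end up with different datasets can be demonstrated with a toy simulation. This is purely illustrative: it does not touch any real Twitter API, and the stream, hashtags and sampling rate are all invented for the sketch.

```python
import random
from collections import Counter

random.seed(0)

# Toy stream: 100,000 'tweets', each carrying one hashtag drawn from a
# skewed (Zipf-like) popularity distribution, as hashtag use tends to be.
hashtags = [f"#tag{i}" for i in range(50)]
weights = [1 / (i + 1) for i in range(50)]
stream = random.choices(hashtags, weights=weights, k=100_000)

def sample_top(stream, rate, seed):
    """Simulate a rate-limited endpoint returning an independent random sample."""
    rng = random.Random(seed)
    sample = [t for t in stream if rng.random() < rate]
    return [tag for tag, _ in Counter(sample).most_common(10)]

# Two research teams draw the same nominal 1% sample at the same time.
team_a = sample_top(stream, 0.01, seed=1)
team_b = sample_top(stream, 0.01, seed=2)
print("Team A top 10:", team_a)
print("Team B top 10:", team_b)
print("Rankings identical?", team_a == team_b)
```

The dominant hashtags are stable across both draws, but the lower ranks of the two top-ten lists will generally disagree: at a 1% sampling rate the counts for mid-popularity tags differ by more than the gaps between them.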

There are also issues of access to both small and big data. Small data produced by academia, public institutions, non-governmental organizations and private entities can be restricted in access, limited in use to defined personnel, or available for a fee or under license. Increasingly, however, public institution and academic data are becoming more open. Big data are, with a few exceptions such as satellite imagery and national security and policing, mainly produced by the private sector. Access is usually restricted behind pay walls and proprietary licensing, limited to ensure competitive advantage and to leverage income through their sale or licensing (CIPPIC, 2006). Indeed, it is somewhat of a paradox that only a handful of entities are drowning in the data deluge (boyd and Crawford, 2012) and companies such as mobile phone operators, app developers, social media providers, financial institutions, retail chains, and surveillance and security firms are under no obligations to share freely the data they collect through their operations. In some cases, a limited amount of the data might be made available to researchers or the public through Application Programming Interfaces (APIs). For example, Twitter allows a few companies to access its firehose (stream of data) for a fee for commercial purposes (and have the latitude to dictate terms with respect to what can be done with such data), but with a handful of exceptions researchers are restricted to a ‘gardenhose’ (c. 10 percent of public tweets), a ‘spritzer’ (c. one percent of public tweets), or to different subsets of content (‘white-listed’ accounts), with private and protected tweets excluded in all cases (boyd and Crawford, 2012). The worry is that the insights that privately owned and commercially sold big data can provide will be limited to a privileged set of academic researchers whose findings cannot be replicated or validated (Lazer et al., 2009).

Given the relative strengths and limitations of big and small data it is fair to say that small data studies will continue to be an important element of the research landscape, despite the benefits that might accrue from using big data such as social media data. However, it should be noted that small data studies will increasingly come under pressure to utilize the new archiving technologies, being scaled-up within digital data infrastructures in order that they are preserved for future generations, become accessible to re-use and combination with other small and big data, and more value and insight can be extracted from them through the application of big data analytics.

Rob Kitchin