ACROSS THE DECADES - 40 years of data archiving

The future


"Prediction", observed Niels Bohr, the Danish Nobel prize-winning physicist, "is very difficult - especially about the future". As this review of the Archive has demonstrated, change has been an important aspect of its forty-year life - indeed, the last few years have witnessed a number of rapid and sweeping changes. Thus, it would be naïve to suggest that one can predict with any certainty what the next decade, let alone the next forty years, will bring.

However, a number of trends are already underway, and it is probable that several will continue. The data-using communities which the Archive serves will certainly expand. This is not just a matter of growing numbers - which in itself presents major challenges - but also of the growing heterogeneity of the user community. Not only is the Archive serving academic researchers from a widening range of disciplines, but Archive-supplied data are increasingly being used in learning and teaching at all levels. This expansion is mirrored outside the academic sector, driven by the growing volume of material available for online browsing and analysis. In short, the Archive's user community is becoming more diverse, and will continue to do so. This is to be welcomed, but it will also present new challenges, as diversity will inevitably give rise to differing expectations of support and data delivery.

Hand in hand with greater diversity, individual users will in future make greater demands for information drawn from across traditional disciplinary boundaries. This will entail not just a greater mixing of qualitative and quantitative sources, but, hopefully, a breaking down of disciplinary divides as researchers draw on methods and data normally thought to be the preserve of others. In this scenario, the Archive will need to ensure that it provides appropriate bridges and gateways to subject-specific collections held by specialist centres, in particular in the humanities, the natural and environmental sciences, and genetics, the life sciences and epidemiology.

Related to this is the fact that the world of data is becoming increasingly fragmented and distributed. Gone are the days (if they ever existed) when a single data centre could claim to hold all the research data critical even to a specific subject. Thus a growing challenge for the Archive will be to ensure that the research communities it serves can locate and access the data they require, regardless of where they are held. This will require working with data creators, repositories and others to ensure that adequate metadata and finding aids are created, and that joined-up access and authentication procedures are in place.

As researchers increasingly aspire to work within a global data network, where data can be located and moved across international borders transparently, the Archive will need to help break down international barriers wherever possible by creating an interoperable, distributed data infrastructure. A further challenge will be to ensure that data are secure and not subject to misuse, and that trusted mechanisms are in place whereby sensitive data can be used for research without compromising confidentiality, an area of growing concern. This can only work through co-operation and trust.

While an increasingly distributed data landscape poses problems for resource discovery and access, of greater concern is its impact on the sustainability and long-term preservation of data resources. Research projects and other data creators may well have a short-term desire or requirement to disseminate the material they hold, perhaps via institutional repositories, but they rarely have the resources, expertise or longevity to ensure that the data remain accessible and usable in the long term. While institutional repositories will grow in importance and use, the need for centralised repositories for unwieldy or complicated research data, which demand painstaking work to prepare for preservation, will not disappear. Thus, to maximise research investments and to retain data resources for re-use, the Archive will need to play an increasingly important role in managing the data lifecycle and digital preservation. Also envisaged are mixed models in which, in some cases, access is provided in the short term by researchers and institutions within a distributed setting, with the Archive taking on longer-term custodianship as part of a process of informed records management.

Kevin Schürer

All of these future challenges will require tools - for processing, discovery, access, authentication, security, dissemination, analysis and preservation - as well as the standards that underpin them. In the past, out of necessity, the Archive has largely invented or developed such tools itself. In the future, given the magnitude and interdisciplinary nature of the tasks ahead, it will need to work increasingly with others, within the e-Science and cyberinfrastructure communities and beyond.

Developing these tools must also take account of the fact that the nature of data is changing. The traditional boundaries between 'data' and 'publication' are being eroded. A future challenge will thus be to provide seamless access to research resources rather than to data per se, so that users can move easily between research sources and outputs as the gap between the two narrows and blurs.

Quite how far along these various roads the Archive will have travelled within the next forty years is open to speculation, but I am sure the journey will be eventful and exciting!

Professor K Schürer
Director UK Data Archive