ACROSS THE DECADES - 40 years of data archiving

Acquisition of data

Pam Huntsman and Margaret Ward

At the first meeting of the council for the newly-formed SSRC Data Bank, held on 8 February 1968, the council urged that contact be made with "industrial firms" who were seen as "an important source of data", and recommended that the "data collection should not be restricted to materials relating to individuals and that data relating to firms should be included".

Acting on this, much effort in the Archive's early years went into acquiring opinion poll data generated by leading market research companies - especially National Opinion Polls (NOP), Market and Opinion Research International (MORI), and Gallup.

Although several studies were being worked on simultaneously, formally the first dataset to be acquired by the Archive and made available to others was Village Life in Hampshire, 1965, a study undertaken by Mass-Observation on behalf of Hampshire County Council. However, persuading data creators to deposit did not prove an easy task for the Archive, and many of the early datasets acquired were generated by academics working in universities and related research centres. The first catalogue, produced in 1968, listed only six datasets - as David Allen, the research council's link person for the Archive, later commented:

"The cupboard [of studies] was embarrassingly almost bare when the time came for Mother Hubbard in the shape of the SSRC to peer in at the end of the initial ... five years".

In the late 1970s a significant change came about with the proactive acquisition of government-generated data - especially large-scale, nationally-representative, repeated surveys. The first breakthrough came with the aggregated small area statistics for the 1971 census, which, as Ivor Crewe, director at the time, remembers, was a "mark of government recognition and approval" and was in part a result of relieving government departments of "the burden of providing data to bona fide users". This reinforced the Archive's position as a conduit between data creator and data user.

This beginning was followed by the acquisition of the General Household Survey, the Labour Force Survey, and the Family Expenditure Survey, together with a growing range of other, smaller, government surveys. With these acquisitions, usage both increased and widened, reflecting the broad spectrum of research issues which could now be analysed. Government-generated data are now a critically important part of the Archive's collection, especially given the availability of series of survey data collected over a period of more than 20 years.

In developing its current acquisition strategy the Archive is informed by the ESRC-led National Data Strategy and works in close collaboration with The National Archives (TNA) in order to put in place a coherent joined-up policy to ensure that key materials for the understanding of society are available for research, now and in the future.

A critical partner in achieving this goal is the UK Office for National Statistics (ONS), with whom the Archive now has a contract and concordat, the result of a close and long-term working relationship. The strength of this relationship is reflected in the trust and confidence placed in the Archive as the custodian of data for which the creators may have concerns about satisfying confidentiality promises made to respondents, upholding intellectual property rights, and adhering to data protection requirements.

"In the early days ... a £1 was transferred between the depositor and the University of Essex... When I joined the Archive many years later, the transfer of the pound was no more but when a study had been processed and released for analysis, we would send a copy of the Agreement back to the depositor and affix a 10p stamp - clearly some tough negotiations had taken place to reduce the sum by 90p."

Susan Cadogan
Senior Acquisitions Officer, UKDA

Just as the last decade has witnessed heightened user demand for research data of increasing detail, so concerns over the protection of anonymity have grown. Striving to maintain a balance between these competing pressures, the Archive has recently developed, in conjunction with ONS, a framework for access to more sensitive data, available under Special Licence, which allows registered users access to more detailed and potentially disclosive data under stricter conditions of use.

Another important source of data for social science research has been that created by the academic community, for the academic community. In this respect, over the past 40 years the SSRC/ESRC has itself been a significant sponsor of data creation projects. Major surveys in this category include: the , the British Household Panel Survey, the Millennium Cohort Study, as well as the UK contribution to the European Social Survey. The latest in this formidable line of data investments is the UK Longitudinal Household Study, which, with a sample of some 40,000 households, will be the largest survey of its kind in Europe.

The Medical Research Council (MRC) has also contributed to the collection with important data such as the National Diet and Nutrition Survey.

In recent years the ESRC has invested in the licensing of data in addition to the funding of data collection. This policy has enabled data access agreements to be brokered for data collections which would otherwise have been inaccessible to most researchers due to commercial constraints. This trend was started with ESDS providing access to important international macro data series (aggregated data such as those held by the Organisation for Economic Co-operation and Development (OECD), International Monetary Fund (IMF) and World Bank) via its partnership with MIMAS, and has more recently been extended to micro data (individual-level data) such as the LatinoBarometers.

The Archive changed its name from the 'Survey' to the 'Data' Archive back in 1982 in order to reflect the fact that its remit had widened to all machine-readable data in the social sciences rather than just survey data. Although many still think of the Archive's collection as consisting mostly (if not entirely) of numerical survey-orientated databases, the collection has become more and more diverse over the years in terms of theme and file type.

This is reflected most strongly in the creation of the specialist History Data Service in 1992 and the merger with Qualidata in 2001. Both have expanded the collection to contain non-numeric, textual, image and mixed methods collections.

These changes reflect the evolving nature of the Archive's acquisition and collections development policy, which needs to be continuously refashioned to keep pace with the changing data requirements of the social science research community. Web sites are already being archived for future use; the challenge now is preparing for future generations of data - podcasts, blogs, CCTV.