Data discovery and dissemination
Talk about revolution
"The SSRC Data Bank...is claimed to represent a revolution in
information retrieval techniques" (Times Educational Supplement, 1 March 1968).
Being formed in 1967 - the year of revolution - the Archive obviously predates
the world of the web, e-science and cyber-infrastructures, and even email for
that matter, by decades. Consequently, in terms of the Archive's history, it is
only fairly recently that most researchers and users have come to interact with
the service wholly through the internet. Web-based
delivery has revolutionised the entire industry of data-based services and
changed them beyond recognition, but, for a service born 40 years ago, access
to resources and the dissemination of those resources have not always been so
straightforward.
It was not until May 1984 that the UKDA launched a service for the exchange of
material on floppy disks. Prior to that, data were supplied on tape as the
'hard' medium of choice and occasionally via telnet in what might be thought of
as the e-medium of its day! Even then, supply via telnet (and later ftp)
required technical expertise on the part of the remote user. Indeed, it was
not unusual for users to require the assistance of colleagues in computing
services to 'pull' data across the network using telnet protocols. But even
with these technical obstacles in place, dissemination via telnet remained a
faster and cheaper option than copying data to a hard medium and then
distributing the material by post.
In the early 1990s, in collaboration with the Archive, MIDAS (as it was then)
at Manchester began to host remote access to a selected number of large-scale
government surveys. Rather than requesting copies of the data to be sent from
Essex, researchers could log on to the national computing service at Manchester
and conduct their analyses using standard software such as Scientific
Information Retrieval (SIR) or Statistical Package for the Social Sciences (SPSS).
The early 1990s was also when the Archive started to supply data on CD - which
it described in a release to users in 1991 as something "which can be played on
most desktop computers, ... providing an excellent (and inexpensive) vehicle for
distributing large amounts of stable data".
Spinning a web
A significant turning point, of course, was the introduction of the web. The
Archive developed a web-based presence fairly early, launching its first
web site in 1994. Initially, like a lot of web services at the time, it was
limited in scope, essentially allowing users to search the holdings using the
Bibliographic Information Retrieval ONline (BIRON) catalogue and, for
the first time, place orders online. At this time, all users were still
required to print off and sign a user undertaking outlining their
responsibilities as a user each time they placed an order.
The arrival of the new millennium saw further technological advances and in
November 2000 a new suite of web pages was launched. Users were now able to
register with the Archive online - paper registration forms, processed manually
and subject to postal delay/misdirection, became a thing of the past - removing
a significant barrier to use. Requests for data became instantaneous and users
were (and still are) able to order data using a 'shopping-basket' facility.
A year later users were not only able to request data instantaneously, but
could browse and download selected data at the click of a mouse. Nesstar, the
internet-based software, permitted users to browse, visualise and undertake
exploratory analyses on survey data. Now the Nesstar Catalogue has been linked
to the Council of European Social Science Data Archives (CESSDA) data portal,
allowing UK-based researchers to locate, access and browse data from
several European countries through a common interface.
2001 also saw the launch of UKDA Download - an online data download facility
for registered users, offering data in SPSS, Stata, tab-delimited and Rich Text
Format (RTF) formats. This download facility remains a core part of the user
experience today with all new studies, whether numeric, textual or
multi-format, routinely prepared for instant download, and all back catalogue
datasets prepared for download as and when they are ordered. Restricted
datasets can also be disseminated with full security controls - a significant
improvement over the previous situation where restricted datasets would have to
be delivered via ftp or offline media.
In 2003, an online browsing system for qualitative data called ESDS Qualidata
Online was developed in order to complement the Nesstar system for survey-type
data. This allows free-text interview transcripts to be searched in a
structured manner and also allows related qualitative research materials (such
as audio files and images) to be embedded for download.
"In the year 2005/06 the overall number of datasets delivered to users rose by
nearly 20 per cent over the previous year, from 41,134 to 49,169" (ESDS Annual
Report, 2005-2006).
Together with the introduction of these web-based technologies which speed the
process of delivering data to researchers significantly, perhaps one of the
greatest achievements in making access to data quicker and easier has been the
introduction of the one-stop registration service. First introduced in 2002 and
subsequently modified to include Athens authentication, this allows
researchers and users to register once and agree standard terms and conditions of
use online and then access as many data collections as they wish. This is a
huge advance from the days when researchers had to complete, sign and send a
data request form for every dataset required.
It's there somewhere...
Providing information about the content of the data collection held by the
Archive has always been a high priority. In the early days of the Archive it
was thought that the main finding aid should be in the form of a comprehensive
(paper-based, of course) cross-referenced list of questions derived from
surveys. This was to include questions asked in all social surveys, including
those not held by or accessible from the Archive. The KWIC (keyword-in-context)
project, as it was known, was hugely ambitious for its time and proved to be
almost too burdensome to create and cumbersome to use. As David Allen later
commented: "Few who consulted KWIC discovered what they were hoping for, more
were confused by it and even found it a source of merriment ('colour', for
example, led one not to questions on racial problems but to a survey on
paint)".
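For readers unfamiliar with the technique, the general idea of a keyword-in-context index can be sketched in a few lines of Python. This is only an illustration of the generic method, not a reconstruction of the Archive's KWIC project: the function names, the stopword list and the two sample questions are all invented for the purpose. Each significant word becomes an index entry, shown with the text on either side of it, and the entries are sorted by keyword.

```python
import re

# Hypothetical survey questions, invented purely for illustration.
QUESTIONS = [
    "Which colour of paint do you prefer for your kitchen?",
    "Has the colour of a person's skin affected how they were treated?",
]

# A small, assumed stopword list; a real index would use a longer one.
STOPWORDS = frozenset({"a", "an", "and", "do", "for", "has", "how",
                       "of", "the", "they", "were", "which", "you", "your"})


def kwic_index(texts, stopwords=STOPWORDS, width=30):
    """Return (keyword, left context, word, right context) tuples sorted by keyword."""
    entries = []
    for text in texts:
        for match in re.finditer(r"[A-Za-z']+", text):
            keyword = match.group().lower()
            if keyword in stopwords:
                continue  # skip words too common to be useful index terms
            left = text[:match.start()][-width:]   # up to `width` chars before the word
            right = text[match.end():][:width]     # up to `width` chars after the word
            entries.append((keyword, left, match.group(), right))
    entries.sort(key=lambda entry: entry[0])       # alphabetical by keyword
    return entries


def format_index(entries, width=30):
    """Right-align the left context so that keywords line up in one column."""
    return [f"{left:>{width}}{word}{right}" for _, left, word, right in entries]


if __name__ == "__main__":
    for line in format_index(kwic_index(QUESTIONS)):
        print(line)
```

In this sketch both sample questions are filed under 'colour', one about paint and one about skin, which is the kind of ambiguity described in the comment quoted above: the index matches words, not meanings.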
KWIC was relatively short-lived and replaced by a more conventional catalogue
or inventory of data holdings, updated on a regular basis. The first of these
was produced in 1969 and is remembered with due reverence by Eric Roughley
(Deputy Director of the Archive from 1967 until his retirement in 1992):
"With due ceremony, the Computing Service's printer was cleaned and prepared, a
new ribbon inserted and, in a most stately fashion, a wodge of print-out would
be carried over to the Computing Centre where we gathered around to observe the
new catalogue being printed off. The whole exercise was conducted in a manner
more like a masonic ritual - the operator even wore white gloves! The pristine
product was then distributed to our subscribers (six) who paid £3...for the
privilege".
Later editions of the catalogue were printed, bound
and published - the last such publication of this kind being a weighty
two-volume tome.
An important change was implemented in 1981 when the Archive formally adopted a
standard study description for survey data developed collectively by a group of
European social science data archives. Common description has subsequently
facilitated exchanges of information between archives and cross-archive
information retrieval. In January 1986 the UKDA launched its first computerised
bibliographic retrieval system, known as Bibliographic Information Retrieval
ONline (BIRON), developed under the leadership of Bridget Winstanley.
A thesaurus of terms used for the indexing of its humanities and social science
datasets, known as the Humanities and Social Science Electronic Thesaurus (HASSET),
had also been developed, building on the 1971 UNESCO thesaurus. Still under
continuous development and renewal, this has subsequently been implemented in a
number of information retrieval systems around the world, and in a condensed
form (the European Language Social Science Thesaurus (ELSST)) has been
translated into eight languages to date, with others planned. Alongside BIRON,
in 1995 the Archive released its first web-enabled catalogue that allowed
cross-searching of other European archives' collection descriptions. This
resource was one of the first web portals found in the social sciences. The
UKDA eventually said goodbye, sadly, to BIRON in December 2002 as advances in
technology enabled much faster, and more flexible, access to the catalogue via
the Archive's web site.