Thesaurus of British Surnames

A Proposed National Project

Introduction - Objectives of the initial meeting - Project Objectives - Resource Requirements - Project Ownership - Acceptance Criteria - Project Steering Group

On 4 December 2001 a meeting, chaired by Dr Ian Galbraith, CEO of Origins.net, was held at the Public Record Office to discuss a possible British Surname Thesaurus project. Among those present at the meeting were representatives of the Federation of Family History Societies, FreeBMD, Genealogical Society of Utah, Institute of Heraldry and Genealogical Research, National Archives of Scotland, PRO, Scottish Association of Family History Societies, Society of Genealogists, UK Data Archive (University of Essex). What follows is an updated version of a paper originally distributed by Dr Galbraith prior to the meeting.

Introduction

A major problem for users of genealogical databases is that of variant spellings of surnames. "Soundex" offers a little help, but is a technique poorly adapted to surnames. Various databases offer some degree of surname grouping, such that a search on one surname will return other "possibles", but no database offers anything like comprehensive coverage of surname variants, and with more large datasets being generated (eg the NBI) the problem is getting worse.
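As a rough illustration of why Soundex gives only limited help, the sketch below implements the standard American Soundex coding. The example surnames are chosen for illustration and are not taken from the project's data.

```python
# Standard American Soundex: keep the first letter, then encode the
# following consonants as digits, padding or truncating to four characters.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and uncoded letters give ""
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]

# Variants that share a first letter are matched ...
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
# ... but variants that differ in the first letter are not.
print(soundex("Knight"), soundex("Night"))   # K523 N230
```

Because the first letter is kept as-is, any pair of variants that differ in their initial letter receives different codes, however similar they are otherwise.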

The proposal is for an on-going project to develop and manage an online thesaurus of British surname variants. By "British" we mean surnames in general use in the British Isles over the last five hundred years or so, not necessarily originating in the British Isles. Because of their complexity and relatively infrequent use in written documentation, it is proposed that Gaelic (Scots and Irish), Welsh and other Celtic names be included only via their anglicised equivalents, and that native spellings be specifically excluded. However, it is suggested that we should not be too pedantic about what is included and what is not.

The intent is that the thesaurus should become a public resource, able to be searched independently and also to be incorporated directly into search engines. To get the project going the tasks to be undertaken must be defined, resources identified and assigned to the creation and long-term maintenance of the thesaurus.

The project is not intended to consider the origins, etc, of surnames: it is strictly concerned with variant spellings and their use in identifying records of potential interest to the genealogist and family historian. The objective is specifically to create an aid to searching, not to provide an authoritative list of "true" surname variants. So when used in a database search, searching on one surname from a particular group of surnames would result in retrieval of all records containing other surnames in that group; this would not mean that all such records refer to connected individuals, only that there might be a connection.
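The search behaviour described above might be sketched as follows. This is a minimal illustration only: the surname groups, the record layout and the function names are all hypothetical, and nothing here reflects an actual thesaurus format.

```python
# Hypothetical surname groups; the spellings are illustrative, not project data.
GROUPS = [
    {"SMITH", "SMYTH", "SMYTHE"},
    {"CLARK", "CLARKE", "CLERK"},
]

def build_index(groups):
    """Map every surname to the full set of surnames in its group."""
    index = {}
    for group in groups:
        for name in group:
            index[name] = group
    return index

def expand(surname, index):
    """Return all spellings in the same group, or just the name itself."""
    key = surname.strip().upper()
    return sorted(index.get(key, {key}))

def search(surname, records, index):
    """Retrieve records whose surname is any member of the query's group."""
    wanted = set(expand(surname, index))
    return [r for r in records if r["surname"].strip().upper() in wanted]

index = build_index(GROUPS)
records = [{"id": 1, "surname": "Smythe"}, {"id": 2, "surname": "Clark"}]
print(expand("Smyth", index))
print(search("smith", records, index))
```

A hit returned this way is only a candidate: membership of the same group says that the spellings are worth examining together, not that the individuals are related.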

The project should ultimately provide a suitable infrastructure – technical and non-technical – to allow new datasets to be processed so as to capture new surname variants, and so ensure that the thesaurus is kept up to date.

As a master surname database grows, it will become practicable to compare "new" surname lists against it to identify surnames which do not appear in the master database. Such surnames may either be members of a new surname group or, more likely, are mis-transcriptions of other surnames. Thus the thesaurus can become a powerful tool for checking surname lists for accuracy. (A surname variant which occurs many times in different datasets is likely to be a "genuine" surname variant, but a surname spelling which occurs once only is more likely to indicate a transcription error.)
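A minimal sketch of this check, under stated assumptions: the master database is modelled as a plain set of upper-case spellings, each contributed dataset as a list of surnames, and the threshold of two datasets for treating a spelling as a probable variant is an arbitrary illustrative choice, not a project rule.

```python
from collections import Counter

def normalise(name):
    return name.strip().upper()

def unknown_surnames(new_names, master):
    """Return spellings in new_names that are absent from the master set."""
    return sorted({normalise(n) for n in new_names} - master)

def dataset_frequency(datasets):
    """Count, for each spelling, how many distinct datasets contain it."""
    counts = Counter()
    for names in datasets.values():
        counts.update({normalise(n) for n in names})
    return counts

def triage(unknown, counts, min_datasets=2):
    """Split unknown spellings into likely variants and possible errors.

    min_datasets is a hypothetical threshold chosen for illustration.
    """
    likely_variants = [n for n in unknown if counts[n] >= min_datasets]
    possible_errors = [n for n in unknown if counts[n] < min_datasets]
    return likely_variants, possible_errors

master = {"SMITH", "SMYTH"}
datasets = {
    "dataset_a": ["Smith", "Smythe", "Smiht"],
    "dataset_b": ["smythe", "Smyth"],
}
unknown = unknown_surnames(datasets["dataset_a"], master)
print(triage(unknown, dataset_frequency(datasets)))
```

Spellings in the second list are only candidates for checking against the source; a rare but genuine surname would also land there.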

One aspect of the project will be the definition of meta structures and associated algorithms for lexical analysis of surnames, such that they may be automatically assigned to surname groups. But inevitably, much manual effort will be required to deal with surnames which cannot be automatically assigned to groups. We expect that substantial manual effort will be required in the early stages of the project, but that as the thesaurus grows the need for such effort will reduce rapidly.
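The paper does not define these meta structures or algorithms. Purely as an illustration of the split between automatic assignment and manual review, the sketch below substitutes a generic string-similarity measure (Python's `difflib.SequenceMatcher`). The group labels, the 0.8 threshold and the function names are assumptions made for the example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Generic string similarity in [0, 1]; a stand-in for lexical analysis."""
    return SequenceMatcher(None, a, b).ratio()

def assign(surname, groups, threshold=0.8):
    """Return the label of the best-matching group, or None if no member
    of any group is similar enough (threshold is an illustrative value)."""
    name = surname.strip().upper()
    best_label, best_score = None, 0.0
    for label, members in groups.items():
        score = max((similarity(name, m) for m in members), default=0.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None

def process(surnames, groups):
    """Assign what can be assigned; queue the rest for manual review."""
    assigned, manual_queue = {}, []
    for s in surnames:
        label = assign(s, groups)
        if label is None:
            manual_queue.append(s)
        else:
            assigned[s] = label
    return assigned, manual_queue

groups = {"SMITH": {"SMITH", "SMYTH"}, "CLARK": {"CLARK", "CLARKE"}}
print(process(["Smythe", "Galbraith"], groups))
```

As groups gain more member spellings, more incoming surnames fall within the threshold of some existing member, which is one way the manual queue could shrink as the thesaurus grows.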

Objectives of the initial meeting

Project Objectives

Resource Requirements

Project Ownership

It is desirable that the project be run under the aegis of an independent organisation which (a) has a real interest in the project; (b) is directly involved in work related to the project and has standing within Britain for such work; (c) can ensure the permanence of the project; (d) can allocate some tangible resources to the project on an on-going basis.

As custodian of historic British datasets, the UK Data Archive at the University of Essex’s Department of History could be an appropriate "owner" of the project. Professor Kevin Schürer, Director of the UK Data Archive, is agreeable to this.

Acceptance Criteria

While virtually all surname datasets contain errors, it is important that, so far as practically possible, surname datasets contributed to the project should be of known provenance. We would not expect to accept data submitted by the general public. While we might prefer to accept only datasets of known quality (ie accuracy relative to the source), heavily used datasets should be accepted even if their quality is uncertain (eg the IGI). Specific datasets which we would expect to be submitted and accepted would include, for example (surnames extracted from):

It is estimated that the above datasets may contain around a million different surnames.

Project Steering Group

This should consist of a small number of people with complementary skills and experience who would meet regularly to define the initial project plan, allocate tasks to the available resources, seek additional resources as required, monitor progress against plan, and prepare and distribute regular progress reports to all interested parties (possibly these could be posted on an open web site). Members of the Steering Group were appointed at the PRO meeting.

The project manager would, of course, be a member of this group, but the steering group should have a separate chairman.

Ian Galbraith, Origins.net, 16 December 2001
[edited for the web, Peter Christian, 31 July 2002]
