

GIS Guide to Good Practice
Section 5: Documenting your GIS Data set

5.3 Information to be recorded

It is generally a good idea to start recording information about your data as early as possible, and ideally you should begin recording as soon as you start using or creating the data. If you wait until just before depositing with ADS to start creating metadata and documentation, it will be difficult for you to provide some pieces of information at all, and far harder to write most of the rest than it would have been at the time you were actually doing it.

Assuming that you choose to record relevant details as you go along, it might be useful for you to start a formal log book of some kind. This way, it will be easier to find information later, rather than having to rifle through various old envelopes, scrap paper, and whatever else you scribbled on at the time.

Within this log book, it is normal to record such general details as the software you are using, its version, and the type of computer (PC, Macintosh, Sun workstation, etc. rather than Dell, Viglen, Compaq, etc.) you are running it on. As time passes, people discover problems with earlier versions of software. If someone finds out that SuperGIS version 23.7 displaced all green lines on maps by 3mm, it is undoubtedly useful to be able to look back through the log book and find that all your maps displaying public rights of way were created three years ago using SuperGIS 23.7. Knowing that there is a problem, and having adequate documentation, you can set about fixing it retrospectively.

5.3.1 Sources of data

Information about where the data you use come from is one of the most important things you can record whilst constructing and using a GIS.

Data are acquired from numerous sources, including Ordnance Survey and other mapping agencies, local authorities, special interest groups, etc., and are gathered and displayed at a wide variety of scales or resolutions.

Each of these sources is of value for a different set of purposes, and each brings with it a different set of problems; data acquired at 1:50,000 scale, for example, may be ideally suited for plotting maps of artefact distributions, but wholly unsuitable for recording the layout of individual excavation trenches (1 centimetre on a 1:50,000 map, after all, is equivalent to 50,000 centimetres, or 500 metres, on the ground).
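The arithmetic in the parenthesis above can be checked with a few lines of Python. This is only a sketch: the function name ground_distance_m is invented for illustration and does not come from any GIS package.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: int) -> float:
    """Convert a distance measured on a map (in cm) to metres on the ground."""
    ground_cm = map_distance_cm * scale_denominator
    return ground_cm / 100.0  # 100 centimetres per metre


# 1 cm on a 1:50,000 map, the example used in the text
print(ground_distance_m(1, 50_000))  # 500.0

# The same centimetre on a 1:100 excavation plan
print(ground_distance_m(1, 100))     # 1.0
```

A pencil line half a millimetre wide on the 1:50,000 map therefore covers 25 metres on the ground, which is why such data cannot sensibly record trench layouts.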

In order to aid the user in deciding how best to incorporate your data within their own work, it is desirable to provide them with information such as the scale or resolution of the original survey, the scale or resolution at which that survey was digitised into the computer, assumed errors from the data capture process (often expressed as a Root Mean Square, or RMS, error on printed maps), and the method by which the data were originally acquired. Although both may ultimately be plotted at a scale of 1:100, a user will presumably be interested to know that one topographic data set was constructed by survey with measuring tapes and dumpy level, whilst the other is the result of a detailed survey by state-of-the-art Total Station Theodolite.
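As a rough illustration of what an RMS error figure summarises, the sketch below computes one from a set of residuals. It assumes, as one common approach rather than anything prescribed by this Guide, that the error is assessed by comparing the digitised positions of control points with their known coordinates. The function name and the residual values are hypothetical.

```python
import math


def rms_error(residuals):
    """Root Mean Square error of (dx, dy) residuals, in the residuals' own units."""
    if not residuals:
        raise ValueError("at least one control point is required")
    # Squared distance between the digitised and the known position of each point
    squared = [dx * dx + dy * dy for dx, dy in residuals]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical residuals in metres for three control points
control_point_residuals = [(3.0, 4.0), (-3.0, 4.0), (0.0, 5.0)]
print(rms_error(control_point_residuals))  # 5.0
```

Whatever method is actually used, recording the figure together with its units and the number of control points makes it far more useful to a later user.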

Ownership of data is also an important attribute to record about any data set, and may well prove quite complex. Data owned by the Ordnance Survey, for example, might be used by North Yorkshire County Council to derive a new data set, 'owned' by the County Council. This, in turn, is used by York Archaeological Trust to derive a new data set, now 'owned' by them. Although little, if any, of the original Ordnance Survey resource may survive in this latest incarnation of the data, Ordnance Survey in reality continue to hold intellectual property rights which should be recognised and which may well affect the ease with which, for example, York Archaeological Trust could later legally sell 'their' data to Yorkshire Water.

Complicated data trails such as this are extremely common with digital data, and it makes life easier for everyone if the evolution of every data set is tracked through every reincarnation.

In short, then, a non–exhaustive list of the information you might wish to record during your everyday creation, collection, and use of data includes:

  • Computer hardware used
  • Computer software used
  • Date the data were captured, purchased, or otherwise acquired
  • Who did the work
  • Data source ('bought from Ordnance Survey', etc.)
  • Scale/resolution of data capture
  • Scale/resolution at which data are currently stored
  • Root Mean Square error or other assessments of data quality
  • Purpose of data set creation, where known
  • Method of original data capture (Total Station Survey, etc.)
  • Purpose for which you acquired the data (might differ from the previous information where the data were created by someone else for one purpose, and bought from them by you for another)
  • Complete history of data ownership/rights.
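For those who prefer to keep the log book electronically, the list above might be sketched as a simple structured record. This is a minimal illustration, not a recommended schema: the class name, the field names, and every value below are invented for the example (SuperGIS is the fictional package used earlier in this section).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SourceRecord:
    """One log book entry describing where a data set came from."""
    hardware: str
    software: str
    date_acquired: str
    worker: str
    data_source: str
    capture_scale: str
    storage_scale: str
    quality: str
    creation_purpose: str
    capture_method: str
    acquisition_purpose: str
    ownership_history: List[str] = field(default_factory=list)


# Hypothetical entry; none of these values describe a real data set
record = SourceRecord(
    hardware="PC",
    software="SuperGIS 23.7",
    date_acquired="1998-03-14",
    worker="A. N. Other",
    data_source="bought from Ordnance Survey",
    capture_scale="1:50,000",
    storage_scale="1:50,000",
    quality="RMS error not supplied",
    creation_purpose="unknown",
    capture_method="digitised from printed map",
    acquisition_purpose="plotting artefact distributions",
    ownership_history=["Ordnance Survey", "North Yorkshire County Council"],
)

# Plain text output is easy to store alongside the data themselves
print(json.dumps(asdict(record), indent=2))
```

Keeping the ownership history as an ordered list means each new derivation of the data simply appends another owner, preserving the whole trail.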

5.3.2 Processes applied

As well as recording information such as that suggested above, most of which will probably only need recording once when you start work with a data set, it is also extremely valuable to log the manner in which data are manipulated and modified. Not only does this allow you to keep track of (and back-track from, if necessary) changes you make to the data, but it also allows you and others to work out how data you lifted from your local Sites & Monuments Record (SMR), for example, and incorporated into your own GIS differ from those same records still residing in the SMR. How many records have you enhanced? For how many have you had to re-enter the grid references, as you discovered that those provided by the SMR actually placed sites in the North Sea?

The sorts of information you may wish to consider logging for these purposes include:

  • The date of any change/modification
  • The reason for any change/modification
  • The record numbers affected by the change
  • Relationships to other resources; where, for example, you derive a new GIS data set by applying a mathematical filter or some other modification to an existing data set, you may wish to formally record the relationship between the original data and the new set

Where you edit an existing data set to correct spelling in text fields, or some similar operation, it makes more sense to simply record this as 'Corrected spelling throughout data set' and give the numbers of those records altered if relevant, rather than to list every single correction made to every single record. For processes such as converting an elevation matrix to a Triangulated Irregular Network (TIN) or an equally drastic data set-wide modification, it is worth recording the parameters you used in undertaking this process so that you — and others — may repeat or undo it in the future.
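The two kinds of entry described above, a brief summary for routine edits and full parameters for drastic transformations, might be logged along the following lines. This is a sketch under stated assumptions: the class, the field names, the file names, and the TIN parameters are all hypothetical and do not correspond to any particular GIS package.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ProcessLogEntry:
    """One change or modification applied to a data set."""
    date: str
    reason: str
    records_affected: List[str] = field(default_factory=list)
    derived_from: str = ""  # the original data set, for derived resources
    parameters: Dict[str, Any] = field(default_factory=dict)


def to_log_line(entry: ProcessLogEntry) -> str:
    """Serialise an entry as one line of JSON, suitable for appending to a log file."""
    return json.dumps(asdict(entry), sort_keys=True)


# A routine edit: summarise it rather than listing every correction
spelling = ProcessLogEntry(
    date="1998-04-02",
    reason="Corrected spelling throughout data set",
    records_affected=["SMR 1021", "SMR 1187"],
)

# A drastic transformation: keep the parameters so it can be repeated or undone
tin = ProcessLogEntry(
    date="1998-04-09",
    reason="Converted elevation matrix to TIN",
    derived_from="elevation_matrix_v1",
    parameters={"tolerance_m": 0.5, "max_points": 20000},
)

for entry in (spelling, tin):
    print(to_log_line(entry))
```

One line per entry keeps the log easy to append to and easy to search later, whichever software is eventually used to read it.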



Archaeology Data Service
© Mark Gillings, Peter Halls, Gary Lock, Paul Miller, Greg Phillips, Nick Ryan, David Wheatley, and Alicia Wise 1998

The right of Mark Gillings, Peter Halls, Gary Lock, Paul Miller, Greg Phillips, Nick Ryan, David Wheatley, and Alicia Wise to be identified as the Authors of this Work has been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or part of any of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service (info@ahds.ac.uk).

Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.

Arts and Humanities Data Service