GIS Guide to Good Practice
Section 3 - Spatial data types

3.10 Issues to consider when structuring and organising a flexible attribute database

When structuring and organising a flexible attribute database, the following factors are of critical importance. Each is examined in turn in the subsections below.

  • Naming conventions
  • Key fields
  • Character field definitions
  • Grid references
  • Validation
  • Numeric data
  • Data entry control
  • Confidence values
  • Consistency
  • Dates
  • Documentation

3.10.1 Naming conventions

Try to keep field names descriptive rather than cryptic. The crib sheet for decoding cryptic names may easily get lost, and your fields are likely to be too numerous for you to remember their contents easily.

3.10.2 Key fields

Key fields are the most important fields in your attribute database and are the fields that will be used for primary searching of the database and/or for linking tables within your database. It is essential that the same data definitions are used for all instances of the key field in your database and that the same codes are used in each.
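
As a minimal sketch of linking tables through a key field, the example below uses Python's built-in sqlite3 module. The table names (sites, finds), the field names and the sample rows are hypothetical and chosen only for illustration. The key field site_id is defined identically in both tables, and the database is asked to refuse a record whose key has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# The key field site_id has the same name and type in both tables.
conn.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, site_name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE finds ("
    " find_id INTEGER PRIMARY KEY,"
    " site_id INTEGER NOT NULL REFERENCES sites(site_id),"
    " description TEXT)"
)
conn.execute("INSERT INTO sites VALUES (1, 'Monmouth')")
conn.execute("INSERT INTO finds VALUES (10, 1, 'pottery sherd')")

# Link the two tables through the shared key field.
linked = conn.execute(
    "SELECT sites.site_name, finds.description"
    " FROM finds JOIN sites ON finds.site_id = sites.site_id"
).fetchall()

# A find that points at a site code not present in sites is refused.
try:
    conn.execute("INSERT INTO finds VALUES (11, 99, 'coin')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Other database packages express the same idea with their own syntax; the point is only that the key is declared once and reused unchanged wherever tables are linked.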

3.10.3 Character field definitions

Take care with character field definitions. Most databases require character data to be stored in a fixed-length form, which inevitably means that every record must reserve enough space for the longest expected value, even where the vast majority of records do not need it. Size the field to the data you actually hold: there is no point in defining a location name field large enough to store the longest name in Monmouthshire, Llanvihangel-Ystern-Llewern, if Monmouth happens to be the longest name in the data set!
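
One way to choose a sensible width is to measure the data before defining the field. The sketch below assumes the names are already available as a Python list; the list itself is an illustrative sample, not taken from any real data set.

```python
# Illustrative sample of location names held in the data set.
place_names = ["Monmouth", "Usk", "Raglan"]

# The longest value actually present determines the width required.
longest_name = max(place_names, key=len)
required_width = len(longest_name)

print(f"Longest name: {longest_name} ({required_width} characters)")
```

If further records are expected, it may be prudent to allow a modest margin above the measured width rather than sizing the field for the longest name that could conceivably occur.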

3.10.4 Grid references

Store grid references in a notation that allows easy transfer to a GIS or conversion to an appropriate map projection. For example, British National Grid references are commonly held as alphanumeric attributes in a single column, which requires some processing before points can be mapped in a GIS. A more appropriate notation is two numeric columns, one for the easting and one for the northing, e.g. 456344 / 267833 for SP 5634467833.
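
The processing needed to turn an alphanumeric British National Grid reference into two numeric columns can be sketched as follows. The function name bng_to_easting_northing is hypothetical, and the arithmetic is the commonly used letter-grid calculation (two letters identify a 100 km square, the letter I being omitted), not a routine taken from any particular GIS package. The result is the south-west corner of the square that the reference identifies, in metres.

```python
def bng_to_easting_northing(gridref):
    """Convert a reference such as 'SP 5634467833' to (easting, northing) in metres."""
    ref = gridref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError(f"bad grid letters in {gridref!r}")
    if not digits.isdigit() or len(digits) % 2 or len(digits) > 10:
        raise ValueError(f"bad grid digits in {gridref!r}")

    # Number the letters A-Z from 0, skipping I, which the grid does not use.
    first = ord(letters[0]) - ord("A")
    second = ord(letters[1]) - ord("A")
    if first > 7:
        first -= 1
    if second > 7:
        second -= 1

    # Column and row of the 100 km square, counted from the grid origin.
    east_100km = ((first - 2) % 5) * 5 + (second % 5)
    north_100km = (19 - (first // 5) * 5) - (second // 5)
    if not (0 <= east_100km <= 6 and 0 <= north_100km <= 12):
        raise ValueError(f"{letters} is outside the National Grid")

    # Split the digits in half and pad each half to metre precision.
    half = len(digits) // 2
    east_offset = int(digits[:half].ljust(5, "0"))
    north_offset = int(digits[half:].ljust(5, "0"))
    return east_100km * 100000 + east_offset, north_100km * 100000 + north_offset


easting, northing = bng_to_easting_northing("SP 5634467833")
print(easting, northing)  # prints 456344 267833
```

Storing the two returned values in separate numeric columns gives the notation recommended above, while the original alphanumeric reference can be kept alongside for checking.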

3.10.5 Validation

Get in the habit of ensuring that the data entered into any field in your attribute database makes sense. For example, check that you haven't typed the letter 'O' instead of '0' (zero). Another tip is to check that numeric values are within range - for example that a slip of the old typing fingers hasn't moved your Norman site from 1066 to 2066. It's often helpful to have someone else validate data that you have entered as typos are more easily detected by a fresh pair of eyes. If your data input tools allow you to define validation checks, use them, but remember that - like spelling checkers - they cannot catch all possible input errors.
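
The two checks mentioned above can be sketched in a few lines. The function name check_year and the default limits are assumptions made for illustration; a real project would set the range to suit its own material. As the text warns, checks of this kind catch only some errors: a year typed as 1067 instead of 1066 passes both.

```python
from datetime import date


def check_year(raw, min_year=1, max_year=None):
    """Return a list of problems found in a year that was typed as text."""
    if max_year is None:
        max_year = date.today().year  # assumed limit: nothing dated in the future
    problems = []
    if "O" in raw.upper():
        problems.append("contains the letter O; did you mean 0 (zero)?")
    try:
        year = int(raw)
    except ValueError:
        problems.append("is not a whole number")
        return problems
    if not min_year <= year <= max_year:
        problems.append(f"year {year} is outside the range {min_year} to {max_year}")
    return problems


for entry in ["1066", "2066", "1O66"]:
    print(entry, check_year(entry, max_year=2000))
```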

3.10.6 Numeric data

It is best to use numeric field types rather than text fields if you have numeric data. This has three benefits. First, confusing characters, such as that familiar O (letter) instead of 0 (zero), cannot be entered into a numeric field. Second, in many computer-based databases numeric information is stored more efficiently than text and occupies less space. This means that your GIS data set will be leaner and meaner. Third, when data is held in numeric form it can more readily be manipulated with arithmetic operators.

If you are using numeric data, also ensure that you use the most appropriate numeric type - integer or floating point. Integer types are used for storing whole numbers and floating point numbers are used for storing numbers which have, or may have, a fractional part.
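
A short illustration of these points, using plain Python values to stand in for database fields. The variable names and figures are invented for the example.

```python
# Held as text, "numbers" are joined end to end rather than added.
as_text = "12" + "3"
as_numbers = 12 + 3

# Whole-number counts suit an integer type; measurements that may have
# a fractional part suit a floating point type.
sherd_count = int("12")
trench_length_m = float("3.75")
doubled_length_m = trench_length_m * 2

# A numeric type refuses the letter O typed in place of zero.
try:
    int("1O")
    letter_accepted = True
except ValueError:
    letter_accepted = False

print(as_text, as_numbers, sherd_count, doubled_length_m, letter_accepted)
```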

3.10.7 Data entry control

Where possible the fields should be set up to use dictionaries or thesauri to ensure that typing errors are kept to a minimum and restricted to free text fields, and that terms used to describe real world objects are used accurately and consistently. Adhere to established appropriate project data standards (e.g. the RCHME/English Heritage Urban Archaeological Database Data Standards). If no project standards exist, adhere to the data standards of the digital archive for your data whether that be the SMR or the ADS. Remember that your data will need a home if it is to remain a useful and accessible resource in the future, and it is your responsibility to ensure its compatibility with other data sets of a similar spatial or temporal resolution.
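
Where the data entry software offers no dictionary facility, a simple check against a controlled word list gives a similar effect. The sketch below is an assumption-laden illustration: the terms in PERIOD_TERMS and the function name normalise_term are invented, and a real project would load its terms from the relevant published standard or thesaurus.

```python
# Hypothetical controlled vocabulary; a real list would come from the
# data standard adopted by the project.
PERIOD_TERMS = {"roman", "norman", "medieval"}


def normalise_term(raw, vocabulary):
    """Tidy spacing and case, then accept the term only if it is in the vocabulary."""
    term = " ".join(raw.split()).lower()
    if term not in vocabulary:
        raise ValueError(f"{raw!r} is not in the controlled vocabulary")
    return term


print(normalise_term("  Norman ", PERIOD_TERMS))  # prints norman
```

Rejecting an unknown term at entry time is stricter than silently storing it, which seems in keeping with the aim of restricting typing errors to free text fields.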

3.10.8 Confidence values

Confidence values indicate the level of certainty associated with an entry in the attribute database, for example how certain you are that the location, identification, dating, etc. of the object is accurate. It is very good practice to maintain this information at all times.
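
One possible way to hold confidence values is as a separate coded field beside each attribute they qualify. The three-level scale, the class names and the sample records below are assumptions for illustration only; the guide does not prescribe a particular scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    """Hypothetical three-level scale; higher values mean greater certainty."""
    POSSIBLE = 1
    PROBABLE = 2
    CERTAIN = 3


@dataclass
class SiteRecord:
    site_id: int
    period: str
    period_confidence: Confidence
    location_confidence: Confidence


records = [
    SiteRecord(1, "norman", Confidence.CERTAIN, Confidence.PROBABLE),
    SiteRecord(2, "roman", Confidence.POSSIBLE, Confidence.CERTAIN),
]

# Select only the records whose dating is at least probable.
well_dated = [
    r.site_id for r in records
    if r.period_confidence.value >= Confidence.PROBABLE.value
]
print(well_dated)  # prints [1]
```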

3.10.9 Consistency

Try to ensure that the codes used to record your attribute data are consistent. Ensuring consistency is especially difficult when data entry is performed by more than one person, or if data entry is carried out incrementally over time. The use of thesauri and documentation standards can be helpful in ensuring consistency within your database and between your database and others. See Appendix 2 for a list of standards that may be appropriate.

3.10.10 Dates

Calendar dates should be recorded in a date field-type rather than a character field-type to avoid the loss of crucial data when transferring into different software packages. Be aware that some software will not warn you when you are about to lose data due to incompatible field types.
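
The difference between the two field types can be seen in a small sketch using Python's standard datetime module. The sample dates and the day/month/year layout of the text values are assumptions for the example.

```python
from datetime import datetime

# Dates typed as text in day/month/year order (illustrative values).
text_dates = ["14/03/1998", "02/11/1999"]

# Sorted as text, the 1999 date comes first because "0" sorts before "1".
sorted_as_text = sorted(text_dates)

# Converted to a true date type, the values sort chronologically.
real_dates = [datetime.strptime(t, "%d/%m/%Y").date() for t in text_dates]
sorted_as_dates = sorted(real_dates)

# A date type can also be written out unambiguously for transfer.
iso_dates = [d.isoformat() for d in sorted_as_dates]
print(sorted_as_text)
print(iso_dates)
```

Writing dates out in the year-month-day form produced by isoformat() is one way to reduce ambiguity when data moves between packages, though the receiving software's own date handling should still be checked.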

3.10.11 Documentation

The most important thing of all is to document the way you have organised your database and entered information into it! All of Section 5 is devoted to this topic. It is essential that source-specific information is recorded as and when data is generated, as this task becomes increasingly difficult retrospectively. Where did the source data originate, at what scale was it prepared, if it is based on others' work where can this be found, and what are the copyright restrictions on its use by a third party? What levels of accuracy were accepted and what errors were recorded during digitization etc.? What data standards were adhered to (dated if possible, as revisions will occur) and what naming conventions have been adopted?

 

Archaeology Data Service
© Mark Gillings, Peter Halls, Gary Lock, Paul Miller, Greg Phillips, Nick Ryan, David Wheatley, and Alicia Wise 1998

The right of Mark Gillings, Peter Halls, Gary Lock, Paul Miller, Greg Phillips, Nick Ryan, David Wheatley, and Alicia Wise to be identified as the Authors of this Work has been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

All material supplied via the Arts and Humanities Data Service is protected by copyright, and duplication or sale of all or part of any of it is not permitted, except that material may be duplicated by you for your personal research use or educational purposes in electronic or print form. Permission for any other use must be obtained from the Arts and Humanities Data Service (info@ahds.ac.uk).

Electronic or print copies may not be offered, whether for sale or otherwise, to any third party.

Arts and Humanities Data Service