MacDougall (1975), writing in the context of paper maps, provided an early warning of what to expect from cartographic processing in GIS: “It is quite possible that map overlays by their very nature are so inaccurate as to be useless and perhaps misleading for planning.” The concern was that maps, while having individually adequate planimetric accuracy and “purity,” may when combined produce a seriously degraded product. Chrisman (1987), however, considered MacDougall’s analysis to be “dangerously oversimplified.” Another early concern lay with the nature of the data themselves.
Most GIS are reductionist and support parametric forms of enquiry, which force data into well-defined categories even though in reality they may lie along a continuum (e.g., mutually exclusive land use classes or soil types). Boundaries (within and between layers) can be unintentionally misleading in that they imply both spatial homogeneity within a polygon and equal homogeneity for all areas of the same class (Robinove, 1981). Even though the methods of classification, categorization, and boundary definition may be known for each layer, their effects on the validity of results when layers are combined may be very difficult to assess, leaving scope for misinterpretation of the results.
Many of these problems already existed in paper thematic maps. Thus, “the essence of mapping is to delineate areas that are homogeneous or acceptably heterogeneous for the intended purpose of the map” (Varnes, 1974). This again leads to a reductionist process of defining a hierarchical structure of classes, assigning each individual to a class, and placement of the classified individual in its correct position (Robinove, 1981).
Paper topographic maps likewise have a fixed scale, and accuracy tests are scale dependent (see below). Transferred to a digital environment, certain safeguards inherent in paper maps, such as the fixed scale, were removed, though traditional attitudes toward spatial data persisted: if it wasn’t a problem in the past, why should it be one now (Openshaw, 1989)? There is also evidence that many users poorly understood the accuracy of paper maps, often crediting them with higher levels of accuracy than was warranted. “Data for which no record of its precision and reliability exists should be suspect” (Sinton, 1978). While accuracy standards were developed for primary data collection, in secondary data collation (e.g., digitizing of existing maps) the assumption seems to have been that since the source documents were originally compiled to the prevailing standards (or were somehow authoritative), they could be converted to GIS and used without problem. The ability to change scale in the digital environment and to combine data sources at will was viewed only as a positive advantage. There appears to have been little regard for, or understanding of, the cumulative effects of these data combinations on the quality of the informational outputs.
Three widely reported studies seemed to clinch it for the GIS community. Blakemore (1984) tested an actual database of employment office areas and manufacturing establishments in which it was necessary to identify which manufacturer (a point) fell within which employment office area (a polygon); in other words, a standard point-in-polygon test. By making basic assumptions regarding data input accuracy for both the points and the digitized polygon boundaries (the ε or epsilon band, as illustrated in Figure 8.3), Blakemore found that only 55% of points could be unambiguously assigned. Newcomer and Szajgin (1984) modeled error accumulation as probabilities when combining raster data layers. Their conclusion was similar to that of MacDougall (1975): as a general rule of thumb, the result of an analysis will generally be less accurate than the least accurate layer used. Walsh et al. (1987) then extended Newcomer and Szajgin’s analysis using data sets typical of the time.
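Blakemore’s epsilon-band test can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the polygon, point coordinates, and ε value are all invented, not taken from Blakemore’s data): a point is assigned unambiguously only when it lies farther than ε from the digitized boundary; otherwise digitizing error could flip the in/out decision.

```python
import math

def point_in_polygon(pt, poly):
    """Ray-casting test: is pt inside the polygon (list of vertices)?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # count crossings of a horizontal ray extending right from pt
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def dist_to_boundary(pt, poly):
    """Minimum distance from pt to any edge of the polygon."""
    x, y = pt
    best = float("inf")
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        dx, dy = x2 - x1, y2 - y1
        # project pt onto the segment, clamped to its endpoints
        t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))
        best = min(best, math.hypot(x - (x1 + t * dx), y - (y1 + t * dy)))
    return best

def classify(pt, poly, eps):
    """Assign 'in'/'out' only when pt is farther than eps from the boundary."""
    if dist_to_boundary(pt, poly) <= eps:
        return "ambiguous"
    return "in" if point_in_polygon(pt, poly) else "out"

square = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical office area
print(classify((5, 5), square, eps=0.5))    # well inside the band -> 'in'
print(classify((9.8, 5), square, eps=0.5))  # within eps of an edge -> 'ambiguous'
```

In Blakemore’s terms, the points flagged ambiguous are those falling inside the epsilon band, where the stated input accuracy cannot guarantee which side of the boundary they truly lie on.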
By using more than two layers and by varying cell size, Walsh et al. found that errors inherent in the data, together with additional errors introduced through combining layers, produced sufficient total error to render the composite maps highly inaccurate. For two-layer combinations the highest accuracy was 29% and the lowest 11%, while for three-layer combinations accuracy ranged from 11% down to just 6%. Though today’s higher-resolution, more accurate spatial data are unlikely to yield such poor analytical products, these studies were nevertheless a very clear indication of the potential seriousness of the problem.
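The probabilistic reasoning behind these rules of thumb can be sketched as follows; the per-layer accuracies below are invented for illustration, and the simple product assumes errors are independent between layers, which real data sets need not satisfy. Under that assumption, the probability that a cell is correct in every layer is the product of the per-layer accuracies, which can never exceed the accuracy of the worst layer.

```python
from math import prod

def composite_accuracy(layer_accuracies):
    """Probability that a cell is correct in every layer,
    assuming independent errors: the product of per-layer accuracies."""
    return prod(layer_accuracies)

def accuracy_bounds(layer_accuracies):
    """Best case: errors coincide across layers, so the overlay is only
    as bad as the worst layer. Worst case: errors never overlap, so the
    per-layer error probabilities add."""
    best = min(layer_accuracies)
    worst = max(0.0, 1.0 - sum(1.0 - a for a in layer_accuracies))
    return worst, best

# hypothetical per-layer accuracies (proportion of cells correct)
layers = [0.90, 0.85, 0.80]

print(f"independent-error estimate: {composite_accuracy(layers):.3f}")
print("bounds (worst, best):", accuracy_bounds(layers))
```

Whatever the correlation between layer errors, the composite accuracy sits between these bounds, so the overlay is indeed never more accurate than its least accurate layer.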
Equipment used for measurement, whether the time-honored tape measure, weighing scales, or advanced laser instruments, has its design accuracy and level of precision, and some equipment requires calibration before use. Most equipment, if used for repeated observations of the same static object or phenomenon, will produce a cluster of measurements normally distributed around the true value, with a spread that reflects the equipment’s precision. This tends to be compounded by errors or bias in recording the measurements. A gross error or blunder, such as putting the decimal point in the wrong place or writing too many zeros, should be noticeable as an extreme value or outlier when all the measurements are plotted graphically, such as in a box plot. Bias often occurs through unintentional rounding of observations, say upward to the nearest 0.5 or to the nearest integer. Bias can also occur if equipment is poorly calibrated or is adversely affected by temperature and/or humidity, resulting in a shift in the readings.
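The box-plot check for blunders can also be performed numerically, using the same 1.5 × IQR fences a box plot draws its outlier points from. The readings below are invented for illustration; one of them contains a misplaced decimal point.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values beyond Tukey's fences [Q1 - k*IQR, Q3 + k*IQR],
    the same rule a box plot uses to mark outlier points."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# repeated measurements of the same distance (m); 12.55 mistyped as 125.5
readings = [12.51, 12.49, 12.50, 12.52, 12.48, 125.5, 12.50, 12.53]
print(iqr_outliers(readings))  # [125.5]
```

Note that this screen catches gross errors, not the systematic rounding or calibration biases described above, which shift all readings together and leave no outlier to detect.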
These biases tend to result in systematic errors that are more difficult to detect and correct but which, as we will see later, can compound during analyses and raise the level of uncertainty. It is also worth mentioning here the “small number” problem. This often crops up, for example, in population dynamics when working with proportional data. A population of two that increases by two has grown by 100%, whereas an increase of two in a population of 100 is an increase of only 2%. Though not a measurement error, this is a measurement-scale effect of working with small numbers: proportional increases can appear disproportionately large and can nevertheless bias analyses.