Data formats

The term file format refers to the logical structure used to store information in a GIS file. File formats are important in part because not every GIS software package supports all formats. If you want to use a data set, but it isn’t available in a format that your GIS supports, you will have to find a way to transform it, find another data set, or find another GIS.
Almost every GIS has its own internal file format. These formats are designed for optimal use inside the software and are often proprietary. They are not designed for use outside their native systems. Most systems also support transfer file formats. Transfer formats are designed to bring data in and out of the GIS software, so they are usually standardized and well documented.
If your data needs are simple, your main concern will be with the internal format your GIS software supports. If you have complex data needs, you will want to learn about a wider range of transfer formats, especially if you want to mix data from different sources. Transfer formats will be required to import some data sets into your software.

Many GIS applications are based on vector technology, so vector formats are the most common. They are also the most complex because there are many ways to store coordinates, attributes, attribute linkages, database structures, and display information. Some of the most common formats are briefly described below and summarized in Table 1.
Arc Export
Arc Export is a transfer format, either ASCII or compressed binary, used to transfer files between different versions of ARC/INFO. It is undocumented and works only with ESRI products.

ARC/INFO Coverages
An ARC/INFO "coverage" is a set of internal binary files used by ARC/INFO, a GIS program. This file format is proprietary and not readily usable by other programs.

AutoCAD® Drawing Files (DWG)
DWG is the internal, proprietary format used in AutoCAD® software, which is a computer-aided design/drafting (CAD) program. Although the format is proprietary, AutoCAD can convert any DWG file to a DXF file (described below) without loss of graphic information. As with DXF files, there are a number of ways to store attribute information in DWG files. The emerging standard uses Extended Entity Data (EED) to link attributes, but many other methods are possible. The lack of a single standard for linking attributes can cause problems when data is transferred between systems.

Autodesk’s Data Interchange File (DXF) Format
DXF is probably the most widely used vector data transfer format, and a file in DXF format offers some very strong advantages. It contains very complete display information, and almost every graphics program can read it. However, there are several different ways to store attribute information in DXF and to link DXF entities to external attributes. Because there are no attribute standards, many programs that claim to read DXF files still do not import attribute information properly.

Digital Line Graphs (DLG)
DLG, a transfer format used by the US Geological Survey (USGS), depicts vector information portrayed on printed paper maps. It carries very accurate coordinate information and sophisticated feature-classification information but no other attribute data. DLG does not include any display information. The DLG standard is significant because the USGS and other US government agencies have used it to publish large numbers of digital maps.

Hewlett-Packard Graphic Language (HPGL)
HPGL is a language that controls computer plotters; it contains display information but no geographic coordinates or attribute data. It is usually not appropriate for the storage or transfer of GIS data.

MapInfo® Data Transfer Files (MIF/MID).
MIF/MID is a transfer standard used by MapInfo, a desktop mapping system. It carries all three types of GIS information: geographic, attribute, and display. Attribute links are implicit in the file format.

MapInfo Map Files.
MapInfo has its own internal binary format, known as a map file. It is undocumented and proprietary, so it cannot be used outside a MapInfo system.

MicroStation Design Files (DGN).
DGN is the internal format used by Bentley Systems Inc.’s MicroStation, a CAD program. It is well documented and standardized, so it may also be used as a transfer standard. DGN files contain detailed display information. The most common way to store attributes is to place them in an external database file and record links in the MSLINK field, a data item carried for each element in the DGN file.

Spatial Data Transfer Standard (SDTS)
SDTS, a new transfer format developed by the US government, was designed to handle all types of geographic data. SDTS can be either binary or ASCII but is generally binary. Virtually all geographic concepts can be encoded in SDTS, including coordinate information, complex attribute information, and display information. This versatility causes a corresponding increase in complexity. To simplify things, several standard subsets of SDTS have been adopted. The first of these, the Topological Vector Profile (TVP), is used to store certain types of vector maps. SDTS can also be used for raster information. Not much data is available in SDTS format at this time, nor do many software systems support it. However, it will be the foundation of the US National Spatial Data Infrastructure (NSDI). Its importance will increase as more NSDI data becomes available.

Topologically Integrated Geographic Encoding and Referencing Files (TIGER).
TIGER is an ASCII transfer format used by the US Census Bureau to store the street maps constructed for the 1990 census. It contains complete geographic coordinates and is line, not polygon, based (although polygons can be constructed from its attribute information). The most important attributes include street name and address information. TIGER does not contain display information. Maps of the entire US are available in TIGER format.
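The idea of building polygons from line attributes can be sketched as follows. This is a minimal illustration, not the actual TIGER record layout: it assumes each line record carries the identifiers of the areas on its left and right sides, and the field names (`start`, `end`, `left`, `right`) and the helper `build_rings` are hypothetical. It handles only simple areas without holes.

```python
from collections import defaultdict

# Hypothetical line records: two end points plus the area id on each side.
lines = [
    {"start": (0, 0), "end": (1, 0), "left": "A", "right": None},
    {"start": (1, 0), "end": (1, 1), "left": "A", "right": "B"},
    {"start": (1, 1), "end": (0, 1), "left": "A", "right": None},
    {"start": (0, 1), "end": (0, 0), "left": "A", "right": None},
    {"start": (1, 0), "end": (2, 0), "left": "B", "right": None},
    {"start": (2, 0), "end": (2, 1), "left": "B", "right": None},
    {"start": (2, 1), "end": (1, 1), "left": "B", "right": None},
]

def build_rings(line_records):
    """Group lines by the area they border, then chain them into closed rings."""
    edges = defaultdict(list)
    for ln in line_records:
        for side in ("left", "right"):
            if ln[side] is not None:
                edges[ln[side]].append((ln["start"], ln["end"]))

    rings = {}
    for area_id, segments in edges.items():
        remaining = list(segments)
        first_start, first_end = remaining.pop(0)
        ring = [first_start, first_end]
        # Keep attaching the segment that touches the current end of the ring.
        while remaining and ring[-1] != ring[0]:
            for i, (s, e) in enumerate(remaining):
                if s == ring[-1]:
                    ring.append(e)
                    break
                if e == ring[-1]:
                    ring.append(s)
                    break
            else:
                raise ValueError(f"lines around area {area_id!r} do not connect")
            remaining.pop(i)
        rings[area_id] = ring
    return rings

rings = build_rings(lines)
print(rings["A"])  # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(rings["B"])  # [(1, 0), (1, 1), (2, 1), (2, 0), (1, 0)]
```

Note that the shared line between A and B is stored once and used by both areas, which is the main appeal of a line-based structure.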

Vector Product Format (VPF)
VPF is a binary format used by the US Defense Mapping Agency. It is well documented and can be used as an internal format and as a transfer format. It carries geographic and attribute information but no display data. VPF files are sometimes referred to as VMAP products. The Digital Chart of the World (DCW) is published in this format.

Raster files generally are used to store image information, such as scanned paper maps or aerial photographs. They are also used for data captured by satellite and other airborne imaging systems. Images from these systems are often referred to as remote-sensing data. Other raster files express resolution in terms of cell size and dots per inch (dpi); remotely sensed images express it in meters, which indicates the size of the ground area covered by each cell.
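Because this kind of resolution is stated in ground meters per cell, the ground extent of an image follows from simple multiplication. The short sketch below assumes square cells; the 30-meter cell size and the image dimensions are made-up values for illustration, and `ground_extent` is a hypothetical helper.

```python
def ground_extent(columns, rows, cell_size_m):
    """Return (width_m, height_m, cell_area_m2) for a raster with square cells."""
    width_m = columns * cell_size_m
    height_m = rows * cell_size_m
    cell_area_m2 = cell_size_m ** 2
    return width_m, height_m, cell_area_m2

# A hypothetical 1000 x 800 cell image at 30 meters per cell.
width, height, cell_area = ground_extent(columns=1000, rows=800, cell_size_m=30)
print(width, height, cell_area)  # 30000 24000 900
```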
Some common raster formats are described below and summarized in Table 2.
Arc Digitized Raster Graphics (ADRG).
ADRG is a format used by the US military to store raster images of paper maps.

Band Interleaved by Line (BIL), Band Interleaved by Pixel (BIP), and Band Sequential (BSQ).
BIL, BIP, and BSQ are formats produced by remote-sensing systems. The primary difference among them is the technique used to store brightness values captured simultaneously in each of several colors or spectral bands.
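To make the difference concrete, here is a minimal sketch in plain Python. It assumes a tiny, made-up image of 2 rows by 2 columns with 2 bands and shows the order in which each layout would write the same brightness values. The function names are illustrative only; real files also involve headers, data types, and byte order, which are ignored here.

```python
# pixels[row][col] holds one brightness value per band (hypothetical data).
pixels = [
    [(10, 11), (20, 21)],
    [(30, 31), (40, 41)],
]

def to_bip(img):
    """Band Interleaved by Pixel: all bands of one pixel, then the next pixel."""
    return [value for row in img for px in row for value in px]

def to_bil(img):
    """Band Interleaved by Line: within each row, band 0 for the row, then band 1."""
    bands = len(img[0][0])
    return [px[b] for row in img for b in range(bands) for px in row]

def to_bsq(img):
    """Band Sequential: the whole image for band 0, then the whole image for band 1."""
    bands = len(img[0][0])
    return [px[b] for b in range(bands) for row in img for px in row]

print(to_bip(pixels))  # [10, 11, 20, 21, 30, 31, 40, 41]
print(to_bil(pixels))  # [10, 20, 11, 21, 30, 40, 31, 41]
print(to_bsq(pixels))  # [10, 20, 30, 40, 11, 21, 31, 41]
```

All three lists contain the same eight values; only the ordering changes.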

Digital Elevation Model (DEM).
DEM is a raster format used by the USGS to record elevation information. Unlike the cells in other raster formats, DEM cells do not represent color brightness values, but rather the elevations of points on the earth’s surface.

PC Paintbrush Exchange (PCX).
PCX is a common raster format produced by most scanners and personal computer (PC) drawing programs.

Spatial Data Transfer Standard (SDTS).
As was indicated under vector formats above, SDTS is a general-purpose format designed to transfer geographic information. One SDTS variant is the raster profile, designed as a standard format for transferring raster data. However, this protocol has not as yet been finalized.

Tagged Image File Format (TIFF).
Like PCX, TIFF is a common raster format produced by PC drawing programs and scanners.

Any digital map is capable of storing much more information than a paper map of the same area, but it’s generally not clear at first glance just what sort of information the map includes. For example, more information is usually available in a digital map than what you see on-screen. And evaluating a given data set simply by looking at the screen can be difficult: What part of the image is contained in the data and what part is created by the GIS program’s interpretation of the data? You must understand the types of data in your map so you can use it appropriately.
Three general types of information can be included in digital maps:
  • Geographic information
    which provides the position and shapes of specific geographic features.
  • Attribute information
    which provides additional non-graphic information about each feature.
  • Display information
    which describes how the features will appear on the screen.
Some digital maps do not contain all three types of information. For example, raster maps usually do not include attribute information, and many vector data sources do not include display information.
Geographic Information
The geographic information in a digital map provides the position and shape of each map feature. For example, a road map’s geographic information is the location of each road on the map.
In a vector map, a feature’s position is normally expressed as sets of X,Y pairs or X,Y,Z triples, using the coordinate system defined for the map (see the discussion of coordinate systems, below). Most vector geographic information systems support three fundamental geometric objects:
  • Point: A single pair of coordinates.
  • Line: Two or more points in a specific sequence.
  • Polygon: An area enclosed by a line.
Some systems also support more complex entities, such as regions, circles, ellipses, arcs, and curves.
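The three fundamental objects can be sketched as simple data structures. This is only an illustration under stated assumptions: the class names are hypothetical, coordinates are plain X,Y pairs, and a polygon is represented as a closed line whose first and last coordinates coincide (one common convention, not the only one).

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # an X,Y pair in the map's coordinate system

@dataclass
class Point:
    xy: Coord  # a single pair of coordinates

@dataclass
class Line:
    coords: List[Coord]  # two or more points; the sequence matters

    def __post_init__(self):
        if len(self.coords) < 2:
            raise ValueError("a line needs at least two points")

@dataclass
class Polygon:
    boundary: List[Coord]  # the enclosing line, closed on itself

    def __post_init__(self):
        if len(self.boundary) < 4 or self.boundary[0] != self.boundary[-1]:
            raise ValueError("a polygon boundary must be a closed line")

    def area(self) -> float:
        """Planar area computed with the shoelace formula."""
        total = 0.0
        for (x1, y1), (x2, y2) in zip(self.boundary, self.boundary[1:]):
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

square = Polygon([(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)])
print(square.area())  # 12.0
```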
Attribute Information
Attribute data describes specific map features but is not inherently graphic. For example, an attribute associated with a road might be its name or the date it was last paved. Attributes are often stored in database files kept separately from the graphic portion of the map. Attributes pertain mainly to vector maps; they are seldom associated with raster images.
GIS software packages maintain internal links tying each graphical map entity to its attribute information. The nature of these links varies widely across systems. In some the link is implicit, and the user has no control over it. Other systems have explicit links that the user can modify. Links in these systems take the form of database keys. Each map feature has a key value stored with it; the key identifies the specific database record that contains the feature’s attribute information.
Should problems arise, it is important for you to know how your software establishes and maintains attribute links.
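An explicit, key-based link of the kind described above might look like the following sketch. The field name `road_id`, the sample records, and the `lookup` helper are all hypothetical; real systems store the key and the attribute table in their own formats.

```python
# Hypothetical map features, each carrying a key value with its geometry.
features = [
    {"geometry": [(0, 0), (5, 0)], "road_id": 101},
    {"geometry": [(5, 0), (5, 7)], "road_id": 102},
    {"geometry": [(5, 7), (9, 7)], "road_id": 999},  # no matching record
]

# Hypothetical attribute table, kept separately and indexed by the same key.
attributes = {
    101: {"name": "Main Street", "last_paved": "1994-06-01"},
    102: {"name": "Oak Avenue", "last_paved": "1989-09-15"},
}

def lookup(feature, table):
    """Follow the feature's key to its attribute record; None means a broken link."""
    return table.get(feature["road_id"])

for feature in features:
    record = lookup(feature, attributes)
    print(feature["road_id"], record["name"] if record else "BROKEN LINK")
```

The third feature shows the typical failure mode: the geometry survives, but its key no longer matches any attribute record.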
Display Information
The display information in a digital-map data set describes how the map is to be displayed or plotted. Common display information includes feature colors, line widths and line types (solid, dashed, dotted, single, or double); how the names of roads and other features are shown on the map; and whether or not lakes, parks, or other area features are color coded.
Many users do not consider the quality of display information when they evaluate a data set. Yet map display strongly affects the information you and your audience can obtain from the map, no matter how simple or complex the project. A technically flawless but unattractive or hard-to-read map will not achieve the goal of conveying information easily to the user.
Oddly enough, many common data sets contain no display information. For example, USGS Digital Line Graph files provide no display information at all. Each feature contains an attribute that describes the entity but does not indicate display features. Users, and their GIS software, must interpret those attributes and decide how each will look on the final display.
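When a data set carries only descriptive attributes, the display decision usually comes down to a lookup from attribute value to style. The sketch below assumes a hypothetical `feature_class` attribute and made-up class codes and styles; it does not reflect the actual DLG coding scheme.

```python
# Hypothetical feature-class codes mapped to display styles chosen by the user.
STYLE_BY_CLASS = {
    "road_primary": {"color": "red", "width": 2.0, "line_type": "solid"},
    "road_trail": {"color": "brown", "width": 0.5, "line_type": "dashed"},
    "stream": {"color": "blue", "width": 1.0, "line_type": "solid"},
}

# Fallback for classes the user has not styled yet.
DEFAULT_STYLE = {"color": "gray", "width": 1.0, "line_type": "solid"}

def style_for(feature):
    """Pick a display style from the feature's descriptive attribute."""
    return STYLE_BY_CLASS.get(feature["feature_class"], DEFAULT_STYLE)

print(style_for({"feature_class": "road_trail"})["line_type"])  # dashed
print(style_for({"feature_class": "pipeline"})["color"])        # gray
```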
