Documentation Standards for Automatic Data Processing in Mammalogy
American Society of Mammalogists
Committee on Information Retrieval
Suzanne B. McLaren, Chair
Peter V. August, Leslie N. Carraway, Paisley S. Cato, William L. Gannon, Marie Lawrence, Norman A. Slade, Philip D. Sudman, Richard W. Thorington, Jr., Stephen L. Williams, Susan M. Woodward
In 1979, the American Society of Mammalogists' Committee on Information Retrieval published one of the first documentation standards guides (Williams et al., 1979) for the computerization of museum specimens in any discipline. The idea of computerizing collections was less than a decade old and the capability of networking among collections seemed a dream that was almost within reach. A special workshop funded by the National Museums Act (Grant No. FC-5-50896) was held to develop plans for a national network called NIRM (Network for Information Retrieval in Mammalogy). One of the principal aims of the workshop was to develop data documentation standards that would ultimately be used among networking collections. In planning a national network, a basic set of specimen data was declared "mandatory." Documentation Standards for Automatic Data Processing in Mammalogy (Williams et al., 1979) was an outgrowth of that workshop.
In the years since the NIRM workshop, hardware and software technology have moved much closer to making the idea of networking a reality. In the meantime, a number of things have occurred with respect to collections and their computerization: 1) some museums have used the Documentation Standards (Williams et al., 1979) publication to develop their databases; 2) some museums were unaware of the publication and developed their own standards; 3) quite a few collections changed the type of database software they were using (McLaren et al., 1989); 4) Geographic Information Systems (GIS) usage has begun to affect the way we look at locality data (McLaren and Braun, 1993); 5) ancillary collection development grew at a rapid rate in many collections; 6) a heightened awareness of specimen care and museum conservation issues has occurred; 7) the National Science Foundation and the Association of Systematics Collections have begun to encourage planning for inter-disciplinary networking; and 8) the Canadian Heritage Information Network (CHIN) utilizes an inter-disciplinary approach that can provide tested methods of dealing with data handling problems, and may limit the ways in which Canadian mammal collections could choose to follow the standards listed here. As a result of these occurrences, it is time to re-examine the 1979 publication.
The intent of this publication is simply to refine and expand upon the original work. One basic premise was at work in the refinement of the standards. It is recognized that each collection has its own idiosyncrasies, and these standards merely serve as a framework. Each collection has peculiarities in content, philosophy, and usage that make strict adherence to a single set of standards difficult. Also, older computer systems may lack the flexibility that newer hardware and software can allow. Comments from committee members during the development of this publication have underscored the diversity of elements that must be considered if a set of standards is to be of value.
Some collections are just beginning to computerize. These can benefit from the hands-on experience that refinement of the original document will provide. For this reason, specific cautionary notes are given within the comments section of some categories.
A large portion of the audience is the group of collections already computerized. Whether this set of standards was used or not, it is inevitable that choices sometimes will be made that do not necessarily fit "the standards." Suggesting that those collections change their paths is not appropriate. The Society of Vertebrate Paleontology has produced its first documentation standards publication (Blum, 1991). Its authors met the same dilemma among their already computerized collections. The treatment of the entire subject of collection computerization, and particularly of functional flexibility among documentation standards, makes that publication very worthwhile reading. (Available from Michael Novacek, American Museum of Natural History.) The fundamental message there, and here, is that the standards serve as a framework within which individual institutions must be allowed to have flexibility. When sharing data, the provider can use the Standards as a checklist to inform the receiver of variations from the Standards. Researchers who interact with Canadian museums and Canadian researchers who deal with non-Canadian institutions should recognize that there are differences between these standards and those developed for CHIN. An attempt has been made to point out common differences between the two standards by including information about CHIN variations in Appendix C of this document.
Essential, preferred and optional fields
As mentioned earlier, the NIRM workshop developed a list of "mandatory" fields. The fields have now been separated into "Essential", "Preferred", and "Optional" (Appendix A). The essential fields are those that each networking partner should expect to receive when data are transmitted. The preferred fields are those that have proven useful for research and collection management according to the experience of current committee members. The optional fields are ones that may be useful to some collections. Clearly, the use of all the fields mentioned in this document would require considerable effort in data input, verification, and maintenance, as well as considerable disk space.
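The checklist use of these categories can be illustrated with a minimal sketch. The field names and their assignments below are hypothetical placeholders chosen only for illustration; the actual assignments are those listed in Appendix A. The function `variation_report` is likewise an assumption, showing one way a data provider might summarize how a dataset departs from the Standards before sending it to a receiver.

```python
# Hypothetical field names and usage statuses, for illustration only.
# The actual assignments are given in Appendix A.
FIELD_STATUS = {
    "catalog_number": "essential",
    "genus": "essential",
    "species": "essential",
    "sex": "preferred",
    "collector": "preferred",
    "habitat_notes": "optional",
}


def variation_report(provided_fields):
    """Summarize how a provider's field list departs from the checklist."""
    provided = set(provided_fields)

    def missing(status):
        # Fields of the given status that the provider does not supply.
        return sorted(
            name
            for name, field_status in FIELD_STATUS.items()
            if field_status == status and name not in provided
        )

    return {
        "missing_essential": missing("essential"),
        "missing_preferred": missing("preferred"),
        # Fields the provider uses that the checklist does not know about.
        "nonstandard": sorted(provided - set(FIELD_STATUS)),
    }


report = variation_report(
    ["catalog_number", "genus", "sex", "local_shelf_code"]
)
for heading, names in report.items():
    print(heading, names)
```

A report of this kind could accompany a data transfer so that the receiver knows in advance which essential fields are absent and which local fields have no counterpart in the Standards.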
The new documentation standards for ancillary collections data and specimen condition reporting will no doubt require the same kind of refinement after several years of broad, hands-on experience. Many older systems are based on software that employs a single flat file. Newer software provides the mechanism for true relational database management. With these latter systems, data subsets can be linked to the main specimen record database. Although the use of multiple linked subsets can slow record processing, it can save computer storage space. It also allows for the development of a dataview (i.e. subset of the database) that provides only portions of the specimen record to outside users. A linked subset arrangement is ideal for the recording of ancillary collections data and specimen condition reporting. This is particularly true because each of these new collection data entities involves more than a single field. Inclusion of these fields in the main database would require a considerable expansion of the specimen record. Collections in the development stage of computerization would be well-advised to understand the differences between flat file and relational database software in order to maximize their future capabilities.
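The linked-subset arrangement described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`specimen`, `condition_report`), the view name (`public_specimen`), the column names, and the sample records are all hypothetical and are not prescribed by these standards. The sketch shows a main specimen table, a linked subset holding multi-field condition reports, and a dataview that exposes only part of the specimen record.

```python
import sqlite3

# In-memory database; all table, view, and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Main specimen record database.
    CREATE TABLE specimen (
        catalog_number INTEGER PRIMARY KEY,
        genus          TEXT NOT NULL,
        species        TEXT NOT NULL,
        locality       TEXT
    );

    -- Linked subset: zero or more condition reports per specimen,
    -- kept out of the main record so that record does not expand.
    CREATE TABLE condition_report (
        report_id      INTEGER PRIMARY KEY,
        catalog_number INTEGER NOT NULL
                       REFERENCES specimen(catalog_number),
        report_date    TEXT,
        condition_note TEXT
    );

    -- Dataview: only a portion of the specimen record for outside users.
    CREATE VIEW public_specimen AS
        SELECT catalog_number, genus, species FROM specimen;
    """
)

# Invented sample records.
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?, ?)",
    [
        (1001, "Peromyscus", "leucopus", "sample locality A"),
        (1002, "Sorex", "cinereus", "sample locality B"),
    ],
)
conn.executemany(
    "INSERT INTO condition_report "
    "(catalog_number, report_date, condition_note) VALUES (?, ?, ?)",
    [
        (1001, "1990-05-01", "skull intact"),
        (1001, "1994-03-15", "skin shows minor fading"),
    ],
)

# Join the main table to its linked subset and count reports per specimen.
linked = conn.execute(
    """
    SELECT s.catalog_number, COUNT(c.report_id)
    FROM specimen AS s
    LEFT JOIN condition_report AS c
           ON c.catalog_number = s.catalog_number
    GROUP BY s.catalog_number
    ORDER BY s.catalog_number
    """
).fetchall()

# Column names visible through the dataview.
public_columns = [
    col[0]
    for col in conn.execute("SELECT * FROM public_specimen").description
]
print(linked)
print(public_columns)
```

In a single flat file, each condition report would either need its own set of fields in every specimen record or force the specimen data to be repeated; in the linked arrangement a specimen with no reports simply has no rows in the subset.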
As is the case with many new technologies, the topic of ethics in collection data exchange is becoming an important issue. In 1988, the Committee on Information Retrieval published a set of guidelines for usage of computerized mammal collection data (McLaren et al., 1988b). Basic tenets of those guidelines include the ideas that 1) data providers would not knowingly provide erroneous data and 2) data receivers would not disperse data to other users without acknowledging and informing the data provider. The Association of Systematics Collections (ASC) has also addressed the issue of data receivers who use collected information to build a new database which is later dispersed as though it is owned by the receiver (McLaren, 1993). ASC has suggested the development of a Collections Data Transmission Agreement to be used when data providers feel it is necessary to establish an understanding with data receivers about the ultimate use of collections data. In the late 1980s, it was felt that computerized collections data were covered by federal copyright law (McLaren, 1988a, 1988c) but recent U.S. Supreme Court decisions have clouded this issue. Similarly, in Canada, two publications (Erola and Fox, n.d. [1985]; Fontaine, 1985) which suggested proposed revisions to the Canadian Copyright Act discussed the problems surrounding data stored on electronic media. However, recommendations favoring protection of database contents were not included in the revised Copyright Act. If copyright protection for collections databases does prove to be upheld, each database must be registered in order to make full use of the law. However, the type of document proposed by ASC would serve as a one-on-one understanding that clearly spells out the limits of the data receiver's rights in data manipulation and dispersal. Appendix D provides a sample Data Release Agreement which was drawn up with legal assistance and is in use at a number of institutions.
The indication of a monetary exchange may discomfort many but this is where legal advice came into play. Contractual agreements are controlled by state law in the United States. In many states and in Canada, if there is not an exchange of money, the data provided are considered a gift which is then used at the user's discretion. The exchange of a nominal fee ($1.00) legally validates the signing of the agreement as a binding contract.
This issue may seem trivial to new computer users. Data shared with professionals and graduate students for their research always have been dispersed through publication for the greater understanding of the subjects being studied. However, data that become buried within a new database and dispersed through the sale of that information can be a serious disservice to the science. Most collections are able to justify their continued existence and specimen care costs by demonstrating collection usage. When the ability to demonstrate that volume of usage is short-circuited by second-hand data dispersal, a long-standing system of accountability is eroded. The collection data still are being used but caretakers would be unaware of it.
A second aspect of the subject lies in the recognition that a computerized collection database, while costly to maintain and time-saving to use, is only a tool in the study of mammals and the management of collections. The fact that data come from a computer does not mean they are accurate. Research in mammalogy never should be done strictly with computer output. Specimen examination and identification verification are the responsibility of every primary researcher. Second-hand data dispersal eliminates the link to the original collection and to the specimens in question. That is the main reason why such practices should be considered unethical.
Organization of this document
The body of this document is organized along the format of the original Documentation Standards for Automatic Data Processing in Mammalogy publication (Williams et al., 1979). Pagination within subsections is linked specifically to the subsection. Each field or category is described in detail. Information has been incorporated about Canadian collections that utilize the Canadian Heritage Information Network (CHIN) (Woodward, 1989a, 1989b). Variations between CHIN and this document are noted under the "Comments" subheading of appropriate fields and detailed in Appendix C. It is hoped that an awareness of variations between documentation standards listed herein and those used by CHIN will facilitate cooperative data exchange between CHIN and non-CHIN mammal collections. Appendix A provides a re-organized view of all fields based on their inclusion in the Essential, Preferred, or Optional categories.
To obtain a printed copy of this online publication, please contact Suzanne McLaren.
I. INSTITUTIONAL DATA
II. TAXONOMIC DATA
III. SPECIMEN DATA
IV. GEOGRAPHIC DATA
V. OTHER DATA
APPENDIX A. LISTING OF CATEGORIES BY USAGE STATUS
APPENDIX B. LISTING OF STANDARD ABBREVIATIONS
APPENDIX C. NOTATIONS ON THE CHIN SYSTEM
APPENDIX D. SAMPLE DATA RELEASE AGREEMENT
VIII. LITERATURE CITED