Mark H. Needleman
University of California
Office of the President
Kaiser Center Room 854
300 Lakeside Drive
Oakland, CA 94612-3550
USA
(510) 987-0530
(510) 839-3573 (fax)
email: Mark.Needleman@ucop.edu
Introduction
This paper provides a review of activities
in the area of standards and standardization that are of importance
to librarians and information professionals. It is not intended
to be a comprehensive coverage of all such standards and standardization
activities, but rather to focus on some recent developments and
activities in the area of electronic and network-based information.
It will cover work going on in both the traditional library and
information communities as well as activities occurring in the
internet community that have implications for the library and
information world.
Attention will be focused both on the
standards themselves, and on some implementations that are going
on to take those standards and build real world working applications
based on them, and to promote the development of the infrastructure
needed to support the world of on-line networked information.
Space constraints preclude, however,
any detailed technical description of how these protocols operate.
Pointers to more information on them can be found in both the
references and bibliography at the end of the paper.
Finally, the paper will conclude with
a discussion of some other infrastructure issues that go beyond
just standards that are important building blocks for the development
of a world of electronic and networked information. The context
for that discussion is the set of lessons learned from a research project
at the University of California that made available to its users
a large amount of full text electronic content. While there is
an attempt in this paper to cover international developments,
much of the paper will cover work going on in the United States,
simply because this is where the bulk of the author's
own knowledge and involvement has been.
Z39.50 Developments
One of the most important areas in
which development activities have been occurring is the Z39.50 [1]
protocol. Z39.50 is a client-server protocol that defines
capabilities for information retrieval systems to communicate
with each other to search and retrieve information. The protocol,
which was originally approved in 1992, is a US national standard
developed and maintained by the National Information Standards
Organization (NISO) [2]. NISO is the American National
Standards Institute (ANSI) accredited standards developing body
in the United States that produces standards for libraries,
information service providers, and publishers.
The NISO voting members approved a
new version of Z39.50 in 1995. This version significantly extends
the functionality of the original protocol. Among the capabilities
incorporated in the new version are sorting, scanning,
a segmentation facility that allows the transfer of larger data
objects than was practical in the 1992 version, and
an Extended Services Facility that allows the initiation of services
within a Z39.50 session that are executed outside of it. This
allows services that are associated with the search and retrieval
process, but are not explicitly part of that activity, to be defined
in a standard way and requested using the Z39.50 protocol.
Among the Extended Services defined
in the 1995 document are services for requesting printing or other
delivery methods of records from result sets, initiation and maintenance
of periodic queries, initiation of document delivery requests,
and database update. Additional new functionality in the 1995
version includes an Explain Facility that allows a client to use
the Z39.50 protocol itself to dynamically discover characteristics
of a particular server, support for concurrent operations to
allow multiplexing requests in a single connection, better support
for international character sets, and new record syntaxes,
including a generalized record syntax called GRS-1 that supports
extremely precise requests, including
the ability to retrieve specific portions of items, variant forms
of the same item, and meta-data about an item.
Because Z39.50 is intended to be used
in a wide variety of application domains, it does not directly
define or constrain either the attributes used to search data,
or the format of the data returned on retrievals. The standard does
define and register a basic attribute set known as
Bib-1 that is intended to be used to search basic bibliographic
data, and most of the major flavors of MARC are registered
for use with Z39.50. A lot of work has been going on in this area
in the last couple of years. An attribute set known
as STAS, the Scientific and Technical Attribute Set, has been
defined and registered for use with scientific data, and
has been implemented in several systems. Other attribute sets
have also been defined or are under development.
The intent behind Z39.50 is that the
various communities that make use of the protocol will define
sets of rules for how Z39.50 is to be used in those application
domains. These sets of rules can define things such as what Z39.50
features must be supported, what attribute sets and actual attribute
combinations are to be used, the format of the data to be returned
on retrieval, and various other aspects of the Z39.50 protocol,
as well as things that are beyond the scope of Z39.50 but may
be important for particular types of data or applications. These
sets of rules are known as profiles or implementors' agreements.
Several profiles for the use of Z39.50
have already been defined and others are under development at
this time. There is a profile called ATS-1 (Author, Title, Subject)
that defines a basic use of Z39.50 and the Bib-1 attribute set
to provide access to bibliographic data for such things as on-line
library systems.
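As a rough illustration of how an attribute set and a profile fit
together, the sketch below models a search term qualified by a Bib-1
Use attribute and restricts searches to the three access points that
give ATS-1 its name. The Use attribute values shown (1003 for author,
4 for title, 21 for subject heading) are believed to match the
registered Bib-1 values but should be verified against the standard,
and the class and function names are hypothetical and do not belong to
any Z39.50 toolkit.

```python
from dataclasses import dataclass

# Bib-1 Use attribute values (assumed; verify against the registered set).
BIB1_USE = {"author": 1003, "title": 4, "subject": 21}

# In Bib-1, attribute type 1 is the "Use" attribute (the access point).
USE_ATTRIBUTE_TYPE = 1


@dataclass(frozen=True)
class AttributedTerm:
    """A search term plus the (type, value) attribute pair qualifying it."""
    term: str
    attribute_type: int
    attribute_value: int


def ats1_term(access_point: str, term: str) -> AttributedTerm:
    """Build a term for one of the three access points covered by ATS-1."""
    if access_point not in BIB1_USE:
        raise ValueError(f"ATS-1 sketch only covers {sorted(BIB1_USE)}")
    return AttributedTerm(term, USE_ATTRIBUTE_TYPE, BIB1_USE[access_point])


query = ats1_term("title", "information retrieval")
print(query)
```

A profile in this sense is simply an agreed restriction: a client and
server that both follow it know in advance which attribute combinations
the other side will send or accept.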
GILS [3], the Government Information
Locator Service, is a profile that defines the use of Z39.50 for
providing access to government data. GILS is a US federal government
standard and is being looked at by other governments both in the
US and internationally. Work is under way to define the use
of Z39.50 to access geo-spatial data, and some members of the research
community are working to add natural language support
to Z39.50.
Work is also going on to define the
use of Z39.50 for accessing digital collections. In the museum
community, the Consortium for the Computer
Interchange of Museum Information (CIMI) [4] has a project
known as CHIO (Cultural Heritage Information On-line). CHIO is
building demonstration systems to show on-line access to
museum information. As part of that project a profile for the
use of Z39.50 to provide access to museum information is being
developed.
There is an international version of
Z39.50 known as ISO 10162/10163 [5], published by the International
Organization for Standardization (ISO).
The US standard is a compatible superset of the ISO version, and
many of the enhancements in the US version have been proposed
for inclusion in it. Additionally, because of the desire to ensure
future compatibility and international interoperability, there
is now a proposal within the ISO committee responsible for 10162/10163
to replace the ISO version with Z39.50. It is possible that this
will be voted on in late 1996 or early 1997. Finally, while
Z39.50 is officially defined by NISO, the actual technical development of
the 1995 standard was done by a voluntary committee known as the
Z39.50 Implementors Group (ZIG) [6]. The ZIG is a group
of people representing organizations that are actually developing
implementations of the Z39.50 protocol. This includes most of
the major library automation vendors, major library utilities
like OCLC and RLG, major universities and library consortia,
and major on-line database vendors. The ZIG typically meets two
to three times a year to work on enhancements to the protocol and
on interoperability issues. In recent years there has been
increasing international participation both in those meetings
and in the on-line mailing list over which most of the discussions
occur. Because of the increasing international use of Z39.50,
the ZIG has committed to holding one of its meetings
each year in Europe.
Z39.56 Serial Item/Contribution Identifier Standard
Another standard that will play an
important role in the world of electronic networked information
is NISO Z39.56 [7], the Serial Item/Contribution Identifier
(SICI). Z39.56 defines data elements and a structure for a standardized
code to identify serial items (issues of journals) and contributions
(articles) in them. The original version of the standard was approved
by NISO in 1991. Normally NISO standards only come up for review
every five years, but in 1994, due to the growth of on-line indexing
and abstracting databases and document delivery services, NISO
decided to form a committee to look at revising the standard to
make it more usable in an on-line electronic environment. A new
version of Z39.56 was developed and, as of this writing, is in the
ballot process. This new version incorporates some
new mechanisms including a Code Structure Identifier (CSI) to
define the type of SICI being dealt with, a Media Format Identifier
(MFI) to allow designation of the type of media the SICI is referring
to in cases where serials may be published in multiple formats,
and a Derivative Part Identifier (DPI) which facilitates references
to portions of a serial or contribution such as the table of contents
of an issue or the abstract of an article. The DPI is seen as
being especially useful in applications such as on-line document
delivery. Other changes include standardizing
the punctuation used internally in SICI codes, lengthening the
title code, and simplifying the rules for its construction. It
should be noted that the SICI code was added as a search attribute
to the basic bibliographic search attribute set Bib-1 in Z39.50.
At the present time there is not, as far as the author is aware,
a corresponding international version of Z39.56, which, of course,
does not prevent the US version from being used internationally.
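To make the code structure more concrete, the sketch below splits a
SICI into its item, contribution, and control segments and checks the
embedded ISSN. The segment layout assumed here (ISSN, chronology in
parentheses, enumeration, contribution in angle brackets, then
CSI.DPI.MFI, a version number, and a check character) and the sample
code are taken from commonly quoted illustrations of the revised
standard and should be confirmed against Z39.56 itself. The function
names are hypothetical, and the SICI check character is captured but
not verified.

```python
import re

# Assumed layout: ISSN(chronology)enumeration<contribution>CSI.DPI.MFI;version-check
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"
    r";(?P<version>\d)-(?P<check>[0-9A-Z#])$"
)


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit (weights 8 down to 2, modulus 11)."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def parse_sici(code: str) -> dict:
    """Split a SICI into named parts; raise ValueError if it does not fit."""
    match = SICI_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not in the assumed SICI layout: {code!r}")
    parts = match.groupdict()
    if not issn_is_valid(parts["issn"]):
        raise ValueError(f"bad ISSN check digit: {parts['issn']}")
    return parts


sample = parse_sici("0095-4403(199502/03)21:3<12:WATIIB>2.0.TX;2-J")
print(sample["csi"], sample["dpi"], sample["mfi"])
```

In this reading, the three fields after the closing angle bracket are
the Code Structure Identifier, the Derivative Part Identifier, and the
Media Format Identifier described above.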
The Interlibrary Loan (ILL) Protocol and Developments
The Interlibrary Loan (ILL) Protocol
(ISO 10160/10161) [8] was approved by ISO in 1991. It
was developed to permit the exchange of ILL messages between systems
that use different hardware and ILL systems. Its goals are to
overcome barriers to ILL communications by providing a standardized
message set and format, to facilitate ILL automation and provide
the foundation for automation of requesting and supplying material
and tracking requests, and to support resource sharing. It supports
multiple models of ILL networking and defines a full set of services
representing all stages of an ILL transaction. It supports both
an electronic mail mode of operation using EDIFACT and a direct
connect model.
Canada, especially the National Library
of Canada, has been heavily involved in both the development and
implementation of the ILL protocol, and there are other implementations
in Europe as well. Due in part to the centralized nature of ILL
using the large utilities, implementation of the ILL protocol
in the United States up until recently has been slow. However,
in 1993 the Association of Research Libraries formed the North
American Interlibrary Loan and Document Delivery (NAILDD) Project
to promote developments that will improve the delivery of library
materials to users at costs that are sustainable to libraries.
One of the objectives of the NAILDD project is to promote standards
and automation efforts that will improve the efficiency of the
ILL process. In the fall of 1995 NAILDD formed an ILL Protocol
Implementors Group (IPIG) to promote the use of the ILL protocol
in the United States and to promote the development of the infrastructure
required to support it. Using a model similar to what was done
with the Z39.50 Implementors Group (ZIG) that promoted the development
of Z39.50, the IPIG has put together a group of library vendors
and other organizations who have committed to build real working
versions of the ISO ILL protocol within a defined time frame, and
to set up a testbed to do interoperability testing among those
implementations. Phase One calls for the participants to implement
the ILL Request and Status/Error messages using ISO Basic Encoding
Rules in a direct connection mode on top of the TCP/IP transport
protocols by the summer of 1996. Later phases of the project will
add further ILL messages based on experience gained in the
interoperability testing of this first phase, and will create additional
testbeds. Since many of the IPIG participants also have
Z39.50 implementations, the direct connect model using BER and
TCP/IP was chosen to capitalize on investments that had already
been made by those organizations in developing their Z39.50 implementations.
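For readers unfamiliar with the Basic Encoding Rules, the sketch below
shows the tag-length-value pattern that BER applies to every data
element. It is a generic illustration of definite-length BER for two
primitive types only; it does not reproduce the ILL or Z39.50 message
definitions, and the function names are hypothetical.

```python
def ber_length(n: int) -> bytes:
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Long form: first octet has the high bit set and counts the length octets.
    return bytes([0x80 | len(body)]) + body


def ber_tlv(tag: int, value: bytes) -> bytes:
    """Encode one primitive element that has a single-octet tag."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("this sketch only handles single-octet tags")
    return bytes([tag]) + ber_length(len(value)) + value


def ber_integer(n: int) -> bytes:
    """Encode a non-negative INTEGER (universal tag 0x02)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # One extra leading zero octet appears whenever the top bit would be set,
    # so the value is not misread as negative.
    size = n.bit_length() // 8 + 1
    return ber_tlv(0x02, n.to_bytes(size, "big"))


def ber_octet_string(data: bytes) -> bytes:
    """Encode an OCTET STRING (universal tag 0x04)."""
    return ber_tlv(0x04, data)


print(ber_integer(5).hex(), ber_octet_string(b"ILL").hex())
```

Because the same encoding rules sit underneath both protocols, an
organization that already has BER encoding and decoding code for Z39.50
can reuse it when implementing ILL messages.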
Character Set Standards
There are many different character
set standards, some de facto, some national, and some
international in scope. Many of those standards define single
character sets or provide rules for the mapping of one character
set to another, such as standards for the Romanization of non-Roman
characters. However, due to their global scope, two standards
deserve mention here: ISO 10646 [9] and Unicode [10].
ISO 10646, the Universal Multiple-Octet Coded Character Set (UCS)
was adopted by ISO in 1993. It is the first officially standardized
coded character set whose eventual purpose is to include all characters
used in all written languages in the world (including all mathematical
and other symbols). The first edition covers all major and commercially
important languages. Both a 2 octet and a 4 octet form
of the coding space are defined. The first 128 positions in the 2
octet coding space are used for the basic ASCII character set.
In practice, currently only the 2 octet form is in use. Unicode
is a coded character set specified by a consortium primarily of
major computer and software manufacturers whose goal was to overcome
the chaos of different coded character sets encountered when creating
multilingual programs and internationalizing software. As of version
1.1, Unicode has been aligned with ISO 10646, and the intent is
to keep it strictly compatible with the international standard.
The Unicode consortium is a contributor to the ISO work to further
develop 10646. Unicode can be characterized as an implementation
of the 2 octet form of the UCS that includes such things as spacing
diacritics and other combining characters, and that defines a
more precise specification of the bi-directional behavior of characters
when used in such things as the Arabic and Hebrew scripts. Version
2.0 of Unicode will extend its functionality to make use of the
wider 4 octet character coding space. A number of semantics traditionally
thought to be associated with character sets, most notably sorting
or collation order, are explicitly excluded from the definition
of Unicode/10646. In addition, in a universal character set that
unifies different languages' use of a single script, the
order of elements in a language cannot be determined simply from
the order of elements in the script.
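Three of the points above can be seen in a few lines of Python, whose
strings are sequences of Unicode code points. This is a sketch using
facilities that postdate the standards described here: the
`utf-16-be` codec stands in for the 2 octet form, and the `NFC`
normalization form is a later Unicode mechanism for pairing a base
letter and combining mark with a precomposed character.

```python
import unicodedata

# ASCII keeps its values: "A" is still 65, padded to two octets.
two_octets = "A".encode("utf-16-be")

# A base letter followed by a combining acute accent (U+0301) can be
# composed into the single precomposed character U+00E9.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Sorting by code point is not a language's collation order. German
# dictionaries file a-umlaut with a, while Swedish places it after z;
# the character set itself cannot say which is wanted.
words = ["zebra", "\u00e4pple", "apple"]
by_code_point = sorted(words)

print(two_octets, len(decomposed), len(composed), by_code_point)
```

The sorted list places the word beginning with a-umlaut last, which is
what a Swedish reader might expect and not what a German reader would;
an application has to supply language-specific collation rules on top
of the character set.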
While there has not yet been wide
scale deployment or implementation of either ISO 10646 or Unicode,
this is expected to change over the next few years because,
at least for Unicode, major computer and software manufacturers
have committed to it. Such deployment
is extremely important to resolving the long-standing inability
of systems to deal with anything
but relatively limited character sets, and it will provide the basis
for resolving that situation in an internationally standardized
way. Unicode and ISO 10646 do not, by themselves, address all
of the requirements for fully supporting the representation and
processing of the world's languages. Other standards must
be developed, and existing standards must be extended to make
use of the functionality Unicode and 10646 provide. However, Unicode
and 10646 do provide the basic foundation on which to start the
task of constructing computer systems and software capable of
working with many, and perhaps someday all, of the world's
languages.
Text Formatting Standards
There are many text formatting standards
in existence. Many of them are proprietary to vendors
of products such as word processing software but have
become de facto standards due to their widespread use. Two standards
that are worthy of discussion here, however, are SGML [11]
and PDF [12].
SGML (Standard Generalized Markup Language)
is an ISO standard (ISO 8879) for electronic document exchange,
archival, and processing. SGML does not impose any specific structure
on documents, but rather is a language for writing formal definitions
of document structures. Document structures known as DTDs
(Document Type Definitions) are defined for particular categories
of documents. SGML software, by being configured to understand
a particular DTD, can thus process all documents that have been
encoded to conform to that DTD. It should be stressed that SGML
DTDs, unlike many other text formatting systems, are not
intended to describe the physical layout of a document, but rather
its logical structure, hierarchy, and semantics. A DTD also defines
the allowable tags that can be used within a particular document
type. This is an extremely powerful concept that enables, depending
on the richness of the DTD definition, sophisticated searching
and navigation within documents, and also allows for such things
as automatic indexing of fields within documents depending again
on the nature of the tagging the DTD provides. Since physical
layout description is not the purpose of the DTD, having this
knowledge of the logical structure of the document allows
its physical manifestation to be tailored to the output device,
so that the same document encoded in SGML could be rendered in different
ways depending, for example, on whether the output device is a computer
screen or a printer. It should be noted that HTML [13]
(Hypertext Markup Language), the language used to define Web pages,
is nothing more than a very simple SGML DTD, and was in
fact expressly designed to conform to SGML rules and to use SGML
capabilities in its definition.
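The separation of logical structure from layout can be sketched in a
few lines. Python's standard library has no SGML parser, so the example
below uses its XML parser as a stand-in (XML being a later, simplified
derivative of SGML), and the element names and sample text are
hypothetical and are not drawn from any published DTD. It renders one
tagged document two ways and pulls out a single field for indexing.

```python
import xml.etree.ElementTree as ET

# A hypothetical document tagged by logical role, not by appearance.
SOURCE = """<report>
  <title>Sample Report</title>
  <author>A. Author</author>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</report>"""


def render_for_screen(doc):
    """Compact view: upper-case title, indented paragraphs."""
    lines = [doc.findtext("title").upper()]
    lines += ["  " + p.text for p in doc.findall("para")]
    return "\n".join(lines)


def render_for_printer(doc):
    """Print view: underlined title, byline, then paragraphs."""
    title = doc.findtext("title")
    lines = [title, "=" * len(title), "by " + doc.findtext("author"), ""]
    lines += [p.text for p in doc.findall("para")]
    return "\n".join(lines)


def index_field(doc, tag):
    """Collect the text of every element with the given tag."""
    return [element.text for element in doc.iter(tag)]


doc = ET.fromstring(SOURCE)
print(render_for_screen(doc))
print(render_for_printer(doc))
print(index_field(doc, "author"))
```

Neither rendering function needs to know how the other one lays the
document out; both rely only on the tags, which is the property that
makes field-level indexing and device-dependent formatting possible.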
There are some interesting activities
going on involving SGML. Many major publishers have converted
or are in the process of converting their production processes
to make use of it. Because of the capabilities discussed above,
this will allow them to have a single unified input source, yet
create different outputs tailored to different media.
Project CHIO, mentioned above, is defining
SGML DTDs to encode museum information which will then
be accessed using Z39.50.
There is a project being led by the
University of California at Berkeley, but involving several other
universities, called the Encoded Archival Description (EAD)
project, that is working on developing a DTD to encode
the finding aids that often accompany special collections in
libraries and archives.
ISO 12083, Electronic Manuscript Preparation
and Markup [14], is an international standard that defines
four DTDs for books, serials, articles, and mathematics.
ISO 12083 has also been adopted as a US national standard. The
TEI (Text Encoding Initiative) has defined DTDs to facilitate
the encoding and exchange of machine readable texts intended for
literary, linguistic, historical, or other textual research.
Much other work involving SGML is going on as well.
Another potentially important text
formatting standard is PDF, Portable Document Format. Unlike SGML,
which is an openly available international standard, PDF is a proprietary
standard of Adobe Systems and is a follow-on technology to their
PostScript printer language. Unlike SGML, which is concerned with
logical structure, PDF defines the physical format of a document
and its page layout, although it does have searching, navigational,
and other utility features. PDF gives electronic documents
many of the same qualities as paper. Some of the reasons
that PDF is potentially important are that, while it is a proprietary
format, Adobe does make a reader freely available and does have
software that can convert many of the popular word processing
formats to PDF, as well as software to convert PostScript documents.
They also have software that can take paper documents that have
been scanned in and convert them to PDF. In addition, PDF seems
to be an increasingly popular format for making documents available
on the Web, and the Netscape browser, through the use of its new
plugin technology, can now display PDF directly within the
browser window.
Both SGML and PDF are potentially important
technologies and both may have a role to play in making on-line
networked information available. SGML, because of its logical
structures and search capabilities, may be of use for that and
for long term archival storage. PDF may become a de facto standard
for on-line display because it preserves much of the familiar
and comfortable print metaphor.
Electronic Data Interchange
Electronic Data Interchange (EDI) [15]
is the exchange of commercially oriented information in standard
electronic formats between automated systems. In commerce, industry,
and government, EDI is used to replace paper purchase orders, invoices,
price lists, shipping and customs documents, and other business
oriented documents. In the library and publishing arena, two organizations
are working on developing EDI transaction sets. SISAC, the Serials
Industry Systems Advisory Committee, has concentrated on developing
EDI transactions for the serials industry and has developed or
is in the process of developing EDI transactions for serials orders,
order acknowledgments, claims, cancellations, and invoices. SISAC
also developed a machine readable bar code representation of the
1991 version of the Z39.56 SICI code which major publishers are
printing on the covers of serials publications, and major library
systems vendors have developed interfaces for those bar codes
for use in automated check-in systems. BISAC, the Book Industry
Systems Advisory Committee, plays a similar role to SISAC
for the book industry and has developed EDI transaction sets such
as purchase order, purchase order acknowledgment, invoice, title
status format, advance ship notice, and ship notice/invoice. Currently,
both SISAC and BISAC EDI transactions use the US national standard
EDI format X12, but both have plans to migrate their X12 implementations
to the international standard EDIFACT format. The use of EDI,
while perhaps not as glamorous as some of the other standards
discussed above, is still important in that it allows libraries
to operate more efficiently and thus provide better services to
their users.
Internet Community Standards
There is a great deal of work going
on in the Internet community on protocols and standards that has
had and will continue to have a profound impact on the global information
infrastructure, and it is worth focusing some attention on
that work here. Internet standards are developed by the Internet
Engineering Task Force (IETF) [16], which provides a forum
for working groups to coordinate technical development of new
protocols and standards. The IETF mission includes identifying
and proposing solutions to pressing operational problems in the
Internet, specifying the development or usage of protocols and the
near-term architecture to solve technical problems for the Internet,
and providing a forum for the exchange of relevant information
within the Internet community between vendors, users, researchers,
government agencies, and network managers. The IETF is organized
into several areas: Applications, Internet, IP Next Generation
Development, Network Management, Operational Requirements, Routing,
Security, Transport, and User Services. Working groups are formed
within areas to solve specific identified problems or needs. Due
to the nature of some of those needs, some working groups are
co-sponsored by more than one area. The IETF meets three times
a year, but most of the work of the working groups is done by
electronic mail on the publicly open mailing lists the groups
create. The IETF is, for the most part, consensus driven, and
for something to become a standard it must represent the consensus
not only of those attending the meetings but also of those
contributing to electronic discussions. In addition, standards
go through several stages, and in order for a standard to reach
full standard status, there must be at least two independent,
demonstrably interoperable implementations of it. Standards in the IETF are,
for historical reasons, known as Requests for Comments (RFCs).
Activities of especial importance to
librarians and information professionals going on in the IETF
include the work being done by the HTML and HTTP working groups.
HTML, as mentioned above, is the SGML compliant DTD for Web page
definition, and HTTP is the transfer protocol used by the Web
to move Web pages and other content between the server and the
web browser. While both of these protocols had their origins outside
of the IETF (this happens fairly often; one of the things the
IETF does is take technologies from the outside and standardize
them), there has been a lot of work in the IETF on enhancing and
standardizing both HTML and HTTP. An IETF working group developed
WHOIS++, a protocol for indexing and accessing directories of
information. One component of WHOIS++ is an indexing protocol.
There is currently an IETF working group trying to develop a common
indexing protocol that can be used in a variety of protocols and
applications in the Internet that have indexing requirements.
IETF working groups developed MIME, the Multipurpose Internet Mail
Extensions, which extend the capability of simple internet mail
to handle complex document types and multiple formats.
IETF working groups are currently working on definitions for encapsulating
EDI objects and SGML documents in mail using the MIME capabilities.
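The kind of encapsulation MIME makes possible can be sketched with the
`email` package in Python's standard library. The example builds a
two-part message: a plain text note and an attached marked-up document
labelled `text/sgml`. The addresses, file name, and document content
are hypothetical, and the `text/sgml` label is used here for
illustration; the labels finally adopted for SGML and EDI objects are
whatever the working groups define.

```python
from email.message import EmailMessage


def build_message(sgml_text: str) -> EmailMessage:
    """Wrap a marked-up document in a multipart MIME mail message."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "Marked-up document"
    # First body part: an ordinary text note.
    msg.set_content("The marked-up document is attached.")
    # Second body part: the document, labelled with its own content type.
    msg.add_attachment(sgml_text, subtype="sgml", filename="document.sgml")
    return msg


message = build_message("<doc><title>Sample</title></doc>")
part_types = [part.get_content_type() for part in message.iter_parts()]
print(message.get_content_type(), part_types)
```

A receiving mail program uses the content type of each part to decide
how to handle it, which is what lets a single mail message carry
several different formats.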
In the Internet area, there is a working
group that recently released a draft document for a Service Location
protocol that addresses how to locate various services in a distributed
internetworked environment. In the Security area, work of
interest includes working groups on Common Authentication
Technology, on Web transport security, on Privacy Enhanced Mail (PEM),
which adds privacy, security, and authentication to internet mail,
and on integrating PEM with the MIME standards.
Among the work with the greatest implications
for librarians and information professionals taking place in the
IETF are the Uniform Resource Identifier (URI) activities. An
IETF working group standardized and built on the Uniform Resource
Locator (URL) mechanism that had originally been defined in the
Web, and added definitions for new objects and protocols into
the URL specification. There has also been work going on to define
Uniform Resource Names (URNs) that would provide persistent,
unchanging names for internet resources that might exist in multiple
forms or copies or move around the Internet. URNs are
seen as being a hierarchical distributed naming structure that
will replace URLs in Web pages (and other places URLs
are currently being used). Protocols will be developed and infrastructure
deployed that will allow resolution of a URN to whatever series
of URLs happens to exist for that resource at any given
time, much as domain names and the Domain Name System (DNS)
infrastructure replaced the use of raw IP addresses. This work
has not (unfortunately, due to a variety of circumstances) progressed
as quickly as was initially envisioned, although
experimental URN systems have been built and deployed in order to gain some
experience with what works best and what infrastructure
issues need to be dealt with in order to build large scale
production systems. A third area of the URI work has been the
work done to develop Uniform Resource Citations (URCs).
The URC work has been attempting to define the data elements and
structures needed to describe Internet resources and could include
such things as language, size, cost, format, and availability
of the resource. This is another area in which work has progressed
more slowly than hoped, although there is work in this area also
currently going on in other communities besides the IETF.
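The relationship between a persistent name and its current locations
can be pictured with a toy resolver. This is only an in-memory sketch
of the idea of resolving a URN to a set of URLs; the class, the
`urn:example:` names, and the URLs are all hypothetical, and a real
resolution service would be distributed and hierarchical in the way
the Domain Name System is.

```python
from typing import Dict, List


class UrnResolver:
    """Maps a persistent name to whatever locations currently hold it."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, urn: str, url: str) -> None:
        """Record that a copy of the named resource is available at url."""
        urls = self._locations.setdefault(urn, [])
        if url not in urls:
            urls.append(url)

    def withdraw(self, urn: str, url: str) -> None:
        """Record that the copy at url has moved or been removed."""
        urls = self._locations.get(urn, [])
        if url in urls:
            urls.remove(url)

    def resolve(self, urn: str) -> List[str]:
        """Return the current locations; the name itself never changes."""
        return list(self._locations.get(urn, []))


resolver = UrnResolver()
name = "urn:example:journal-article-42"
resolver.register(name, "http://server-a.example.org/articles/42")
resolver.register(name, "http://mirror.example.net/a42.html")
resolver.withdraw(name, "http://server-a.example.org/articles/42")
print(resolver.resolve(name))
```

A Web page that cited the name would keep working after the first copy
was withdrawn, because only the resolver's table changed.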
The final IETF activities worth mentioning
here are some of the work being done by working groups in the
User Services area. Among other things, working groups in the User
Services area are developing a guide to help artists use the Internet
and create and make content available on it. There is a working
group chartered to address issues related to the connection of
primary and secondary schools worldwide to the Internet, a working
group to deal with issues related to end user training, including
creating a catalogue of existing training materials, identifying
gaps in those materials, and providing users with self paced learning
materials, and a working group to develop a guide describing responsible
use of the Internet.
Obviously, this is not an exhaustive
description of the entire scope of work going on in the IETF,
but rather an attempt to highlight those activities of especial
interest and importance to librarians and information professionals,
and ones that will have the greatest impact on them. Much of the
work of the IETF is focused on much more traditional computer
and networking protocol development and issues.
Additional Standards
A few additional standards are worth briefly
mentioning here. NISO Z39.58-1992 Common Command Language for On-line
Interactive Information Retrieval and its international counterpart
ISO 8777, which specify a uniform command terminology for on-line
search systems, deserve note if only because, while there has been
some deployment of these standards, there have not been wide scale
implementations of them. With the shift in emphasis in on-line
systems from terminal based command driven interfaces to graphically
oriented interfaces, it is likely that these standards will continue
to decrease in importance, although they will probably still have
some limited role to play for some time to come.
Other standards worth calling out include
NISO Z39.53-1995 Codes for the Representation of Languages for
Information Interchange, which defines almost 400 three character
alphabetic language codes. There is a corresponding international
standard ISO 639. ISO 3166 Codes for the Representation of Names
of Countries, which was recently approved, is the definitive international
standard for country code definitions.
In the area of data element definition
standards there are NISO Z39.44 Serials Holdings Statements, NISO
Z39.57 Holdings Statements for Non-Serials Items, and NISO Z39.63
Interlibrary Loan Data Elements. These are all currently in the
process of being revised and updated, and three new standards,
NISO Z39.69 Record Format for Patron Records, NISO Z39.70 Format
for Circulation Transactions, and NISO Z39.71 Holdings Statements
for Bibliographic Items are in development.
In the international arena, ISO 8459
defines a directory of bibliographic data elements for various
library and information retrieval applications. ISO Draft International
Standard (DIS) 690-2 specifies data elements to be included in
bibliographic references to electronic documents and also sets
out a prescribed order for those elements in the reference. In
addition, it establishes conventions for the transcription and
presentation of information derived from the source electronic
document. In the database world, SQL, Structured Query Language,
is an important standard that defines a language for building
queries that can be executed against relational databases, and
has been adopted by virtually all major relational database vendors.
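As a small illustration of the kind of query SQL expresses, the following
Python sketch uses the sqlite3 module from the Python standard library to
build a throwaway in-memory table and select titles by language code. The
table name, column names, and sample rows are hypothetical and are not drawn
from any of the standards mentioned above; the language codes are
three-character alphabetic codes of the kind described earlier.

```python
import sqlite3


def find_titles_by_language(rows, language_code):
    """Load rows into an in-memory table and return titles in one language."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table layout, for illustration only.
        conn.execute(
            "CREATE TABLE titles "
            "(title TEXT, language_code TEXT, country_code TEXT)"
        )
        conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", rows)
        # The SQL query itself: a selection with a condition and a sort order.
        cursor = conn.execute(
            "SELECT title FROM titles WHERE language_code = ? ORDER BY title",
            (language_code,),
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


# Made-up sample data: (title, language code, country code).
SAMPLE_ROWS = [
    ("Journal A", "eng", "US"),
    ("Zeitschrift B", "ger", "DE"),
    ("Journal C", "eng", "GB"),
]

if __name__ == "__main__":
    print(find_titles_by_language(SAMPLE_ROWS, "eng"))
```

The same SELECT statement could, in principle, be submitted to any
relational database that supports standard SQL; only the connection code is
specific to the database being used.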
This is obviously not an exhaustive
list of all relevant standards. There are many communities developing
standards that are important for librarians and information professionals
that space constraints preclude mentioning here, and this is a
dynamic and changing arena with new developments constantly occurring.
Infrastructure Issues
Finally, it is worth spending
a little time discussing some other issues besides standards that
are important in relation to building the global information superhighway.
While all of the standards discussed above, and others, are crucial
if that information superhighway is to exist, other things such
as a supporting infrastructure based on those standards are equally
vital to its success. The context for this discussion will be
some lessons that were learned about infrastructure issues that
came out of the experience the University of California had from
a research venture it engaged in to provide on-line access to
the full electronic content of scholarly material. The University
of California was one of nine US universities involved
in a joint venture with Elsevier Science Publishers, known as
the TULIP project, to give users on-line access to
the full bitmapped images of approximately 40 journals in the
area of materials science. The major goals of the TULIP project
were to learn what types of infrastructure were required to support
delivery of this type of material to end users, to gain an understanding
of what effect having such materials available would have on
the scholarly research process, and to begin to develop economic
models that made sense for the delivery of this type of electronic
content. The project ran from 1992 through the end of 1995. While
there is not space in this paper to provide details on the actual
implementation (the bibliography at the end of this paper contains
pointers to articles that do describe in detail both the UC implementation
and those of the other participants), the lessons we learned from
that implementation have important implications for the future
of networked information.
The basic lesson that was learned in
that project was that, as important as standards and technologies
are in such systems, just as important (or perhaps even more
crucial) was having in place the proper infrastructure to support
those technologies. This supporting infrastructure covers a wide
variety of areas. Among them:
1) Storage - Providing on-line access
to large amounts of electronic content requires large and ever
increasing amounts of computer disk storage to house the material.
The TULIP project, which covered only some 40 journals for 4 years
and contained only black and white images, required about 35 gigabytes
of disk storage. When one contemplates scaling such a system to
hundreds and even thousands of journals and going beyond black
and white images to color and other types of multimedia objects,
the storage requirements for such systems quickly become immense.
(A follow-on project to TULIP at the University of California
is already storing over 200 gigabytes of images of journal articles.)
2) Network Bandwidth - Having adequate
network bandwidth available is crucial to being able to access
networked information resources. This includes adequate bandwidth
both in the global wide area internet, and also to the desktop
inside building and campus local area networks, as well as
to the end user's home. This need for ever increasing bandwidth
is being driven by new applications and the bandwidth intensive
resources that are being made available through them. We are already
seeing the implications of the incredible growth of the World Wide
Web in the last couple of years and some of the performance problems
that have been driven by lack of adequate bandwidth. Applications
such as mobile computing also have implications and opportunities
in this area.
3) Equipment Infrastructure - New applications and network resources are driving the need for adequate equipment infrastructure at an ever increasing rate. These applications and the data being made available through them require increasingly faster computers to support them. Organizations need to have strategies for managing both the increasing need for more equipment infrastructure and the ever shorter cycles on which that equipment needs to be replaced with new generation technology.
4) User Authentication, Encryption,
and Electronic Commerce - These new applications and electronic
resources will also require new and better authentication infrastructures
that do not currently exist on a large scale today. Simple application
based password schemes will not work in this environment and will
need to be replaced by public/private key based authentication
systems that can work across multiple application domains. In
addition, data encryption protocols and supporting infrastructure
will be needed to protect both the privacy of the users and the
authenticity of the information resources. Finally, multiple economic
models will exist in this environment, and there will be a need
for protocols and supporting infrastructure for electronic commerce
in support of those models.
5) Printing - The growth of electronic
networked information will by no means reduce the need for printing.
In fact, due to user behavior patterns and the current state of
the art in computer and display technology, there will be an ever
increasing need to be able to print off electronic information.
One of the major lessons that was learned from the TULIP project
at the University of California was that the university's
printing infrastructure was not complete or ubiquitous enough
to support the types of printing that users want to be able to
do. The infrastructure was not there to support remote applications
being able to print to all of the different types of printers
that users had, being able to automatically determine what type
of printer a user had locally, and being able to do whatever
type of charging was required by the local campus printing
infrastructure.
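To make the storage figures in item 1 concrete, the following Python sketch
scales the TULIP numbers quoted above (about 35 gigabytes for roughly 40
journals over 4 years) to larger collections. It assumes, purely for
illustration, that storage grows linearly with the number of journals and
years. The function name and the image_size_factor parameter (a hypothetical
multiplier for richer content such as color images) are not taken from the
project itself.

```python
def estimate_storage_gb(journals, years, image_size_factor=1.0,
                        baseline_gb=35.0, baseline_journals=40,
                        baseline_years=4):
    """Rough linear estimate of disk storage in gigabytes.

    The baseline defaults are the approximate TULIP figures; the linear
    scaling and image_size_factor are illustrative assumptions.
    """
    # Storage per journal per year implied by the baseline figures.
    gb_per_journal_year = baseline_gb / (baseline_journals * baseline_years)
    return gb_per_journal_year * journals * years * image_size_factor


if __name__ == "__main__":
    for journals in (40, 400, 1000):
        print(journals, "journals:", estimate_storage_gb(journals, 4), "GB")
```

Under these assumptions, 1,000 journals over the same 4 years would need
roughly 875 gigabytes before any allowance for color or multimedia content.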
These are by no means all of the infrastructure
issues, but rather the major ones that came up within the context
of the TULIP research project. Some of them are issues of technology
and will improve over time as technology improves and supporting
infrastructures are put in place. However, even in areas of technology,
new applications and data types that make use of them, for the
foreseeable future, will continue to place constantly increasing
demands on whatever technology and infrastructure is in place.
Also, many infrastructure issues go beyond technology and have
social, economic, legal, and political considerations. Even with
all of the work and discussion going on about these issues, there
is still a long way to go toward resolving them, and much work
remains to be done to determine what models make sense in an electronic
environment and how much of our current print oriented models
and metaphors can and should be carried forward.
Conclusion
This paper has attempted to survey
some of the major standards and standards developments in the
world of electronic and networked information that librarians and
information professionals should be aware of. It also attempted
to put those standards and developments into the context of the
larger infrastructure issues that must be dealt with as part of
building a world of networked information, using experiences gained
and lessons learned from one research project, to highlight some
of those infrastructure issues. It is not intended to serve as
a complete or comprehensive study of those developments, nor does
it cover all of the many activities and developments that have
occurred or are currently going on. This is a fluid and changing
arena, with new developments, projects, research, and deployments
constantly taking place. The paper is intended to serve as a guide
and road map, and, hopefully, to encourage the reader to seek
out further information in areas of interest to them. Pointers,
in both print and electronic format, to the topics covered (as
well as some others) can be found in the references and bibliography
section below.
References
1) ANSI/NISO Z39.50-1995: Information
Retrieval (Z39.50): Application Service Definition and Protocol
Specification, NISO Press 1995, ISSN: 1041-5653. An on-line version
is also available from the Library of Congress Web site for Z39.50
(see Bibliography below)
2) All NISO published standards and
many draft standards can be obtained from: NISO Press Fulfillment
Center, P.O. Box 338, Oxon Hill, MD, USA 20750-0338 (301) 567-9522
Fax: (301) 567-9553 US and Canada Toll Free: 1 800-282-6476
3) Application Profile for the Government
Information Locator Service. (The GILS Profile is available from
the Library of Congress Web site for Z39.50; see Bibliography
below)
4) Information on CIMI and the CHIO
Project can be found on-line at:
http://www.cimi.org
5) ISO TC46/SC4/WG4 10162 Documentation
- Search and Retrieve Service Definition
ISO TC46/SC4/WG4 10163 Documentation
- Search and Retrieve Protocol Specification
6) The ZIG maintains an electronic
mailing list where technical discussions occur and meetings are
announced. To join, send electronic mail to:
LISTSERV@NERVM.NERDC.UFL.EDU
with the body of the note containing:
Subscribe Z3950IW <Your Name>
7) Z39.56-1991 Serial Item and Contribution
Identifier (SICI), NISO Press, ISBN: 1-880124-15-7 (The revised
draft is also available from NISO Press as Z39.56-199x)
8) ISO 10160 Information and Documentation
- Open Systems Interconnection -Interlibrary Loan Application
Service Definition, 1993.
ISO 10161-1 Information and Documentation
- Open Systems Interconnection -Interlibrary Loan Application
Protocol Specification - Part 1: Protocol Specification, 1993.
9) ISO/IEC International Standard 10646-1:1993(E):
Information Technology--Universal Multiple-Octet Coded Character
Set (UCS)--Part 1: Architecture and Basic Multilingual Plane.
International Organization for Standardization, Geneva, 1993
10) The Unicode Consortium: The Unicode
Standard Worldwide Character Encoding, Version 1.0. Volume 1(Architecture,
non-ideographic characters) Addison-Wesley, 1991
The Unicode Consortium: The Unicode
Standard Worldwide Character Encoding, Version 1.0. Volume 2 (Ideographic
characters) Addison-Wesley, 1992
Unicode Technical Report #4: The Unicode
Standard 1.1. The Unicode Consortium, 1993
11) ISO/IEC International Standard
8879: Standard Generalized Markup Language (SGML). International
Organization for Standardization, Geneva, 1986
12) Portable document format reference
manual / Adobe Systems Incorporated ; Tim Bienz and Richard
Cohn. Reading, Mass. : Addison-Wesley Pub. Co., c1993.
13) RFC 1866: T. Berners-Lee, D. Connolly,
"Hypertext Markup Language - 2.0", 11/03/1995 (Available
on-line at the IETF Web page listed below in the Bibliography).
Note that there are compendium documents, both RFCs and
drafts, that define additional HTML features beyond what is in
RFC 1866
14) NISO/ANSI/ISO 12083 Electronic
Manuscript Preparation and Markup. ISBN: 1-880124-20-3 (Available
from NISO Press)
15) A few reference materials on
EDI include:
The EDI handbook : trading in the 1990s
/ edited by Mike Gifkins & David Hitchcock ; with a foreword
by Lord Young of Grafham. London : Blenheim Online, 1988.
Electronic Data Interchange (EDI)
Gaithersburg, MD : Computer Systems Laboratory, National Institute
of Standards and Technology, 1991. Series title: Federal information
processing standards publication ; 161.
What is EDI? : a guide to electronic
data interchange / Martin Parfett. 2nd ed. Manchester, England
: NCC Blackwell, 1992.
16) A good introduction to the IETF
is: RFC 1718 The Tao of IETF -- A Guide for New Attendees of the
Internet Engineering Task Force. October 1994. (Available on-line
at the IETF Web site listed below in the Bibliography - All IETF
RFCs and working drafts may be obtained on-line at that
location. Information about the organization of the IETF as well
as announcements of upcoming meetings and proceedings of past
ones may be found there as well.)
Bibliography
Printed Materials:
Moen, William: A Guide to the ANSI/NISO
Z39.50 Protocol: Information Retrieval in the Information Infrastructure.
NISO Press.
Z39.50 Implementation Experiences.
NISO Press
The Government Information Locator
System (GILS) Expanding Research and Development on the ANSI/NISO
Information Retrieval Standard. ISBN: 1-880124-11-4, NISO Press.
From A to Z39.50 : a networking primer
/ by James J. Michael and Mark Hinnebusch. Westport : Mecklermedia,
1995.
Needleman, Mark, "The Z39.50
Protocol: An Implementor's Perspective." Resource Sharing
and Information Networks, Haworth Press, volume 8, number 1, 1992
Needleman, Mark, "The Z39.50 Information
Retrieval Protocol: The Promise and the Myth", paper presented
at the Central European Conference and Exhibition for Academic
Libraries and Informatics, Vilnius, Lithuania, September 27-29,
1993.
Practical SGML / by Eric van Herwijnen.
2nd ed. Boston : Kluwer Academic Publishers, 1994.
The SGML handbook / Charles F. Goldfarb
; edited by Yuri Rubinsky. Oxford : Clarendon Press ; Oxford
; New York : Oxford University Press, 1990.
Zeeman, Joe, "Interlending in
the Emerging Networked Environment: Implications for the ILL Protocol
Standard." (Report #8 in the IFLA UDT Series on Data Communications
Technologies and Standards for Libraries), ISBN: 0-9694214-8-6
(Available from NISO Press)
Holm, Liv, "Models for Open
System Protocol Development." (Report #6 in the IFLA UDT
Series on Data Communications Technologies and Standards for Libraries),
ISBN: 0-9694214-7-8 (Available from NISO Press)
Special Issue on TULIP Project, edited
by Nancy Gusack and Clifford A Lynch. Library Hi Tech, Issue 52
13:4 1995, ISSN 0737-8831.
Gelfand, Julia and Needleman, Mark,
"TULIP: Participating In An Experiment of Electronic Journal
Access: Administrative and Systems Ensure Success," Paper
Presented at the IATUL Annual Meeting, Sheffield University, Sheffield,
West Yorkshire, United Kingdom, July 7, 1994.
BISAC X12 Implementation Guidelines
for Electronic Data Interchange. ISBN: 0-940016-44-3 (Available
from NISO Press)
Machine-Readable Coding Guidelines
for the US Book Industry. ISBN: 0-940016-33-8 (Available from
NISO Press)
SISAC X12 Implementation Guidelines
for Electronic Data Interchange. ISBN: 0-940016-57-5 (Available
from NISO Press)
Text Encoding Initiative (TEI) Guidelines
(Available from NISO Press)
Selected Web Sites:
NISO - National Information Standards Organization:
ANSI - American National Standards Institute:
NIST - National Institute for Standards and Technology:
IETF - Internet Engineering Task Force:
http://www.ietf.cnri.reston.va.us/home.html
ISO - International Organization for Standardization:
http://www.iso.ch (Available
in both English and French)
IEC - International Electrotechnical Commission:
ITU - International Telecommunications Union:
DISA - Data Interchange Standards Association:
Z39.50 Information and Pointers:
http://lcweb.loc.gov/agency/z3950
Unicode Information:
http://www.stonehand.com/unicode/standard.html
SGML Information:
http://www.sil.org/sgml/sgml.html
EDI Information and Resource Pointers:
http://www.premenos.com/Resources/
BISAC - Book Industry Systems Advisory Committee:
http://www.bookwire.com/bisg/bisac.html
SISAC - Serials Industry Systems Advisory Committee:
http://www.bookwire.com/bisg/sisac.html
-------------------------------------------------------------
Adobe and Postscript are registered
trademarks of Adobe Systems Incorporated which may be registered
in certain jurisdictions