Transcription metadata
     The NeumesXML Schema:
 * Definitions of Terminology
 * Structure of the Schema


I. Definitions

Definition 1.  Content is data pertaining to one particular example from a group of related things. The content of a NeumesXML document consists of the chant transcription (including the chant text) of one chant in a source manuscript, plus various information about this chant, the manuscript in which it is found, the person who made the transcription, and so on. NeumesXML defines the words and syntactical constructs that constitute the NeumesXML language; this language serves as a framework for content and is distinct from content.

Definition 2.  An instance document is an XML (Extensible Markup Language) file that contains content structured in accordance with an XML language. As the term is used here, an instance document is a file that contains the transcription and description of exactly one chant (typically, a chant as it was written in a medieval manuscript) and that adheres to the NeumesXML language. A NeumesXML instance document is differentiated from NeumesXML per se, which just defines the NeumesXML language.

Definition 3.  A grammar is, essentially, a consistent set of rules that decides whether instance documents are "well-formed," i.e., whether the arrangement of content within a document is consistent with those rules. (The XML specifications reserve "well-formed" for syntactically correct XML and call conformance to a language definition "validity"; the term is used here in the broader sense.) Consistency is needed so that all instance documents in a related group can be computer-processed in a uniform way (viz., so that a program written to do something to one instance document will work equally well on any instance document within the related group). The NeumesXML language is an implementation of a formal grammar, and all NeumesXML instance documents ought to be consistent with this grammar. XML as a grammar-implementation tool is, however, not sufficiently powerful to express all the features that a formal grammar might have. And so, NeumesXML might not be capable of enforcing that instance documents are completely correct grammatically; stricter enforcement might be done by a data-entry program.

Definition 4.  An XML Schema is a special type of XML language definition, distinct from the more usual DTD language definition (XML Document Type Definition). XML Schema is far more expressive than DTD, and so "well-formedness" of instance documents can be enforced in greater conformity to a formal grammar. The file "NeumesXML.xsd" is an XML Schema, and it defines an implementation of the formal grammar for transcription of neumed sources. All XML instance documents declare the name and Internet location of the language-definition file (or files) that decide their well-formedness; and so, NeumesXML instance documents must include a declaration that they conform to the XML language defined in "NeumesXML.xsd". A slightly subtle but important point is that XML Schemas function only as grammar-checkers; they do not actually define the file-format of instance documents, but only ensure that whatever format an instance document appears in is consistent with the defined language.

Definition 5.  An XML tag is part of an XML language, similar to the widely familiar tags of HTML (Hypertext Markup Language, in which Web pages are written), for example "<IMG>" or "<TITLE>". In an XML language, the left angle-bracket character '<' is reserved to mark the beginning of an XML tag, and it distinguishes tags from other data in the file. A tag ends at the right angle-bracket character '>'. As in HTML, an XML tag can be a "paired tag" to mark a span of content, e.g., "<STRONG>...</STRONG>" (where the ellipsis "..." denotes that additional data may appear here). The "closing" tag of such a pair is distinguished from an "opening" tag by a forward slash '/' just after the left angle-bracket (e.g., "</STRONG>"). Alternatively, an XML tag can be a so-called "empty tag" (one that does not mark a span, and so has no closing tag). In XML, empty tags must have a forward slash just before the right angle-bracket (e.g., "<genre type='Gr'/>"); this is not a requirement in HTML.
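
As a minimal sketch of the difference between paired and empty tags, the following Python fragment parses a small made-up fragment with the standard library's xml.etree.ElementTree. The element names "example" and "note" are hypothetical and serve only as illustration; "<genre type='Gr'/>" is taken from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <example> and <note> are made-up names.
# <note>...</note> is a paired tag; <genre type='Gr'/> is an empty tag.
FRAGMENT = "<example><note>some content</note><genre type='Gr'/></example>"

root = ET.fromstring(FRAGMENT)
note = root.find("note")
genre = root.find("genre")

print(note.text)         # the data enclosed by the paired tag
print(genre.text)        # None: an empty tag encloses nothing
print(len(list(genre)))  # 0: an empty tag has no child elements either


def parses(text: str) -> bool:
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# In XML, unlike HTML, an empty tag needs the '/' before the closing '>'.
print(parses("<example><genre type='Gr'/></example>"))  # accepted
print(parses("<example><genre type='Gr'></example>"))   # rejected
```

The second call fails because, without the '/', the parser treats the genre tag as the opening half of a pair and then finds no matching closing tag.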

Definition 6.  A Schema Element is a grammar particle that normally corresponds to an XML tag in instance documents. [Technically, because a Schema is itself an XML document, Elements in a Schema are themselves tags. The NeumesXML Schema is an instance document with regard to a more general grammar defined in an XML "Schema of Schemas" document. This kind of recursive definition is the basis of XML's "extensibility."] Elements are arranged in a Schema essentially as an inverted tree that starts at the "root element" and divides along branches. The branching structure defines a containment relation. A particular Element is the "parent" of any branches immediately under it, which are its "child" Elements. For example, if the Schema says that Element A is the parent of Element B, then an instance document can have tags named A and B, in which case B must appear inside A. Therefore, the following construction could appear in a well-formed instance document: "<A>...<B>...</B>...</A>".
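
The containment relation can be illustrated with a short Python sketch. It is not how an XML Schema processor works internally; it only checks, for the A and B example above, that every B sits directly inside an A. The function name and the wrapper element "root" in the usage lines are hypothetical.

```python
import xml.etree.ElementTree as ET


def children_have_parent(xml_text: str, child: str, parent: str) -> bool:
    """Return True if every <child> element is directly inside a <parent> element."""
    root = ET.fromstring(xml_text)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {c: p for p in root.iter() for c in p}
    for elem in root.iter(child):
        p = parent_of.get(elem)
        if p is None or p.tag != parent:
            return False
    return True


# B inside A, as the Schema in the example requires.
print(children_have_parent("<A>text<B>inner</B>more</A>", "B", "A"))
# B beside A rather than inside it ("root" is a made-up wrapper element).
print(children_have_parent("<root><A/><B/></root>", "B", "A"))
```

The first call reports that the containment rule holds; the second reports a violation, because the parent of B is the wrapper element and not A.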

Definition 7.  Meta-data is a relative term that basically means data about data. In order to avoid confusion, we use this term only to mean any content of a NeumesXML file that is not part of the "prima-facie semantic" transcription. By "prima-facie semantic" we mean what is visible on the face of the source (usually, a medieval manuscript) and has "semantic value" (that is, has meaning to the chant). And so, a title and rubrical notations that appear on the face of the source are not meta-data. Dimensions of the source parchment, the presence of pinholes, changes of scribal hand, pictures or decorations, and so on, appear on the face of the source but do not have semantic value for the chant; and so, they are meta-data. Likewise, a library stamp or the library's folio number written on the source is prima-facie, but it doesn't have semantic value to the chant and so it is meta-data. The library accession number for the source, transcriber's name and date, hyperlinks to digital photographs of the source, and so on, are not on the face of the source and so they are meta-data. A chant title that is known by the transcriber, but that does not appear on the source, is meta-data. (If, however, a chant title is on the face of the source, the title could appear in two places in the file: as prima-facie content and as meta-data.)

Definition 8.  Character data are data that appear in an instance document between tags, and not within the angle brackets of any tag. (For example, "<STRONG>Hello</STRONG>" consists of the character data "Hello" and the tags "<STRONG>" and "</STRONG>".) It is often said that XML documents consist of character data and "markup." We prefer the term "tags" instead of "markup," because "markup" suggests that tags are used for presentation formatting, whereas NeumesXML tags are used to record meta-data about the source, not to format the document for presentation. In NeumesXML, character data are reserved for prima-facie semantic content, consisting mainly of the chant text, the chant notation, and rubrical text. All other content of NeumesXML is placed inside tags.
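
To make the split between character data and tags concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name "character_data" and the second sample string are illustrative assumptions, not part of NeumesXML.

```python
import xml.etree.ElementTree as ET


def character_data(xml_text: str) -> str:
    """Concatenate all character data in document order, ignoring the tags."""
    return "".join(ET.fromstring(xml_text).itertext())


elem = ET.fromstring("<STRONG>Hello</STRONG>")
print(elem.tag)   # STRONG  (the tag name)
print(elem.text)  # Hello   (the character data between the tags)

# Content held in Attributes is inside a tag, so it is not character data.
print(character_data("<a>one<b x='y'/>two</a>"))  # onetwo
```

In the last line the Attribute value 'y' is not returned, because it lies within the angle brackets of a tag.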

Definition 9.  An Attribute is a part of a tag that gives content information. Every tag has a name and can carry one or more Attributes. (Each Attribute can be required or optional in an instance document, depending on the rules stated in the Schema.) Each Attribute has a name that is unique within the tag. The name is followed immediately by an equals sign "=" and a Value enclosed in quotes (either single or double). In this manner a tag (be it paired or "empty") can carry content information within itself. And so, in NeumesXML one sees many tag declarations that put descriptive information about a source inside the tag. For example, an instance document can have the tag "<genre type='Gr'/>", where "genre" is the tag name, and "type" is an Attribute of genre whose value is "Gr" in this instance.
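
The genre example can be read back with a few lines of Python, shown here as a sketch with the standard library's xml.etree.ElementTree. The value 'Al' in the last line is a made-up second value, used only to show that a repeated Attribute name is rejected.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


genre = ET.fromstring("<genre type='Gr'/>")
print(genre.tag)          # genre
print(genre.attrib)       # {'type': 'Gr'}
print(genre.get("type"))  # Gr

# Single and double quotes around the Value are equivalent.
print(ET.fromstring('<genre type="Gr"/>').get("type"))  # Gr

# An Attribute name must be unique within its tag ('Al' is hypothetical).
print(parses("<genre type='Gr' type='Al'/>"))  # False
```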


II. Structure of the NeumesXML Schema

The root Element is named "NeumesXML". All content of a NeumesXML instance document is contained within the paired tag "<NeumesXML>...</NeumesXML>". This pair appears exactly once in any NeumesXML instance document.

Under the root Element there are three branches. Each of these parts must appear exactly once in an instance document, and they must appear in this order:
  • encoding_declaration
  • description_part
  • transcription_part

Tags pertaining to one of these parts cannot appear within a different part.
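
The top-level rule (one root Element named "NeumesXML" holding the three parts, each exactly once and in the stated order) can be sketched as a simple check in Python. This is only an illustration and no substitute for validation against "NeumesXML.xsd". It assumes, for simplicity, that the tags carry no XML namespace prefix; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The three parts, in the order the Schema requires.
REQUIRED_PARTS = ["encoding_declaration", "description_part", "transcription_part"]


def has_required_parts(xml_text: str) -> bool:
    """Check the root name and that its children are exactly the three parts, in order."""
    root = ET.fromstring(xml_text)
    if root.tag != "NeumesXML":
        return False
    return [child.tag for child in root] == REQUIRED_PARTS


good = ("<NeumesXML><encoding_declaration/><description_part/>"
        "<transcription_part/></NeumesXML>")
swapped = ("<NeumesXML><description_part/><encoding_declaration/>"
           "<transcription_part/></NeumesXML>")

print(has_required_parts(good))     # True
print(has_required_parts(swapped))  # False: parts are out of order
```

Comparing the list of child names against the required list covers all three conditions at once: a missing part, a repeated part, and a part out of order each produce a different list.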

    The encoding_declaration states the version number of the NeumesXML language that is being used, and simplifies computer-processing by declaring whether the content is an Eastern or Western chant.

    The description_part consists entirely of tags. No character data (i.e., inter-tag data) can appear in this part. Any content of this part is contained entirely within the Attributes of tags.
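
    The rule that the description_part holds no character data can be sketched as follows in Python. Two assumptions are made here: whitespace between tags (line breaks and indentation) is tolerated, and the "genre" tag from Definition 9 is used as a stand-in child, since its actual position in the Schema is not stated in this section.

```python
import xml.etree.ElementTree as ET


def has_no_character_data(part: ET.Element) -> bool:
    """Return True if the element and its descendants hold only whitespace between tags."""
    return all(not chunk.strip() for chunk in part.itertext())


# Content carried entirely in Attributes: acceptable.
ok = ET.fromstring("<description_part>\n  <genre type='Gr'/>\n</description_part>")
# Stray text between tags: not acceptable in this part.
bad = ET.fromstring("<description_part><genre type='Gr'/>stray text</description_part>")

print(has_no_character_data(ok))   # True
print(has_no_character_data(bad))  # False
```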

    In the transcription_part, semantic content on the face of the source manuscript is transcribed as character data (i.e., it appears between tags). Neumatic symbols are represented as Unicode[tm] Private Use Area characters, where sequences of such characters conform to a regular grammar. This methodology was chosen mainly for computational efficiency during pattern-matching searches (e.g., for matching melodic contours), during certain operations for musicological analysis, and so on, where a large number of transcriptions must be inspected, possibly located at various locations on the World Wide Web. Tags in this part record non-semantic features that appear on the face of the manuscript source, and record in context any editorial judgements that were made by the transcriber.
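
    Why plain character sequences make searching cheap can be sketched in Python. Everything specific below is an assumption made for illustration: the code points U+E001 to U+E003 are not the real NeumesXML assignments, the way chant text and neume characters interleave in the sample is invented, and the function names are hypothetical. Only the Private Use Area range of the Basic Multilingual Plane (U+E000 to U+F8FF) is a fact of Unicode.

```python
import re
from typing import List

# Runs of characters from the Unicode Private Use Area (U+E000 to U+F8FF).
PUA_RUN = re.compile(r"[\uE000-\uF8FF]+")


def neume_runs(character_data: str) -> List[str]:
    """Return each unbroken run of Private Use Area characters, in order."""
    return PUA_RUN.findall(character_data)


def contains_sequence(character_data: str, sequence: str) -> bool:
    """Return True if the given neume sequence occurs inside any single run."""
    return any(sequence in run for run in neume_runs(character_data))


# Hypothetical sample: chant text syllables interleaved with neume characters.
sample = "Al\uE001\uE002le\uE003lu\uE001\uE002\uE003ia"

print(len(neume_runs(sample)))                      # 3
print(contains_sequence(sample, "\uE002\uE003"))    # True
print(contains_sequence(sample, "\uE003\uE001"))    # False
```

    Because the notation is ordinary character data, a search over many transcriptions reduces to string or regular-expression matching, without interpreting any tag structure.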

    Any transcription markup not defined by this Schema (such as for purposes of visual-presentation styling, or for documenting the results of musicological analysis) must be done in a secondary XML file that is separate from the NeumesXML transcription itself. Any DTD or XML Schema that defines such "secondary markup" of NeumesXML transcriptions will typically be written by end-users. The mechanism by which a secondary markup document can refer to specific locations, or spans of locations, in a NeumesXML transcription shall typically be via the XPointer language. (Such a secondary-markup document could refer to locations in multiple NeumesXML transcriptions, so that comparative presentation could be effected.) Currently, no XML processors are available that handle XPointer, but we expect such processors to be available in the near future. When they are, the NEUMES technical team will provide paradigms for using the XPointer language in a DTD, XML Schema, XSLT, and XML instance documents to refer to locations in NeumesXML transcriptions.




       Copyright © 2003 by Louis W. G. Barton.