Neumed and Ekphonetic Universal Manuscript Encoding Standard


§ Overview of Transcription and Visualization
§ Architecture for a Distributed e-Library
§ Content of a Transcription File
§ Content of NeumesXML Meta-data
§ Decomposition of the Transcription Part

I. Overview of Transcription and Visualization
[Diagram I: Source Artifact, Transcriber, Transcription, Server, WWW, Client]
Narrative: A source artifact is transcribed by a transcriber into a transcription file. Typically, a transcription file is stored, together with other transcription files, on a server connected to the World Wide Web (WWW). The client (viz., an end-user's computer) retrieves a transcription file via the Web for visualization (i.e., visual rendering) or for other use on the client computer.

Intellectual Property Ownership: The source artifacts are hundreds of years old and, therefore, are in the public domain. Transcriptions, descriptions, and photographic images, however, may be protected by copyright. In such cases, please contact the copyright owner of the material for permission to use it.

II. Architecture for a Distributed e-Library
[Diagram II: Distributed architecture]
Narrative: Transcription files can exist on a client computer and on the World Wide Web.
Transcription files are designed to be "first class objects" on the Web. By this we mean that a transcription file is retrievable directly by its own URL.
Search engines and specialized databases for chant can act as indexes (or, catalogs) to help users find transcription files.
We call this type of architecture a "distributed e-library."
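
As a rough illustration of what "retrievable directly by its own URL" means for a client, the following Python sketch fetches one transcription file and parses it as XML. The function name `fetch_transcription` and the timeout value are assumptions made for this example; they are not part of NEUMES or NeumesXML.

```python
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def fetch_transcription(url: str, timeout: float = 10.0) -> ET.Element:
    """Retrieve one transcription file by its URL and parse it as XML.

    Hypothetical helper for illustration only.
    """
    with urlopen(url, timeout=timeout) as response:
        data = response.read()
    # fromstring() accepts bytes and honors the encoding named in the
    # XML declaration, if one is present.
    return ET.fromstring(data)


# Usage (placeholder URL, not a real transcription):
# root = fetch_transcription("https://example.org/NeumesExample.xml")
```

Because each file has its own URL, an index or catalog only needs to store that URL; the client can then retrieve the file without going through the index again.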

III. Content of a Transcription File
[Diagram III: Content of a transcription file. NeumesXML: Description Part; NEUMES: Transcription Part]
Narrative: A transcription file is an XML (Extensible Markup Language) file that has two principal parts:
     1) the NeumesXML Description Part (detailed in Diagram IV, below); and
     2) the NEUMES Transcription Part (discussed here, and detailed in Diagram V).
The Transcription Part records all prima-facie semantic content of a source artifact.
  • By "prima facie" we mean information that is visible on the face of the source without interpretive extrapolation.
  • By "semantic" we mean information that is significant to the meaning (or, the sense) of the chant and/or the text.
    Specifically excluded from our definition of "semantic content" are: information about the handwriting of individual scribes (i.e., paleographic information); and illuminations, decorations, or other prima-facie content that does not inform the chant or the text. (Note, however, that such information can be recorded at a 'higher level', viz., in the NeumesXML markup.)
    We say that an encoding scheme is a lossless data representation if it can capture all prima-facie semantic content of sources in the domain of discourse, and if the resultant data can satisfy all principal end-uses required by the domain of discourse. NEUMES (Neumed and Ekphonetic Universal Manuscript Encoding Standard) is a lossless data representation.
    NEUMES is a formal language (i.e., a set of strings, where set membership is decided by a formal grammar) whose alphabet consists of Unicode™-compatible characters encoded in the UTF-8 standard encoding scheme. NEUMES is optimized for content-search across potentially millions of records, and pattern-matching involving uncertainty and what we call "complex traversal" (viz., random access) in linear data streams.

IV. Content of NeumesXML Meta-data
    [Diagram IV: Content of NeumesXML metadata]
    Narrative: NeumesXML is an extension of XML, and it is defined as an XML Schema. NeumesXML plays several roles, principally as a wrapper (or, 'vehicle') for convenient disk storage and Internet transmission of NEUMES transcription data. Thus, transcriptions appear on disk and on the Web just as XML files (e.g., "NeumesExample.xml"). NEUMES data appear in such a file between the NeumesXML tags <transcription_part> and </transcription_part>.
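
The wrapper role can be sketched in Python as follows. Only the tag name `transcription_part` comes from the text above. The root element `neumes_transcription`, the element `description_part`, the sample payload, and the function name are placeholders invented for this example, and the sketch assumes the tags carry no XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file. The Private Use code points U+E000 and U+E001
# stand in for NEUMES characters; they are not real NEUMES data.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<neumes_transcription>\n"
    "  <description_part>descriptive metadata goes here</description_part>\n"
    "  <transcription_part>Ky\ue000ri\ue001e</transcription_part>\n"
    "</neumes_transcription>\n"
)


def transcription_text(xml_bytes: bytes) -> str:
    """Return the character data found inside <transcription_part>."""
    root = ET.fromstring(xml_bytes)
    # iter() searches at any depth, so this sketch does not depend on
    # where the real schema places the element.
    for element in root.iter("transcription_part"):
        return element.text or ""
    raise ValueError("no <transcription_part> element found")


payload = transcription_text(SAMPLE.encode("utf-8"))
print(len(payload), "characters in the Transcription Part")
```

A real NeumesXML file may nest further markup inside the Transcription Part, in which case `element.text` alone would not return the whole payload.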
    NeumesXML also allows the transcriber to record descriptive information about the source artifact, the transcriber, the editorial methods used, and so on (see diagram), and--to a lesser extent--allows for markup of a transcription, such as to document the logical structure of the source and to insert in situ editorial comments.
    Unlike NEUMES data (which are Unicode™ character strings), NeumesXML tags are written in plain ASCII. UTF-8 encodes each ASCII character as a single byte and every other Unicode character as a multi-byte sequence that contains no ASCII byte values, so the two can be mixed freely in one file; this coding separation allows NEUMES content to be parsed unambiguously during "complex traversal" of linear data streams.
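
The byte-level separation can be checked with a few lines of Python. This is a sketch of the general UTF-8 property only: the two Private Use code points are arbitrary stand-ins, not real NEUMES characters, and the helper name is made up for the example.

```python
def split_ascii_and_non_ascii(data: bytes) -> tuple[bytes, bytes]:
    """Partition UTF-8 bytes into ASCII bytes and bytes of multi-byte characters."""
    ascii_bytes = bytes(b for b in data if b < 0x80)
    other_bytes = bytes(b for b in data if b >= 0x80)
    return ascii_bytes, other_bytes


tag = "<transcription_part>"
stand_ins = "\ue000\U000f0000"  # arbitrary Private Use code points
encoded = (tag + stand_ins).encode("utf-8")

ascii_part, other_part = split_ascii_and_non_ascii(encoded)

# Every byte of the tag is below 0x80, and every byte of the Private Use
# characters is 0x80 or above, so the two never collide in the stream.
print(ascii_part == tag.encode("ascii"))
print(other_part == stand_ins.encode("utf-8"))
```

A scanner that starts at an arbitrary byte offset can therefore tell at once whether it is looking at markup or at non-ASCII character data.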

V. Decomposition of the Transcription Part
    [Diagram V: Decomposition of the Transcription Part]
    Narrative: The Transcription Part [cf. Diagram III for general discussion] contains a string of Unicode™ characters, which are treated as character data by XML. This string has a pattern that can repeat many times in the Transcription Part. (Such repetition is called a sequence and is denoted by wide-angle brackets.) The pattern has two principal pieces that always occur in the same order, as follows.
         1) The first piece is always a chant text segment. It is a sequence of characters that records part of the intoned or recited text.
         2) The second piece is 'optional'; it contains zero or more cantillation segments. These are further decomposed, below.

    A passage of recited text (i.e., intended to be spoken only) typically does not have any cantillation segments. If the source artifact contains a long passage of recited text without any neumation, then the entire passage can be encoded as one chant text segment. An intoned (or, chanted) text, however, is always segmented in the data, such that one chant text segment is one syllable or some other unit of text that the scribe treated individually for neumation in the source artifact.
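
The repeating pattern described above (a chant text segment followed by optional cantillation data) can be sketched in Python. The sketch rests on a loud assumption: it treats any maximal run of Unicode Private Use characters as cantillation data and everything else as chant text. The real boundaries, including how one cantillation segment is separated from the next, are set by the NEUMES grammar and are not reproduced here, so each run is kept as one undivided string. `TextUnit` and `split_units` are names invented for the example.

```python
import re
from dataclasses import dataclass

# Assumption for this sketch only: cantillation data is any run of
# Private Use characters (BMP area plus the two supplementary areas).
_PUA = "\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd"
_PATTERN = re.compile(f"([^{_PUA}]+)([{_PUA}]*)")


@dataclass
class TextUnit:
    chant_text: str    # always present, always first
    cantillation: str  # "" when the text carries no neumation


def split_units(data: str) -> list[TextUnit]:
    """Split a Transcription Part string into (text, cantillation) units."""
    units = []
    pos = 0
    for match in _PATTERN.finditer(data):
        if match.start() != pos:
            raise ValueError("cantillation data without a preceding chant text")
        units.append(TextUnit(match.group(1), match.group(2)))
        pos = match.end()
    if pos != len(data):
        raise ValueError("cantillation data without a preceding chant text")
    return units


# Stand-in code points, not real NEUMES characters.
for unit in split_units("Ky\ue000ri\ue001\ue002e"):
    print(repr(unit.chant_text), len(unit.cantillation))
```

Under this assumption a long recited passage with no neumation comes back as a single unit whose `cantillation` field is empty, which matches the description of recited text above.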

    A cantillation segment is a NEUMES sequence, which we define as a sequence of characters, such that each character is a member of the NEUMES character set, where the sequence conforms to the NEUMES language. The NEUMES character set consists of 24-bit characters from the Private Use Area of Unicode, as 'filtered' (or, restricted) by the NEUMES grammar.
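
A necessary (but not sufficient) membership test can be sketched in Python: every NEUMES character must at least be a Unicode Private Use character. The three ranges below are the Private Use ranges defined by Unicode; which of them NEUMES actually draws on, and which code points the grammar admits, is not stated here. The function names are invented for the example.

```python
# Private Use ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area (Basic Multilingual Plane)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(ch: str) -> bool:
    """True if the single character ch lies in a Unicode Private Use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def could_be_cantillation(segment: str) -> bool:
    """Necessary condition only: non-empty and made of Private Use characters.

    Passing this check does not make a segment grammatical; that decision
    belongs to the NEUMES grammar.
    """
    return len(segment) > 0 and all(is_private_use(c) for c in segment)


print(could_be_cantillation("\ue000\ue001"))
print(could_be_cantillation("Kyrie"))
```

A check of this kind is cheap enough to run on every character at data-entry time, before the full grammar is applied.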

      Notice: See the file NEUMES_characters.xml for a list of NEUMES characters with mnemonic names.

    The NEUMES language is 'generated' (or, defined) by the NEUMES grammar. The NEUMES grammar decides whether particular NEUMES sequences are grammatical according to the NEUMES language (this verification process is typically done at the time of data-entry and during transformation of a transcription file for visualization).

      Notice: See the file NEUMES_grammar.xml for the declaration of this grammar in terms of the mnemonic names that are listed in the NEUMES character set file.

    Copyright © 2004-2007, Louis W. G. Barton.