I. Overview of Transcription and Visualization

[Diagram labels: Source Artifact, Transcriber, Transcription Files, Server, WWW, Client]
Intellectual Property Ownership: The source artifacts are hundreds of years old and, therefore, are in the public domain. Transcriptions, descriptions, and photographic images, however, may be protected by copyright. In such cases, please contact the copyright owner of the material for permission to use it.

II. Architecture for a Distributed e-Library

Transcription files are designed to be "first class objects" on the Web. By this we mean that a transcription file is retrievable directly by its own URL. Search engines and specialized databases for chant can act as indexes (or catalogs) to help users find transcription files. We call this type of architecture a "distributed e-library."

III. Content of a Transcription File

[Diagram labels: Transcription File, containing "NeumesXML : Description Part" and "NEUMES : Transcription Part"]

A transcription file has two parts: 1) the NeumesXML Description Part (detailed in Diagram IV, below); and 2) the NEUMES Transcription Part (discussed here, and detailed in Diagram V).

The Transcription Part records all prima-facie semantic content of a source artifact. Specifically excluded from our definition of "semantic content" are: information about the handwriting of individual scribes (i.e., paleographic information); and illuminations, decorations, or other prima-facie content that does not inform the chant or the text. (Note, however, that such information can be recorded at a 'higher level', viz., in the NeumesXML markup.)

We say that an encoding scheme is a lossless data representation if it can capture all prima-facie semantic content of sources in the domain of discourse, and if the resultant data can satisfy all principal end-uses required by the domain of discourse. NEUMES (Neumed and Ekphonetic Universal Manuscript Encoding Standard) is a lossless data representation.

NEUMES is a formal language (i.e., a set of strings, where set membership is decided by a formal grammar) whose alphabet consists of Unicode-compatible characters encoded in the UTF-8 standard encoding scheme. NEUMES is optimized for content-search across potentially millions of records, and for pattern-matching involving uncertainty and what we call "complex traversal" (viz., random access) in linear data streams.
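
To illustrate what content-search with uncertainty over linear character data might look like, here is a minimal Python sketch. It is not the NEUMES search method: the code points, the symbol names (`PUNCTUM`, `VIRGA`, `UP`, `DOWN`), the record identifiers, and the helper `compile_query` are all hypothetical, and an uncertain position is simply modeled as "any one Private Use Area character".

```python
import re

# Hypothetical code points standing in for NEUMES characters.
# These are NOT the real NEUMES assignments.
PUNCTUM = "\uE001"
VIRGA = "\uE002"
UP = "\uE010"
DOWN = "\uE011"

# Any single character in the BMP Private Use Area (U+E000-U+F8FF).
ANY_PUA = "[\uE000-\uF8FF]"

def compile_query(symbols):
    """Build a regex from a list of symbols; None marks an uncertain position."""
    parts = [ANY_PUA if s is None else re.escape(s) for s in symbols]
    return re.compile("".join(parts))

# Hypothetical records: each value is one linear character stream.
records = {
    "rec-1": PUNCTUM + UP + VIRGA + DOWN + PUNCTUM,
    "rec-2": PUNCTUM + DOWN + PUNCTUM,
    "rec-3": VIRGA + UP + VIRGA + UP + PUNCTUM,
}

# Query: a virga, then one character we are unsure about, then a punctum.
query = compile_query([VIRGA, None, PUNCTUM])
hits = sorted(key for key, data in records.items() if query.search(data))
print(hits)
```

Because each record is a plain character string, the same query can be run over any number of records with ordinary string tools.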

IV. Content of NeumesXML Meta-data

[Diagram labels: Transcriber Information; Transcription Chronicle; Physical Characteristics of the Source; Links to External Images; Links to Catalog Indexes; Attribution, Structural, & Editorial Remarks]

NeumesXML also allows the transcriber to record descriptive information about the source artifact, the transcriber, the editorial methods used, and so on (see diagram). To a lesser extent, it also allows for markup of a transcription, such as to document the logical structure of the source and to insert in situ editorial comments. Unlike NEUMES data (which are Unicode character strings), NeumesXML tags are written in plain ASCII. The UTF-8 standard allows ASCII characters and other Unicode characters to be mixed in one data stream; this coding separation allows NEUMES content to be parsed unambiguously during "complex traversal" of linear data streams.
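
The coding separation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's parser: NEUMES characters are modeled as any code point in the Basic Multilingual Plane Private Use Area (U+E000 to U+F8FF), and the element names `syl` and `neumes` are hypothetical, not taken from the NeumesXML schema.

```python
import re

# Assumption: NEUMES characters are modeled as any code point in the BMP
# Private Use Area (U+E000-U+F8FF). Element names below are hypothetical.
TOKEN = re.compile(
    r"(?P<tag><[^<>]+>)"               # ASCII markup tag
    r"|(?P<neumes>[\uE000-\uF8FF]+)"   # run of Private Use Area characters
    r"|(?P<text>[^<>\uE000-\uF8FF]+)"  # any other character data
)

def tokenize(stream):
    """Split a decoded stream into (kind, value) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(stream)]

sample = "<syl>Al</syl><neumes>\uE001\uE010\uE002</neumes>"
tokens = tokenize(sample)
kinds = [kind for kind, _ in tokens]
print(kinds)

# In UTF-8, an ASCII character occupies one byte, while a character in the
# BMP Private Use Area occupies three bytes.
ascii_len = len("<".encode("utf-8"))
pua_len = len("\uE001".encode("utf-8"))
print(ascii_len, pua_len)
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, so no byte of a Private Use Area character can be mistaken for an ASCII character such as `<`. This is what lets a scanner start at an arbitrary point in the stream and still tell markup from NEUMES data.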

V. Decomposition of the Transcription Part

The Transcription Part is decomposed into units of two pieces each:

1) The first piece is always a chant text segment. It is a sequence of characters that records part of the intoned or recited text.
2) The second piece is 'optional'; it contains zero or more cantillation segments. These are further decomposed, below.

A passage of recited text (i.e., intended to be spoken only) typically does not have any cantillation segments. If the source artifact contains a long passage of recited text without any neumation, then the entire passage can be encoded as one chant text segment. An intoned (or chanted) text, however, is always segmented in the data, such that one chant text segment is one syllable or some other unit of text that the scribe treated individually for neumation in the source artifact.

A cantillation segment is a NEUMES sequence, which we define as a sequence of characters, such that each character is a member of the NEUMES character set, where the sequence conforms to the NEUMES language. The NEUMES character set consists of 24-bit characters from the Private Use Area of Unicode, as 'filtered' (or restricted) by the NEUMES grammar.

The NEUMES language is 'generated' (or defined) by the NEUMES grammar. The NEUMES grammar decides whether particular NEUMES sequences are grammatical according to the NEUMES language. This verification is typically done at the time of data-entry and during transformation of a transcription file for visualization.
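
The decomposition and the grammar check can be sketched together in Python. This is only a toy model: the class name `Unit`, the function `is_grammatical`, the code-point ranges, and the rule "one glyph character followed by zero or more movement characters" are assumptions made for illustration, and do not reproduce the actual NEUMES grammar or character assignments.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Toy stand-in for the NEUMES grammar (hypothetical ranges and rule):
# a sequence is one or more groups, each a "glyph" character followed by
# zero or more "movement" characters.
GLYPH = "\uE001-\uE00F"
MOVE = "\uE010-\uE01F"
SEQUENCE = re.compile(f"(?:[{GLYPH}][{MOVE}]*)+")

def is_grammatical(sequence: str) -> bool:
    """Decide membership in the toy language; the whole string must match."""
    return SEQUENCE.fullmatch(sequence) is not None

@dataclass
class Unit:
    """One chant text segment plus zero or more cantillation segments."""
    chant_text: str
    cantillation: List[str] = field(default_factory=list)

    def validate(self) -> bool:
        # A unit with no cantillation segments (recited text) is valid.
        return all(is_grammatical(seg) for seg in self.cantillation)

intoned = Unit("Al", ["\uE001\uE010", "\uE002"])   # one syllable, two segments
recited = Unit("Dominus vobiscum")                  # recited passage, no neumation
broken = Unit("le", ["\uE010"])                     # movement with no glyph

print(intoned.validate(), recited.validate(), broken.validate())
```

Running the check at data-entry time, as the text describes, would mean calling something like `validate()` before a unit is written to the transcription file.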