Risk Analysis Report
22 March 2002
This document is a risk analysis report for the NEUMES Project. Its aim is to identify the main risks the project currently faces and to analyze them. The two main risks are how to check the syntax of the Unicode NEUMES data in the XML tags and how to ensure backward compatibility between NeumesXML and the Digital Scriptorium Transcription DTD.
§ Check Syntax of the Unicode NEUMES data
The motivation for checking the syntax of the Unicode NEUMES data is to verify that a NeumesXML document is not only well-formed but also Schema-valid. A NeumesXML document can be a well-formed XML document and still fail to be NeumesXML Schema-valid, or contain grammatically incorrect strings of NEUMES transcription characters. Only well-formed and NeumesXML-valid documents should be processed. A context-free grammar for NEUMES data already exists. If an XML Schema is created for this grammar, existing parsers can validate not just the meta-data but also the transcription character strings. The Xerces parser is freely available and can be used for this purpose. Xerces supports UTF-8.
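As a rough illustration of schema-based checking of transcription strings, the Java sketch below validates documents against an in-memory XML Schema through the JAXP validation API (javax.xml.validation). That API ships with the JDK, and the JDK's bundled implementation is derived from Xerces. The same calls should also work with a standalone Xerces-J installation. The schema, the element name `neumes`, and the hyphen-separated upper-case token pattern are hypothetical stand-ins, not the real NEUMES grammar. One caveat applies: `xs:pattern` facets are regular expressions, so a schema can enforce only a regular approximation of a context-free grammar on character strings. Any nesting the grammar allows would have to be expressed as elements or checked separately.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class NeumesSchemaCheck {

    // Hypothetical schema: a single <neumes> element whose text must be one or
    // more upper-case groups separated by single hyphens. This pattern is an
    // illustrative stand-in, NOT the actual NEUMES transcription grammar.
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='neumes'>"
        + "    <xs:simpleType>"
        + "      <xs:restriction base='xs:string'>"
        + "        <xs:pattern value='[A-Z]+(-[A-Z]+)*'/>"
        + "      </xs:restriction>"
        + "    </xs:simpleType>"
        + "  </xs:element>"
        + "</xs:schema>";

    /** Returns null if xml is well-formed and valid against xsd, else the parser's message. */
    static String validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first well-formedness or validity error is thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage() != null ? e.getMessage() : e.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(SCHEMA, "<neumes>ABC-DE</neumes>"));  // prints "null": valid
        System.out.println(validate(SCHEMA, "<neumes>ABC--DE</neumes>")); // pattern-facet error
        System.out.println(validate(SCHEMA, "<neumes>ABC"));              // not well-formed
    }
}
```

Note that a single `validate` call reports both kinds of failure the paragraph distinguishes: a document that is not well-formed and a well-formed document whose transcription string violates the schema.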
The other approach to XML validation is to write Java code that performs it. The Java Architecture for XML Binding (JAXB) provides an API [application programming interface] and tools that automate the mapping between XML documents and Java objects. The intent is that an XML Schema definition is mapped into Java classes, and these classes perform the validation. JAXB advocates argue that the generated classes validate faster than a SAX or DOM parser. However, JAXB is still at the early-access stage, and this release does not yet support XML Schema. Given these restrictions, we would have to roll our own implementation of the JAXB specification, which would be far more labor-intensive than using existing parsers to perform validation. See http://java.sun.com/xml/jaxb/ for more information on JAXB.
One risk of relying on the Xerces parser is that it is third-party software that the NEUMES Project does not control. I believe this risk is minimal. Xerces is open-source software developed under the Apache Software Foundation, it has an excellent reputation and has been widely adopted, new versions are released regularly, and the project keeps up with standards development. If JAXB were not still in early access, I would have similarly minimal reservations about relying on Sun's software.
§ Backward Compatibility between NeumesXML and the Digital Scriptorium Transcription DTD
One of the main goals for this project is to provide backward compatibility between NeumesXML and the Digital Scriptorium DTD. Backward compatibility has two main benefits: documents that contain both text and music notation need not be transcribed twice, and users can switch between text-only and text-and-chant examination. The main risk associated with these goals is that the Digital Scriptorium Transcription is defined as a DTD, whereas the NEUMES Project will define an XML Schema for the NeumesXML data. DTDs and XML Schemas do not interoperate well: a DTD has no notion of namespaces and cannot refer to declarations made in an XML Schema, and an XML Schema cannot refer to a DTD's declarations. There are three approaches to solving this problem.
1. All DTD solution
If the Digital Scriptorium folks update their DTD with a <NEUMES> tag and the NEUMES Project develops a NeumesXML DTD that references the Digital Scriptorium DTD, backward compatibility is guaranteed. However, this approach carries high risk, because DTD syntax does not support the types of validation needed for NEUMES transcription data. [Note that a NeumesXML DTD does exist (see NeumesXML.dtd under 'Data Representation') that covers all the elements of the NeumesXML Schema. Solution #1 is nevertheless currently infeasible, because Digital Scriptorium represents meta-data about manuscript sources as character data, whereas NeumesXML represents these meta-data as attributes of tags. These two coding methods are formally equivalent (viz., either of them can be transformed into the other automatically), but they cannot be mixed in a Schema-valid transcription file. - l.b.]
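The bracketed remark says that the character-data and attribute encodings of meta-data can be converted into each other automatically. The following is a minimal Java sketch of one direction, using only the JAXP/DOM API bundled with the JDK. Every child element of the root is turned into an attribute that carries that child's text. The element names (`source`, `library`, `shelfmark`) and the class name are hypothetical and are not taken from either the Digital Scriptorium DTD or NeumesXML. The sketch assumes the children hold plain text only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MetaDataToAttributes {

    /**
     * Rewrites each child element of the root as an attribute of the root holding
     * that child's text content. Assumes the children are text-only.
     */
    static String toAttributes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Copy the element children into a list first: getChildNodes() is live,
        // so removing nodes while iterating over it would skip entries.
        List<Element> fields = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.add((Element) child);
            }
        }
        for (Element field : fields) {
            root.setAttribute(field.getTagName(), field.getTextContent().trim());
            root.removeChild(field);
        }

        // Serialize the modified tree back to a string, without an XML declaration.
        Transformer out = TransformerFactory.newInstance().newTransformer();
        out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter buffer = new StringWriter();
        out.transform(new DOMSource(doc), new StreamResult(buffer));
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        String characterData =
            "<source><library>Example Library</library><shelfmark>MS 1</shelfmark></source>";
        System.out.println(toAttributes(characterData));
    }
}
```

The reverse conversion would walk `root.getAttributes()` and create one text-only child element per attribute. This symmetry is what makes the two encodings formally equivalent, even though a single Schema cannot accept both at once.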
2. All XML Schema solution
An XML Schema will be created for NeumesXML data. If an XML Schema were created for the Digital Scriptorium transcription DTD, we could easily integrate these two XML Schemas by using Namespaces, as follows:
<?xml version="1.0"?>
<DS xmlns="http://sunsite.berkeley.edu/DS"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:NEUMES="http://purl.oclc.org/NEUMES/ref/NeumesXML">
  <tei.2>
    <teiheader> .... </teiheader>
    <text>
      ..
      <NEUMES:tag1> ... </NEUMES:tag1>
      <NEUMES:tag2> .... </NEUMES:tag2>
    </text>
  </tei.2>
</DS>
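Namespaces make this kind of integration straightforward. To illustrate, the following Java sketch parses a trimmed-down version of the combined document with a namespace-aware DOM parser (JAXP, bundled with the JDK) and lists the elements that belong to each vocabulary. The class and method names are hypothetical. The namespace URIs are the ones used in the example. A NEUMES application could select its own elements this way without knowing anything about the Digital Scriptorium markup around them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceSplit {

    static final String DS_NS = "http://sunsite.berkeley.edu/DS";
    static final String NEUMES_NS = "http://purl.oclc.org/NEUMES/ref/NeumesXML";

    // Trimmed-down combined document: default namespace = Digital Scriptorium,
    // NEUMES: prefix = NeumesXML (element content elided).
    static final String SAMPLE =
          "<DS xmlns='http://sunsite.berkeley.edu/DS'"
        + " xmlns:NEUMES='http://purl.oclc.org/NEUMES/ref/NeumesXML'>"
        + "<tei.2><teiheader/><text>"
        + "<NEUMES:tag1/><NEUMES:tag2/>"
        + "</text></tei.2>"
        + "</DS>";

    /** Lists, in document order, the local names of all elements in the given namespace. */
    static List<String> localNames(String xml, String namespaceUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, "*");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getLocalName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(localNames(SAMPLE, NEUMES_NS)); // [tag1, tag2]
        System.out.println(localNames(SAMPLE, DS_NS));     // [DS, tei.2, teiheader, text]
    }
}
```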
However, it is unlikely that the Digital Scriptorium folks will migrate the transcription DTD to an XML Schema: all their existing XML documents would have to be migrated and all existing application software changed. Because the move to an all-XML Schema solution is unlikely, the risk of adopting this approach is very high.
3. Mix-and-match DTD and XML Schema solution
The NEUMES Project needs a mechanism for integrating the existing Digital Scriptorium transcription DTD with the NeumesXML Schema. If the Digital Scriptorium folks add a <NEUMES/> tag to their DTD and have their application programs ignore it, NEUMES applications that are XML Schema-aware can use this placeholder to combine Digital Scriptorium transcriptions with NEUMES transcriptions. The question then becomes: what information should be stored in the attributes of the <NEUMES/> tag?
Using the W3C XLink Recommendation (see http://www.w3.org/TR/xlink/), a Digital Scriptorium transcription could link to a 'NEUMES application'. [XLink (i.e., XML Linking Language) allows elements to be inserted into XML documents to create and describe links between resources, where a 'resource' is any addressable unit of information or service. In particular, an XLink could be a hyperlink to a NeumesXML chant-transcription document or a reference to a NEUMES application program. At the time of this writing, Digital Scriptorium was simply using a <music> tag wherever neumation appears in a source manuscript.] The link would also need to contain an identifier, such as the URL of a NeumesXML transcription or a file path to the NEUMES software being used. The Digital Scriptorium transcription DTD might therefore contain the following:
<!ELEMENT NEUMES EMPTY>
<!ATTLIST NEUMES
    %a.global;
    xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
    xlink:type  CDATA #FIXED "simple"
    xlink:href  CDATA #REQUIRED
    xlink:role  CDATA #IMPLIED
    xlink:title CDATA #IMPLIED >
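A NEUMES application could locate these placeholders and follow their links roughly as in the Java sketch below, which again uses only the JDK's JAXP/DOM API. The fragment, its element nesting, the class name, and the URL http://example.org/chant/alleluia.xml are invented for illustration. The XLink attributes are namespace-qualified, so the parser is made namespace-aware and the href is read with `getAttributeNS`. In the sample, `xmlns:xlink` is declared explicitly on the element. In a real Digital Scriptorium file it would be supplied by the #FIXED default in the DTD, so the parser would have to read the DTD for the `xlink:` prefix to be bound.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NeumesLinkFinder {

    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Hypothetical Digital Scriptorium fragment; the href is an invented example URL.
    // xmlns:xlink is written out here instead of coming from the DTD's #FIXED default.
    static final String SAMPLE =
          "<tei.2><text>Alleluia"
        + "<NEUMES xmlns:xlink='http://www.w3.org/1999/xlink' xlink:type='simple'"
        + " xlink:href='http://example.org/chant/alleluia.xml' xlink:title='Alleluia'/>"
        + "</text></tei.2>";

    /** Returns the xlink:href of every <NEUMES/> placeholder, in document order. */
    static List<String> neumesLinks(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so xlink:href resolves to the XLink namespace
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> links = new ArrayList<>();
        NodeList placeholders = doc.getElementsByTagName("NEUMES");
        for (int i = 0; i < placeholders.getLength(); i++) {
            Element placeholder = (Element) placeholders.item(i);
            String href = placeholder.getAttributeNS(XLINK_NS, "href"); // "" if absent
            if (!href.isEmpty()) {
                links.add(href);
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // A Digital Scriptorium application can skip the empty element entirely;
        // a NEUMES application follows each link it finds.
        for (String href : neumesLinks(SAMPLE)) {
            System.out.println("NeumesXML transcription: " + href);
        }
    }
}
```

Because the placeholder is empty, software that ignores it loses no text content. This property is what lets the two systems share one transcription file.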
I think the risks with this approach are much lower than with #1 and #2. It is feasible to ask the Digital Scriptorium folks to add a single <NEUMES/> tag to their DTD that their applications ignore. Also, using a W3C standard to link the two systems is a widely used approach. There may be user-interface issues to work out, but this approach at least ensures backward compatibility between the NeumesXML Schema and the Digital Scriptorium DTD.