Neumed & Ekphonetic Universal Manuscript Encoding Standard


SOFTWARE  DESIGN  HELP

  • Integration of Sourcecode and Documentation
  • Data-driven Programming

  • I. Integration of Sourcecode and Documentation
    [Diagram: a Design Document (with a PDF facsimile) is transformed into a topical 'master' XML file, which specialized Transforms turn into 'process' XML files.]
    A long-standing problem in software engineering that greatly needs to be resolved is the integration of sourcecode and documentation. Design documents normally are classified into three layers:
    1. system requirements specification (SRS), which is a high-level narrative discussion of what the software system must do, expressed largely from the viewpoint of end-users, i.e., in terms of the domain of application;
    2. abstract design documents, viz., a formal model that software engineers create at the "articulation point" which joins the domain of application to the discrete, deterministic idiom of digital computers (this involves making 'ontological commitments', as it is sometimes said in the literature);
    3. technical documentation that explains a concrete implementation of the formal model, given that many different implementations are possible using different data representations, programming languages, and so on.
    For the purpose of this discussion, we won't distinguish between these layers of documentation, but instead just use the term 'design documents' generally.

    Usually, design documents are created during the first phases of software development. In later phases, time and energy increasingly are absorbed by software implementation problems: it becomes burdensome to maintain the design documents (especially if the project is thinly funded or insufficiently staffed, and when outside observers' expectations for working programs are mounting). If program sourcecode and design documents have to be maintained separately, the documentation often becomes out-of-date and inconsistent with respect to the implementation.

    Various solutions to this problem have been proposed and tried in the past forty years or so. The technique usually relied upon by small projects is to comment sourcecode by 'inline documentation'; in larger, better-funded projects (e.g., in development of an operating system at IBM), documentation is maintained continuously by specialists (called 'technical writers'), while the programmers focus on just the sourcecode and inline documentation. A more general solution to this problem needs to be found, however.

    In the NEUMES Project, we wish to maintain sourcecode and documentation both at a single point, that is, as integrated documents. The 'documentation' aspect of it should have the full features of a modern word-processor (viz., use of colors, fonts, graphics, automated tables of contents, indexes, and so on), and be exportable to PDF (Portable Document Format) for distribution on the Web. Sourcecode should be distinguished by the use of word-processing styles, which may have a distinctive typographical presentation or even be suppressed in versions of the PDF that are intended for non-technical readers.

    A word-processing program* where the native (i.e., internal) representation of documents is XML, and which implements styles also via XML, allows us to extract in XML just the sourcecode that is relevant to a particular topic [cf., "Topical master XML file," in the diagram above]; this 'extraction' is done by XSLT (Extensible Stylesheet Language Transformations). The master XML document on one topic can be further transformed via XSLT into any number of specialized XML file-formats that are needed by processes in the concrete application [cf., "process XML files," in the diagram above]. This technique might not be feasible to integrate with current IDEs (Integrated Development Environments) where we do our Java programming. Also, it likely isn't worth the effort of maintaining HTML code (viz., webpages) of the NEUMES website by this technique. Nevertheless, this technique seems feasible for all kinds of sourcecode in the XML family of languages, as well as JavaScript, database specifications, and so on. For example, the NEUMES symbol taxonomy can be maintained as a high-level design document with lots of text, illustrations, etc., and the NEUMES_characters.pen file can be generated from it.
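    The 'extraction' step described above might be sketched as follows, using Java's built-in XSLT support (javax.xml.transform). The document vocabulary here (a `doc` element containing `p` elements with a `style` attribute) is a hypothetical stand-in for the real OpenOffice markup; the point is only that a stylesheet can pull out the paragraphs whose word-processing style marks them as sourcecode.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SourcecodeExtractor {

    // A toy 'master' document: prose and sourcecode mixed, distinguished by style.
    // (Invented element and style names, not the actual OpenOffice vocabulary.)
    static final String MASTER =
        "<doc>"
      + "<p style='Narrative'>The parser reads one neume at a time.</p>"
      + "<p style='SourceCode'>String neume = reader.next();</p>"
      + "<p style='SourceCode'>table.insert(neume);</p>"
      + "</doc>";

    // The XSLT stylesheet copies only the SourceCode paragraphs, as plain text.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/doc'>"
      + "<xsl:for-each select=\"p[@style='SourceCode']\">"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the master document and returns the result.
    static String extract() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(MASTER)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extract()); // prints only the two SourceCode paragraphs
    }
}
```

    In the real workflow the same mechanism would run against the OpenOffice document's native XML, and a second stylesheet would then map the extracted 'master' XML into each specialized 'process' format.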

    * Our plan is to implement this technique by using the word-processing component of OpenOffice: it satisfies our criteria above (viz., native XML documents, and styles denoted in XML); it is freely licensed; and it runs on most computer platforms. Also, the database component of OpenOffice uses HSQLDB as its database engine: this engine is implemented entirely in the Java programming language; it is 'XML-friendly'; and we have already adopted it for server-side database components of the NEUMES software kit. Additional opportunities for integrating OpenOffice with NEUMES software thus might present themselves. We also considered Microsoft's new version of Word (cf., Office InfoPath), which they claim has sophisticated support for XML: we concluded, however, that Microsoft's entry into this field is largely just a veneer over their proprietary Word format for documents; moreover, the product is expensive, and it is bound to specific computer platforms. We also evaluated YAWC and other programs that export documents from Microsoft Word to XML, but we found these solutions deficient in various regards. Nevertheless, we expect that native-XML format for documents is the 'wave of the future', and so more products in this field probably are forthcoming.
    II. Data-driven Programming
    [Diagram: an XML Schema Specification (with a PDF facsimile) is transformed into a topical 'master' XML file, which specialized Transforms turn into 'process' XML files.]
    The extent to which the human mind can handle complexity is quite small compared to the capabilities of computer software. Complexity is usually not an impediment for automated processes as programmed for computers [except, of course, regarding the 'boundary conditions' of computability, and practical limits on the time- and space-complexity of computable processes]. The complexity of a software system concerns not only the amount of detail and inter-dependencies found in the software when considered in a 'snapshot' view at a particular moment: perhaps of greater concern is the complexity of maintaining integrity and consistency regarding internal relationships in the software when parts of the software are changed (such as, in response to the evolving requirements of end-users, or the creation of new data types).

    Following the von Neumann model of computation [cf., John von Neumann, "First Draft of a Report on the EDVAC," University of Pennsylvania, 1945; and his subsequent publications], software is divided basically into processes and data. Today, this distinction is somewhat outmoded as, for instance, the rules of a knowledge-based system (or, 'expert system') are stored as data, and yet they define processes to be executed at runtime; conversely, the sourcecode of processes is sometimes treated as data (such as with the visualization of the NeumesXSLTServlet.java and other sourcecode in the 'Project Reports' section of this website). Nevertheless, if one bears in mind that a simple processes/data dichotomy is not universally valid, this basic classification continues to be a useful perspective on software.
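    The idea of rules stored as data which nevertheless define runtime behavior can be sketched minimally as follows. This is not NEUMES code: the neume names and melodic contours in the table are invented for illustration. The 'process' is a generic interpreter; extending the system means adding a row of data, not editing an algorithm.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RuleTable {

    // Hypothetical rule data: neume name -> melodic contour in semitone steps.
    // In a data-driven design this table could equally be loaded from a file
    // or a database at runtime.
    static final Map<String, int[]> CONTOURS = new LinkedHashMap<>();
    static {
        CONTOURS.put("punctum", new int[] { 0 });
        CONTOURS.put("pes",     new int[] { 0, +2 });
        CONTOURS.put("clivis",  new int[] { 0, -2 });
    }

    // The generic process: it interprets whatever the data says.
    static int[] pitchesFor(String neume, int startPitch) {
        int[] steps = CONTOURS.get(neume);
        if (steps == null) throw new IllegalArgumentException("unknown neume: " + neume);
        int[] pitches = new int[steps.length];
        for (int i = 0; i < steps.length; i++) {
            pitches[i] = startPitch + steps[i];
        }
        return pitches;
    }

    public static void main(String[] args) {
        // A 'pes' starting at MIDI pitch 60 yields an ascending pair.
        for (int p : pitchesFor("pes", 60)) {
            System.out.println(p);
        }
    }
}
```

    Compare this with hard-coding one branch of logic per neume type: there, every new data type forces a change to the process itself.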

    Historically, the main focus of software development has been on processes (viz., algorithms and programming). If one has a process for computing the trajectory of a cannonball or missile, for example, then one simply 'plugs in' the data for parameters or variables, and computes the result. Indeed, the name 'computer' is suggestive of this emphasis on processes. In recent decades, however, the purposes to which these machines are being put have broadened greatly, and today one might more appropriately call them 'information machines', or some such term. More powerful programming-language processes for manipulation of string data (i.e., textual information) were developed in the late 1960s (e.g., the PL/I language at IBM), and this trend has spread into the manipulation of images (static and video), audio, and so on, plus various representations of human thought by abstract languages comprising symbols or icons. Although processes have also tended to become more hierarchical (such as the inheritance hierarchy, polymorphism, and so on, in an Object-Oriented programming language), one reasonably can say that the creation of complex structures of information in software is far outpacing the analogous trend for processes.

    We can use the term "data stack" for these complex, hierarchical structures of data. This term borrows from the concept of a protocol stack in computer systems, as in a network protocol or the Internet Protocol (IP); each layer of the stack is specialised to perform one type of process on the data, and 'messages' get passed 'up' and 'down' the stack. In the case of a "data stack," each higher layer represents information of a more abstract type, until, at the highest layers, information hardly looks like data at all (in the historical view of 'data'). As with protocol stacks, each layer in a "data stack" depends for its meaning on the layer immediately below it, and so forth recursively down to the lowest layer, where it is 'data' by the historical view. Typically, the representation of information in the top layers is designed to be easily understood and manipulated by humans. Often the task of software engineering involves creating a representation that is intuitively obvious in the idiom of the domain of application (such as, for instance, medieval musicology), while, at the same time, sufficiently formal and well-defined that it can be translated to the representation of the next-lower layer, and eventually become 'low-level data' without loss of semantic integrity.
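    A toy three-layer "data stack" can make the idea concrete. The symbol name and interchange code below are invented for illustration, not the NEUMES encoding: the top layer is a term in the idiom of the domain ("virga"), the middle layer an abstract interchange code, and the bottom layer raw bytes. Each decoding step depends for its meaning on the layer below it, and the round trip must lose no semantic integrity.

```java
import java.nio.charset.StandardCharsets;

class DataStack {

    // Top -> middle: the domain symbol becomes an abstract interchange code.
    // (The "N:" prefix is a hypothetical convention for this sketch.)
    static String toCode(String symbolName) {
        return "N:" + symbolName;
    }

    // Middle -> bottom: the code is serialised as UTF-8 bytes.
    static byte[] toBytes(String code) {
        return code.getBytes(StandardCharsets.UTF_8);
    }

    // Bottom -> middle: bytes are meaningful only given the encoding layer.
    static String fromBytes(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Middle -> top: the code is meaningful only given the prefix convention.
    static String toSymbol(String code) {
        if (!code.startsWith("N:")) throw new IllegalArgumentException(code);
        return code.substring(2);
    }

    public static void main(String[] args) {
        byte[] wire = toBytes(toCode("virga"));        // down the stack
        System.out.println(toSymbol(fromBytes(wire))); // back up the stack
    }
}
```

    The fragility the text goes on to describe is visible even here: a change to the middle layer's conventions silently invalidates both the layer above and the layer below unless all three are maintained together.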

    Maintenance of integrity and consistency within a data stack, if functional requirements evolve or new data types are added, is becoming increasingly difficult with the growing complexity of data. (Certainly, the complexity of structure and content in the NeumesXML Schema is an example of this.) Section I on this page describes a strategy for automated maintenance of data relationships via XML Transformations; it does not, however, address the problem of maintaining integrity and consistency in processes when design changes occur in the data stack. Historically, the focus in software development has been on processes, and data were considered as ... well, 'data', viz., something that you feed into a process. For this to work, many assumptions about the data must be made at the time the processes are written as programs. This is viable in applications where the data types are stable and well understood. Massive maintenance problems can, however, result if fundamental changes to the data are required (recall the catastrophic expense involved in the 'Y2K' problem, when the year field of dates merely had to be widened from two digits to four).

    Partial solutions were achieved in programming, first by using abstraction barriers, and later by using encapsulation of processes and data under Object-Oriented programming. (Both solutions were intended to tame the side-effects of 'spaghetti code' programming.) These solutions, when well applied, succeeded only in 'modularising' the inter-dependencies, such that maintenance could be localised to a few classes if the structure of data was changed. They did not solve the problem that such maintenance is labor-intensive and prone to mistakes of integrity and consistency.
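    The kind of localisation that encapsulation buys can be sketched as follows (a hypothetical example, not NEUMES code, echoing the 'Y2K' case above). Callers depend only on the accessor, so widening the stored year from two digits to four is confined to this one class rather than rippling through every process that consumes the data; but note that someone still had to find, rewrite, and re-verify this class by hand.

```java
class EventDate {
    // The representation change is confined here: the year is now stored
    // internally as a full four-digit value.
    private final int year;

    // Legacy two-digit input is normalised once, at the class boundary.
    // (The 1900 pivot is an assumption for this sketch.)
    EventDate(int rawYear) {
        this.year = (rawYear >= 0 && rawYear < 100) ? 1900 + rawYear : rawYear;
    }

    // Callers see only this accessor; none of them change when the
    // internal representation does.
    int getYear() {
        return year;
    }

    public static void main(String[] args) {
        System.out.println(new EventDate(99).getYear());   // legacy input
        System.out.println(new EventDate(2004).getYear()); // modern input
    }
}
```

    This illustrates the paragraph's point: encapsulation localises the maintenance, but the maintenance itself remains manual, labor-intensive work.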


    Copyright © 2004, Louis W. G. Barton. Copyright © 2006, the University of Oxford.