|
|
| I. Integration of Sourcecode and Documentation | |||||||
| [Diagram labels: PDF facsimile; Design Document; Topical 'master' XML file; specialized Transforms; 'process' XML files] |
Usually, design documents are created during the first phases of software development. In later phases, time and energy are increasingly absorbed by implementation problems, and it becomes burdensome to maintain the design documents (especially if the project is thinly funded or insufficiently staffed, and when expectations for operable programs are mounting from outside observers). If program sourcecode and design documents have to be maintained separately, the documentation often becomes out of date and inconsistent with the implementation. Various solutions to this problem have been proposed and tried in the past forty years or so. Small projects usually rely on commenting the sourcecode with 'inline documentation'; in larger, better-funded projects (e.g., in development of an operating system at IBM), documentation is maintained continuously by specialists (called 'technical writers'), while the programmers focus on just the sourcecode and inline documentation. A more general solution to this problem needs to be found, however.

In the NEUMES Project, we wish to maintain sourcecode and documentation at a single point, that is, as integrated documents. The 'documentation' aspect should have the full features of a modern word-processor (viz., use of colors, fonts, graphics, automated tables of contents, indexes, and so on), and be exportable to PDF (Portable Document Format) for distribution on the Web. Sourcecode should be distinguished by the use of word-processing styles, so that it may be given a distinctive typographical presentation or even be suppressed in versions of the PDF that are intended for non-technical readers.
A word-processing program* whose native (i.e., internal) representation of documents is XML, and which also implements styles via XML, allows us to extract in XML just the sourcecode that is relevant to a particular topic [cf., "Topical master XML file," in the diagram above]; this 'extraction' is done by XSLT (Extensible Stylesheet Language Transformations). The master XML document on one topic can be further transformed via XSLT into any number of specialized XML file-formats that are needed by processes in the concrete application [cf., "process XML files," in the diagram above].

This technique might not be feasible to integrate with the current IDEs (Integrated Development Environments) in which we do our Java programming. Also, it likely is not worth the effort to maintain the HTML code (viz., webpages) of the NEUMES website by this technique. Nevertheless, the technique seems feasible for all kinds of sourcecode in the XML family of languages, as well as for JavaScript, database specifications, and so on. For example, the NEUMES symbol taxonomy can be maintained as a high-level design document with lots of text, illustrations, etc., and the NEUMES_characters.pen file can be generated from it.

* Our plan is to implement this technique by using the word-processing component of OpenOffice: it satisfies our criteria above (viz., native XML documents, and styles denoted in XML); it is freely licensed; and it runs on most computer platforms. Also, the database component of OpenOffice uses HSQLDB as its database engine: this engine is implemented entirely in the Java programming language; it is 'XML-friendly'; and we have already adopted it for server-side database components of the NEUMES software kit. Additional opportunities for integrating OpenOffice with NEUMES software thus might present themselves. We also considered Microsoft's new version of Word (cf., Office InfoPath), which they claim has sophisticated support for XML: we concluded, however, that Microsoft's entry into this field is largely just a veneer over their proprietary Word format for documents; moreover, the product is expensive, and it is bound to specific computer platforms. We also evaluated YAWC and other programs that export documents from Microsoft Word to XML, but we found these solutions deficient in various regards. Nevertheless, we expect that native-XML format for documents is the 'wave of the future', and so more products in this field probably are forthcoming.
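
The extraction step described in this section (pulling out of a styled XML document only the sourcecode that belongs to one topic) might be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the element names `doc` and `p`, the attributes `style` and `topic`, the style name `Sourcecode`, and the class name `SourcecodeExtractor` are all invented for the example, and the real OpenOffice document markup is considerably more elaborate. Only the standard `javax.xml.transform` API of the JDK is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SourcecodeExtractor {

    // Hypothetical stylesheet: keeps only paragraphs whose style is
    // "Sourcecode" and whose topic equals the $topic parameter, and
    // writes their text content, one paragraph per line.
    static final String EXTRACT_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='topic'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select=\"//p[@style='Sourcecode'][@topic=$topic]\">"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the extracted text.
    static String extract(String documentXml, String topic) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(EXTRACT_XSLT)));
            transformer.setParameter("topic", topic);
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(documentXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT extraction failed", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample document mixing prose and styled sourcecode.
        String doc =
            "<doc>"
          + "<p style='Body'>Prose that explains the symbol taxonomy.</p>"
          + "<p style='Sourcecode' topic='taxonomy'>symbol = punctum</p>"
          + "<p style='Sourcecode' topic='database'>CREATE TABLE t (id INT)</p>"
          + "</doc>";
        System.out.print(extract(doc, "taxonomy"));
    }
}
```

A second stylesheet applied to the result of the first would play the role of the 'specialized Transforms' in the diagram; chaining transforms this way keeps each stylesheet small and specific to one target format.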
|
|||||||
| II. Data-driven Programming | |||||||
| [Diagram labels: PDF facsimile; XML Schema Specification; Topical 'master' XML file; specialized Transforms; 'process' XML files] |
Following the von Neumann model of computation [cf., John von Neumann, "First Draft of a Report on the EDVAC," University of Pennsylvania, 1945; and his subsequent publications], software is divided basically into processes and data. Today, this distinction is somewhat outmoded: for instance, the rules of a knowledge-based system (or 'expert system') are stored as data, and yet they define processes to be executed at runtime; conversely, the sourcecode of processes is sometimes treated as data (such as with the visualization of NeumesXSLTServlet.java and other sourcecode in the 'Project Reports' section of this website). Nevertheless, if one bears in mind that a simple processes/data dichotomy is not universally valid, this basic classification continues to be a useful perspective on software.

Historically, the main focus of software development has been on processes (viz., algorithms and programming). If one has a process for computing the trajectory of a cannonball or missile, for example, then one simply 'plugs in' the data for parameters or variables, and computes the result. Indeed, the name 'computer' is suggestive of this emphasis on processes. In recent decades, however, the purposes to which these machines are put have broadened greatly, and today one might more appropriately call them 'information machines', or some such term. More powerful programming-language processes for manipulation of string data (i.e., textual information) were developed in the late 1960s (e.g., the PL/I language at IBM), and this trend has spread into the manipulation of images (static and video), audio, and so on, plus various representations of human thought by abstract languages composed of symbols or icons.
Although processes have also tended to become more hierarchical (such as the inheritance hierarchy, polymorphism, and so on, in an Object-Oriented programming language), one can reasonably say that the creation of complex structures of information in software is far outpacing the analogous trend for processes. We can use the term "data stack" for these complex, hierarchical structures of data. This term borrows from the concept of a protocol stack in computer systems, as in a network protocol stack such as that of the Internet Protocol (IP); each layer of the stack is specialised to perform one type of process on the data, and 'messages' get passed 'up' and 'down' the stack. In the case of a "data stack," each higher layer represents information of a more abstract type, until, at the highest layers, information hardly looks like data at all (in the historical view of 'data'). As with protocol stacks, each layer in a "data stack" depends for its meaning on the layer immediately below it, and so forth recursively down to the lowest layer, where it is 'data' in the historical view.

Typically, the representation of information in the top layers is designed to be easily understood and manipulated by humans. Often the task of software engineering involves creating a representation that is intuitively obvious in the idiom of the domain of application (such as, for instance, medieval musicology), while at the same time sufficiently formal and well-defined that it can be translated to the representation of the next-lower layer, and eventually become 'low-level data' without loss of semantic integrity. Maintaining integrity and consistency within a data stack when functional requirements evolve or new data types are added is becoming increasingly difficult with the growing complexity of data. (Certainly, the complexity of structure and content in the NeumesXML Schema is an example of this.)
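
The layering described above can be made concrete with a deliberately tiny sketch in Java. Everything in it is an assumption made for illustration: the class name `DataStackSketch`, the three layers chosen, and above all the numeric codes, which are invented and have no relation to any actual NEUMES encoding. The top layer uses names in the idiom of the domain, the middle layer uses integer codes, and the bottom layer is raw bytes; the round trip checks that nothing is lost on the way down and back up.

```java
import java.nio.ByteBuffer;
import java.util.Map;

class DataStackSketch {

    // Top layer <-> middle layer: domain names mapped to invented codes.
    static final Map<String, Integer> CODE_BY_NAME =
        Map.of("punctum", 1, "virga", 2, "clivis", 3);

    // Top layer -> middle layer.
    static int toCode(String name) {
        Integer code = CODE_BY_NAME.get(name);
        if (code == null) {
            throw new IllegalArgumentException("Unknown symbol: " + name);
        }
        return code;
    }

    // Middle layer -> bottom layer: four bytes, big-endian.
    static byte[] toBytes(int code) {
        return ByteBuffer.allocate(4).putInt(code).array();
    }

    // Bottom layer -> middle layer.
    static int fromBytes(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    // Middle layer -> top layer.
    static String toName(int code) {
        for (Map.Entry<String, Integer> entry : CODE_BY_NAME.entrySet()) {
            if (entry.getValue() == code) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("Unknown code: " + code);
    }

    // Passes a name down the whole stack and back up again.
    static String roundTrip(String name) {
        return toName(fromBytes(toBytes(toCode(name))));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("virga"));
    }
}
```

Even in this toy, adding a new symbol or widening the code from four bytes to eight touches every layer, which is the maintenance difficulty the paragraph above describes.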
Section I on this page describes a strategy for automated maintenance of data relationships via XML Transformations; it does not, however, address the problem of maintaining integrity and consistency in processes when design changes occur in the data stack. Historically, the focus in software development has been on processes, and data were considered as ... well, 'data', viz., something that you feed into a process. For this to work, many assumptions about the data must be made at the time the processes are written as programs. This is viable in applications where the data types are stable and well understood. Massive maintenance problems can result, however, if fundamental changes to the data are required (recall the catastrophic expense involved in the 'Y2K' problem, when the year data of dates just had to be changed from two digits to four digits).

Partial solutions were achieved in programming, first by using abstraction barriers, and later by using encapsulation of processes and data under Object-Oriented programming. (Both solutions were intended to tame the side-effects of 'spaghetti code' programming.) These solutions, when well applied, succeeded only in 'modularising' the inter-dependencies, such that maintenance could be localised to a few classes if the structure of data was changed. They did not solve the problem that such maintenance is labor-intensive and prone to mistakes of integrity and consistency.

A long-standing problem in software engineering that greatly needs to be resolved is the integration of sourcecode and documentation. Design documents normally are classified into three layers:
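
The remark in this section about encapsulation localising the effect of a change in data structure, with the 'Y2K' change as the example, can be sketched in Java as follows. This is an illustration under assumptions, not code from the NEUMES software: the class name `YearField`, its methods, and the 'pivot' rule for deciding the century of a two-digit year are all hypothetical. The point is only that callers never see how the year is stored, so a change of representation is confined to this one class.

```java
class YearField {

    // Internal representation: always four digits. Callers cannot see it.
    private final int fourDigitYear;

    private YearField(int fourDigitYear) {
        this.fourDigitYear = fourDigitYear;
    }

    // Reads a legacy two-digit year. Values below the (hypothetical)
    // pivot are placed in the 2000s, all others in the 1900s.
    static YearField fromTwoDigits(int yy, int pivot) {
        if (yy < 0 || yy > 99) {
            throw new IllegalArgumentException("Not a two-digit year: " + yy);
        }
        int century = (yy < pivot) ? 2000 : 1900;
        return new YearField(century + yy);
    }

    // Reads a year that is already stored with four digits.
    static YearField fromFourDigits(int yyyy) {
        return new YearField(yyyy);
    }

    // Number of years from this year to a later one.
    int yearsUntil(YearField later) {
        return later.fourDigitYear - this.fourDigitYear;
    }

    // What unencapsulated code effectively computed on raw two-digit years.
    static int naiveTwoDigitDifference(int fromYy, int toYy) {
        return toYy - fromYy;
    }

    public static void main(String[] args) {
        YearField start = YearField.fromTwoDigits(99, 50);
        YearField end = YearField.fromTwoDigits(1, 50);
        System.out.println(start.yearsUntil(end));
        System.out.println(naiveTwoDigitDifference(99, 1));
    }
}
```

As the text notes, this only localises the work: someone still has to find every place that handled raw two-digit years and route it through the class, and that effort remains labor-intensive and error-prone.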
|
|||||||