The Neume Notation Optical Character-Recognition (OCR) Program is written in the Java programming language. Java "applets" can ordinarily be run in Java-enabled Web browsers. Security restrictions imposed by Web browsers, however, prohibit access to the file system on the user's computer by a Java applet. Consequently, the Neume OCR Program is currently available only as a stand-alone Java "application" and cannot be run over the Web.
Use of the Neume OCR Program on your computer requires that you have JDK (Java Development Kit) version 1.1, or higher, installed. Please contact the author for more information.
The fly in the ointment for the Neume Notation Editor is that the data-entry process is laborious. We think it would be tremendously helpful to have an optical character-recognition (OCR) program that could give a good first approximation of transcriptions from scanned manuscript photographs to the "universal" data representation of the Neume Notation Project.
The falling cost of computer processor cycles and memory, coupled with substantial advances in software theory and practice, has made feasible similar OCR applications, such as automatic text recognition on hand-written checks. The huge financial resources of banks, and the cost-saving incentive for developing such software, have made possible this dream, which only a few years ago was considered infeasible. Unfortunately, there is no such financial incentive or capital to drive similar work in medieval musicology. In general, interest in medieval music notation is confined to a small number of musicologists at universities and a few religious organizations. In commercial marketing terms, this produces a very narrow, vertical market, and one without enough of a financial base to support the development of OCR software for neume notation.
The purpose of the Neume OCR Program is to demonstrate a proof of concept: that, through the use of neural networks and genetic algorithms (discussed below), software could be evolved to do a good first approximation of neume recognition from scanned manuscripts. Our purpose is to show that such software could be developed at almost no cost, making feasible an automatic transcription to the Neume Notation Project's data representation. If we can successfully evolve software to recognize just one style of neume notation, then it should be mainly a matter of iteration to develop software to recognize other notational styles.
The sample screen output, below, is from the Neume OCR Program. The training images are in the lower left quadrant. After training the program, the user tests the program's accuracy of recognition. The user traces a neume to be recognized in the blue box at lower right. The results of the recognition are displayed in the dialog box at upper right. In this example, the program reported that it recognized (from left to right) pes, pes, virga.
In the current version of the Neume OCR Program, we use four neume forms and up to three "training" examples of each form. The training examples come from scanned images, which we have pre-processed by hand, using a "paint" program. The scanned images were cleaned up to eliminate extraneous markings and background noise, and were saved as black-and-white images. Each image was scaled to a uniform 24-by-24 pixel matrix. The images are stored individually in .gif graphics files.
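The preprocessing step above can be sketched as follows. This is not the project's actual code: it builds a 24-by-24 black-and-white image in memory (in practice the image would come from one of the .gif files, e.g. via ImageIO.read) and flattens it into a 576-element input vector of the kind a neural network can consume. The pen stroke drawn in main and the brightness threshold are illustrative assumptions.

```java
import java.awt.Color;
import java.awt.Graphics;
import java.awt.image.BufferedImage;

public class NeumeImageLoader {
    static final int SIZE = 24;

    // Flatten a 24x24 black-and-white image into a 576-element input
    // vector: dark pixels become 1.0, light pixels 0.0.
    static double[] toInputVector(BufferedImage img) {
        double[] input = new double[SIZE * SIZE];
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int rgb = img.getRGB(x, y);
                int luminance = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                input[y * SIZE + x] = (luminance < 128) ? 1.0 : 0.0; // illustrative threshold
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // Build a white 24x24 image with one rough "pen stroke" on it.
        BufferedImage img = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_INT_RGB);
        Graphics g = img.getGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, SIZE, SIZE);
        g.setColor(Color.BLACK);
        g.drawLine(4, 20, 12, 4);
        double[] input = toInputVector(img);
        System.out.println("Input vector length: " + input.length); // 576
    }
}
```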
The user enters "testing data" into the program by moving the mouse around in the test drawing area (see screen snapshot, above). The dialog box in the upper right-hand quadrant of the screen reports which character the program believes it has recognized from the user's drawing.
Preliminary results of the Neume OCR Program have been encouraging. Nevertheless, we have only subjective measures of accuracy from our trial runs; a quantitative metric of the accuracy of recognition is needed. It has been a useful expedient during program development to use mouse traces as input for testing accuracy of recognition. The next step should be to present the program with scanned images for recognition. Success in this regard would make a much more convincing proof-of-concept, as we would have more quantifiable results.
Initially, the scanned images we present for recognition could be pre-processed to clean up "background noise." We could gradually increase the difficulty of the recognition process by first allowing "noisy" specks in the image, then allowing grey-scale images instead of only black-and-white, and finally using images from scanned manuscripts that have considerable image degradation and/or messy handwriting.
To improve accuracy, the number of training cases needs to be increased. Perhaps the 24-by-24 pixel matrix needs to be enlarged to provide sufficient feature resolution. More importantly, the structure of the neural network should be fine-tuned using a genetic algorithm (refer to discussion below).
Ultimately, we wish to present the program with a scanned image of an entire page of neume notation. Doing so would present the Neume OCR Program with a number of further difficulties.
Rather than "hard-coding" the samples of neume forms that are used in training the neural net (as we have done), the program should allow the user to specify arbitrary image files for use in training the network. In this way, the user could train the neural network to recognize any one of the many neume notation styles. The user would have to tell the Neume OCR Program which neume family was being worked on, so that the program could load the proper recognition network.
The Neume OCR Program uses a neural network to do the "training" and recognition of test images. Traditional artificial intelligence (AI), following the work of Alan Turing, John McCarthy, and Marvin Minsky, has approached problems of intelligence largely in terms of symbol manipulation and tree searches. Recently, however, there has been much interest in solving problems by using neural network programs. Neural network programs use a rough approximation of the behavior of neurons in biological nervous systems. Rather than being explicitly programmed, neural network programs "learn" by being presented repeatedly with training data.
The type of neural network that is most commonly used, and the one employed in the Neume OCR Program, is the feed-forward, back-propagation model. This model uses three sets of neurons: a set of input neurons, a set of hidden neurons, and a set of output neurons. The input neurons are fully connected to the hidden neurons (i.e., each input neuron is connected to every hidden neuron), and the hidden neurons are fully connected to the output neurons. Each connection has a "weight," a floating-point number initially in the range -0.5 to +0.5, which is analogous to the activation energy fed to a biological neuron. The activation of a hidden neuron is calculated as a sum of terms, where each term is the activation of an input neuron multiplied by the weight of the connection between that input neuron and the hidden neuron: Sum((activation of input) * (weight of input-to-hidden connection)). This sum is then passed through a sigmoid function, which returns the value 1/(1 + e^(-x)) for a given sum x.
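The computation just described can be sketched in Java as follows; the three-element input and the particular weights are illustrative only, not values from the program.

```java
public class FeedForward {
    // Sigmoid activation function: 1 / (1 + e^(-x)).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Activation of one hidden neuron: the weighted sum of the input
    // activations, squashed by the sigmoid.
    static double hiddenActivation(double[] inputs, double[] weightsToHidden) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weightsToHidden[i];
        }
        return sigmoid(sum);
    }

    public static void main(String[] args) {
        double[] inputs  = {1.0, 0.0, 1.0};        // illustrative input pattern
        double[] weights = {0.3, -0.2, 0.1};       // illustrative weights in [-0.5, +0.5]
        System.out.println(hiddenActivation(inputs, weights)); // prints approximately 0.5987
    }
}
```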
For each neume form that we wish to recognize, an array of input values is presented to the input neurons. The weights are initially assigned pseudo-random numbers. As the network is trained repeatedly with the sample input data, the weights are progressively refined. The outputs are compared to the desired outputs, and a record is kept of the amount of error in recognition. The training repeats up to 3,000 times and stops when the error, summed over the output neurons, falls below 0.1.
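A minimal Java sketch of such a training loop follows, using the stopping criterion described above (at most 3,000 passes, or a summed error below 0.1). The tiny four-pixel "images," the network sizes, and the learning rate are our own illustrative choices, not the program's.

```java
import java.util.Random;

// Toy back-propagation trainer: random initial weights in [-0.5, +0.5],
// up to 3,000 passes over the training data, stopping when the summed
// output error falls below 0.1.
public class BackPropSketch {
    final int nIn, nHid, nOut;
    final double[][] wInHid, wHidOut;   // connection weights
    final double[] hid, out;            // neuron activations
    final Random rng = new Random(42);

    BackPropSketch(int nIn, int nHid, int nOut) {
        this.nIn = nIn; this.nHid = nHid; this.nOut = nOut;
        wInHid = new double[nIn][nHid];
        wHidOut = new double[nHid][nOut];
        hid = new double[nHid]; out = new double[nOut];
        for (double[] row : wInHid) for (int j = 0; j < nHid; j++) row[j] = rng.nextDouble() - 0.5;
        for (double[] row : wHidOut) for (int k = 0; k < nOut; k++) row[k] = rng.nextDouble() - 0.5;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    double[] forward(double[] in) {
        for (int j = 0; j < nHid; j++) {
            double s = 0;
            for (int i = 0; i < nIn; i++) s += in[i] * wInHid[i][j];
            hid[j] = sigmoid(s);
        }
        for (int k = 0; k < nOut; k++) {
            double s = 0;
            for (int j = 0; j < nHid; j++) s += hid[j] * wHidOut[j][k];
            out[k] = sigmoid(s);
        }
        return out;
    }

    // One pass over all training patterns; returns the summed squared error.
    double trainEpoch(double[][] ins, double[][] targets, double rate) {
        double error = 0;
        for (int p = 0; p < ins.length; p++) {
            forward(ins[p]);
            double[] dOut = new double[nOut];
            for (int k = 0; k < nOut; k++) {
                double e = targets[p][k] - out[k];
                error += e * e;
                dOut[k] = e * out[k] * (1 - out[k]);       // sigmoid derivative
            }
            double[] dHid = new double[nHid];
            for (int j = 0; j < nHid; j++) {
                double s = 0;
                for (int k = 0; k < nOut; k++) s += dOut[k] * wHidOut[j][k];
                dHid[j] = s * hid[j] * (1 - hid[j]);
            }
            for (int j = 0; j < nHid; j++)
                for (int k = 0; k < nOut; k++) wHidOut[j][k] += rate * dOut[k] * hid[j];
            for (int i = 0; i < nIn; i++)
                for (int j = 0; j < nHid; j++) wInHid[i][j] += rate * dHid[j] * ins[p][i];
        }
        return error;
    }

    public static void main(String[] args) {
        // Two toy four-pixel "images," one per class.
        double[][] ins = {{1, 1, 0, 0}, {0, 0, 1, 1}};
        double[][] targets = {{1, 0}, {0, 1}};
        BackPropSketch net = new BackPropSketch(4, 3, 2);
        double err = Double.MAX_VALUE;
        int epoch = 0;
        while (epoch < 3000 && err >= 0.1) {
            err = net.trainEpoch(ins, targets, 0.5);
            epoch++;
        }
        System.out.println("Stopped after " + epoch + " epochs, error = " + err);
    }
}
```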
After the neural net has been trained, a new set of data is presented to the network, and the network attempts to recognize which neume form has been presented. This is useful because the "testing" data are similar to, but not identical with, the training data. Thus, the neural net does a kind of feature recognition (or pattern matching) to identify the form of the presented test neume.
We are currently working on a genetic algorithm to optimize the neural net of the Neume OCR, but at this point we cannot offer a report. The idea is to use an extrinsically-defined fitness function, namely accuracy-of-match against the testing images. "Chromosomes" are defined as arrays of diploid "genes." In a population of candidate neural nets, individuals can "reproduce" with mutation or gene cross-over. We wish to introduce "parasites" into the population, in order to avoid having the evolution of the networks get stuck at local optima. That is to say, if a recognition network evolves that performs more accurately than any network a short Hamming distance away from it (viz., a close relative), evolution tends to stall there, even though the population might not be at the globally-optimal network configuration. To prevent this, some "parasitism" needs to be introduced that will tend to keep the population from reaching stasis.
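The machinery described above (without the parasites) can be illustrated with a toy genetic algorithm. Here bit-array "chromosomes" stand in for candidate network configurations, and the fitness function, population size, mutation rate, and generation count are all illustrative assumptions rather than the project's design.

```java
import java.util.Arrays;
import java.util.Random;

// Toy genetic algorithm: a population of "chromosomes" reproduces with
// crossover and mutation, selected by an externally defined fitness
// function. (In the Neume OCR setting, fitness would be accuracy of
// recognition on the test images.)
public class GaSketch {
    static final int GENES = 32, POP = 40, GENERATIONS = 200;
    static final double MUTATION_RATE = 0.01;
    static final Random rng = new Random(7);

    // Fitness: how many genes match a fixed target pattern.
    static int fitness(boolean[] chrom, boolean[] target) {
        int score = 0;
        for (int i = 0; i < GENES; i++) if (chrom[i] == target[i]) score++;
        return score;
    }

    // Single-point crossover of two parents, with per-gene mutation.
    static boolean[] crossover(boolean[] a, boolean[] b) {
        int cut = rng.nextInt(GENES);
        boolean[] child = new boolean[GENES];
        for (int i = 0; i < GENES; i++) {
            child[i] = (i < cut) ? a[i] : b[i];
            if (rng.nextDouble() < MUTATION_RATE) child[i] = !child[i];
        }
        return child;
    }

    public static void main(String[] args) {
        boolean[] target = new boolean[GENES];
        for (int i = 0; i < GENES; i++) target[i] = rng.nextBoolean();

        boolean[][] pop = new boolean[POP][GENES];
        for (boolean[] c : pop) for (int i = 0; i < GENES; i++) c[i] = rng.nextBoolean();

        for (int gen = 0; gen < GENERATIONS; gen++) {
            // Sort by fitness, best first; the bottom half is replaced
            // by children of the top half.
            Arrays.sort(pop, (a, b) -> fitness(b, target) - fitness(a, target));
            for (int i = POP / 2; i < POP; i++) {
                boolean[] p1 = pop[rng.nextInt(POP / 2)];
                boolean[] p2 = pop[rng.nextInt(POP / 2)];
                pop[i] = crossover(p1, p2);
            }
        }
        Arrays.sort(pop, (a, b) -> fitness(b, target) - fitness(a, target));
        System.out.println("Best fitness after evolution: " + fitness(pop[0], target) + " / " + GENES);
    }
}
```

Because the fittest half of the population survives unchanged each generation, the best fitness never decreases; it is exactly this property that lets such a population stall at a local optimum, which the "parasites" are meant to counteract.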
Genetic algorithms have been shown to be effective in some contexts for evolving efficient software, or software that performs a task well. Hillis** reports on a genetic algorithm he used for evolving a sorting program. Data-sorting algorithms are of great interest in computer science because of the massive amounts of data that businesses and other organizations must deal with routinely. A small saving in the time required to sort a small set of data, multiplied over millions of records, can yield a substantial benefit. For comparison of sorting algorithms, it has become traditional in the literature to use a 16-element set of data. In 1962, Bose and Nelson announced an algorithm that required only 65 exchanges to sort the data set correctly. Their result was surpassed in 1964 by Batcher and, independently, by Floyd and Knuth, who found 63-exchange algorithms; more generally, such algorithms can sort a data set of n elements using on the order of n log^2 n exchanges. In 1969 Shapiro announced a 62-exchange algorithm, and later that year Green published a 60-exchange version. The remarkable thing about Hillis' 1991 paper is that, using a genetic algorithm, he was able to evolve a program that could sort the data set in as few as 61 exchanges. In other words, he did not design the sorting algorithm himself; instead, he had "autonomous software agents" reproduce, mutate, and compete with each other for "fitness" in a "world" where efficient sorting is the key to survival. Hillis used the genetic algorithm in conjunction with "parasites" that would prevent evolution from stopping after the algorithm had found a local optimum.
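To make the idea of a fixed sequence of exchanges concrete, here is the classic five-comparator sorting network for four elements. The 16-element networks discussed above are much larger, but they are built from the same kind of unconditional compare-and-swap steps.

```java
import java.util.Arrays;

public class SortingNetwork4 {
    // The classic five-comparator network for four elements: each pair
    // is a fixed compare-and-swap, applied unconditionally in this order.
    static final int[][] COMPARATORS = {{0, 1}, {2, 3}, {0, 2}, {1, 3}, {1, 2}};

    static void sort4(int[] a) {
        for (int[] c : COMPARATORS) {
            if (a[c[0]] > a[c[1]]) {
                int t = a[c[0]]; a[c[0]] = a[c[1]]; a[c[1]] = t;
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 2};
        sort4(data);
        System.out.println(Arrays.toString(data)); // prints [1, 2, 3, 4]
    }
}
```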
Hillis' results were achieved at very high cost in terms of computer processing cycles, but at virtually no cost in human analysis, design, and programming. While the evolved software was marginally less efficient than the best human-designed software for this task, as an engineering solution it was remarkably successful. One must be wary of the cost/benefit ratio of developing vertical-market software such as an OCR program for medieval neume manuscripts. Genetic algorithms are typically computationally expensive, but as machine cycles and memory become less expensive (which seems a certainty), it would appear that using autonomous evolution of software to satisfy a small market niche will become commonplace.
|*||The following individuals made contributions or helpful suggestions to the design and coding of the Neume OCR Program: Dave Cliff; Christopher Evan; Edward Kogan; Mark Watson.|
|**||W. Daniel Hillis, "Co-Evolving Parasites Improve Simulated Evolution as an Optimization Procedure," Artificial Life II, vol. X (1991), pp. 313-24.|
Revision: 12 December 1999
Copyright © 1998, 1999, Louis W. G. Barton