This book describes a series of neural network models devised to represent music listening processes. Backpropagation, Adaptive Resonance Theory, and other connectionist procedures are used to model melodic perception, interpretation, and expression.

Preface

Over the last two decades, Harold Fiske has developed a rigorous and refined theory of music cognition (Fiske 1984, 1990, 1993, 1996). The premise of Fiske's theory, first introduced in his 1990 publication, Music and Mind, is that music cognition consists of decisions made in the classification and comparison of tonal-rhythmic patterns, rather than of knowledge about those patterns (e.g., key note, scale degree, or meter). Thus, Fiske's interest lies in the procedural knowledge of music cognition rather than in its declarative knowledge, an emphasis that has distinguished his work from that of many other researchers in the field. In effect, Fiske, wielding Occam's razor, asks the question, "What is left in music cognition once all cultural and historical stylistic features have been removed?" Or, to put it another way, what is the panstylistic, cross-cultural kernel of music cognition?
Fiske's answer is that musical units (tonal-rhythmic patterns) are compared with each other, and this comparison process yields three categories of inter-pattern relationship: (1) two patterns are judged to be the same as each other (a P relationship), (2) they are judged to be related to each other (a P' relationship), or (3) they are judged to be different from each other (a Pn relationship). Different listeners may come to different conclusions regarding the relationship between patterns, and their conclusions will be shaped by how many levels of a pattern-comparison hierarchy they are able, or choose, to negotiate. These inter-individual differences may be determined by, say, aural acuity, memory, or musical training or, more generally, by which facets of a sonic pattern a given listener considers relevant in a particular situation. The current volume presents an operationalization of Fiske's theory as a set of connectionist computer models.
In Chapter 1, Fiske provides a comprehensive review of the pros and cons of connectionist, or neural network, modeling, including evidence from his own chronometric (reaction-time) perceptual experiments that supports the idea that rhythm and pitch information are processed in parallel. Central to this whole exposition are the similarities and differences between computer and biological neural networks, and the debate as to whether connectionist models reproduce neural activities or represent cognitive processing on a more metaphorical level. Historically, this has been a complex debate with passionate and eloquent adherents on both sides. Fiske notes, however, that, at the very least, a properly functioning neural network model supports the internal logic of the theory that initiated it. Moreover, if the neural net approximates human behavior, then its theory provides one plausible explanation of that behavior, thus bolstering the external validity of the theory. Therefore, the prerequisites for maximizing both the internal and external validity of a theory are (1) a solid theory, (2) a well-designed connectionist model, and (3) carefully collected human-response data.
With these principles in mind, in the ensuing chapters, Fiske leads the reader on a carefully conducted tour of his theory and the neural-net models derived from it. Chapter 2 revisits Fiske's cognitive theory of hierarchical decision making and (re)presents it as an operationalized conceptual model. The basic mechanics of a connectionist model are explained, including the representation of musical pitch information as a vector. Interestingly, the author also takes some time to address two objections to his theory: (a) that there are actually only two cognitive categories (same versus different) and (b) that there is an infinite array of possible cognitive categories. In addition to the arguments that Fiske presents at this juncture, there is also evidence from writings on musical structure, admittedly perhaps circumstantial, suggesting that Fiske's three categories accord with the intuitions of music theorists. I shall return to this issue at the end of this Foreword. Chapter 3 describes Fiske's supervised model. After further discussion of vectors, autoassociators and backpropagation, the construction of the connectionist model is portrayed in detail.
The goal of this model is to compare two tonal-rhythmic patterns and assign their relationship to one of Fiske's three categories (P, P', Pn). For each tonal-rhythmic pattern, separate vectors for pitch class, octave and rhythm are constructed; the differences between the two patterns are computed for each parameter, and the difference computations are then synthesized. The overall difference vector is assigned to one of the three categories, but the boundaries between categories overlap, reflecting disagreement among human listeners. The network is tested using 20 different melodies and appears to perform similarly to human subjects presented with the same melodies. The human-response data are then used to tweak the fuzzy boundaries between categories in the computer model and to test the model's ability to generalize to a second set of patterns. Chapter 4 demonstrates the wide applicability of Fiske's theory, showing that his pattern-comparison categories are germane not only to structural differences between patterns but also to differences of interpretation and expression.
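To make the comparison step described above more concrete, here is a minimal Python sketch. It is not Fiske's backpropagation network. It assumes a hypothetical encoding (pitch classes 0 to 11, octave numbers, durations in beats), computes one difference value each for pitch class, octave and rhythm, averages them, and maps the result onto overlapping, hand-set fuzzy boundaries for P, P' and Pn. All function names and threshold values are illustrative assumptions. In the book's model, it is the human-response data that shape the boundaries.

```python
import numpy as np

INF = float("inf")


def trapezoid(x, a, b, c, d):
    """Fuzzy membership: 0 at or below a, rising to 1 at b, 1 up to c, falling to 0 at d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def pitch_class_diff(a, b):
    # Circular distance on the 12-step pitch-class circle, scaled to 0..1 per note
    d = np.abs(np.asarray(a) - np.asarray(b)) % 12
    return float(np.mean(np.minimum(d, 12 - d) / 6.0))


def capped_diff(a, b):
    # Absolute per-note difference, capped at 1 so the parameter stays in 0..1
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.mean(np.minimum(d, 1.0)))


def compare_patterns(p1, p2):
    """Return the overall difference and fuzzy memberships in P, P' and Pn."""
    diffs = [
        pitch_class_diff(p1["pitch_class"], p2["pitch_class"]),
        capped_diff(p1["octave"], p2["octave"]),
        capped_diff(p1["rhythm"], p2["rhythm"]),
    ]
    d = sum(diffs) / len(diffs)  # synthesize the three parameter differences
    # Hypothetical, deliberately overlapping category boundaries
    memberships = {
        "P": trapezoid(d, -INF, -INF, 0.05, 0.15),
        "P'": trapezoid(d, 0.05, 0.15, 0.35, 0.5),
        "Pn": trapezoid(d, 0.35, 0.5, INF, INF),
    }
    return d, memberships


reference = {"pitch_class": [0, 2, 4, 5, 7], "octave": [4, 4, 4, 4, 4], "rhythm": [1, 1, 1, 1, 2]}
varied = {"pitch_class": [0, 2, 4, 5, 9], "octave": [4, 4, 4, 4, 4], "rhythm": [1, 1, 0.5, 0.5, 2]}
contrast = {"pitch_class": [6, 8, 10, 11, 1], "octave": [5, 5, 5, 5, 5], "rhythm": [2, 2, 2, 2, 1]}

for name, other in [("same", reference), ("varied", varied), ("contrast", contrast)]:
    d, m = compare_patterns(reference, other)
    print(name, round(d, 3), {k: round(v, 2) for k, v in m.items()})
```

In this toy run, the varied pattern lands in an overlap region and receives partial membership in both P and P'. This is one simple way to express the listener disagreement that overlapping boundaries are meant to capture.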
Reviewing a number of performance studies in music cognition, the author argues that structure and interpretation interact so intimately that they can effectively be equated. For this reason, other parameters of sound that are usually thought to convey a performer's expressive intention are easily incorporated into the model as extra vectors. These supplementary parameters are attack/dynamic accent, metrical placement, intonation, and rubato. Chapter 5 expands the compass of the model by arguing that musical reasoning involves comparisons not only between successive patterns but also between those that are temporally separated. To accommodate these longer-term, formal aspects of music listening, a new "harmony" model is presented. This harmony model has the added feature that any number of patterns can be stored as vectors, and all possible pairwise comparisons between successive and non-successive patterns can be carried out. As in Chapter 3, the new model is put through its paces and learns to categorize the relationships among 15 different patterns.
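The bookkeeping behind these pairwise comparisons can be illustrated with a short Python sketch. It only enumerates which pairs of stored patterns are compared and labels each pair as successive (adjacent in time) or non-successive. The function name `comparison_cells` is a hypothetical placeholder, and the comparison itself is left out, so this should not be read as a detail of Fiske's harmony model. For three consecutive patterns, it yields two successive comparisons and one non-successive comparison. For 15 patterns, it yields n(n - 1)/2 = 105 comparisons in all.

```python
from itertools import combinations


def comparison_cells(labels):
    """List every pairwise comparison among stored patterns, tagged by adjacency."""
    cells = []
    for i, j in combinations(range(len(labels)), 2):
        kind = "successive" if j == i + 1 else "non-successive"
        cells.append((labels[i], labels[j], kind))
    return cells


cells = comparison_cells(["A", "B", "C"])
for first, second, kind in cells:
    print(first, second, kind)

# With 15 stored patterns: n * (n - 1) / 2 comparisons, n - 1 of them successive
many = comparison_cells([f"pattern{k}" for k in range(1, 16)])
n_successive = sum(1 for c in many if c[2] == "successive")
print(len(many), n_successive, len(many) - n_successive)  # prints: 105 14 91
```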
Fiske then goes on to provide a useful list of all the possible comparison cells that exist when three consecutive patterns are presented to the model (in each case yielding two successive comparisons and one non-successive comparison). The reasoning is extended further to provide four formal-groundplan analyses of movements by Vivaldi, Couperin, Prokofiev and Joplin, using Fiske's notation. The chapter concludes with an account of a further experiment in which human listeners prepared pattern-comparison analyses of the third movement of Stravinsky's Ebony Concerto. Three general categorization strategies emerge from these data, demonstrating how different groups of listeners negotiate different numbers of levels of Fiske's pattern-comparison hierarchy. Chapter 6 presents a new type of connectionist model, an unsupervised Adaptive Resonance Theory (ART) model. ART models differ from backpropagation models in that the network is not given the desired set of categorizations before it receives its input vectors.
In this sense, ART models are closer to human learning than supervised models, since human learners also often have to muddle through on their own without a clear learning goal. The chapter provides a general overview of ART architecture, including the vigilance factor, which can be adjusted to fine-tune the network's ability to discriminate between patterns. A distinction is also made between bottom-up weights (the strengths of connections from input to output nodes) and top-down weights (the strengths of connections from output to input nodes). The first set of weights is likened to short-term memory, the second to expectancy. The author then unveils the details of his specific ART model and shows how, in this model, the pitch and rhythm of each tonal-rhythmic pattern are represented as a single, composite vector. To test its ability to learn the pattern-comparison trichotomy, the network is fed a vector representation of the opening phrase of Schubert's "Wohin?" from Die schöne Müllerin. Nineteen other melodies are then compared with it.
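The role of the vigilance factor can be illustrated with a simplified ART-1-style procedure in Python. This is a generic sketch under stated assumptions, not Fiske's ART model. The patterns are toy binary vectors standing in for composite pitch-rhythm vectors. Only top-down templates are stored, and the bottom-up choice is computed from them. Learning is "fast": a template shrinks to its overlap with the input. The function name `art1_categorize` and the parameter `beta` are assumptions made for this illustration.

```python
import numpy as np


def art1_categorize(inputs, vigilance, beta=0.5):
    """Assign binary input vectors to categories with a simplified ART-1 procedure.

    Returns a category label per input and the learned top-down templates.
    """
    templates = []  # top-down templates (learned expectations), one per category
    labels = []
    for vec in inputs:
        x = np.asarray(vec, dtype=bool)
        # Bottom-up choice: rank categories by overlap, normalized by template size
        order = sorted(
            range(len(templates)),
            key=lambda j: -np.sum(x & templates[j]) / (beta + np.sum(templates[j])),
        )
        chosen = None
        for j in order:
            overlap = np.sum(x & templates[j])
            # Vigilance test: fraction of the input confirmed by the expectation
            if overlap / max(np.sum(x), 1) >= vigilance:
                templates[j] = x & templates[j]  # resonance: fast learning
                chosen = j
                break
            # Mismatch: reset this category and try the next-best one
        if chosen is None:
            templates.append(x.copy())  # no category resonates: recruit a new one
            chosen = len(templates) - 1
        labels.append(chosen)
    return labels, templates


reference = [1, 1, 1, 1, 0, 0, 0, 0]
varied = [1, 1, 1, 0, 1, 0, 0, 0]    # shares 3 of its 4 active features
contrast = [0, 0, 0, 0, 1, 1, 1, 1]  # shares none

for rho in (0.5, 0.9):
    labels, _ = art1_categorize([reference, varied, contrast], vigilance=rho)
    print(rho, labels)  # 0.5 -> [0, 0, 1]; 0.9 -> [0, 1, 2]
```

At the lower vigilance, the varied pattern resonates with the reference's category. At the higher vigilance, each of the three patterns gets its own category. In miniature, this shows how a categorization grouping can depend on the vigilance setting.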
The precise categorization grouping depends on the vigilance factor, but overall the model seems to behave in a similar fashion to a group of expert human listeners hearing the same melodies. Interestingly, no human listener was as discriminating as the ART network at its highest (most finely tuned) vigilance level. The final chapter, Chapter 7, ties together many of the ideas, both theoretical and practical, that have arisen in the course of Fiske's exposition. Here the topics of universality, performance, imagery, consciousness and memory are all discussed and tied in with a comprehensive overview of their respective literatures. With regard to the last of these, memory, Fiske finds convincing parallels between the way neural nets must store information and the so-called proceduralist paradigm in short-term memory studies (Crowder, 1993). The process of lateral musical thinking is also considered, whereby pattern comparisons are made not just within a single performance of a single piece but between different performances, or even between different pieces.
Thus, Fiske's model allows for a sort of cognitive "intertextuality," to borrow a commonly used postmodernist term. Of course, the notion of intertextuality, with its blurring of the boundaries between individual works, emphasizes the set of common references shared by a number of similar pieces. In terms of cognitive psychology, these references are manifested as style-specific expectations. Fiske concludes the chapter by considering the role of expectation and context in perception. By extension, then, learning (and learning is ultimately what connectionist models do) is the modification of context and expectation. As with all wide-ranging and stimulating theories, engaged readers of this volume will find ideas between the lines and beyond the page margins, forging links with their own areas of musical experience. One such association that intrigued this reader is the similarity between Fiske's pattern-decision trichotomy and the tenets of traditional formal theory. As already noted in my summary, Fiske makes this connection explicit by providing formal groundplans of four contrasting pieces towards the end of Chapter 5.
However, perhaps it would not be inappropriate to conclude this Foreword by elaborating on the resemblance a little more. Fiske's designations P, P' and Pn are synonymous with the terms "repetition" (or "reiteration," "recurrence," "restatement," etc.), "variation" and "contrast." As the following quotations suggest, these three processes are widely regarded as primary principles of organization in music:

Forms in music result from the presentation of musical patterns in recognizable schemes of reiteration, variation and contrast... When the listener recognizes patterns of like and unlike elements and discerns their order of arrangement, he is conscious of the organizational scheme - the form. (Christ et al., 1973, p. 32)

All cultures make some use of internal repetition and variation in their musical utterances. (Nettl, 1983, p. 40)

How are formal designs created in music? By means of repetition, contrast and variation. (Wright, 1996, p. 55)

Wright goes on to define and exemplify repetition, contrast and variation, and, in so doing, suggests that repetition and contrast meet the listener's need for the comfort of the familiar and the excitement of the unknown, respectively.
Wright sees variation as standing midway between repetition and contrast and adds that "the listener has the satisfaction of hearing the familiar melody, yet is challenged to recognize in what way it has been changed" (Wright, 1996, p. 55). Indeed, the balance of unity and variety is a basic credo of much writing on aesthetics, and Fiske's P, P' and Pn categories simply trichotomize the continuum between these extremes. Interestingly, however, one dissenting voice may be noted, though it dissents in a very specific way:

A restatement differs from repetition in that the former is a recurrence after contrast while the latter is an immediate recurrence. Restatements are important elements of form; repetitions are not. (Green, 1979, p. 94)

It is hard to see the logic behind Green's assertion that immediate repetitions are not important elements of form. After all, for the listener approaching an unfamiliar Classical sonata-form movement, what clearer indication is there that the end of the exposition has been reached than the replaying of the opening of the movement (that is, the repeat of the exposition)?
Even on a smaller scale, the immediate repetition of motives or phrases seems to be one of several decisive determinants of perceptual grouping (see, for example, Grouping Preference Rule 6 in Lerdahl and Jackendoff, 1983). To be fair to Green, much of the discourse in his book concerns form as harmonic motion, and under this definition some very local repetitions may be static in terms of functional harmony. However, local repetition and harmonic stasis are not necessarily correlated: imagine a musical unit (its length is immaterial) that modulates to, or ends by tonicizing, a different key, and is then repeated in its entirety. In this example, the harmonic motion and the repetition are coterminous. For this reason, it is difficult to agree with Green's blanket dismissal of repetition as a determinant of form. The distinction that Green wishes to draw rests not only on the time-span over which material recurs, but also on whether there is any intervening contrasting material. R. O. Morris makes a similar separation between immediate and temporally disjunct reiteration, although less rigidly and using "repetition" as the umbrella term:

The plain truth about music is that hitherto it has always admitted and depended on repetition, with some variation, of a single pattern, a minute pattern of notes rhythmically ordered. 'Recapitulation' is the repetition, also with some variation, of the much larger and more complex pattern displayed in the exposition. (Morris, 1935, p. 52)

In terms of cognition, the primary difference between restatement and repetition (to use Green's terms again for a moment) is the type of memory used to store the first pattern so that subsequent pattern comparison can take place. If one adopts the widely accepted distinctions among memory types, a local comparison (repetition) will be carried out in working (short-term) memory, or possibly even in echoic memory. By contrast, the cognitive processing of a restatement must involve encoding the initial presentation of a pattern in long-term memory, with the greater degree of abstraction that such encoding seems to imply (Dowling, 1978).
However, Fiske's theory is not a theory of memory and, as such, can justifiably remain agnostic as to the exact location (or processing) of memory traces. Likewise, Fiske's connectionist models can simply store the initial pattern (P) as vectors, because perceptual processing and encoding are assumed to have already taken place. Where memory may play a role in Fiske's theory, however, is in determining the levels of the pattern-comparison hierarchy that different listeners traverse. It is likely that patterns separated by, say, five seconds can be scrutinized in much greater detail than patterns separated by five minutes, allowing, in the former situation, greater navigation of the pattern-comparison hierarchy. Confirmation of such speculation, however, lies in the domain of future experimentation and model building. For now, as the reader will discover in the pages that follow, Harold Fiske has provided us with a fascinating and appealing theory, deftly operationalized, meticulously modeled and thoroughly tested against human-listener behavior.

Matthew Royal, Ph.D.
University of Western Ontario
London, Ontario
August 2004