"Dynamic Independent Mapping Layers for Concurrent Control of Audio and Video Synthesis"

Ali Momeni and Cyrille Henry

Computer Music Journal, 30:1, pp. 49-66, Spring 2006

The work we describe here is motivated by a desire for intimate and expressive control over creative processes implemented in real-time performance software. We seek a manner of control that offers a “low entry fee with no ceiling on virtuosity” and allows expressive control of musical and visual control structures (Wessel and Wright 2001). Like many colleagues, we believe that the answer lies in enriching the approach to mapping (Winkler 1995; Rovan et al. 1997; Arfib et al. 2002; Hunt et al. 2002).

Correspondence between sound and image is a rich area of exploration, ranging from psychoacoustic research to cinema and, most recently, to the world of computer-based media. Our interest in the last of these motivated a survey of the repertoire and tools that explore interaction between sound and image in real-time work. Most works can be characterized as (1) sound-to-image, (2) image-to-sound, or (3) concurrent generation of sound and image. The first two categories represent a unidirectional relationship between sound and image.

In sound-to-image applications, audio analysis provides control information for image synthesis and manipulation. Perhaps the most common example of such an approach is the music visualization feature present in many commercial music library managers, such as Apple’s iTunes (www.apple.com) and Nullsoft’s Winamp (www.winamp.com). In programs such as these, analysis of audio streams creates moving images that are often psychedelic.

In image-to-sound applications, image analysis provides the means for synthesis and manipulation of sound. Pioneering work in this field includes that of Lesbros (1996) on image-to-sound mappings and the software application Metasynth (released in 1997 and most recently updated for Mac OS X; see www.uisoftware.com/PAGES/acceuil_meta.html). Metasynth uses a mapping scheme between sound and image that considers time on the horizontal axis and frequency on the vertical axis of the source image. Manipulating the source image, a kind of score for the generated sound, results in changes in the synthesized audio.

Many other interesting examples of the image-to-sound and sound-to-image paradigms are found in areas ranging from computer-based installations and performances to data-mining applications (Hunt 2005) and medical applications like Meijer’s vOICe system for the visually impaired (www.seeingwithsound.com/voice.htm). Installations and performances of this kind include Golan Levin’s Messa di Voce (http://tmema.org/messa/messa.html), Mark Domino’s recent contributions to the Silk Road Project (Domino 2004; www.silkroadproject.org), Bas Van Koolwijk’s FDBCK AV (www.clubtransmediale.de/index.php?id=1090), and Tanaka’s Bondage (www.xmira.com/atau/bondage).

Our research falls in the third category: the concurrent generation and control of audio and video from a third, independent and variable data set.
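The time-versus-frequency image mapping attributed to Metasynth above can be illustrated with a minimal sketch. This is not Metasynth's actual implementation: the function name `image_to_sound`, the use of additive sine synthesis, the logarithmic frequency spacing, and all default parameter values are assumptions chosen only for illustration. Each image column is treated as a slice of time, and each row drives one sine partial whose amplitude follows the pixel brightness.

```python
import math

def image_to_sound(image, sample_rate=8000, column_duration=0.05,
                   f_low=200.0, f_high=2000.0):
    """Render a grayscale image (list of rows, values 0..1) as audio samples.

    Hypothetical illustration: columns are read left to right as time,
    and each row is one sine partial. The top row (index 0) is assigned
    the highest frequency, the bottom row the lowest.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    # Assumed log-spaced frequencies: row 0 -> f_high, last row -> f_low.
    if n_rows == 1:
        freqs = [f_low]
    else:
        freqs = [f_high * (f_low / f_high) ** (r / (n_rows - 1))
                 for r in range(n_rows)]
    samples_per_col = round(sample_rate * column_duration)
    samples = []
    for c in range(n_cols):
        for i in range(samples_per_col):
            t = (c * samples_per_col + i) / sample_rate
            # Sum the partials, each weighted by its pixel brightness.
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(n_rows))
            # Dividing by the row count keeps the output within [-1, 1].
            samples.append(value / n_rows)
    return samples

# A 2x2 "score": the low partial sounds first, then the high partial.
image = [
    [0.0, 1.0],  # top row (high frequency), bright in the second column
    [1.0, 0.0],  # bottom row (low frequency), bright in the first column
]
samples = image_to_sound(image)
```

Editing a pixel in `image` changes the amplitude of one partial during one time slice, which is the sense in which the image acts as a score for the generated sound.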

This article summarizes our efforts to ease the exploration of rich gesture-mapping techniques applied to creative work with generative real-time audio and video instruments. We describe an approach to mapping that involves an independent layer of algorithms with time-varying behavior that is affected, explored, or observed in some way with gesture. We begin by introducing the concept of dynamic, independent visual-mapping layers; we then step back and briefly discuss the roots of this work to clearly contextualize it and its future directions. We conclude by discussing some of the advantages of our proposed systems and providing a number of examples.