
Auditory perspective: perception, rendering, and applications

DEVALLEZ, Delphine
2009

Abstract

In our appreciation of auditory environments, distance perception is as crucial as lateralization. Although research has been carried out on distance perception, modern auditory displays do not yet take advantage of it to provide additional information on the spatial layout of sound sources and thereby enrich their content and quality. When designing a spatial auditory display, one must take into account the goal of the given application and the resources available in order to choose the optimal approach. In particular, rendering auditory perspective provides a hierarchical ordering of sound sources and makes it possible to focus the user's attention on the closest sound source. Moreover, when visual data are no longer available, either because they are outside the visual field or the user is in the dark, or should be avoided to reduce the load on visual attention, auditory rendering must convey all the spatial information, including distance. The present research studies auditory depth (i.e. sound sources displayed straight ahead of the listener) in terms of perception, rendering, and applications in human-computer interaction.

First, an overview is given of the most important aspects of auditory distance perception. Investigations of depth perception are much more advanced in vision, where they have already found applications in computer graphics. It therefore seems natural to provide the same information in the auditory domain, to increase the degree of realism of the overall display. Depth perception may indeed be facilitated by combining visual and auditory cues. Relevant results from the literature on audio-visual interaction effects are reported, and two experiments were carried out on the perception of audio-visual depth. In particular, the influence of auditory cues on the perceived visual layering in depth was investigated.
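The best-known auditory distance cue mentioned here, intensity, can be illustrated with a minimal sketch of the free-field inverse-square law, under which the sound level drops by about 6 dB for each doubling of distance (the function and parameter names below are illustrative, not from the thesis):

```python
import math

def distance_gain_db(distance_m, ref_distance_m=1.0):
    """Free-field level change (dB) relative to a reference distance.

    Sound pressure falls as 1/r, so the level drops by
    20*log10(r/r_ref) dB, i.e. roughly 6 dB per doubling of distance.
    """
    return -20.0 * math.log10(distance_m / ref_distance_m)

# A source moved from 1 m to 2 m is attenuated by about 6 dB:
print(round(distance_gain_db(2.0), 2))  # -6.02
```

In real rooms this cue is confounded with reverberation and source level, which is one reason intensity alone is an unreliable absolute distance cue.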
Results show that manipulating auditory intensity does not affect the perceived order in depth, most probably owing to a lack of multisensory integration. The second experiment, which introduced a delay between the two audio-visual stimuli, revealed an effect of the temporal order of the two visual stimuli.

Among existing techniques for spatializing sound sources along the depth dimension, a previous study proposed the modeling of a virtual pipe, based on the exaggeration of reverberation in such an environment. The design strategy follows a physics-based modeling approach and makes use of a 3D rectangular Digital Waveguide Mesh (DWM), which had already shown its ability to simulate complex, large-scale acoustical environments. The 3D DWM turned out to be too resource-consuming for real-time simulation of 3D environments of reasonable size. While downsampling may help to reduce the CPU load, a more efficient alternative is a 2D model, which simulates a membrane. Although it sounds less natural than a 3D simulation, the resulting two-dimensional audio space has similar properties, especially for depth rendering. The research has also shown that virtual acoustics makes it possible to shape depth perception and, in particular, to compensate for the usual compression of distance estimates. A trapezoidal two-dimensional DWM is proposed as a virtual environment able to provide a linear relationship between perceived and physical distance. Three listening tests were conducted to assess this linearity. They also gave rise to a new test procedure, derived from the MUSHRA test, that is suitable for direct comparison of multiple distances; in particular, it reduces response variability compared with the direct magnitude estimation procedure. Real-time implementations of the rectangular 2D DWM have been realized as Max/MSP external objects.
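The core of a rectangular 2D DWM is a simple recursion: each interior junction's pressure is twice the mean of its four neighbours at the previous time step, minus its own value two steps back. A minimal sketch of one update step (boundary handling here is simplified to clamped, inverting edges; the thesis implementations are Max/MSP externals in which details will differ):

```python
import numpy as np

def dwm_step(p_curr, p_prev):
    """One time step of a rectilinear 2D digital waveguide mesh.

    Interior junctions follow the standard DWM recursion with N = 4
    neighbours:  p[n] = (2/N) * sum(neighbours at n-1) - p[n-2]
               = 0.5 * (up + down + left + right) - p_prev.
    Edges are clamped at zero (perfectly reflecting with inversion).
    """
    p_next = np.zeros_like(p_curr)
    p_next[1:-1, 1:-1] = 0.5 * (
        p_curr[:-2, 1:-1] + p_curr[2:, 1:-1] +
        p_curr[1:-1, :-2] + p_curr[1:-1, 2:]
    ) - p_prev[1:-1, 1:-1]
    return p_next

# Excite the centre of a small virtual membrane and run a few steps:
p_prev = np.zeros((16, 16))
p_curr = np.zeros((16, 16))
p_curr[8, 8] = 1.0
for _ in range(10):
    p_curr, p_prev = dwm_step(p_curr, p_prev), p_curr
```

Reading the pressure at a "listening" junction while injecting a source signal at another junction yields the distance-dependent reverberant response that the thesis exploits for depth rendering.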
The first external renders in depth one or more static sound sources located at different distances from the listener, while the second simulates a single sound source moving along the depth dimension, i.e. an approaching or receding source.

As an application of the first external, an audio-tactile interface for sound navigation has been proposed. The tactile interface includes a linear position sensor made of conductive material. The touch position on the ribbon is mapped onto the listening position on a rectangular virtual membrane, modeled by the 2D DWM, which provides depth cues for four equally spaced sound sources. Furthermore, the knob of a MIDI controller moves the mesh along the playlist, which allows a whole set of files to be browsed by moving the audio window resulting from the virtual membrane back and forth. Subjects involved in a user study succeeded in finding all the target files, and found the interface intuitive and entertaining. Another demonstration of the audio-tactile interface was realized using physics-based sound models: everyday sounds of "frying", "knocking" and "liquid dripping" are used, so that both sound creation and depth rendering are physics-based. It is believed that this ecological approach provides an intuitive interaction.

Finally, "DepThrow" is an audio game based on the 2D DWM, used to render the depth cues of a dynamic sound source. The game consists of throwing a virtual ball (modeled by a physics-based rolling-sound model) into a virtual tube (modeled by a 2D DWM) which is open-ended and tilted. The goal is to make the ball roll as far as possible along the tube without letting it fall out at the far end. Demonstrated as a game, this prototype is also meant to be a tool for investigations of the perception of dynamic distance.
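The ribbon-to-membrane mapping described above can be sketched as a simple linear transfer from sensor reading to mesh row index. The abstract does not specify the exact mapping, so the function, its parameters, and the linearity assumption below are purely illustrative:

```python
def touch_to_listening_row(touch_pos, sensor_len, mesh_rows):
    """Map a ribbon-sensor reading onto a row of the virtual membrane.

    touch_pos   -- raw position along the ribbon, in [0, sensor_len]
    sensor_len  -- full-scale reading of the linear position sensor
    mesh_rows   -- number of junction rows along the depth dimension

    Hypothetical linear mapping: clamp to the sensor range, normalize,
    then scale to the available rows of the 2D DWM.
    """
    frac = min(max(touch_pos / sensor_len, 0.0), 1.0)
    return round(frac * (mesh_rows - 1))

# Touching the middle of a 128-unit ribbon selects the middle row:
print(touch_to_listening_row(64, 128, 21))  # 10
```

Moving the listening row toward a source junction then strengthens that source's direct sound relative to the others, which is what conveys the depth ordering of the four files.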
Preliminary results of a listening test on the perception of distance motion in the virtual tube showed that the duration of the ball's movement influences the estimation of the distance reached by the rolling ball.
Year: 2009
Language: English
Keywords: auditory perspective; perception; rendering
University: Università degli Studi di Verona
Pages: 128
Files in this record:
tesi_DelphineDevallez.pdf (open access, 4.14 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/113672
The NBN code of this thesis is URN:NBN:IT:UNIVR-113672