Demonstration of SWMUMDIS
Each link in the "signal name" column of the table points to a directory containing WAV files (22.05 kHz, single channel) for audio demonstration and JPG files (all 800x400 pixels, scales vary) for visual demonstration. If your browser is configured correctly, all you have to do is click on the file name. Some remarks follow:
Remarks on the wav-audio files
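To confirm that a downloaded file really has the stated format (22.05 kHz, single channel), a small check with Python's standard `wave` module can help. This is only a sketch; the helper names `describe_wav` and `matches_demo_format` are made up for illustration and are not part of the demonstration material.

```python
import wave

EXPECTED_RATE = 22050    # 22.05 kHz, as stated for the demo files
EXPECTED_CHANNELS = 1    # single channel


def describe_wav(path):
    """Read sampling rate, channel count and duration from a WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_demo_format(path):
    """Return True if the file is 22.05 kHz mono, False otherwise."""
    info = describe_wav(path)
    return (info["rate_hz"] == EXPECTED_RATE
            and info["channels"] == EXPECTED_CHANNELS)
```

Calling `describe_wav` on a downloaded file also reports its duration, which can be compared with the signal durations listed in the table.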
Abbreviations of analysis procedures
ZFKI | Time-Frequency-Contours, 4P1, B3dB = 0.5 Bark, delay compensation |
ZFKII | Time-Frequency-Contours, 4P1, B3dB = 0.3 Bark, delay compensation |
ZFKI+S | as ZFKI, visualized together with FTT-spectrogram |
ZFKII+S | as ZFKII, visualized together with FTT-spectrogram |
KTX | Contour/Texture-Representation, 4P1, B3dB = 0.3 Bark, delay compensation |
KTXOZ | Contour/Texture-Representation, 4P1, B3dB = 0.3 Bark, delay compensation |
M-TTZM | optimized Part-Tone-Time-Pattern, 4P1, B3dB = 0.3 Bark, delay compensation |
SM-TTZM | improved Part-Tone-Time-Pattern, 2P1, B3dB = 0.25 Bark, time-smoothed spectrum |
HB-TTZM | Heinbach's Part-Tone-Time-Pattern, P1, B3dB = 0.1 Bark, time-smoothed spectrum |
AMS | FTT-Spectrogram (Auditory-Magnitude-Spectrogram), 4P1, B3dB = 0.3 Bark, delay compensation |
Abbreviations of reconstruction procedures
HORN-RS | Horn's spectrogram-resynthesis, N = 5 |
HORN-RS1 | Horn's spectrogram-resynthesis, N = 1 |
RKHP | reconstruction from contours using the phase heuristic |
RKHPTX | as RKHP, reconstruction from texture added |
RKOP | reconstruction from contours using the original phases |
RKOPTX | as RKOP, reconstruction from texture added |
TTSD | part-tone resynthesis using a triangular window |
TTSR | Heinbach's part-tone resynthesis using a rectangular window |
Abbreviations of speech codecs
HB-4k4 | Heinbach's speech codec 4.4 kbit/s, based on Part-Tone-Time-Pattern |
MUM-4k4 | Speech codec 4.4 kbit/s, based on Contour/Texture-Representation |
MUM-30k | Speech codec 30 kbit/s, based on Contour/Texture-Representation |
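The comment column of the signal table refers to processing chains by patterns such as *RKHP* or *ZFKI.RKOP*, where * stands for any string of characters. As an illustration, such patterns can be applied to a directory listing with Python's standard `fnmatch` module. The file names below are hypothetical and only assume that each name combines an analysis abbreviation with a reconstruction abbreviation; the actual directories may be organised differently.

```python
from fnmatch import fnmatchcase

# Hypothetical file names (assumption: "<analysis>.<reconstruction>.wav").
FILES = [
    "ZFKI.RKOP.wav",
    "ZFKI.RKHP.wav",
    "KTXOZ.RKHPTX.wav",
    "M-TTZM.TTSD.wav",
    "HB-TTZM.TTSR.wav",
    "AMS.HORN-RS.wav",
]


def select(names, pattern):
    """Return the names matching a case-sensitive '*' wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    for pattern in ("*RKHP*", "*TTZM*", "*ZFKI.RKOP*"):
        print(pattern, "->", select(FILES, pattern))
```

Note that a pattern like *RKHP* also matches names containing RKHPTX, because * may stand for any string, including "TX".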
signal name | signal description | comment (* = any string of characters) |
|
sinusoidal burst 1 kHz, hard-switched, white noise superimposed, signal duration 0.2 s | the presence of white noise renders the phase heuristic for time contours unsuitable, so the clicks are almost inaudible with *RKHP* |
|
two-tone beat, both tones start at 1 kHz and move to 1040 Hz and 960 Hz respectively, signal duration 2 s | artefacts caused by the synthesis window with *TTSR* and *TTSD*; all reconstructions except *ZFKI.RKOP* have passages that sound like narrow-band noise, caused by phase incoherence or by tonal portions moving over into texture |
|
Dirac impulse train, impulse rate increasing from 20 to 200 Hz, signal duration 2 s | distinct change in sound with *TTZM*, which can be prevented by processing time contours or texture; yet artefacts may appear due to double representation, phase incoherence and/or time-localization jitter; texture can only be a coarse replacement for time contours |
|
female speaker ("electroacoustics"), signal duration 1.5 s | sound proves quite uncritical |
|
frequency modulation, sinusoidal carrier 1 kHz, sinusoidal modulator moving from 0 to 100 Hz, frequency deviation +/- 100 Hz, signal duration 2 s | see two-tone beat; perceptible amplitude modulation even with *ZFKI.RKOP* due to double representation of signal portions by time and frequency contours |
|
4 sinusoidal bursts 1 kHz, Gaussian-switched, Gaussian 3 dB bandwidths (B=2f) 50/100/500/infinity Hz, signal duration 0.8 s | clicks caused by the increasing steepness of the slopes are faithfully represented by time contours only; representing clicks via texture results in a perceptual approximation (noise bursts, with *KTXOZ.RKHPTX*); *AMS.HORN-RS* renders the clicks weakened |
|
male speaker with music (German "Interessiert Sie ein neuer Job?", i.e. "Are you interested in a new job?", from a commercial), signal duration 2 s | sound to demonstrate the robustness of the speech codecs against interfering sound sources |
|
male speaker (German "Kalk setzt sich bei jeder ...", from a commercial), signal duration 2.07 s | very critical sound because the pronunciation is over-articulated and accelerated, and because it is spoken by a male speaker (whose dense harmonics are prone to audible phase incoherence in reconstruction); listening with headphones is essential |
|
male speaker ("The demonstration is repeated once"), signal duration 2 s | processing of time contours helps to retain naturalness |
|
white noise (sampled analog thermal noise source), signal duration 2 s | nasal, comb-filter-like tinge, swirling or rippling ("tonalization") caused by disregarding time-contours and/or by phase incoherence within reconstruction |
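Several of the synthetic test signals in the table are specified precisely enough to be approximated in a few lines. The following sketch, using only the Python standard library, generates rough equivalents of the noisy burst, the two-tone beat, the impulse train and the frequency-modulated tone at 22.05 kHz. It is not the code used to produce the original files: the linear frequency and rate trajectories, the burst timing, the noise level, the amplitudes and the 16-bit output format are all assumptions, and every function name is made up for illustration.

```python
import math
import random
import struct
import wave

FS = 22050  # sampling rate of the demo files in Hz


def _count(duration):
    return int(round(duration * FS))


def noisy_burst(duration=0.2, freq=1000.0, on=0.05, off=0.15,
                noise_std=0.1, seed=0):
    """Hard-switched sine burst plus white Gaussian noise.
    Burst timing (on/off) and noise level are assumptions."""
    rng = random.Random(seed)
    out = []
    for i in range(_count(duration)):
        t = i / FS
        tone = math.sin(2 * math.pi * freq * t) if on <= t < off else 0.0
        out.append(tone + rng.gauss(0.0, noise_std))
    return out


def two_tone_beat(duration=2.0, f_start=1000.0, f_up=1040.0, f_down=960.0):
    """Two sines gliding from f_start to f_up and f_down (linear glide assumed)."""
    n = _count(duration)
    phase_up = phase_down = 0.0
    out = []
    for i in range(n):
        out.append(0.5 * (math.sin(phase_up) + math.sin(phase_down)))
        frac = i / n
        phase_up += 2 * math.pi * (f_start + (f_up - f_start) * frac) / FS
        phase_down += 2 * math.pi * (f_start + (f_down - f_start) * frac) / FS
    return out


def impulse_train(duration=2.0, rate_start=20.0, rate_end=200.0):
    """Unit impulses with a rising repetition rate (linear rise assumed)."""
    n = _count(duration)
    out = [0.0] * n
    cycles = 1.0  # forces an impulse at the very first sample
    for i in range(n):
        if cycles >= 1.0:
            out[i] = 1.0
            cycles -= 1.0
        cycles += (rate_start + (rate_end - rate_start) * i / n) / FS
    return out


def fm_tone(duration=2.0, carrier=1000.0, deviation=100.0, mod_end=100.0):
    """Sine carrier whose frequency swings by +/- deviation; the modulator
    frequency rises from 0 Hz to mod_end (linear rise assumed)."""
    n = _count(duration)
    phase_c = phase_m = 0.0
    out = []
    for i in range(n):
        out.append(math.sin(phase_c))
        phase_m += 2 * math.pi * (mod_end * i / n) / FS
        phase_c += 2 * math.pi * (carrier + deviation * math.sin(phase_m)) / FS
    return out


def write_wav(path, samples, peak=0.9):
    """Normalise to `peak` and store as 16-bit mono PCM (bit depth assumed)."""
    scale = peak * 32767 / max(1e-12, max(abs(s) for s in samples))
    frames = b"".join(struct.pack("<h", int(round(s * scale))) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(FS)
        wav.writeframes(frames)
```

For example, `write_wav("beat.wav", two_tone_beat())` writes a two-second approximation of the two-tone beat. The phases are accumulated sample by sample so that the instantaneous frequency follows the intended trajectory without phase jumps.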
$Date: 1999/07/06 23:39:40 $