Demonstration of SWMUMDIS (English version)

Each link in the "signal name" column of the table points to a directory containing a number of wav-files (22.05 kHz, single channel) for audio demonstration and jpg-files (all 800x400, scales vary) for visual demonstration. If your browser is configured correctly, clicking on a file name is all that is needed. Some remarks follow:

Remarks on the wav-audio files
Remarks on the jpg-pictures
Abbreviations of analysing procedures
Abbreviations of reconstruction procedures
Abbreviations of speech codecs
Table

Remarks on the wav-audio files

Naming convention: Abbreviations of the analysing procedure and the reconstruction procedure compose the center part between signal name and .wav-extension. The unprocessed signal has no center part. The center part of signals processed by the speech codecs is the codec's abbreviation.
Extensions .f, .t, .x and .xt to the center part: These indicate that only frequency contours, only time contours (after operation MSK), only texture, or time contours plus texture can be heard, respectively. With ZFKI.RKOP and ZFKII.RKHP these wav-files exist only for the sounds 2tb, fm-3db, kalk, ea, repeated and wr.
Critical signals are highlighted within the table - these signals are: 2tb, dp20_200, fm-3db, kalk and wr.
All signals used in this demonstration are limited to frequencies below 6 kHz.
All reconstruction procedures (ctxadmin, drdadmin of SWMUMDIS) are implemented with a loss of 6 dB in level, measured with a stationary sinusoid. For convenience, the wav-files of the unprocessed signals have been readjusted to match this level. The level of AMS-resynthesis, however, is 1 dB too high. Remaining differences in loudness reflect the different behaviour of the individual procedures for a given signal.
SWMUMDIS operates on raw-audio format only. Conversion between raw- and wav-format was performed using a separate tool.
Slight tonalisation artefacts are revealed more readily by listening over loudspeakers (diffuse sound => introduces comb-prefiltering) than over headphones. Effects of smoothing the time structure, and especially artefacts of the codecs, can best be heard over headphones (direct sound => no loss in modulation transfer).
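
The naming convention described in these remarks can be sketched as a small parser. This is only an illustration: it assumes that all components of a file name are separated by dots (as in the center parts ZFKI.RKOP and ZFKII.RKHP mentioned above), and the names parse_wav_name and WavName are hypothetical helpers, not part of SWMUMDIS.

```python
from typing import NamedTuple, Optional

# Part extensions named in the remarks: what remains audible in the file.
PART_EXTENSIONS = {
    "f": "frequency contours only",
    "t": "time contours only (after operation MSK)",
    "x": "texture only",
    "xt": "time contours plus texture",
}
# Codec abbreviations listed on this page.
CODECS = {"HB-4k4", "MUM-4k4", "MUM-30k"}


class WavName(NamedTuple):
    signal: str
    analysis: Optional[str]
    reconstruction: Optional[str]
    codec: Optional[str]
    part: Optional[str]


def parse_wav_name(filename: str) -> WavName:
    """Split a wav file name, ASSUMING dots separate all components."""
    if not filename.endswith(".wav"):
        raise ValueError("not a wav file: " + filename)
    fields = filename[: -len(".wav")].split(".")
    signal, center = fields[0], fields[1:]
    if not center:
        # The unprocessed signal has no center part.
        return WavName(signal, None, None, None, None)
    if len(center) == 1 and center[0] in CODECS:
        # Codec-processed signals carry only the codec abbreviation.
        return WavName(signal, None, None, center[0], None)
    part = None
    if len(center) == 3 and center[2] in PART_EXTENSIONS:
        part = center.pop()
    if len(center) != 2:
        raise ValueError("unexpected center part in: " + filename)
    return WavName(signal, center[0], center[1], None, part)
```

Under the stated assumption, parse_wav_name("fm-3db.ZFKII.RKHP.xt.wav") would yield the signal fm-3db, analysis ZFKII, reconstruction RKHP and the part extension xt. If the actual files use a different separator, only the splitting step needs to change.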

Remarks on the jpg-pictures

Naming convention: The center part between "signal name" and .jpg-extension indicates the analysing procedure.
There are no pictures for the speech codecs.
The time-axis has tick marks every 100 ms. The frequency-axis starts at zero, is linear with respect to critical-bandwidth and has tick marks at the following frequencies (in Hz): 100, 200, 300, 400, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000. The display ends at 6.4 kHz.
All pictures are derived from signal representations stored at 1.25 ms-evaluation intervals (corresponding reconstructed signals, however, may be derived from a finer time resolution, depending on the default specification of the analysing procedure). With signals shorter than 1 s, the 800x400 picture size may render the evaluation interval annoyingly visible, especially within time contour lines.
SWMUMDIS generates pbm/pgm-picture format only. Conversion to jpg-picture format was performed using a separate tool.
SWMUMDIS can generate pictures of arbitrary size and of arbitrary sections.
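
The axis remarks above can be illustrated with a little arithmetic. The sketch below rests on two assumptions that this page does not state: that the full signal duration is mapped onto the 800-pixel picture width, and that "linear with respect to critical-bandwidth" refers to a Bark (critical-band rate) scale, approximated here by the Zwicker and Terhardt formula. SWMUMDIS may well use a different mapping. The function names are hypothetical.

```python
import math

EVALUATION_INTERVAL_S = 0.00125  # 1.25 ms, as stated in the remarks
PICTURE_WIDTH_PX = 800           # picture width in pixels


def pixels_per_interval(duration_s: float,
                        width_px: int = PICTURE_WIDTH_PX) -> float:
    """Width in pixels of one evaluation interval.

    ASSUMPTION: the whole signal duration spans the picture width.
    """
    return width_px * EVALUATION_INTERVAL_S / duration_s


def hz_to_bark(frequency_hz: float) -> float:
    """Zwicker/Terhardt approximation of critical-band rate in Bark.

    ASSUMPTION: used only for illustration, not taken from SWMUMDIS.
    """
    f_khz = frequency_hz / 1000.0
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


if __name__ == "__main__":
    for duration in (0.2, 0.8, 2.0):
        print(duration, "s:", round(pixels_per_interval(duration), 2), "px per interval")
    for tick in (100, 500, 1000, 3000, 6000, 6400):
        print(tick, "Hz:", round(hz_to_bark(tick), 2), "Bark")
```

Under the first assumption, a 2 s signal gets about half a pixel per evaluation interval, whereas the 0.2 s signal 1kwr gets about five pixels per interval, which is why the interval becomes visible for short signals. With the formula used here, the upper display limit of 6.4 kHz comes out close to 20 Bark.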

Abbreviations of analysing procedures

ZFKI	Time-Frequency-Contours, 4P1, B3dB = 0.5 Bark, delay compensation
ZFKII	Time-Frequency-Contours, 4P1, B3dB = 0.3 Bark, delay compensation
ZFKI+S	as ZFKI, visualized together with FTT-spectrogram
ZFKII+S	as ZFKII, visualized together with FTT-spectrogram
KTX	Contour/Texture-Representation, 4P1, B3dB = 0.3 Bark, delay compensation
KTXOZ	Contour/Texture-Representation, 4P1, B3dB = 0.3 Bark, delay compensation
M-TTZM	optimized Part-Tone-Time-Pattern, 4P1, B3dB = 0.3 Bark, delay compensation
SM-TTZM	improved Part-Tone-Time-Pattern, 2P1, B3dB = 0.25 Bark, time-smoothed spectrum
HB-TTZM	Heinbach's Part-Tone-Time-Pattern, P1, B3dB = 0.1 Bark, time-smoothed spectrum
AMS	FTT-Spectrogram (Auditory-Magnitude-Spectrogram), 4P1, B3dB = 0.3 Bark, delay compensation

Abbreviations of reconstruction procedures

HORN-RS	Horn's spectrogram-resynthesis, N = 5
HORN-RS1	Horn's spectrogram-resynthesis, N = 1
RKHP	reconstruction from contours using phase-heuristic
RKHPTX	as RKHP, reconstruction from texture added
RKOP	reconstruction from contours using original phases
RKOPTX	as RKOP, reconstruction from texture added
TTSD	part-tone-resynthesis using triangular window
TTSR	Heinbach's part-tone-resynthesis using rectangular window

Abbreviations of speech codecs

HB-4k4	Heinbach's speech codec 4.4 kbit/s, based on Part-Tone-Time-Pattern
MUM-4k4	Speech codec 4.4 kbit/s, based on Contour/Texture-Representation
MUM-30k	Speech codec 30 kbit/s, based on Contour/Texture-Representation

signal name	signal description	comment (* = any string of characters)
1kwr	sinusoidal burst 1 kHz, hard-switched, white noise superimposed, signal duration 0.2 s	the presence of white noise renders the phase heuristic for time contours unsuitable, therefore clicks are almost inaudible with RKHP
2tb	two tone beat, both tones start at 1 kHz and move to 1040 Hz and 960 Hz, respectively, signal duration 2 s	artefacts caused by the synthesis window with TTSR and TTSD; all reconstructions - except ZFKI.RKOP - have passages that sound like narrow-band noise, caused by phase incoherence or because tonal portions move over into texture
dp20_200	dirac-impulse train, impulse rate increasing from 20 to 200 Hz, signal duration 2 s	distinct change in sound with TTZM which can be prevented by processing time-contours or texture; yet artefacts may appear due to double-representation, phase incoherence and/or time-localization jitter; texture can only be a coarse replacement for time contours
ea	female speaker ("electroacoustics"), signal duration 1.5 s	sound proves quite uncritical
fm-3db	frequency modulation, sinusoidal carrier 1 kHz, sinusoidal modulator moving from 0 to 100 Hz, frequency deviation +/- 100 Hz, signal duration 2 s	see 2tb (two tone beat); perceptible amplitude modulation even with ZFKI.RKOP due to double-representation of signal portions by time and frequency contours
gser1kea	4 sinusoidal bursts 1 kHz, Gaussian-switched, Gaussian-3dB-bandwidths (B=2f) 50/100/500/infinity Hz, signal duration 0.8 s	clicks caused by increasing steepness of slopes are faithfully represented by time contours only; representing clicks via texture results in a perceptual approximation (noise bursts, with KTXOZ.RKHPTX); AMS.HORN-RS renders clicks weakened
job	male speaker with music (German "Interessiert Sie ein neuer Job?", from commercial), signal duration 2 s	sound to demonstrate robustness of the speech codecs against interfering sound sources
kalk	male speaker (German "Kalk setzt sich bei jeder ...", from commercial), signal duration 2.07 s	very critical sound because pronunciation is over-articulated and accelerated, and because it is spoken by a male speaker (its dense harmonics being prone to audible phase incoherence in reconstruction); listening by headphone essential
repeated	male speaker ("The demonstration is repeated once"), signal duration 2 s	processing of time contours helps to retain naturalness
wr	white noise (sampled analog thermal noise source), signal duration 2 s	nasal, comb-filter-like tinge, swirling or rippling ("tonalization") caused by disregarding time-contours and/or by phase incoherence within reconstruction
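
As an illustration of how a test signal like those in the table might be produced, the following sketch synthesizes something resembling 2tb at the 22.05 kHz sample rate used for the wav-files. The table only gives the start and end frequencies and the duration. The linear frequency glide, the equal amplitudes of 0.5 per tone and the function name two_tone_beat are assumptions made for this example and do not describe how the original signal was generated.

```python
import math
from typing import List

SAMPLE_RATE_HZ = 22050  # sample rate of the demonstration wav-files


def two_tone_beat(duration_s: float = 2.0,
                  f_start_hz: float = 1000.0,
                  f_end_high_hz: float = 1040.0,
                  f_end_low_hz: float = 960.0,
                  rate_hz: int = SAMPLE_RATE_HZ) -> List[float]:
    """Sum of two sinusoids gliding apart from a common start frequency.

    ASSUMPTIONS: linear glide, amplitude 0.5 per tone, zero start phase.
    """
    n_samples = int(round(duration_s * rate_hz))
    phase_high = 0.0
    phase_low = 0.0
    samples = []
    for i in range(n_samples):
        fraction = i / n_samples
        f_high = f_start_hz + (f_end_high_hz - f_start_hz) * fraction
        f_low = f_start_hz + (f_end_low_hz - f_start_hz) * fraction
        samples.append(0.5 * (math.sin(phase_high) + math.sin(phase_low)))
        # Accumulate phase so the instantaneous frequency follows the glide.
        phase_high += 2.0 * math.pi * f_high / rate_hz
        phase_low += 2.0 * math.pi * f_low / rate_hz
    return samples
```

Accumulating the phase sample by sample, rather than evaluating sin(2*pi*f*t) with a time-varying f, keeps the instantaneous frequency on the intended trajectory. The beat rate grows from 0 Hz towards 80 Hz as the two tones move apart.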

$Date: 1999/07/06 23:39:40 $