Last Modified : March 22, 2013, at 09:32 PM

Auditory Models

A collection of software, research, history, reflections, and data related to auditory models.

Chronological history of HSR papers (details are visible when you are logged in).

Research Objectives and Accomplishments

The research in the Human Speech Recognition (HSR) group is directed at a fundamental understanding of speech perception in both normal-hearing (NH) and hearing-impaired (HI) ears. These are related problems that form a continuum rather than two separate things. Most people are born with normal hearing, and within a few years we learn, seemingly without effort, to understand human speech. How this happens is a mystery, but what happens is not. The research we have been doing over the past 10 years, as documented in the section below, is a systematic study of how the system fails to process and communicate speech under various conditions. Only by stressing the system until it fails can we hope to understand it.

  • The first level of experiments is with NH ears, with speech in noise.
  • The second level consists of filtering experiments, where the speech is filtered before the noise is added.
  • In the third series of experiments, the speech is truncated in time.
  • Finally, small regions of speech are modified by a few dB, or removed altogether.

Examples of such processing are given later on this page.
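The kinds of stimulus processing listed above can be sketched in a few lines. This is a minimal illustration, not the group's actual code: the function names are hypothetical, the SNR is assumed to be the ratio of RMS levels over the whole token, the masker is assumed to be white Gaussian noise, and truncation is assumed to silence the onset of the token. Filtering is left out because it depends on a filter design that the text does not specify.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_noise_at_snr(speech, snr_db, seed=0):
    """Mix speech with white Gaussian noise at a target SNR (assumed RMS-based)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    # Scale the noise so that 20*log10(rms(speech)/rms(noise)) equals snr_db.
    scale = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    noise = [scale * n for n in noise]
    mixture = [s + n for s, n in zip(speech, noise)]
    return mixture, noise


def truncate_onset(speech, n_removed):
    """Time truncation (assumed form): silence the first n_removed samples."""
    return [0.0] * min(n_removed, len(speech)) + list(speech[n_removed:])


def scale_region(speech, start, stop, gain_db):
    """Change the level of samples in [start, stop) by gain_db decibels."""
    gain = 10 ** (gain_db / 20)
    return [s * gain if start <= i < stop else s for i, s in enumerate(speech)]


def remove_region(speech, start, stop):
    """Remove a region altogether by setting samples in [start, stop) to zero."""
    return [0.0 if start <= i < stop else s for i, s in enumerate(speech)]


# Usage: a synthetic 1 kHz tone stands in for a recorded CV token.
fs = 16000
speech = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]
noisy, noise = add_noise_at_snr(speech, snr_db=-2.0)
measured_snr_db = 20 * math.log10(rms(speech) / rms(noise))
truncated = truncate_onset(speech, n_removed=160)
boosted = scale_region(speech, start=400, stop=800, gain_db=6.0)
removed = remove_region(speech, start=400, stop=800)
```

Returning the scaled noise alongside the mixture makes it easy to check that the measured SNR matches the requested value.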

We have found that speech perception is a discrete (binary) zero-error task (Singh and Allen, 2012). Working at the token level, we defined two groups: ZE and NZE. Zero-error (ZE) speech is defined as speech that NH listeners never misidentify at and above -2 dB SNR. The non-ZE (NZE) sounds are all the rest. All of the CV speech sounds that we have tested contain many ZE tokens: for most CV consonants, more than 80% of the utterances are ZE.

The remaining 20% of the CVs may be broken down into medium-error (ME) tokens, with error between 0% and 10%, and high-error (HE) tokens, with error above 10%. ME consonants are typically utterances that are mispronounced to varying degrees. HE consonants are typically those that are heard as a different sound with high probability (>20%). Based on the entropy across normal-hearing listeners, we view such sounds as mislabeled. The reasons for these errors can usually be traced to a specific flaw in the production of the sound, which is typically easy to identify.
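The token grouping described above can be written down as a small classifier. The sketch below is an illustration under stated assumptions, not the analysis code used in the cited papers: the function names and the trial format are hypothetical, errors are assumed to be pooled over all trials at or above -2 dB SNR, and a token with exactly 10% error is assigned to HE because the text leaves that boundary open. The entropy function is the standard Shannon entropy of the listeners' response distribution.

```python
import math
from collections import Counter

ZE_SNR_FLOOR_DB = -2.0  # from the text: ZE is judged at and above -2 dB SNR
HE_THRESHOLD = 0.10     # from the text: ME is below 10% error, HE above


def error_rate(trials, target):
    """Error rate for one token.

    trials is a list of (snr_db, response) pairs (hypothetical format);
    only trials at or above the SNR floor are scored.
    """
    scored = [resp for snr_db, resp in trials if snr_db >= ZE_SNR_FLOOR_DB]
    if not scored:
        raise ValueError("no trials at or above the SNR floor")
    return sum(1 for resp in scored if resp != target) / len(scored)


def classify_token(trials, target):
    """Label a token ZE, ME, or HE from its pooled error rate."""
    e = error_rate(trials, target)
    if e == 0.0:
        return "ZE"
    # Assumption: exactly 10% error counts as HE.
    return "ME" if e < HE_THRESHOLD else "HE"


def response_entropy(responses):
    """Shannon entropy, in bits, of the distribution of listener responses."""
    counts = Counter(responses)
    n = len(responses)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Usage: 20 scored trials with one confusion, plus one trial below the floor.
trials = [(0.0, "ta")] * 19 + [(0.0, "da")] + [(-10.0, "pa")]
label = classify_token(trials, target="ta")          # "ME" (1 error in 20)
bits = response_entropy(["ta", "ta", "da", "da"])    # 1.0 bit
```

A token that every listener reports as the same sound has zero entropy even when that sound is not the intended one, which is the sense in which a consistently misheard token can be viewed as mislabeled.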

Chronological summary of UIUC-HSR Experiments (last update Feb, 2013)

| Year | Experiment | Students | Details | Publications |
|------|------------|----------|---------|--------------|
| 2004 | MN04SWN/MN64 | Phatak & Lovitt | Repeat of Miller & Nicely (1955) [MN55] with SWN | Phatak & Allen (2007) [PA07] |
| 2005 | Study | Allen, J. B. | "Consonant recognition and the articulation index" | JASA 117(4), pp. 2212-2223 (2005) (pdf) |
| 2005 | MN05WN (MN16R) | Phatak & Lovitt | Replicate MN04 (WN) | Phatak, Lovitt & Allen (2008) |
| 2005 | MN05SWN (MN64) | Phatak & Lovitt | Repeat of MN64 for increasing number of subjects (SWN) | |
| 2005 | HIMCL05 | Yoon & Phatak | CVs in 10 HI ears @ MCL in WN | Phatak, Yoon, Gooler & Allen (2009) |
| 2006 | HINALR05 | Yoon | CVs in 10 HI ears with NALR @ MCL in SWN | |
| 2006 | Verification | Regnier | Modifications of /ta/ | Regnier & Allen (2008) |
| 2006 | CV06SWN | Phatak | 9C+8V SWN /d, b, k, p, s, t, xs, xz, z/ | |
| 2006 | CV06WN | Regnier | 9C+8V WN /d, b, k, p, s, t, xs, xz, z/ | |
| 2007 | CV06 | Pan | Analysis of 9 vowels of CV06 | 2 unpublished MSs |
| 2007 | HL07 | Li | High- and low-pass repeat of Fletcher | Li & Allen (2009), JASA |
| 2008 | TR07 | Li | Time truncation after Furui86 | Allen & Li (2009), ASSP Magazine |
| 2008 | TR08 | Li | Time truncation after Furui86 | ? 3 vowels ? |
| 2009 | 3DDS | Li | Put 3 experiments into one (MN64, HL07, TR07-8) | Li & Allen (2010), JASA; Li & Allen (2010), IEEE TLSP; Li, Trevino & Allen (2012), Oct., JASA |
| 2009 | Verification | Menon | Remove primary burst | |
| 2009 | Verification | Abhinauv | Modify (6 dB) + remove primary burst | Kapoor and Allen, 131(1), 2012 |
| 2009 | Verification | Cvengros | Modify burst + devoiced + voiced transition | JASA, under review |
| 2009 | mn64 high error analysis | Singh | Account for the high-error sounds removed in PA07 | JASA, April 2012 |
| 2010 | HIMCL10-I/-III | Woojae Han | Basic CV experiments on 46 HI ears with N=4/consonant | EH submitted |
| 2010 | HI10NALR-II/-IV | Woojae Han | Basic CV experiments on 17 HI ears with N=20/consonant | |
| 2011 | HL11 | Trevino | High/low filter CVs of HI10 | |

Demos and Software

  • KunLun: software to analyze and modify speech (wav format) using the AI-gram model. Downloads: KunLun (zip) and example phrases as wav files (zip)
  • Video-demos showing what KunLun can do
  • Support documentation that describes the basic speech perception research behind KunLun:
    -Allen, Jont and Li, Feipeng (2009). Speech perception and cochlear signal processing, IEEE Signal Processing Magazine, Invited: Life-sciences, 26(4), pp. 73-77, July. (pdf, djvu)
    -Feipeng Li, Anjali Menon, and Jont B. Allen (2010). A psychoacoustic method to find the perceptual cues of stop consonants in natural speech, J. Acoust. Soc. Am., March 2010. (pdf)
    -Menon, A., Li, F. P., and Allen, J. B. A new methodology to study perceptual cues of 8 fricative consonants in natural speech (submitted to JASA, Feb 2, 2010). (pdf)
  • AI-gram source code (zip, txt). If you would like to download this code, ask me for the password.

Databases

HSR Pictures

  • IHCON 2010: presentation; hiking with Jont Allen, Brian Moore, Stefan Launer, Woojae Han, Riya Singh, and Anjali Menon (jpg); and biking (jpg)
  • Mead Killion visits HSR (5/7/2010) (jpg)
  • ICSLP 2006 (jpg)
  • ASRU 2009
  • Parties for Bob Shannon 2004 and Chris Shera
  • Third Mechanics of Hearing (Kemp and Brown, 1988), Keele, England (historic photos)

Historical Documents