RESEARCH INTERESTS

I work on core algorithmic aspects of computer voice recognition, and artificial intelligence applied to voice forensics. My focus is on the development of technology for the automated discovery, measurement, representation and learning of the information encoded in the voice signal for optimal voice intelligence.

I worked on computer speech recognition and general audio processing from 1997 to 2014. During that time, I worked on a wide range of topics, including algorithms that made speech processing systems completely generalizable (agnostic to language), algorithms that enabled automated discovery and learning of information from speech, and algorithms that could process speech using minimal external (human-generated) knowledge. My goal was to enable greater automation, create more powerful search strategies and more scalable learning algorithms for voice processing systems, and to find ways to make them work more accurately in high-noise and other kinds of complex acoustic environments.

In December 2014, I began building up the science of profiling humans from their voice. This involves the concurrent deduction of myriad human parameters from voice. Like DNA and fingerprints, every human's voice is unique. It carries more information than we realize (or can hear). It carries signatures of the speaker's physical, physiological, medical, psychological, sociological, behavioral and environmental parameters, among other things. Profiling is based on quantitative discovery and measurement of micro-features from the voice signal, and the intricacies of the physics and bio-mechanics of human voice production. Because it focuses on the voice signal, and not its pragmatic content, it is agnostic to language.

Media coverage

More about this work

My latest presentations


OUR WORK ON COVID-19 VOICE DETECTOR

Current Status as of 1 Oct 2020

I gave a seminar on this topic at the Center for Mathematical Modeling, Universidad de Chile, on 29th Sept. This presentation gives details of where we are currently in the detection of Covid-19 from voice and what needs to be done.

I also talked about it in a workshop held by Cambridge University on 25th Sept. That video is not released yet. We are now in the process of documenting our insights obtained from Covid-19 data collected in clinical settings.

In February 2020, we began repurposing our past work on human profiling from voice to apply to the potential detection of Covid-19 from voice. At the time, we were not in possession of actual voice data, and our hypotheses were entirely based on reports from the medical community on the manner in which Covid-19 affected respiratory functions in patients. For lack of data, our system for detecting Covid-19 from voice was then designed to be a self-learning system that can improve as data come in. An early version of the system was displayed on our website in late March 2020, and was based on inferences and extrapolations made from what is clinically known about the signatures of respiratory illnesses (in general) in voice.

Since then, we have received more data from patients who have tested positive for Covid-19 and are symptomatic. We have been able to gain valuable insights from them and have been able to validate (and invalidate) many of our hypotheses made in the absence of adequate data early on in this pandemic. The more we study these, however, the deeper we find we must explore. We find that the signatures of Covid-19 in voice are (as expected) not easily separable from those of other illnesses that have similar symptoms. We find that while standard features and standard AI architectures may seem to perform well on data, there is a catch: they do so only on data that represent a limited number of Covid-19 positive cases with no other confounding factor(s) included in the validation process. These results do not generalize to performance in the real world. Their statistical significance will remain unclear until more data types are included in the studies.

The problems and challenges we have observed must be addressed and solved before we can confidently propose a specific AI-driven methodology to detect Covid-19 from voice. We have refrained from publishing results until our investigations close most (if not all) inferential loopholes. We are, however, in the process of documenting our insights in a paper, with appropriate warnings and caveats clearly mentioned within.

Until we get to the point where we can go for an FDA approval of this technology, we will not be releasing an actual system for public use. Until then, our system may be licensed (for evaluation purposes only) from Carnegie Mellon University.

As of now, we continue to collect crowdsourced data. However, in spite of a good response from the public, we find that our most valuable data (in terms of gaining scientific insights) still come from clinical settings.

Please donate your voice for research on covid

Recent news: We are very glad for voca.ai, an Israeli (now American) company that put up data collection for COVID almost overnight in February 2020, as our effort began. They have diligently worked for this cause since then. We wish them continued success!


Courses Spring 2020

  1. Computational Forensics and AI
    Spring 2020, Website

  2. Advanced Topics: Quantum Computing Lab
    Spring 2020, Website

  3. Large-Scale Multimedia Processing (2 versions: grad level and exec-ed)
    Spring 2020, Website

Courses Fall 2020

  1. Tracking Cybercrimes (canceled for this semester)
    Fall 2020, Website
    This will be taught at the CMU-Africa campus where I plan to be in Fall 2020.

  2. Introduction to Deep Learning
    Fall 2020, Website
    I am a co-instructor. I am writing this book in tandem with the course for students to read.


Students

  • Yolanda Gao, PhD, Electrical and Computer Engineering
  • Wayne Zhao, PhD, Electrical and Computer Engineering
  • Yandong Wen, PhD, Electrical and Computer Engineering Website
  • Shahan Ali Memon, Masters, LTI, School of Computer Science Website
  • Hira Yasin, Masters, LTI, School of Computer Science
  • Mahmoud Al Ismail, Masters, LTI, School of Computer Science


Technical Publications: Books

Profiling Humans from their Voice
Rita Singh
First published: July 2019
Publisher: Springer, Singapore
Copyright 2019 Springer-Nature, Switzerland, July 2019
ISBN: 978-981-13-8402-8
Also available on springer.com, other bookstores and ebay.
Chapters of this book are separately available from Springer. Click this link to see the list.
Techniques for Noise Robustness in Automatic Speech Recognition
Tuomas Virtanen, Rita Singh, Bhiksha Raj (Eds)
First published: 5 October 2012
Copyright 2013 John Wiley & Sons, Ltd
Print ISBN: 9781119970880 | Online ISBN: 9781118392683 | DOI: 10.1002/9781118392683

Research Publications

Recent papers on voice profiling

  • Detection of COVID-19 through the analysis of vocal fold oscillations, Mahmoud Al Ismail, Soham Deshmukh, Rita Singh, arXiv:2010.10707, pdf

  • Interpreting glottal flow dynamics for detecting COVID-19 from voice, Soham Deshmukh, Mahmoud Al Ismail, Rita Singh, arXiv:2010.10707, pdf

  • Speech-based parameter estimation of an asymmetric vocal fold oscillation model and its application in discriminating vocal fold pathologies, Wenbo Zhao, Rita Singh, Int. conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2020. pdf (nominated for the best student paper award at ICASSP 2020)

  • Hierarchical Routing Mixture of Experts, Wenbo Zhao, Yang Gao, Shahan Ali Memon, Bhiksha Raj, Rita Singh, 2020 25th International Conference on Pattern Recognition (ICPR), 2020.   pdf

  • Optimizing neural network embeddings using a pair-wise loss for text-independent speaker verification, Hira Dhamyal, Tianyan Zhou, Bhiksha Raj, and Rita Singh, IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 742-748. IEEE, 2019. pdf

  • Face reconstruction from voice using generative adversarial networks, Yandong Wen, Bhiksha Raj, Rita Singh, Advances in Neural Information Processing Systems (NEURIPS 2019), 2019, pp. 7344-7348   pdf (created a social media and media furore over totally unrelated transgender issues...)

  • Disjoint mapping network for cross-modal matching of voices and faces, Yandong Wen, Mahmoud Al Ismail, Wenbo Liu, Bhiksha Raj, Rita Singh, International Conference on Learning Representations (ICLR), 2019.   pdf

  • Detecting gender differences in perception of emotion in crowdsourced data, Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh (arXiv:1910.11386), 2020. pdf

  • Neural Regression Trees, Shahan Ali Memon, Wenbo Zhao, Bhiksha Raj, Rita Singh, (IJCNN), 2019. pdf

  • The phonetic bases of vocal expressed emotion: natural versus acted, Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh (INTERSPEECH), 2020. pdf

  • list to be updated..........

  • Voice impersonation using generative adversarial networks, Yang Gao, Rita Singh, Bhiksha Raj, Int. conf. on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15-20 April 2018. pdf

  • A corrective training approach for text-independent speaker verification, Yandong Wen, Tianyan Zhou, Rita Singh, Bhiksha Raj, Int. conf. on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15-20 April 2018. pdf

  • Voice disguise by mimicry: deriving statistical articulometric evidence to evaluate claimed impersonation, Rita Singh, Abelino Jiminez and Anders Oland, IET Biometrics, January 2017. pdf

  • more below....

    Literary creations



    Research publications (by topic)


    1. Forensics    Papers
      General theme: Forensic deductions from human voice. Speech and audio forensics are included.

    2. General audio analysis, microphone array processing, denoising, dereverberation, signal restoration    Papers
      General theme: Our approach is that of modeling the effect of highly-nonstationary noise and reverberation as compositional phenomena. Clean signals can then be recomposed from the bases of the composition. This approach differs from ones that model audio phenomena using dynamic generative models.

    3. Semi-supervised learning, structure discovery, statistical pattern recognition, classification    Papers
      These papers cover different topics, such as learning basic units of sound from data, discovering pronunciations for words in terms of these units, and iteratively selecting better classifiers using weaker classifiers in a gradient ascent solution to training good acoustic models from completely untranscribed data. They also include general developments in classification techniques.

    4. Acoustic modeling, decoding, speech processing, speech recognition, adaptation, keyword spotting    Papers
      These papers relate to core and peripheral issues in speech recognition and processing for HMM-based ASR systems.

    5. Systems, applications, projects    Papers
      These papers describe systems developed or deployed for specific tasks. They also include papers from short-term student projects, technical reports and other writeups.

    6. Miscellaneous    Papers
      Patents, and papers on other topics such as chaos theory, radar signal design and geodynamics. I worked on these topics from 1993 to 1998. Chaos and complexity theory remain my favorite hobby subjects.


    Other activities

    • Associate Editor, IEEE Signal Processing Letters (Retired recently!)
    • Sphinx-4
    • LDC And other things for me...


    Earlier Teaching


    1. Computational Forensics and Investigative Intelligence
      Taught in Spring 2017 and Spring 2018, simultaneously at
      • CMU Pittsburgh
      • Hamad Bin Khalifa University (HBKU), Qatar
      • CMU Qatar
      • CMU Africa
      • Syllabus
    2. An Introduction to Knowledge based Deep Learning and Socratic Coaches
      11-364 CMU Pittsburgh
      This course was taught in person by Prof. James Karl Baker at the CMU Pittsburgh location. I was nominally co-instructor but couldn't help Jim much.
    3. Design and Implementation of Speech Recognition Systems
      Last taught many years ago. Earliest version co-taught with Prof. James Baker

    About me: I'm happiest where I come from. I like simple things. I admire art. When I have time I spend much of it looking at art. I write poetry. I collect comics (the Harvey Pekar and Blake and Mortimer kind..) and puzzles (the Charles Wysocki and Jane Wooster Scott kind..). I read mysteries. I don't watch TV or movies, and I haven't switched on my TV for years. I don't know if my TV works. I don't use a cellphone; I have one but it's mostly lost anyway. I'd rather watch the clouds in the sky, and the birds and the leaves. A groundhog lives in a grand home under the deck stairs just outside my window. It even has a solar-powered lamp outside its home. I can tell you all about its likes and dislikes, habits, friends and daily routine. In the summer I wake up to the song of the cardinal. I want nothing more from life or the world, except for medical science to hurry up and make everyone well. Other than that, I am content.

    Some hi_res pictures of me


    Home