The human voice has been studied for centuries, not just for the few years or decades that computer voice processing has been around. My work -- and the science of profiling -- is truly built upon centuries of observations, hard work, and brilliant, meticulous studies conducted by hundreds of scientists in the past. Some of those scientists lost their lives in the quest to understand the mysteries of the human voice. One name that comes to mind is that of Klatt, who conducted x-ray studies on himself to understand the voice production mechanism better, and lost his life to throat cancer that may have resulted from over-exposure to x-rays. My work stands on the shoulders of giants. Truly.

My contribution
Myriad studies in the past, in multiple fields of research, have found positive correlations between isolated human parameters and different aspects of voice. Many of these aspects were not quantifiable. There have been very few attempts to actively predict different categories of human factors from voice, with the exception of emotion. Of the attempts that were made, many used the content of speech for prediction.

Profiling is different from prior work in human-parameter deduction in two important aspects -- a) it is based on quantifying previously unquantifiable entities and building discovery and prediction mechanisms upon those, and b) it is not based on the content of voice, and does not analyze language or words. It is based on the voice signal itself and the sounds produced in the human vocal tract, through phonation, articulation, or other means. My work draws upon different aspects of the fundamental bio-mechanical process of voice production. It is agnostic to language.

From a broader perspective, in what I seek to do now, the basic observations that voice is related to some human parameters are not mine. Their bold extensions are mine. A hypothesis that enables profiling for many more entities than were thought possible before, and a slew of ensuing methodologies that conform to it, are my contribution.

I was the first person in the world to articulate the possibility (now a fact) that the human face could be reconstructed from voice, and that in fact the entire human body form could be reconstructed from voice. I spoke about this on television in 2017, and to various people in scientific circles and the news media since early 2015. Before I did that, I had begun working -- with anthropologist Mark Shriver at Penn State University and computational geneticist Peter Claes from Leuven University, using data collected by them -- to build the methodologies needed to enable such reconstructions. The initial work did not reach fruition due to lack of sufficient data, and because the methodologies were at too early a stage: some foundational concepts needed to be developed further. The idea that voices could be reconstructed from facial structure (to some extent, as will always be the case) emerged from a conversation I had with people from J. Walter Thompson Inc. and the Rijksmuseum in Holland in the summer of 2018.

A word about my colleagues Mark Shriver and Peter Claes: Mark has been making groundbreaking discoveries for a while now. His latest is that the shapes of human noses are largely determined by climate. This has been extensively covered in the media and has appeared in Science. (Mark's latest work). Peter was the first person in the world to show that human faces could be reconstructed from DNA. His work was also covered extensively by the media worldwide. (Peter's work).


The bold extensions I speak of are tied to my hypothesis: "If any factor influences a speaker's body or mind, and if a biological pathway can be established to link that influence to the voice production mechanism of the speaker, then there must exist an influence on the voice signal produced, and it must be possible to measure that influence."

Building on this hypothesis, I devise mechanisms -- based on artificial intelligence, machine learning, statistics, signal processing, or other methodologies (not everything is data-driven) -- to discover these "micro-signatures" in voice. I then try to find ways to map them to their causal parameters.
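To make the general shape of such a pipeline concrete -- feature discovery followed by a mapping to a causal parameter -- here is a deliberately minimal sketch, not my group's actual method. Everything in it is synthetic and illustrative: a crude autocorrelation pitch estimator stands in for a micro-signature extractor, the "speakers" are randomly generated numbers, and ordinary least squares stands in for the far richer models real profiling requires. The loose pitch-height relationship encoded in the synthetic data is an assumption made purely for the example.

```python
import numpy as np

def estimate_pitch(signal, sr, fmin=50, fmax=400):
    """Crude autocorrelation-based pitch estimator (illustrative only)."""
    signal = signal - signal.mean()
    # Autocorrelation at non-negative lags
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])       # strongest periodicity in range
    return sr / lag                        # lag (samples) -> frequency (Hz)

sr = 16000
rng = np.random.default_rng(0)

# Synthetic "speakers": a feature (pitch, Hz) loosely anti-correlated
# with a bio-parameter (height, cm). In a real system the feature would
# be measured from recorded audio, e.g. with estimate_pitch above.
heights = rng.uniform(150, 200, size=50)
pitches = 500.0 - 2.0 * heights + rng.normal(0.0, 5.0, size=50)

# Map feature -> parameter with ordinary least squares.
X = np.column_stack([pitches, np.ones_like(pitches)])
coef, *_ = np.linalg.lstsq(X, heights, rcond=None)
predicted = X @ coef
err = np.mean(np.abs(predicted - heights))   # mean absolute error, cm
```

Even this toy version exhibits the core idea: a measurable acoustic quantity carries a trace of a physical parameter, and a fitted model recovers that parameter to within a few centimeters on the synthetic data. Real micro-signatures are far subtler, entangled with thousands of other influences, and require correspondingly richer models.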

The extensions are the concurrent deduction of bio-relevant parameters such as physical stature, height, weight, age, facial structure, body structure, mental and physical health conditions, etc. Many (though not all) of these are new; before I spoke of them, there was no mention of such deductions (or the possibility thereof) in the literature.

Profiling is rife with challenges. Two very significant ones are disambiguation, and profiling accurately under voice disguise. The challenge of disambiguation relates to the accurate identification of the influences of specific bio-parameters in the presence of the (thousands of) other influences that are exerted on our voice by myriad factors on a daily basis. I probably don't have to explain what the second challenge -- profiling accurately in the presence of voice disguise -- means. The range of variability of the human voice has not been charted in its entirety, at least not through quantifiable mathematical relations that machines can use. No one really knows the true depths of the human voice. That remains to be discovered.

My group at CMU crossed two milestones in the last year:


PUBLICATIONS

Book about this technology: This is a technical book, written primarily to educate and inform students who wish to do further research on this technology.

Profiling humans from their voice
432 pages.
Author: Rita Singh
Publisher: Springer-Nature
Release date: June 15, 2019

My group is now in the process of writing up our research. Until now we were engaged in building demonstrable systems, which were needed to get us funded and keep us going. Publications will be listed here very soon... some exist but I have to collate them in the coming days.

I'll put up some examples of 3D faces generated by our live system last year here soon.

Here's something to consider:



The 21st century Mona Lisa, created by an AI system.


Here's a sampler of some 2D faces reconstructed from voice -- entirely. These are older results. Can you tell the reference from the reconstruction? Why do these have backgrounds? Why do they smile? Why are some in profile? Why do they have hair? How is hair even related to voice? All of these obvious questions have interesting answers that challenge our assumptions and expectations of AI... our paper answers all of them. Reconstruction goes far beyond this. Being able to reconstruct images in high definition is not an end in itself. The whys must be answered.

This is the work of my PhD student, Yandong Wen.