Introduction
The human voice is the primary medium of communication between people and among the most natural and energy-efficient ways for them to engage with one another. Produced as intricate patterns of sound by the vocal cords, the voice carries a wealth of information and plays an essential role in social contact: by altering its tone, speakers convey emotions, anxieties, sensations, and excitement (Kraus, 2018). Advances in artificial intelligence (AI), computer science, and related technologies toward human-like performance have opened new possibilities for digital health, whose end goal is to ease the lives of patients and the work of medical practitioners through technology.
While current voice search is largely limited to simple queries, the healthcare sector offers tremendous opportunities for rapid growth. Advances in audio signal analysis, voice technology, and natural language processing have paved the way for a wide range of potential voice applications, including the identification of vocal biomarkers for diagnosis, classification, or remote patient monitoring to improve clinical practice (Fagherazzi et al., 2021). This paper provides an overview of current and prospective voice applications for health-related purposes, from both a patient and a research point of view, and of the challenges ahead.
Vocal Biomarkers
A biomarker is a characteristic that can be objectively measured and evaluated as an indicator of a biological or pathological process or of a pharmacological response to a medical intervention, and that can serve as a surrogate measure of a clinical outcome (Kraus, 2018). A vocal biomarker is a sign, feature, or combination of features of a spoken audio signal that is associated with a clinical outcome and can be used to assess patients, diagnose a medical condition, grade the severity or stage of a disease, or support drug development (Dashtipour et al., 2018). It must possess all the qualities of a conventional biomarker, including validation through evidence-based evaluation.
Parkinson’s Disease
Research on vocal biomarkers has primarily been carried out in neurodegenerative illnesses, including Parkinson’s disease. Voice abnormalities are common in these illnesses, and voice alterations are expected to serve as an early screening indicator of disease development. They may one day complement the state-of-the-art manual examination in assessing symptoms, to guide the start of therapy or to evaluate its efficacy (Mucke et al., 2022). These voice problems are usually caused by phonation and articulation deficits, such as pitch changes, reduced energy in the higher parts of the harmonic spectrum, and imprecise articulation of consonants and vowels, which leads to reduced intelligibility (Dashtipour et al., 2018). Even though patients and clinicians frequently overlook voice changes in the early phases of the disease, objective assessments show changes in speech characteristics in up to 78% of Parkinson’s patients in the early stages.
Mild Cognitive Impairment and Alzheimer’s Disease
Subtle changes in language and voice can be observed years before the prodromal signs of Alzheimer’s disease appear. They are also found in the early stages of mild cognitive impairment. Both mild cognitive impairment and Alzheimer’s disease have been shown to affect verbal fluency. This shows up as hesitancy to speak, a slow speech rate, and other difficulties, such as trouble finding words (Dashtipour et al., 2018).
The consequences include circumlocution and the frequent use of filler sounds, repetitions, semantic errors, indefinite terms, and simplified grammar. Conversation in Alzheimer’s patients is marked by reduced coherence and by implausible or irrelevant details. Prosodic features such as pitch variation are also altered, which may impair the patient’s emotional responsiveness (Dashtipour et al., 2018). Voice characteristics could therefore become simple, noninvasive biomarkers for the early detection of dementia-related disorders.
Rheumatoid Arthritis and Multiple Sclerosis
Voice dysfunction and dysarthria are common comorbidities in people with multiple sclerosis. In these patients, voice features and phonatory behavior should be evaluated over time to determine the best moment to begin a treatment such as deep brain stimulation (Dashtipour et al., 2018). Some vocal characteristics, such as articulation, breathing, and prosody, have been identified as strong candidates for monitoring multiple sclerosis (Dashtipour et al., 2018). In rheumatoid arthritis, pathological changes in the larynx develop as the disease advances. As a result, tracking voice quality has been shown to be beneficial for monitoring these patients.
Monitoring Emotions and Mental Health
Stress is a well-known risk factor for voice problems. Stress levels self-assessed on a smartphone have been found to correlate with voice characteristics. There is also a positive relationship between stress levels and the duration of verbal interaction. Voice problems are common in people with high cortisol levels, which are typical of depressed patients. As a result, vocal features are used to estimate the severity of depression.
Categories of Voice Recordings
There is no established procedure for identifying vocal biomarkers from voice recordings. However, the sounds a person produces can be classified and analyzed for disease diagnosis in three major groups: verbal (single words, repetition of short phrases), vowel (sustained vowel phonation), and nonverbal (such as coughing and breathing). To retain control over the recorded vocal task while still letting patients use their own words, semi-spontaneous voice tasks are devised in which the patient is asked to talk about a specific topic.
Sustained vowel phonations are another typical type of recording, in which participants are asked to voice a vowel as steadily as they can (Dashtipour et al., 2018). They carry information for assessing dysphonia and allow a patient’s voice to be measured free of articulatory effects, unaffected by speaking rate and stress, and with less influence from the speaker’s accent (Maor et al., 2020). This makes them useful for multilingual analysis, where diverse accents could otherwise confound the results. Tasks that require rapid movements of the tongue, lips, and soft palate, by contrast, demonstrate the patient’s ability to maintain speech rate and intelligibility.
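One way to quantify how steadily a vowel is held is local jitter, the average cycle-to-cycle variation in the duration of the vocal fold vibration periods. The text above does not name a specific measure, so the choice of jitter, the function name, and the example values below are assumptions made for illustration. The sketch also assumes the cycle durations have already been extracted from the recording.

```python
from statistics import mean


def local_jitter(periods):
    """Relative local jitter for a sequence of glottal cycle durations (seconds).

    Computed as the mean absolute difference between consecutive periods,
    divided by the mean period. A value of 0.0 means perfectly regular cycles.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycle durations")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return mean(diffs) / mean(periods)


# Hypothetical cycle durations for illustration only, not measured data.
steady = [0.0100, 0.0100, 0.0100, 0.0100]
unsteady = [0.0100, 0.0110, 0.0100, 0.0110]

print(local_jitter(steady))
print(local_jitter(unsteady))
```

A higher value indicates a less stable phonation. Any threshold separating typical from pathological voices would have to come from validated clinical data, which this sketch does not provide.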
Selection of Audio Features
Feature selection methods, such as minimum redundancy maximum relevance (mRMR) and Gram-Schmidt orthogonalization, allow a subset of the original features to be chosen without modifying them (Brabenec et al., 2017). Feature selection eliminates features that are highly correlated with one another as well as those with missing data or low variance. It helps identify the most relevant set of features for a given outcome of interest in a prediction or classification problem (Brabenec et al., 2017). Dimensionality reduction approaches such as linear discriminant analysis and principal component analysis, which transform the features instead, help mitigate the curse of dimensionality.
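The elimination step described above can be sketched as a simple filter that drops features with missing data, near-constant features, and features that are highly correlated with one already kept. This is a plain filter written for illustration. It is not mRMR or Gram-Schmidt orthogonalization, and the function name, thresholds, and feature values are assumptions.

```python
from math import sqrt


def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def filter_features(table, min_var=1e-6, max_abs_corr=0.95):
    """Return the names of the features that survive the filter.

    `table` maps a feature name to its list of values (None marks a missing value).
    The thresholds are illustrative defaults, not recommended settings.
    """
    kept = []
    for name, values in table.items():
        if any(v is None for v in values):
            continue  # drop features with missing data
        if variance(values) < min_var:
            continue  # drop near-constant features
        if any(abs(pearson(values, table[k])) > max_abs_corr for k in kept):
            continue  # drop features highly correlated with one already kept
        kept.append(name)
    return kept


# Hypothetical feature table for four recordings (values invented for illustration).
features = {
    "pitch_mean": [110.0, 200.0, 150.0, 180.0],
    "pitch_mean_scaled": [220.0, 400.0, 300.0, 360.0],  # redundant with pitch_mean
    "constant_flag": [1.0, 1.0, 1.0, 1.0],              # zero variance
    "shimmer": [0.03, None, 0.05, 0.04],                # has a missing value
    "speech_rate": [4.1, 3.9, 4.0, 3.2],
}

print(filter_features(features))
```

The selected columns are returned unchanged, which is what distinguishes selection from a transformation such as principal component analysis. The result depends on the order in which features are visited, since the first of two correlated features is the one that is kept.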
From Research to Practice
Once a vocal biomarker has been identified, the road to routine clinical use is still long. Vocal biomarkers face additional hurdles, as their relevance may be limited to particular dialects (Dashtipour et al., 2018). No vocal biomarker has yet been approved by the European Medicines Agency or the US Food and Drug Administration. As a result, one can only speculate about what such an approval process would look like, drawing on comparable cases among conventional biomarkers and on known difficulties in digital health. A first step would be to define standards for collecting vocal biomarkers and to build large-scale archives of voice samples for clinical use.
Technological and Ethical Challenges
Before they can be deployed widely, vocal biomarkers and voice technologies must account for accent and language. Otherwise, they may reinforce systemic biases against people with particular accents and widen a pre-existing socioeconomic and digital divide affecting some minorities (Maor et al., 2020). Here the field of speech technology can learn from other domains, such as radiology, in which the application of AI is far more advanced and systematic biases have already been identified (Califf, 2018). Voice-specific concerns will also need to be addressed, since vocal biomarkers are likely to capture language- or culture-specific features first before moving on to more general features that are independent of accent and language.
Voice data is considered sensitive because it can reveal a person’s identity, ethnic background, and, in the case of vocal biomarkers, their health status (Brabenec et al., 2017). To address the ethical issues raised by collecting and processing voice data, safeguards should be used that allow the data to be processed without unauthorized disclosure, such as encrypting voice data and splitting it into random components that are processed independently.
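The idea of dividing voice data into components that are handled separately can be illustrated with XOR-based secret splitting, in which no single component reveals anything about the recording. The text does not specify a scheme, so this choice, the function names, and the sample bytes are assumptions. The sketch covers only the splitting step and does not include encryption.

```python
import secrets


def split_into_shares(data, n_shares=2):
    """Split bytes into n_shares random-looking components.

    All but the last share are random bytes. The last share is the data
    XORed with every random share, so all shares are needed to rebuild it.
    """
    if n_shares < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(data)) for _ in range(n_shares - 1)]
    last = bytearray(data)
    for share in shares:
        for i, b in enumerate(share):
            last[i] ^= b
    shares.append(bytes(last))
    return shares


def combine_shares(shares):
    """Rebuild the original bytes by XORing all shares together."""
    out = bytearray(len(shares[0]))
    for share in shares:
        for i, b in enumerate(share):
            out[i] ^= b
    return bytes(out)


# Placeholder bytes standing in for a short audio buffer.
sample = b"hypothetical voice samples"
parts = split_into_shares(sample, n_shares=3)
restored = combine_shares(parts)
print(restored == sample)
```

Each share could then be stored or processed by a different party. Running an actual analysis on split data requires dedicated secure computation protocols, which are beyond this sketch.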
Conclusion
Voice will be used more frequently in future healthcare systems. Vocal biomarkers could track essential health parameters remotely and be used for deep phenotyping of patients, paving the way for precision medicine. In addition, speech technology will be incorporated into clinical workflows to make the lives of healthcare professionals and patients easier. For the field to mature, it needs to shift from a technology-oriented to a more health-oriented strategy. Such a shift would foster research and high-value datasets that demonstrate the benefits of this approach.
References
Brabenec, L., Mekyska, J., Galaz, Z., & Rektorova, I. (2017). Speech disorders in Parkinson’s disease: Early diagnostics and effects of medication and brain stimulation. Journal of Neural Transmission (Vienna, Austria: 1996), 124(3), 303–334.
Califf, R. (2018). Biomarker definitions and their applications. Experimental Biology and Medicine (Maywood, N.J.), 243(3), 213–221.
Dashtipour, K., Tafreshi, A., Lee, J., & Crawley, B. (2018). Speech disorders in Parkinson’s disease: Pathophysiology, medical management, and surgical approaches. Neurodegenerative Disease Management, 8(5), 337–348.
Fagherazzi, G., Fischer, A., Ismael, M., & Despotovic, V. (2021). Voice for health: The use of vocal biomarkers from research to clinical practice. Digital Biomarkers, 5(1), 78–88.
Kraus, V. (2018). Biomarkers as drug development tools: Discovery, validation, qualification, and use. Nature Reviews Rheumatology, 14(6), 354–362.
Maor, E., Perry, D., Mevorach, D., Taiblum, N., Luz, Y., Mazin, I., & Shalev, V. (2020). A vocal biomarker is associated with hospitalization and mortality among heart failure patients. Journal of the American Heart Association, 9(7), e013359.
Mucke, J., Krusche, M., & Burmester, G. R. (2022). A broad look into the future of rheumatoid arthritis. Therapeutic Advances in Musculoskeletal Disease, 14(1).