TY  - JOUR
AU  - Rubaiat, Rahmina
AU  - Templeton, John Michael
AU  - Schneider, Sandra L
AU  - De Silva, Upeka
AU  - Madanian, Samaneh
AU  - Poellabauer, Christian
PY  - 2025
DA  - 2025/2/12
TI  - Exploring Speech Biosignatures for Traumatic Brain Injury and Neurodegeneration: Pilot Machine Learning Study
JO  - JMIR Neurotech
SP  - e64624
VL  - 4
KW  - speech biosignatures
KW  - speech feature analysis
KW  - amyotrophic lateral sclerosis
KW  - ALS
KW  - neurodegenerative disease
KW  - Parkinson's disease
KW  - detection
KW  - speech
KW  - neurological
KW  - traumatic brain injury
KW  - concussion
KW  - mobile device
KW  - digital health
KW  - machine learning
KW  - mobile health
KW  - diagnosis
KW  - mobile phone
AB  - Background: Speech features are increasingly linked to neurodegenerative and mental health conditions, offering the potential for early detection and differentiation between disorders. As interest in speech analysis grows, distinguishing between conditions becomes critical for reliable diagnosis and assessment. Objective: This pilot study explores speech biosignatures in two distinct neurological conditions: (1) mild traumatic brain injury (eg, concussion) and (2) Parkinson disease (PD) as the neurodegenerative condition. Methods: The study included speech samples from 235 participants (97 concussed and 94 age-matched healthy controls, 29 PD and 15 healthy controls) for the PaTaKa test and 239 participants (91 concussed and 104 healthy controls, 29 PD and 15 healthy controls) for the Sustained Vowel (/ah/) test. Healthy controls were age matched to each group: young controls for the concussed participants and respective age-matched controls for the neurodegenerative participants (15 healthy samples for both tests). Data augmentation with noise was applied to balance the small neurodegenerative and healthy control datasets. Machine learning models (support vector machine, decision tree, random forest, and Extreme Gradient Boosting) were trained on 37 temporal and spectral speech features. A 5-fold stratified cross-validation was used to evaluate classification performance. Results: For the PaTaKa test, classifiers performed well, achieving F1-scores above 0.9 for concussed versus healthy and concussed versus neurodegenerative classifications across all models. Initial tests using the original dataset for neurodegenerative versus healthy classification yielded very poor results, with F1-scores below 0.2 and accuracy under 30% (eg, below 12 out of 44 correctly classified samples) across all models. This underscored the need for data augmentation, which significantly improved accuracy to 60%‐70% (eg, 26‐31 out of 44 samples). In contrast, the Sustained Vowel test showed mixed results; F1-scores remained high (more than 0.85 across all models) for concussed versus neurodegenerative classifications but were significantly lower for concussed versus healthy (0.59‐0.62) and neurodegenerative versus healthy (0.33‐0.77), depending on the model. Conclusions: This study highlights the potential of speech features as biomarkers for neurodegenerative conditions. The PaTaKa test exhibited strong discriminative ability, especially for concussed versus neurodegenerative and concussed versus healthy tasks, whereas challenges remain for neurodegenerative versus healthy classification. These findings emphasize the need for further exploration of speech-based tools for differential diagnosis and early identification in neurodegenerative health.
SN  - 2817-092X
UR  - https://neuro.jmir.org/2025/1/e64624
UR  - https://doi.org/10.2196/64624
DO  - 10.2196/64624
ID  - info:doi/10.2196/64624
ER  -