TY  - JOUR
AU  - Rubaiat, Rahmina
AU  - Templeton, John Michael
AU  - Schneider, Sandra L
AU  - De Silva, Upeka
AU  - Madanian, Samaneh
AU  - Poellabauer, Christian
PY  - 2025
DA  - 2025/2/12
TI  - Exploring Speech Biosignatures for Traumatic Brain Injury and Neurodegeneration: Pilot Machine Learning Study
JO  - JMIR Neurotech
SP  - e64624
VL  - 4
KW  - speech biosignatures
KW  - speech feature analysis
KW  - amyotrophic lateral sclerosis
KW  - ALS
KW  - neurodegenerative disease
KW  - Parkinson's disease
KW  - detection
KW  - speech
KW  - neurological
KW  - traumatic brain injury
KW  - concussion
KW  - mobile device
KW  - digital health
KW  - machine learning
KW  - mobile health
KW  - diagnosis
KW  - mobile phone
AB  - Background: Speech features are increasingly linked to neurodegenerative and mental health conditions, offering the potential for early detection and for differentiation between disorders. As interest in speech analysis grows, distinguishing between conditions becomes critical for reliable diagnosis and assessment. Objective: This pilot study explores speech biosignatures in two distinct neurological conditions: (1) mild traumatic brain injury (eg, concussion) and (2) Parkinson disease (PD), representing neurodegeneration. Methods: The study included speech samples from 235 participants for the PaTaKa test (97 concussed participants and 94 healthy controls; 29 participants with PD and 15 healthy controls) and from 239 participants for the Sustained Vowel (/ah/) test (91 concussed participants and 104 healthy controls; 29 participants with PD and 15 healthy controls). Healthy controls were age-matched to each clinical group: young controls for the concussed participants and correspondingly age-matched controls for the participants with PD (the same 15 healthy samples in both tests). Noise-based data augmentation was applied to balance the small neurodegenerative and healthy control datasets. Four machine learning models (support vector machine, decision tree, random forest, and Extreme Gradient Boosting [XGBoost]) were trained on 37 temporal and spectral speech features, and classification performance was evaluated with 5-fold stratified cross-validation. Results: For the PaTaKa test, all models performed well, achieving F1-scores above 0.9 for both concussed versus healthy and concussed versus neurodegenerative classification. Initial tests of neurodegenerative versus healthy classification on the original dataset yielded very poor results across all models, with F1-scores below 0.2 and accuracy under 30% (ie, fewer than 12 of 44 samples correctly classified). This underscored the need for data augmentation, which substantially improved accuracy to 60%–70% (ie, roughly 26–31 of 44 samples). In contrast, the Sustained Vowel test yielded mixed results: F1-scores remained high (above 0.85 for all models) for concussed versus neurodegenerative classification but were considerably lower for concussed versus healthy (0.59–0.62) and neurodegenerative versus healthy (0.33–0.77) classification, depending on the model. Conclusions: This study highlights the potential of speech features as biomarkers for neurodegenerative conditions. The PaTaKa test showed strong discriminative ability, especially for the concussed versus neurodegenerative and concussed versus healthy tasks, whereas neurodegenerative versus healthy classification remains challenging. These findings emphasize the need to further explore speech-based tools for differential diagnosis and early detection of neurodegenerative conditions.
SN  - 2817-092X
UR  - https://neuro.jmir.org/2025/1/e64624
UR  - https://doi.org/10.2196/64624
DO  - 10.2196/64624
ID  - info:doi/10.2196/64624
ER  - 
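The abstract names two evaluation steps, noise-based augmentation of the small PD/control dataset and 5-fold stratified cross-validation, without giving details. The following is a minimal sketch of one way to combine them, not the authors' pipeline. It assumes that the noise is Gaussian jitter added to the 37-element feature vectors (the study may instead have added noise to the audio), that minority classes are oversampled to the size of the largest class, and that augmentation happens only inside each training fold. All function names are hypothetical, the nearest-centroid classifier stands in for the study's SVM, decision tree, random forest, and XGBoost models, and the demo data are random numbers that only mimic the 29 PD / 15 control class sizes.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Deal this class's samples round-robin across folds; the running
        # offset keeps overall fold sizes within one sample of each other.
        for position, idx in enumerate(indices):
            folds[(offset + position) % k].append(idx)
        offset += len(indices)
    splits = []
    for fold in folds:
        test = sorted(fold)
        held_out = set(test)
        train = [idx for idx in range(len(labels)) if idx not in held_out]
        splits.append((train, test))
    return splits


def augment_with_noise(features, labels, noise_std=0.01, seed=0):
    """Hypothetical scheme: oversample minority classes with Gaussian-jittered
    copies until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(features, labels):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [list(row) for row in features], list(labels)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            out_x.append([value + rng.gauss(0.0, noise_std) for value in base])
            out_y.append(label)
    return out_x, out_y


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (not one of the study's models): assign each test
    row to the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: sq_dist(row, centroids[c])) for row in test_x]


def f1_score(y_true, y_pred, positive):
    """Binary F1 for the given positive label; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Synthetic stand-in data: 29 "PD" and 15 "HC" rows with 37 random features.
    rng = random.Random(42)
    labels = ["PD"] * 29 + ["HC"] * 15
    features = [[rng.gauss(0.0, 1.0) for _ in range(37)] for _ in labels]
    for i, (train, test) in enumerate(stratified_k_fold(labels, k=5)):
        # Augment only the training portion so noisy copies never leak into the test fold.
        train_x, train_y = augment_with_noise(
            [features[j] for j in train], [labels[j] for j in train], noise_std=0.05
        )
        preds = nearest_centroid_predict(train_x, train_y, [features[j] for j in test])
        score = f1_score([labels[j] for j in test], preds, positive="PD")
        print(f"fold {i}: test={Counter(labels[j] for j in test)} F1(PD)={score:.2f}")
```

With 29 PD and 15 control labels, each of the five test folds in this sketch holds 3 controls and 5 or 6 PD samples, so every fold mirrors the overall class ratio. Augmenting inside the training fold, rather than before splitting, is one common way to keep jittered near-duplicates of a test sample out of training; the abstract does not state which order the study used. The F1 values printed on random features carry no meaning.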