Classification of connected speech in aging and neurodegenerative disease based on multilingual NLP models

Project leader: Rik Vandenberghe

Rationale

Semi-spontaneous speech elicited by a structured interview or scene description is a very rich source of information about the speaker. As a diagnostic tool, it can become an alternative to classical neuropsychological assessment, which relies on relatively reductionist tests that patients often experience as confrontational. Multilingual NLP models are well suited to the cross-lingual goals of the Include Network and can be applied to speech audio or to written transcripts. In this network-wide call, we aim to make the partners' connected speech data accessible and to apply and compare the diagnostic performance of NLP models in a collaborative spirit. The collaboration is twofold: first, providing access to connected speech data from different groups of patients and controls across different languages and, second, applying and comparing different NLP models in a coordinated manner.

Illustration of the approach

Based on the Autobiographical Memory Interview and four additional interview questions, as well as the Cookie Theft scene description, we are collecting one hour of spontaneous speech per individual. We currently have audio fragments and the corresponding written transcripts from 74 deeply phenotyped older adults as well as AD patients. We are still recruiting AD patients as well as patients with PPA, FTD behavioral variant and dementia with Lewy bodies. We aim for the preclinical, prodromal and early dementia stages. The primary aim is multilingual disease classification across cohorts. Furthermore, within a given disease there is heterogeneity in the degree of language involvement. Apart from speech fragments, we also collect structural and resting-state functional MRI. We are currently using multilingual Transformer models alongside models based on explicit linguistic features.
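
As a purely illustrative sketch of what "explicit linguistic features" can look like, the snippet below computes two simple lexical measures from a written transcript: the type-token ratio and the mean word length. The function name, the tokenisation rule and the choice of features are assumptions made for this example; they are not the feature set used in the project.

```python
import re


def lexical_features(transcript: str) -> dict:
    """Compute simple lexical measures from a transcript (illustrative only)."""
    # Assumed tokenisation: lowercase the text and keep runs of letters,
    # optionally joined by a single apostrophe (e.g. "don't").
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", transcript.lower())
    if not tokens:
        # An empty transcript yields zeroed features instead of a division error.
        return {"n_tokens": 0, "type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "n_tokens": len(tokens),
        # Share of distinct word forms among all tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


features = lexical_features("The boy is taking the cookie")
```

Because the character class matches Unicode letters, the same function could be run on transcripts in different languages, although measures such as the type-token ratio depend on transcript length and would need normalisation before cohorts are compared.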

Invitation to join our ongoing cross-linguistic studies

We invite partners to share connected speech data from healthy controls with or without known amyloid biomarker status, patients with AD in a prodromal or early dementia stage, patients with primary progressive aphasia or frontotemporal dementia behavioral variant, and patients with MCI due to Lewy body disease or clinically probable dementia with Lewy bodies. We also invite partners to share their analysis approaches and models in a coordinated manner.

What do we offer?

1. Access to a multicentre, cross-lingual dataset of connected speech from a variety of neurodegenerative diseases as well as matched controls.

2. Coordination between centres for comparison of models and methods for their accuracy and diagnostic performance.

3. Publication of innovative manuscripts in high-impact journals, with a fair and consensual co-authorship agreement (to be jointly decided among participating sites).

What do you need to participate?

Inclusion in the study requires the following data from deeply phenotyped cognitively intact older adults and from patients with Alzheimer disease, primary progressive aphasia, frontotemporal dementia behavioral variant, and dementia with Lewy bodies or MCI with Lewy bodies:

1. Sociodemographic data (sex, age, education, handedness).

2. Scale for staging, ideally the Clinical Dementia Rating (CDR) scale or CDR with NACC FTLD modules.

3. Connected speech data elicited either by a semi-structured interview or by scene description.

4. If available, structural MRI.
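
For sites preparing their data, the requirements above could be captured in a per-participant record such as the sketch below. This is a hypothetical layout, not a format prescribed by the project: every field name, the unit chosen for education, the allowed elicitation labels and the `problems` check are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed labels for the two elicitation methods named in the list above.
ELICITATION_METHODS = {"semi_structured_interview", "scene_description"}


@dataclass
class ParticipantRecord:
    """Hypothetical per-participant record mirroring the four requirements."""
    participant_id: str        # pseudonymised identifier (assumed)
    diagnostic_group: str      # e.g. "control", "AD", "PPA", "bvFTD", "DLB"
    language: str              # language spoken in the recording
    # 1. Sociodemographic data
    sex: str
    age_years: int
    education_years: int       # unit is an assumption
    handedness: str
    # 2. Scale for staging
    staging_scale: str         # e.g. "CDR" or "CDR with NACC FTLD modules"
    staging_score: float
    # 3. Connected speech data
    elicitation: str
    audio_path: Optional[str] = None
    transcript_path: Optional[str] = None
    # 4. Structural MRI, if available
    structural_mri_path: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty if none are found."""
        issues = []
        if self.elicitation not in ELICITATION_METHODS:
            issues.append(f"unknown elicitation method: {self.elicitation}")
        if self.audio_path is None and self.transcript_path is None:
            issues.append("no connected speech data (audio or transcript)")
        return issues


example = ParticipantRecord(
    participant_id="P001", diagnostic_group="control", language="nl",
    sex="F", age_years=71, education_years=14, handedness="right",
    staging_scale="CDR", staging_score=0.0,
    elicitation="scene_description", transcript_path="P001_transcript.txt",
)
```

Calling `example.problems()` returns an empty list for a complete record; structural MRI is left optional because the list marks it as "if available".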

How do I proceed if I am interested?

Please contact Prof Dr Rik Vandenberghe (rik.vandenberghe@uzleuven.be) to set up a call.

Thanks!
