Izaque de Sousa Maciel, Aino-Kaisa Piironen, Alexey M. Afonin, Mariia Ivanova, Arto Alatalo, Kaustubh Kishor Jadhav, Jordi Julvez, Maria Foraster, Irene van Kamp, and Katja M. Kanninen. Nature Mental Health, 1, pages 596–605 (2023).
Abstract.
An estimated 10–20% of adolescents experience mental health conditions, and most of them remain underdiagnosed and undertreated. Discovering new susceptibility biomarkers is therefore important for identifying individuals at high risk of developing mental health problems, and for improving early prevention. Here we aimed to discover plasma protein-based susceptibility biomarkers in children/adolescents aged 11–16 years at risk of developing mental health issues. Risk was evaluated on the basis of self-reported Strengths and Difficulties Questionnaire (SDQ) scores, and plasma proteomic data for individuals participating in the Spanish WALNUTs cohort study were obtained by liquid chromatography–tandem mass spectrometry. Bioinformatic analyses were performed to identify the biological processes and pathways in which the identified biomarker candidates are involved; 58 proteins were significantly associated with the SDQ score. The most prominent enriched pathways related to these proteins included immune responses, blood coagulation, neurogenesis and neuronal degeneration. This exploratory study revealed several alterations of plasma proteins associated with the SDQ score in adolescents, which opens a new avenue to develop novel susceptibility biomarkers to improve early identification of individuals at risk of mental health problems.
Predictive model generation.
The relatively large number of samples made it possible to employ modern strategies to identify potentially predictive biomarkers distinguishing the low and raised SDQ score groups. The novel QLattice algorithm (ref. 39) was used to create models containing the predictive biomarkers that best separate the two groups. The Bayesian information criterion (BIC), which penalizes model complexity, was used for model selection to favor models that generalize well from the training set to the test set. We performed fivefold cross-validation, running QLattice logistic regression models on different partitions of the data and keeping the lowest-BIC model from each partition. The receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) for each of the models are presented in Supplementary Fig. 2.
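The selection loop described above (fivefold cross-validation, lowest-BIC model per training partition, AUC on the held-out partition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: QLattice itself is not called, and its model search is replaced by a plain exhaustive search over small feature subsets fitted with ordinary logistic regression. BIC is computed in its standard form, BIC = k ln(n) − 2 ln(L̂), with k the number of fitted parameters. All function names (`fit_logistic`, `cross_validate`, `make_synthetic`, etc.) are hypothetical, and the data are synthetic stand-ins for protein abundances.

```python
import math
import random
from itertools import combinations


def predict_proba(w, x):
    """Logistic model P(y=1|x); w[0] is the intercept. Numerically stable sigmoid."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def bic(w, X, y):
    """BIC = k*ln(n) - 2*ln(L), with k = number of fitted parameters (incl. intercept)."""
    eps = 1e-12
    log_lik = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(w, xi), eps), 1.0 - eps)
        log_lik += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return len(w) * math.log(len(X)) - 2.0 * log_lik


def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k=5, max_features=2, seed=0):
    """Per fold: keep the lowest-BIC candidate on the training part, score AUC on the held-out part."""
    results = []
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in train_idx]
        best = None
        # Exhaustive search over small feature subsets (a stand-in for QLattice's model search).
        for r in range(1, max_features + 1):
            for feats in combinations(range(len(X[0])), r):
                X_tr = [[X[i][j] for j in feats] for i in train_idx]
                w = fit_logistic(X_tr, y_tr)
                score = bic(w, X_tr, y_tr)
                if best is None or score < best[0]:
                    best = (score, feats, w)
        score, feats, w = best
        X_te = [[X[i][j] for j in feats] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        test_auc = auc([predict_proba(w, x) for x in X_te], y_te)
        results.append({"features": feats, "bic": score, "test_auc": test_auc})
    return results


def make_synthetic(n=100, seed=42):
    """Hypothetical data: feature 0 shifts with the label; features 1 and 2 are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    for fold, res in enumerate(cross_validate(X, y), start=1):
        print(fold, res["features"], round(res["test_auc"], 3))
```

Running the script prints, for each fold, the selected feature subset and its held-out AUC; because each fold trains on a different partition, the selected models can differ, which mirrors why five distinct models arise. Exhaustive subset search is only tractable for a handful of candidate features; with a full plasma proteome a heuristic search is needed, which is the role the QLattice algorithm plays in the study.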
Five distinct models were created using the fivefold cross-validation scheme. Because the whole dataset was split into training and validation sets five times, with each round containing different subsamples of the data, these models offer similar, albeit complementary, insights. The five unique models (Table 3) contained eleven proteins in total (Supplementary Table 3). Four of the five models contained proteins with a previously reported connection to the central nervous system (CNS). The first model contained three such proteins: amyloid beta precursor-like protein 1 (APLP1; P51693), calcium/calmodulin-dependent protein kinase II beta (CAMK2B; Q13554/Q13555) and reticulon 4 (RTN4; Q9NQC3). The ROC parameters for the models are shown in Supplementary Fig. 2. Only the fifth model contained no proteins previously connected to brain development. The proteins present in the models warrant further investigation as potential biomarkers.