Predicting low cognitive ability at age 5 - feature selection using machine learning methods and birth cohort data
1 INFANT Research Centre, Cork, Ireland
2 School of Nursing, Psychotherapy, and Community Health, Dublin City University, Dublin, Ireland
Publication date: 2023-04-27
Popul. Med. 2023;5(Supplement):A1032
ABSTRACT
Background and Objectives: Early life is a crucial period for shaping the developing brain. A failure to achieve early foundational cognitive skills may result in a permanent loss of opportunity to achieve full cognitive potential. Developmental screening programmes that rely on the presence of a delay may miss the opportunity for early preemptive intervention in the period of optimal neuroplasticity. The objectives of this study were 1) to apply the random forest (RF) algorithm to birth-cohort data to train a model to predict low cognitive ability at 5 years of age using maternal, infant, and sociodemographic characteristics, and 2) to identify the important predictive features and interactions. Methods: Data were drawn from 1,070 participants in the Irish population-based BASELINE cohort. An RF model was trained to predict an intelligence quotient (IQ) score ≤90 at age 5 years using maternal, infant, and sociodemographic features. Feature importance was examined and internal validation was performed using 10-fold cross-validation repeated 5 times. Results: The five most important predictive features were the total years of maternal schooling, infant Apgar score at 1 minute, socioeconomic index, maternal BMI, and alcohol consumption in the first trimester. On internal validation, a parsimonious RF model based on 11 features showed excellent predictive ability, correctly classifying 95% of participants. Examination of the model revealed important predictive interactions between many features, for example between total years of maternal schooling and maternal alcohol intake in the first trimester. This model provides a foundation suitable for external validation in an unseen cohort. Conclusions: Machine learning approaches to large existing datasets can provide accurate feature selection to improve risk prediction. Further validation of this model is required in cohorts representative of the general population. Predicting later cognitive function has important potential risks which warrant careful attention, but it may provide an opportunity for early preemptive intervention.
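For readers unfamiliar with the internal validation scheme named in the Methods, the following is a minimal sketch of how 10-fold cross-validation repeated 5 times partitions a cohort of 1,070 participants. It is not the authors' code: the function name `repeated_kfold`, the seed, and the plain (unstratified) shuffling are assumptions made for illustration, and no model is fitted here. In practice a library routine such as scikit-learn's `RepeatedStratifiedKFold` would typically be used alongside a random forest implementation.

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration only; the seed and the lack of
    outcome stratification are assumptions, not details from the abstract.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for each repeat
        for fold in range(n_splits):
            # Every n_splits-th shuffled index forms this fold's held-out set.
            test_idx = indices[fold::n_splits]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield repeat, fold, train_idx, test_idx


# 1,070 participants, 10 folds, 5 repeats, as described in the Methods.
splits = list(repeated_kfold(n_samples=1070))
print(len(splits))                            # number of train/test splits
print(len(splits[0][2]), len(splits[0][3]))   # training and held-out sizes
```

Under these assumptions the scheme yields 50 train/test splits, each training on 963 participants and holding out 107, and every participant is held out exactly once per repeat. A reported accuracy would then be averaged over all 50 held-out sets.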