Predictive modelling · 2026

Predicting Student Dropout in Online Learning

MSc thesis on dropout behaviour in computerised adaptive learning systems, applying machine learning to identify students at risk.

Abstract diagram of weekly engagement states flowing towards either continued engagement or dropout.

Context

This project is my MSc thesis in Behavioural Data Science at the University of Amsterdam, carried out with Prowise Learn, a computerised adaptive learning platform.

The central question is how student engagement develops over time in an adaptive learning environment, and whether patterns in that engagement can predict dropout before it happens.

Problem & research question

Dropout in online learning is rarely a single event; it is usually the end of a gradual disengagement process. Platforms record large volumes of behavioural data, but raw activity logs do not by themselves show who is at risk.

The research question is whether patterns in behavioural engagement data can identify students at risk of dropout early enough to intervene.

My role

I am responsible for the full research cycle: framing the research question, preparing and validating the data, choosing and implementing the modelling approach, and reporting the results.

Data

The analysis uses longitudinal behavioural data describing how students interact with the adaptive learning platform over time. A substantial part of the work is data preparation: cleaning and structuring the logs, then validating that the engineered variables faithfully represent behaviour.

Approach

The core of the approach is applying machine learning models to behavioural engagement data, so that a student's history becomes a signal about risk rather than a flat activity count:

Preparing and validating longitudinal behavioural data from the adaptive learning platform.
Training machine learning models to identify students at risk of dropout, evaluated out-of-sample.
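
To make the preparation step concrete, here is a minimal sketch of turning a raw activity log into weekly engagement counts and a simple dropout label. It is written in Python purely for illustration and is not the thesis pipeline: the function names, the (student, date) event format, and the four-week inactivity rule are all assumptions chosen for the example.

```python
from collections import defaultdict
from datetime import date


def weekly_activity(events, start, n_weeks):
    """Count events per student per week.

    events: iterable of (student_id, date) pairs (hypothetical log format).
    Returns {student_id: [count_week_0, ..., count_week_{n_weeks-1}]}.
    Students with no events in the window do not appear in the result.
    """
    counts = defaultdict(lambda: [0] * n_weeks)
    for student, day in events:
        week = (day - start).days // 7
        if 0 <= week < n_weeks:  # ignore events outside the observation window
            counts[student][week] += 1
    return dict(counts)


def dropped_out(weeks, gap=4):
    """Illustrative label: the series ends with at least `gap` inactive weeks."""
    return len(weeks) >= gap and all(c == 0 for c in weeks[-gap:])


# Toy log: student "a" returns in week 2, student "b" is only active in week 0.
events = [
    ("a", date(2025, 1, 6)),
    ("a", date(2025, 1, 8)),
    ("a", date(2025, 1, 20)),
    ("b", date(2025, 1, 7)),
]
activity = weekly_activity(events, start=date(2025, 1, 6), n_weeks=6)
labels = {student: dropped_out(weeks) for student, weeks in activity.items()}
print(activity)  # {'a': [2, 0, 1, 0, 0, 0], 'b': [1, 0, 0, 0, 0, 0]}
print(labels)    # {'a': False, 'b': True}
```

The weekly series keeps the order of a student's history, which is what lets a model treat the trajectory as a signal instead of a single activity total.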

Key decisions

Framing dropout as a behavioural process that unfolds over time rather than a single event, which shapes both the features and the evaluation.
Keeping the models interpretable enough that educators could act on the risk signals.
Evaluating predictive performance out-of-sample rather than relying on in-sample fit.
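
As an illustration of what out-of-sample evaluation can look like for this kind of data, the sketch below holds out whole students (so no student contributes to both training and testing) and scores predictions with AUC. It is a hedged Python example, not the thesis code: the function names, the 25% test fraction, and the choice of AUC as the metric are assumptions.

```python
import random


def split_by_student(student_ids, test_fraction=0.25, seed=0):
    """Split at the student level so every student is in exactly one set."""
    unique = sorted(set(student_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])  # (train, test)


def auc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Labels are 1 (dropout) or 0 (retained).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Several rows per student (e.g. one per week) still end up on one side only.
rows = ["s1", "s1", "s2", "s3", "s3", "s4", "s5", "s6", "s7", "s8"]
train_ids, test_ids = split_by_student(rows)
print(len(train_ids), len(test_ids))  # 6 2

# Toy held-out predictions: three of four positive/negative pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # 0.75
```

Splitting by student rather than by row matters with longitudinal data: rows from the same student are correlated, so a row-level split would leak information and flatter the model.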

Results

The thesis is in progress; results will be added when the analysis is finalised.

Challenges & limitations

Behavioural log data is noisy: an absence of activity could mean disengagement, or simply that the student is learning elsewhere.
Engagement states are a modelling choice — different state definitions can lead to different conclusions, which requires sensitivity checks.
Findings from one platform do not automatically generalise to other learning environments.
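
A sensitivity check on state definitions could be sketched as follows. This is a toy Python example under stated assumptions: the three state names and the cut-off separating "low" from "high" engagement are hypothetical, not the definitions used in the thesis.

```python
from collections import Counter


def to_state(count, high_cutoff):
    """Map a weekly activity count to an illustrative engagement state."""
    if count == 0:
        return "inactive"
    return "high" if count >= high_cutoff else "low"


def state_shares(activity, high_cutoff):
    """Share of student-weeks in each state for a given cut-off."""
    states = [to_state(c, high_cutoff) for weeks in activity.values() for c in weeks]
    if not states:
        raise ValueError("no student-weeks to classify")
    tally = Counter(states)
    return {s: tally[s] / len(states) for s in ("inactive", "low", "high")}


# Toy weekly counts for two students.
toy_activity = {"a": [2, 0, 1, 0], "b": [5, 3, 0, 0]}

for cutoff in (2, 4):
    print(cutoff, state_shares(toy_activity, cutoff))
# 2 {'inactive': 0.5, 'low': 0.125, 'high': 0.375}
# 4 {'inactive': 0.5, 'low': 0.375, 'high': 0.125}
```

Even in this small example, moving the cut-off swaps the shares of "low" and "high" weeks, which is why conclusions drawn from the states need to be rechecked under alternative definitions.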

What I learned

This project deepened my applied statistics skills in R far beyond coursework, especially survival analysis on real, imperfect longitudinal data.

It also sharpened a research instinct I care about: the modelling choices that make results interpretable for the people who need to act on them matter as much as raw predictive performance.

Related project

Engagement Analysis for Algebrakit