SCIENTIFIC REPORTS, vol.15, no.1, 2025 (SCI-Expanded, Scopus)
Large-scale examinations are today evaluated mostly with optical marking systems. Although these systems score quickly and accurately, they often overlook behavioral patterns on answer sheets. Features such as blank responses, double markings, repetitive patterns, and response diversity can support a more comprehensive assessment of exam security and student behavior. This study aims to predict students' risk levels from optical marking data using machine learning. The target variable consists of expert-defined five-level risk labels. Because all features were categorical, they were encoded with OrdinalEncoder, and 5-fold stratified cross-validation was used. Among all models, CatBoost achieved the highest performance on the five-level scale (test accuracy 73%, weighted F1 0.72, macro-F1 0.69) and clearly outperformed a rule-based baseline constructed from expert thresholds (13% accuracy, macro-F1 0.11). Confusion matrix and ROC analyses showed high discriminative power, with minor confusion between adjacent classes. In a consolidated three-level setting, the model achieved 91% accuracy and 0.85 macro-F1, improving stability for decision-making. The results show that optical marking data reveal meaningful behavioral patterns and enable reliable risk assessment beyond simple correct-incorrect analysis.
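
To make the quantities named in the abstract concrete, the following minimal Python sketch derives the four kinds of answer-sheet features it mentions and computes accuracy, macro-F1, and weighted F1 by hand. It is an illustration under stated assumptions, not the study's pipeline: the string encoding of a sheet, the `-` and `*` symbols, the feature definitions, and the function names `sheet_features` and `f1_scores` are all hypothetical.

```python
from collections import Counter

BLANK = "-"   # assumed symbol for an unanswered item
DOUBLE = "*"  # assumed symbol for an item with more than one mark


def sheet_features(responses: str) -> dict:
    """Summarise one answer sheet given as a string, one character per item."""
    answered = [r for r in responses if r not in (BLANK, DOUBLE)]
    # Longest run of the same option on consecutive items (e.g. "AAAB" -> 3).
    longest_run, run, prev = 0, 0, None
    for r in responses:
        if r in (BLANK, DOUBLE):
            run, prev = 0, None  # blanks and double marks break a run
            continue
        run = run + 1 if r == prev else 1
        prev = r
        longest_run = max(longest_run, run)
    return {
        "blank_count": responses.count(BLANK),
        "double_count": responses.count(DOUBLE),
        "longest_run": longest_run,
        "distinct_options": len(set(answered)),  # crude response diversity
    }


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro-F1, weighted F1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(per_class.values()) / len(labels)  # every class counts equally
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return accuracy, macro, weighted
```

Macro-F1 averages the per-class F1 scores without regard to class size, while weighted F1 weights each class by its number of true instances. A macro-F1 below the weighted F1, as in the reported 0.69 versus 0.72, therefore usually means that some less frequent classes are predicted less well than the frequent ones.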