EDUCATIONAL TECHNOLOGY & SOCIETY, vol. 28, no. 3, pp. 36-50, 2025 (SSCI, Scopus)
Language teachers spend considerable time scoring students' writing and, because essay scoring is so time-consuming, may struggle to provide reliable scores. AI-based Automated Essay Scoring (AES) systems have been used to address this problem, and Generative AI (GenAI) has recently emerged as a promising tool for scoring essays. This study therefore examines the differences and relationships between scores assigned by human raters (HR) and by GenAI to essays produced by English as a Foreign Language (EFL) learners. The data consisted of 210 essays written by 35 undergraduate students. Two HR and GenAI evaluated the essays using an analytical rubric comprising five factors: (1) ideas, (2) organization and coherence, (3) support, (4) style, and (5) mechanics. The study found significant differences between the scores given by the HR and those generated by GenAI, as well as variation between the two HR themselves; GenAI's scores, by contrast, were consistent across its two evaluations. GenAI's scores were also statistically significantly lower than those of the HR. Furthermore, the two HR's scores correlated only weakly with each other, whereas GenAI's scores across its two evaluations correlated strongly. HR-1's scores correlated significantly with GenAI's across all five factors, while HR-2's correlated significantly with GenAI's in only three. These findings can guide EFL teachers in reducing their writing-assessment workload by delegating more of the essay scoring to GenAI. Based on its findings and limitations, the study also offers suggestions for future research on AES.