Unproctored Computerized Adaptive Testing (CAT) is gaining traction due to its convenience, flexibility, and scalability, particularly in high-stakes assessments. However, the absence of a proctor can give rise to aberrant testing behaviors, which can impair the validity of test scores. This paper explores the use of a verification test to detect aberrant testing behavior in unproctored CAT environments. The study aims to use multiple measures to detect aberrant response patterns in CAT via a paper-and-pencil (P&P) test, and to compare the sensitivity and specificity of the l_z person-fit statistic (PFS) under no-stage and two-stage methods (in the latter, l_z is applied after the Kullback–Leibler divergence (KLD) measure) in different conditions. Three factors were manipulated: the aberrance percentage, the aberrance scenario, and the ability range of the aberrant examinees. In all scenarios, the specificity of l_z in classifying examinees was higher than its sensitivity in both the no-stage and two-stage analyses. However, the sensitivity of l_z was higher in the two-stage analysis than in the no-stage analysis.