The effects of data imbalance on fraud detection model accuracy

Rusma Anieza Ruslan

Nureize Arbaiy

Pei-Chun Lin

International Journal of Artificial Intelligence

Abstract

Machine learning (ML) model performance is often assessed by accuracy, but the quality and balance of the data also play crucial roles. In imbalanced datasets, where the minority class has far fewer samples than the majority class, predictions can be biased toward the majority class. This study addresses class imbalance through resampling techniques, including random undersampling (RUS) and random oversampling (ROS), applied to a fraud detection dataset. We classify the resampled datasets using random forest (RF) and gradient boosting (GB) models. Our findings indicate that the RF model combined with ROS achieves an accuracy of 97.4%, surpassing the 96.1% accuracy of the GB model with RUS. These results demonstrate the importance of addressing class imbalance to improve prediction accuracy in ML.
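To make the two resampling techniques named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names `random_oversample` and `random_undersample`, the toy 95/5 class split, and the fixed seed are all assumptions made for illustration; the paper's dataset and preprocessing are not reproduced here. The sketch also computes the accuracy of a trivial "always predict the majority class" baseline, which shows why accuracy alone can look strong on imbalanced data.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen samples of the smaller classes
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = rng.choices(idx, k=target - n)  # sampling with replacement
        X_out.extend(X[i] for i in extra)
        y_out.extend(y[i] for i in extra)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sampling without replacement
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) records.
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# Accuracy of always predicting the majority class on the raw data.
baseline_acc = max(Counter(y).values()) / len(y)

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)

print("majority-class baseline accuracy:", baseline_acc)
print("after ROS:", Counter(y_ros))
print("after RUS:", Counter(y_rus))
```

In a full pipeline, the resampled data would then be passed to a classifier such as RF or GB. A common precaution, assumed here rather than taken from the abstract, is to resample only the training split so that the test set keeps the original class distribution.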

DOI: 10.11591/ijai.v15.i2.pp1402-1408

ISSN: 2089-4872

Pages: 1402-1408

Volume 15, Issue 2

Published: 2026-04-01
