A dual-model machine learning approach to medicare fraud detection: combining unsupervised anomaly detection with supervised learning
Computer Science and Information Technologies

Abstract
Medicare fraud, costing $54.35 billion in improper payments in 2024, undermines U.S. healthcare by draining resources meant for vulnerable populations. Traditional detection methods struggle with reactive designs, high false positives, and reliance on scarce labeled data, exacerbated by a 0.017% fraud prevalence. This paper proposes a dual-model machine learning framework to tackle these challenges. Unsupervised anomaly detection uses cluster-based local outlier factor (CBLOF) and empirical cumulative outlier detection (ECOD) to identify novel fraud patterns across 37 million records. These findings are validated by the list of excluded individuals/entities (LEIE). Supervised classification, with C4.5 decision trees and logistic regression, refines these anomalies using an 80:20 balanced dataset, reducing false positives by 63%. Key innovations include hybrid sampling to address class imbalance, LEIE integration for labeled validation, and parallelized processing of 2.1 million claims hourly. Achieving an area under the curve (AUC), a measure of model accuracy, of 88.3%, this approach outperforms single-model systems by 24%, blending exploratory detection with actionable precision. This scalable, interpretable framework potentially advances fraud detection, safeguarding public funds and Medicare’s integrity with a practical, adaptable solution for evolving threats.
Discover Our Library
Embark on a journey through our expansive collection of articles and let curiosity lead your path to innovation.
