Retrieval-augmented generation for Arabic legal information: the family code case study
Telecommunication Computing Electronics and Control
Abstract
This paper describes the implementation and evaluation of a retrieval-augmented generation (RAG) system designed to improve access to and understanding of Moroccan law, in particular the family code in Arabic. The research addresses the limitations of widely used language models when applied to complex Arabic legal terminology and aims to help citizens access crucial legal information. We built a new custom dataset of 2.5k question-answer pairs, preprocessed it, and used the BGE-M3 embedding model in this experiment. Retrieval metrics such as mean reciprocal rank (MRR), Recall@k, and F1-score indicate that the RAG approach is more effective than standalone large language models (LLMs). Moreover, an evaluation using the BLEU score, fidelity, response relevance, and contextual relevance indicates that meaning and context were well captured, which suggests strong semantic understanding. The research highlights the need for language-specific model specialization in Arabic and presents the main challenges involved, such as dialectal variation and the choice of appropriate evaluation measures. The results indicate that well-developed RAG systems offer a promising approach to improving access to legal information in Arabic-speaking communities and to guiding future research and development in this field.
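As an illustration of the retrieval metrics named in the abstract, the following is a minimal sketch of how MRR and Recall@k are commonly computed. It is not the authors' evaluation code: the function names, the article identifiers, and the toy rankings are all hypothetical, and the paper's exact protocol (for example, the value of k) is not stated here.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[str]], gold: Sequence[Set[str]]) -> float:
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, g) for r, g in zip(runs, gold)) / len(runs)


def recall_at_k(ranked_ids: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs: Sequence[Sequence[str]], gold: Sequence[Set[str]], k: int) -> float:
    """Average Recall@k over all queries."""
    return sum(recall_at_k(r, g, k) for r, g in zip(runs, gold)) / len(runs)


# Hypothetical toy data: each inner list is a retriever's ranking for one question,
# and each set holds the identifiers of the passages that actually answer it.
runs = [
    ["art_49", "art_12", "art_70"],
    ["art_3", "art_81", "art_19"],
    ["art_5", "art_6", "art_7"],
]
gold = [{"art_12"}, {"art_3", "art_19"}, {"art_99"}]

mrr = mean_reciprocal_rank(runs, gold)
recall_2 = mean_recall_at_k(runs, gold, k=2)
print(f"MRR={mrr:.3f} Recall@2={recall_2:.3f}")
```

In this toy setup the first relevant passage appears at rank 2, rank 1, and not at all, which is why MRR rewards placing a correct passage near the top, while Recall@k only asks whether the relevant passages appear anywhere in the first k results.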