Articles

Access the latest knowledge in applied science, electrical engineering, computer science and information technology, education, and health.

29,939 Article Results

Music genre classification using Inception-ResNet architecture

10.11591/ijai.v14.i4.pp3300-3310
Fauzan Valdera , Ajib Setyo Arifin
Music genres help categorize music but lack strict boundaries, emerging from interactions among public, marketing, history, and culture. With Spotify hosting over 80 million tracks, organizing digital music is challenging due to the sheer volume and diversity. Automating music genre classification aids in managing this vast array and attracting customers. Recently, convolutional neural networks (CNNs) have been used for their ability to extract hierarchical features from images, applicable to music through spectrograms. This study introduces the Inception-ResNet architecture for music genre classification, significantly improving performance with 94.10% accuracy, precision of 94.19%, recall of 94.10%, F1-score of 94.08%, and 149,418 parameters on the GTZAN dataset, showcasing its potential in efficiently managing and categorizing large music databases.
Volume: 14
Issue: 4
Page: 3300-3310
Publish at: 2025-08-01

Application of self-organizing map for modeling the Aquilaria malaccensis oil using chemical compound

10.11591/ijai.v14.i4.pp2889-2898
Mohammad Arif Fahmi Che Hassan , Zakiah Mohd Yusoff , Nurlaila Ismail , Mohd Nasir Taib
Agarwood oil, known as ‘black gold’ or the ‘wood of God,’ is a globally prized essential oil derived naturally from the Aquilaria tree. Despite its significance, the current non-standardized grading system varies worldwide, relying on subjective assessments. This paper addresses the need for a consistent classification model by presenting an overview of Aquilaria malaccensis oil quality grading using the self-organizing map (SOM) algorithm. A member of the Thymelaeaceae family, Aquilaria malaccensis is a primary source of agarwood in the Malay Archipelago. Agarwood oil extraction involves traditional methods such as solvent extraction and hydro-distillation, yielding a complex mixture of chromone derivatives, oxygenated sesquiterpenes, and sesquiterpene hydrocarbons. This study categorizes agarwood oil into high and low grades based on chemical compounds, utilizing the SOM algorithm with three specific compounds as inputs: β-agarofuran, α-agarofuran, and 10-epi-γ-eudesmol. Findings demonstrate the efficacy of SOM-based quality grading in distinguishing agarwood oil grades, offering a significant contribution to the field. The non-standardized grading system's inefficiency and subjectivity underscore the necessity for a standardized model, making this research crucial for the agarwood industry's advancement.
Volume: 14
Issue: 4
Page: 2889-2898
Publish at: 2025-08-01

Traffic flow prediction using long short-term memory-Komodo Mlipir algorithm: metaheuristic optimization to multi-target vehicle detection

10.11591/ijai.v14.i4.pp3343-3353
Imam Ahmad Ashari , Wahyul Amien Syafei , Adi Wibowo
Multi-target vehicle detection in urban traffic faces challenges such as poor lighting, small object sizes, and diverse vehicle types, impacting traffic flow prediction accuracy. This study introduces an optimized long short-term memory (LSTM) model using the Komodo Mlipir algorithm (KMA) to enhance prediction accuracy. Traffic video data are processed with YOLO for vehicle classification and object counting. The LSTM model, trained to capture traffic patterns, employs parameters optimized by KMA, including learning rate, neuron count, and epochs. KMA integrates mutation and crossover strategies to enable adaptive selection in global and local searches. The model's performance was evaluated on an urban traffic dataset with uniform configurations for population size and key LSTM parameters, ensuring consistent evaluation. Results showed LSTM-KMA achieved a root mean square error (RMSE) of 14.5319, outperforming LSTM (16.6827), LSTM-improved dung beetle optimization (IDBO) (15.0946), and LSTM-particle swarm optimization (PSO) (15.0368). Its mean absolute error (MAE), at 8.7041, also surpassed LSTM (9.9903), LSTM-IDBO (9.0328), and LSTM-PSO (9.0015). LSTM-KMA effectively tackles multi-target detection challenges, improving prediction accuracy and transportation system efficiency. This reliable solution supports real-time urban traffic management, addressing the demands of dynamic urban environments.
Volume: 14
Issue: 4
Page: 3343-3353
Publish at: 2025-08-01

Exploring bibliometric trends in speech emotion recognition (2020-2024)

10.11591/ijai.v14.i4.pp3421-3434
Yesy Diah Rosita , Muhammad Raafi'u Firmansyah , Annisaa Utami
Speech emotion recognition (SER) is crucial in various real-world applications, including healthcare, human-computer interaction, and affective computing. By enabling systems to detect and respond to human emotions through vocal cues, SER enhances user experience, supports mental health monitoring, and improves adaptive technologies. This research presents a bibliometric analysis of SER based on 68 articles from 2020 to early 2024. The findings show a significant increase in publications each year, reflecting the growing interest in SER research. The analysis highlights various approaches to preprocessing, data sources, feature extraction, and emotion classification. India and China emerged as the most active contributors, with external funding, particularly from the NSFC, playing a significant role in the advancement of SER research. The support vector machine (SVM) remains the most widely used classification model, followed by k-nearest neighbors (KNN) and convolutional neural networks (CNN). However, several critical challenges persist, including inconsistent data quality, cross-linguistic variability, limited emotional diversity in datasets, and the complexity of real-time implementation. These limitations hinder the generalizability and scalability of SER systems in practical environments. Addressing these gaps is essential to enhance SER performance, especially for multimodal and multilingual applications. This study provides a detailed understanding of SER research trends, offering valuable insights for future advances in speech-based emotion recognition.
Volume: 14
Issue: 4
Page: 3421-3434
Publish at: 2025-08-01

Optimized pap-smear image enhancement: hybrid Perona-Malik diffusion filter-CLAHE using spider monkey optimization

10.11591/ijai.v14.i4.pp2765-2775
Ach Khozaimi , Isnani Darti , Wuryansari Muharini Kusumawinahyu , Syaiful Anam
Pap-smear image quality is crucial for cervical cancer detection. This study introduces an optimized hybrid approach that combines the Perona-Malik diffusion (PMD) filter with contrast-limited adaptive histogram equalization (CLAHE) to enhance pap-smear image quality. The PMD filter reduces the image noise, whereas CLAHE improves the image contrast. The hybrid method was optimized using spider monkey optimization (SMO PMD-CLAHE). Blind/reference-less image spatial quality evaluator (BRISQUE) and contrast enhancement-based image quality (CEIQ) are the new objective functions for the PMD filter and CLAHE optimization, respectively. The simulations were conducted using the SIPaKMeD dataset. The results indicate that SMO outperforms state-of-the-art methods in optimizing the PMD filter and CLAHE. The proposed method achieved an average effective measure of enhancement (EME) of 5.45, root mean square (RMS) contrast of 60.45, Michelson’s contrast (MC) of 0.995, and entropy of 6.80. This approach offers a new perspective for improving pap-smear image quality.
Volume: 14
Issue: 4
Page: 2765-2775
Publish at: 2025-08-01

Transforming images into words: optical character recognition solutions for image text extraction

10.11591/ijai.v14.i4.pp3412-3420
Jyoti Wadmare , Sunita Patil , Dakshita Kolte , Kapil Bhatia , Palak Desai , Ganesh Wadmare
Optical character recognition (OCR) is a major advancement in today’s emerging technology, making it easier to convert the textual information in images or physical documents into text data for analysis, process automation, and improved productivity. This paper presents the design, development, and implementation of a novel OCR tool aimed at text extraction and recognition tasks. The tool incorporates advanced techniques from computer vision and natural language processing (NLP), offering strong performance across various document types. The tool's performance is evaluated on metrics such as accuracy, speed, and document format compatibility. The developed OCR tool achieves an accuracy of 98.8%, with a character error rate (CER) of 2.4% and a word error rate (WER) of 2.8%. The tool finds applications in document digitization, personal identification, archival of valuable documents, and the processing of invoices and other documents. It holds immense value for researchers, practitioners, and organizations seeking effective techniques for accurate text extraction and recognition.
Volume: 14
Issue: 4
Page: 3412-3420
Publish at: 2025-08-01

Modified zero-reference deep curve estimation for contrast quality enhancement in face recognition

10.11591/ijai.v14.i4.pp3274-3286
Muhammad Kahfi Aulia , Dyah Aruming Tyas
Face recognition systems remain challenged by variable lighting conditions. While zero-reference deep curve estimation (Zero-DCE) effectively enhances low-light images, it frequently induces overexposure in normal- and high-brightness scenarios. This study introduces modified Zero-DCE combined with three established enhancement techniques: contrast stretching (CS), contrast limited adaptive histogram equalization (CLAHE), and brightness preserving dynamic histogram equalization (BPDHE). Evaluations employed the extended Yale face database B and face recognition technology (FERET) datasets, with 10 representative samples assessed using the blind/referenceless image spatial quality evaluator (BRISQUE) metric. Modified Zero-DCE with BPDHE produced optimal enhancement quality, achieving a mean BRISQUE score of 16.018. On the extended Yale face database B, visual geometry group 16 (VGG16) integrated with modified Zero-DCE and CLAHE attained 83.65% recognition accuracy, representing a 6.08-percentage-point improvement over conventional Zero-DCE. For the 200-subject FERET subset, residual network 50 (ResNet50) with modified Zero-DCE and CLAHE achieved 67.41% accuracy. Notably, standard Zero-DCE with CLAHE demonstrated superior robustness in extremely low-light conditions, highlighting the illumination-dependent performance characteristics of these enhancement approaches.
Volume: 14
Issue: 4
Page: 3274-3286
Publish at: 2025-08-01

A deep learning-based framework for automatic detection of COVID-19 using chest X-ray and CT-scan images

10.11591/ijai.v14.i4.pp3192-3200
Sivanagireddy Kalli , Bukka Narendra Kumar , Saggurthi Jagadeesh , Kushagari Chandramouli Ravi Kumar
COVID-19 has profoundly impacted global public health, underscoring the need for rapid detection methods. Radiography and radiologic imaging, especially chest X-rays, enable swift diagnosis of infected individuals. This study delves into leveraging machine learning to identify COVID-19 from X-ray images. By gathering a dataset of 9,000 chest X-rays and CT scans from public resources, meticulously vetted by board-licensed radiologists to confirm COVID-19 presence, the research sets a robust foundation. However, further validation is essential: expanding datasets to encompass more COVID-19 cases enhances convolutional neural network (CNN) accuracy. Among various machine learning techniques, deep learning excels at identifying the distinct imaging patterns discernible in chest radiographs of COVID-19 patients. Yet, extensive validation across diverse datasets and clinical trials is crucial to ensure the robustness and generalizability of these models. The discussion extends to further complexities, including ethical considerations around patient privacy and the integration of intelligent technology into clinical workflows. Close collaboration with healthcare professionals ensures this technology complements established diagnostic approaches. Despite the potential to detect COVID-19 from chest X-ray imaging findings, thorough research and validation, alongside ethical deliberation, are vital before implementation in the healthcare field. The results show that the proposed model achieved a classification accuracy and F1 score of 96% and 98%, respectively, for the X-ray images.
Volume: 14
Issue: 4
Page: 3192-3200
Publish at: 2025-08-01

Challenges of recommender systems in finance and banking: a systematic review

10.11591/ijai.v14.i4.pp2559-2567
Lossan Bonde , Abdoul Karim Bichanga
Recommender systems are widely applied in various domains, including e-commerce, marketing, and education. Despite their popularity, recommender systems are not widely used in finance and banking. This paper aims to identify the challenges associated with using recommender systems in finance and banking and to recommend directions for future research. Using a systematic literature review (SLR) method, 52 papers were selected and analyzed. A three-step process guided the selection. First, a keyword search identified a seed list of sources. A snowball technique with specific inclusion and exclusion criteria was then applied to expand the list. Finally, a brief screening produced the final list of sources to consider. Through the study of the 52 relevant papers, three main challenges were identified: i) transparency, ethics, and data privacy; ii) handling complex content information and accounting for multiple user behaviors; and iii) explainability of AI models. This study has established the barriers to adopting recommender systems in the finance and banking industry. Specific subjects of concern include cold-start problems, personalization, fraud detection, transparency, and data privacy. The study recommends further research leveraging advanced machine learning models and emerging technologies to fill the gap.
Volume: 14
Issue: 4
Page: 2559-2567
Publish at: 2025-08-01

Survey on 3D biometric traits for human identification

10.11591/ijai.v14.i4.pp3143-3152
Divya Gangachannaiah , Mamatha Aruvanalli Shivaraj , Honganur Chandrasekharaiah Nagaraj , Prasanna Gururaj Paga
Biometric technology verifies and identifies individuals based on their biological or behavioral traits. Biometric-based personal authentication systems are more reliable and user-friendly, outperforming traditional personal authentication systems. Physiological biometric traits become abraded through aging and heavy manual work, while behavioral biometric traits vary widely due to external factors such as fatigue and mood. Among the physiological biometric traits, finger geometry patterns are the most widely deployed for authentication owing to their stability, user acceptability, and uniqueness. Recent trends in biometrics attempt to incorporate 3D-domain traits, with 3D reconstruction performed from multiple 2D images. 3D images are usually more robust and illumination-invariant than their 2D counterparts. The 3D reconstruction algorithms are compared using mean square error (MSE).
Volume: 14
Issue: 4
Page: 3143-3152
Publish at: 2025-08-01

A novel fuzzy logic based sliding mode control scheme for non-linear systems

10.11591/ijai.v14.i4.pp2676-2688
Abdul Kareem , Varuna Kumara
Sliding mode control (SMC) has been widely used for non-linear systems, which lack superposition and exhibit behaviors such as multiple isolated equilibrium points, finite escape time, limit cycles, and bifurcations. This research proposes a super-twisting controller architecture with a varying sliding surface, the surface being adjusted by a simple single input-single output (SISO) fuzzy logic inference system. The proposed super-twisting controller updates the sliding surface slope online using the SISO fuzzy logic inference system, rotating the sliding surface so as to enhance the dynamic performance of the system without compromising steady-state performance or stability. The performance of the proposed controller is compared with that of the basic super-twisting sliding mode (STSM) controller with a fixed sliding surface through simulations of a benchmark non-linear control system model with parametric uncertainties and disturbances. The simulation results confirm that the proposed approach delivers improved dynamic performance, with a faster response than the typical STSM controller with a fixed sliding surface. This improvement is achieved without affecting robustness, system stability, or tracking accuracy. The proposed control approach is straightforward to implement since the sliding surface slope is regulated by a SISO fuzzy logic inference system. MATLAB/Simulink is used to demonstrate the efficiency of the proposed system over the conventional one.
Volume: 14
Issue: 4
Page: 2676-2688
Publish at: 2025-08-01

Deep transfer learning for classification of ECG signals and lip images in multimodal biometric authentication systems

10.11591/ijai.v14.i4.pp3160-3171
Latha Krishnamoorthy , Ammasandra Sadashivaiah Raju
Authentication plays an essential role in diverse kinds of applications that require security. Several authentication methods have been developed, but biometric authentication has gained huge attention from the research community and industry due to its reliability and robustness. This study investigates multimodal authentication techniques utilizing electrocardiogram (ECG) signals and lip images. Leveraging transfer learning from pre-trained ResNet and VGG16 models, features are extracted from ECG signals and photographs of the lip area of the face. Subsequently, a convolutional neural network (CNN) classifier performs classification based on the extracted features. The dataset used in this study comprises ECG signals and lip images, representing distinct biometric modalities. Through the integration of transfer learning and CNN classification, the primary objective of the study is to improve the reliability and precision of multimodal authentication systems. Verification results show that the suggested method succeeds in producing trustworthy authentication using multimodal biometric traits. The experimental analysis shows that the proposed deep transfer learning-based model reported an average accuracy, F1-score, precision, and recall of 0.962, 0.970, 0.965, and 0.966, respectively.
Volume: 14
Issue: 4
Page: 3160-3171
Publish at: 2025-08-01

Unpacking the drivers of artificial intelligence regulation: driving forces and critical controls in artificial intelligence governance

10.11591/ijai.v14.i4.pp2655-2666
Ibrahim Atoum , Salahiddin Altahat
The burgeoning field of artificial intelligence (AI) necessitates a nuanced approach to governance that integrates technological advancement, ethical considerations, and regulatory oversight. As various AI governance frameworks emerge, a fragmented landscape hinders effective implementation. This article examines the driving forces behind AI regulation and the essential control mechanisms that underpin these frameworks. We analyze market-driven, state-driven, and rights-driven regulatory approaches, focusing on their underlying motivations. Furthermore, critical regulatory controls such as data governance, risk management, and human oversight are highlighted to demonstrate their roles in establishing effective governance structures. Additionally, the importance of international cooperation and stakeholder collaboration in addressing the challenges posed by rapid technological change is emphasized. By providing insights into the strengths, weaknesses, and potential synergies of different governance models, this study contributes to the development of equitable and effective AI regulatory frameworks that encourage innovation while safeguarding societal interests. Ultimately, the findings aim to inform policymakers, industry leaders, and civil society organizations in their efforts to foster a future where AI is utilized responsibly and equitably for the betterment of humanity.
Volume: 14
Issue: 4
Page: 2655-2666
Publish at: 2025-08-01

Imagery based plant disease detection using convolutional neural networks and transfer learning

10.11591/ijai.v14.i4.pp2701-2712
Ali Mhaned , Salma Mouatassim , Mounia El Haji , Jamal Benhra
Ensuring the sustainability of global food production requires efficient plant disease detection, a challenge conventional methods struggle to address promptly. This study explores advanced techniques, including convolutional neural networks (CNNs) and transfer learning models (ResNet and VGG), to improve plant disease identification accuracy. Using a plant disease dataset with 65 classes of healthy and diseased leaves, the research evaluates these models' effectiveness in automating disease recognition. Preprocessing techniques, such as size normalization and data augmentation, are employed to enhance model reliability, and the dataset is divided into training, testing, and validation sets. The CNN model achieved accuracies of 95.45 and 94.52% for 128×128 and 256×256 image sizes, respectively. ResNet50 proved the best performer, reaching 98.38 and 98.63% accuracy, while VGG16 achieved 97.99 and 98.34%. These results highlight ResNet50's superior ability to capture intricate features, making it a robust tool for precision agriculture. This research provides practical solutions for early and accurate disease identification, helping to improve crop management and food security.
Volume: 14
Issue: 4
Page: 2701-2712
Publish at: 2025-08-01

The growth and trends of information technology in endangered language revitalization research: insight from a bibliometric study

10.11591/ijece.v15i4.pp3888-3903
Leonardi Paris Hasugian , Syifaul Fuada , Triana Mugia Rahayu , Apridio Edward Katili , Feby Artwodini Muqtadiroh , Nur Aini Rakhmawati
Since the United Nations Educational, Scientific and Cultural Organization (UNESCO) declared languages endangered, researchers across many fields have worked to revitalize them. This study presents a bibliometric analysis of research on the revitalization of endangered languages in information technology. The study's aim is to assess research topics by identifying the authors, institutions, and countries that influence research collaboration. The Scopus dataset (2002-2024) was obtained from journal articles (n=62) and conference papers (n=76) and visualized using VOSviewer 1.6.20. The analysis reveals a fluctuating trend with an overall increasing pattern. The United States, Canada, and China were identified as the top three countries in terms of publications. Meanwhile, the University of Alberta, Université du Québec à Montréal, University of Auckland, and University of Hawaiʻi at Mānoa are the most prolific institutions on this topic, with two authors from the Université du Québec à Montréal, Sadat and Le, being the most productive. The dominant research relates to computational linguistics, while topics such as phonetic posteriorgrams, integrated frameworks, and artificial intelligence are potential areas for future exploration. The implications lie in exposing the extent to which the development of endangered language revitalization can be accommodated by the field of information technology.
Volume: 15
Issue: 4
Page: 3888-3903
Publish at: 2025-08-01

Discover Our Library

Embark on a journey through our expansive collection of articles and let curiosity lead your path to innovation.
