Unveiling the influence of back-translation on sentiment analysis of Indonesian hotel reviews
Indonesian Journal of Electrical Engineering and Computer Science
Abstract
This study conducts sentiment analysis on Indonesian-language hotel reviews using three machine learning classification algorithms: multinomial naive Bayes (MNB), support vector machine (SVM), and random forest (RF). Back translation is used to generate synthetic variations of the reviews, which serve as additional training data for the classification models. Three scenarios are tested, distinguished by the dataset used: the original dataset, the back-translated dataset, and the combination of both. The experimental results show that the combined dataset yields the best results, with random forest standing out as the best-performing algorithm. Back translation significantly improves model evaluation in sentiment analysis for several reasons, including enriching the dataset with new variations, enhancing model robustness, and increasing dataset complexity. However, the number of word features differs among the scenarios, which indicates that back translation also significantly changes the characteristics of the dataset.
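The three-scenario setup and the word-feature comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`back_translate`, `build_scenarios`, `count_word_features`), the two toy "translators", and the sample reviews are all assumptions made for this example. A real experiment would replace the toy functions with an actual machine translation system and then train the MNB, SVM, and RF classifiers on each scenario.

```python
import re
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, str]  # (review text, sentiment label)


def back_translate(
    reviews: List[Review],
    to_pivot: Callable[[str], str],
    from_pivot: Callable[[str], str],
) -> List[Review]:
    """Translate each review to a pivot language and back, keeping its label."""
    return [(from_pivot(to_pivot(text)), label) for text, label in reviews]


def build_scenarios(
    original: List[Review], augmented: List[Review]
) -> Dict[str, List[Review]]:
    """Build the three dataset scenarios: original, back-translated, combined."""
    return {
        "original": list(original),
        "back_translated": list(augmented),
        "combined": list(original) + list(augmented),
    }


def count_word_features(reviews: List[Review]) -> int:
    """Count distinct lowercase word tokens, a simple proxy for word features."""
    vocabulary = set()
    for text, _label in reviews:
        vocabulary.update(re.findall(r"[a-z]+", text.lower()))
    return len(vocabulary)


# Hypothetical stand-ins for a real translation system (assumption):
# they only swap two words so the example stays deterministic.
def toy_to_pivot(text: str) -> str:
    return text.replace("bagus", "good").replace("kotor", "dirty")


def toy_from_pivot(text: str) -> str:
    return text.replace("good", "baik").replace("dirty", "jorok")


# Illustrative sample reviews, not data from the study.
original = [
    ("kamar bagus dan bersih", "positive"),
    ("kamar kotor dan bising", "negative"),
]

augmented = back_translate(original, toy_to_pivot, toy_from_pivot)
scenarios = build_scenarios(original, augmented)
feature_counts = {
    name: count_word_features(rows) for name, rows in scenarios.items()
}

for name, count in feature_counts.items():
    print(f"{name}: {len(scenarios[name])} reviews, {count} word features")
```

Because the round trip substitutes synonyms, the combined scenario ends up with a larger vocabulary than either dataset alone, which is one way the differing word-feature counts across scenarios could arise.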