Indonesian Journal of Electrical Engineering and Computer Science
Vol. 37, No. 3, March 2025, pp. 2021-2031
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v37.i3.pp2021-2031

Assessing fingerprinting and machine learning approaches for wireless indoor localization

Azkario Rizky Pratama¹, Muhammad Evan Anindya Wahyuaji¹, Muhammad Fadhil Nur Hidayat¹, Bimo Sunarfri Hantono¹, Nur Abdillah Siddiq²
¹ Department of Electrical and Information Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia
² Department of Nuclear Engineering and Engineering Physics, Universitas Gadjah Mada, Yogyakarta, Indonesia

Article Info
Article history: Received Apr 2, 2024; Revised Sep 29, 2024; Accepted Oct 7, 2024
Keywords: Ambient intelligence; Bayesian estimation; Bluetooth low energy; Fingerprint feature extraction; Fingerprinting; Indoor localization; Machine learning

ABSTRACT
This paper presents a comparative analysis of fingerprinting and machine learning techniques for Bluetooth low energy (BLE)-based localization. Two fingerprinting algorithms, namely fingerprint feature extraction (FPFE) and Bayesian estimation (BE), along with various machine learning approaches including support vector regression (SVR), ensemble learning, and instance-based learning, are investigated. The selection of techniques depends on the availability of training data or the fingerprint database, explored in both an ideal scenario and a real-world scenario. In the ideal scenario, where the system administrator can collect fingerprint data through users' devices, FPFE emerges as the preferred algorithm, achieving superior performance with a mean error of 0.50 m. In the real-world scenario, where data collection from multiple devices is limited, the system administrator may gather fingerprint data for localization using one or a few specific devices.
Our experiments reveal that when fingerprint data is scarce, BE and SVR exhibit acceptable performance, reaching mean errors of 1.785 m and 1.965 m, respectively.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Azkario Rizky Pratama
Department of Electrical and Information Engineering, Universitas Gadjah Mada
Yogyakarta, Indonesia
Email: azkario@ugm.ac.id

Journal homepage: http://ijeecs.iaescore.com

1. INTRODUCTION

Wireless positioning and localization techniques are crucial for various applications, including indoor navigation, asset tracking, and location-based services. However, one of the biggest challenges in achieving accurate localization in large deployments is terminal heterogeneity, such as the presence of smartphones from different brands in indoor environments [1]. While solutions have been proposed to address the localization of heterogeneous devices indoors, such as the development of more robust algorithms and standardization efforts, many researchers still struggle to achieve optimal localization performance [2]. In this research, we aim to investigate and evaluate different methods to understand their performance under varying conditions. This study seeks to provide valuable recommendations for researchers developing indoor localization systems (ILS).

There are essentially two prominent methods for localization: fingerprinting and machine learning-based approaches. Each has unique strengths and weaknesses. Fingerprinting, a traditional technique, involves creating a reference database of pre-collected signal characteristics from known locations [3]. These fingerprints are then compared to the measured signal characteristics
to estimate device location. On the other hand, machine learning-based approaches generate a learning model trained on data, which can then predict device location using captured signal data. Data collection is a critical aspect of both techniques. Fingerprinting requires extensive site surveys to collect signal characteristics at different locations, while machine learning-based approaches rely on data captured with ground truth location information for model training. The latter requires diverse and representative datasets to ensure robustness and generalization.

The selection of the most appropriate method is generally based on the evaluation of classification systems. In general, fingerprinting can provide high performance in well-surveyed environments, but it may suffer from decreased accuracy due to changes in environmental conditions or the introduction of new devices. Machine learning-based approaches, with their ability to learn complex relationships between signal characteristics and locations, offer adaptability and can handle variations better. However, claims about the effectiveness of these two approaches cannot be directly compared with each other, because different metrics and procedures are used to measure the performance of the different indoor localization proposals [4], [5]. In fact, differences in experiment configuration and procedures, such as beacon density, may affect localization performance. While higher density usually brings higher localization accuracy [6], this density-accuracy relationship may not be straightforward beyond a certain threshold [7].

Researchers often use different assessment methods that measure either the quality of hard decisions or the quality of system scores [8]. In the former, the classifier directly outputs the predicted class label for each instance in the dataset without any additional information. For example, Subakti et al.
[9] achieved an average localization error of 0.68 m in a 5 m × 8 m area using 4 beacon nodes. This work outperforms other methods that also use the same average error metric [10]-[13]. Alternatively, the community also accepts measures of the quality of system scores. For example, the Bayesian estimator (BE) proposed by Faragher and Harle [7] achieved an error of less than 3 m 95% of the time (cumulative probability).

Researchers often struggle to choose the best method due to differences in the metrics used and in experiment procedures, resulting in suboptimal performance. In this paper, we investigate various techniques in both domains (i.e., fingerprinting and machine learning-based) to determine the most suitable approach for indoor localization in specific scenarios. This study contributes by demonstrating the adaptability of various methods across diverse conditions and offering recommendations for implementing ILS.

The structure of the paper is as follows: Section 2 provides an overview of related work, encompassing Bluetooth low energy (BLE) fingerprinting and machine learning. Section 3 details the obtained data and localization techniques. The evaluation of various scenarios employing diverse techniques is presented in Section 4. Finally, Section 5 summarizes and discusses the findings presented in the paper.

2. RELATED WORKS

Various methods have been employed for object localization in a given space, often relying on either fingerprinting or machine learning techniques [14]. Fingerprinting is a technique that involves collecting and using unique signatures (or "fingerprints") of a specific area to determine the location of an object. Conversely, machine learning techniques leverage data-driven models to estimate location based on signal characteristics.

2.1. Machine learning

Many researchers have delved into the use of machine learning methods to determine the location of tracked objects, whether in terms of coordinates or within specific room locations [15], [16]. Bai et al. [17] combine a trilateration-based method and a fingerprinting-based method before supplying the result to a machine learning classifier. The location classification in that study divides the space into 36 grids (each grid is 1 m × 1 m), and the machine learning classifier is tasked with classifying the data according to the grid. The authors reported a good accuracy of more than 90% with various machine learning classifiers such as Naive Bayes (NB), sequential minimal optimization (SMO), random forest (RF), BayesNet, and J48.

Maduranga and Abeysekera [18] utilize a feed forward neural network (FFNN) to classify a location. The authors divide the space into four zones and assign the input to one of the zones. The experiment takes place on the first floor of Western Michigan University's Waldo library. The FFNN model predicted the location with 86% accuracy.

Sthapit et al. [19] conducted experiments on indoor positioning using BLE with a machine learning approach. In their study, the authors employed machine learning techniques such as support vector machines (SVM) and logistic regression (LR) to determine the position. Unlike previous studies, they partitioned the space into sub-areas known as radio maps. SVM
and LR were then utilized to calculate the probability of predicting the radio map based on received signal strength indicator (RSSI) samples. The estimated position was subsequently calculated based on the generated probability. Although the authors reported a relatively low average error, it is worth noting that the environment and dataset used in their research were smaller than those in other studies such as [17], [18].

Alexander et al. [20] use machine learning with regression tasks to estimate position. Several methods have been explored, including artificial neural network (ANN), multiple linear regression (MLR), RF, and support vector regression (SVR). The machine learning models are tasked with generating the x position estimate ($\hat{x}$) and the y position estimate ($\hat{y}$) directly from preprocessed data (RSSI distance, x coordinate, and y coordinate). The best two machine learning models were then obtained, namely a regression model for x-coordinate position estimation and a regression model for y-coordinate position estimation. In a testbed of size 4 m × 6 m, Alexander et al. reported that the best model, using SVR, achieved a mean error of 134.92 cm.

2.2. Bluetooth low energy fingerprinting

ILS using beacon fingerprinting have been implemented and evaluated in several papers. These systems may combine BLE beacons with techniques such as multilateration (MLT) and fingerprinting to determine the location of a mobile device indoors. Recent work highlights the potential of fingerprinting over MLT, with 79.31% accuracy [21]. BLE may even outperform the global positioning system (GPS) when the beacons' positions are known, the density is sufficient, and data is available for both calibration and training [7]. Faragher and Harle [7] reported that BLE readings at the same coordinate will differ over time.
BE is a method based on probability theory that takes advantage of this principle. Probability theory can be implemented in ILS to predict the tracked device (TD) coordinate from RSSI readings. In a 600 m² area with 19 beacons, the error is below 3 m 95% of the time. If 7 beacons are utilized instead, the error is below 8.5 m 95% of the time and below 3.1 m 66% of the time.

Subakti et al. [9] introduce the fingerprint feature extraction (FPFE) method, which utilizes fingerprinting and provides two extraction choices: autoencoder (AE) or principal component analysis (PCA). In an 8 m × 5 m area with four beacons, FPFE with AE extraction attains a best mean error of 0.70 m, while with PCA extraction the best mean error in the same space is 0.68 m. It is important to note that these values are reported based on a single mobile phone. The performance could be affected if there are variations, such as using a different mobile phone for fingerprint collection.

The aforementioned studies offer valuable insights into employing different methods with BLE technology. However, directly comparing their performance is challenging due to different environments, different test areas, and different performance metrics, as noted by Potortì et al. [4]. This study aims to overcome this challenge by implementing prospective approaches, particularly those proposed by Faragher and Harle [7] and Subakti et al. [9], using a consistent setup and two standard scenarios. Finally, this study provides guidance on best practices for readers.

3. METHOD

3.1. Data collection

Given a set of beacons $B = \{b_1, \ldots, b_N\}$ installed in the testbed, we collect RSSI readings $M$ at the reference points (RPs) shown in Figure 1 during the database collection phase, where $M = \{b_1^{RP}, \ldots, b_N^{RP}\}$. In the localization phase, we collect RSSI readings $L$ at the same points using a TD, where $L = \{b_1^{TD}, \ldots, b_N^{TD}\}$. We use three mobile phones of different brands and models as TDs: Realme C20, Realme 5 Pro, and Samsung Galaxy A32. We develop an Android application that reads the transmitted BLE signals at an interval of 2,000 ms. In our testbed, we install N = 6 BLE beacons transmitting at a signal power of -4 dBm with an advertising interval of 1,500 ms. We collect 211 samples with each of the three mobile phones at each of 42 RPs. Each RP is separated by 1 m from its neighboring RPs. In total, we collect 26,586 instances. These data are divided into a fingerprint database (or training set), M, and a localization test set, L. Figure 1 illustrates the testbed in our living lab.

We consider two scenarios of device utilization during the database collection and testing phases, namely:

1. Ideal scenario: fingerprinting and localization using all known devices (i.e., M and L are collected using three devices). We sample 80% of the whole collected dataset (stratified random sampling, resulting in M = 21,268 instances) for fingerprinting or training machine learning models, and use the rest of the data for localization (L = 5,318 instances). The aim of this scenario is to simulate a condition where the system knows all types of mobile phones in advance.
2. Real-world scenario: fingerprinting with one device and evaluation with other devices (i.e., M is collected using one device and L is collected using the two other devices). We select one device as the fingerprinting device and randomly sample 80% of its collected data (resulting in M = 7,089 instances) as fingerprint or training data. We then localize the TDs using the rest of the dataset (resulting in L = 19,497 instances). We repeat this process using each of the other two devices as the fingerprint collector. This scenario yields three sets of localization errors (each using a different mobile device as the fingerprint collector). The aim of this scenario is to simulate a typical real-life situation where the system does not know all types of mobile phones in advance.

Figure 1. Floor plan of testing environment

3.2. Fingerprinting

Fingerprinting is a technique used to determine the location of an object or TD within an indoor environment. The location is determined by comparing the signal characteristics collected by the TD to the fingerprints stored in the database. Once a matching fingerprint is found, the system estimates the TD's location based on the known location associated with that fingerprint. In this research, we select one method based on probabilistic models to estimate the current location and another based on dimensionality reduction. A probabilistic model, such as BE, is chosen due to its superior capability in handling uncertainty, particularly in dynamic environments with varying signal propagation. Conversely, a dimensionality reduction technique is employed to enhance generalization, filter out noise, and retain only the most informative features. The detailed discussion of these methods follows.

3.2.1. Bayesian estimator

BE is a fingerprinting technique based on a probabilistic model. We begin the process by normalizing the data using min-max scaling.
Min-max scaling transforms the numerical features of a dataset into a fixed range, here between 0 and 1. The process involves determining the minimum and maximum values of the feature and then scaling each data point proportionally, as shown in (1). This step helps machine learning algorithms process and compare the data.

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{1}$$

Next, we calculate the Euclidean distance between the mean RSSI captured by the TD and the mean RSSI captured at each RP using (2),

$$dist(B, L, M) = \sqrt{\frac{\sum_{i=1}^{N} \left( L(b_i) - M(b_i) \right)^2}{N}} \tag{2}$$

where $L$ is the set of signals received during localization by a tracked device TD, $L = \{b_1^{TD}, \ldots, b_N^{TD}\}$, and $M$ is the reference RSSI collected during the fingerprinting phase, $M = \{b_1^{RP}, \ldots, b_N^{RP}\}$.
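The normalization in (1) and the distance in (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the RP labels, and the example RSSI values are all hypothetical, and the fingerprints are assumed to hold one already normalized mean RSSI value per beacon.

```python
import math

def min_max_scale(values):
    """Scale a list of numbers into [0, 1], as in (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: there is no spread to scale, so map everything to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def rssi_distance(tracked, reference):
    """Distance in (2): square root of the mean squared per-beacon difference."""
    n = len(tracked)
    squared = sum((l - m) ** 2 for l, m in zip(tracked, reference))
    return math.sqrt(squared / n)

def nearest_rp(tracked, fingerprints):
    """Return the RP label whose mean fingerprint is closest to the TD reading."""
    return min(fingerprints, key=lambda rp: rssi_distance(tracked, fingerprints[rp]))

# Hypothetical normalized mean RSSI per beacon (N = 6) for three RPs.
fingerprints = {
    "RP1": [0.9, 0.7, 0.4, 0.2, 0.1, 0.3],
    "RP2": [0.5, 0.8, 0.6, 0.3, 0.2, 0.2],
    "RP3": [0.2, 0.4, 0.9, 0.7, 0.5, 0.1],
}
tracked = [0.55, 0.75, 0.6, 0.35, 0.2, 0.25]

print(min_max_scale([-90, -70, -50]))
print(nearest_rp(tracked, fingerprints))
```

Picking the RP with the smallest distance is only a stand-in for the full estimator; it is shown here because the distance is the quantity every later step of BE builds on.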
Finally, we calculate the likelihood based on the distance using the Bayesian likelihood function in (3):

$$p = \exp\left( -\frac{dist^2}{2\sigma^2} \right) \tag{3}$$

where $\sigma$ is the standard deviation that represents noise during fingerprinting. The prior distribution is assumed to be constant, as localization is one-shot rather than tracking. Once the Bayesian likelihood values are obtained, the location of the tracked device is estimated using maximum a posteriori (MAP).

3.2.2. Fingerprint feature extraction

FPFE aims to extract characteristics of beacon fingerprints using AE or PCA, as proposed by Subakti et al. [9]. We begin the fingerprinting process by normalizing fingerprint data as in (1). Subsequently, FPFE extracts features as a dimensionality reduction process. Namely, an initial set of data in a high-dimensional space is projected to a low-dimensional space without losing important information. Fingerprint data for an RP have the shape 4 × 200. They are transformed to the shape 1 × 800.

Autoencoder for FPFE. AE is a type of ANN that encodes higher-dimension input features into a lower-dimension internal representation called the code. An AE model consists of three parts: encoder, code, and decoder. The encoder compresses the input features to generate the code, and the decoder then reconstructs the input from the generated code. In this work, the BLE beacon node RSSI values from an RP are used as input features (1 × 800 dimension). They are encoded as a code (1 × 8 dimension), which in turn is decoded as output features (1 × 800 dimension). Figure 2 illustrates the architecture of the AE used in this work. The AE model is trained with the adaptive moment estimation (Adam) optimizer [22] and the mean squared error (MSE) loss function. The AE model is trained for 1,000 epochs.

Figure 2. The AE structure adopted by the FPFE method [9]

Minkowski distance is then used as the fingerprint similarity measurement to select k RP candidates with the smallest distances. Minkowski distance can be calculated as in (4) and (5):

$$D(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p} \tag{4}$$

$$D(L, M) = \left( \sum_{i=1}^{N} |L(b_i) - M(b_i)|^p \right)^{1/p} \tag{5}$$

where $p$ is the order of the Minkowski distance ($p = 2$ gives the Euclidean distance). In the FPFE method using AE feature extraction, the p value is 8, because the feature extraction output of the AE is 8 features. By calculating the Minkowski distance between the features of the TD and those of all RPs, the k RPs with the k smallest Minkowski distances are selected. They are called RP candidates, and their positions are used to estimate the TD's position.

The tracked device location is finally calculated by averaging the coordinates of the k selected RP candidates. The TD's position $(x, y)$ is calculated simply as the centroid of the k RP candidates as in (6), where $(x_i, y_i)$ is the coordinate of the selected $RP_i$ among the k smallest distances.

$$(x, y) = \frac{1}{k} \sum_{i=1}^{k} (x_i, y_i) \tag{6}$$
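The candidate selection and centroid step in (4) to (6) can be illustrated with a short Python sketch. It assumes the feature extraction (AE or PCA) has already been applied, so each RP and the TD are represented by a short feature vector. The function names, RP labels, feature values, and coordinates are hypothetical and chosen only to make the example easy to follow.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length vectors, as in (4)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def estimate_position(td_features, rp_features, rp_coords, k=3, p=2):
    """Select the k RPs closest to the TD in feature space and return their centroid, as in (6)."""
    ranked = sorted(rp_features, key=lambda rp: minkowski(td_features, rp_features[rp], p))
    candidates = ranked[:k]
    x = sum(rp_coords[rp][0] for rp in candidates) / len(candidates)
    y = sum(rp_coords[rp][1] for rp in candidates) / len(candidates)
    return (x, y), candidates

# Hypothetical extracted features (2 per RP here for brevity) and RP coordinates in meters.
rp_features = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [0.0, 0.1], "D": [1.0, 1.0]}
rp_coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (5.0, 5.0)}

position, candidates = estimate_position([0.03, 0.03], rp_features, rp_coords, k=3)
print(position, candidates)
```

Because the estimate is a centroid of RP coordinates, it can fall between grid points, which is one way a fingerprinting method can report errors smaller than the 1 m RP spacing.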
3.3. Machine learning models

As an alternative to fingerprinting, we investigate machine learning methods for TD localization. We utilize machine learning techniques designed for regression tasks that involve distance calculation, along with some techniques based on ensemble learning. As the localization outcome is greatly influenced by preprocessing, we meticulously preprocess the input data through the multiple phases outlined below.

- Conversion of signal strength to milliwatts. The received signal strength is quantified in decibel-milliwatts (dBm). dBm is a logarithmic unit that expresses the ratio of P, the power being measured in mW, to the reference power of one milliwatt (1 mW), as shown in (7).

$$P(\mathrm{dBm}) = 10 \cdot \log_{10} \frac{P(\mathrm{mW})}{1\,\mathrm{mW}} \tag{7}$$

In approaches based on machine learning, we convert dBm to a linear value in P(mW), as shown in (8), to mitigate the issue of machine learning models that are sensitive to the scale of features.

$$P(\mathrm{mW}) = 1\,\mathrm{mW} \cdot 10^{P(\mathrm{dBm})/10} \tag{8}$$

- Min-max scaling. We transform the signal strength into the range [0, 1] by subtracting the minimum value of the RSSI and dividing by the range between the maximum and minimum RSSI. This normalization aids machine learning algorithms that rely on distance calculations, promoting a more equitable influence of each feature in the subsequent analyses.

- Splitting train and test data. A significant portion of the data, 80%, is allocated as the training set, while the remaining portion is reserved for testing. This approach helps guard against overfitting, where the model becomes too tailored to the training data and fails to generalize effectively.

3.3.1. Support vector regressor

This technique is an extension of the SVM algorithm for regression problems. SVR works by finding a hyperplane that best fits the data within a defined tube (margin) while allowing for a certain level of error [23].
It leverages the kernel trick to map the data into a higher-dimensional space, making it possible to capture complex relationships in the data. We use a polynomial kernel with degree 3, ε = 0.1, and regularization parameter C = 100.

3.3.2. K-neighbors regressor

K-neighbors regressor is an instance-based learning algorithm, also known as lazy learning. K-nearest neighbors (k-NN) algorithms make predictions based on the similarity of instances in the training data. We use the Euclidean distance. For regression tasks, the predicted value for the target data point is often the mean (or median) of the target values of its k nearest neighbors. In this work, we use k = 3.

3.3.3. Random forest regressor

This technique is an ensemble learning algorithm that operates by constructing many decision trees during training and outputs the mean prediction of the individual trees [24]. In our work, the number of trees is 100. Each tree is built using a different bootstrap sample drawn from the original dataset during the training process. Each sample is drawn randomly with replacement, and the number of samples drawn equals the size of the dataset. For each bootstrap sample and at each node of each decision tree, the algorithm selects the best split among a randomly chosen subset of features. To measure the quality of a split, we use MSE. The tree continues to split until a stopping criterion is met.

3.3.4. Gradient boosting

Gradient boosting (GBoost) works by sequentially adding weak learners (i.e., regression trees), each focusing on correcting the errors of the previous model [25]. In this study, we use 100 estimators with a maximum depth of 15. It starts by training the first tree on the data and updating weights based on prediction errors. We use the MSE function to calculate errors during training. Subsequent trees are trained to focus on the examples that earlier trees predicted poorly, with a learning rate of 0.1.
The nal prediction is made by aggre g ating the predictions of all the trees. This approach gradually learns comple x patterns by combining multiple simple trees. Indonesian J Elec Eng & Comp Sci, V ol. 37, No. 3, March 2025: 2021–2031 Evaluation Warning : The document was created with Spire.PDF for Python.
3.4. Performance measures

We calculate the localization error by computing the Euclidean distance between the predicted position $(\hat{x}, \hat{y})$ and the known position of the RP $(x, y)$. We then calculate the mean and standard deviation of the error. We also report the error at the 95% point of the cumulative distribution function (CDF 95%). Finally, we generate and plot a CDF for a normal distribution based on the calculated mean and standard deviation to compare the performance of the approaches.

4. RESULTS AND DISCUSSION

4.1. Ideal scenario

In an ideal condition, each tracked device should have been used for data collection in building the fingerprint database. In other words, the fingerprinting process needs to be repeated whenever a new tracked device is to be used in the localization. While this is ideal, repeatedly collecting fingerprints requires much effort. This scenario accommodates such a condition; we tune models on one part of the collected data and test with the other part. In this scenario, we use about 7,089 instances from each mobile phone (i.e., 80% of the collected dataset) for fingerprinting or training machine learning models (M = 21,268 from 3 phones). We test on about 1,773 instances collected from each mobile phone (L = 5,318 from 3 phones).

When the fingerprinting data comes from 3 mobile phones, the best results are achieved using FPFE, reaching an average error of 0.50 m, as shown in Table 1. Using BE, the mean error is 1.636 m. There are several possible explanations for this result. First, FPFE has successfully extracted features from fingerprints collected by three different mobile phones. This leads to lower localization error with FPFE. Second, BE struggles to generalize the collected data, particularly when constructing a single BE model from a dataset gathered using three different devices.
The challenges lie in estimating parameters like the prior distribution and likelihood function, which characterize the data distribution. This limitation results in suboptimal outcomes and imprecise inferences.

As shown in Table 1, the worst localization results from SVR, with a mean localization error of 2.057 m. It is probable that the polynomial kernel with degree 3 does not learn the training data well or generalize toward the testing data. The other machine learning techniques tend to perform better than SVR on localization. The ensemble models, i.e., RF and GBoost, result in mean localization errors of 0.847 m to 0.909 m, while the K-neighbors regressor provides a slightly lower mean error of 0.829 m but a higher CDF 95%. Among the considered machine learning methods, RF is more likely to perform better in the test, as 95% of its localization errors are below 2.488 m. RF slightly outperforms GBoost, for which 95% of localization errors are less than 2.637 m. RF builds multiple decision trees in parallel that learn the same data. The best features are selected from these trees and combined to produce predictions. In GBoost, another ensemble learning method, trees are instead built sequentially. The most recent tree model will be used to predict the output. In scenarios like localization, where signal data is subject to fluctuations from factors like signal fading and device diversity, RF slightly outperforms GBoost, primarily owing to its utilization of the most effective features rather than relying solely on the latest model.

Table 1. Localization error in the ideal scenario (i.e., 3 tracked devices, each has been fingerprinted), in m

| Method | Mean | Std. Dev. | CDF 95% |
|---|---|---|---|
| BE [7] | 1.636 | 1.437 | 4.000 |
| FPFE [9] | 0.502 | 0.815 | 1.844 |
| SVR | 2.057 | 0.988 | 3.683 |
| K-neighbors regressor | 0.829 | 1.220 | 2.836 |
| RF | 0.847 | 0.997 | 2.488 |
| GBoost | 0.909 | 1.051 | 2.637 |

4.2. Real-world scenario

In a real-world scenario, we operate under the assumption that gathering fingerprint data from every mobile phone is infeasible. Therefore, we can rely only on a fingerprint dataset collected from a single mobile phone. We use M = 7,089 instances collected from one mobile phone as fingerprint data, and test on L = 19,497 instances (from three mobile phones, i.e., 1,773; 8,862; and 8,862 instances, respectively).
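The instance counts quoted for this scenario can be cross-checked with simple arithmetic. The sketch below assumes that the 211 samples reported in Section 3.1 are taken per phone at each of the 42 RPs, which is the reading under which the reported totals add up. The variable names are illustrative.

```python
# Assumption: 211 samples per phone at each of 42 RPs, collected with 3 phones.
samples_per_rp_per_phone = 211
reference_points = 42
phones = 3

per_phone = samples_per_rp_per_phone * reference_points   # 8,862 instances per phone
total = per_phone * phones                                 # 26,586 instances overall

train_single = 7_089                                       # reported fingerprint size (about 80% of one phone)
test_same_phone = per_phone - train_single                 # 1,773 held-out instances from the collector
test_other_phones = (phones - 1) * per_phone               # 17,724 instances from the two other phones
test_total = test_same_phone + test_other_phones           # 19,497 test instances

print(per_phone, total, test_same_phone, test_total)
```

Under this reading, roughly nine out of ten test instances come from phones that contributed no fingerprints, so the scenario mainly measures cross-device generalization.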
Table 2 shows the optimal outcomes obtained with BE, with a mean error as low as 1.785 m depending on the device utilized as the fingerprint collector. It is probable that BE can assume a distribution from signals collected by one mobile phone collector. This distribution appears to resemble the dataset distribution from other mobile phones. Conversely, FPFE, which excelled in the earlier scenario (i.e., the ideal scenario), now yields the poorest localization, with a mean error of about 3.2 m. This outcome could be attributed to the fact that FPFE extracts features exclusively from a particular phone. The extracted features are not applicable to datasets from other mobile phones.

Examining machine learning techniques, SVR demonstrates relatively superior performance compared to ensemble methods and the K-neighbors regressor, as shown in Table 2. A plausible reason is that SVR operates on hyperplanes that effectively fit the data. When trained with RSSI data exclusively collected from one phone, SVR can learn its patterns and apply them to new datasets collected from other mobile phones.

Table 2. Localization error in the real-world scenario (i.e., 3 tracked devices, 1 fingerprint collector), in m

| Method | Metric | Fingerprint collector: Phone1 | Phone2 | Phone3 |
|---|---|---|---|---|
| BE [7] | Mean err. | 1.965 | 1.785 | 1.956 |
| | Std. Dev. | 1.557 | 1.582 | 1.450 |
| | CDF 95% | 4.526 | 4.386 | 4.341 |
| FPFE [9] | Mean err. | 3.048 | 3.257 | 3.283 |
| | Std. Dev. | 1.440 | 1.584 | 1.582 |
| | CDF 95% | 5.417 | 5.863 | 5.885 |
| SVR | Mean err. | 2.118 | 1.965 | 2.239 |
| | Std. Dev. | 1.018 | 0.982 | 0.982 |
| | CDF 95% | 3.792 | 3.579 | 3.855 |
| K-neighbors regressor | Mean err. | 2.711 | 2.439 | 2.294 |
| | Std. Dev. | 1.687 | 1.780 | 1.275 |
| | CDF 95% | 5.485 | 5.366 | 4.391 |
| RF | Mean err. | 2.527 | 2.787 | 2.335 |
| | Std. Dev. | 1.492 | 1.577 | 1.270 |
| | CDF 95% | 4.981 | 5.381 | 4.425 |
| GBoost | Mean err. | 2.383 | 1.907 | 2.007 |
| | Std. Dev. | 1.262 | 1.294 | 1.238 |
| | CDF 95% | 4.458 | 4.035 | 4.044 |

To better compare localization performance, we have summarized and plotted the cumulative distribution function in Figure 3. What is particularly noticeable in this figure is the pattern of localization accuracy across different scenarios. Notably, the localization accuracy in the ideal scenario in Figure 3(a) is generally superior to that in the real-world scenario in Figure 3(b), as indicated by CDF curves concentrated at lower localization errors. This observation suggests that localization performance is significantly influenced by the availability and quality of the training data or fingerprint database.

Figure 3. CDF of localization error: (a) ideal scenario and (b) real-world scenario
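Section 3.4 states that the CDF is generated for a normal distribution from the mean and standard deviation of the error. Under that assumption, the CDF 95% column can be reproduced as the 95th percentile of a normal distribution, and the Table 1 values appear consistent with this reading to within rounding. The sketch below uses Python's standard `statistics.NormalDist`. The dictionary and function names are illustrative, and treating a non-negative error as normally distributed is an approximation.

```python
from statistics import NormalDist

# (mean, std. dev., reported CDF 95%) in meters, copied from Table 1.
table1 = {
    "BE": (1.636, 1.437, 4.000),
    "FPFE": (0.502, 0.815, 1.844),
    "SVR": (2.057, 0.988, 3.683),
    "K-neighbors regressor": (0.829, 1.220, 2.836),
    "RF": (0.847, 0.997, 2.488),
    "GBoost": (0.909, 1.051, 2.637),
}

def error_at_95(mean, std):
    """Error value below which 95% of a normal distribution with this mean and std falls."""
    return NormalDist(mean, std).inv_cdf(0.95)

def share_below(mean, std, threshold):
    """Fraction of the normal distribution that lies below the given error threshold."""
    return NormalDist(mean, std).cdf(threshold)

for method, (mean, std, reported) in table1.items():
    print(f"{method}: computed {error_at_95(mean, std):.3f} m, reported {reported:.3f} m")
```

The same `NormalDist` object can be evaluated on a grid of error values with `cdf` to draw curves like those in Figure 3.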
5. CONCLUSION
One common challenge encountered in the development of ILS is ensuring accuracy and compatibility across a range of devices. To address this issue, researchers have explored multiple methodologies to enhance localization performance while accommodating device diversity. However, these studies often employ disparate datasets and setups to construct their individual proofs of concept. In our investigation, we address this gap by evaluating several promising methodologies within a unified test environment. Specifically, we compare two fingerprinting algorithms: one based on feature extraction (referred to as FPFE) and another based on probability theory (referred to as BE). Furthermore, we assess various machine learning approaches, including SVR, which utilizes geometric boundaries (hyperplanes), ensemble learning, and instance-based learning. The effectiveness of these methods depends on factors like the availability of training data or the fingerprint database, as explored through the Ideal scenario and Real-world scenario. Our findings indicate that outcomes can vary based on the specific conditions. Therefore, in this study, we propose methods suitable for particular circumstances.

In the context of three fingerprint collectors (Ideal scenario), the system administrator has the option to gather fingerprint data through users' devices. When such a scenario occurs, wherein the algorithm is provided with all possible data variations, it is advisable to employ FPFE. FPFE demonstrates superior performance compared to alternative algorithms, achieving a mean error of 0.50 m. This indicates that FPFE has the capability to effectively extract features from fingerprint data. While FPFE excels in fingerprinting, the performance of fingerprinting with BE is subpar.
BE struggles to achieve generality in estimating common parameters from fingerprint data collected across diverse devices.

FPFE, based on AE as outlined in this study, demands substantial resources for feature extraction. In situations where computational resources are limited, opting for machine learning techniques becomes a viable alternative. Machine learning-based approaches provide adaptability and scalability by harnessing algorithmic power to deduce locations. We observe that ensemble learning, exemplified by RF and GBoost, outperforms other machine learning techniques. Specifically, RF benefits from random feature selection across multiple trees.

In the case of one fingerprint collector (Real-world scenario), data collection using other devices is limited. In other words, the system administrator may collect fingerprint data for localization using one or a few devices (but not all devices). When only such limited fingerprint data is available, BE and SVR lead the performance, according to our experiment. This indicates that BE may set parameters that describe the distribution of data collected using a single mobile phone. Such a model is then applicable to other users' devices that share the same characteristics. Machine learning techniques may also be an option for localization when collecting new fingerprints is limited. When only a batch of data from a single source (i.e., a single mobile phone) with a specific characteristic is available, SVR might be chosen. SVR performs better due to its ability to set a hyperplane using the data from one source.

In our future work, we intend to assess an asymmetric distribution of transmission power among the beacons. An asymmetric configuration might enhance performance as a measure to counteract the multipath fading effect. Furthermore, there is potential to create a single BE model for each fingerprint collector.
These individual BE models could then be employed and collectively contribute to the final decision. Contrary to this approach, in our current study, a single generic BE model was constructed using fingerprint data from three distinct devices.

ACKNOWLEDGEMENT
The authors acknowledge the funding support from the Doctoral Competency Improvement Program Universitas Gadjah Mada Number 7743/UN1.P.II/Dit-Lit/PT.01.03/2023. The authors thank M. Nauval Rai and Rasyid Aulia Alba for their assistance in the data collection process.

REFERENCES
[1] F. Dwiyasa and M. H. Lim, “A survey of problems and approaches in wireless-based indoor positioning,” in 2016 International Conference on Indoor Positioning and Indoor Navigation (IPIN), Oct. 2016, pp. 1–7, doi: 10.1109/IPIN.2016.7743591.
[2] E. Jimenez and R. Wei, “Indoor localization of ubiquitous heterogeneous devices,” in Proceedings of the 2013 IEEE 17th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Jun. 2013, pp. 698–703, doi: 10.1109/CSCWD.2013.6581045.
[3] A. A. Khudhair, S. Q. Jabbar, M. Q. Sulttan, and D. Wang, “Wireless indoor localization systems and techniques: survey and comparative study,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 3, no. 2, pp. 392–409, Aug. 2016, doi: 10.11591/ijeecs.v3.i2.pp392-409.
[4] F. Potortì et al., “Comparing the performance of indoor localization systems through the EvAAL framework,” Sensors, vol. 17, no. 10, p. 2327, Oct. 2017, doi: 10.3390/s17102327.
[5] Y. Ibnatta, M. Khaldoun, and M. Sadik, “Exposure and evaluation of different indoor localization systems,” in Lecture Notes in Networks and Systems, vol. 216, 2022, pp. 731–742.
[6] Y. Zhuang, J. Yang, Y. Li, L. Qi, and N. El-Sheimy, “Smartphone-based indoor localization with bluetooth low energy beacons,” Sensors, vol. 16, no. 5, p. 596, Apr. 2016, doi: 10.3390/s16050596.
[7] R. Faragher and R. Harle, “Location fingerprinting with bluetooth low energy beacons,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 11, pp. 2418–2428, 2015.
[8] L. Ferrer, “Analysis and comparison of classification metrics,” arXiv, 2022, [Online]. Available: http://arxiv.org/abs/2209.05355.
[9] H. Subakti, H.-S. Liang, and J.-R. Jiang, “Indoor localization with fingerprint feature extraction,” in 2020 IEEE Eurasia Conference on IOT, Communication and Engineering (ECICE), Oct. 2020, pp. 239–242, doi: 10.1109/ECICE50847.2020.9301994.
[10] T.-M. T. Dinh, N.-S. Duong, and K. Sandrasegaran, “Smartphone-based indoor positioning using BLE iBeacon and reliable lightweight fingerprint map,” IEEE Sensors Journal, vol. 20, no. 17, pp. 10283–10294, Sep. 2020, doi: 10.1109/JSEN.2020.2989411.
[11] M. Li, L. Zhao, D. Tan, and X. Tong, “BLE fingerprint indoor localization algorithm based on eight-neighborhood template matching,” Sensors (Switzerland), vol. 19, no. 22, 2019, doi: 10.3390/s19224859.
[12] S. Subedi, H. S. Gang, N. Y. Ko, S. S. Hwang, and J. Y. Pyun, “Improving indoor fingerprinting positioning with affinity propagation clustering and weighted centroid fingerprint,” IEEE Access, vol. 7, pp. 31738–31750, 2019, doi: 10.1109/ACCESS.2019.2902564.
[13] P. Martins, M. Abbasi, F. Sa, J. Celiclio, F. Morgado, and F. Caldeira, “Intelligent beacon location and fingerprinting,” Procedia Computer Science, vol. 151, pp. 9–16, 2019, doi: 10.1016/j.procs.2019.04.005.
[14] Y. Zhuang, C. Zhang, J. Huai, Y. Li, L. Chen, and R. Chen, “Bluetooth localization technology: principles, applications, and future trends,” IEEE Internet of Things Journal, vol. 9, no. 23, pp. 23506–23524, Dec. 2022, doi: 10.1109/JIOT.2022.3203414.
[15] A. R. Pratama, A. Lazovik, and M. Aiello, “Office multi-occupancy detection using BLE beacons and power meters,” in 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), Oct. 2019, pp. 0440–0448, doi: 10.1109/UEMCON47517.2019.8993008.
[16] A. Nessa, B. Adhikari, F. Hussain, and X. N. Fernando, “A survey of machine learning for indoor positioning,” IEEE Access, vol. 8, pp. 214945–214965, 2020, doi: 10.1109/ACCESS.2020.3039271.
[17] L. Bai, F. Ciravegna, R. Bond, and M. Mulvenna, “A low cost indoor positioning system using bluetooth low energy,” IEEE Access, vol. 8, pp. 136858–136871, 2020, doi: 10.1109/ACCESS.2020.3012342.
[18] M. W. P. Maduranga and R. Abeysekera, “Bluetooth low energy (BLE) and feed forward neural network (FFNN) based indoor positioning for location-based IoT applications,” International Journal of Wireless and Microwave Technologies, vol. 12, no. 2, pp. 33–39, Apr. 2022, doi: 10.5815/ijwmt.2022.02.03.
[19] P. Sthapit, H.-S. Gang, and J.-Y. Pyun, “Bluetooth based indoor positioning using machine learning algorithms,” in 2018 IEEE International Conference on Consumer Electronics - Asia (ICCE-Asia), Jun. 2018, pp. 206–212, doi: 10.1109/ICCE-ASIA.2018.8552138.
[20] I. Alexander and G. P. Kusuma, “Predicting indoor position using bluetooth low energy and machine learning,” International Journal of Scientific and Technology Research, vol. 8, no. 9, pp. 1661–1667, 2019.
[21] A. H. Elhusseiny, M. Zamzam, and Y. Zaghloul, “Precision localization: an experimental study on BLE fingerprinting and trilateration with ESP32,” in 2023 International Conference on Advances in Electronics, Communication, Computing and Intelligent Information Systems (ICAECIS), Apr. 2023, pp. 60–65, doi: 10.1109/ICAECIS58353.2023.10170249.
[22] J. L. Ba and D. P. Kingma, “Adam: a method for stochastic optimization,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–15, 2015.
[23] C.-C. Chang and C.-J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, Apr. 2011, doi: 10.1145/1961189.1961199.
[24] L. Breiman, “Random forest,” Machine Learning, vol. 45, pp. 5–32, 2001, doi: 10.1023/a:1010933404324.
[25] T. Hastie, R. Tibshirani, and J. H. Friedman, “The elements of statistical learning: data mining, inference and prediction,” Mathematical Intelligencer, vol. 27, no. 2, pp. 83–85, 2005, doi: 10.1007/BF02985802.

BIOGRAPHIES OF AUTHORS
Azkario Rizky Pratama completed the Ph.D. degree in Computer Science from the University of Groningen, The Netherlands, in 2020. He serves as an Assistant Professor in the Department of Electrical and Information Engineering at Universitas Gadjah Mada, Indonesia. His main research interests include ambient intelligence, context-awareness, and mobile computing. He can be contacted at email: azkario@ugm.ac.id.