TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 23, No. 6, December 2025, pp. 1729-1742
ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v23i6.27500

Object detection and tracking with decoupled DeepSORT based on α-β filter

Lakhdar Djelloul Mazouz, Abdessamad Kaddour Trea, Tarek Amiour, Abdelaziz Ouamri
Image and Signal Laboratory (LSI), University of Sciences and Technology of Oran (USTO-MB), Oran, Algeria

Article Info

Article history:
Received Aug 25, 2025
Revised Oct 5, 2025
Accepted Oct 19, 2025

Keywords:
Deep learning
DeepSORT
Higher order tracking accuracy
Object detection
Object tracking
Video surveillance

ABSTRACT

With the rapid growth of the population, the demand for autonomous video surveillance systems has substantially increased. Recently, artificial intelligence has played a key role in the development of these systems. In this paper, we present an enhanced autonomous system for object detection and tracking in video streams, tailored for transportation and video surveillance applications. The system comprises two main stages. Detection stage: this stage employs you only look once (YOLO)v8m, trained on the KITTI dataset, and is configured to detect only pedestrians and cars. The model achieves an average precision of 97.3% and 87.1% for the car and pedestrian classes respectively, resulting in a final mean average precision (mAP) of 92.2%. Tracking stage: the tracking component utilizes the DeepSORT algorithm, which originally incorporates a Kalman filter for motion prediction and performs data association using cosine and Mahalanobis distances to maintain consistent object identifiers across frames. To improve tracking performance, we introduce two key modifications to the original DeepSORT: architecture modification and Kalman filter replacement. The tracking tests are carried out on the KITTI and MOTChallenge benchmarks. The final higher order tracking accuracy (HOTA) scores reach 77.645 and 54.019 for the car and pedestrian classes respectively on the KITTI benchmark, and 45.436 for the pedestrian class on the MOTChallenge benchmark.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Lakhdar Djelloul Mazouz
Signals and Image Laboratory (LSI), University of Sciences and Technology of Oran (USTO-MB)
Bir El Djir 31000, Oran, Algeria
Email: lakhdar.djelloul@univ-usto.dz

1. INTRODUCTION
Real-time object detection is a critical component in applications such as autonomous vehicles, robotics, and video surveillance. Among the leading algorithms, you only look once (YOLO) stands out for its optimal balance between speed and accuracy, enabling efficient object recognition in static images. Since its introduction, several variants of YOLO have been developed, each improving upon its predecessor to enhance performance and address previous limitations. To enhance public safety and minimize risk, we have developed an intelligent, autonomous video surveillance system capable of real-time detection and tracking of pedestrians and vehicles. The system operates in two stages: object detection using the YOLO algorithm, followed by tracking with an optimized version of DeepSORT. During recent years, extensive research has been conducted in the field of object detection, leading to the development of various techniques.
These techniques can be broadly categorized into two groups: traditional techniques, based on color [1]-[3], texture [2], morphology [4], edge detection [3], and classical machine learning [5], [6]; and advanced techniques using deep learning and artificial intelligence [7]-[12].
In this work, we focus on the detection and tracking of two classes of objects: pedestrians and cars. We train our YOLO model on the KITTI dataset, which achieves a high mAP score. We first apply object detection, using YOLOv8, to identify pedestrians and vehicles. Then, for tracking, we employ DeepSORT, an enhanced version of the simple online real-time tracking (SORT) algorithm. Rather than using a single DeepSORT instance for all object classes, we implement a decoupled approach, assigning a dedicated DeepSORT tracker to each class. This reduces inter-class confusion and minimizes ID switches. To optimize computational efficiency, we replace the traditional Kalman filter with a simpler and faster α-β filter.

This paper is organized as follows. Section 2 presents a comprehensive theoretical basis: we first present the dataset used, followed by an introduction to the YOLOv8 detection algorithm and a description of how it was trained to enhance its performance; we then present the DeepSORT tracker along with the two modifications we applied to improve its tracking capabilities. Section 3 is devoted to a detailed description of the steps of the proposed method. Section 4 presents the different metrics employed to assess the performance of the detection algorithm and the various DeepSORT-based trackers. Section 5 is primarily dedicated to reporting the experimental results used to evaluate the performance of the detection algorithm and three versions of the DeepSORT tracker (the original and the two modified versions). Finally, section 6 concludes the paper.

2. THE COMPREHENSIVE THEORETICAL BASIS
2.1. Dataset
YOLOv8 was initially trained on the common objects in context (COCO) dataset, which contains approximately 330000 annotated images spanning 80 object categories. However, our application focuses on developing an autonomous video surveillance system dedicated to monitoring pedestrians and vehicles, that is to say two object classes: person and car. This narrower scope necessitates the use of a more targeted dataset. Consequently, we opted for the KITTI dataset, which includes only 8 object categories, among them the two classes relevant to our study: pedestrian and car [13], [14]. For object detection training, we utilized the KITTI object detection dataset, comprising 7481 training images and 7518 test images, with a total of 80256 labeled instances. All images are in color and stored in PNG format. For object tracking evaluation, we employed the KITTI tracking dataset [15]. It consists of 21 training sequences and 29 test sequences, totaling 8008 color images in PNG format. Among these, 31 sequences comprise images with a resolution of 1242 × 375 pixels, while the remaining sequences contain images with similar dimensions (i.e., 12xx × 37x). To ensure efficient inference and alignment with benchmark standards, our system is configured to detect and track only two object classes: car and person. Notably, the KITTI benchmark restricts its evaluation to these specific categories, using the TrackEval-Master Python source codes [16]. To generalize our application, another benchmark evaluation is used (MOTChallenge [17]).
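Since YOLOv8 consumes annotations in its own normalized format, training on KITTI requires a label conversion step. The sketch below illustrates one way this is typically done; the class indices and the fixed 1242 × 375 image size are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical helper: convert KITTI object labels to YOLO format for the two
# classes used here. KITTI stores one object per line, with the 2D box given as
# pixel corners (left, top, right, bottom) in columns 4-7; YOLO expects
# "class x_center y_center width height", all normalized to the image size.
CLASS_MAP = {"Pedestrian": 0, "Car": 1}  # assumed class indices

def kitti_to_yolo(label_path, img_w=1242, img_h=375):
    yolo_lines = []
    with open(label_path) as f:
        for line in f:
            fields = line.split()
            if fields[0] not in CLASS_MAP:
                continue  # skip Van, Truck, DontCare, ...
            left, top, right, bottom = map(float, fields[4:8])
            xc = (left + right) / 2 / img_w          # normalized box center
            yc = (top + bottom) / 2 / img_h
            w = (right - left) / img_w               # normalized box size
            h = (bottom - top) / img_h
            yolo_lines.append(f"{CLASS_MAP[fields[0]]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return yolo_lines
```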
2.2. Object detection
As previously mentioned, this study employs the YOLO algorithm for object detection. YOLO is a single-stage detector that partitions the input image into a grid of equally sized cells, typically of dimension (N × N). Each cell is responsible for predicting the presence and location of objects within its boundaries. In this work, we utilize the YOLOv8m model, released by Ultralytics in January 2023. The YOLOv8 family comprises five variants designed for tasks such as object detection, segmentation, and classification: YOLOv8n (Nano), YOLOv8s (Small), YOLOv8m (Medium), YOLOv8l (Large), and YOLOv8x (Extra large) [18], [19]. YOLOv8n is the most lightweight and fastest, making it suitable for real-time applications with limited computational resources. In contrast, YOLOv8x offers the highest accuracy, albeit with increased computational cost and inference time.
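As an illustration of how such a detector is typically invoked through the Ultralytics API, consider the minimal sketch below; the weight file name, input frame name, and confidence threshold are assumptions for illustration.

```python
# Minimal inference sketch with the Ultralytics API. "kitti_yolov8m.pt" is a
# hypothetical name for weights fine-tuned on KITTI; adjust to your setup.
from ultralytics import YOLO

model = YOLO("kitti_yolov8m.pt")
results = model.predict("frame_000001.png", imgsz=640, conf=0.25)

for box in results[0].boxes:
    cls_id = int(box.cls)                  # 0 = pedestrian, 1 = car (assumed order)
    x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
    score = float(box.conf)
    print(cls_id, score, (x1, y1, x2, y2))
```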
2.3. Object tracking
Online object tracking systems commonly rely on the SORT algorithm, which consists of four principal components: detection, estimation, data association, and identity management (track creation and termination). Despite its efficiency, SORT faces significant limitations in maintaining consistent object identifiers (IDs), particularly when objects reappear following occlusion. In such cases, the algorithm often assigns a new ID, treating the reappearing object as entirely new. To address this limitation, the DeepSORT algorithm introduces a more robust tracking mechanism by incorporating a deep appearance-based association metric. This enhancement enables the tracker to consider visual features of the object, thereby improving identity preservation across occlusions. The architecture of the DeepSORT algorithm is illustrated in Figure 1.

Figure 1. DeepSORT architecture [20]

The DeepSORT algorithm maintains consistent object identifiers by evaluating a combined distance metric. An object is assigned the same ID if the following condition is satisfied [21]-[23]:

\lambda \, d_1(i,j) + (1-\lambda) \, d_2(i,j) \leq \text{Threshold} \qquad (1)

In this formulation, d_1(i,j) represents the cosine distance, while d_2(i,j) denotes the Mahalanobis distance. The variables i and j correspond to the coordinates of the object's center, and λ ∈ [0,1] is a weighting factor that balances the contribution of the appearance-based and motion-based metrics. This criterion enables the algorithm to associate reappearing objects with their original identities, thereby improving tracking robustness in scenarios involving occlusion.

3. METHOD
In order to enhance the performance of DeepSORT, two critical modifications were incorporated into its original framework.

3.1. Architecture modification
To reduce identity confusion between distinct object classes, we propose to modify the architecture of the DeepSORT tracking algorithm. Specifically, the modification involves deploying separate DeepSORT instances for each class: one dedicated to pedestrians and the other to vehicles (e.g., cars). These trackers operate concurrently and independently, allowing for class-specific identity management and reducing cross-class mis-association. The final tracking output is obtained by merging the results from both trackers, thereby preserving class integrity throughout the tracking process. The proposed dual-tracker architecture is illustrated in Figure 2. It will be referred to in the following as the decoupled DeepSORT algorithm. A minimal code sketch of this decoupling follows Figure 2.

Figure 2. Decoupled DeepSORT architecture
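To make the dual-tracker idea concrete, here is a minimal sketch built on the open-source deep-sort-realtime package; the package choice, its update_tracks() interface, and the class names are assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative sketch of the decoupled architecture: one DeepSORT instance per
# class, updated independently, with the outputs merged per frame.
# Detections come from the YOLO stage as
# ([left, top, width, height], confidence, class_name) tuples.
from deep_sort_realtime.deepsort_tracker import DeepSort

trackers = {"pedestrian": DeepSort(max_age=60), "car": DeepSort(max_age=60)}

def track_frame(detections, frame):
    merged = []
    for cls, tracker in trackers.items():
        # Each tracker only ever sees detections of its own class.
        dets = [d for d in detections if d[2] == cls]
        for track in tracker.update_tracks(dets, frame=frame):
            if track.is_confirmed():
                merged.append((cls, track.track_id, track.to_ltrb()))
    return merged
```

Because each class keeps its own identity space, a pedestrian ID can never be handed over to a car track, which is exactly the cross-class confusion the decoupling removes.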
3.2. Kalman filter replacement
The second modification that we made to the DeepSORT algorithm involves replacing the Kalman filter with a simplified and computationally efficient alternative: the α-β filter. This is a fixed-coefficient filter, which may be viewed as a second-order steady-state Kalman filter. It was originally designed for target tracking in the radar field. Compared to the Kalman filter, the α-β filter offers several advantages in the context of real-time object tracking [24]:
- Simplified prediction and update mechanisms: unlike the Kalman filter, which dynamically computes gain values based on the innovation covariance matrix in each frame, the α-β filter utilizes fixed gain parameters, α for position and β for velocity, resulting in more straightforward computations, as shown in (2) and (3).
- Reduced computational complexity: the α-β filter updates the state vector and associated parameters without requiring matrix inversion, thereby significantly lowering the computational burden compared to the Kalman filter, as shown in (4) and (5).
- Comparable performance in simple tracking scenarios: despite its simplicity, the α-β filter achieves tracking accuracy similar to that of the Kalman filter in scenarios with limited noise and linear motion.

The state vector at time step (frame) k encompasses the object's bounding box parameters [X_k, Y_k, W_k, H_k] and their velocities [Ẋ_k, Ẏ_k, Ẇ_k, Ḣ_k], with [X_k, Y_k] denoting the object center coordinates, and [W_k, H_k] the width and height of the bounding box. The operational steps of the α-β filter are outlined below. To simplify, we give the equations only for the x component; similar equations apply to the remaining components. Let X^e_k, X^p_k and X^m_k denote, respectively, the estimate, the prediction and the measurement (provided by the detection stage) of X_k.
- Initialization: the X_k component of the state vector is initialized with the X coordinate of the center of the first (k = 0) bounding box, provided by the detection stage. The Ẋ_k component of the state vector is initialized with zero.
- Prediction:

X^p_k = X^e_{k-1} + T \, \dot{X}^e_{k-1} \qquad (2)

\dot{X}^p_k = \dot{X}^e_{k-1} \qquad (3)

where T represents the frame period.
- Update:

X^e_k = X^p_k + \alpha_x \, (X^m_k - X^p_k) \qquad (4)

\dot{X}^e_k = \dot{X}^p_k + \frac{\beta_x}{T} \, (X^m_k - X^p_k) \qquad (5)

where α_x and β_x are the fixed coefficients of the filter, relative to the component X_k of the state vector. The selection of these coefficients depends on the system's dynamics: the higher these coefficients are, the more responsive the filter is. A minimal code sketch of this recursion is given below.
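To make the recursion in (2)-(5) concrete, the following is a minimal sketch of the filter for a single state component; the variable names mirror the equations, and the frame period T is an assumption (e.g., 1/30 s at 30 fps).

```python
# Minimal alpha-beta filter for one state component, mirroring (2)-(5).
class AlphaBetaFilter:
    def __init__(self, x0, alpha=0.2, beta=0.022, T=1 / 30):
        self.x = x0      # position estimate X^e, initialized from first detection
        self.v = 0.0     # velocity estimate, initialized to zero
        self.alpha, self.beta, self.T = alpha, beta, T

    def step(self, x_measured):
        # Prediction, (2) and (3): constant-velocity extrapolation.
        x_pred = self.x + self.T * self.v
        v_pred = self.v
        # Update, (4) and (5): fixed-gain correction toward the measurement.
        residual = x_measured - x_pred
        self.x = x_pred + self.alpha * residual
        self.v = v_pred + (self.beta / self.T) * residual
        return self.x
```

Note that no matrices are inverted anywhere, which is precisely the computational advantage over the Kalman filter's per-frame gain computation.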
4. METRICS
4.1. Object detection metrics
To evaluate the suitability of various object detection models for a given application, several performance metrics are commonly employed. These include recall, precision, intersection over union (IoU), F1 score, average precision (AP), and mean average precision (mAP) [25].

4.1.1. Average precision
We begin by defining two very important metrics for detector evaluation: precision and recall. These two metrics are given by:

\text{Precision} = \frac{TP}{TP + FP} \qquad (6)

\text{Recall} = \frac{TP}{TP + FN} \qquad (7)

where:
- TP = true positives
- FP = false positives
- FN = false negatives

For a single class, the AP is computed as the area under the precision-recall curve:

AP = \int_0^1 P(r) \, dr \qquad (8)

where P(r) is the precision as a function of recall. In practice, the average precision for class i may be approximated using:

AP_i = \sum_{j=0}^{k-1} \left[ \text{Recall}(i,j) - \text{Recall}(i,j-1) \right] \times \text{Precision}(i,j) \qquad (9)

where Recall(i,j) and Precision(i,j) are the recall and precision of class i, evaluated using the j-th threshold, and k is the number of thresholds. The AP may be used to assess the performance of detection and localization algorithms; the higher it is, the more efficient the algorithm. It corresponds to the area under the precision-recall curve, and it may be estimated using the (precision, recall) pairs obtained for several confidence thresholds.

4.1.2. Mean average precision
The mAP is another important performance metric, mainly used for evaluating machine learning models. It is defined as the average of the average precisions of the different detected classes, and is calculated through (10):

mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \qquad (10)

where N indicates the number of classes and AP_i is the average precision of class i.

4.1.3. F1 score
The F1 score is a performance metric used in classification and detection tasks. It represents the harmonic mean of precision and recall, thus balancing both metrics into a single value:

F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (11)

This metric is particularly useful when the dataset is imbalanced, as it considers both false positives and false negatives.
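As a hypothetical worked example of (6), (7), and (11), the short snippet below computes the three scores from invented counts.

```python
# Hypothetical worked example for (6), (7) and (11); the counts are invented.
TP, FP, FN = 80, 20, 10

precision = TP / (TP + FP)                          # 80/100 = 0.800
recall = TP / (TP + FN)                             # 80/90  ≈ 0.889
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.842

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```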
4.2. Object tracking metrics [26]
The classification of events, activities, and relationships (CLEAR) workshop defined a common, unified framework for evaluating multi-object tracking (MOT) algorithms, known as the CLEAR MOT metrics, which places the multiple object tracking accuracy (MOTA) metric as the primary metric for tracking evaluation, although it has been criticized for favoring detection over association. In recent years, the most commonly used benchmarks for evaluating multi-object tracking algorithms are MOTChallenge and KITTI; the main metrics used in these benchmarks are MOTA, IDF1, and higher order tracking accuracy (HOTA).

4.2.1. DetA
Detection accuracy (DetA) measures how well the tracker detects objects, independent of identity preservation. The formula for its computation is:

DetA = \frac{TP}{TP + FP + FN} \qquad (12)

4.2.2. AssA
Association accuracy (AssA) evaluates how well the tracker maintains object identities across frames. It is computed by:

AssA = \frac{TPA}{TPA + FPA + FNA} \qquad (13)

where:
- TPA = true positive associations
- FPA = false positive associations
- FNA = false negative associations

4.2.3. Identification F1 score (IDF1)
IDF1 evaluates the accuracy of identity preservation in tracking. It is the harmonic mean of identity precision and identity recall:

IDF1 = \frac{2 \times IDTP}{2 \times IDTP + IDFP + IDFN} \qquad (14)

where:
- IDTP = identity true positives
- IDFP = identity false positives
- IDFN = identity false negatives

4.2.4. LocA
Localization accuracy (LocA) measures the average spatial alignment of correctly detected objects using IoU. The formula for its computation is:

LocA = \frac{1}{|TP|} \sum_{c \in TP} \text{Loc-IoU}(c) \qquad (15)

where Loc-IoU(c) represents the intersection over union of true positive candidate c.

4.2.5. Multiple object tracking accuracy
MOTA evaluates overall tracking performance by penalizing false positives, missed detections, and identity switches. It reflects how well the tracker maintains object presence and identity. Its formula is:

MOTA = 1 - \frac{\sum_k (FP_k + FN_k + IDSW_k)}{\sum_k GT_k} \qquad (16)

where:
- k = frame index
- IDSW = identity switches
- GT_k = number of ground-truth detections in frame k

4.2.6. Multiple object tracking precision
MOTP measures the average localization precision of matched object detections, based on the spatial overlap (IoU) between predicted and ground-truth bounding boxes. It can be computed by:

MOTP = \frac{\sum_{k,i} IoU_{k,i}}{\sum_k c_k} \qquad (17)

where IoU_{k,i} represents the bounding box overlap of object i at time k, and c_k is the number of matches in frame k.
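A minimal sketch of how (16) aggregates per-frame error counts into a single MOTA score is given below; the per-frame tuples are invented for illustration.

```python
# MOTA per (16): one global ratio over summed per-frame errors, not a
# per-frame average. Each tuple holds (FP, FN, IDSW, GT) for one frame;
# the numbers are invented for illustration.
frames = [(1, 2, 0, 10), (0, 1, 1, 12), (2, 0, 0, 11)]

fp = sum(f[0] for f in frames)
fn = sum(f[1] for f in frames)
idsw = sum(f[2] for f in frames)
gt = sum(f[3] for f in frames)

mota = 1 - (fp + fn + idsw) / gt   # 1 - 7/33 ≈ 0.788
print(f"MOTA = {mota:.3f}")
```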
4.2.7. HOTA
HOTA is a more recent metric that jointly evaluates detection, association, and localization accuracy. It represents the geometric mean of the detection and association accuracies:

HOTA_\alpha = \sqrt{DetA_\alpha \times AssA_\alpha} = \sqrt{\frac{\sum_{c \in TP} \text{AssIoU}(c)}{|TP_\alpha| + |FN_\alpha| + |FP_\alpha|}} \qquad (18)

In this formula, α represents the different IoU thresholds used to compute the metric. A generalized version of HOTA_α, denoted HOTA, is computed over a range of thresholds α ∈ [0,1]:

HOTA = \int_0^1 HOTA_\alpha \, d\alpha \approx \frac{1}{19} \sum_{\alpha = 0.05}^{0.95} HOTA_\alpha \qquad (19)

HOTA is a scalar metric that summarizes the overall tracking performance of a system by averaging the HOTA scores across a range of IoU thresholds. It captures the balance between detection accuracy, association accuracy, and localization precision, making it one of the most comprehensive metrics for evaluating multi-object tracking. Unlike traditional metrics that focus heavily on either detection (MOTA) or identity preservation (IDF1), HOTA integrates all three aspects (detection, association, and localization) into a unified score that reflects performance across varying spatial tolerances.

5. EXPERIMENTAL RESULTS
5.1. Object detection
The KITTI dataset was used for the evaluation of the employed object detection method. It was split into three folders, with a ratio of 80:10:10 for training, validation, and test, respectively.

The hardware configuration used for training is:
- Graphics processing unit (GPU): NVIDIA GeForce RTX 3060.
- Central processing unit (CPU): 10th Gen Intel Core(TM) i5-10400, 2.9 GHz (12 CPUs).
- Memory: 64 GB.

The software configuration used in training:
- Python version 3.11.9, and Visual Studio Code version 1.102.3.

Training hyperparameters:
- Epochs = 50, imgsz = 640, batch = 16, learning rate = 0.01, and 60 fps.

The mean average precision (mAP) obtained from training is presented in Figure 3.

Figure 3. Average precision: (a) KITTI dataset and (b) COCO dataset

Figure 3 shows the comparative mAP results, revealing a clear advantage of our KITTI-trained model, Figure 3(a), over the COCO reference, Figure 3(b), with a significant improvement in the score: 4.2% for the pedestrian class and 26% for the car class.
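With the hyperparameters listed above, the training run would look roughly as follows with the Ultralytics API; the dataset YAML name is an assumption about the local setup.

```python
# Sketch of the training configuration described above, assuming a KITTI
# dataset YAML ("kitti.yaml") with the two classes pedestrian and car.
from ultralytics import YOLO

model = YOLO("yolov8m.pt")         # start from the COCO-pretrained medium model
model.train(
    data="kitti.yaml",             # hypothetical dataset config (paths + names)
    epochs=50,
    imgsz=640,
    batch=16,
    lr0=0.01,                      # initial learning rate
)
metrics = model.val()              # reports per-class AP and mAP on the val split
```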
5.2. Object tracking results
Once a high-performance object detector has been obtained, it must be integrated with a tracking algorithm to develop a fully autonomous video surveillance system. For the tracking component, we selected DeepSORT, one of the most widely adopted algorithms in this domain, due to its proven effectiveness in tracking moving objects within video scenes. As previously noted, two modifications were introduced to enhance DeepSORT's performance. To evaluate the impact of these improvements, we compare the tracking metrics obtained using the modified versions against those from the original implementation. The tracking hyperparameters used in both configurations are detailed below:
- max_cosine_distance = 0.3: enforces stricter appearance matching.
- nn_budget = 100: specifies the number of appearance features to cache.
- max_iou_distance = 0.7: sets the bounding box overlap threshold.
- max_age = 60: determines the number of frames a track is retained without updates.
- n_init = 3: indicates the number of detections required to confirm a track.

The α value was chosen empirically from the interval [0,1]. The optimal β value was chosen according to the Benedict-Bordner rule [24], where:

\beta = \frac{\alpha^2}{2 - \alpha} \qquad (20)

After several tests, the (α, β) pair that gave the best results in terms of metrics, and that was chosen to validate the experimental results of our application, is (0.2, 0.022). The real-time performance specifications are: FPS: 30, GPU load: 15%, and latency: 37 ms. A quick numeric check of (20) is given below.
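```python
# Benedict-Bordner rule (20): beta derived from the empirically chosen alpha.
alpha = 0.2
beta = alpha**2 / (2 - alpha)
print(round(beta, 3))  # 0.022, matching the (0.2, 0.022) pair used in the tests
```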
Table 1 and Figure 4 present the performance metrics obtained with the original DeepSORT implementation. According to Table 1 and Figure 4, the scores obtained for the MOTA and HOTA (Figure 4(a)) metrics were 68.359 and 0.68 respectively, with 434 ID switches for the car class, while for the pedestrian class, the MOTA and HOTA (Figure 4(b)) scores were 42.474 and 0.42 respectively, with 510 ID switches.

Table 1. Original DeepSORT metrics
Class       MOTA    MOTP    TP     FN    FP    IDSW  Dets   GT Dets  IDs  GT IDs
Car         68.359  68.007  20026  4044  1234  434   21260  24070    938  564
Pedestrian  42.474  44.682  8103   3006  3393  510   11496  11109    472  167

Figure 4. Original DeepSORT metrics: (a) car and (b) pedestrian

Table 2 and Figure 5 show the different metrics obtained from the implementation of the decoupled DeepSORT.

Table 2. Decoupled DeepSORT metrics
Class       MOTA    MOTP    TP     FN    FP    IDSW        Dets   GT Dets  IDs  GT IDs
Car         88.238  86.772  22432  1638  947   381 (-53)   23379  24070    823  564
Pedestrian  67.954  79.448  8959   2150  1072  370 (-140)  10031  11109    308  167

Figure 5. Decoupled DeepSORT metrics: (a) car and (b) pedestrian

Compared with the original DeepSORT metrics, the MOTA and HOTA (Figure 5(a)) scores improved significantly, by +19.879 (88.238) and +0.07 (0.75) respectively, with a reduction of 53 ID switches (381) for the car class. For the pedestrian class, the MOTA and HOTA (Figure 5(b)) scores show an improvement of +25.48 and +0.1 respectively, and a reduction of 140 ID switches, owing to the decoupled DeepSORT, which solves this problem by using two parallel architectures, making it impossible to confuse the YOLO classes.

Table 3 and Figure 6 show the different metrics obtained from the implementation of the α-β filter based decoupled DeepSORT algorithm. Compared with the original DeepSORT metrics, the MOTA and HOTA (Figure 6(a)) scores improved significantly, by +22.023 and +0.1 respectively, with a reduction of 188 ID switches for the car class. For the pedestrian class, the MOTA and HOTA (Figure 6(b)) scores show an improvement of +27.073 and +0.12 respectively, and a reduction of 172 ID switches.

Table 3. Decoupled DeepSORT based α-β filter metrics
Class       MOTA    MOTP    TP     FN    FP   IDSW        Dets   GT Dets  IDs  GT IDs
Car         90.382  87.085  22630  1440  494  246 (-188)  23124  24070    745  564
Pedestrian  69.547  79.434  8931   2178  835  338 (-172)  9766   11109    256  167

Tables 4 and 5 present a comparison of the metrics obtained with the 3 versions of the DeepSORT algorithm (the original one and the 2 modified versions), for the pedestrian and car classes. As shown in the comparison, all evaluation metrics are improved by the decoupled DeepSORT algorithm compared to the original version. These metrics are further enhanced when the α-β filter is integrated into the decoupled DeepSORT architecture.

In order to generalize the results of our application, we repeated the tests on another evaluation benchmark (the MOTChallenge benchmark [17]). The MOTChallenge focuses on pedestrian tracking only. The results obtained are presented in Figure 7, where Figure 7(a) represents the metrics of the original DeepSORT, Figure 7(b) the metrics obtained from the decoupled DeepSORT, and Figure 7(c) the metrics obtained from the decoupled DeepSORT based on the α-β filter. According to Figure 7 and Tables 6 and 7, we can conclude that the results are consistent with those obtained from the KITTI benchmark [15]. Figure 8 shows some tracking results using the original DeepSORT, before and after occlusion, from the KITTI object tracking evaluation dataset [16].
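Putting the pieces together, the complete per-frame loop that produces tracks like those scored above can be sketched as follows; this is an illustrative reconstruction under the same assumptions as the earlier sketches (hypothetical file names, the deep-sort-realtime API), not the authors' exact code.

```python
# End-to-end sketch: YOLOv8m detections feed the two class-specific trackers
# frame by frame, using the tracking hyperparameters listed in section 5.2.
import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("kitti_yolov8m.pt")                     # hypothetical weights
names = {0: "pedestrian", 1: "car"}                  # assumed class order
trackers = {c: DeepSort(max_age=60, n_init=3, max_cosine_distance=0.3,
                        nn_budget=100, max_iou_distance=0.7)
            for c in names.values()}

cap = cv2.VideoCapture("kitti_seq_0000.mp4")         # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    dets = {c: [] for c in names.values()}
    for box in model.predict(frame, imgsz=640, verbose=False)[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        cls = names[int(box.cls)]
        dets[cls].append(([x1, y1, x2 - x1, y2 - y1], float(box.conf), cls))
    for cls, tracker in trackers.items():            # decoupled update per class
        for t in tracker.update_tracks(dets[cls], frame=frame):
            if t.is_confirmed():
                print(cls, t.track_id, t.to_ltrb())
cap.release()
```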
Figure 6. Metrics of the α-β filter based decoupled DeepSORT: (a) cars and (b) pedestrians

Table 4. Comparison of the metrics obtained with the 3 versions of the DeepSORT algorithm, for the car class
Metric (%)  Original DeepSORT  Decoupled DeepSORT  Decoupled DeepSORT based α-β filter
HOTA        68.359             74.767 (+6.408)     77.645 (+9.286)
DetA        68.007             77.032 (+9.025)     79.296 (+11.289)
AssA        69.093             73.074 (+3.981)     76.502 (+7.409)
DetRe       73.712             82.754 (+9.042)     83.742 (+10.03)
DetPr       83.455             85.2 (+1.745)       87.168 (+3.713)
AssRe       73.694             77.82 (+4.126)      80.209 (+6.515)
AssPr       85.374             86.728 (+1.354)     88.342 (+2.998)
LocA        87.966             88.099 (+0.133)     88.385 (+0.419)

Table 5. Comparison of the metrics obtained with the 3 versions of the DeepSORT algorithm, for the pedestrian class
Metric (%)  Original DeepSORT  Decoupled DeepSORT  Decoupled DeepSORT based α-β filter
HOTA        42.474             52.506 (+10.032)    54.019 (+11.545)
DetA        44.682             58.606 (+13.924)    59.142 (+14.46)
AssA        40.576             47.297 (+6.721)     49.589 (+9.013)
DetRe       59.805             66.285 (+6.48)      65.835 (+6.03)
DetPr       57.792             73.408 (+15.616)    74.888 (+17.096)
AssRe       46.445             54.441 (+7.996)     56.331 (+9.886)
AssPr       65.463             68.092 (+2.629)     68.928 (+3.465)
LocA        81.742             82.042 (+0.3)       82.067 (+0.325)

Table 6. MOTChallenge benchmark's metrics with the 3 versions of DeepSORT
Algorithm                      MOTA    MOTP    TP     FN    FP     IDSW        Dets   GT Dets  IDs  GT IDs
Original DeepSORT              33.171  76.761  31767  8138  18009  512         49776  39905    794  500
Decoupled DeepSORT             34.399  76.746  31682  8223  17681  274 (-238)  49363  39905    873  500
Decoupled DeepSORT based α-β   34.399  76.746  31682  8223  17681  274 (-238)  49363  39905    873  500

In the original DeepSORT algorithm, both classes are tracked simultaneously by the same DeepSORT instance. Occlusion can cause confusion between the two classes, i.e., a pedestrian can be classified as a car and vice versa. This confusion is translated into an ID switch. As shown in Figure 8, the system confuses their