International Journal of Informatics and Communication Technology (IJ-ICT)
Vol. 15, No. 1, March 2026, pp. 120-137
ISSN: 2252-8776, DOI: 10.11591/ijict.v15i1.pp120-137

A comparative analysis of PoS tagging tools for Hindi and Marathi

Pratik Narayanrao Kalamkar, Prasadu Peddi, Yogesh Kumar Sharma
Department of Computer Science and Engineering, Shri Jagdishprasad Jhabarmal Tibrewala University, Jhunjhunu, India

Article Info

Article history:
Received Oct 4, 2024
Revised Jul 11, 2025
Accepted Oct 7, 2025

Keywords:
Computational linguistics
Machine learning
Natural language processing
Part-of-speech tagging
Text analytics
Tokenization

ABSTRACT
Many tools exist for performing part-of-speech (PoS) tagging of Hindi and Marathi data. Still, no standard benchmark or performance evaluation data exists for these tools to help researchers choose the best one for their needs. This paper presents a performance comparison of different PoS taggers and widely available trained models for these two languages. We used datasets of different granularity to compare the performance and precision of these tools against the Stanford PoS tagger. Since the tag sets used by these PoS taggers differ, we propose a mapping between the different PoS tag sets to address this inherent challenge in tagger comparison. We tested our proposed PoS tag mappings on newly created Hindi and Marathi movie script and subtitle datasets, since movie scripts differ from other corpora in how they are formatted and structured. We survey and compare five part-of-speech taggers, viz. the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, CDAC Hindi PoS tagger, LTRC Marathi PoS tagger, and CDAC Marathi PoS tagger.
This comparison also helps us evaluate how the Bureau of Indian Standards' (BIS) tag set for Indian languages compares to the Universal Dependencies (UD) PoS tag set, as no studies have been conducted before to evaluate this aspect.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Pratik N. Kalamkar
Department of CSE, Shri Jagdishprasad Jhabarmal Tibrewala University
Jhunjhunu, Rajasthan, India
Email: pratik2kcn@gmail.com

1. INTRODUCTION
In the field of natural language processing (NLP), part-of-speech (PoS) tagging is an important task that significantly impacts later linguistic processing stages. It involves marking each word, punctuation mark, and symbol in a sentence with its corresponding PoS grammatical category (such as noun, verb, or adjective) based on its meaning and context. PoS tagging provides structural and contextual information that helps in understanding the text accurately in the later stages of syntactic parsing, named entity recognition (NER), chunking, and semantic analysis. PoS tagging is particularly challenging for morphologically rich and varied languages spoken in India, like Hindi and Marathi, where words undergo complex inflections based on gender, number, and case. The task becomes even more complex due to the limited availability of ready-made annotated corpora and linguistic resources for these languages compared to widely studied languages like English [1]. PoS tagging is also harder than it looks in this setting because of code-switching, colloquialism, and a lack of linguistic resources for this kind of data [2]. Code-switching occurs when text switches from one language to another. Colloquialism refers to informal expressions, slang, and non-standard grammatical constructs that often differ from formal language norms.
The lack of linguistic resources, including scarce or non-existent PoS-tagged datasets for less-documented languages or dialects, significantly limits the supervised training options available for developing accurate models.

Journal homepage: http://ijict.iaescore.com

This paper presents a comprehensive comparative performance and accuracy evaluation of several PoS taggers developed for the Hindi and Marathi languages. The focus is on their performance across various dimensions, such as accuracy, speed, robustness, and the ability to handle different levels of PoS granularity. The taggers selected for examination are widely available and popular tools, including the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, CDAC Hindi PoS tagger, LTRC IIIT Marathi PoS tagger, and CDAC Marathi PoS tagger. These taggers adopt a diverse range of methodologies, from rule-based systems to machine learning (ML)-based models, thereby providing a comprehensive view of the research landscape.

A central aspect of this study is the evaluation of the taggers across different levels of PoS granularity, particularly when mapped to the Universal Dependencies (UD) PoS tag set. PoS tagging systems mostly use language-specific tag sets, whose level of granularity (e.g., differentiating between proper nouns and common nouns, or grouping them under a general "noun" category) varies from tagger to tagger. By mapping the outputs of the different taggers to the common UD tag set, we can perform a more uniform and meaningful comparison across taggers, even if their original tag sets differ in complexity, detail, or number of tags. This analysis allows us to assess the taggers' flexibility in adjusting to different levels of linguistic abstraction, which is especially important in cross-linguistic and multilingual NLP applications.
To further underscore the practical application of the research, we test these PoS taggers on movie scripts and subtitles, which present unique challenges not typically found in traditional, grammatically formal corpora. Movie scripts and subtitles contain informal dialogues, scene descriptions, quoted text, and action lines, which may pose additional difficulties for PoS taggers designed with more structured text in mind [3]. We use a movie script dataset for the Hindi PoS tagger evaluation; it consists of scripts of 100 different Hindi movies released in 2018. For the Marathi PoS tagger evaluation, we use a subtitle dataset consisting of 100 subtitle files belonging to 100 films released in 2010.

This paper aligns with the goals of advancing NLP techniques for underrepresented languages, specifically Hindi and Marathi, by providing a comparative analysis of existing PoS tagging tools. It bridges the gap between academic research and practical applications by focusing on linguistically rich and computationally challenging datasets like movie scripts and subtitles. This alignment ensures that the study contributes both to the theoretical understanding and the practical enhancement of PoS tagging performance for Indian languages.

2. DATA PREPARATION AND PRE-PROCESSING
The following data pre-processing steps were followed to create the dataset of Hindi movie scripts.
Step 1: web crawling: websites such as filmcompanion.in and scribd.com were crawled using tools such as Open Web Scraper and Scrapy to collect Hindi movie scripts.
Step 2: formatting issues: scripts were found in various formats, including Devanagari, Phonotonic Hindi, English, PDFs, and scanned copies, due to the lack of uniform scriptwriting standards in the Hindi movie industry.
Step 3: manual typing: scanned copies of scripts were manually typed into Devanagari Hindi text.
Step 4: spell checking: spell checks were performed on the English and Hindi electronic texts using Microsoft Word macros to create a cleaner dataset.
Step 5: language conversion: scripts in Phonotonic Hindi and English were converted into Devanagari Hindi using Google Translate.
Step 6: UTF-8 conversion: all scripts were converted to Unicode UTF-8 format to ensure compatibility with the PoS taggers being evaluated.
Step 7: text standardization: regular expressions were used in Notepad++ to arrange the scripts in a uniform text format. This includes steps like removing unwanted characters and page numbers and converting each sentence to a new line, as shown in Figure 1.
Step 8: dataset overview: the final dataset consists of 100 Hindi movie scripts across various genres since 2018, containing a total of 170,744 lines and 1,029,826 words, stored as UTF-8 formatted text files in Devanagari Hindi.
Step 9: Python script for analysis: a Python script was written to calculate the total number of lines and words. Lines were counted by reading each file line by line, while words were counted by splitting each line on whitespace.
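The counting described in Step 9 can be sketched as a short Python function; the folder layout and the function name below are illustrative assumptions, not the authors' actual script.

```python
from pathlib import Path

def count_lines_and_words(folder: str) -> tuple:
    """Count lines and whitespace-separated words across all UTF-8 .txt files."""
    total_lines = total_words = 0
    for path in sorted(Path(folder).glob("*.txt")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                total_lines += 1
                total_words += len(line.split())  # split on any whitespace
    return total_lines, total_words
```

Splitting on whitespace matches the word-counting convention described in Step 9; a tokenizer-based count would give different totals.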
For the Marathi-language movies dataset, it was even more challenging to obtain data, as no significant readily available scripts were present. Hence, in Step 1, we gathered subtitles (.srt) of Marathi movies in different languages and manually translated them into Marathi using Google Translate, later verified by language experts. Language-expert verification further reduces the chances of errors after automatic translation. One hundred Marathi movies of different genres released since 2010 were selected. Lahoti et al. [4] extensively surveyed Marathi-language NLP tasks, especially resources, tools, and state-of-the-art techniques. However, this survey showed a lack of research on Marathi subtitles. Later, in Step 2, timestamps were removed from the subtitle files. To remove timestamps, we processed the text using regex (regular expressions) in Python, removing lines from the subtitle files that contain only numbers and symbols like : (colon) and , (comma). We are left with only dialogues, sound-effect words, time cues, background-noise words, and non-verbal communication. We kept these words to improve the richness of our subtitles, making them more like a script. In Step 3, since the original text was in subtitle format, sentences (dialogues) were spread across different lines. To combine multiple lines of the same sentence into a single line, we used regex on the text files. The regex in Notepad++ combines sentences split across multiple lines into a single line by matching termination symbols, viz. . (dot), ? (question mark), and ! (exclamation mark), which are used in Marathi just as in English. After pre-processing, we obtained a dataset of 100 different Marathi movie subtitles in the form of 100 UTF-8 text files with Marathi text in Devanagari, as illustrated in Figure 2.
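Steps 2 and 3 of the subtitle pre-processing (dropping SRT cue numbers and timestamp lines, then joining sentence fragments until a terminator) can be sketched as below; the regular expression and function name are illustrative assumptions, not the exact script used by the authors.

```python
import re

# Lines that are either a bare cue number or an SRT timestamp range
TIMESTAMP = re.compile(r"^\d+$|^\d{2}:\d{2}:\d{2},\d{3}\s*-->")

def clean_srt(text: str) -> str:
    # Step 2: keep only dialogue lines (skip cue indices and timestamps)
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not TIMESTAMP.match(ln.strip())]
    # Step 3: merge lines until a terminator (. ? !) ends the sentence
    merged, buffer = [], ""
    for ln in lines:
        buffer = (buffer + " " + ln).strip() if buffer else ln
        if buffer.endswith((".", "?", "!")):
            merged.append(buffer)
            buffer = ""
    if buffer:  # keep a trailing fragment without a terminator
        merged.append(buffer)
    return "\n".join(merged)
```

This mirrors the paper's choice of sentence terminators, which behave the same in Marathi as in English.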
These files contain a total of 153,565 lines and 851,377 words. The numbers of lines and words were counted using a Python script, just as for the Hindi script files.

Figure 1. Sample snippet of Hindi movie script

Figure 2. Sample snippet of Marathi movie script

3. RELATED WORK
Various studies have compared different PoS tagging techniques across domains, each emphasizing the performance of diverse algorithms. One of the earliest studies in PoS tagger comparison, by Kumawat and Jain [5], discusses various developments in PoS taggers and PoS tag sets for Indian languages; they apply trigram and hidden Markov model (HMM) methods to Hindi text to measure the accuracy of the different available PoS taggers. Chiplunkar et al. [1] conducted a comparative evaluation of PoS taggers using HMMs across multilingual corpora consisting of Hindi and Marathi. Their study demonstrated the impact of linguistic diversity on tagging accuracy, emphasizing the varying performance of each model in different linguistic settings. Horsmann et al. [6] conducted a comparative analysis of 22 PoS tagger models for English and German, sourced from nine different implementations. By evaluating these models on a variety of corpora spanning different domains, they simulate a "black-box" scenario in which researchers select a PoS tagger based on factors such as popularity or ease of use and subsequently apply it to diverse types of text. This approach focuses on assessing the performance of the taggers across various text types. Jacobsen et al. [7] investigate the trade-off between model size and performance in ML-based NLP, proposing methods to compare the two. Their case study on part-of-speech tagging across eight languages identifies classical taggers as optimal in balancing size and performance against deep models. However, such studies do not cover the Hindi and Marathi languages.
Specic to Indian languages, T alukdar and Sarma (2023) [8] traces the e v olution of automatic PoS tagging for Indo-Aryan languages from dictionary-based and rule-based methods to ML and deep learning (DL) models. Their re vie w highlights the superior performance of ML and DL-based taggers, with reported accuracies reaching up to 97%, and emphasizes the role of customized DL models and pre-processing methods in enhancing performance [8]. In their w ork on PoS tagging for the Khasi language, the authors emplo y the conditional random eld (CRF) method, achie ving a testing accurac y of 92.12% and an F1-score of 0.91. Int J Inf & Commun T echnol, V ol. 15, No. 1, March 2026: 120–137 Evaluation Warning : The document was created with Spire.PDF for Python.
This study contributes to the development of a Khasi PoS corpus, which is crucial for building lemmatizers and supporting various NLP applications for the language [9]. PoS tagging plays a pivotal role in many NLP applications. Still, while English has well-established taggers, no such resources exist for Marathi, a language with complex morphology and regional variations. The research provides a thorough exploration of different PoS tagging models for Marathi, addressing challenges such as ambiguity, inflectional structure, and free word order while proposing solutions to improve tagging accuracy [10]. PoS tagging for languages like Marathi, with complex morphology, presents significant challenges, which have been addressed through rule-based techniques, HMMs, and hybrid models. Despite the promise of ML and DL approaches, limited annotated data remains a key obstacle, as highlighted in recent studies [11]. Talukdar et al. [12] performed a critical review of PoS and Universal PoS (UPoS) tagging in low-resource languages like Hindi.

Computation for Indian Language Technology, Indian Institute of Technology, Bombay has created two taggers: a Hindi PoS tagger and a hybrid PoS tagger. The first is the Hindi PoS tagger, a CRF-based PoS tagger for the Hindi language. This tagger uses the CRF-based open-source toolkit CRF++. It is available only for Hindi. The output file contains the PoS-tagged text in Shakti standard format (SSF).

The LTRC IIIT Hindi shallow parser was developed at the machine translation and NLP lab of the Language Technology Research Center, IIIT-Hyderabad. The center has undertaken work in various sub-areas of NLP, viz. syntax, parsing, semantics, word sense disambiguation, discourse, treebanking, and machine translation. The center has developed a computational Paninian grammar (CPG) framework for Indian languages.
The focus of research in the lab includes computational grammatical models, machine translation, parsing, semantics, and dialogue and discourse analysis. The critical component studied here is the working of the shallow parser for the Hindi and Marathi languages. The shallow parsing consists of a chunker and morphological analysis and uses SSF, as described earlier in the ILMT study. This tool does not provide a direct PoS tagger; it is a shallow parser, but it is provided for both Marathi and Hindi. The shallow parser offers debug and fast modes, in which we can obtain intermediate outputs at various stages, such as tokenization and PoS tagging. We used this feature to obtain PoS tags. Therefore, we refer to it as the LTRC IIIT Hindi/Marathi PoS tagger. The LTRC IIIT Hindi PoS tagger's intermediate output is in WX format (a transliteration format developed by IIT); hence, we have an extra step of converting that output into meaningful UTF-8 encoding. We commented out the steps after the PoS tagger in the parser's source code, as they are not required for analyzing the PoS tagger; this gives us an accurate measurement of the PoS tagger provided by this shallow parser.

The CDAC Hindi and Marathi PoS taggers, developed by the knowledge-based computer systems division of CDAC Mumbai, provide PoS models for the Hindi and Marathi languages, trained with the Stanford log-linear part-of-speech tagger. However, the model, although trained using the Stanford PoS tagger, is trained on a tweet dataset of Hindi mixed with English [2]. The PoS tool provides a Linux shell script that takes an input file and produces an output file with PoS tags. It was developed with pre-processing followed by plain training with the Stanford PoS tagger. Using a maximum entropy method, this PoS tagger learns a log-linear conditional probability model from tagged text. The PoS tag of an input word is then decided by the model based on the context and surrounding tags.
Stanford provides a variety of NLP tools, including a PoS tagger, which is a Java implementation of log-linear part-of-speech taggers [13]. This PoS tagger comes with three trained models for English, two for Chinese, two for Arabic, and one each for French, German, and Spanish. The tool has been distributed with a library named Stanza, which has pre-trained language models for Hindi and Marathi. The Stanford Hindi and Marathi PoS taggers use Universal PoS tags. The English tagger uses the Penn Treebank tag set [14]. An attention-based model using transfer learning achieved 93.86% accuracy on a Hindi disease dataset, demonstrating the potential of domain adaptation for PoS tagging in low-resource domains [15].

This study offers a comprehensive comparative analysis of five PoS taggers for Hindi and Marathi, focusing on performance metrics such as accuracy, robustness, and speed on unique datasets, including Hindi movie scripts and Marathi subtitles. A standardized mapping between tag sets and the UD PoS tag set is proposed, enabling uniform comparisons across taggers. By evaluating these tools on informal, structurally diverse datasets, this work highlights their adaptability to real-world text. Additionally, insights into robustness across token sizes and computational trade-offs are provided, offering practical guidance for low-resource-language NLP. The findings contribute significantly to advancing PoS tagging techniques for Indian languages.
4. METHOD AND RESULTS
We examine how the five aforementioned part-of-speech taggers, viz. the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, CDAC Hindi PoS tagger, LTRC IIIT Marathi PoS tagger, and CDAC Marathi PoS tagger, compare with the Stanford PoS tagger when we use pre-processed scripts from Hindi movies and subtitles from Marathi movies as datasets. We compare the speed, accuracy, and robustness of these PoS taggers with those of the Stanford PoS taggers. The reason for choosing the Stanford PoS tagger as the gold standard is its wide acceptance and its less granular tag set compared with the other PoS taggers; this gives us a common minimum set of tags for comparison.

Speed evaluation: to measure the speed of these PoS taggers, we supplied them with our pre-processed dataset consisting of Hindi movie scripts and Marathi movie subtitles. The Hindi dataset is processed with the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, CDAC Hindi PoS tagger, and Stanford Hindi PoS tagger (Stanza). The Marathi dataset is processed with the LTRC Marathi PoS tagger, CDAC Marathi PoS tagger, and Stanford Marathi PoS tagger (Stanza). Stanza, introduced by Manning et al. [16], supports a wide variety of NLP tasks for Indian languages, including PoS tagging for Hindi and Marathi. Various Linux shell and Python scripts were created to run these PoS taggers on batch files. To measure processing time and memory utilization, we used the /usr/bin/time command. We ran each of the PoS taggers for 10 iterations; this helps eliminate random variation due to several factors, such as background processes, memory management, and CPU scheduling. It also ensures that the final speed measurement is not skewed by an outlier, giving a better measure of the consistency and stability of each PoS tagger.
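The measurement harness can be sketched in Python; the quantities recorded (wall-clock time per run and peak resident set size of the child process) are the same ones the paper collects with /usr/bin/time, and `wc -w` below stands in for the actual tagger command, which is an assumption.

```python
import subprocess
import time
import resource  # Unix-only; provides getrusage for child processes

def benchmark(cmd, iterations=10):
    """Run cmd repeatedly, returning per-iteration wall times and peak RSS."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    # ru_maxrss: peak resident set size of child processes, in kB on Linux
    peak_rss_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return timings, peak_rss_kb
```

Running 10 iterations and keeping all the timings, rather than a single run, supports the averaging and standard-deviation reporting used later in Tables 1 and 2.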
All these speed tests were performed on Ubuntu 20.04 on the same hardware, i.e., a Core i7 processor with 12 GB RAM. Different speed metrics were measured. Processing time is the elapsed wall-clock time taken to finish PoS tagging the given dataset. Memory utilization is measured as the maximum resident set size (RSS), which indicates the peak physical memory used by a process during execution. It is critical for evaluating memory efficiency, identifying potential memory leaks, and optimizing resource allocation. High RSS values may cause performance degradation, especially on systems with limited RAM.

Figure 3 presents a detailed comparison of different metrics for the four Hindi PoS taggers: the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, CDAC Hindi PoS tagger, and Stanford Hindi PoS tagger (Stanza). The metrics include processing time (in seconds), shown in Figure 3(a), tokens per second in Figure 3(b), and memory usage (in kilobytes) in Figure 3(c). Each tagger's performance is evaluated over 10 iterations to assess average processing time, tokens per second, and memory utilization.

Table 1 presents the average performance and standard deviation for the four Hindi PoS taggers evaluated over 10 iterations. The IMLT Hindi rules-based tagger demonstrated the highest token processing speed (1,137.4 tokens/sec) with low memory usage (676,614 kB) and minimal variation, making it the most efficient in terms of speed. In contrast, the LTRC IIIT Hindi tagger showed extremely low speed (9.12 tokens/sec) but used the least memory overall (16,276 kB). The CDAC Hindi tagger offered a balanced performance: a high processing speed (917.9 tokens/sec) and moderate memory usage (415,494 kB), providing a reliable option for PoS tagging tasks.
The Stanford Hindi tagger (Stanza), while exhibiting relatively good speed (71.6 tokens/sec), consumed significantly more memory (4.94 GB), indicating a trade-off between model complexity and resource consumption.

Table 2 summarizes the average performance and standard deviation of the three Marathi PoS taggers across 10 iterations. The LTRC IIIT Marathi tagger recorded the slowest processing speed (1.59 tokens/sec) with the lowest memory usage (247,631 kB), indicating a lightweight but computationally intensive implementation. The CDAC Marathi tagger demonstrated significantly better efficiency, processing 436.3 tokens/sec with moderate memory consumption (431,050 kB), making it the most balanced performer in this group. In contrast, the Stanford Marathi tagger (Stanza) achieved a moderate speed of 28.16 tokens/sec. However, it required over 5.3 GB of memory, highlighting a substantial resource demand that underscores the need for optimization in DL architectures.

Figure 4 presents a detailed comparison of the three Marathi PoS taggers: the LTRC IIIT Marathi PoS tagger, CDAC Marathi PoS tagger, and Stanford Marathi PoS tagger (Stanza) on the 100-movie Marathi dataset. The metrics include processing time (in seconds), shown in Figure 4(a), total tokens generated and tagged, tokens per second in Figure 4(b), and maximum resident set size (in kilobytes) in Figure 4(c). Each tagger's performance is evaluated over 10 iterations to assess average processing time, tokens per second, and memory utilization.
Figure 3. Performance metrics: (a) processing time per iteration, (b) tokens processed per second per iteration, and (c) maximum resident set size for four Hindi PoS taggers

Table 1. Average performance summary with standard deviation for Hindi PoS taggers over 10 iterations
Tagger | Time (s) | Tokens/sec | Max. Res. Memory (kB)
IMLT Hindi rules-based | 926.4 ± 17.6 | 1,137.4 ± 21.7 | 676,614 ± 26,198
LTRC IIIT Hindi | 137,590 ± 2,883 | 9.12 ± 0.19 | 16,276 ± 41
CDAC Hindi | 1,273.2 ± 23.3 | 917.9 ± 16.5 | 415,494 ± 3,874
Stanford Hindi (Stanza) | 17,251.9 ± 367.4 | 71.6 ± 1.5 | 4,943,015 ± 94,292

Table 2. Average performance summary with standard deviation for Marathi PoS taggers over 10 iterations
Tagger | Time (s) | Tokens/sec | Max. Res. Memory (kB)
LTRC IIIT Marathi | 615,433 ± 2,429.8 | 1.59 ± 0.01 | 247,631.2 ± 946.2
CDAC Marathi | 1,951.9 ± 36.5 | 436.3 ± 7.9 | 431,049.6 ± 3,488.7
Stanford Marathi (Stanza) | 38,324.4 ± 286.3 | 28.16 ± 0.21 | 5,314,620.7 ± 8,055.5

There is a difference in the number of tokens processed across the PoS taggers, even though the evaluation is based on the same set of 100 files for Hindi and 100 files for Marathi, because the taggers handle tokenization inconsistently. It must be noted, however, that the dataset (the same number of lines and words) is used for evaluating all of the PoS taggers. Because of these variations in tokenization methods, different PoS taggers generated differing token counts for the same dataset. We chose to retain the original tokenization of each system to preserve its design integrity. Drawing on the approach of Chiche and Yitagesu [17], who conducted a comprehensive analysis of PoS tagging systems comparing various approaches on accuracy and speed, we calculated performance metrics such as accuracy and F1-score based on each tagger's own tokenization.
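The mean ± standard deviation figures reported in Tables 1 and 2 can be reproduced from the per-iteration logs with a few lines of Python; the timing values below are illustrative placeholders, not the paper's measurements.

```python
import statistics

def summarize(values):
    """Mean and sample standard deviation over the recorded iterations."""
    return statistics.mean(values), statistics.stdev(values)

# Illustrative per-iteration timings (seconds) for one tagger
times = [926.4, 930.1, 910.7, 944.2, 921.8, 950.3, 905.6, 928.9, 933.4, 912.6]
mean, sd = summarize(times)
print(f"{mean:.1f} +/- {sd:.1f} s")
```

Using the sample standard deviation (`statistics.stdev`) is one reasonable choice for 10 runs; the paper does not state which estimator it used.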
Figure 4. Performance metrics: (a) processing time per iteration, (b) tokens processed per second per iteration, and (c) maximum resident set size for three Marathi PoS taggers

Accuracy evaluation: as illustrated in Figure 5, the block diagram for measuring the accuracy of the Hindi PoS taggers compares the PoS-tagged output of the IMLT Hindi rules-based PoS tagger, LTRC IIIT Hindi PoS tagger, and CDAC Hindi PoS tagger with the PoS-tagged output of the Stanford Hindi PoS tagger (Stanza). Hence, the Stanford Hindi PoS tagger (Stanza) acts as the benchmark gold standard for evaluating the accuracy of the other three PoS taggers. We chose the Stanford Hindi PoS tagger (Stanza) as our gold standard because it is a widely recognized PoS tagger with a coarser tag set than the other three. This lets us propose a meaningful mapping of tags when comparing the other three PoS taggers with the Stanford Hindi PoS tagger (Stanza).

Proposed mapping: as discussed previously, we propose the following tag mapping between the different Hindi PoS taggers for an accurate comparison.

Figure 5. PoS tagger evaluation workflow

Table 3 shows the mapping of equivalent linguistic categories from the different Hindi PoS taggers to a standardized set. In this case, the Stanford Hindi PoS tagger provides the reference or standard tags. Original tags of the different PoS taggers are replaced with their equivalent tag from the Stanford Hindi PoS tagger for
accuracy measurement. The tags across different taggers use different notations or granular categorizations but often represent the same or similar linguistic categories. For example, ADJ (Stanford) represents adjectives; in the other taggers, this corresponds to tags like JJ or QO (IMLT, LTRC, and CDAC), which also mark adjectives or related categories like quantifiers or ordinals. NOUN (Stanford) represents common nouns; it is mapped to tags like NN and NST in the other taggers, which are also noun-related categories. The purpose of this mapping is to enable a fair analysis of PoS categories across different taggers by associating tags that describe the same parts of speech; this helps us handle granularity differences while computing accuracy. The first three Hindi PoS taggers that we compare with the Stanford Hindi PoS tagger use the tag set published by the Bureau of Indian Standards, "Linguistic resources - PoS tag set for Indian languages: guidelines for designing tag sets and specification" [18]. However, each parser uses a different granularity of tag set derived from this standard. Stanford uses Universal Dependencies PoS tags for both the Hindi and Marathi languages. The first open-source treebank for Marathi, adhering to the UD syntactic annotation scheme, has been developed to provide a standardized resource for syntactic analysis of the language [19]. We propose a mapping, as shown in Table 4, between the different Marathi PoS taggers to enable an accurate comparison with the Stanford Marathi PoS tagger (Stanza).

Table 3. Mapping of different Hindi PoS taggers' tag sets to the Stanford Hindi PoS tagger's tag set
IMLT Hindi | LTRC IIIT Hindi | CDAC Hindi | Stanford Hindi
JJ, QO | JJ, QO, JJC | JJ, QTO | ADJ
PSP | PSP | PSP | ADP
RB, INTF | RB, INTF, RBC | RB, INTF | ADV
VAUX | VAUX | VAUX | AUX
CC | CC, CCC | CCD | CCONJ
DEM, QF, CL | DEM, QF, CL, QFC | DM, DMD, DMR, DMQ, DMI, QT, QTF | DET
INJ | INJ, INJC | INJ | INTJ
NN, NST | NN, NST, NSTC | N, NN, NST | NOUN
QC | QC, QCC | QTC | NUM
RP, NEG | RP, NEG | RP, RPD, NEG | PART
PRP, WQ | PRP, WQ | PR, PRP, PRF, PRL, PRC, PRQ, PRI | PRON
NNP | NNP | NNP | PROPN
SYM | SYM | PUNC | PUNCT
UT | UT | CCS | SCONJ
VM | VM, VMC | V, VM | VERB
RDP, ECH, UNK | RDP, ECH, UNK | RD, RDF, UNK, ECH | X

In the case of the LTRC IIIT Marathi PoS tagger, question words (e.g., "who", "what"), tagged WQ, are matched as either PRON (pronoun) or DET (determiner) in the Stanford Marathi PoS tagger, as shown in Table 4. Similarly, the CC (coordinating conjunction) tag is considered matched as either CCONJ (coordinating conjunction) or SCONJ (subordinating conjunction) in the Stanford Marathi PoS tagger. The question words (WQ) in LTRC are mapped to either PRON or DET in Stanford based on their context, that is, whether they function as pronouns or determiners. Similarly, the CC tag in LTRC can correspond to either CCONJ or SCONJ in Stanford, depending on its syntactic role. Like the other mappings, these reflect differences in granularity between the taggers: the broader LTRC tags are split into finer Stanford classes, with context guiding the exact match.

Despite the various output formats of the different PoS taggers, our standardization process simplifies the data into a common format: UTF-8 text files with two columns. The first column contains the token and the second the corresponding PoS tag of that token, separated by a tab. Each line holds one token-PoS tag pair, as illustrated in Figure 6.
After we have the PoS-tagged dataset from each parser, we replace the original PoS tag given by that parser with the corresponding Stanford PoS tag, as per the earlier tables for Hindi and Marathi PoS-tagged data. This lets us measure accuracy by mapping the tags used by the Hindi and Marathi PoS taggers, which mainly follow the Bureau of Indian Standards PoS tag set for Indian languages, onto the corresponding Stanford PoS tagger's tag set, which is primarily the UD PoS tag set. We have written Python scripts to perform this task.
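The replacement step described above can be sketched roughly as follows. The mapping shown is only a small excerpt of the Hindi table; the dictionary name, function name, and fallback behavior are illustrative assumptions, not the authors' actual script:

```python
# Sketch of the tag-normalization step: each tagger's native (BIS-style) tag
# is replaced with its Stanford (UD) equivalent before accuracy is computed.
# HINDI_TO_STANFORD is a small excerpt of the Hindi mapping table; the
# names here are illustrative, not the paper's actual code.
HINDI_TO_STANFORD = {
    "JJ": "ADJ", "QO": "ADJ",
    "PSP": "ADP",
    "RB": "ADV", "INTF": "ADV",
    "VAUX": "AUX",
    "NN": "NOUN", "NST": "NOUN",
    "NNP": "PROPN",
    "VM": "VERB",
    "SYM": "PUNCT",
}

def normalize_tags(lines, mapping):
    """Map each 'token<TAB>tag' line onto the Stanford tag set.

    Tags absent from the mapping fall back to 'X' (the catch-all class);
    this fallback is an assumption of the sketch."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines between sentences
        token, tag = line.split("\t")
        out.append(f"{token}\t{mapping.get(tag, 'X')}")
    return out
```

Applying `normalize_tags` to every two-column UTF-8 file yields datasets whose tags are directly comparable with the Stanford output.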
For matching purposes, we match a word in each PoS-tagged file with the corresponding word in the Stanford PoS-tagged file. If the two carry the same PoS tag, the word is considered correctly matched. Since different PoS taggers have different tokenization approaches, and hence generate different tokens from the same data, we use a matching algorithm implemented as a Python script to obtain the best possible alignment of words between the test file and the gold-standard file.

Table 4. Mapping of different Marathi PoS taggers' tag sets to the Stanford Marathi PoS tagger's tag set

LTRC IIIT Marathi     | CDAC Marathi                  | Stanford Marathi
JJ, QO                | JJ, QTO                       | ADJ
PSP                   | PSP                           | ADP
RB, INTF              | RB, INTF                      | ADV
VAUX                  | VAUX                          | AUX
CC                    | CCD                           | CCONJ
DEM, QF, WQ           | DM, DMD, DMR, DMQ, QT, QTF    | DET
INJ                   | INJ                           | INTJ
NN, NST               | N, NN, NST                    | NOUN
QC                    | QTC                           | NUM
RP, NEG               | RP, RPD, NEG                  | PART
PRP, WQ               | PR, PRP, PRF, PRL, PRC, PRQ   | PRON
NNP                   | NNP                           | PROPN
SYM                   | PUNC                          | PUNCT
UT, CC                | CCS, UT                       | SCONJ
VM                    | V, VM, VNF                    | VERB
CL, C, RDP, ECH, UNK  | RD, RDF, UNK, ECH             | X

Figure 6. Token-PoS tag pair

The Python code compares PoS-tagged data from two files, one containing test data (the other PoS taggers) and the other containing the gold standard (the Stanford PoS tagger), by first reading both files and extracting word-PoS pairs. It then uses difflib.SequenceMatcher to find the best matching sequences of words between the two datasets. For each matched word pair, it checks whether the PoS tags are the same, recording true positives (correct matches), false positives (incorrect tags assigned in the test data), and false negatives (correct tags missed by the test data). The code calculates each tag's precision, recall, and F1 score and produces overall accuracy metrics for different token size thresholds. Pan and Saha [20] evaluated PoS tagging for Bengali text.
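The alignment-and-counting step just described can be sketched as below. This is our reading of the procedure; the function name is illustrative, and the sketch only scores tokens that SequenceMatcher manages to align (tokens lost to tokenization differences are simply skipped):

```python
import difflib

def score_against_gold(test_pairs, gold_pairs):
    """Align two (token, tag) sequences on their tokens with
    difflib.SequenceMatcher, then tally per-tag counts:
      TP - aligned token carries the same tag in both files;
      FP - aligned token given a wrong tag by the test tagger;
      FN - aligned token whose gold tag the test tagger missed.
    Illustrative sketch, not the authors' actual script."""
    sm = difflib.SequenceMatcher(
        None,
        [w for w, _ in test_pairs],
        [w for w, _ in gold_pairs],
    )
    tp, fp, fn = {}, {}, {}
    for i, j, size in sm.get_matching_blocks():
        for k in range(size):
            test_tag = test_pairs[i + k][1]
            gold_tag = gold_pairs[j + k][1]
            if test_tag == gold_tag:
                tp[gold_tag] = tp.get(gold_tag, 0) + 1
            else:
                fp[test_tag] = fp.get(test_tag, 0) + 1
                fn[gold_tag] = fn.get(gold_tag, 0) + 1
    return tp, fp, fn
```

Feeding the per-tag TP/FP/FN dictionaries into the precision, recall, and F1 formulas then yields the per-tag metrics reported in the results tables.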
This work inspired our approach to calculating each tag's precision, recall, and F1 score and generating overall accuracy metrics for different token size thresholds. The library we use, difflib.SequenceMatcher, is a Python class that compares sequences of any hashable type, typically for string or token matching. It uses a variant of the Ratcliff/Obershelp algorithm, also known as the "gestalt pattern matching" algorithm [21]. This algorithm compares sequences by finding the longest contiguous matching subsequence and recursively applying the process to the unmatched parts. The Ratcliff/Obershelp algorithm is particularly efficient for sequences with large matching blocks, making it suitable for text comparison tasks where most of the text remains unchanged, which is ideal for our datasets. SequenceMatcher is convenient for quick, approximate matching tasks, with high-level methods to compute similarity ratios and generate diffs. It balances efficiency and accuracy, making it a good choice for textual or sequence-matching tasks where order and contiguous blocks matter and where the sequences are largely similar. The support for each PoS tag is calculated by processing files from a gold-standard folder containing token-tag pairs. A Python script increments a counter for each tag encountered in individual files. These counts are subsequently aggregated across all files by a function that consolidates the tag occurrences into a master dictionary. The final output gives the total occurrences (support) of each PoS tag across all files in the gold-standard folder, effectively representing the frequency distribution of PoS tags in the dataset. The results of this analysis are presented in Table 5.
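The support computation described above amounts to folding per-file tag counts into one master dictionary; a minimal sketch, assuming the two-column token<TAB>tag file layout described earlier (the function name and directory handling are illustrative):

```python
from collections import Counter
from pathlib import Path

def tag_support(gold_dir):
    """Count occurrences (support) of each PoS tag across every
    token<TAB>tag file in a gold-standard folder, consolidating the
    per-file counts into one master Counter.
    Illustrative sketch, not the authors' actual script."""
    master = Counter()
    for path in sorted(Path(gold_dir).iterdir()):
        if not path.is_file():
            continue
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:
                    _, tag = line.split("\t")
                    master[tag] += 1  # one increment per token-tag pair
    return master
```

The resulting Counter, sorted by tag, corresponds directly to the support figures reported in Table 5.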
Table 5. Support table for each tag for gold-standard data

PoS tag | Support | PoS tag | Support
NOUN    | 178,722 | CCONJ   | 19,356
SCONJ   | 19,034  | PRON    | 207,328
PART    | 42,892  | VERB    | 152,316
ADP     | 100,208 | AUX     | 116,814
PUNCT   | 231,186 | ADJ     | 52,171
INTJ    | 1,788   | PROPN   | 52,553
ADV     | 18,660  | DET     | 24,182
NUM     | 15,863  | X       | 829

Accuracy analysis of the PoS taggers uses the following metrics.

Precision for each tag t:

Precision_t = TP_t / (TP_t + FP_t)   (1)

Recall for each tag t:

Recall_t = TP_t / (TP_t + FN_t)   (2)

F1 score for each tag t:

F1_t = (2 × Precision_t × Recall_t) / (Precision_t + Recall_t)   (3)

Table 6 demonstrates the IMLT Hindi rule-based PoS tagger's effectiveness across various parts of speech, with exceptionally high precision for categories such as PRON (0.98) and PUNCT (0.99). At the same time, challenges remain with tags like SCONJ and INTJ. Table 7 shows the LTRC IIIT Hindi PoS tagger performs exceptionally well for categories like PRON (precision: 0.98, recall: 0.94, F1: 0.96) and PUNCT (precision: 1.00, F1: 0.99), while challenges persist for tags like SCONJ and INTJ, with lower performance metrics due to fewer correct identifications. Table 8 shows the CDAC Hindi PoS tagger demonstrates high precision for categories such as PUNCT (1.00) and NUM (0.82) while maintaining consistent performance across most tags. However, challenges are noted for tags like SCONJ and INTJ, which show lower F1 scores due to lower precision and recall values. Table 9 shows the LTRC IIIT Marathi PoS tagger performs exceptionally well in recognizing tags like PRON (F1: 0.85), PUNCT (F1: 0.98), and CCONJ (F1: 0.91). However, it struggles with tags like PROPN and INTJ, which have significantly lower F1 scores due to lower precision and recall.
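Equations (1)-(3) translate directly into a small helper; the function name is illustrative, and the zero-denominator fallback to 0.0 reflects how tags with no true positives (such as SCONJ and X for the IMLT tagger) are reported:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-tag precision, recall, and F1 score following equations
    (1)-(3). A zero denominator yields 0.0, matching rows where the
    tagger never produced a correct tag (TP = 0).
    Illustrative sketch, not the authors' actual script."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # eq. (1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # eq. (2)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)            # eq. (3)
    return precision, recall, f1
```

As a check, the NOUN row of Table 6 (TP = 140,263, FN = 13,504, FP = 42,924) gives precision 0.77, recall 0.91, and F1 0.83 under these formulas.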
Table 10 shows the CDAC Marathi PoS tagger performs well in categories like PRON (precision: 0.82, F1: 0.73), NOUN (F1: 0.73), and CCONJ (F1: 0.82), while lower performance is observed for tags such as PROPN and INTJ due to limited precision and recall.

Table 6. Performance metrics for the IMLT Hindi rule-based PoS tagger

PoS tag | TP      | FN     | FP     | Precision | Recall | F1 score
NOUN    | 140,263 | 13,504 | 42,924 | 0.77      | 0.91   | 0.83
CCONJ   | 15,092  | 1,412  | 16,714 | 0.47      | 0.91   | 0.62
SCONJ   | 0       | 16,104 | 0      | 0.00      | 0.00   | 0.00
PRON    | 146,578 | 35,553 | 2,749  | 0.98      | 0.80   | 0.88
PART    | 31,039  | 5,557  | 1,430  | 0.96      | 0.85   | 0.90
VERB    | 111,263 | 17,017 | 34,694 | 0.76      | 0.87   | 0.81
ADP     | 77,684  | 11,407 | 2,521  | 0.97      | 0.87   | 0.92
AUX     | 63,674  | 29,576 | 12,699 | 0.83      | 0.68   | 0.75
PUNCT   | 139,920 | 4,466  | 1,221  | 0.99      | 0.97   | 0.98
ADJ     | 37,138  | 8,856  | 9,321  | 0.80      | 0.81   | 0.80
INTJ    | 279     | 973    | 1,233  | 0.18      | 0.22   | 0.20
PROPN   | 23,184  | 17,463 | 13,444 | 0.63      | 0.57   | 0.60
ADV     | 8,928   | 6,909  | 2,276  | 0.80      | 0.56   | 0.66
DET     | 16,766  | 4,431  | 11,671 | 0.59      | 0.79   | 0.68
NUM     | 11,246  | 878    | 664    | 0.94      | 0.93   | 0.94
X       | 0       | 744    | 0      | 0.00      | 0.00   | 0.00