IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 15, No. 2, April 2026, pp. 1876∼1890
ISSN: 2252-8938, DOI: 10.11591/ijai.v15.i2.pp1876-1890 ❒ 1876
A comparative study of Arabic morphological analyzers

Omar Saadiyeh1, Alaaeddine Ramadan2, Chamseddine Zaki3, Mohamad Hajjar4, Gilles Bernard1
1Paragraphe Research Lab, University of Paris VIII, Paris, France
2College of Engineering and Computing, American University of Bahrain, Riffa, Bahrain
3College of Engineering and Technology, American University of the Middle East, Egaila, Kuwait
4Faculty of Technology, Lebanese University, Saida, Lebanon
Article Info

Article history:
Received Jun 8, 2025
Revised Jan 9, 2026
Accepted Jan 25, 2026

Keywords:
Arabic dialects processing
Arabic linguistics
Arabic natural language processing
Language learning
Morphological analyzer

ABSTRACT
The field of Arabic natural language processing (NLP) has witnessed significant advancements, driven by the development of various morphological analyzers. This paper compares several major Arabic morphological analyzers and examines their ability to handle word ambiguities, process dialects, operate efficiently, and support downstream NLP tasks. By reviewing previous studies, we identify key gaps, including the limited resources for dialects, the shortage of annotated corpora, and challenges related to system scalability. The study also highlights future directions, such as building larger and more diverse corpora, adapting neural models for dialects, and developing analyzers that are more interpretable and trustworthy. Overall, this comparative overview aims to provide a clearer understanding of the current state of Arabic morphological analyzers, synthesize existing research, and offer practical recommendations for future work in this area.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Alaaeddine Ramadan
College of Engineering and Computing, American University of Bahrain
Riffa, Bahrain
Email: alaaeddine.ramadan@aubh.edu.bh
1. INTRODUCTION
With more than 400 million speakers worldwide, Arabic is the official language in 22 countries. It is ranked as the fourth most commonly used language on the internet [1]. Research conducted by several researchers [2], [3] has identified three variations within Arabic: i) classical Arabic (CA), known for its use in literary works and the Quran; ii) modern standard Arabic (MSA), commonly used in formal contexts; and iii) dialectal Arabic (DA), utilized in informal conversations and everyday interactions [4]. DA further branches out into six groups, including Egyptian, Levantine, Gulf, Iraqi, Maghrebi, and other regional dialects [2], [5], [6]. Similar to other Semitic languages, Arabic features a morphological structure characterized by root letters, prefixes, suffixes, and diverse grammatical patterns. Morphology involves studying how words are structured from units known as morphemes. Morphemes are the units of meaning in a language, and understanding how they are arranged within words is crucial for language processing tasks like part-of-speech tagging, parsing, and machine translation [7]. In Arabic, core words have inflected forms. For instance, Arabic verbs boast 5,400 forms compared to 6 in English, as shown in Table 1.
Table 1. English verb paradigm
VB    VBD     VBG      VBN     VBP    VBZ
go    went    going    gone    go     goes
Journal homepage: http://ijai.iaescore.com
In grammar, verbs can change forms to convey tenses and grammatical aspects. The base form is VB, past tense is denoted by VBD, the gerund or present participle form is VBG, the past participle form is VBN, the non-3rd-person singular form is VBP, and the 3rd-person singular form is VBZ. Arabic verbs can take on forms based on gender (2), number (3), person (3), aspect (3), particle (2), mood (3), voice (2), pronominal clitic (12), and conjunction clitic (3) combinations, as illustrated in Figure 1.
Figure 1. Arabic morphology example
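The feature counts listed above multiply to a space far larger than the roughly 5,400 attested verb forms, since many feature combinations are not grammatically valid. A short sketch (purely illustrative; the dimension names are taken from the list above) makes the combinatorics concrete:

```python
from math import prod

# Inflectional dimensions for Arabic verbs, as listed in the text
# (dimension name -> number of values). Not every combination is
# grammatically valid, which is why the attested paradigm (~5,400
# forms) is much smaller than the raw product of the dimensions.
DIMENSIONS = {
    "gender": 2, "number": 3, "person": 3, "aspect": 3,
    "particle": 2, "mood": 3, "voice": 2,
    "pronominal_clitic": 12, "conjunction_clitic": 3,
}

raw_combinations = prod(DIMENSIONS.values())
print(raw_combinations)  # 23328
```

The gap between the raw product (23,328) and the attested paradigm is exactly what grammatical compatibility constraints in an analyzer must encode.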
Arabic follows a system based on roots, where words are typically created from a three-letter root. This root system enables the formation of words with meanings through different patterns and affixes, resulting in a diverse range of lexical forms. In Arabic, the process of inflection involves altering prefixes, suffixes, and infixes to express functions like tense, mood, voice, number, gender, and case. For example, the root "ktb" can give rise to words such as "kataba" (he wrote), "yaktubu" (he writes), "kitab" (book), and "maktab" (office). This morphological complexity allows for versatility and richness in language expression, but it also introduces challenges in processing Arabic for natural language applications.
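The root-and-pattern derivation just described can be sketched as a simple interdigitation function. This is a minimal illustration using the common convention of writing root-consonant slots as "C"; it is not code from any of the analyzers surveyed here:

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill each consonant slot 'C' in the pattern with the next
    root consonant, keeping the pattern's vowels and affixes as-is."""
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)

# The root k-t-b under different patterns:
print(interdigitate("ktb", "CaCaCa"))   # kataba (he wrote)
print(interdigitate("ktb", "CiCAC"))    # kitAb (book)
print(interdigitate("ktb", "maCCaC"))   # maktab (office)
```

Real analyzers must additionally handle weak radicals, gemination, and orthographic changes, which is where much of their lexicon engineering goes.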
The rest of the document is structured as follows: section 2 provides an overview of advancements in syntactic and semantic analysis specifically tailored for Arabic. Section 3 explores approaches to Arabic morphological analysis, highlighting its key aspects and the available analyzers. Lastly, section 4 offers an examination of existing techniques used in morphological analysis within Arabic linguistics. In section 5, the challenges and future prospects of natural language processing (NLP) are explored. This article concludes with a summary of the findings.
2. SYNTACTIC AND SEMANTIC ANALYSIS
The two basic methods for comprehending natural language are syntactic and semantic analysis.
– Syntactic analysis (parsing) examines sentence structure according to grammatical rules. In Arabic, this is challenging due to rich morphology, flexible word order, and diacritics. Words often consist of roots, prefixes, suffixes, and infixes, making morphological analysis a prerequisite. Morphological ambiguity and variable word order notably affect parsing performance [8]. For example, the root k-t-b produces kātib (writer), kitāb (book), and maktūb (written). Although Arabic typically follows a verb-subject-object (VSO) order, as in akala al-rajul al-tuffāḥah (the man ate the apple), it can also use subject-verb-object (SVO) and other orders, adding complexity. Diacritics, which mark short vowels, are often omitted, leading to ambiguity: ktb can mean kataba (he wrote), kutiba (it was written), or kutub (books). Accurate analysis relies on rules governing agreement, conjugation, and particle use.
– Semantic analysis focuses on meaning at word, phrase, and sentence levels. In Arabic, it is complicated by polysemy, synonymy, and context dependence. Word sense disambiguation (WSD) is vital; for example, ʿayn may mean "eye," "spring," or "spy." Named entity recognition (NER) identifies entities such as Muḥammad or al-Qāhirah (Cairo). Semantic role labeling defines relationships, as in aʿṭā Muḥammad al-kitāb ilā ʿAlī (Muhammad gave the book to Ali), where Muḥammad is the giver, al-kitāb the object, and ʿAlī the recipient. Lexical semantics explores relations like synonyms, antonyms, and hierarchies. Contextual analysis resolves ambiguities, as in dhahaba ilā al-madrasah, meaning "He went to school," where the subject is implied.
Ambiguity remains the main challenge in Arabic syntactic and semantic analysis, stemming from omitted diacritics and flexible word order. For instance, the undiacritized ktb al-ktāb can be read as kataba al-kitāb ("he wrote the book") or kutiba al-kitāb ("the book was written"). Dialectal variation further complicates processing; "house" is bayt in MSA but dār or al-ḥawsh in dialects. The scarcity of annotated corpora and linguistic resources also limits progress. Despite these challenges, syntactic and semantic analysis are essential for advancing Arabic NLP tasks such as translation, information retrieval, and sentiment analysis.
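The diacritic ambiguity discussed above is what forces an out-of-context analyzer to return multiple candidate readings per surface form. A toy lookup (hypothetical mini-lexicon, for illustration only) shows the shape of the problem every downstream disambiguator must then solve:

```python
# Hypothetical mini-lexicon: undiacritized surface form -> possible readings.
# Entries mirror the ktb example from the text; real lexicons hold millions.
AMBIGUITY_LEXICON = {
    "ktb": [
        ("kataba", "he wrote"),
        ("kutiba", "it was written"),
        ("kutub", "books"),
    ],
}

def analyze(surface: str):
    """Return every reading consistent with the undiacritized form."""
    return AMBIGUITY_LEXICON.get(surface, [])

for reading, gloss in analyze("ktb"):
    print(reading, "=", gloss)
```

Choosing among these readings requires context, which is precisely the role of the disambiguation systems surveyed in section 3.2.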
3. ARABIC MORPHOLOGICAL ANALYSIS APPROACHES AND AVAILABLE ANALYZERS
3.1. Arabic morphological analysis approaches
This section explores various approaches to linguistic analysis based on lexicons, which systematically store linguistic rules. The lexicon comprises two main sections: the first contains the word roots, patterns, and stems, and the other displays related information in the analysis outcomes. The key approaches discussed are:
– Root-pattern morphology: focuses on the relationship between meaning and form, using nonconcatenative methods to derive stems from root–pattern combinations, as described by McCarthy. Prominent systems include the Buckwalter Arabic morphological analyzer (BAMA) and standard Arabic morphological analysis (SAMA) (Table 2).
– Stem-based morphology: expands beyond surface forms to provide linguistic and semantic data for each lexical item. It integrates root–pattern structures with syntactic information, offering a more intuitive framework for lexicon expansion.
– Lexeme-based morphology: recognizes that a single lexeme can produce multiple word forms, focusing on stem-level representations rather than individual root or pattern constituents.
– Syllable-based morphology: although effective in some European languages, syllable-based approaches remain largely unexplored in Semitic languages like Arabic.
Table 2. Examples of root-pattern morphology
Root     Pattern      Word         Meaning
(drs)    CaCaCa       darasa       study
(drs)    CACiC        dAris        student
(drs)    CaCCaCa      darrasa      he teaches
(drs)    CACiCwn      dAriswn      group of students
3.2. Available morphological analyzers
Standard Arabic language morphological analysis (SALMA) [9] was evaluated using the SALMA Gold Standard corpus, with a focus on the prediction accuracy of 22 morphological features at the morpheme level. The evaluation included two distinct Arabic text samples: the Qur'an [10] and the CCA [11]. Exact match accuracy reached 71.21% for the CCA corpus and 53.50% for the Qur'an, with many of the discrepancies being minor (e.g., symbol substitutions). The system showed particularly strong performance in 15 morphological categories, including part-of-speech (POS), verb and particle subcategories, definiteness, voice, and root-related features, achieving accuracies of 98.53% for CCA and 90.11% for the Qur'an. The remaining 7 categories, such as gender, number, and case, showed slightly lower accuracy, ranging from 81.35%–97.51% for CCA and 74.25%–89.03% for the Qur'an. These results demonstrate the SALMA tagger's effectiveness in delivering fine-grained morphological analysis across various Arabic text genres, leveraging traditional Arabic grammar rules within a knowledge-based framework.
In terms of methodology, the SALMA tagger is a rule-based, knowledge-driven analyzer, built on traditional Arabic grammar and the SALMA-ABCLexicon, a massive lexical resource compiled from 23 classical dictionaries (14M tokens; 2.7M vowelized pairs). Its modular design integrates tokenization, lemmatization, root extraction, vowelization, and pattern generation, allowing for highly detailed morpheme-level tagging across 22 features. Its main strength is the high accuracy in features like POS, verb type, and root-related categories, making it a strong choice for detailed corpus annotation. Its weaknesses appear in categories like gender, case, and number, particularly in classical Arabic, where the performance drops compared to MSA. Error analysis shows that many failures are minor (e.g., symbol substitution or misassigned diacritics), though some errors reflect the complexity of handling ambiguous morphosyntactic features. While per-feature accuracy is reported, statistical significance testing and confidence intervals are absent, leaving robustness across corpora less certain.
SAMA [12] follows a rule-based lexical approach rather than statistical or neural methods. It builds on the Buckwalter analyzer by expanding root and pattern coverage through an enriched lexicon and refined affixation rules. The system outputs all possible morphological parses for a given surface form, which provides wide coverage but leaves the task of contextual disambiguation to external modules. This design reflects both a strength (comprehensiveness of analysis) and a weakness, since in practice the raw outputs are often too ambiguous to use without further processing. The analyzer was primarily developed and distributed by the Linguistic Data Consortium (LDC), and while it does not train on a specific corpus in the way statistical models do, its lexicons are informed by extensive lexical resources curated over years of Arabic linguistic research. In terms of evaluation, SAMA is documented as a linguistic resource rather than a benchmarked system, so no formal evaluation numbers (e.g., accuracy for POS tagging, stemming, or lemmatization) are typically reported, and no confidence intervals or statistical significance testing are provided.
BAMA [13] is a rule-based, lexicon-driven tool for Arabic morphological analysis, designed for MSA by Tim Buckwalter. It uses an ASCII-based representation and includes modules for tokenization, transliteration, lexicon lookup, and morphological analysis, producing detailed output with features like person, number, gender, aspect, and voice. Initially implemented in Perl and later in Java, BAMA supports only Arabic and offers multiple analyses per token. It is widely used in linguistic research, NLP applications, and Arabic language technologies. Resources are curated internally, with no reported training corpus or evaluation against gold standards in the original release. Performance metrics appear only in later comparative studies, and no statistical significance testing is available for BAMA alone.
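BAMA's lexicon lookup is built on splitting each word into a prefix, stem, and suffix and checking each part against separate tables. A heavily simplified sketch of that split-and-lookup step (toy dictionaries in Buckwalter transliteration, not BAMA's actual tables, and omitting its prefix–stem and stem–suffix compatibility checks):

```python
# Toy dictionaries standing in for BAMA's prefix/stem/suffix tables
# (Buckwalter transliteration; "" permits an empty affix).
PREFIXES = {"", "w", "Al", "wAl"}           # e.g. w- "and", Al- "the"
STEMS = {"ktb": "book/write", "byt": "house"}
SUFFIXES = {"", "At"}                        # e.g. -At feminine plural

def analyze(word: str):
    """Enumerate every prefix+stem+suffix split found in the tables."""
    analyses = []
    for i in range(len(word) + 1):
        for j in range(i, len(word) + 1):
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            if prefix in PREFIXES and stem in STEMS and suffix in SUFFIXES:
                analyses.append((prefix, stem, suffix))
    return analyses

print(analyze("wktb"))  # [('w', 'ktb', '')]
```

The real system additionally validates each pair of adjacent parts against compatibility tables, which is what keeps nonsense affix combinations out of the output.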
Farasa analyzer [14] is an advanced Arabic NLP tool developed by the Qatar Computing Research Institute (QCRI). It is grounded in a statistical learning approach, specifically a support vector machine (SVM)-rank classifier with linear kernels, which leverages a wide set of linguistic and probabilistic features such as prefix/suffix likelihoods, stem templates, and lexicon lookups. Unlike purely rule-based analyzers, it combines statistical ranking with curated lexicons, striking a balance between efficiency and accuracy. It provides comprehensive NLP capabilities via a RESTful Web API and is available as standalone Java jars. Farasa supports the Arabic language and includes components such as segmentation, spell checking, POS tagging, lemmatization, diacritization, dependency parsing, constituency parsing, and NER. The accuracy of Farasa (up to 98.94%) matches or slightly surpasses state-of-the-art systems. Error analysis reveals weaknesses in handling foreign named entities and overly long words with multiple valid segmentations. In these cases, the model often generates the correct segmentation but misranks it, suggesting room for improvement through richer gazetteers or feature expansion. The analyzer was trained on parts of the Penn Arabic Treebank (ATB) [15] and a large Aljazeera corpus (94M words, 2000–2011), and tested both on ATB subsets and an independent WikiNews set of 18,271 words. For downstream evaluation, Farasa was benchmarked on machine translation using IWSLT TED talks (183K sentences) and the NEWS corpus (202K sentences), and on information retrieval (IR) using the TREC 2001/2002 Arabic newswire collection (59.6M words, 75 topics).
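Farasa's core step, ranking candidate segmentations with a linear model over linguistic features, can be sketched schematically. The feature names, values, and weights below are invented for illustration; they are not Farasa's actual feature set:

```python
# Each candidate segmentation is described by a feature vector; a linear
# model scores candidates and the top-scoring one wins. The misranking
# failure mode noted in the text is simply this argmax picking the
# wrong candidate despite the correct one being generated.
WEIGHTS = {"prefix_likelihood": 2.0, "stem_in_lexicon": 3.0, "num_segments": -0.5}

def score(features: dict) -> float:
    """Linear score: weighted sum of the candidate's features."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

candidates = [
    # plausible split (e.g. clitic + known stem) vs. an implausible one
    {"prefix_likelihood": 0.9, "stem_in_lexicon": 1.0, "num_segments": 3},
    {"prefix_likelihood": 0.1, "stem_in_lexicon": 0.0, "num_segments": 2},
]

best = max(candidates, key=score)
print(best)
```

In the real system the weights come from SVM-rank training on treebank-derived segmentations rather than being set by hand.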
AlKhalil analyzer has two versions: i) the first version, developed in 2010 [16], provides all possible vowelized forms for a given word, each accompanied by detailed morphological information, including clitics, stem, root, and POS tag; and ii) the second version, developed in 2017 [17], adopts a rule-based morpho-syntactic approach implemented in Java. It relies on an extensive, carefully structured lexicon of derived and non-derived words, clitic lists, and root–pattern files, enriched with lemmas and patterns. Its workflow includes normalization, segmentation into proclitics/stems/enclitics, and parallel analysis of stems as exceptional, non-derived, derived nouns, or verbs. Validation steps check compatibility between clitics, stems, and diacritics before producing the set of possible analyses. A major strength of this system is its broad lexical coverage (over 4.1M vowelized stems), high accuracy, and speed, which together make it robust and efficient for downstream tasks. However, like many out-of-context analyzers, it produces multiple candidate analyses for ambiguous words, which can overwhelm applications without a disambiguation module. For example, the non-vowelized form ʿlm can yield outputs like ʿilm (science), ʿalam (flag), or ʿulima (was known), underscoring its reliance on external disambiguation for context-sensitive interpretation. The system was evaluated on more than 72 million diacritized words from the Tashkeela corpus (63M) [18], Nemlar (0.5M), and RDI (8.5M). Results showed coverage of 99.31%, with an average of 4.71 lemmas, 5.08 stems, and 8.05 vowelized forms per word, reflecting its rich lexical resources. On Nemlar, it achieved 97.16% lemma match, 96.76% stem match, and 97.21% diacritization accuracy, with full-feature match at 96.56%. Its throughput reached 632 words/second, balancing speed with coverage. The authors do not report statistical significance testing or confidence intervals.
Arabic Stanford Segmenter: the Arabic Stanford Segmenter [19] is a widely recognized tool for morphological segmentation and tokenization of Arabic text. Developed as part of the Stanford NLP Group's toolkit, it is based on a conditional random fields (CRF) model trained on annotated Arabic corpora. The tool is particularly effective in addressing the challenges of Arabic morphology, which include affixation, clitics, and the absence of clear word boundaries in written form. The Stanford Segmenter attempts to segment the clitics correctly using a statistical model that learns from linguistic patterns in annotated data, primarily drawing on the Penn Arabic Treebank (PATB) [15]. Unlike rule-based systems that may require extensive linguistic input and manual tuning, the Arabic Stanford Segmenter leverages machine learning techniques, which allow it to generalize well across different domains. It outputs both segmented tokens and their corresponding morphological analyses, making it a comprehensive preprocessing solution for modern Arabic NLP pipelines. Reported results show strong performance with an F1 of 92.09% on Egyptian Arabic and statistically significant gains (p < 0.001) over prior baselines, plus a 7× decoding speedup compared to MADA and MADA-ARZ. Error analysis highlights three issues: i) inconsistencies in gold data, ii) overly local segmentation features, and iii) context-sensitive ambiguities (e.g., wala meaning "and not" or "or," and -na as pronoun vs. verb suffix). Strengths include dialect-agnostic design, tested improvements, and efficiency; weaknesses lie in handling context-sensitive segmentation and data inconsistencies.
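A CRF segmenter of this kind treats segmentation as character-level sequence labeling: the model assigns each character a label such as B (begins a segment) or I (continues one), and segments are read off the label sequence. A sketch of that read-off step (toy labels, not the Stanford model itself):

```python
def labels_to_segments(chars, labels):
    """Turn per-character B/I labels (B = segment start) into segments,
    mirroring how a sequence-labeling segmenter's output is decoded."""
    segments = []
    for ch, label in zip(chars, labels):
        if label == "B" or not segments:
            segments.append(ch)       # start a new segment
        else:
            segments[-1] += ch        # extend the current segment
    return segments

# "wktbw" labeled so the clitic w- and suffix -w split off: w+ktb+w
print(labels_to_segments("wktbw", ["B", "B", "I", "I", "B"]))
# ['w', 'ktb', 'w']
```

The modeling effort lies entirely in predicting good labels from context features; once the labels exist, decoding is this trivial.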
MADAMIRA [20] is a morphological analyzer that assigns morphological tags to each word in a sentence by considering the word's context. It integrates two morphological analysis systems: MADA [21] and AMIRA [22]. Initially, the system analyzes the words of a sentence out of context using the SAMA analyzer [12]. To choose a single solution from the multiple options generated in this first phase, a disambiguation step based on SVMs and language models is performed. It adopts a machine learning approach that relies on linear SVM classifiers and n-gram language models for morphological feature prediction, combined with ranking modules for disambiguation. Unlike its Perl-based predecessors, it is implemented in Java, which contributes to its robustness, portability, and remarkable efficiency, achieving speed improvements of up to 20×. The analyzer supports both MSA and Egyptian Arabic (EGY), using the PATB (parts 1–3) and Egyptian Arabic Treebanks (parts 1–6) as training data, respectively. The test sets included around 25K words for MSA and 20K for EGY. Evaluation shows high accuracy: for MSA, 95.9% POS accuracy, 96.0% lemma accuracy, and 86.3% diacritization; for EGY, 92.4% POS, 87.8% lemma, and 83.2% diacritization. Tokenization reached 98.9% perfect accuracy in MSA and 96.6% in EGY. MADAMIRA's strengths lie in its broad functionality (morphological disambiguation, diacritization, POS tagging, tokenization, glossing, and stemming), speed, and extensibility. It also allows flexible tokenization schemes and provides both XML and HTTP interfaces, making it user-friendly. Weaknesses include a slight drop in accuracy compared to MADA for some metrics (up to 0.6% lower in EGY full morphological accuracy) and heavy memory requirements (up to 2.5 GB heap space). Overall, evaluation results are reported with clear accuracy percentages but without statistical significance testing or confidence intervals, leaving robustness comparisons open for further analysis.
CAMEL MORPH MSA [23] is a comprehensive and publicly available morphological analyzer and generator for MSA. Featuring over 100,000 lemmas and support for rare morphological features inherited from classical Arabic, it significantly expands the analytical capabilities of Arabic NLP tools. The system generates approximately 1.45 billion analyses and 535 million distinct diacritizations. CAMEL MORPH MSA integrates seamlessly with the CAMeL Tools Python suite [24], ensuring ease of use. Evaluation across large datasets, including MSA-CB, CA-CB, and PATB-Train, shows robust accuracy and significantly improved coverage. In terms of strengths, CAMEL MORPH MSA dramatically improves lexical coverage and reduces out-of-vocabulary (OOV) rates by 36% compared to SAMA/CALIMA across massive corpora like MSA-CB (9.9B tokens, 11.4M types) and CA-CB (0.7B tokens, 2.4M types). Evaluation on PATB-Train showed a 95.9% recall, with manual inspection attributing about 90% of mismatches to annotation errors rather than the system itself, highlighting its reliability. Error analyses revealed challenges in handling spelling inconsistencies, lemma–stem mismatches, and ambiguous paradigms. Its main weakness lies in speed, running 2.4–2.9 times slower than SAMA, though offering richer analyses per word. Importantly, the results were reported with dataset-scale evaluations and manual error breakdowns, but without explicit statistical significance testing or confidence intervals.
Alma [25] is an open-source tool for Arabic language processing that integrates lemmatization, POS tagging, and root extraction. Its approach is primarily frequency-based and lexicon-driven, leveraging a large pre-computed memory built from the Qabas lexicographic database [26], the Shamela corpus, and digitized lexicons. This design shifts computational complexity from runtime analysis to memory construction, enabling Alma to achieve very high processing speeds, lemmatizing around 34,000 tokens per second. For OOV cases, Alma integrates a fine-tuned bidirectional encoder representations from transformers (BERT) model to improve POS tagging, which achieved F1-scores above 98% on the Arabic Treebank (ATB) for POS classification. Its coverage extends across 40 POS tags and includes the first fully functional root tagger grounded in Qabas. Evaluation results highlight Alma's competitive performance: on the LDC Arabic Treebank (339k tokens) it reached 87.8% in true lemmatization and 92.7% in POS tagging, while on the SALMA corpus (34k tokens) it achieved 90.5% and 93.8% respectively. These scores were further improved when combined with BERT for OOV handling. Speed comparisons showed Alma vastly outperformed MADAMIRA (1710 seconds vs. 10 seconds on ATB). Error analysis revealed most failures were due to ambiguous lemmatization (61% of errors), where Alma favored the most frequent lemma even if contextually less accurate, and to general POS confusions, such as mistaking adjectives for nouns.
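Alma's lookup strategy, and the error mode just described, can be sketched in a few lines: pick the most frequent lemma for a known word, and fall back to a neural model for OOV input. The frequency table below is invented, and the OOV branch is a placeholder standing in for Alma's fine-tuned BERT model:

```python
# Toy frequency lexicon: surface form -> {lemma: corpus count}.
# Stands in for Alma's pre-computed memory; the counts are invented.
FREQ_LEXICON = {
    "ktb": {"kataba": 900, "kutub": 350},
}

def lemmatize(word: str) -> str:
    entry = FREQ_LEXICON.get(word)
    if entry:
        # Frequency-based choice: the most common lemma wins, even when
        # context would favor another reading -- the source of the
        # "61% of errors" ambiguity failures reported in the text.
        return max(entry, key=entry.get)
    # OOV fallback; in Alma this is a fine-tuned BERT model,
    # represented here by a placeholder.
    return f"<OOV:{word}>"

print(lemmatize("ktb"))   # kataba
print(lemmatize("xyz"))   # <OOV:xyz>
```

The trade-off is explicit: lookups are extremely fast, but the choice of lemma is context-blind by construction.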
Ibn-Ginni is a hybrid Arabic morphological analyzer that combines the speed and precision of the Buckwalter Arabic morphological analyzer (BAMA) with the broader classical Arabic coverage of the Alkhalil analyzer. To improve coverage, morphological data for 3 million unique Arabic words was generated using Alkhalil, refined, and added to BAMA's database. The resulting system analyzed 600,000 more words than BAMA alone, with an average analysis time of 0.3 milliseconds per word. In benchmark testing, Ibn-Ginni provided full morphological solutions for 72.72% of words and partial solutions for 24.24%, demonstrating improved performance and efficiency [27].
SinaTools [28] is an open-source toolkit developed at Birzeit University. It adopts a hybrid methodology that integrates rule-based resources with modern machine learning, particularly fine-tuned BERT models. Its morphological analysis module, Alma, relies on a frequency-based lexicon where lemmatization, POS tagging, and root tagging are handled through dictionary lookups, while a BERT-based model supports OOV handling. Other modules, including NER and word sense disambiguation (WSD), are also powered by transformer models such as AraBERTv2 [29]. This design not only ensures speed and accuracy but also provides flexibility through various integration interfaces, including CLI, API, and SDK. Its modularity and extensibility allow developers to plug in additional NLP tasks with minimal effort, which highlights its strength as a research and applied tool. However, the reliance on pre-computed lexicons limits its adaptability in unseen or domain-shifted contexts, as illustrated by consistent verb-tagging for ambiguous words regardless of context. The toolkit is trained and evaluated on several corpora. Morphological evaluation was conducted on the Arabic TreeBank (ATB, 339k tokens) and the SALMA dataset (34k tokens), while NER was tested on the Wojood datasets [30], including WojoodGaza (50k tokens from news texts) and a Politics dataset (12k tokens). WSD was benchmarked using the SALMA sense-annotated corpus (34k tokens), and semantic relatedness was assessed through SemEval-2024 with 595 sentence pairs. In terms of performance, SinaTools achieved lemmatization accuracy of 90.5% and POS tagging at 97.5%. Its NER module reached an F1-score of 87.3%, the WSD module recorded 82.6% overall accuracy, and semantic relatedness scored 0.49 Spearman correlation. These evaluations, though impressive, underline that SinaTools' strength lies in high-speed lexicon-backed morphology with hybrid neural extensions.
Camelira [31] is a multi-DA morphological disambiguator that integrates statistical and neural approaches for analysis. Its backbone relies on CAMeL Tools' morphological disambiguation system. The tool covers four Arabic varieties: MSA, Egyptian, Gulf, and Levantine, and is accessible through a user-friendly web interface. Distinguishing itself from prior analyzers, Camelira not only outputs disambiguated readings in context but also presents alternative out-of-context analyses along with probability scores. A key strength is its integration of dialect identification, which automatically selects the appropriate disambiguator, making it valuable for learners or researchers who may not know the input dialect. However, its coverage is limited to specific dialects, and the system struggles with unseen genres or underrepresented varieties, producing occasional errors when processing texts outside its training distribution. Sample outputs in the interface demonstrate diacritized text, tokenized forms, lemmas, and full morphological features, but Gulf Arabic lacks diacritization due to unavailable annotated resources. In terms of resources, Camelira relies on the datasets used in the CAMeL Tools pipeline and the multi-Arabic dialect applications and resources (MADAR) shared task for dialect identification. For morphological disambiguation, the model achieves accuracy across dialects as follows: MSA (95.9% for all tags, 98.7% POS), Egyptian (90.5%, 94.0%), Gulf (93.8%, 96.6%), and Levantine (85.5%, 92.7%).
According to Zalmout and Habash [32], their bidirectional long short-term memory (Bi-LSTM) morphological disambiguation system is a neural model for Arabic that combines Bi-LSTM architectures with morphological analyzers. Unlike earlier rule-based or statistical approaches, the system leverages word- and character-level embeddings enriched with subword and morphological features (such as affixes or dictionary-based tags). Its strength lies in using the outputs of a traditional morphological analyzer not as a replacement but as a guide, ranking possible analyses with learned probabilities. This hybrid design captures long-distance dependencies better than fixed-window methods and significantly boosts disambiguation for morphologically rich features like case and mood. Weaknesses remain in areas such as case assignment and rare categories (e.g., second-person verbs, passive voice), where ambiguity and data sparsity still limit performance. The authors provide detailed error analysis, showing, for instance, that while their system doubles the cases where it outperforms MADAMIRA, some errors persist, especially for morphosyntactic cues heavily reliant on syntax. For evaluation, the authors use PATB parts 1–3 as the main dataset (503K training words, 63K words each for development and test), complemented with pre-trained embeddings from the 2.15 billion-word Arabic Gigaword corpus [33]. Results demonstrate full morphological analysis accuracy of 90.0%, and 76.9% for OOV words. Across specific features, POS tagging reached 97.9%, case tagging improved by 3.7 points, and diacritization accuracy was 91.7%. These results are statistically significant across metrics, supported by comparative error analyses and confidence-based scoring. Overall, the system illustrates the enduring value of combining deep neural architectures with traditional analyzers, showing measurable improvements while highlighting remaining gaps in modeling fine-grained Arabic morphology.
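The analyzer-as-guide design described above amounts to scoring each analyzer-produced candidate by how well its features agree with the neural tagger's per-feature predictions. A schematic of that ranking step, with invented probabilities standing in for the Bi-LSTM's outputs:

```python
import math

# Per-feature distributions a neural tagger might predict for one word
# (the numbers are invented for illustration).
predicted = {
    "pos": {"verb": 0.8, "noun": 0.2},
    "voice": {"active": 0.7, "passive": 0.3},
}

# Candidate analyses produced by a traditional morphological analyzer.
candidates = [
    {"pos": "verb", "voice": "active"},
    {"pos": "verb", "voice": "passive"},
    {"pos": "noun", "voice": "active"},
]

def score(analysis: dict) -> float:
    """Sum of log-probabilities of the candidate's feature values;
    the analyzer guarantees well-formedness, the tagger ranks."""
    return sum(math.log(predicted[feat][val]) for feat, val in analysis.items())

best = max(candidates, key=score)
print(best)  # {'pos': 'verb', 'voice': 'active'}
```

Because candidates come from the analyzer, the ranker can only choose among linguistically valid analyses, which is precisely why the hybrid outperforms a tagger predicting features independently.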
The neural-based Arabic morphological analyzer of [34] employs a recurrent neural network (RNN) to perform Arabic morphological analysis. Unlike earlier rule-based systems, this model leverages sub-word information (prefixes, infixes, roots, and suffixes) and converts it into vectors for sequence modeling. The analyzer aims to overcome two main gaps in prior work, particularly in the Jabalin system: the inability to identify nouns and the heavy reliance on dictionaries for verb form classification. By combining pattern extraction, sub-word vectorization, and RNN-based classification, the system automatically identifies morphosyntactic descriptions (MSDs) for both verbs and nouns. A notable strength is its ability to avoid dictionary dependency and to generalize to nouns derived from verbal roots, something previous analyzers struggled with. However, one noted limitation is reduced accuracy for certain rare verb forms such as "Iii", where performance dropped to 73%, indicating challenges in modeling less frequent patterns. For its dataset, the system relies on the Qur'anic Arabic Corpus [10], which already includes morphological labels. With preprocessing, the initial 1,778 unique words were expanded into a dataset of over 30,936 labeled words using linguistic pattern tables. After splitting, 24,748 words were used for training and 6,188 for testing. The evaluation reported 99% overall accuracy, 99% precision, 96% recall, and 97% F1-score, with results broken down by POS, aspect, gender, number, and verb form. Statistical comparisons with the Jabalin system showed a marked improvement (99% vs. 39% overall accuracy), especially in noun recognition (99% vs. 0%). While the study did not report formal significance testing or confidence intervals, the detailed per-feature results (POS, tense, gender, number, and verb form) demonstrate robust evaluation across morphological categories.
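The pattern-table expansion step described above can be illustrated minimally: each triliteral root is slotted into templatic patterns, and each pattern carries a morphosyntactic label, so one annotated root yields many labeled surface forms. The patterns and labels below are invented for illustration and are not the paper's actual tables.

```python
# Hedged sketch of expansion via linguistic pattern tables (illustrative only).

PATTERNS = [
    ("1a2a3a", {"pos": "VERB", "aspect": "perfect"}),    # e.g. fa'ala
    ("ma12a3", {"pos": "NOUN", "type": "place"}),        # e.g. maf'al
    ("1aa2i3", {"pos": "NOUN", "type": "active_part"}),  # e.g. faa'il
]

def expand_root(root):
    """Instantiate every pattern with the root's three radicals."""
    assert len(root) == 3, "sketch assumes a triliteral root"
    out = []
    for template, label in PATTERNS:
        form = (template.replace("1", root[0])
                        .replace("2", root[1])
                        .replace("3", root[2]))
        out.append((form, label))
    return out

print(expand_root("ktb")[0])  # ('kataba', {'pos': 'VERB', 'aspect': 'perfect'})
```

Applied over many roots and a richer pattern inventory, this is how a small annotated vocabulary can be multiplied into a much larger labeled training set, as in the 1,778-to-30,936-word expansion reported above.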
Morphosyntactic tagging with pre-trained transformer models (CAMeLBERT) [35] adopts a neural approach, fine-tuning pre-trained transformer models (CAMeLBERT-MSA for MSA and CAMeLBERT-Mix for dialects). Each morphosyntactic feature is modeled with an independent classifier, and in some setups, predictions are refined using external morphological analyzers (SAMA for MSA, CALIMA for Egyptian, and automatically induced analyzers for Gulf and Levantine). This hybrid design shows clear strengths: it achieves state-of-the-art results across all varieties studied, with absolute improvements. Its weaknesses, however, stem from reliance on analyzer quality and dialectal orthographic inconsistency; while manually crafted analyzers improve tagging accuracy, automatically generated ones can sometimes hurt performance as data grows. Error analysis highlights difficulties with enclitics and nominal distinctions, particularly in dialects, with POS misclassifications and annotation inconsistencies contributing to common failures. The model was trained and tested on four corpora: PATB (629K tokens, MSA), Gumar (202K, Gulf), ARZTB (175K, Egyptian), and Curras (57K, Levantine). Evaluation includes POS tagging and full morphosyntactic feature prediction. For POS tagging, accuracy reached 98.9% (MSA), 96.9% (Egyptian), 97.9% (Gulf), and 94.6% (Levantine). For full morphosyntactic tagging (ALL TAGS), accuracy was 96.3% (MSA), 91.0% (Egyptian), 95.7% (Gulf), and 87.6% (Levantine). Results are statistically significant (McNemar's test, p < 0.05). The system leverages pre-trained transformers, external analyzers, and cross-dialectal transfer, while highlighting resource and annotation limitations.
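The per-feature setup described above amounts to one shared contextual encoding of each word with an independent classifier head per morphosyntactic feature (POS, gender, number, and so on). The toy "encoder" and rule-based heads below are stand-ins for the fine-tuned CAMeLBERT encoder and its linear classification heads; every cue and label set here is illustrative.

```python
# Hedged sketch: shared encoding, one independent head per feature.

FEATURES = {  # label inventories shown for clarity; heads below use them
    "pos":    ["NOUN", "VERB", "PART"],
    "gender": ["m", "f"],
    "number": ["s", "p"],
}

def encode(word, context):
    """Toy 'contextual encoding': a few surface cues as a feature dict."""
    return {
        "ends_at": word.endswith("at"),    # crude feminine-marker cue
        "ends_wn": word.endswith("wn"),    # crude sound-plural cue
        "after_sa": context[:1] == ["sa"]  # crude future-verb cue
    }

def head(feature, enc):
    """One independent classifier per feature; all see the same encoding."""
    if feature == "pos":
        return "VERB" if enc["after_sa"] else "NOUN"
    if feature == "gender":
        return "f" if enc["ends_at"] else "m"
    if feature == "number":
        return "p" if enc["ends_wn"] else "s"

def tag(word, context=()):
    enc = encode(word, list(context))
    return {f: head(f, enc) for f in FEATURES}

print(tag("mudarrisat"))  # {'pos': 'NOUN', 'gender': 'f', 'number': 's'}
```

Because the heads are independent, each feature can be trained and evaluated separately, but nothing enforces joint consistency across features; that is where the analyzer-based refinement mentioned above comes in, filtering predictions to combinations a morphological analyzer licenses.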
The Zalmout et al. [36] neural disambiguator for Egyptian Arabic adopts a neural, Bi-LSTM-based approach for morphological tagging and disambiguation of Egyptian Arabic. It integrates word and character embeddings (tested with both convolutional neural network (CNN) and long short-term memory (LSTM) variants), alongside embedding-space mapping to handle noisy, user-generated dialectal text. Noise normalization is applied at the vector level, avoiding raw-text alterations. The system leverages morphological analyzers derived from SAMA, CALIMA, and ADAM resources to generate candidate analyses, which are then resolved using neural models. A large in-house Egyptian Arabic corpus (410M words) was used for pre-training embeddings, while the annotated ARZ corpus (160K tokens; split into 134K train, 20K dev, and 21K blind test) was used for supervised training and evaluation. In terms of performance, the best configuration achieved POS accuracy of 93.6%, lemma accuracy of 88.1%, diacritization accuracy of 83.8%, and full morphological analysis accuracy of 78.4%, yielding significant error reductions over the MADAMIRA baseline (e.g., a 21.9% relative improvement in POS). Error analysis revealed strengths in handling noisy orthography and clitic segmentation, but also weaknesses such as frequent confusion among nominal categories (74% of POS errors) and issues with Hamza spelling, diacritization propagation, and MSA-EGY cognate mismatches. Interestingly, when trained on CODA-normalized orthography, results nearly matched the best noise-robust setup, suggesting the model closely approaches the performance ceiling for such data. Statistical reporting includes accuracy metrics with relative error reduction; however, no explicit confidence intervals or significance tests were provided.
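The vector-level noise normalization mentioned above can be sketched as follows: instead of rewriting the raw text, a mapping is learned from embeddings of noisy spellings into the space of their normalized forms, and downstream components consume the mapped vectors. The tiny random vectors and the simple least-squares linear map below are illustrative assumptions; the paper works with large pre-trained embeddings and its own mapping setup.

```python
import numpy as np

# Hedged sketch: normalize noisy spellings in embedding space, not in text.
rng = np.random.default_rng(0)
dim = 4
clean = {w: rng.normal(size=dim) for w in ["kitab", "qalam", "bayt"]}

# Noisy variants sit near their clean form plus a shared distortion.
shift = rng.normal(size=dim) * 0.5
noisy = {w + "~": v + shift + rng.normal(size=dim) * 0.01
         for w, v in clean.items()}

# Fit W minimizing ||X @ W - Y|| over aligned (noisy, clean) training pairs.
X = np.stack([noisy[w + "~"] for w in clean])
Y = np.stack([clean[w] for w in clean])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def normalize(vec):
    """Project a noisy embedding into the clean space; return nearest word."""
    mapped = vec @ W
    return min(clean, key=lambda w: np.linalg.norm(clean[w] - mapped))

print(normalize(noisy["kitab~"]))
```

The appeal of this design, as the paragraph above notes, is that the raw dialectal text is never altered: all normalization happens in the representation, so spelling variation is absorbed without committing to a single orthographic rewrite.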
Stanza (StanfordNLP) for Arabic UD [37] is a fully neural, language-agnostic NLP toolkit developed by Stanford, designed to process raw text through a complete pipeline including tokenization, multi-word token expansion, lemmatization, POS and morphological tagging, dependency parsing, and NER. For Arabic, it relies on the PADT treebank within the Universal Dependencies (UD v2.5) framework. Its models use Bi-LSTM architectures with biaffine scoring for syntactic analysis, and seq2seq ensembles for lemmatization and token expansion, allowing the system to generalize effectively across diverse languages. A key strength lies in its broad multilingual coverage (66 languages) and its ability to handle text end-to-end from raw input, producing competitive or state-of-the-art performance. Its weaknesses, however, include slower runtime compared to lightweight systems such as spaCy, and occasional errors in sentence segmentation and multi-word token expansion in morphologically rich languages. The authors also acknowledge computational cost as a limiting factor for scalability and efficiency. For datasets, Stanza was trained on 112 corpora, with Arabic specifically using the PADT UD treebank (non-copyrighted portion), plus additional NER data such as AQMAR. The Arabic PADT evaluation shows very high tokenization accuracy (99.98) and strong performance in POS tagging (UPOS 94.89, XPOS 91.75, UFeats 91.86), lemmatization (93.27), and dependency parsing (UAS 83.27, LAS 79.33). For Arabic NER, Stanza achieved an F1 score of 74.3 on AQMAR, comparable to FLAIR and outperforming spaCy where available. Results were benchmarked against UDPipe and spaCy using the official UD evaluation script, but no statistical significance tests or confidence intervals were reported. Stanza demonstrates robustness and breadth, though efficiency and handling of genre/domain variability remain areas for improvement.
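The staged pipeline described above, where each processor consumes the previous stage's output so raw text flows through tokenization, multi-word token expansion, tagging, and lemmatization in order, can be sketched with toy rule-based stages. The lookup tables below stand in for Stanza's neural components and are invented for illustration (including the toy multi-word token "wktb" standing for a conjunction plus verb).

```python
# Hedged sketch of a Stanza-style staged pipeline (toy rules, not Stanza code).

MWT = {"wktb": ["w", "ktb"]}           # toy multi-word token table (w + ktb)
POS = {"w": "CCONJ", "ktb": "VERB"}
LEMMA = {"ktb": "katab"}

def tokenize(text):
    return text.split()

def expand_mwt(tokens):
    """Split multi-word tokens into their syntactic words."""
    out = []
    for t in tokens:
        out.extend(MWT.get(t, [t]))
    return out

def tag(tokens):
    return [(t, POS.get(t, "NOUN")) for t in tokens]

def lemmatize(tagged):
    return [(t, p, LEMMA.get(t, t)) for t, p in tagged]

def pipeline(text):
    return lemmatize(tag(expand_mwt(tokenize(text))))

print(pipeline("wktb"))
# [('w', 'CCONJ', 'w'), ('ktb', 'VERB', 'katab')]
```

This staging also explains the error behavior noted above: a mistake in multi-word token expansion propagates to every downstream stage, which is why MWT errors are costly in morphologically rich languages where cliticized tokens are common.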
UDPipe 2.0, a neural UD pipeline, [38] showed strong but not flawless performance for Arabic (PADT treebank). It achieved very high segmentation scores (tokens: 99.98, words: 93.71, sentences: 80.89 F1), indicating reliable basic preprocessing. In tagging, it reached UPOS 90.64, XPOS 87.81, and UFeats 88.05, while lemmatization stood at 87.38. Parsing results were competitive but lower: UAS 88.94, LAS 72.34, MLAS 63.77, and BLEX 65.66. These numbers highlight that while UDPipe is robust at segmentation and POS tagging, parsing complex Arabic syntax remains challenging. Strengths lie in its end-to-end neural joint model that handles multiple tasks consistently without language-specific parameter tuning. However, weaknesses emerge with Arabic morphology and syntax, where error analysis indicates struggles with rich inflection, clitic segmentation, and long-distance dependencies. For example, the model often produces incorrect lemma forms when diacritics or clitics alter the base word, and dependency arcs occasionally mislabel subordinate clauses or prepositional phrases, reducing LAS. While these shortcomings are typical in morphologically rich languages, the consistency of UDPipe's results across treebanks suggests its architecture generalizes well, even if Arabic parsing lags behind segmentation accuracy.
UDify (mBERT multi-task morphology for UD Arabic) [39] is a multilingual, multi-task neural analyzer built on pretrained mBERT embeddings. It jointly predicts POS tags, morphological features, lemmas, and dependency parses using a self-attention architecture. Trained on 124 UD treebanks (75 languages), including Arabic PADT (around 6.1K sentences), UDify achieves strong syntactic accuracy (UPOS 96.58%, UAS 87.72%, LAS 82.88%) but performs poorly in lemmatization (73.55%) due to its lack of character-level embeddings, a key limitation for morphologically rich languages. While multilingual training boosts parsing for Arabic, weaknesses remain in morphology-sensitive tasks like lemmas and UFeats. No statistical significance testing or confidence intervals were reported.
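UDify's lemmatization weakness is easier to see given how UD lemmatizers of this family typically work: the lemma is predicted as a classification over string-edit scripts rather than generated character by character, so poor character-level representations directly hurt lemma accuracy. The suffix-edit sketch below illustrates that edit-script idea; the example words are invented for illustration.

```python
# Hedged sketch of edit-script lemmatization (illustrative forms only).

def make_script(form, lemma):
    """Derive a (strip_n, append) suffix edit turning form into lemma."""
    i = 0  # length of the longest common prefix
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:])

def apply_script(form, script):
    strip_n, append = script
    return form[:len(form) - strip_n] + append

script = make_script("kitabuhu", "kitab")  # strip a clitic-like ending
print(script)                              # (3, '')
print(apply_script("qalamuhu", script))    # 'qalam': the script generalizes
```

Suffix edits like these generalize well across regular inflection, but templatic (non-concatenative) changes inside the stem, exactly what Arabic morphology is full of, produce long, word-specific scripts that a classifier without character-level features struggles to predict.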
The Gulf Arabic neural morphology system [40] combines rule-based morphological analyzers with a neural disambiguation model. Specifically, the Gulf Arabic analyzer was created automatically through paradigm completion based on annotated training data, while high-quality manual analyzers were used for MSA (SAMA) and Egyptian Arabic (CALIMA). For disambiguation, the authors employed a neural joint model (sequence-to-sequence with shared encoders for lexical and morphological features) alongside a baseline maximum likelihood estimation (MLE) system. They tested different setups: no analyzer, a Gulf-only analyzer, and combinations with the MSA and EGY analyzers, either embedding or ranking the candidates. The strengths lie in the system's ability to handle Gulf Arabic morphology for the first time and its adaptability to data size. However, weaknesses appear when analyzers constrain the neural model, especially in lemmatization: ranking candidates often reduces accuracy, showing that the analyzer's limited coverage can restrict performance rather than improve it. The system was trained and evaluated on the annotated Gumar Corpus [41], Emirati Arabic novels totaling about 202K tokens across train/dev/test splits (with 162K tokens for training). Additional embeddings were drawn from the larger 100M-token Gumar corpus. On the test set, the best configuration reached 89.2% full analysis, 92.9% TAGS, 93.1% LEX, 96.7% POS, and 97.3% SEG. Results were reported with detailed breakdowns but without statistical significance testing or confidence intervals. Error analysis highlighted that lemmatization remained the weakest link, often suffering when the analyzers' lexicon failed to match the diversity of Gulf lemmas.
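The paradigm-completion idea used to build the Gulf analyzer can be sketched minimally: given one fully annotated exemplar paradigm and a new lemma observed in only a few slots, the exemplar's stem-and-affix pattern is transferred to fill the missing slots. The forms, slot names, and templating trick below are illustrative assumptions, not the paper's Gulf data or algorithm.

```python
# Hedged sketch of paradigm completion by analogy (toy forms only).

exemplar_stem = "ktb"
exemplar = {  # a full toy paradigm for the exemplar stem
    "3ms.perf": "kitab",
    "3fs.perf": "kitbat",
    "1s.perf":  "kitabt",
}

def slot_pattern(form, stem):
    """Express a form as a template over the stem's radicals (1, 2, 3)."""
    for i, ch in enumerate(stem, start=1):
        form = form.replace(ch, str(i), 1)
    return form

def complete(new_stem, observed):
    """Fill every exemplar slot for new_stem, keeping observed forms."""
    out = dict(observed)
    for slot, form in exemplar.items():
        if slot not in out:
            tpl = slot_pattern(form, exemplar_stem)
            for i, ch in enumerate(new_stem, start=1):
                tpl = tpl.replace(str(i), ch)
            out[slot] = tpl
    return out

print(complete("drs", {"3ms.perf": "diras"})["3fs.perf"])  # 'dirsat'
```

The limitation the error analysis above reports follows directly: a completed paradigm is only as good as its exemplars, so lemmas whose inflection does not match any exemplar pattern fall outside the induced analyzer's coverage.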
4. CLASSIFICATION OF ARABIC MORPHOLOGICAL ANALYSIS TECHNIQUES
The results of all twenty analyzers, along with their performance metrics, are presented in Table 3. These metrics (accuracy, precision, recall, and F1-score) are compiled from reported sources to enable direct comparison.
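As a quick sanity check on the compiled figures, F1 is the harmonic mean of precision and recall; for example, MADAMIRA's Noor-Ghateh word-segmentation row in Table 3 (Prec: 80%, Rec: 99%) rounds to its reported F1 of 88%.

```python
# F1 as the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.80, 0.99), 2))  # 0.88
```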
When grouped by methodological approach, the results reflect both strengths and trade-offs. Rule-based analyzers such as AlKhalil (2017) and SALMA (2013) show high reliability on large, curated resources, often exceeding 95% in tasks like lemma accuracy or diacritization. However, their performance drops considerably when applied to dialects or fine-grained categories (e.g., patterns, stems), suggesting limited adaptability. Hybrid systems like Madamira (2014), Ibn Ghini (2024), and SinaTools (2024) extend coverage by combining rules with statistical or neural components, producing robust segmentation and morphological disambiguation. Nonetheless, their performance is not uniform: Madamira achieves over 98% in segmentation but lower scores (77-86%) in diacritization and full solutions, while SinaTools performs well in lemma and POS tagging yet shows moderate outcomes in WSD and semantic tasks.
Corpus size and diversity have a clear impact across systems. Models trained on large, balanced resources such as the PATB (>1.3M words) or Gumar (202K annotated) consistently produce higher accuracy and generalization. For instance, Stanford's segmenter (2020) and Farasa (2016) achieve segmentation accuracy near or above 98%, reflecting the advantage of large-scale training. In contrast, analyzers built on smaller or domain-specific corpora, such as the Qur'anic Arabic Corpus (31K words), show very strong performance within that domain (99% accuracy) but with limited applicability beyond it. Dialectal extensions, as in Camelira (2022) or Zalmout-Habash (2018), demonstrate competitive results (POS 90-94%), though accuracy remains below that of MSA systems, highlighting the challenges of resource scarcity and linguistic variability.
Neural architectures dominate recent benchmarks in Arabic morphosyntactic tagging and parsing. Models like CAMeLBERT (2022) and UDify (2019) exceed 95% in UPOS and full-tagging tasks, confirming the strength of contextualized embeddings and multitask learning. Hybrid systems remain relevant: Ibn Ghini (2024) offers notable efficiency (0.3 ms/word), and Madamira provides broad functional coverage, making them practical where speed and explainability matter more than state-of-the-art accuracy. The field is shifting: rule-based systems excel in controlled settings, hybrid approaches offer balance, and neural architectures deliver top accuracy when large, diverse corpora are available.
Table 3. Arabic morphological analyzers performance

Analyzer | Approach | Corpus/Size | Morpheme | Evaluation
CAMEL MORPH 2024 | Rule-based | PATB / 1.5M words | Lemma + analysis + diacritization | Recall: 95.9%
Alma 2024 | Frequency-based + lexicon-driven + BERT (OOV) | LDC ATB (1.5M), SALMA (500K) | Morphological analysis | F1: 88% (ATB), 90% (SALMA)
Ibn Ghini 2024 | Hybrid | 3M words / 600K analyzed | Full morphological solutions | Accuracy: 72.72%
 | | Alkhalil + BAMA Extended / 3M words | Partial solutions | Accuracy: 24.24%
 | | | Analysis speed | Time: 0.3 ms/word
AlKhalil 2017 | Rule-based | Nemlar (500K) & Tashkeela (75M) | Rate-lemma | Accuracy: 97.16%
 | | | Rate-stem | Accuracy: 96.76%
 | | | Rate-diac | Accuracy: 97.21%
 | | | Rate-full | Accuracy: 96.56%
 | | Gold standard (MSA, 546 words) | Root | Acc: 74.96%, Prec: 78.94%, Rec: 74.96%, F1: 76.90%
 | | | Stem | Acc: 54.43%, Prec: 57.33%, Rec: 54.43%, F1: 55.84%
 | | | Pattern | Acc: 36.35%, Prec: 38.28%, Rec: 36.35%, F1: 37.29%
 | | Multi-dialect (EGY, TUN; 10 sentences each) | N/A | Acc: 68%, Prec: 68%, Rec: 66%, F1: 67%
Stanford 2020 | Statistical ML (CRF-based) | Penn ATB (>1.3M words) | Segmentation | F1: 98.24%
Farasa 2016 | Statistical (SVM-rank + lexicon) | Noor-Ghateh (223,690 words) | Word segmentation | Acc: 81%, Prec: 81%, F1: 89%
 | | Penn ATB (>1.3M words) | Segmentation (base) | Acc: 98.76%
 | | Penn ATB (>1.3M words) | Segmentation (lookup) | Acc: 98.94%
Madamira 2014 | Hybrid (rule-based + ML disambiguation with SVM + LMs) | MSA (25K words) | EvalDiac | Acc: 86.3%
 | | | EvalLex | Acc: 96%
 | | | EvalFull | Acc: 84.1%
 | | | Perfect tokenization | Acc: 98.9%
 | | | Correct segmentation | Acc: 99.2%
 | | EGY (20K words) | EvalDiac | Acc: 83.2%
 | | | EvalLex | Acc: 87.8%
 | | | EvalFull | Acc: 77.3%
 | | | Perfect tokenization | Acc: 96.6%
 | | | Correct segmentation | Acc: 97.6%
 | | Penn ATB (>1.3M words) | Segmentation | Acc: 98.76%
 | | Noor-Ghateh (223,690 words) | Word segmentation | Acc: 80%, Prec: 80%, Rec: 99%, F1: 88%
 | | Multi-dialect (EGY, TUN, MSA; 10 sentences each) | N/A | Acc: 85%, Prec: 86%, Rec: 88%, F1: 87%
SALMA 2013 | Rule-based, knowledge-driven (grammar + lexicon) | CCA (500K), Qur'an (77K) | Morphological features | Acc: 98.53% (CCA), 90.11% (Qur'an)
 | | CCA (500K), Qur'an (77K) | Remaining categories | Acc: 81.35-97.51% (CCA), 74.25-89.03% (Qur'an)
SinaTools 2024 | Hybrid (rule-based + BERT/transformers) | ATB (339K), SALMA (34K) | Morphology (lemma, POS) | Lemma: 90.5%, POS: 97.5%
 | | Wojood (50K), Politics (12K) | NER | F1: 87.3%
 | | SALMA Sense (34K) | WSD | Acc: 82.6%
 | | SemEval-2024 (595 pairs) | Semantic relatedness | Spearman: 0.49
Camelira 2022 | Statistical + neural (CAMeL Tools backbone) | MSA | Morphological disambiguation | All tags: 95.9%, POS: 98.7%
 | | Egyptian | | All: 90.5%, POS: 94.0%
 | | Gulf | | All: 93.8%, POS: 96.6%
 | | Levantine | | All: 85.5%, POS: 92.7%
Zalmout and Habash 2017 | Neural (Bi-LSTM + analyzer guidance) | PATB (503K train, 63K dev/test) | Morphological disambiguation | Full: 90.0%, OOV: 76.9%
 | | Gigaword (2.15B) | | POS: 97.9%, Diac: 91.7%
Neural analyzer (RNN) | Neural (RNN + subword vectors) | Qur'anic Arabic Corpus (31K) | Morphological analysis | Acc: 99%, Prec: 99%, Rec: 96%, F1: 97%
CAMeLBERT 2022 | Neural (transformer-based + analyzer support) | ATB (629K) | Morphosyntactic tagging | POS: 98.9%, All tags: 96.3%
 | | ARZTB (175K), Gumar (202K), Curras (57K) | | Dialects: 91-95%
Zalmout-Habash 2018 (EGY) | Neural (Bi-LSTM, noise-robust) | ARZ (160K), Gumar (410M pretrain) | Morphological disambiguation | POS: 93.6%, Lemma: 88.1%, Diac: 83.8%, Full: 78.4%
Stanza 2020 | Neural (Bi-LSTM + seq2seq) | PADT UD (Arabic) | Full UD morphology | UPOS: 94.9, XPOS: 91.8, UFeats: 91.9, Lemma: 93.3
 | | | Dependency parsing | UAS: 83.3, LAS: 79.3
 | | AQMAR (NER) | NER | F1: 74.3
UDPipe 2.0 (2018) | Neural (joint model) | PADT UD (Arabic) | Morphology + parsing | UPOS: 90.6, XPOS: 87.8, UFeats: 88.1, Lemma: 87.4, UAS: 88.9, LAS: 72.3
UDify 2019 | Neural (mBERT multitask) | UD Arabic PADT (6.1K sents) | POS + features + lemma + parsing | UPOS: 96.6, UAS: 87.7, LAS: 82.9, Lemma: 73.6
Gulf Morph 2020 | Hybrid (rule-based analyzers + neural disambiguation) | Gumar annotated corpus (202K), embeddings 100M | Gulf morphology (POS, SEG, LEX) | POS: 96.7%, SEG: 97.3%, LEX: 93.1%, Full: 89.2%