IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 4, August 2025, pp. 3354~3365
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i4.pp3354-3365

Journal homepage: http://ijai.iaescore.com

Classification of Kannada documents using novel semantic symbolic representation and selection method

Ranganathbabu Kasturi Rangan1, Bukahally Somashekar Harish2, Chaluvegowda Kanakalakshmi Roopa2
1Department of Information Science and Engineering, Vidyavardhaka College of Engineering, Mysore, India
2Department of Information Science and Engineering, JSS Science and Technology University, Mysore, India

Article Info

Article history:
Received Apr 21, 2024
Revised Feb 25, 2025
Accepted Mar 15, 2025

ABSTRACT
Kannada is one of the 22 scheduled Indian regional languages. It is also a low-resource regional language. Kannada document classification is arduous due to the language's vocabulary richness, agglutinative terms, and lack of resources. A good representation and the selection of prominent features help to solve the challenges in document classification tasks. In this paper, we propose a semantic symbolic representation and feature selection method for better representation of Kannada terms as interval values embedded with positional information. Following this, the selection of prominent discriminative symbolic feature vectors is also proposed. Further, a symbolic document classifier is used to classify the Kannada documents. The proposed cluster based symbolic representation preserves the intra class variance and reduces the ambiguity in the classification of Kannada documents. The experiments are performed on two Kannada document datasets which are multilabel and unbalanced. A comparative analysis of the proposed method with other standard methods is also presented.

Keywords:
Classification
Feature selection
Kannada documents
Semantic analysis
Symbolic representation

This is an open access article under the CC BY-SA license.

Corresponding Author:
Ranganathbabu Kasturi Rangan
Department of Information Science and Engineering, Vidyavardhaka College of Engineering
Gokulam, Mysore, Karnataka, India
Email: rkrangan@vvce.ac.in

1. INTRODUCTION
In the multilingual digital world, natural language processing research is not confined to English. Many natural language applications are developed for various regional languages to avoid a digital language divide between dominant languages and others. Kannada is one of the Indian regional languages and one of the 22 scheduled languages in the Indian constitution. Kannada text is morphologically rich and agglutinative in nature. Therefore, proper representation of these texts makes a significant contribution to natural language understanding tasks. In general, for the task of Kannada document classification, the raw Kannada documents are first preprocessed. In preprocessing, the raw dataset is cleaned by removing punctuation, terms are tokenized, stopwords are removed, and transliteration, stemming, and lemmatization (if required) are performed. Secondly, the preprocessed data should be represented using better representation methods. Further, vital features are selected through feature selection methods [1] and classifiers are applied to learn the data. At the last stage, the learning of the model is evaluated with test samples. This standard process is depicted in Figure 1.
In Kannada document classification, a better understanding of the documents leads to better results. The proposed semantic based symbolic representation preserves the contextual information and captures the intra class variations. The optimum unit for text representation and categorization in automatic Kannada
document classification is the term. Unfortunately, a text document lacks the rigid structure of a traditional database, even though it can express a wide range of information. Unstructured data, especially free-running text data, must be converted into structured data. Numerous preprocessing strategies are suggested in the literature to accomplish this. Once unstructured data is structured, we must create a powerful representation model to build a powerful classification system. The literature contains a wide variety of representational schemes.

Figure 1. The standard process of Kannada documents classification

Although there are several models for representing text documents in the literature, the frequency-based vector space model produces good outcomes when used to classify texts. Unfortunately, this representational method has its own drawbacks. High dimensionality, loss of correlation, and loss of the semantic link between terms in a document are a few of them. Additionally, we must solve the complex morphology and agglutination problem of terms in regional Indian languages like Kannada. All these above-mentioned challenges are addressed with various linguistic, statistical, and machine learning methods in the proposed model.

The main challenge is finding the ideal representation for the raw Kannada terms and their documents. The complex composition of Kannada term letters is represented numerically by universal coded character set (unicode) term encoding [2]. Further, to address the loss of semantic information in the frequency-based vector space, the positional information of Kannada terms is embedded. This leads to the positionally encoded frequency-based representation of Kannada documents.
Vaswani et al. [3], who worked on attention-based transformers, used this positional encoding technique to get better outcomes. Later, to address the challenge of preserving the intra class variations, cluster based symbolic representation [4] is employed. In this representation, clustering techniques are used to find the relations between documents and terms with respect to their classes. The intra class variation of each feature is represented by an interval value rather than a crisp value. In the proposed method, this symbolic representation is used on the positionally encoded frequency-based representation [5], which leads to a semantic based symbolic representation for Kannada documents [6].
Furthermore, because features may be present in multiple classes (inter class variations), ambiguity still prevails. Hence, it is important to choose features appropriately so that there is less overlap across different classes [7], [8]. Here, correlation based symbolic feature selection is also applied for dimensionality reduction.
As mentioned above, this article gives detailed descriptions of the following contributions:
‒ For the Kannada documents, a semantic embedded symbolic representation is proposed.
‒ A symbolic feature selection method on the semantic-symbolic representation of Kannada documents is proposed.
‒ A classifier for the interval valued representation of Kannada documents is proposed.
‒ A comparative analysis of stacked ensemble feature selection with symbolic feature selection for Kannada documents classification is presented.
Further, section 2 presents the literature review with respect to representation and selection methods. The proposed methodology is presented in detail in section 3. The datasets and the experimentations performed on those datasets are explained in section 4. Finally, the comparative analysis is presented and concluded in section 5. The future scope of these findings is also presented in section 5.

2. RELATED WORK
Language resources are crucial for tasks involving natural language processing. Low resource languages, such as those spoken in many regions of India, lack the resources needed for language processing activities. It is possible to do language processing tasks at the character, sentence, paragraph, or document levels. Researchers have focused more on character and sentence level work than document level work due to
the lack of regional language resources. In a language identification task, to determine whether phrases in a Twitter dataset are in Hindi or English, Ansari et al. [1] used the chi-square feature selection method. To perform aspect-based sentiment analysis of Hindi reviews at the word level, Gandhi and Attar [9] presented the category association word (CAW) feature ensemble algorithm and achieved 76% accuracy. Anand et al. [10] used a fuzzy-based convolutional network for feature selection together with ensemble learning methods to address the problem of multilingual offensive language detection and achieved 98% accuracy. The accuracy of the authorship identification task for Kannada literature was 88%, and this was accomplished using stylometry features and a profile-based technique in [11].
To address the high dimensionality challenge in document level language processing tasks, it is necessary to choose the most vital subset of features [12], [13]. There are basic approaches for feature selection, like filters and wrappers, but ensemble techniques perform better. An ensemble is a combination of basic feature selection methods in various ways; it may be homogeneous or heterogeneous [14]. A homogeneous ensemble combines the same feature selection method with different parameters, whereas a heterogeneous ensemble combines different feature selection techniques and yields better results [15]. Tian et al. [16] presented ensemble-based filter feature selection (heterogeneous strategy) using feature ranking methods like information gain, gain ratio, chi-squared, and ReliefF for proper feature selection. Wang et al. [17] used a genetic algorithm to select the best ranking features. More heterogeneous feature selection ensembles and their examples can be found in [18]–[21]. These homogeneous and heterogeneous techniques are analyzed in [21], [22]. In the experiments section, the findings of one of the heterogeneous ensemble methods applied to Kannada documents are discussed in comparison with the outcomes of the proposed method.
Researchers have recently explored character recognition for the Kannada language widely. As fewer corpora are available, experiments at the document level are limited. Gu [23] worked on the Kannada character recognition problem on the Kannada-MNIST dataset, where the convolutional neural network (CNN) model excels with 98.77% accuracy. Trishala and Mamatha [24] presented an unsupervised stemmer and a rule-based lemmatizer for Kannada terms. They built a corpus of 17,825 Kannada root words for the experimentation. Additionally, Chandrakala and Thippeswamy [25] proposed recognition and categorization of historical handwritten Kannada stone inscriptions of the 11th century. The characters were classified using two separate classification algorithms, stochastic gradient descent with momentum (SGDM) and support vector machine (SVM), using the features collected by a deep convolutional neural network (DCNN), and 70% accuracy was attained.
In the study of text classification, clustering is used as a different representation technique for text documents. Several clustering strategies have been put forth. These clusters take advantage of the relationship between documents and key terms. Sun et al. [26] addressed imbalanced data classification with the adaptive weighted k-nearest neighbors (AWKNN) method, which uses similarity-based feature clustering. The researchers in [27]–[30] worked on the information bottleneck method and two-dimensional clustering algorithms, which help in clustering terms based on the distribution of each term's class labels. Further, the authors in [31], [32] worked on feature extraction using a clustering algorithm on a combination of labeled and unlabeled data. The authors in [33], [34] worked on a word embedding approach for dimensionality reduction, leading to better feature selection.
Towards semantic based representation, the authors in [35], [36] presented a term weighting technique. It is based on the semantic similarity of terms, which is computed using WordNet. Due to its computational complexity, it shows lower performance than standard term weighting methods. Positional encoding is used by many attention-based models like BERT [37], RoBERTa [38], and GPT-2 [39]. The absolute or relative positional information of a term is extracted in the positional encoding technique [40]. To extend transformers to tree domain activities (particularly binary trees), Sun et al. [41] provide a novel framework of customized positional encodings. Gehring et al. [42] proposed a convolutional sequence to sequence learning model. They used positional encoding to extract the sequence information of terms for the translation task. In this way, representational approaches employ positional encoding.
The literature indicates that semantic and symbolic representations have not been explored for Kannada documents classification. Further, there is also a need for discussion of Kannada documents classification experiments based on symbolic feature selection and classification in comparison with other state-of-the-art learning algorithms.

3. PROPOSED METHOD
The raw Kannada text documents are classified into multiple categories based on the semanticity, symbolic representation and selection (SRS) method. The proposed process of representation (embedded with semantic information) and symbolic feature selection for Kannada document classification is depicted in Figure 2.

Figure 2. The proposed method of representation and selection of Kannada documents for classification task

In preprocessing, tokenization and the removal of punctuation and stopwords are performed. At first, the problem of term representation for a regional language is addressed. Each Kannada term is represented by a unique decimal number using the unicode encoding method. This is discussed in the following section.

3.1. Unicode encoding for Kannada term
To get around the limitations of ASCII encoding for characters from languages other than English, there is a character set called unicode. Every character is encoded uniquely in unicode using a special number called a code-point. Each code-point is written as "\uXXXX", where 'u' indicates that the value is a code-point and 'XXXX' is the four-digit hexadecimal value. In the proposed experiments, we discovered that several agglutinative/morphologically rich term characters faded when text-processing activities were conducted directly on these Kannada terms. This results in feature information loss. In order to retain the meaning of each Kannada term intact and avoid the need for extra language corpora, we need a unicode-encoded decimal representation for each term [2]. An example is shown in Table 1 for the Kannada term "ಮನುಷ್ಯ" (English translation: human being).

Table 1. Unicode encoded Kannada term representation
Kannada characters of the term: ಮ (Ma), ನ (Na), ು, ಷ (sa), ್, ಯ (Ya)
Unicode code-points: \u0cae \u0ca8 \u0cc1 \u0cb7 \u0ccd \u0caf
Encoding of code-points (UTF-16): b'\xff\xfe\xae\x0c\xa8\x0c\xc1\x0c\xb7\x0c\xcd\x0c\xaf\x0c'
Decimal value: 257257805393772252295682176515839
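
The mapping from a term to its decimal value in Table 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and reading the UTF-16 bytes (with byte order mark) as a little-endian integer is an assumption, chosen because it reproduces the decimal value listed in Table 1.

```python
import codecs


def term_to_decimal(term: str) -> int:
    """Encode a term as UTF-16 (little-endian, with BOM) and read the bytes as one integer."""
    raw = codecs.BOM_UTF16_LE + term.encode("utf-16-le")
    # Assumption: the bytes are interpreted in little-endian order, which is
    # the reading that matches the decimal value shown in Table 1.
    return int.from_bytes(raw, byteorder="little")


def decimal_to_term(value: int) -> str:
    """Invert term_to_decimal, so the encoding loses no characters."""
    n_bytes = (value.bit_length() + 7) // 8
    n_bytes += n_bytes % 2  # UTF-16 code units are two bytes wide
    return value.to_bytes(n_bytes, byteorder="little").decode("utf-16")


# The term from Table 1, written with its code-points.
term = "\u0cae\u0ca8\u0cc1\u0cb7\u0ccd\u0caf"
code = term_to_decimal(term)
print(code)  # 257257805393772252295682176515839
print(decimal_to_term(code) == term)  # True
```

Because the mapping is reversible, every character of an agglutinative term, including vowel signs and the virama, survives the conversion.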

3.2. Semantic representation
Following unicode encoding, a document term matrix is used to represent Kannada documents in the vector space model. The values of the term frequency (TF) or term frequency-inverse document frequency (TF-IDF) are included in the document term matrix [43]. Positional encoding is integrated with TF or TF-IDF to solve the problem of the absence of sequence information or semantic information. Positional encoding maintains the sequence order of the terms. For example, for an input sentence of length $L$, the positional information of the $k$-th term in the input sequence is calculated as shown in (1) and (2) using sine and cosine functions.

$$P(k, 2i) = \sin\left(\frac{k}{n^{2i/d}}\right) \tag{1}$$

$$P(k, 2i+1) = \cos\left(\frac{k}{n^{2i/d}}\right) \tag{2}$$

Where $k$ is the position of the $k$-th object, $n$ is a user defined scalar value set to 10,000 based on empirical results [3], $d$ is the dimension of the output embedding space, and $i$ is the index in the range $0 \le i < d/2$. The positional encoded (PE) values should also be convoluted, which means that the sine and cosine values of each term should be added as given in (3) and also presented in Algorithm 1.

$$PE_k = P(k, 2i) + P(k, 2i+1) \tag{3}$$

Algorithm 1: Positional encoding
Input: Length of document, the output embedding value d.
Data: PE = Positional encoding, n = 10000 standard empirically determined value [3]
Output: Document's positional encoded matrix.
STEP 1: for k in range(length of document)
STEP 2:     for i in range(output embedding / 2)
STEP 3:         P(k, 2i) = sin(k / n^(2i/d))
STEP 4:         P(k, 2i+1) = cos(k / n^(2i/d))
STEP 5:         PE_k = P(k, 2i) + P(k, 2i+1)
STEP 6:     end
STEP 7: end
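
A runnable sketch of the positional encoding steps is given below. It is illustrative only: the function names are hypothetical, and collapsing a row to a scalar by summing all of its sine and cosine components is an assumption about how the addition of sine and cosine values is carried across the index i.

```python
import math


def positional_encoding(doc_length: int, d: int, n: float = 10000.0):
    """Return a doc_length x d matrix: sine in even columns, cosine in odd columns."""
    pe = [[0.0] * d for _ in range(doc_length)]
    for k in range(doc_length):          # position of the term in the document
        for i in range(d // 2):          # index over sine/cosine pairs
            angle = k / (n ** (2 * i / d))
            pe[k][2 * i] = math.sin(angle)
            pe[k][2 * i + 1] = math.cos(angle)
    return pe


def position_score(pe_row):
    """Assumed reduction: add the sine and cosine components of one position."""
    return sum(pe_row)


matrix = positional_encoding(doc_length=3, d=4)
print(matrix[0])                  # [0.0, 1.0, 0.0, 1.0]
print(position_score(matrix[0]))  # 2.0
```

With an odd value of d, the last column is left at zero, since only complete sine/cosine pairs are filled.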

The vector space goes through shallow shifts because of the convolution [5]. In a document, the same term may appear in various positions. These positions of the same term are combined, and their mean is calculated, as presented in (4) ($c$ is the term's repetition count in a document).

$$\left(\sum_{k=0}^{c} PE_k\right) \Big/ c \tag{4}$$

The mean value obtained from (4) is embedded into the TF or TF-IDF value of the $t$-th term in the document term matrix, as shown in (5) and (6).
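
$$W_t = TF_t + \left(\left(\sum_{k=0}^{c} PE_k\right) \Big/ c\right) \tag{5}$$

$$W_t = TF_t \cdot IDF_t + \left(\left(\sum_{k=0}^{c} PE_k\right) \Big/ c\right) \tag{6}$$
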
The attention/semantic based term weight ($W_t$) of the $t$-th term obtained from (5) and (6) is updated in the document term matrix.
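
The weighting in (4) to (6) can be illustrated with a short sketch. It is a simplified reading under stated assumptions: TF is taken as the raw count of a term in one document, the positional score of a position is the sum of its sine and cosine components, and the names `position_score` and `semantic_weights` are hypothetical.

```python
import math
from collections import defaultdict


def position_score(k, d=8, n=10000.0):
    """Sum of sine and cosine positional components for position k (assumed reduction)."""
    total = 0.0
    for i in range(d // 2):
        angle = k / (n ** (2 * i / d))
        total += math.sin(angle) + math.cos(angle)
    return total


def semantic_weights(tokens, idf=None, d=8, n=10000.0):
    """Return {term: TF (or TF*IDF) + mean positional score over the term's positions}."""
    positions = defaultdict(list)
    for k, term in enumerate(tokens):
        positions[term].append(k)

    weights = {}
    for term, ks in positions.items():
        tf = len(ks)                                   # raw count, an assumption
        mean_pe = sum(position_score(k, d, n) for k in ks) / len(ks)
        base = tf * idf[term] if idf is not None else tf
        weights[term] = base + mean_pe
    return weights


doc = ["t1", "t2", "t1", "t3"]   # placeholder tokens standing in for encoded terms
print(semantic_weights(doc, d=4))
```

A term that occurs at several positions receives a single weight, so the resulting row still fits the document term matrix layout.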

3.3. Cluster based symbolic representation
There will be significant intra-class variances in the semantic based TF vectors with regard to each class. As a result, an effective representation is created by using clustering to capture the variances and symbolizing each cluster with an interval-valued feature vector.

Let there be $K$ classes, each with $N$ documents, and each document described by a $t$ dimensional TF vector. Let $X$ be the proposed semantic based document term matrix (s-DTM) of size $(KN \times t)$, where each row represents a document labelled with a class, and terms are represented in the matrix columns. The dimensionality reduction technique regularized locality preserving indexing (RLPI) [7] is applied on $X$, resulting in a reduced s-DTM of size $(KN \times m)$, where $m$ is the number of vital features selected from the total $t$ features. Next, in the reduced s-DTM, the training documents are clustered within each class based on their TF vectors.

Let $[D_1, D_2, D_3, \ldots, D_n]$ be a document cluster of $n$ samples belonging to the $j$-th class, say $C_j^l$, with $l = 1, 2, 3, \ldots, Q$ ($Q$ denotes the number of clusters) and $j = 1, 2, 3, \ldots, K$. Further, let $F_i = [f_{i1}, f_{i2}, \ldots, f_{im}]$ be the feature set describing a sample document $D_i$ which belongs to cluster $C_j^l$. Each $k$-th feature value belonging to the $l$-th cluster is represented by the interval value $[f_{jk}^-, f_{jk}^+]$ to capture the intra class variations. The interval $[f_{jk}^-, f_{jk}^+]$ represents the floor and ceiling values of a feature belonging to a document cluster. Then, for a cluster $C_j^l$, the reference document is represented by the interval feature values of the features $k = 1, 2, 3, \ldots, m$, as shown in (7).

$$RF_j^l = \left\{[f_{j1}^-, f_{j1}^+], [f_{j2}^-, f_{j2}^+], \ldots, [f_{jm}^-, f_{jm}^+]\right\} \tag{7}$$

In (7), the document clusters of class $j$ are indexed by $l = 1, 2, 3, \ldots, Q$. It should be emphasized that, in contrast to traditional feature vectors, this is an interval valued feature vector that is recorded in the knowledge base as a representation of the $l$-th cluster. This generates $Q$ symbolic vectors that reflect the class-specific clusters. As a result, we obtain a total of $(Q \times K)$ representative vectors for the $K$ classes in the dataset.
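
The construction of the interval-valued reference vectors in (7) can be sketched as below. The sketch assumes that cluster assignments are already available from some clustering step and that the floor and ceiling of a feature are the minimum and maximum observed inside the cluster; both points, and the function names, are assumptions made for illustration.

```python
def interval_vector(cluster_docs):
    """Build [(floor, ceiling), ...] per feature from the documents of one cluster."""
    m = len(cluster_docs[0])
    return [
        (min(doc[k] for doc in cluster_docs), max(doc[k] for doc in cluster_docs))
        for k in range(m)
    ]


def build_knowledge_base(docs, class_labels, cluster_labels):
    """Map (class label, cluster id) to the interval-valued reference vector."""
    groups = {}
    for doc, cls, cluster in zip(docs, class_labels, cluster_labels):
        groups.setdefault((cls, cluster), []).append(doc)
    return {key: interval_vector(members) for key, members in groups.items()}


# Toy reduced s-DTM rows with m = 2 features (values are made up).
docs = [[1.0, 5.0], [3.0, 4.0], [10.0, 0.0]]
classes = ["sports", "sports", "crime"]
clusters = [0, 0, 0]
print(build_knowledge_base(docs, classes, clusters))
# {('sports', 0): [(1.0, 3.0), (4.0, 5.0)], ('crime', 0): [(10.0, 10.0), (0.0, 0.0)]}
```

Each entry of the knowledge base plays the role of one reference document, so Q clusters in each of K classes give Q × K entries.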

3.4. Symbolic feature selection
From the s-DTM, it is important to choose the best interval features, those with the least amount of class overlap, because overlapping interval features between classes reduce the classification accuracy. The aim of symbolic feature selection is to select the maximum variance features from each class. Hence, we create a proximity matrix of size $(Q \times K) \times (Q \times K)$, in which each element is multivalued with a dimension of $m$ features. Equation (8) gives the similarity between the classes $C_i$ and $C_j$ with respect to the $k$-th feature.

$$L_{i \to j}^{k} = \left(\frac{|I_{ik} \cap I_{jk}|}{|I_{jk}|}\right) \tag{8}$$

In (8), $I_{ik} = [f_{ik}^-, f_{ik}^+]$ $\forall k = 1, 2, \ldots, m$ are the interval features of class $C_i$, and similarly $I_{jk}$ is for class $C_j$.
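
The similarity in (8) is the length of the overlap between two intervals divided by the length of the second interval. A minimal sketch follows; the handling of a zero-width interval is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
def interval_similarity(a, b):
    """Return |a ∩ b| / |b| for closed intervals a = (lo, hi) and b = (lo, hi)."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    width_b = b[1] - b[0]
    if width_b == 0:
        # Assumption: a point interval counts as fully covered if it lies inside a.
        return 1.0 if a[0] <= b[0] <= a[1] else 0.0
    return overlap / width_b


def proximity_entry(intervals_i, intervals_j):
    """Multivalued proximity element: one similarity per feature k."""
    return [interval_similarity(a, b) for a, b in zip(intervals_i, intervals_j)]


print(interval_similarity((0.0, 2.0), (1.0, 3.0)))  # 0.5
print(interval_similarity((0.0, 1.0), (2.0, 3.0)))  # 0.0
print(proximity_entry([(0.0, 4.0), (0.0, 1.0)], [(1.0, 2.0), (2.0, 3.0)]))  # [1.0, 0.0]
```

The measure is not symmetric, since the denominator uses only the second interval, which is why the proximity matrix stores an entry for each ordered pair.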
Now, from the proximity matrix, a matrix $M$ of size $(Q \times K)^2 \times m$ is built by listing the multivalued type elements in rows. Further, the features with the highest correlation are selected as the best features. The total correlation ($TCorr_k$) of the $k$-th column with the other $y$-th columns is calculated, and it is compared with the average correlation value ($AvgTCorr$), as shown in (9) and (10). If $TCorr_k$ is higher than $AvgTCorr$, then the $k$-th column feature is selected, because we are interested in features with a high degree of discrimination.

$$TCorr_k = \sum_{y=0}^{m} Corr\left(k^{th}\ Column,\ y^{th}\ Column\right) \tag{9}$$

$$AvgTCorr = \left(\sum_{k=0}^{m} TCorr_k\right) \Big/ m \tag{10}$$

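
The selection rule in (9) and (10) can be sketched as follows. The text does not name the correlation measure, so the use of the Pearson coefficient here is an assumption, as are the function names; the self-correlation term is included because the sum in (9) runs over all columns.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(matrix):
    """Return indices of columns whose total correlation exceeds the average."""
    m = len(matrix[0])
    cols = [[row[k] for row in matrix] for k in range(m)]
    tcorr = [sum(pearson(cols[k], cols[y]) for y in range(m)) for k in range(m)]
    avg = sum(tcorr) / m
    return [k for k in range(m) if tcorr[k] > avg]


# Toy matrix: columns 0 and 1 move together, column 2 moves against them.
rows = [[1, 2, 4], [2, 4, 3], [3, 6, 2], [4, 8, 1]]
print(select_features(rows))  # [0, 1]
```

In this toy case the total correlations are about 1, 1, and -1, so the average is about 0.33 and only the first two columns pass the threshold.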

3.5. Symbolic classifier
The test document consists of features with crisp values, whereas the representation of each cluster consists of interval feature values against which the test document is compared and classified. Hence, the classification is performed based on the degree of belongingness. For the test document, let $F_t = [f_{t1}, f_{t2}, \ldots, f_{tm}]$ be an $m$ dimensional feature vector. Let $RF_j^l$ be the reference document of the $l$-th cluster of the $j$-th class, with interval values. Each $k$-th feature value is compared with the corresponding interval of $RF_j^l$. The degree of belongingness is determined by the number of features whose values fall inside the corresponding interval. If the value falls inside the interval, then the count is 1, else 0. The belongingness count is used to determine the class label for the test document, as shown in (11) and (12).
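
$$B_c = \sum_{k=1}^{m} C\left(f_{tk}, [f_{jk}^-, f_{jk}^+]\right) \tag{11}$$

$$C\left(f_{tk}, [f_{jk}^-, f_{jk}^+]\right) = \begin{cases} 1; & \text{if } (f_{tk} \ge f_{jk}^- \text{ and } f_{tk} \le f_{jk}^+) \\ 0; & \text{otherwise} \end{cases} \tag{12}$$
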
The belongingness count is computed for all clusters of all classes. The class label of the test document is then predicted as the class having the highest $B_c$. In this way, the Kannada document classification task is performed.
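
The classifier in (11) and (12) can be sketched in a few lines. The knowledge base layout, a dictionary keyed by (class label, cluster id), and the function names are assumptions made for this illustration; ties are resolved in favour of the first entry encountered, which the text does not specify.

```python
def belongingness(test_vector, intervals):
    """Count the features whose crisp value lies inside the matching interval."""
    return sum(1 for value, (lo, hi) in zip(test_vector, intervals) if lo <= value <= hi)


def classify(test_vector, knowledge_base):
    """Return (predicted class label, best belongingness count)."""
    best_key = max(
        knowledge_base,
        key=lambda key: belongingness(test_vector, knowledge_base[key]),
    )
    return best_key[0], belongingness(test_vector, knowledge_base[best_key])


# Toy knowledge base with one cluster per class and m = 2 features.
kb = {
    ("sports", 0): [(0.0, 1.0), (2.0, 3.0)],
    ("crime", 0): [(5.0, 6.0), (2.0, 3.0)],
}
print(classify([0.5, 2.5], kb))  # ('sports', 2)
print(classify([5.5, 9.0], kb))  # ('crime', 1)
```

Because only interval membership is counted, the comparison needs no distance computation between the test vector and the training documents.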

The experimental results, with a comparison against other representational methods and selection methods, are discussed in the next section.

4. EXPERIMENTATIONS WITH RESULTS
This section presents information on the datasets used, the experimentations carried out, and a comparison of the proposed feature selection methods.

4.1. The datasets
Kannada, an Indian regional language, is low-resourced, especially at the document level. The proposed model is applied to the following two Kannada document datasets. The first dataset (small) has 300 Kannada documents from different sections. It contains 5 categories: space, politics, crime, sports, and economics. This dataset is a subset of the larger dataset presented next. The second dataset is a larger dataset [6], which contains 11,045 documents that are unevenly distributed among 10 categories. The details of these datasets are shown in Figures 3 and 4.
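As a quick consistency check on the dataset sizes, the per-category counts read from Figures 3 and 4 can be summed and compared with the stated totals of 300 and 11,045 documents. The sketch below is illustrative only: the pairing of categories with counts assumes that labels and values appear in the same order in the figures, and the imbalance ratio (largest class divided by smallest class) is a measure added here, not one reported by the authors.

```python
# Per-category document counts as read from Figures 3 and 4.
# Assumption: categories and values are listed in the same order in the figures.
small = {"Space": 40, "Politics": 95, "Crime": 55, "Sports": 60, "Economics": 50}
large = {
    "Space & Science": 1697, "Politics": 744, "Crime": 136, "Sports": 457,
    "Economics": 904, "Entertainment": 2002, "Health": 486, "Stories": 794,
    "Social Science": 3009, "Spiritual": 816,
}


def summarize(counts):
    """Return (total documents, largest-to-smallest class ratio)."""
    total = sum(counts.values())
    ratio = max(counts.values()) / min(counts.values())
    return total, ratio


for name, counts in (("small", small), ("large", large)):
    total, ratio = summarize(counts)
    print(f"{name}: {total} documents, imbalance ratio {ratio:.3f}")
```

The totals match the sizes stated in the text, and the much larger ratio for the second dataset is consistent with its description as unevenly distributed.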
Figure 3. Details of Kannada documents dataset (smaller)
[Bar chart "Small Dataset": number of documents per category. Space: 40, Politics: 95, Crime: 55, Sports: 60, Economics: 50]
Figure 4. Details of Kannada documents dataset (larger)
4.2. Experimentations
The raw Kannada documents are preprocessed by removing punctuation, numbers, and stopwords (frequency based). Tokenized Kannada terms are Unicode encoded, as discussed in the previous section. Further, based on the positional encoding method, the sequential information of the terms is extracted and embedded into the document-term matrix. This yields the s-DTM, of size (documents × terms). RLPI is then applied for dimensionality reduction, transforming the s-DTM into a reduced matrix of size (documents × m). RLPI [7] is chosen because it discovers the discriminating structure of the document space. The experiments are conducted with train-test split ratios of 50:50 and 60:40. As mentioned above, RLPI is applied to select m features, with m ranging from 1 to 15 dimensions. Next, the proposed cluster-based symbolic representation is applied to the training set, resulting in symbolic vectors for each class. The fuzzy C-means (FCM) algorithm is used for clustering because of its strength in identifying clusters even when their boundaries overlap in the data. The number of clusters in each experiment is decided empirically. Symbolic feature selection is then applied to select the optimal feature subset, and the symbolic document classifier is used to classify the test documents. Each experiment is repeated 3 times, and the minimum, maximum, and average accuracies are noted, as shown in Tables 2 to 4.
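The step that turns clustered training documents into interval-valued symbolic vectors can be sketched as follows. This is a hedged illustration under two assumptions that are not confirmed by this section: each interval is taken as [mean - standard deviation, mean + standard deviation] of a feature within a cluster, and every training document carries a single hard cluster id (for example, the FCM cluster with the highest membership). The function names are hypothetical.

```python
from statistics import mean, pstdev


def cluster_intervals(docs):
    """Build one interval per feature from the documents of a single cluster.

    Assumption: the interval is [mean - std, mean + std] of the feature values.
    """
    intervals = []
    for feature_values in zip(*docs):  # iterate feature by feature
        mu = mean(feature_values)
        sigma = pstdev(feature_values)
        intervals.append((mu - sigma, mu + sigma))
    return intervals


def symbolic_vectors(train, labels, cluster_ids):
    """Group documents by (class, cluster) and build interval vectors.

    Returns a dict: class label -> list of interval vectors, one per cluster.
    """
    groups = {}
    for vec, label, cid in zip(train, labels, cluster_ids):
        groups.setdefault(label, {}).setdefault(cid, []).append(vec)
    return {
        label: [cluster_intervals(docs) for _, docs in sorted(clusters.items())]
        for label, clusters in groups.items()
    }


# Hypothetical toy data: three documents with two reduced features each.
train = [[1.0, 2.0], [3.0, 2.0], [5.0, 6.0]]
labels = ["sports", "sports", "crime"]
cluster_ids = [0, 0, 0]
vectors = symbolic_vectors(train, labels, cluster_ids)
print(vectors["sports"])
```

Keeping one interval vector per cluster, rather than one per class, is what lets the representation retain intra-class variance: a class with several distinct groups of documents gets several reference vectors.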
The experimental results are tabulated for the representations SRS_TF, SRS_TF-IDF, and SRS_PE-TF-IDF. Table 2 reports the results of the proposed symbolic document classifier when the documents are represented using SRS_TF vectors. For the small dataset, the 60:40 split ratio with 3 clusters of documents in each class resulted in an average accuracy of 85.57%. Similarly, for the large dataset, the 60:40 split ratio with 3 clusters of documents in each class resulted in an average accuracy of 84.69%. The results for the SRS_TF-IDF document vectors are tabulated in Table 3. Here, for the small and large datasets, the 60:40 split ratio with 3 document clusters in each class yielded average accuracies of 86.65% and 85.90%, respectively. Table 4 presents the classification accuracy of the symbolic document classifier for the proposed method (semantic symbolic representation), SRS_PE-TF-IDF. Between the 50:50 and 60:40 train-test splits, the 60:40 split with 3 clusters yielded the highest results for both datasets, with average accuracies of 89.10% and 87.65%, respectively.
Table 2. Classification accuracy of the symbolic document classifier on Kannada document datasets using SRS_TF
| Dataset | Training vs testing | Number of clusters | Minimum accuracy (%) | Maximum accuracy (%) | Average accuracy (%) |
|---|---|---|---|---|---|
| Small dataset | 50 vs 50 | 1 | 72.62 | 76.52 | 75.20 |
| Small dataset | 60 vs 40 | 1 | 74.95 | 79.40 | 76.85 |
| Small dataset | 50 vs 50 | 2 | 69.19 | 72.05 | 70.66 |
| Small dataset | 60 vs 40 | 2 | 72.56 | 75.34 | 74.13 |
| Small dataset | 50 vs 50 | 3 | 76.25 | 80.26 | 79.78 |
| Small dataset | 60 vs 40 | 3 | 83.26 | 87.25 | 85.57 |
| Small dataset | 50 vs 50 | 4 | 75.64 | 78.65 | 76.32 |
| Small dataset | 60 vs 40 | 4 | 76.58 | 79.59 | 78.60 |
| Large dataset | 50 vs 50 | 1 | 65.63 | 69.17 | 67.38 |
| Large dataset | 60 vs 40 | 1 | 68.88 | 70.45 | 69.84 |
| Large dataset | 50 vs 50 | 2 | 71.38 | 75.39 | 74.99 |
| Large dataset | 60 vs 40 | 2 | 72.56 | 76.45 | 74.97 |
| Large dataset | 50 vs 50 | 3 | 76.52 | 79.45 | 78.61 |
| Large dataset | 60 vs 40 | 3 | 81.26 | 85.57 | 84.69 |
| Large dataset | 50 vs 50 | 4 | 68.26 | 71.52 | 68.95 |
| Large dataset | 60 vs 40 | 4 | 70.56 | 74.52 | 72.26 |
[Figure 4 bar chart "Large Dataset": number of documents per category. Space & Science: 1697, Politics: 744, Crime: 136, Sports: 457, Economics: 904, Entertainment: 2002, Health: 486, Stories: 794, Social Science: 3009, Spiritual: 816]
Table 3. Classification accuracy of the symbolic document classifier on Kannada document datasets using SRS_TF-IDF
| Dataset | Training vs testing | Number of clusters | Minimum accuracy (%) | Maximum accuracy (%) | Average accuracy (%) |
|---|---|---|---|---|---|
| Small dataset | 50 vs 50 | 1 | 70.38 | 73.26 | 72.19 |
| Small dataset | 60 vs 40 | 1 | 71.95 | 73.95 | 72.85 |
| Small dataset | 50 vs 50 | 2 | 71.58 | 75.05 | 74.65 |
| Small dataset | 60 vs 40 | 2 | 74.69 | 78.34 | 76.32 |
| Small dataset | 50 vs 50 | 3 | 78.50 | 82.55 | 80.87 |
| Small dataset | 60 vs 40 | 3 | 85.55 | 88.60 | 86.65 |
| Small dataset | 50 vs 50 | 4 | 80.45 | 82.05 | 81.25 |
| Small dataset | 60 vs 40 | 4 | 81.50 | 83.90 | 82.55 |
| Large dataset | 50 vs 50 | 1 | 68.85 | 70.25 | 69.50 |
| Large dataset | 60 vs 40 | 1 | 70.65 | 73.95 | 72.40 |
| Large dataset | 50 vs 50 | 2 | 72.40 | 76.39 | 74.50 |
| Large dataset | 60 vs 40 | 2 | 74.60 | 78.85 | 77.65 |
| Large dataset | 50 vs 50 | 3 | 79.68 | 83.58 | 82.50 |
| Large dataset | 60 vs 40 | 3 | 83.50 | 86.90 | 85.90 |
| Large dataset | 50 vs 50 | 4 | 70.30 | 72.10 | 71.55 |
| Large dataset | 60 vs 40 | 4 | 74.60 | 77.25 | 76.55 |
Table 4. Classification accuracy of the symbolic document classifier on Kannada document datasets using SRS_PE-TF-IDF
| Dataset | Training vs testing | Number of clusters | Minimum accuracy (%) | Maximum accuracy (%) | Average accuracy (%) |
|---|---|---|---|---|---|
| Small dataset | 50 vs 50 | 1 | 74.95 | 77.65 | 76.50 |
| Small dataset | 60 vs 40 | 1 | 76.10 | 78.58 | 77.10 |
| Small dataset | 50 vs 50 | 2 | 73.26 | 75.05 | 74.12 |
| Small dataset | 60 vs 40 | 2 | 74.65 | 77.45 | 75.50 |
| Small dataset | 50 vs 50 | 3 | 81.90 | 84.60 | 83.65 |
| Small dataset | 60 vs 40 | 3 | 87.10 | 90.25 | 89.10 |
| Small dataset | 50 vs 50 | 4 | 82.65 | 83.15 | 82.95 |
| Small dataset | 60 vs 40 | 4 | 84.60 | 86.85 | 85.95 |
| Large dataset | 50 vs 50 | 1 | 70.50 | 72.65 | 71.35 |
| Large dataset | 60 vs 40 | 1 | 72.64 | 75.15 | 73.56 |
| Large dataset | 50 vs 50 | 2 | 75.20 | 78.65 | 77.65 |
| Large dataset | 60 vs 40 | 2 | 77.10 | 79.25 | 78.35 |
| Large dataset | 50 vs 50 | 3 | 80.65 | 83.55 | 82.45 |
| Large dataset | 60 vs 40 | 3 | 84.95 | 88.25 | 87.65 |
| Large dataset | 50 vs 50 | 4 | 71.50 | 73.25 | 72.65 |
| Large dataset | 60 vs 40 | 4 | 73.55 | 75.50 | 74.20 |
The results of various state-of-the-art machine learning classifiers are tabulated in Tables 5 and 6. Decision tree (DT), k-nearest neighbor (KNN), SVM with various kernels, a rule-based classifier, and the proposed symbolic classifier are compared. A notable observation concerns the 50:50 train-test split of the large dataset: the proposed classifier yielded a marginally higher accuracy of 82.50% for SRS_TF-IDF than the 82.45% obtained for SRS_PE-TF-IDF. For the 60:40 train-test split of the large dataset, however, SRS_PE-TF-IDF yields an accuracy of 87.65%, which is higher than the 85.90% of SRS_TF-IDF. This observation suggests that more training samples contribute to better semantic information analysis. For both the small and large Kannada document datasets, the proposed classifier yields better results than the other classifiers for all variants of the symbolic representation. The semantic symbolic representation yields the best average accuracies of 89.10% and 87.65% with the symbolic classifier on the small and large datasets, respectively.
Table 5. Comparative analysis of the symbolic classifier with other classifiers using 50:50 ratio (accuracy in %)
| Classifier | SRS_TF, small dataset | SRS_TF, large dataset | SRS_TF-IDF, small dataset | SRS_TF-IDF, large dataset | SRS_PE-TF-IDF, small dataset | SRS_PE-TF-IDF, large dataset |
|---|---|---|---|---|---|---|
| DT | 64.50 | 62.35 | 69.55 | 67.45 | 71.55 | 70.20 |
| KNN classifier | 74.25 | 70.25 | 76.55 | 73.55 | 77.85 | 75.20 |
| SVM-linear | 76.55 | 74.60 | 78.95 | 75.15 | 79.65 | 76.50 |
| SVM-RBF | 71.40 | 70.55 | 73.65 | 71.05 | 76.55 | 72.20 |
| SVM-sigmoid | 78.90 | 76.30 | 80.15 | 76.95 | 80.94 | 78.60 |
| SVM-polynomial | 74.64 | 70.21 | 76.54 | 72.88 | 78.54 | 76.69 |
| Rule based classifier | 69.58 | 67.56 | 71.45 | 68.33 | 73.69 | 71.52 |
| Symbolic classifier | 79.78 | 78.61 | 80.87 | 82.50 | 83.65 | 82.45 |
Table 6. Comparative analysis of the symbolic classifier with other classifiers using 60:40 ratio (accuracy in %)
| Classifier | SRS_TF, small dataset | SRS_TF, large dataset | SRS_TF-IDF, small dataset | SRS_TF-IDF, large dataset | SRS_PE-TF-IDF, small dataset | SRS_PE-TF-IDF, large dataset |
|---|---|---|---|---|---|---|
| DT | 66.36 | 63.49 | 71.36 | 70.83 | 73.58 | 71.55 |
| KNN classifier | 77.84 | 73.55 | 78.36 | 76.59 | 79.64 | 78.65 |
| SVM-linear | 79.83 | 77.54 | 81.23 | 80.45 | 83.65 | 81.74 |
| SVM-RBF | 74.65 | 72.15 | 74.65 | 71.94 | 78.46 | 76.94 |
| SVM-sigmoid | 81.26 | 80.24 | 83.56 | 82.55 | 84.33 | 83.16 |
| SVM-polynomial | 76.92 | 74.56 | 79.55 | 76.54 | 81.76 | 80.38 |
| Rule based classifier | 70.12 | 69.15 | 74.36 | 71.55 | 76.58 | 72.66 |
| Symbolic classifier | 85.57 | 84.69 | 86.65 | 85.90 | 89.10 | 87.65 |
4.3. Comparison of proposed symbolic representation and selection with stacked ensemble feature selection
To overcome various shortcomings of conventional feature selection methods, ensemble feature selection methods have been proposed. An ensemble seeks the right blend of various feature selection methods. As this work extends the stacked ensemble feature selection on Kannada documents [44], we compare the results of the proposed semantic symbolic feature selection with those of the stacked ensemble feature selection method. The stacked ensemble feature selection method has two layers. The first layer consists of the chi-square and mutual information gain statistical methods, and the second layer uses the XGBoost method. The features selected by the first layer are given as input to the second layer to further identify the most discriminative features. Through this ensemble of feature selection methods, vital features are identified and used for Kannada document classification. The proposed SRS (SRS_PE-TF-IDF) is compared with stacked ensemble feature selection by applying the SVM, KNN, and DT classifiers on the large dataset with a 60:40 train-test split. The comparison is presented in Figure 5. Interval-valued features yield better results than crisp feature values. The symbolic classifier is not used in this comparison because the stacked ensemble features are crisp valued. Among the aforementioned classifiers, SVM performs best with both feature selection methods. Further, with the SVM classifier, the proposed symbolic feature selection SRS_PE-TF-IDF yields a significant increase in average accuracy, reaching 83.16% against 74.15% for the stacked ensemble feature selection method.
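The first layer of the stacked ensemble described above can be sketched in plain Python. This is a simplified illustration, not the method of [44]: it assumes binary term-presence features, takes the union of the top-k features ranked by chi-square and by mutual information (the combination rule is not stated in this section), and omits the second XGBoost layer entirely. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of the feature-versus-class contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    stat = 0.0
    for x in fx:
        for y in fy:
            expected = fx[x] * fy[y] / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    fx, fy = Counter(feature), Counter(labels)
    return sum(
        (c / n) * math.log((c * n) / (fx[x] * fy[y])) for (x, y), c in joint.items()
    )


def first_layer(columns, labels, k):
    """Indices of features in the top k by chi-square or by mutual information.

    Assumption: the two rankings are combined by set union.
    """
    def top(score):
        ranked = sorted(
            range(len(columns)), key=lambda j: score(columns[j], labels), reverse=True
        )
        return set(ranked[:k])

    return sorted(top(chi_square) | top(mutual_information))


# Hypothetical toy data: 4 documents, 3 binary term-presence features.
labels = ["a", "a", "b", "b"]
columns = [
    [1, 1, 0, 0],  # perfectly aligned with the class
    [1, 0, 1, 0],  # independent of the class
    [1, 1, 1, 0],  # partially informative
]
print(first_layer(columns, labels, k=2))
```

In the two-layer scheme, the indices returned here would be passed to the second layer, which ranks the surviving features again with a learned model.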
Figure 5. Comparison of proposed SRS with stacked ensemble feature selection
[Bar chart "Classifiers comparative analysis"; x-axis: classifiers, y-axis: accuracy (%). SRS_PE-TF-IDF: SVM 83.16, KNN 78.65, DT 71.55. Stacked ensemble feature selection: SVM 74.15, KNN 60.91, DT 47.78]
5. CONCLUSION WITH FUTURE SCOPE
It is evident from all the experimental results that the semantic symbolic representation, together with symbolic feature selection and the symbolic document classifier, performs better in the Kannada document classification task. The experiments reveal that the interval data representation aids in storing the intra-class variance information. The positional encoding embeds the positional information of the term sequence and helps in storing attention or semantic-based information. At the document level of natural language processing tasks, dimensionality is one of the major challenges. To address dimensionality reduction, symbolic feature selection is proposed, and it resulted in better accuracy for Kannada document classification. Across all experiments, the proposed representation and selection approach (SRS_PE-TF-IDF) resulted in the highest average accuracy of 87.65% for the 60:40 train-test split of the large dataset. The proposed methods also obtain an average accuracy of 89.10% for the small dataset with a 60:40 train-test split ratio, which is higher than that of the existing state-of-the-art methods. Since Kannada is a low-resource language, the proposed methods could be applied to newly constructed, larger Kannada document collections in future work. Further, as the dataset is unbalanced, the experiments could be extended to K-fold validation. After the creation of a larger dataset, the behavior of the semantic symbolic feature vectors could be analyzed with various neural network classifiers. The proposed methods could also be applied to other natural language processing tasks such as named entity recognition, sentiment analysis at the paragraph level, and summarization.
FUNDING INFORMATION
Authors state there is no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: C, M, So, Va, Fo, I, R, D, O, E, Vi, Su, P, Fu
Ranganathbabu Kasturi Rangan: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Bukahally Somashekar Harish: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Chaluvegowda Kanakalakshmi Roopa: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project administration; Fu: Funding acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study are openly available in Kaggle repository at https://doi.org/10.34740/kaggle/dsv/7376871.
REFERENCES
[1] M. Z. Ansari, T. Ahmad, and A. Fatima, "Feature selection on noisy Twitter short text messages for language identification," arXiv-Computer Science, pp. 1–19, 2020.
[2] R. K. Rangan and B. S. Harish, "Kannada document classification using unicode term encoding over vector space," in Recent Advances in Artificial Intelligence and Data Engineering, 2022, pp. 387–400, doi: 10.1007/978-981-16-3342-3_31.
[3] A. Vaswani et al., "Attention is all you need," in 31st Conference on Neural Information Processing Systems (NIPS 2017), California, United States, 2017, pp. 1–11.
[4] B. S. Harish, D. S. Guru, S. Manjunath, and R. Dinesh, "Cluster based symbolic representation and feature selection for text classification," in Advanced Data Mining and Applications, 2010, pp. 158–166, doi: 10.1007/978-3-642-17313-4_16.
[5] R. K. Rangan, B. S. Harish, and C. K. Roopa, "Semantic term weighting representation for Kannada document classification," Revue d'Intelligence Artificielle, vol. 38, no. 4, pp. 1243–1253, 2024, doi: 10.18280/ria.380418.
[6] R. K. Rangan, "Kannada documents for classification (KDC): Kannada documents dataset for N.L.P tasks," Kaggle. 2024. [Online]. Available: https://www.kaggle.com/datasets/rkasturirangan/kannada-documents-for-classification-kdc
[7] D. Cai, X. He, W. V. Zhang, and J. Han, "Regularized locality preserving indexing via spectral regression," in Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, New York, United States: ACM, 2007, pp. 741–750, doi: 10.1145/1321440.1321544.
[8] W. Li, H. Zhou, W. Xu, X.-Z. Wang, and W. Pedrycz, "Interval dominance-based feature selection for interval-valued ordered data," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 6898–6912, Oct. 2023, doi: 10.1109/TNNLS.2022.3184120.
[9] H. Gandhi and V. Attar, "Sentiment of primary features in aspect based sentiment analysis of hindi reviews," in Applied Computational Technologies, Singapore: Springer, 2022, pp. 567–578, doi: 10.1007/978-981-19-2719-5_54.
[10] M. Anand, K. B. Sahay, M. A. Ahmed, D. Sultan, R. R. Chandan, and B. Singh, "Deep learning and natural language processing in computation for offensive language detection in online social networks by feature selection and ensemble classification techniques," Theoretical Computer Science, vol. 943, pp. 203–218, 2023, doi: 10.1016/j.tcs.2022.06.020.