IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 6, December 2025, pp. 5218~5230
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i6.pp5218-5230
Journal homepage: http://ijai.iaescore.com
Graph based semantic email classification: a novel approach for academic institutions

Aruna Kumara B.1, Madan H. T.2, Rashmi C.3, Sarvamangala D. R.3
1Department of Computer Science and Engineering, Proudhadevaraya Institute of Technology, Hosapete, India
2Department of Electronics and Communication (Advanced Communication Technology), NMAM Institute of Technology, Nitte (Deemed to be University), Karkala, India
3School of Computing and Information Technology, REVA University, Bangalore, India
Article Info
Article history:
Received Nov 11, 2024
Revised Oct 3, 2025
Accepted Oct 18, 2025

ABSTRACT
Email classification in educational institutes has become a fundamental task for managing information efficiently. Owing to globalization and technological advancement, the number of email users is increasing consistently, which in turn increases the volume of digital data exponentially. This necessitates automated email classification systems for better organized work. This paper develops a novel graph-based similarity (GBS) approach based on semantic similarity to address these challenges. The method first selects the most relevant features based on feature weights; it then builds a graph for each category using the Jaccard coefficient, with features as nodes and correlations between the nodes as edges. These graphs then serve as templates for each category, and each new incoming email is classified into a specific class based on the similarity between the graph templates and the new email. The GBS method was compared with well-known benchmark email classifiers, and the findings demonstrated that the GBS method outperformed them with 98.91% accuracy after fine-tuning of the graph parameters and the classifier hyperparameters. Additionally, receiver operating characteristic (ROC) curve analysis was conducted, achieving a highest area under the curve (AUC) score of 0.989, demonstrating robust classification proficiency across all categories.
Keywords:
Email classification
Graph based classification
Multiclass classification
Natural language processing
Semantic similarity

This is an open access article under the CC BY-SA license.
Corresponding Author:
Madan H. T.
Department of Electronics and Communication (Advanced Communication Technology)
NMAM Institute of Technology (NMAMIT), Nitte (Deemed to be University)
Nitte, Karkala Taluk, Udupi-574110, Karnataka, India
Email: madan.ht@nitte.edu.in

1. INTRODUCTION
In the current digital age, email remains a basic communication medium at both personal and professional levels. The use of email for communication is still growing exponentially in spite of swift technological development, because it offers a unique combination of efficiency, security, and trustworthiness. After the pandemic, email communication increased exponentially in the academic field, because learning shifted to online mode. According to the Radicati Group's email statistics report, 4.549 billion users were using email globally as of 2025; by the end of 2027, that number is predicted to reach 4.849 billion [1]. A typical academician receives almost 60-70 emails per day, which leads to a flood of emails after a vacation of 10-15 days. In addition, academicians need to devote a substantial part of their working hours to processing
emails, which reduces the quality and productivity of both the employee and the organization. Accordingly, email management is an essential task that both individuals and organizations must deal with. In general, the primary tool for email management is categorizing emails into specified categories based on user requirements. For instance, in an academic institution or university, an incoming email can be classified into academics, research, placements, examinations, and others for easy access. To categorize emails into predetermined groups, different categories of learning methods are available, including supervised learning, content-based learning, unsupervised learning, and semi-supervised learning [2], [3].
This work uses supervised learning for email classification because it is a robust choice for academic email classification, offering strong performance metrics and interpretability. Support vector machines (SVM) [4], genetic algorithms (GA) [5], artificial neural networks (ANN) [6], [7], decision trees (DT) [8], naïve Bayes (NB) [9], random forest (RF) [10], convolutional neural networks (CNN) [11]–[13], and k-nearest neighbor (KNN) [14] are some of the methods that apply the supervised learning principle. Some of these classifiers showed prominent performance on email, but still faced challenges due to the nature of email data.
Real-time email data is unstructured, noisy, and high-dimensional, which makes it difficult for a classifier to understand the structure of the data. Researchers therefore developed classifiers of a semantic nature [15], [16] and of a tree- or graph-based nature [7], [17]–[19] to solve these problems, and these also performed well on public datasets. However, these classifiers failed to focus on the structure of, and relationships between, emails when applied to real-time datasets. Hence, this work focuses on developing a graph-based email classifier, since tree-based and graph-based classifiers have drawn a lot of attention recently because of their non-linear nature, which allows them to adapt to virtually any classification task. This research proposes a unique graph-based email classifier for effective email classification.
The key contributions of this work are as follows: i) a novel graph-based similarity (GBS) approach is proposed for a multi-class email classification system, where each email is represented as a graph built from top-k term frequency-inverse document frequency (TF-IDF) features as nodes. Edges between the features are computed using the Jaccard coefficient, which allows the model to capture the semantic relationship between the nodes, unlike conventional methods; ii) a template-based classification approach is presented, where each category is associated with its class-representative graph. A new, unseen incoming email is categorized by comparing its graph with the class-specific template graphs; and iii) the proposed GBS approach is extensively evaluated on a real-time academic email dataset and compared with conventional classifiers such as multinomial naïve Bayes (MNB), linear support vector classifier (LSVC), long short-term memory (LSTM), and semantic-based forensic analysis and classification of email data (seFACED). The model is fine-tuned across different TF-IDF threshold values. Receiver operating characteristic (ROC) curve analysis is also conducted to validate the discriminative capability of the method, and the findings demonstrate superior performance, predominantly with imbalanced classes.
The remainder of this paper is structured as follows: section 2 reviews prior work on different email classification techniques. Section 3 discusses the proposed methodology, which describes building graphs, finding the node similarity between graphs, and grouping emails into predefined categories. Section 4 discusses the results obtained after rigorous experiments on real-time and benchmark datasets. Finally, section 5 concludes the work.

2. RELATED WORK
The relevant work is summarized in three categories: supervised methods, tree-based methods, and semantic-based methods. These methods classify emails into predefined categories. Some of them are detailed as follows.
Angelova and Weikum [20] developed a novel method for text classification that utilized a graph in which text is hyperlinked. This method used neighbor-learning to ascertain the link between data items. The accuracy and robustness of the classifiers were enhanced by this technique. Furthermore, neighbor trimming and edge weighting based on similarity were employed to lessen the noise in the neighborhood; class label-based similarity was utilized to further alter the weights.
The challenging task of multi-class automated hate speech categorization for text was addressed in [21], with significantly improved outcomes once the significant challenges were first identified. Ten distinct binary-categorized datasets with various kinds of hate speech were created.
Adnan et al. [22] addressed various challenges in spam email classification using ensemble learning techniques. Their method employed different classifiers, including DT, logistic regression (LR), KNN, AdaBoost, and Gaussian NB, as base classifiers. Their evaluation demonstrated that AdaBoost was the strongest individual classifier and also showed that the proposed stacking method achieved higher performance.
Hassanat [23] proposed a unique way of accelerating big data categorization. Two approaches were used, and the instances were sorted according to how closely they resembled two local locations.
The first strategy chooses local points based on how much they resemble the extreme global points, whereas the second technique selects local sites at random. The outcomes of numerous trials performed on multiple large datasets reveal respectable accuracy rates compared to cutting-edge techniques and the KNN classifier.
A DT method [24] based on the Rao-Stirling index was proposed. When quantifying data impurity, the Rao-Stirling index takes class distances into account and gives higher weight to reference pairs in classes that are farther apart. The outcomes demonstrated that the recommended method is more accurate, suggesting that accounting for class distances could improve DT accuracy.
Sonowal [25] used binary search feature selection (BSFS) with a rating system based on the Pearson correlation coefficient to categorize phishing emails. The proposed strategy takes advantage of four categories of features from the hyperlinks, email body, and content. Generally, the four dimensions mentioned earlier were used to choose 41 attributes. The BSFS method, with a score of 97.41%, outperformed the sequential forward feature selection (SFFS) method (95.63%) and the without feature selection (WFS) method (95.56%). This study showed that while SFFS takes the most time to find the optimal feature set, WFS does so the quickest. Compared to the other approaches, however, the accuracy of WFS is rather low. The main finding of the experiment was that BSFS took the smallest amount of time to evaluate the best feature set more accurately, even after eliminating a few features from the feature corpus.
A novel method for resolving the proximity searching issue was presented in [26]. The methodology relied on employing a directed graph known as the k-nearest neighbor graph (k-NNG) to index the database. This graph establishes a connection between every entry and its nearest neighbors. For range and nearest neighbor queries, the method offered two search algorithms that used the metric and navigational properties of the k-NNG network. In addition, the authors demonstrated the competitiveness of the approach versus existing ones. For instance, utilizing only 0.25% of the space required by the approximating and eliminating search algorithm (AESA), the nearest neighbor search methods used in that paper completed 30% more distance evaluations in the metric document space. Conversely, the pivot-based method was ineffective in the same field.
Shimomura and Kaster [27] presented two significant improvements to graph-related algorithms for similarity searching. First, the primary graph categories currently used for similarity searches were reviewed, and the most representative graphs were experimentally evaluated in a common environment for exact and nearest search strategies. Second, HGraph, a novel connected-partition technique for proximity graph construction and similarity search responses, was introduced.
Malkov et al. [28] proposed a new method for resolving the metric space KNN search problem. A navigable small world network, with vertices for the components to be stored, edges for the connections between those elements, and a form of the greedy method for searching, served as the foundation for the search structure. The initial Delaunay graph approximation linkages were simply maintained to construct the navigable small world environment. The approach, which was presented in terms of arbitrary metric spaces, was both general and simple. The accuracy of the probabilistic KNN queries can be modified without rewriting the structure.
Numerous classifiers for machine learning have been developed, and some of them have demonstrated notable performance on a variety of datasets, according to a thorough review of the literature. Additionally, classifiers based on trees and graphs have gained popularity in recent years, and only a few tree-based and graph-based classifiers showed good performance on high-dimensional email data. However, although a large number of graph-based and tree-based classifiers have been developed, they fail to capture the lexical and contextual relationship between features. Hence, this paper proposes a graph-based classifier based on semantic similarity for an effective email classification system using TF-IDF and the Jaccard coefficient.

3. METHODOLOGY
The proposed methodology employs a unique GBS method for the email classification system. The method comprises five main phases: data acquisition, pre-processing the raw email data, finding and selecting relevant features, building a graph, and grouping email data according to the similarity between two graphs. The framework of the proposed email classification method based on GBS is illustrated in Figure 1.

3.1. Data acquisition
The REVA email dataset contains 3,400 email samples in four categories: examination, research, academics, and placements. These categories are considered in this study [29]. The email samples in the dataset are distributed unevenly across the categories, as illustrated in Figure 2.

Figure 1. The proposed architecture for email classification using GBS

Figure 2. Distribution of email samples

3.2. Data preparation
Data pre-processing is necessary to transform the raw email data into the format needed for subsequent steps such as feature selection and classification. Several pre-processing methods are applied to the unprocessed email data, such as tokenization, lowercase conversion, email signature removal, stop-word removal, and punctuation removal [30], [31]. After pre-processing, there is a clear decline in the volume of email data, which improves the processing efficiency of the proposed email classification system. A detailed explanation of data preparation follows, accompanied by an example. Four email samples (D1, D2, D3, and D4) that fall under the dataset's research category were considered to demonstrate how the proposed method works.
– D1: hearty congratulations to authors, published a research article in Scopus indexed journal.
– D2: dear all, I'm pleased to announce that a research article I wrote has been published in a journal with an impact factor of 1.8 that is indexed by SCI.
– D3: dear all, I am pleased to inform that the research paper co-authored by me and my RU Research Scholar Mr. Imran Khan is accepted in the International Journal of Electronics and Telecommunications, a Web of Science indexed journal.
– D4: dear all, it is our great pleasure to share that one of our articles titled "A comprehensive research on its topologies, modelling, control and its applications" is accepted in one of the top-rated international peer reviewed Journal of "IET Power Electronics".
The reduced dataset obtained after applying the pre-processing techniques to email documents D1, D2, D3, and D4 is as follows:
– D1’: {congrats, author, publish, research, article, Scopus, index, journal}.
– D2’: {happy, announce, research, article, write, publish, journal, impact, factor, index, sci}.
– D3’: {research, happy, article, scholar, author, publish, web, accept, science, journal, index}.
– D4’: {great, happy, article, research, accept, international, review, journal, IET, Power, Electronics, paper, write, scholar, author}.
The four email documents had a combined length of 145 words prior to pre-processing; after the pre-processing phase, their combined length becomes 47 words, which is
a reduction of roughly 68% ((145 − 47)/145 ≈ 0.68) in the size of the original data. This helps the learning techniques work more efficiently. Further, from the reduced data, the n unique words that occur most often across the different email documents are considered for the feature selection phase.
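
The pre-processing steps named in this section (tokenization, lowercase conversion, stop-word removal, and punctuation removal) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for the example, and signature removal and word normalization (for instance "published" to "publish") are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for illustration;
# a real system would rely on a fuller list.
STOP_WORDS = {
    "a", "an", "the", "to", "in", "is", "of", "that", "by",
    "and", "i", "am", "all", "dear", "has", "been", "with",
}


def preprocess(text):
    """Lowercase the text, tokenize it, and drop punctuation and stop words."""
    lowered = text.lower()
    # Keeping only alphabetic runs performs tokenization and removes
    # punctuation and digits in a single step.
    tokens = re.findall(r"[a-z]+", lowered)
    return [token for token in tokens if token not in STOP_WORDS]


sample = "Dear all, I am pleased to inform that the research paper is accepted."
cleaned = preprocess(sample)
print(cleaned)
```

The sample sentence shrinks from thirteen tokens to five, which is the kind of reduction this section reports for D1 to D4.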

3.3. Feature engineering
This section describes how weights are identified for each feature (term) and how features are selected for further processing. Initially, a binary model is created for the n most frequently occurring words in the reduced dataset using the bag of words (BoW) method. After that, a weight representing each word's relevance is determined using TF-IDF [32]. The BoW representation for the n unique words (n = 10 for this study) of the reduced dataset is shown in Table 1.

Table 1. Binary BoW representation for the selected terms
Doc/Word  Research  Article  Journal  Publish  Happy  Author  Index  Scholar  Write  Accept
D1        1         1        1        1        0      1       1      0        0      0
D2        1         1        1        1        1      0       1      0        1      0
D3        1         1        1        1        1      1       1      1        0      1
D4        1         1        1        0        1      1       0      1        1      1
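
The binary rows of Table 1 follow directly from the reduced documents D1’ to D4’ listed in section 3.2. The sketch below rebuilds them; the helper name `binary_bow` is an assumption for illustration, and the tokens are lowercased for matching.

```python
# Reduced documents D1'..D4' as listed in section 3.2 (lowercased).
docs = {
    "D1": ["congrats", "author", "publish", "research", "article",
           "scopus", "index", "journal"],
    "D2": ["happy", "announce", "research", "article", "write", "publish",
           "journal", "impact", "factor", "index", "sci"],
    "D3": ["research", "happy", "article", "scholar", "author", "publish",
           "web", "accept", "science", "journal", "index"],
    "D4": ["great", "happy", "article", "research", "accept", "international",
           "review", "journal", "iet", "power", "electronics", "paper",
           "write", "scholar", "author"],
}

# The n = 10 selected terms, in the column order of Table 1.
vocab = ["research", "article", "journal", "publish", "happy",
         "author", "index", "scholar", "write", "accept"]


def binary_bow(tokens, vocabulary):
    """Return 1 for each vocabulary term present in the document, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


table = {name: binary_bow(tokens, vocab) for name, tokens in docs.items()}
for name, row in table.items():
    print(name, row)
```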
The BoW model, however, only indicates the presence of words; it makes no allowance for the importance of individual terms within an email's content. For instance, in the fourth document, the word "scholar" has greater significance than other nouns. However, when words exist in the email, they receive the value '1' in this model; otherwise, they receive the value '0'. For this reason, a BoW model with TF-IDF scores was used in place of the 0s and 1s of the original model. For a given word, TF-IDF multiplies TF and IDF to find the score for that word in the document. Consequently, the score for every word in the document is computed as shown in (1) to (3).

$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t) \tag{1}$$
Thus, two matrices must be calculated for this method: the first one (TF) counts the terms or words that occur in each document, and the other (IDF) determines the relevance of each word that occurs in each document. Both of them are calculated using (2) and (3).
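
$$\mathrm{TF}(t, d) = \frac{\text{number of times term } t \text{ appears in document } d}{\text{total number of terms in document } d} \tag{2}$$

$$\mathrm{IDF}(t) = \log\left(\frac{\text{total number of documents}}{1 + \text{number of documents with term } t}\right) \tag{3}$$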
To determine the importance of each term, the TF dictionary for the n most frequently recurring words with their TF values was first produced. Afterwards, the same set of terms with their corresponding IDF values was included in the IDF lexicon. Ultimately, as Table 2 illustrates, the TF-IDF score is produced. Terms with scores close to 0 imply less relevance, whereas those with high TF-IDF values indicate greater relevance. A feature set was generated using the top 10 words having the highest TF-IDF scores, and a graph was then created using this feature set. In the following step, a graph was created with these properties, and it was then utilized to categorize future incoming emails into the appropriate groups.

Table 2. Term frequency-inverse document frequency model
Doc/Word  Research  Article  Journal  Publish  Happy   Author  Index  Scholar  Write  Accept
D1’       0.989     0.042    0.989    0        0.987   0.034   0.581  0        0.081  0.085
D2’       0.989     0        0.989    0        0.0895  0.027   0.624  0        0.852  0.847
D3’       0.989     0        0.989    0.048    0.981   0.033   0.158  0.542    0.075  0.079
D4’       0.989     0        0.989    0.059    0.879   0        0.254  0.124    0.759  0.568
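
The TF-IDF weighting and top-k selection described in this section can be sketched as follows. The sketch assumes a relative-frequency TF and a smoothed IDF of the form log(N / (1 + df)); the exact variant and normalization used in the study cannot be recovered from the text, so the numbers it produces are not expected to match Table 2. The function names are illustrative only.

```python
import math
from collections import Counter

# Reduced documents D1'..D4' from section 3.2 (lowercased).
corpus = [
    ["congrats", "author", "publish", "research", "article",
     "scopus", "index", "journal"],
    ["happy", "announce", "research", "article", "write", "publish",
     "journal", "impact", "factor", "index", "sci"],
    ["research", "happy", "article", "scholar", "author", "publish",
     "web", "accept", "science", "journal", "index"],
    ["great", "happy", "article", "research", "accept", "international",
     "review", "journal", "iet", "power", "electronics", "paper",
     "write", "scholar", "author"],
]


def tf(term, tokens):
    """Relative frequency of `term` within one document."""
    return Counter(tokens)[term] / len(tokens)


def idf(term, documents):
    """Assumed smoothed IDF: log(N / (1 + document frequency))."""
    df = sum(1 for tokens in documents if term in tokens)
    return math.log(len(documents) / (1 + df))


def tfidf(term, tokens, documents):
    """TF-IDF score of `term` in one document: TF multiplied by IDF."""
    return tf(term, tokens) * idf(term, documents)


def top_k_terms(documents, k):
    """Rank terms by their summed TF-IDF over all documents; keep the top k."""
    vocabulary = sorted({term for tokens in documents for term in tokens})
    totals = {
        term: sum(tfidf(term, tokens, documents) for tokens in documents)
        for term in vocabulary
    }
    return sorted(vocabulary, key=lambda term: totals[term], reverse=True)[:k]


features = top_k_terms(corpus, 10)
print(features)
```

Under this assumed IDF, a term that occurs in every document (such as "research" here) receives a negative weight and drops out of the top-k list, so the choice of IDF variant materially changes which features are selected.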

3.4. Graph building using Jaccard similarity
During this step, an email-document graph was constructed for email classification using the features with high TF-IDF scores. The graph is defined in (4) to (6):

$$G = (N, E) \tag{4}$$

$$N = \{n_1, n_2, n_3, \ldots, n_n\} \tag{5}$$

$$E = \{e_1, e_2, e_3, \ldots, e_m\} \tag{6}$$

Here, G is a graph with a set of nodes (N) and a set of edges (E). The Jaccard similarity [33] between the TF-IDF feature sets of two emails M and N is calculated as in (7).

$$S_j(M, N) = \frac{|M \cap N|}{|M \cup N|} = \frac{|M \cap N|}{|M| + |N| - |M \cap N|} \tag{7}$$
We define an edge between the vertices n_i and n_j if the Jaccard similarity exceeds the threshold τ, as in (8).

$$E(n_i, n_j) = \begin{cases} S_j(n_i, n_j), & \text{if } S_j(n_i, n_j) > \tau \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

Thus, the edge d_{i,j} represents the Jaccard similarity between two email documents d_i and d_j.
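
The edge rule in (7) and (8) can be sketched as below, treating each email as the set of selected terms it contains (the rows of Table 1). The names `jaccard` and `build_graph` are assumptions for illustration. The study reports a threshold of 0.75; a lower, hypothetical value of 0.55 is used here only so that this four-document example produces some edges.

```python
from itertools import combinations

# Term sets of D1..D4 restricted to the ten selected features (Table 1).
emails = {
    "D1": {"research", "article", "journal", "publish", "author", "index"},
    "D2": {"research", "article", "journal", "publish", "happy", "index",
           "write"},
    "D3": {"research", "article", "journal", "publish", "happy", "author",
           "index", "scholar", "accept"},
    "D4": {"research", "article", "journal", "happy", "author", "scholar",
           "write", "accept"},
}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets, as in (7)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(items, tau):
    """Keep an edge (u, v) weighted by the similarity only if it exceeds tau."""
    edges = {}
    for (u, set_u), (v, set_v) in combinations(items.items(), 2):
        score = jaccard(set_u, set_v)
        if score > tau:
            edges[(u, v)] = score
    return edges


graph = build_graph(emails, tau=0.55)  # hypothetical threshold for the demo
for pair, weight in graph.items():
    print(pair, round(weight, 3))
```

With the reported threshold of 0.75, none of these four pairs would be connected, which shows how strongly τ controls the density of the resulting graph.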
After the research category graph was constructed as shown in Figure 3, the graphs for the remaining categories (academics, exams, and placements) were constructed using the same procedure. These graphs were then preserved as templates. They are used during the categorization of incoming mail that has not yet been read.

Figure 3. Graph with weighted edges
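
One way to compare an incoming email's graph with the stored category templates is to measure the overlap of their edge sets. The text does not spell out the graph-to-graph similarity measure, so the Jaccard overlap of undirected edges used below is an assumption, as are the function names and the toy template contents.

```python
def undirected(edges):
    """Normalize edge pairs so that (u, v) and (v, u) compare equal."""
    return {frozenset(pair) for pair in edges}


def graph_similarity(edges_a, edges_b):
    """Assumed measure: Jaccard overlap between two undirected edge sets."""
    a, b = undirected(edges_a), undirected(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_template(input_edges, templates):
    """Score the input graph against every template and return the best label."""
    scores = {
        label: graph_similarity(input_edges, edges)
        for label, edges in templates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical templates and input graph, for illustration only.
templates = {
    "research": [("research", "article"), ("article", "journal"),
                 ("journal", "publish")],
    "examination": [("exam", "schedule"), ("exam", "result")],
}
new_email_edges = [("article", "research"), ("journal", "article"),
                   ("journal", "index")]

label, scores = classify_by_template(new_email_edges, templates)
print(label, scores)
```

The input graph shares two of four distinct edges with the research template and none with the examination template, so it is assigned to research.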

3.5. Graph based classification
In this phase, emails were classified into one of the predefined categories based on the similarity between graphs. The dataset has four categories: academics, examination, research, and placements. First, the input graph for the new unseen email was constructed using the methods described in sections 3.2, 3.3, and 3.4. Later, this graph is fed into the trained model to predict the category. Then, to predict the respective category, the method computes the graph similarity score between the input graph and the graph templates of the various classes as in (9).

$$C = \{C_1, C_2, C_3, \ldots, C_n\} \tag{9}$$

Here, C is the set of classes and n is the number of classes (in this study: academics, research, examination, and placements). A new email document D_new is represented as a TF-IDF vector X_new and connected to the graph G based on its Jaccard similarity to the existing nodes. A new email will be classified into one of the predefined categories using the existing labeled nodes in the graph. A simple KNN is used for classification. For the new email document, the classification task works as follows:
i) Compute the Jaccard similarity between X_new and the TF-IDF vectors of the existing emails.
ii) Find the k-closest neighbors based on Jaccard similarity.
iii) Assign the class Y_new to the fresh email document based on a majority vote from the classes of the nearest neighbors as in (10).
$Y_{new} = \arg\max_{C_k} \sum_{i=1}^{k} \mathbb{1}(Y_i = C_k)$ (10)
where $Y_i$ denotes the class label of the $i$-th nearest neighbor, $C_k$ is the $k$-th class, and $\mathbb{1}(Y_i = C_k)$ is the indicator function that returns 1 when $Y_i = C_k$ and 0 otherwise.
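The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each email is reduced to the set of its selected terms (the source does not say how Jaccard similarity is taken over TF-IDF vectors), and the names `jaccard`, `knn_predict`, and the toy emails are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of terms."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(new_terms, labeled_emails, k=3):
    """Majority vote over the k labeled emails most similar to new_terms.

    labeled_emails is a list of (terms, label) pairs (hypothetical layout).
    """
    # Rank existing emails by similarity to the new one, most similar first.
    ranked = sorted(
        labeled_emails,
        key=lambda item: jaccard(new_terms, item[0]),
        reverse=True,
    )
    # Count the labels of the k nearest neighbours and return the most common.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
labeled = [
    ({"exam", "timetable", "hall"}, "examination"),
    ({"exam", "results", "marks"}, "examination"),
    ({"placement", "interview", "company"}, "placements"),
    ({"journal", "paper", "review"}, "research"),
]
predicted = knn_predict({"exam", "hall", "marks"}, labeled, k=3)
print(predicted)
```

Ties in the vote are broken by neighbour rank here, which is one possible design choice; the source does not specify a tie-breaking rule.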
The complete procedure for classifying emails with the graph similarity approach is given in Algorithm 1, which describes the proposed graph-based email classification using TF-IDF and the Jaccard co-efficient. In step 1, various algorithms are applied to clean the raw data and make it suitable for further machine learning operations; the relevant top k features are then selected using the TF-IDF score. A graph is built using these features as vertices and the relationships between them as edges, where the Jaccard co-efficient gives the similarity score between vertices (the threshold value considered in this study is 0.75). In step 2, a graph is constructed for the unseen email. Step 3 then computes the similarity score between the unseen email graph and each graph of the existing emails. Finally, in step 5, the unseen email is assigned the class label of the graph with the highest score, as selected in step 4.
Algorithm 1. Graph-based email classification using TF-IDF and Jaccard co-efficient
Input: email dataset $D = \{d_1, d_2, \ldots, d_n\}$, unseen email $d_u$, feature number $k$, and threshold of Jaccard co-efficient $\theta$
Output: predicted class label for unseen email
Steps:
1. for each email document $d_i \in D$:
   - Preprocess $d_i$ → get terms $T = \{t_1, t_2, \ldots, t_m\}$
   - Compute TF-IDF$(t, d_i)$ $\forall t \in T$
   - Select the top $k$ features → $V_i$
   - Initialize the set of edges $E_i \leftarrow \emptyset$
   - for every pair of vertices $(u, v) \in V_i \times V_i$, $u \neq v$:
     - compute the Jaccard similarity:
       $J(u, v) = |C(u) \cap C(v)| / |C(u) \cup C(v)|$
     - if $J(u, v) \geq \theta$:
       $E_i \leftarrow E_i \cup \{(u, v), \text{weight} = J(u, v)\}$
   - Build graph $G_i = (V_i, E_i)$
2. Repeat step 1 to construct the graph for the unseen email $d_u$
   - $G_u = (V_u, E_u)$
3. Compute the graph similarity score between $G_u$ and every $G_i$ using the modified Jaccard co-efficient
   - $Sim(G_u, G_i) = |G_u \cap G_i| / |G_u \cup G_i|$
4. Select the graph $G_j$ with the highest similarity score
5. Assign the class label of $d_j$ to $d_u$
return the class label for unseen email $d_u$
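A compact Python sketch of Algorithm 1 may help make the graph construction and matching steps concrete. It rests on several assumptions that the source does not confirm: $C(u)$ is taken to be the set of contexts (for example sentences or emails) in which term $u$ occurs, the top-k TF-IDF selection is assumed to have already produced `features`, and the graph intersection and union in step 3 are taken over edge sets with weights ignored. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(contexts, features, theta):
    """Step 1: return {frozenset({u, v}): weight} for pairs with J(u, v) >= theta.

    contexts: list of term sets (assumed meaning of C(.) in Algorithm 1).
    features: the already selected top-k terms, used as vertices.
    """
    # C(t): indices of the contexts in which term t occurs.
    occurs = {t: {i for i, ctx in enumerate(contexts) if t in ctx} for t in features}
    edges = {}
    for u, v in combinations(sorted(features), 2):
        weight = jaccard(occurs[u], occurs[v])
        if weight >= theta:
            edges[frozenset((u, v))] = weight
    return edges


def graph_similarity(edges_a, edges_b):
    """Step 3: Jaccard coefficient over the two edge sets (weights ignored)."""
    return jaccard(set(edges_a), set(edges_b))


def classify(unseen_edges, templates):
    """Steps 4 and 5: label of the template graph most similar to the unseen graph."""
    return max(templates, key=lambda label: graph_similarity(unseen_edges, templates[label]))


# Toy contexts, invented for illustration only.
exam_ctx = [{"exam", "hall", "ticket"}, {"exam", "hall", "schedule"}, {"exam", "ticket"}]
place_ctx = [{"interview", "company", "offer"}, {"interview", "company"}, {"offer", "company"}]
templates = {
    "examination": build_graph(exam_ctx, {"exam", "hall", "ticket"}, theta=0.3),
    "placements": build_graph(place_ctx, {"interview", "company", "offer"}, theta=0.3),
}
unseen_ctx = [{"exam", "hall"}, {"exam", "ticket", "hall"}]
unseen = build_graph(unseen_ctx, {"exam", "hall", "ticket"}, theta=0.3)
label = classify(unseen, templates)
print(label)
```

Storing each edge as a `frozenset` key makes the graph undirected, so `(u, v)` and `(v, u)` count as the same edge when the edge sets are compared.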
3.6. Model fine tuning and hyperparameter optimization
To enhance the performance of the GBS model, structured fine-tuning was performed on the parameters involved in the processing pipeline. The first parameter considered for optimization was the number of top features per email, ranging from 5 to 20. The second was the Jaccard similarity threshold used when computing edge values, which was tested at 0.2, 0.3, 0.4, and 0.5.
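The search over these two parameters can be pictured as a plain grid search. The sketch below is only an illustration, not the authors' tuning code: `grid_search` and `toy_score` are hypothetical names, the intermediate feature counts 10 and 15 are assumed (the source gives only the range 5 to 20), and `toy_score` is a made-up stand-in for training and validating the GBS model, so its values are not results.

```python
from itertools import product


def grid_search(evaluate, k_values=(5, 10, 15, 20), thetas=(0.2, 0.3, 0.4, 0.5)):
    """Return (best_score, best_k, best_theta) over every (k, theta) pair.

    evaluate(k, theta) is any callable returning a validation score
    where higher is better.
    """
    best = None
    for k, theta in product(k_values, thetas):
        score = evaluate(k, theta)
        if best is None or score > best[0]:
            best = (score, k, theta)
    return best


def toy_score(k, theta):
    """Placeholder objective, NOT measured data: peaks at an arbitrary point."""
    return 1.0 - abs(k - 10) / 100 - abs(theta - 0.3)


best_score, best_k, best_theta = grid_search(toy_score)
print(best_k, best_theta)
```

In practice `evaluate` would rebuild the class graphs with the given `k` and `theta` and score them on held-out emails.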
4. RESULTS AND DISCUSSION
The performance of the proposed GBS method was assessed using the academic email dataset. It was also compared with that of other email classification techniques, including seFACED [34], LSTM [35], LSVC, RF, and MNB.
4.1. Performance analysis
Accuracy [36], precision [37], recall [38], and F1-score [39] were measured for the baseline email classifiers and the proposed classifier, as defined in (11) to (14).
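$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (11)

$\text{Precision} = \frac{TP}{TP + FP}$ (12)

$\text{Recall} = \frac{TP}{TP + FN}$ (13)

$\text{F1-score} = \frac{2 (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$ (14)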
where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.
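For readers who want to reproduce these measures, the definitions in (11) to (14) translate directly into code. The following is a small sketch with a hypothetical function name and invented counts; the zero-division guards are an added convention that the source does not discuss.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts for illustration; they are not taken from the paper.
m = classification_metrics(tp=90, fp=10, tn=880, fn=20)
print(m)
```

For a multi-class problem such as this one, the counts are usually taken one class at a time (that class against all others) and the per-class scores are then averaged.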
Table 3 reports these evaluation measures for each email category. The average recall, F1-score, precision, and accuracy of the proposed GBS approach are 0.96, 0.97, 0.97, and 0.98, respectively.
Table 3. Performance of proposed model
Class          Accuracy   Precision   Recall   F1-score
Academic       0.989      0.978       0.958    0.973
Examination    0.989      0.985       0.964    0.976
Placements     0.974      0.967       0.955    0.978
Research       0.992      0.969       0.962    0.982
To evaluate the performance of the GBS email classification method, a confusion matrix was constructed. Figure 4 gives detailed insight into the proposed method's ability to correctly identify and differentiate between the classes. The matrix shows that the proposed method achieves high classification accuracy for each category, particularly for the placement and research categories, with minimal misclassifications. In total, 3,363 of the 3,400 email samples were predicted correctly, giving the proposed GBS approach an accuracy of 98.91% and a misclassification rate of 1.09%. Most misclassifications occurred between the academics and examination categories because these two classes share many terms.
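As a quick check, the reported accuracy and misclassification rate follow directly from the counts read off the confusion matrix:

```latex
\text{Accuracy} = \frac{3363}{3400} \approx 0.9891, \qquad
\text{Misclassification rate} = \frac{3400 - 3363}{3400} = \frac{37}{3400} \approx 0.0109
```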
Figure 4. Confusion matrix of the proposed method
4.2. Comparative analysis
Figure 5 shows the performance analysis of the proposed method. The performance of GBS and the conventional classifiers across a range of performance metrics is displayed in Figure 5(a). According to the experimental findings, NB obtained 94.1% accuracy, 95.8% precision, 96.1% recall, and 96.9% F1-score. LSVC performed better, with 97.8% accuracy, 96.2% precision, 97.0% recall, and 96.4% F1-score. The RF technique also produced good results, with a 96.3% F1-score, 97.1% recall, 97.0% precision, and 97.7% accuracy. With 98.2% accuracy, 97.8% precision, 98.8% recall, and 98.7% F1-score, the proposed GBS outperforms the other classifiers.

Figure 5(b) records the assessment of GBS and the other email classifiers (NB, LSVC, RF, LSTM, and seFACED) on the academic email dataset. When tested on 500 email samples, the proposed GBS scored 78.2% accuracy, whereas LSTM and seFACED scored 79.3% and 85.1%, respectively. The results also show GBS producing 92.1% accuracy, compared with 89.02% for LSTM and 91.6% for seFACED. Furthermore, the proposed GBS performed more effectively than the other classifiers, with 95.2% and 98.91% accuracy when 3,000 and 4,000 email samples were considered, respectively. These results indicate that the performance of the proposed approach improves as the dataset size increases.
Figure 5. Performance analysis of the proposed method: (a) comparative analysis of GBS with traditional classifiers and (b) accuracy comparison of GBS with conventional methods
ROC curves were generated for all four categories, with the area under the curve (AUC) used to quantify the classification performance. The true positive rate (TPR) and false positive rate (FPR) were computed as in (15) and (16).
$TPR = \frac{TP}{TP + FN}$ (15)

$FPR = \frac{FP}{FP + TN}$ (16)
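For a multi-class classifier, TPR and FPR are typically computed per class in a one-vs-rest fashion. The sketch below shows one way to obtain both rates for a single class from a confusion matrix; the function name, the row/column convention, and the toy matrix are assumptions for illustration and are not taken from the paper.

```python
def one_vs_rest_rates(confusion, class_index):
    """TPR and FPR of one class, treating every other class as negative.

    confusion[i][j] = number of samples of true class i predicted as class j
    (assumed convention).
    """
    n = len(confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp                          # missed positives
    fp = sum(confusion[r][class_index] for r in range(n)) - tp     # false alarms
    total = sum(sum(row) for row in confusion)
    tn = total - tp - fn - fp
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy 3-class matrix, invented for illustration only.
toy = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
tpr, fpr = one_vs_rest_rates(toy, 0)
print(tpr, fpr)
```

A full ROC curve repeats this calculation at many decision thresholds on the classifier's scores; the snippet covers a single operating point.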
Figure 6 illustrates the ROC curve of the proposed GBS classifier for all four categories, highlighting the classifier's ability to separate emails into their correct categories. The AUC values were 0.978 for academics, 0.981 for research, 0.989 for examination, and 0.982 for placements. These results demonstrate that the classifier performs well across all categories, with the examination category exhibiting the highest AUC. This supports the effectiveness of the graph-based classifier in handling multi-class email classification using the Jaccard similarity.
Table 4 compares the proposed GBS method with existing methods, namely the semantic graph neural network (SGNN), stacking ensemble, seFACED, and MalFSCIL. The overall accuracy of the GBS method is 98.91%, which, going by the accuracies listed in Table 4, is 1.04, 0.11, 3.91, and 8.33 percentage points higher than SGNN, stacking ensemble, seFACED, and MalFSCIL, respectively. From this comparative analysis, we can conclude that the proposed GBS method outperformed the other methods by achieving the highest accuracy.
Figure 6. ROC curve for the proposed GBS email classifier
Table 4. A comparison of the proposed method with current methods
Author               Year   Methods             Accuracy (%)
Pan et al. [18]      2022   SGNN                97.872
Adnan et al. [22]    2024   Stacking Ensemble   98.80
Hina et al. [34]     2021   seFACED             95
Chai et al. [40]     2025   MalFSCIL            90.58
Proposed                    GBS                 98.91
4.3. Impact of fine tuning
To evaluate the impact of fine-tuning on the proposed model (Figure 7), different sets of experiments were conducted over TF-IDF feature sizes and threshold limits of the Jaccard co-efficient. Figures 7(a) and 7(b) show the impact of fine-tuning on accuracy and F1-score, respectively. According to the findings, the proposed model performed consistently better across all combinations after fine-tuning. Its accuracy improved from 94.41% to 96.30% for the combination ($k = 5$, $\theta = 0.2$) and reached its highest value of 99.23% for the configuration ($k = 10$, $\theta = 0.3$). Correspondingly, the F1-score improved for all combinations and reached its highest value of 0.992 for the same configuration. These findings confirm that the proposed classifier improves its performance after fine-tuning the number of top ranked features and the Jaccard threshold value used in graph construction.
Figure 7. Impact of fine tuning on the GBS method: (a) accuracy of the GBS before and after fine tuning and (b) F1-value of the GBS before and after fine tuning
5. CONCLUSION
In this study, an innovative graph-based semantic email classification method was developed using TF-IDF and the Jaccard coefficient. Initially, the method used different data preprocessing methods to clean the raw data; later, TF-IDF was used to extract the most important and relevant features from the updated dataset. Next, a graph was constructed for each class using the Jaccard coefficient, where the features were used