International Journal of Advances in Applied Sciences (IJAAS)
Vol. 14, No. 3, September 2025, pp. 916~927
ISSN: 2252-8814, DOI: 10.11591/ijaas.v14.i3.pp916-927
Journal homepage: http://ijaas.iaescore.com

A hybrid features based malevolent domain detection in cyberspace using machine learning

Saleem Raja Abdul Samad1, Pradeepa Ganesan1, Amna Salim Rashid Al-Kaabi1, Justin Rajasekaran1, Murugan Singaravelan2, Peerbasha Shebbeer Basha3

1Department of Information Technology, College of Computing and Information Sciences, University of Technology and Applied Sciences, Shinas, Sultanate of Oman
2Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, India
3Department of Computer Science, Jamal Mohamed College, Affiliated to Bharathidasan University, Tiruchirappalli, Tamil Nadu, India

Article Info
Article history: Received Aug 29, 2024; Revised Jun 8, 2025; Accepted Jun 20, 2025

ABSTRACT
The rise of social media has changed modern communication, placing information at our fingertips. While these developments have made our lives easier, they have also increased cybercrime. Cyberspace has become a refuge for modern cybercriminals to conduct destructive actions. Most cyberattacks are carried out through malicious links shared on social media platforms, emails, or messaging services. These attacks can have serious consequences for individuals and organizations, including financial losses, sensitive data breaches, and damage to reputation. Early identification and blocking of such links are crucial to protecting internet users and securing cyberspace. Current research uses machine learning (ML) algorithms to detect malicious hyperlinks based on observed patterns in uniform resource locators (URLs) or web content. However, cyberattack tactics are constantly changing. To address this challenge, this paper introduces a robust method that performs a fine-grained analysis of URLs for classification. Lexical and n-gram features are examined separately, with URL n-grams represented using Word2Vec embeddings. The results from hybrid feature sets are combined using a logistic regression (LR) model to increase overall classification accuracy. This robust method allows the system to use both the structural components of the URL and the fine-grained patterns obtained by the n-grams.

Keywords: Machine learning; Malicious URL; Natural language processing; N-gram; Phishing; Word2Vec

This is an open access article under the CC BY-SA license.

Corresponding Author:
Saleem Raja Abdul Samad
Department of Information Technology, College of Computing and Information Sciences
University of Technology and Applied Sciences
Al Aqar, Shinas, Sultanate of Oman
Email: saleem.abdulsamad@utas.edu.om

1. INTRODUCTION
Most cyberattacks originate from malicious links disseminated via email, social media posts, and instant messaging applications. These links direct users to harmful websites specifically designed to compromise their security. By clicking on these links, victims can suffer a range of consequences, including the attacker collecting personal data from the victim, compromising accounts, downloading and installing malware (viruses, ransomware, spyware, or trojans), and wreaking havoc on the situation. Compromised accounts can harm the reputation of individuals and the company's brands and may lead to financial losses. Moreover, compromised systems may be used as "zombies" to launch further attacks on other networks. Detecting such malicious links in cyberspace is challenging for internet users [1].
To automate the detection process, researchers consider different elements of the uniform resource locator (URL) or website, including URL lexical features, n-gram features, website content, reputation, and visual similarity [2].
Lexical features analyze the structural components of the URL, such as the domain name, path, number of subdomains, and number of dots, to identify suspicious patterns. N-gram analysis goes deeper by examining the character-level details of the URL, uncovering hidden patterns, subtle character changes, and obfuscation techniques. URL analysis is generally less resource-intensive and computationally efficient than other methods, allowing for the quick classification of malicious URLs with minimal risk to the system. Web content analysis, which inspects textual content, images, videos, and structural elements like hypertext markup language (HTML) tags and scripts, offers more accurate detection but poses risks, as malicious content can potentially infect the system analyzing it. Reputation checks rely on external services to assess a website's trustworthiness based on factors like domain server details and its ranking at both national and global levels. However, the short lifespan of malicious websites makes this process time-consuming. Finally, visual similarity analysis compares screenshots of suspicious websites with legitimate ones using image processing techniques. However, this method is highly resource-intensive and demands significant computational power.

Current security systems often rely on a blacklisting or signature-based approach. In this method, URLs are checked against a blacklist database to determine whether they should be allowed or denied. If the URL is present in the blacklist, access is blocked; otherwise, it is permitted. This is an easy and fast method for detecting malicious URLs. However, it has significant drawbacks. The blacklist database needs constant updates to identify newly created malicious URLs, making it ineffective against new threats. Additionally, attackers can bypass these systems by making minor modifications to malicious URLs, which remain undetectable unless the exact URL or signature is listed [3].
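
The weakness of exact-match blacklisting can be illustrated with a minimal sketch. The blacklist contents and the URLs below are hypothetical and serve only to show how a small change to a listed URL avoids the lookup.

```python
def is_blocked(url, blacklist):
    """Exact-match blacklist lookup: only URLs listed verbatim are blocked."""
    return url in blacklist


# Hypothetical blacklist entry, for illustration only.
blacklist = {"http://phishmal.example/login"}

listed = "http://phishmal.example/login"
variant = "http://phishmal.example/login?id=1"  # same page, one query string appended

print(is_blocked(listed, blacklist))   # the listed URL is blocked
print(is_blocked(variant, blacklist))  # the modified URL is not found in the blacklist
```

Because the lookup compares whole strings, any modification that leaves the destination unchanged is enough to evade it.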

An alternative method is rules-based detection, which applies generalized rules to identify malicious URLs. While this helps broaden detection, it requires extensive domain expertise to create accurate rules. Additionally, modern attacks often require more complex rules, making them difficult to implement and maintain [4].
As a result, many researchers prefer machine learning (ML) approaches. They can detect previously unknown attacks that do not match existing signatures. By continuously training on large datasets, these models can improve their accuracy and dynamically adjust to changing patterns of malicious behavior [5], [6].

The proposed system combines URL structural elements (lexical features) with URL character-level elements (n-gram features) to enhance the detection of malicious URLs. Lexical features include the URL structure, domain length, subdomains, special characters, and query parameters, which help identify patterns associated with phishing or malware sites. To capture more granular characteristics, the system also utilizes n-gram features, which break down the URL into sequences of characters. These n-gram features are represented using Word2Vec embeddings, an ML technique that transforms character sequences into dense vectors. This approach captures the semantic relationships between different character sequences, enabling the system to detect subtle patterns in malicious URLs that might be missed by conventional methods [7]. The system achieves improved classification performance by integrating lexical and n-gram features compared to existing ML-based methods.
The contributions in this paper are: i) utilized a balanced dataset from a common repository to ensure reliable evaluation; ii) extracted more sensitive lexical features from the URL dataset to enhance detection accuracy; iii) employed n-gram features with Word2Vec embeddings to identify finer, hidden patterns within URLs; and iv) integrated lexical and n-gram features (hybrid feature), leading to superior classification performance compared to existing ML-based methods.
The remainder of the paper is organized as follows: section 1 outlines the significance of the problem and presents an overview of the proposed method. Section 2 reviews existing research in the problem domain. Section 3 details the proposed method, while section 4 presents the experimental results. Finally, section 5 concludes the paper.

2. RELATED WORKS
This section presents an overview of recent research in the problem domain, focusing on URL feature analysis.
Joshi et al. [8] developed a method that significantly enhances malicious URL detection. To distinguish between dangerous and benign URLs, the authors gathered over 5 million URLs from various sources and identified 23 distinct lexical features. These lexical features were merged with 1,000 trigram-based features to generate 1,023-dimensional vectors. Among the classifiers, the random forest (RF) model scored the highest accuracy (92%), indicating that it is the most effective classification method.
Raja et al. [9] used the University of New Brunswick (UNB) 2016 dataset for feature extraction, identifying 27 features (9 new, 18 standard). After correlation analysis, 20 features were selected for testing classifiers like RF, k-nearest neighbor (KNN), support vector classifier (SVC), logistic regression (LR), and naïve Bayes (NB). RF achieved 99% accuracy.
Ha et al. [10] developed a system to detect fraudulent websites that use ML methods such as RF, decision tree (DT), AdaBoost, and KNN. Researchers used a dataset of 213,345 URLs divided into five categories (benign, defacement, phishing, malware, and spam) and 20 extracted features. The RF algorithm achieved the highest accuracy at 95.68% in detecting dangerous websites.
Afzal et al. [11] proposed a system using artificial neural network (ANN) and bidirectional encoder representations from transformers (BERT) to extract contextual embeddings from URLs, protecting users from malicious websites by categorizing URLs into spam, phishing, malware, defacement, and benign. They used Kaggle and UNB datasets with 172,000 URLs for training and classification. The model utilized BERT's tokenization and a 12-layer transformer encoder, achieving a high accuracy of 98% and outperforming traditional embedding methods like Word2Vec, FastText, and GloVe. Their approach excelled in precision, recall, and F-measure, all at 98%.
Kumi et al. [12] proposed a classification-based on association (CBA) technique for detecting risky URLs by assessing URL features and webpage content. CBA constructs an accurate classifier using association rules from a training dataset, achieving 95.8% accuracy with minimal false positive and negative rates. It demonstrated strong reliability when tested against 1200 tagged URLs.
Al-Haija and Al-Fayoumi [13] proposed an ML-based approach for detecting malicious URLs using the ISCX-URL2016 dataset of 57,000 samples, including binary and multi-class labels and 79 features. Their method first differentiates between benign and harmful URLs and then splits malicious URLs into five categories: defacement, malware, phishing, spam, and benign. The researchers used four ensemble learning approaches, with bagging trees achieving the highest accuracy in binary (99.3%) and multi-class (97.92%) classifications.
Raja et al. [14] proposed a method for detecting bogus links by analyzing linguistic features of URLs. Three natural language processing (NLP)-based vectorizers are tested with six distinct ML methods. Results demonstrate that the proposed method with count vectorizer + RF algorithm delivers higher accuracy (92.49%).
Lee et al. [15] demonstrated that particle swarm optimization (PSO) improves URL feature selection for malicious URL detection. The researchers used a support vector machine (SVM) and NB to obtain 99% accuracy.
Raja et al. [16] developed a method for vectorizing URLs for feature generation using NLP techniques such as word frequency and inverse document frequency. To classify risky hyperlinks, researchers used a weighted soft voting classifier with two steps of weight adjustment to increase accuracy. The approach was evaluated on two datasets, D1 and D2, outperforming base classifiers. The accuracy obtained was 91.4% for D1 and 98.8% for D2, proving that it is better than other methods.
Table 1 summarizes the existing methods. The related work highlights the importance of using a hybrid feature approach that combines both lexical features and n-gram features of URLs, enhanced by Word2Vec representation. This robust method has proven to be effective in classifying malicious URLs.

Table 1. Summary of related works
| Author(s) | Dataset | Feature set | Remarks |
|---|---|---|---|
| Joshi et al. [8] | Openphish, Alexa whitelists, internal FireEye sources | 23 lexical features of URLs + 1,000 trigram-based features of URLs | Trigrams alone may not fully capture the detailed characteristics of the URL string. Moreover, contextual representation is also necessary. |
| Raja et al. [9] | UNB 2016 dataset | 27 lexical features of URLs | The lexical features of URLs alone may not suffice for a robust system. N-gram features can be used to reveal hidden patterns within the URLs. |
| Ha et al. [10] | OpenPhish, Phishtank, Zone-H, WEBSPAM-UK2007 | 20 lexical features of URLs | The lexical features of URLs alone may not suffice for a robust system. N-gram features can be used to reveal hidden patterns within the URLs. |
| Afzal et al. [11] | Dataset from Kaggle repository | Word embedding of URLs | Time-intensive and computationally expensive. |
| Kumi et al. [12] | OpenPhish, VxVault, URLhaus | Lexical and host-based features of URLs, web content features | URLs' limited lexical features are insufficient for developing a robust system. Moreover, web content analysis may affect the processing system. |
| Al-Haija and Al-Fayoumi [13] | ISCX-URL2016 | 79 URL features | URLs' limited features are insufficient for developing a robust system. N-gram features can be used to reveal hidden patterns within the URLs. |
| Raja et al. [14] | Dataset from Kaggle repository | 8 lexical features of URLs and n-gram features of URLs | URLs' limited lexical features are insufficient for developing a robust system. Moreover, n-gram requires contextual representation. |
| Lee et al. [15] | Curated dataset | 9 features include lexical and host-based features | URLs' limited lexical and host-based features are insufficient for developing a robust system. |
| Raja et al. [16] | ISCX-URL2016, UNB Phishtank | Vectorized URLs | The domain name alone is insufficient for effectively classifying a URL. Furthermore, a contextual representation of the URL is required rather than exclusively depending on vectorized representations. |

3. METHOD
Recent research has increasingly focused on the lexical analysis of URLs due to its risk-free nature and faster detection capabilities. However, this approach has some limitations. For instance, encoded or hidden characters can evade lexical analysis, and legitimate URLs with unusual patterns, such as excessively long URLs with multiple parameters, might be incorrectly flagged as malicious. URLs that closely mimic popular legitimate sites might be misclassified as safe. To address these issues, the proposed system incorporates n-gram analysis, a method in NLP that examines continuous sequences of characters or words. This technique helps to identify unusual or suspicious patterns within URLs more effectively. Combining lexical analysis and n-gram analysis makes the system more robust and allows it to perform better than existing methods. Figure 1 shows the architecture of the proposed system.

Figure 1. Architecture of the proposed method

3.1. Dataset
The proposed method utilizes the full potential of the URL dataset. Tables 2 and 3 present a summary of the dataset, which includes benign and malicious URLs, including defacement, phishing, and malware URLs. The dataset used for the experiment was collected from Kaggle and PhishTank.

Table 2. Dataset details
| Dataset | URLs |
|---|---|
| Kaggle [17] | Benign and malicious (phishing, malware, and defacement URLs) |
| Phishtank [18] | Malicious (phishing) URLs |

Table 3. Dataset summary
| Type | Count |
|---|---|
| Benign | 15,530 |
| Malicious | 15,882 |

3.2. Feature extraction
Feature extraction from URLs involves analyzing the structure of each URL to identify important characteristics that can reveal its underlying format and intent. This process helps in distinguishing between benign and malicious URLs based on specific patterns and tokens present within the URL string. Table 4 provides a detailed list of the lexical features extracted from the URL dataset, which were subsequently stored in a file for further analysis and model training.

3.2.1. Subdomain
Cybercriminals often add multiple subdomains to create complex URL structures [19], [20] that mimic legitimate websites, deceiving users and security systems. For example, the URL http://login.xbank.com.auth.phishmal.com may appear legitimate at first glance, deceiving users into believing they are visiting a legitimate website. However, it is a malicious website.
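
A minimal sketch of how subdomain-related features might be computed for the example above. The feature names follow Table 4, but their exact definitions are not given in the paper; this sketch assumes the registered domain is the last two host labels (a public-suffix list would be needed for domains such as example.co.uk), and the function names are hypothetical.

```python
from urllib.parse import urlparse


def subdomain_segments(url, registered_labels=2):
    """Return the host labels that precede the (assumed) registered domain."""
    host = urlparse(url).hostname or ""
    labels = [part for part in host.split(".") if part]
    if len(labels) <= registered_labels:
        return []
    return labels[:-registered_labels]


def subdomain_features(url):
    """Compute three subdomain features under the assumptions stated above."""
    segments = subdomain_segments(url)
    avg_len = sum(len(s) for s in segments) / len(segments) if segments else 0.0
    return {
        "pres_subdom": int(bool(segments)),   # is any subdomain present?
        "no_dom_segment": len(segments),      # number of subdomain segments
        "avg_seg_len": avg_len,               # average segment length
    }


print(subdomain_features("http://login.xbank.com.auth.phishmal.com"))
```

For the example URL, the labels login, xbank, com, and auth all count as subdomain segments of phishmal.com, which is what makes the address look like a bank login page.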

3.2.2. Punycode
Punycode is a way to represent Unicode characters in an American standard code for information interchange (ASCII)-compatible format, allowing non-ASCII characters to be used in domain names [21]. This encoding is essential for supporting internationalized domain names (IDNs), which allow users around the world to use native scripts in web addresses. For instance, the Punycode version of “appsdomain.com” could be “xn--ppsdomain-9wb.com”, where a Cyrillic “a” replaces the standard Latin “a”, illustrating how deceptive lookalike domains can be created.
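
The lookalike idea can be sketched with Python's built-in idna codec. The feature name pres_punycode follows Table 4; detecting Punycode by the "xn--" label prefix is an assumption about how the feature is computed, and the encoded form is printed rather than stated here.

```python
from urllib.parse import urlparse


def pres_punycode(url):
    """Return 1 if any host label carries the 'xn--' Punycode prefix, else 0."""
    host = urlparse(url).hostname or ""
    return int(any(label.startswith("xn--") for label in host.lower().split(".")))


# The first letter is Cyrillic U+0430, which renders like the Latin "a".
lookalike = "\u0430ppsdomain.com"
ascii_form = lookalike.encode("idna").decode("ascii")

print(ascii_form)                                   # ASCII-compatible form of the lookalike
print(pres_punycode("http://" + ascii_form + "/"))  # flagged as containing Punycode
print(pres_punycode("http://appsdomain.com/"))      # plain Latin domain is not flagged
```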

3.2.3. URL redirection
URL redirection is a technique that redirects users and search engines from one URL to another, directing traffic to a specified web page [22]. However, it can also be used for malicious purposes like phishing or malware distribution. Cybercriminals often use URL redirection to hide malicious links, directing users to bogus websites [23] that appear legitimate. Some keywords are most commonly used for URL redirection, such as redirect, target, and next.
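
A minimal sketch of a redirect-keyword check. The feature name has_redirect follows Table 4; matching the three keywords against query-parameter names is an assumption, since the paper does not say where in the URL the keywords are searched for.

```python
from urllib.parse import urlparse, parse_qs

# The three redirection keywords named in the text.
REDIRECT_KEYWORDS = {"redirect", "target", "next"}


def has_redirect(url):
    """Return 1 if a query-parameter name is a known redirection keyword."""
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    return int(any(name.lower() in REDIRECT_KEYWORDS for name in params))


print(has_redirect("http://a.example/go?next=http://b.example"))
print(has_redirect("http://a.example/page?id=3"))
```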

3.2.4. Suspicious keywords
When analyzing potentially harmful URLs, identifying suspicious keywords is essential, as these terms often signal malware or phishing threats [24]. Such keywords typically include terms like “login,” “verify,” “update,” or brand names commonly exploited in social engineering attacks. In this experiment, a total of 63 keywords were identified and used as indicators of malicious intent to enhance detection accuracy.
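
A minimal sketch of the keyword check. The full list of 63 keywords is not given in the paper, so the set below contains only the three terms quoted above; token-level matching (rather than substring matching) is likewise an assumption.

```python
import re

# Only the three keywords quoted in the text; the paper's full list of 63 is not reproduced.
SUSPICIOUS_KEYWORDS = {"login", "verify", "update"}


def has_susp_keyword(url):
    """Return 1 if any alphanumeric token of the URL is a suspicious keyword."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return int(any(token in SUSPICIOUS_KEYWORDS for token in tokens))


print(has_susp_keyword("http://secure.example/Verify/account"))
print(has_susp_keyword("http://example.com/about"))
```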

3.2.5. URL entropy
In information theory, entropy measures uncertainty or randomness in a system. In the context of URLs, entropy can be used to measure their complexity and possible maliciousness by examining their length, character diversity, and overall structure [25]. High URL entropy indicates that the URL is more complex and potentially obfuscated.
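
The paper does not state the entropy formula used. A common choice, assumed here, is the Shannon entropy of the character distribution, H = -sum(p_c * log2(p_c)), where p_c is the relative frequency of character c in the string.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits per character) of the character distribution of text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aaaa"))  # a single repeated character carries no uncertainty
print(shannon_entropy("abcd"))  # four equally likely characters
```

The same function can be applied to the whole URL (url_entropy) or to the path alone (path_entropy), the two entropy features listed in Table 4.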

Table 4. URL features
| Feature | Description |
|---|---|
| url_len | Length of the URL |
| has_ssl | Check the SSL certificate for the URL |
| pres_subdom | Check the presence of a subdomain in the URL |
| age_dom | Age of the domain |
| url_entropy | Entropy of the URL |
| pres_punycode | Presence of Punycode in the URL |
| pres_short_url | Presence of the short URL |
| is_path_manipulation | Check the path manipulation |
| has_redirect | Checks the presence of the URL-redirection keywords in the URL |
| has_susp_keyword | Checks the presence of suspicious keywords in the URL |
| no_dom_segment | Number of sub-domains segments |
| avg_seg_len | Average length of the subdomain segments |
| path_entropy | Entropy of the URL path |
| path_len | Length of the path |
| path_splchr_count | Count the special characters in the path |
| dom_numeric | Is the domain numeric |
| sym_dots | Count dot symbols in the URL |
| sym_slash | Count slash symbols in the URL |
| sym_hyphen | Count hyphen symbols in the URL |
| sym_hash | Count hash symbols in the URL |
| sym_semicol | Count semicolon symbols in the URL |
| sym_and | Count and symbols in the URL |
| sym_underscr | Count underscore symbols in the URL |
| count_alpha | Count the alphabets in the URL |
| count_num | Count numbers in the URL |
| url_type | URL type (0-Benign, 1-Malicious) |

3.3. Feature selection
The next step after feature extraction is feature selection. In the data preprocessing stage of ML, feature selection is an essential step that reduces dimensionality and chooses the most important features to improve model performance. This procedure helps remove irrelevant or redundant features, which would otherwise lead to overfitting and increased computing costs [26]. Using statistical testing, the SelectKBest technique is used for the experiment to select the features with the strongest correlation to the target variable. The analysis of variance (ANOVA) test determines whether there are significant differences in the means of different groups based on the features, allowing SelectKBest to rank features in order of significance.
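
The ranking idea can be sketched in plain Python with a one-way ANOVA F-statistic per feature, keeping the k features with the highest scores. This is an illustration of the technique and not the authors' code; the toy data and the function names are hypothetical, and a library routine such as scikit-learn's SelectKBest would normally be used instead.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across the class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    if len(groups) < 2:
        return 0.0
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(rows, labels, names, k):
    """Return the names of the k features with the highest F-statistic."""
    scores = {name: anova_f([row[i] for row in rows], labels)
              for i, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: url_len separates the classes, sym_dots does not.
rows = [[10.0, 1.0], [12.0, 2.0], [11.0, 1.0], [30.0, 2.0], [31.0, 1.0], [29.0, 2.0]]
labels = [0, 0, 0, 1, 1, 1]
print(select_k_best(rows, labels, ["url_len", "sym_dots"], k=1))
```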

3.4. Base learners
After feature selection, ML models were trained. Eight classification models were used for the experiment: SVC, LR, Gaussian Naïve Bayes (GNB), KNN, DT, RF, gradient boosting (GB), and extreme gradient boosting (XGB) [16]. K-fold validation was used to measure the ML models' performance more accurately. Generally, K-fold validation divides a dataset into K equally-sized folds or subsets. The value of K for our experiment is 10. In each iteration, one fold is used as the validation set and the remaining K-1 folds are used for training. This method ensures that every data point is used for training and validation, reducing bias and delivering a more accurate evaluation of the model's performance [27]. The models' predictions are passed to a meta-learner during validation to make the final decision.
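
The fold rotation can be sketched as an index generator. This is a minimal illustration without shuffling or stratification (both are assumptions left out for brevity), and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(25, k=10):
    print(len(train), validation)
```

Each sample appears in exactly one validation fold and in the training set of the other K-1 iterations, which is the property the paragraph above relies on.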
3.5. Preprocess
In phase 2, the same dataset is used for n-gram generation. However, data preprocessing is required before generating n-grams [28]. This includes converting all URLs to lowercase and removing irrelevant characters, such as special symbols. Furthermore, frequent terms such as "https," "http," "ftp," and "www" are stripped away to provide cleaner and more informative n-gram analysis.
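A minimal sketch of these preprocessing steps follows. The function name `preprocess_url` is hypothetical, and treating every non-alphanumeric character (including dots and slashes) as a "special symbol" is an assumption; the text does not list the exact characters removed.

```python
import re

# Frequent terms named in the text; "https" precedes "http" so the longer
# token is removed first.
FREQUENT_TERMS = ("https", "http", "ftp", "www")


def preprocess_url(url):
    """Lowercase a URL, strip frequent terms, and drop special symbols."""
    url = url.lower()
    for term in FREQUENT_TERMS:
        # Simple substring removal: this also deletes the term when it
        # occurs inside a longer word, which a real pipeline may not want.
        url = url.replace(term, "")
    # Assumption: anything other than a-z and 0-9 counts as a special symbol.
    return re.sub(r"[^a-z0-9]", "", url)


cleaned = preprocess_url("HTTPS://www.MyWebDom.com/login?id=1")
```

Under these assumptions, the example URL is reduced to a single lowercase alphanumeric string that can be passed to n-gram generation.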
3.6. N-gram generation
To generate n-grams from URLs, the text data is split into n-grams, which are collections of adjacent characters of a predetermined length. To create n-grams for the experiment, sequences of 3 to 7 characters are extracted from URLs [14]. For example, given a URL like "mywebdom.com," 3-gram generation would produce sequences such as "myw," "ywe," and "web." This process helps to capture both shorter and longer patterns within the URL.
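The sliding-window extraction can be sketched as follows. The function name `char_ngrams` is hypothetical; the 3 to 7 character range and the example string come from the text.

```python
def char_ngrams(text, n_min=3, n_max=7):
    """Return all character n-grams of length n_min..n_max, in order."""
    grams = []
    for n in range(n_min, n_max + 1):
        # A string of length L has L - n + 1 n-grams (none if L < n).
        grams.extend(text[i:i + n] for i in range(len(text) - n + 1))
    return grams


three_grams = char_ngrams("mywebdom.com", 3, 3)
all_grams = char_ngrams("mywebdom.com")
```

For the example string, the first three 3-grams are "myw", "ywe", and "web", matching the sequences quoted in the text.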
3.7. Word2Vec embedding
In NLP, Word2Vec is one of the most common methods for creating word embeddings, representing words as dense, continuous vectors in a high-dimensional space. Word2Vec contains two models, Skip-gram and continuous bag of words (CBOW) [29]. This experiment uses the CBOW model, which predicts a target word from its surrounding context. The URL n-grams are represented using Word2Vec, which captures the semantic relationships between URL components to generate dense, meaningful vector representations. This technique allows for a more precise URL analysis with ML models.
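To make the CBOW objective concrete, the sketch below builds the (context, target) pairs that a CBOW model would be trained on, treating each n-gram as a "word". It does not train an embedding. The function name `cbow_pairs` and the window size are hypothetical, since the text does not state the window used.

```python
def cbow_pairs(tokens, window=2):
    """Build (context, target) training pairs as used by a CBOW model."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip tokens that have no neighbours at all
            pairs.append((context, target))
    return pairs


# Consecutive 3-grams of a URL, treated as the token sequence.
tokens = ["myw", "ywe", "web", "ebd"]
pairs = cbow_pairs(tokens, window=1)
```

Each pair asks the model to predict the middle n-gram from its neighbours, so n-grams that occur in similar contexts end up with similar vectors.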
3.8. Dimension reduction using principal component analysis
Dimensionality reduction is important in textual data preprocessing, especially after n-gram representation with Word2Vec. The n-gram format captures the context and co-occurrence of words by evaluating word sequences, resulting in high-dimensional feature spaces. This high dimensionality can present difficulties, such as higher computing costs and the possibility of overfitting. Principal component analysis (PCA) is a widely used dimensionality reduction approach that converts high-dimensional data into a lower-dimensional space while retaining as much variance as possible [30]. Using PCA after n-gram representation with Word2Vec embeddings can effectively reduce the dataset's complexity, allowing for more efficient processing and improved model performance.
i) Matrix $X$ represents the Word2Vec embeddings of a vocabulary, where each row corresponds to a word vector with $d$ dimensions. If there are $n$ words in the vocabulary, the matrix $X$ will have dimensions $n \times d$, as shown in (1).

$$X = \begin{bmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n \end{bmatrix} \quad (1)$$

Where $\vec{x}_i$ is the Word2Vec embedding of the $i$-th word.
ii) Subtract the mean of each dimension (feature) from the corresponding dimension of each word vector, as shown in (2). This centers the data around the origin.

$$\bar{X} = X - \bar{x} \quad (2)$$

Where $\bar{X}$ is the centered matrix and $\bar{x}$ is the mean vector of the embeddings in $X$, subtracted from every row and computed as shown in (3).

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i \quad (3)$$

iii) The covariance matrix $C$ captures how different dimensions (features) of the Word2Vec embeddings co-vary. It is calculated as described in (4).

$$C = \frac{1}{n-1} \bar{X}^{T} \bar{X} \quad (4)$$

Where $\bar{X}^{T}\bar{X}$ is the product of the transpose of the centered matrix $\bar{X}$ with $\bar{X}$ itself, and $\frac{1}{n-1}$ is a normalization factor.
iv) Calculate the eigenvectors $V$ and eigenvalues $\lambda$ of the covariance matrix $C$. The eigenvectors represent the directions of maximum variance, and the eigenvalues give the magnitude of the variance in those directions, as indicated in (5).

$$C V = \lambda V \quad (5)$$

v) Sort the eigenvectors by their corresponding eigenvalues in descending order. Select the top $k$ eigenvectors to form a projection matrix $P$, as shown in (6).

$$P = \begin{bmatrix} \vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_k \end{bmatrix} \quad (6)$$

Where $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_k$ are the top $k$ eigenvectors.
vi) Project the centered Word2Vec embeddings $\bar{X}$ onto the new subspace defined by the principal components, as indicated in (7).

$$Z = \bar{X} P \quad (7)$$

Where $Z$ is the new lower-dimensional representation of the Word2Vec embeddings, and $Z$ will have dimensions $n \times k$, where $k$ is the number of selected principal components.
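Steps i) to vi) can be traced on a toy matrix with the sketch below. It is illustrative only: it keeps a single component (k = 1) and uses power iteration in place of a full eigendecomposition, which is a substitution made here for brevity and not the procedure reported in the experiment. All function names and the toy data are hypothetical.

```python
import math


def center(X):
    """Step ii): subtract the per-dimension mean from every row."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in X]


def covariance(Xc):
    """Step iii): C = Xc^T Xc / (n - 1) for an already centered matrix."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenvector(C, iters=200):
    """Steps iv)-v) for k = 1: power iteration towards the leading eigenvector.

    Assumes C is non-zero and the start vector is not orthogonal to the
    leading eigenvector.
    """
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(Xc, v):
    """Step vi): coordinates of each centered row along v."""
    return [sum(r[j] * v[j] for j in range(len(v))) for r in Xc]


# Toy 4 x 2 "embedding" matrix with most variance along the first axis.
X = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
Xc = center(X)
C = covariance(Xc)
v = top_eigenvector(C)
Z = project(Xc, v)
```

In this toy case the leading direction is the first axis, so the projection keeps the first coordinate of each row and discards the low-variance second one.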
3.9. Meta-learner
A meta-learner is an ML classifier that enhances predictive performance by integrating the results from multiple base learners. In this experiment, the meta-learner is an LR classifier that combines two base learners, one with a lexical URL representation and the other with an n-gram URL representation. The lexical URL representation captures the structure and components of a URL, whereas the n-gram representation analyzes sequential patterns within the URL. The meta-learner capitalizes on the capabilities of both base learners by integrating their predictions, allowing it to reach a more informed decision. Using LR as the meta-learner, the model can efficiently weigh each base learner's contribution, improving the classification accuracy [31]. Equation (8) represents the process of meta-learning.

$$\hat{y} = \sigma\left(w_1 \cdot \hat{y}_{\text{lexical}} + w_2 \cdot \hat{y}_{n\text{-gram}} + b\right) \quad (8)$$

Where $\hat{y}$ is the final prediction of the meta-learner (LR); $\hat{y}_{\text{lexical}}$ is the prediction from the lexical URL representation base learner; $\hat{y}_{n\text{-gram}}$ is the prediction from the n-gram URL representation base learner; $w_1$ and $w_2$ are the weights the LR assigns to each base learner's output; $b$ is the bias term in the LR; and $\sigma$ is the sigmoid function, defined as in (9).

$$\sigma(z) = \frac{1}{1 + e^{-z}} \quad (9)$$

The LR meta-learner combines the outputs $\hat{y}_{\text{lexical}}$ and $\hat{y}_{n\text{-gram}}$, applies the learned weights $w_1$ and $w_2$, and passes the result through the sigmoid function to produce the final decision, typically a probability or binary classification.
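The combination in (8) and (9) can be written out as a short sketch. The weights, bias, and base-learner outputs below are hypothetical values chosen for illustration; in the experiment they would be learned by fitting the LR meta-learner, and the 0.5 decision threshold is likewise an assumption.

```python
import math


def sigmoid(z):
    """Sigmoid function from (9)."""
    return 1.0 / (1.0 + math.exp(-z))


def meta_predict(p_lexical, p_ngram, w1, w2, b):
    """Weighted combination of two base-learner outputs, as in (8)."""
    return sigmoid(w1 * p_lexical + w2 * p_ngram + b)


# Hypothetical base-learner outputs and meta-learner parameters.
prob = meta_predict(p_lexical=0.9, p_ngram=0.6, w1=3.0, w2=1.0, b=-2.0)
label = int(prob >= 0.5)  # assumed threshold for the binary decision
```

With these illustrative parameters the lexical learner has three times the weight of the n-gram learner, so a confident lexical prediction dominates the final probability.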
4. RESULTS AND DISCUSSION
The experiment was performed on a Windows machine with an i7 processor, using Python and Jupyter Notebook. The popular scikit-learn package was utilized for the ML algorithms. This experiment compared eight different ML algorithms. Each model was systematically trained and tested on a labeled URL dataset, allowing for a comprehensive comparison of their performance in accurately classifying malicious versus benign URLs. The results of the lexical representation of the URL are shown in Table 5. A total of 25 independent features and 1 dependent feature were extracted from the URL, as detailed in Table 3. Using the SelectKBest method for feature selection, multiple ML algorithms were examined with a feature range of 15 to 25. The results in Table 5 indicate that setting the number of features to 20 or 25 yields nearly identical outcomes. In particular, with 20 features, the RF and XGB algorithms achieve accuracies of 99.22 and 99.36%, respectively. Lexical features alone could fail to recognize subtle yet vital patterns in malicious URLs, such as specific character sequences or word combinations that appear frequently in such attacks. By representing the URL as n-grams, Word2Vec can map these sequences into continuous vector spaces, where semantically similar patterns are positioned closely together. This improves the model's ability to detect anomalies and understand the relationships between URL fragments. Table 6 shows the experimental results for URL representation using n-grams in combination with Word2Vec. In this experiment, PCA was used to reduce the dimensionality of the vector space by selecting 35 to 45 components, maximizing the representation while keeping important information.
Table 5. Results of the lexical representation of URL

| Number of features | ML algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| 15 | LR | 86.60 | 92.13 | 80.60 | 85.98 |
| 15 | SVC | 76.65 | 88.97 | 61.83 | 72.95 |
| 15 | GNB | 84.14 | 90.20 | 77.27 | 83.23 |
| 15 | DT | 98.50 | 98.46 | 98.59 | 98.53 |
| 15 | RF | 99.15 | 98.95 | 99.40 | 99.17 |
| 15 | GB | 98.22 | 97.55 | 98.99 | 98.26 |
| 15 | KNN | 96.74 | 96.73 | 96.87 | 96.80 |
| 15 | XGB | 99.20 | 99.10 | 99.33 | 99.22 |
| 20 | LR | 84.59 | 87.67 | 81.18 | 84.29 |
| 20 | SVC | 76.63 | 88.91 | 61.84 | 72.94 |
| 20 | GNB | 84.51 | 89.99 | 78.31 | 83.74 |
| 20 | DT | 98.56 | 98.66 | 98.52 | 98.59 |
| 20 | RF | 99.22 | 99.00 | 99.48 | 99.24 |
| 20 | GB | 98.35 | 97.80 | 98.99 | 98.39 |
| 20 | KNN | 97.08 | 96.24 | 98.09 | 97.16 |
| 20 | XGB | 99.36 | 99.30 | 99.45 | 99.37 |
| 25 | LR | 84.49 | 88.07 | 80.50 | 84.10 |
| 25 | SVC | 95.40 | 91.90 | 99.76 | 95.67 |
| 25 | GNB | 83.39 | 89.68 | 76.15 | 82.36 |
| 25 | DT | 98.58 | 98.63 | 98.58 | 98.60 |
| 25 | RF | 99.23 | 99.02 | 99.48 | 99.25 |
| 25 | GB | 98.36 | 97.81 | 99.00 | 98.40 |
| 25 | KNN | 97.08 | 96.24 | 98.11 | 97.17 |
| 25 | XGB | 99.37 | 99.30 | 99.46 | 99.38 |
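As a reading aid for the metric columns, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it for two rows of Table 5 (LR with 15 features and XGB with 20 features); the function name `f1_score` is a local helper defined here, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from Table 5, in percent.
lr_15 = f1_score(92.13, 80.60)   # LR, 15 features
xgb_20 = f1_score(99.30, 99.45)  # XGB, 20 features
```

Rounded to two decimals, both values agree with the F1-score column reported for those rows.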
Table 6 shows that the results obtained with 40 and 45 PCA components are almost identical. Therefore, PCA with 40 components was chosen for the experiment. Combining lexical and n-gram feature representations of URLs yields a more robust method for classifying URLs as malicious or benign. This hybrid method takes advantage of the URL's structural and sequential patterns, improving the model's capacity to detect threats more effectively. Table 7 presents the experimental results of the hybrid method (lexical and n-gram representation of URL). Figure 2 depicts the ROC-AUC curve for the XGB classifier.
Table 6. Results of n-gram URL representation with Word2Vec and PCA (40 components)

| Number of components | ML algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| 35 | LR | 76.10 | 81.27 | 68.97 | 74.62 |
| 35 | SVC | 77.21 | 88.07 | 63.94 | 74.07 |
| 35 | GNB | 72.93 | 85.19 | 56.72 | 68.09 |
| 35 | DT | 77.76 | 77.85 | 78.75 | 78.29 |
| 35 | RF | 85.04 | 90.06 | 79.41 | 84.39 |
| 35 | GB | 79.88 | 85.80 | 72.52 | 78.59 |
| 35 | AB | 83.42 | 87.16 | 79.12 | 82.94 |
| 35 | XGB | 84.07 | 86.60 | 81.31 | 83.87 |
| 40 | LR | 76.49 | 81.48 | 69.70 | 75.12 |
| 40 | SVC | 77.26 | 88.24 | 63.88 | 74.10 |
| 40 | GNB | 73.02 | 84.50 | 57.62 | 68.50 |
| 40 | DT | 78.16 | 78.17 | 79.26 | 78.71 |
| 40 | RF | 85.18 | 90.19 | 79.58 | 84.55 |
| 40 | GB | 79.78 | 85.52 | 72.62 | 78.54 |
| 40 | KNN | 83.53 | 87.37 | 79.12 | 83.03 |
| 40 | XGB | 84.66 | 87.13 | 82.01 | 84.49 |
| 45 | LR | 76.43 | 81.51 | 69.52 | 75.03 |
| 45 | SVC | 77.28 | 88.26 | 63.90 | 74.12 |
| 45 | GNB | 73.00 | 83.91 | 58.17 | 68.68 |
| 45 | DT | 77.87 | 77.99 | 78.82 | 78.39 |
| 45 | RF | 85.13 | 90.22 | 79.43 | 84.48 |
| 45 | GB | 79.87 | 85.58 | 72.75 | 78.64 |
| 45 | KNN | 83.61 | 87.43 | 79.24 | 83.13 |
| 45 | XGB | 84.78 | 87.27 | 82.12 | 84.61 |
Table 7. Experimental results of the hybrid method (lexical and n-gram URL representation)

| ML algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|
| LR | 84.68 | 84.85 | 84.74 | 84.67 |
| SVC | 78.12 | 79.97 | 78.35 | 77.87 |
| GNB | 84.54 | 85.04 | 84.65 | 84.51 |
| DT | 98.51 | 98.51 | 98.50 | 98.51 |
| RF | 99.30 | 99.30 | 99.30 | 99.30 |
| GB | 98.60 | 98.61 | 98.59 | 98.60 |
| KNN | 97.03 | 97.06 | 97.01 | 97.03 |
| XGB | 99.43 | 99.43 | 99.43 | 99.43 |
Figure 2. ROC-AUC curve for XGB classifier
The results show that RF and XGB achieve accuracies of 99.30 and 99.43%, respectively. Table 8 and Figure 3 illustrate a performance comparison of the proposed hybrid method with existing methods. The results show that the proposed hybrid method surpasses the other existing methods in accuracy.
Table 8. Performance comparison between existing methods and the proposed method

| Author(s) | Accuracy (%) |
|---|---|
| Joshi et al. [8] | 92 |
| Raja et al. [9] | 99 |
| Ha et al. [10] | 95.68 |
| Afzal et al. [11] | 98 |
| Kumi et al. [12] | 95.8 |
| Al-Haija and Al-Fayoumi [13] | 99.3 |
| Raja et al. [14] | 92.49 |
| Lee et al. [15] | 99 |
| Raja et al. [16] | 98.8 |
| Proposed method | 99.43 |
Figure 3. Graphical comparison of performance between existing methods and the proposed method (axes: Author(s) and Accuracy (%); plotted values as listed in Table 8)
5. CONCLUSION
This study highlights the need for efficient detection systems to address escalating cyberattack threats across various industries. The examination of related works has illuminated the limitations of existing methods, highlighting the necessity for our proposed approach. The proposed method leverages a balanced dataset to avoid bias. The proposed system enhanced detection accuracy by extracting sensitive lexical features and employing n-gram features with Word2Vec embeddings. Integrating these hybrid features resulted in superior classification performance, outperforming the existing methods listed in Table 8. The experimental results reveal that the RF and XGB classifiers achieved accuracy rates of 99.30 and 99.43%, respectively.
FUNDING INFORMATION
This research project was funded by the University of Technology and Applied Sciences, Shinas, through the Internal Research Funding Program-2024, grant number (UTAS-Shinas-cy01-2024-002).
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author    C  M  So  Va  Fo  I  R  D  O  E  Vi  Su  P  Fu
Saleem Raja Abdul Samad    ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Pradeepa Ganesan    ✓ ✓ ✓ ✓ ✓ ✓ ✓
Amna Salim Rashid Al-Kaabi    ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Justin Rajasekaran    ✓ ✓ ✓
Murugan Singaravelan    ✓ ✓ ✓ ✓ ✓
Peerbasha Shebbeer Basha    ✓ ✓ ✓ ✓ ✓

C: Conceptualization
M: Methodology
So: Software
Va: Validation
Fo: Formal analysis
I: Investigation
R: Resources
D: Data Curation
O: Writing - Original Draft
E: Writing - Review & Editing
Vi: Visualization
Su: Supervision
P: Project administration
Fu: Funding acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
Data availability is not applicable to this paper as no new data were created or analyzed in this study.
REFERENCES
[1] A. S. Raja, B. Sundarvadivazhagan, R. Vijayarangan, and S. Veeramani, "Malicious webpage classification based on web content features using machine learning and deep learning," 2022 International Conference on Green Energy, Computing and Sustainable Technology, GECOST 2022, pp. 314-319, 2022, doi: 10.1109/GECOST55694.2022.10010386.
[2] D. Sahoo, C. Liu, and S. C. H. Hoi, "Malicious URL detection using machine learning: a survey," arXiv-Computer Science, pp. 1-37, Aug. 2019.
[3] S. Liaquathali and V. Kadirvelu, "WCA: integration of natural language processing methods and machine learning model for effective analysis of web content to classify malicious webpages," Journal of Advanced Research in Applied Sciences and Engineering Technology, vol. 47, no. 1, pp. 105-122, Jun. 2024, doi: 10.37934/araset.47.1.105122.
[4] S. Liaquathali and V. Kadirvelu, "Integration of natural language processing methods and machine learning model for malicious webpage detection based on web contents," IAES International Journal of Robotics and Automation, vol. 14, no. 1, pp. 47-57, Mar. 2025, doi: 10.11591/ijra.v14i1.pp47-57.
[5] S. Sheikhi and P. Kostakos, "Safeguarding cyberspace: enhancing malicious website detection with PSO-optimized XGBoost and firefly-based feature selection," Computers and Security, vol. 142, 2024, doi: 10.1016/j.cose.2024.103885.
[6] S. Abad, H. Gholamy, and M. Aslani, "Classification of malicious URLs using machine learning," Sensors, vol. 23, no. 18, 2023, doi: 10.3390/s23187760.