IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 15, No. 2, April 2026, pp. 1441~1450
ISSN: 2252-8938, DOI: 10.11591/ijai.v15.i2.pp1441-1450
Journal homepage: http://ijai.iaescore.com

TMA-Net: a transformer-based multi-modal attention network for abnormal behavior detection

Huong-Giang Doan1, Ngoc-Trung Nguyen2
1Faculty of Control and Automation, Electric Power University, Hanoi, Vietnam
2Department of Personnel Organization and Administration, Electric Power University, Hanoi, Vietnam

Article Info
Article history:
Received Sep 18, 2024
Revised Jan 9, 2026
Accepted Jan 25, 2026

ABSTRACT
Abnormal behavior detection in crowded environments remains challenging due to complex motion patterns, occlusions, and domain variability. This paper presents the transformer-based multi-modal attention network (TMA-Net), a unified framework that integrates red, green, and blue (RGB), optical flow (OF), and heatmap (HM) modalities through a dual-stage attention fusion mechanism. The system employs you only look once version 11 (YOLOv11) for human localization and vision transformer (ViT)-B/16 for feature encoding, followed by intra-modal self-attention and cross-modal fusion to capture fine-grained spatial–temporal and motion energy dependencies. Extensive experiments on six public benchmarks (UMN, Crowd-11, UBNormal, ShanghaiTech, CUHK Avenue, and UCSD Ped2) and the EPUAbN dataset demonstrate that TMA-Net achieves up to 97.5% area under the curve (AUC) and 96–100% accuracy, outperforming previous state-of-the-art approaches. These results highlight the framework's strong generalization and robustness across both single- and cross-dataset evaluations, underscoring its potential for reliable deployment in real intelligent surveillance systems.

Keywords:
Abnormal detection
Attention network
Convolutional neural network
Spatial-temporal
Transformer

This is an open access article under the CC BY-SA license.

Corresponding Author:
Ngoc-Trung Nguyen
Department of Personnel Organization and Administration, Electric Power University
No. 235 Hoang Quoc Viet Street, Nghia Do Ward, Hanoi City, Vietnam
Email: trungnn@epu.edu.vn

1. INTRODUCTION
In visual tasks using convolutional neural networks (CNNs), it can be challenging for the models to process all of the input data because of its size and complexity. To address this challenge, attention mechanisms have been proposed to help CNNs focus on the most relevant features of the input and ignore the irrelevant ones, thereby improving the accuracy and efficiency of the learning process. Different types of attention mechanisms can be deployed depending on the CNN architecture and the learning target. For human abnormal behavior detection, the attention mechanisms most commonly added to CNNs are temporal attention, spatial attention, and the combination of spatial and temporal attention.

Spatial attention mechanisms answer the question of where in the image to pay attention. They are mainly added to CNN modules as additional layers for extracting important spatial features from the CNN outputs. A CNN-long short-term memory (LSTM) framework with attention units is proposed for human abnormal behavior detection from videos [1]. First, the input images sampled from the videos are pre-processed by converting them to grayscale, equalizing the histogram, and reshaping them to a smaller size. They are then passed into CNN layers, followed by attention units, and finally directed to the LSTM layer, which interprets the features obtained from the CNN. The attention units work as supplemental feature extraction layers of the CNN. However, random sampling of frames from videos may skip frames containing unusual behaviors. A more complex attention structure is proposed in [2]. In that work, the AttM-CNN-AG and AttM-CNN-Porn models are proposed for child sexual abuse content detection. Inception and ResNet deep neural networks are deployed as basic units, and two attention modules are added to these networks to help them automatically focus on key regions in the input frames. The attention module contains a 1×1 convolution layer, followed by an element-wise dot product with the feature vector of the respective layer. The result is then normalized by the SoftMax operation. The normalized result can be considered the coefficients of the attention grid, which represent the importance of the elements in the feature maps at the chosen layer of the CNN. Although some positive results have been achieved on self-collected databases, this work still has limitations. The detection results depend on the age-group classification module, which relies on the human face and no other helpful features. As a result, some images are misclassified when age-group classification fails.

Temporal attention mechanisms answer the question of when to pay attention, that is, which frames in the frame sequence of a video should be focused on. Temporal attention modules are normally applied in video processing. They relate to the motion patterns that are commonly extracted by a recurrent neural network (RNN). In the combined CNN and LSTM architecture [3], the CNN model produces the spatial features from the input frame. These features are then directed into the LSTM module to generate temporal features. The feature maps of the LSTM component are then fed into an attention module to capture valuable and informative features in the frames of the video. The actions are recognized from the informative features using the SoftMax module.
Chong and Tay [4] proposed a spatiotemporal autoencoder using ConvLSTM to jointly model spatial and temporal information in video sequences. The model learns normal motion patterns in an unsupervised manner and detects abnormal events by measuring reconstruction errors on unseen video frames. Extensive experiments on the UCF-Crime [5], UMN [6], and Avenue [7] datasets indicate better results than other state-of-the-art (SOTA) models. The above-mentioned works deploy temporal attention mechanisms in the same manner: spatial feature extraction first, then temporal feature extraction, and finally an attention mechanism that weights the temporal features.
Chang et al. [8] proposed a clustering-driven deep autoencoder framework for video anomaly detection. In this approach, spatiotemporal features are extracted from red, green, and blue (RGB) frames and optical flow (OF) using two separate 3D CNN networks, and subsequently fused to form unified video segment representations. A deep autoencoder is employed to learn compact feature embeddings, while clustering is introduced to exploit the intrinsic structure of normal and abnormal events in a weakly supervised manner. To further enhance anomaly discrimination, multiple constraints, including event separation and temporal smoothness, are incorporated during training. Experimental results on the UCF-Crime dataset demonstrate that the proposed method achieves superior performance compared to existing approaches in detecting anomalous events.

Spatial attention aims to emphasize discriminative regions within individual video frames, while temporal attention focuses on identifying informative frames or segments in a video sequence. For abnormal human behavior detection, which relies on both appearance and motion cues, jointly modeling spatial and temporal information is essential for improving detection performance. The combination of spatial and temporal attention enables adaptive selection of important regions and moments from videos.
Li et al. [9] proposed a spatio-temporal attention network for action recognition and detection, where spatial and temporal attention modules are embedded into a CNN to enhance discriminative feature representations. The spatial attention module highlights informative regions in video frames, while the temporal attention module assigns importance weights to key frames in a video sequence. Experimental results on the HMDB51 and UCF101 datasets demonstrate that incorporating spatio-temporal attention significantly improves performance compared to models without attention mechanisms.
Building upon attention-based modeling, Chen et al. [10] introduced a spatial–temporal graph attention network for video anomaly detection. In this approach, spatiotemporal features extracted by a 3D CNN backbone are organized into a spatial–temporal graph, where graph attention mechanisms are employed to capture spatial relationships among regions and temporal dependencies across frames. Experimental results on the UCF-Crime dataset and a vehicle theft dataset show that the fusion of spatial and temporal graph attention outperforms using either attention alone as well as methods without attention. However, due to the reliance on local graph structures, the method may exhibit limited capability in modeling long-range temporal dependencies in extended video sequences.
To address long-term temporal modeling, Liu et al. [11] proposed a temporal segment transformer framework with an embedded spatial–temporal attention mechanism for abnormal behavior recognition. By sampling video segments over longer time spans, the model captures long-range temporal dependencies while suppressing irrelevant frames and regions. Experimental results on the UCF101, HMDB51, JHMDB, and THUMOS14 datasets demonstrate that incorporating spatial–temporal attention significantly improves recognition performance compared to models without attention mechanisms.

In this work, we deploy both spatial and temporal attention units for abnormal behavior detection. Unlike other published methods, however, our proposed framework applies attention units to three inputs: RGB, OF, and heat map (HM) images. The attention feature vectors from these inputs are then combined to produce the final features used for classification. This allows the framework to exploit, and focus on, the important image features that need to be detected across multiple input sources. In addition, we apply the knowledge distillation technique to the proposed framework, with the aim of reducing the computing time of the detection system. Experiments are carried out on several benchmark datasets and on our own dataset, using both single-dataset and cross-dataset evaluation strategies. The results show that our proposed framework outperforms other SOTA methods in detection accuracy. Furthermore, the experiments demonstrate that knowledge distillation not only reduces computation cost but also maintains high accuracy in abnormal behavior detection.
The remainder of this paper is organized as follows. Section 2 first explains the proposed method and its evaluation scheme. The experimental results and discussions are analyzed in section 3. Finally, section 4 concludes the paper and outlines research directions for future work.

2. METHOD
The proposed framework for abnormal behavior detection, illustrated in Figure 1, is adapted and extended from our previous work [12], [13]. It takes three input modalities, namely RGB, OF, and HM images, to comprehensively represent spatial, temporal, and motion-energy information. An additional attention block, highlighted in pink in Figure 1, is incorporated into the framework to enhance multi-modal feature interaction and improve detection performance through vision transformer (ViT) [14] feature extraction and cross-modal attention strategies.

Figure 1. The ViT and cross-modality attention-based framework for abnormal behavior detection

2.1. Human detection and feature extraction
The initial stage of the proposed system employs a you only look once (YOLO)-based detection module to localize human regions in each frame derived from the RGB, OF, and HM modalities. Given an input frame $I \in \mathbb{R}^{H \times W \times 3}$, YOLO predicts a set of bounding boxes $B = \{b_i,\ i = (1, \dots, N)\}$ with corresponding confidence scores $s_i$, where each box $b_i = (x_i, y_i, w_i, h_i)$ denotes the center coordinates and dimensions of a detected person, as in (1).

$$B = \mathrm{YOLOv11}(I), \qquad s_i = \sigma\left(W_c\, f(b_i) + b_c\right) \tag{1}$$

Where $f(b_i)$ represents the feature vector of the candidate region and $\sigma(\cdot)$ is the sigmoid activation. Regions with $s_i \geq \tau$, where $\tau$ is the confidence threshold, are retained as valid detections and cropped for subsequent processing.
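
The confidence filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example boxes and logits, and the default threshold `tau=0.5` are assumptions introduced here, and a real pipeline would take boxes and scores from the detector itself.

```python
import math

def sigmoid(x):
    """Logistic function mapping a raw logit to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def center_to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2) for cropping."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def filter_detections(boxes, logits, tau=0.5):
    """Keep boxes whose sigmoid confidence reaches the threshold tau."""
    kept = []
    for box, logit in zip(boxes, logits):
        score = sigmoid(logit)
        if score >= tau:
            kept.append((center_to_corners(box), score))
    return kept

# Hypothetical detections: (cx, cy, w, h) boxes with raw confidence logits.
boxes = [(50, 60, 20, 40), (120, 80, 25, 50), (200, 90, 10, 16)]
logits = [2.0, -1.0, 0.3]
valid = filter_detections(boxes, logits, tau=0.5)
```

With these example values the second box falls below the threshold and is discarded; the two remaining boxes are returned in corner format, ready to be cropped and resized.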

Each detected region is resized to a fixed spatial resolution of 224×224 pixels, as in (2).

$$x_i = \mathrm{Resize}\left(I[b_i],\ 224,\ 224\right) \tag{2}$$

These human images are then passed into a ViT-B/16 encoder to obtain a discriminative feature representation. Each image is divided into non-overlapping 16×16 patches, flattened, and projected into a latent embedding space via a linear mapping, as in (3).

$$z_0 = x_p E + E_{pos} \tag{3}$$

Where $E$ is the patch-embedding matrix and $E_{pos}$ denotes the positional encoding.
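
To make the patch arithmetic concrete, the sketch below splits a 224×224 image into non-overlapping 16×16 patches and flattens each one. It is illustrative only: `patchify` is a hypothetical helper, the image is a synthetic single-channel grid, and the learned projection and positional encoding of (3) are omitted.

```python
def patchify(image, patch=16):
    """Split an H x W image (list of rows) into flattened non-overlapping patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            patches.append(flat)
    return patches

# Synthetic single-channel 224 x 224 image (an assumption for illustration).
image = [[(r * 224 + c) % 256 for c in range(224)] for r in range(224)]
patches = patchify(image)
num_patches = len(patches)      # (224 / 16) ** 2 patches
patch_length = len(patches[0])  # 16 * 16 values per channel
```

A 224×224 input therefore yields a sequence of 196 patch tokens; with three colour channels each flattened patch would hold 16·16·3 = 768 values before projection.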

The patch sequence is then processed by multiple transformer encoder layers to model global dependencies across patches, as in (4).

$$f_i = \mathrm{ViT\text{-}B/16}(x_i) \in \mathbb{R}^{C} \tag{4}$$

With $C$ being the output embedding dimension (typically $C = 768$). The resulting vector encodes the semantic and spatial context of each detected human region. The feature sets extracted from all regions in each modality are denoted as in (5).

$$F_{RGB} = \{f_i^{RGB}\}, \qquad F_{OF} = \{f_i^{OF}\}, \qquad F_{HM} = \{f_i^{HM}\} \tag{5}$$

These feature embeddings serve as inputs to the subsequent multi-modal attention fusion module, which performs cross-attention and late fusion to integrate spatial, temporal, and motion-energy information for abnormal human action recognition.
The combination of YOLO and ViT-B/16 leverages the strengths of both models: i) YOLO provides efficient, real-time object localization, ensuring precise human-region extraction and background suppression; and ii) ViT-B/16 encodes global contextual dependencies within each cropped region through its self-attention mechanism. This hybrid design enables the model to capture both local spatial detail and global semantic context, forming a robust feature foundation for subsequent cross-modal fusion.

2.2. Multi-modal attention fusion and classification
2.2.1. Intra-branch self-attention
Each modality is first refined independently through a self-attention mechanism to enhance intra-modal relationships. Given the three modalities $m \in \{RGB, OF, HM\}$, the queries, keys, and values are computed as in (6).

$$Q_m = F_m W_Q, \quad K_m = F_m W_K, \quad V_m = F_m W_V, \qquad \mathrm{SA}(F_m) = \mathrm{softmax}\left(\frac{Q_m K_m^{\top}}{\sqrt{d_k}}\right) V_m \tag{6}$$

Where $W_Q$, $W_K$, $W_V$ are learnable projection matrices and $d_k$ is the key dimension. This operation emphasizes the most informative spatial or temporal regions within each modality.
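
A minimal pure-Python sketch of scaled dot-product self-attention for one modality is given below. The helper names and the toy inputs are assumptions, and identity matrices stand in for the learnable projections, so it shows the mechanics of the operation rather than the trained model.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(f, w_q, w_k, w_v):
    """Return (attended features, attention weights) for token matrix f."""
    q, k, v = matmul(f, w_q), matmul(f, w_k), matmul(f, w_v)
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights

# Toy example: three region features of dimension 2, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined, weights = self_attention(features, identity, identity, identity)
```

Each row of `weights` sums to one, so every refined feature is a convex combination of the value vectors, weighted toward the tokens most similar to the query.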

2.2.2. Cross-modal attention fusion (early fusion)
To synchronize contextual information among modalities, the outputs of the self-attention modules are fused via a cross-modal attention block. For instance, the RGB branch attends to the OF and HM features as in (7).

$$\mathrm{CA}(F_{RGB}, F_{OF}, F_{HM}) = \mathrm{softmax}\left(\frac{Q_{RGB}\,[K_{OF}, K_{HM}]^{\top}}{\sqrt{d_k}}\right)[V_{OF}, V_{HM}] \tag{7}$$

And analogously for $\mathrm{CA}(F_{OF}, \cdot)$ and $\mathrm{CA}(F_{HM}, \cdot)$. This stage represents early fusion, aligning multi-modal semantics across the spatial, temporal, and motion energy domains.
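
The cross-modal step can be illustrated with the following sketch, in which one branch supplies the queries and the token lists of the other two branches are concatenated to form the keys and values. It assumes identity projections and tiny hand-made feature lists; `attend` and `cross_attention` are hypothetical names, not part of the published framework.

```python
import math

def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def cross_attention(f_query, f_ctx_a, f_ctx_b):
    """Query branch attends to the concatenated tokens of two other branches."""
    context = f_ctx_a + f_ctx_b  # concatenation along the token axis
    return attend(f_query, context, context)

# Toy features (assumed values): RGB queries, OF and HM context tokens.
f_rgb = [[1.0, 0.0], [0.0, 1.0]]
f_of = [[1.0, 1.0]]
f_hm = [[0.0, 2.0], [2.0, 0.0]]
rgb_fused = cross_attention(f_rgb, f_of, f_hm)
```

The output keeps one vector per RGB token, but each vector is now built entirely from OF and HM content, which is what lets the branches exchange context.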

2.2.3. Global attention block (late fusion)
The aligned representations are then fed into a global attention block that performs both self-attention and cross-attention to produce a unified joint representation, as in (8).

$$F_{joint} = \mathrm{Norm}\left(\sum_{m=1}^{3}\left(\alpha_m F'_m + \sum_{n \neq m} \beta_{mn}\, \mathrm{CA}(F'_m, F'_n)\right)\right) \tag{8}$$

Where $F'_m$ are the outputs of the early-fusion stage, $\alpha_m$ and $\beta_{mn}$ are learnable weights, and Norm denotes layer normalization. This late-fusion step captures higher-level inter-modal dependencies for robust representation learning.
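
One way to read the late-fusion step is as a weighted sum of each branch and its pairwise cross-attention terms, followed by layer normalization. The sketch below follows that reading under stated assumptions: the weights `alpha` and `beta` are fixed numbers rather than learned parameters, keys and values are the raw context tokens, and all names and inputs are illustrative.

```python
import math

def attend(queries, context):
    """Scaled dot-product attention with keys = values = context tokens."""
    d_k = len(context[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in context]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j] for e, v in zip(exps, context))
                    for j in range(d_k)])
    return out

def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def late_fusion(feats, alpha, beta):
    """Weighted sum of each branch and its cross-attention terms, then LayerNorm."""
    n_tok, dim = len(feats[0]), len(feats[0][0])
    joint = [[0.0] * dim for _ in range(n_tok)]
    for m, f_m in enumerate(feats):
        terms = [(alpha[m], f_m)]
        terms += [(beta[m][n], attend(f_m, feats[n]))
                  for n in range(len(feats)) if n != m]
        for weight, mat in terms:
            for t in range(n_tok):
                for j in range(dim):
                    joint[t][j] += weight * mat[t][j]
    return [layer_norm(row) for row in joint]

# Assumed early-fusion outputs for RGB, OF, HM: two tokens of dimension 3 each.
feats = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0]],
    [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]],
]
alpha = [1.0, 1.0, 1.0]
beta = [[0.5] * 3 for _ in range(3)]
joint = late_fusion(feats, alpha, beta)
```

The result has the same token and feature dimensions as each input branch, so it can be pooled and classified directly.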

2.2.4. Feature aggregation and classification
The fused feature map is flattened or globally pooled and passed through a fully connected layer followed by a SoftMax activation, as in (9).

$$y = \mathrm{softmax}\left(W_o\, \mathrm{Pool}(F_{joint}) + b_o\right) \tag{9}$$

Yielding the posterior probability vector $y$ = [abnormal, non-abnormal]. This determines the final prediction of the scene state.
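
The classification head can be sketched as global average pooling followed by a linear layer and a softmax. The weights, bias, and token values below are made-up numbers chosen for illustration; in practice they would be learned during training.

```python
import math

def global_average_pool(tokens):
    """Average the token vectors into a single feature vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(tokens, weights, bias):
    """Pool, apply a fully connected layer, and return class probabilities."""
    pooled = global_average_pool(tokens)
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

# Assumed fused tokens and classifier parameters (two classes, three features).
tokens = [[0.2, 1.0, -0.5], [0.4, 0.0, 0.5]]
weights = [[1.0, 0.5, 0.0], [-1.0, 0.0, 0.5]]
bias = [0.0, 0.1]
probs = classify(tokens, weights, bias)  # [p_abnormal, p_non_abnormal]
label = "abnormal" if probs[0] >= 0.5 else "non-abnormal"
```

Here the first entry of `probs` plays the role of the abnormal-class probability, matching the ordering of the probability vector described above.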

2.3. Datasets and scenarios
In this work, several benchmark datasets and one self-constructed dataset were employed to evaluate the proposed model. The UMN dataset [6] contains three indoor and outdoor scenes with a total of 4 min 17 s of video at 30 fps (320×240 px). Each sequence starts with normal activities and ends with abnormal panic behavior. The Crowd-11 dataset [15] defines 11 crowd motion patterns in 6,000 video clips (about 100 frames each), partly collected from the WWW crowd dataset [16], CUHK [17], Violent-Flows [18], WorldExpo10 [19], AgoraSet [20], PETS [21], UMN [6], and Hockey Fight [22]. The UCF_CC_50 dataset [5] includes 50 highly crowded images with 63,974 annotated pedestrians (94 to 4,543 per image), providing a challenging benchmark for crowd-density estimation. The UCSD Ped2 dataset [23] contains 2,000 frames of a single pedestrian scene, with 11 to 46 people per frame and 49,885 labeled instances. The UBNormal dataset [24] has 236,902 synthetic frames generated from 29 natural scenes (streets, stations, offices), evenly containing both normal and abnormal events. The ShanghaiTech dataset [25] includes 437 surveillance videos (317,398 frames, 13 scenes) with 158 anomalies in 11 categories and is widely used for large-scale anomaly detection. The CUHK Avenue dataset [7] consists of 15 sequences (35,240 frames) with 14 unusual events such as running, throwing, and loitering. Finally, the self-built EPUAbN dataset comprises 300 RGB videos captured outdoors (2688×1520 pixels, 30 fps) using fixed Hikvision DS-2CD2643G2-IZS cameras. This dataset defines 11 abnormal crowd behaviors, including fighting, robbery, fire, smoke, weapon carrying, falling objects, and sudden vehicle entry, with 5 to 25 participants per scene.

2.4. The evaluation criteria
The performance of the proposed model is evaluated using micro-area under the curve (AUC), macro-AUC [24], and micro/macro accuracy [26], [27]. The final prediction score obtained from the SoftMax layer is thresholded (with α ranging from 0.1 to 1.0) to classify each frame as normal or abnormal, and a receiver operating characteristic (ROC) curve is constructed based on the true positive rate (TPR) and false positive rate (FPR). The micro-AUC reflects the overall detection performance across all test samples, while the macro-AUC averages the AUC scores over individual videos. Accuracy is computed at α=0.5, where micro-accuracy measures the global classification correctness and macro-accuracy represents the mean accuracy per video. These metrics collectively assess both global and per-scene detection effectiveness. They are discussed in detail in our previous research [13].
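
The difference between the micro and macro variants can be seen in a short sketch. It is an illustration under assumptions: the AUC is computed with the pairwise-ranking formulation (equivalent to the area under the ROC curve), each video is assumed to contain both normal and abnormal frames, and the scores and labels are invented.

```python
def roc_auc(scores, labels):
    """AUC as the probability that an abnormal frame outranks a normal one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, alpha=0.5):
    """Frame accuracy when scores at or above alpha are called abnormal."""
    preds = [1 if s >= alpha else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def micro_macro(videos, metric):
    """Micro: pool all frames first. Macro: average the per-video metric."""
    all_scores = [s for scores, _ in videos for s in scores]
    all_labels = [y for _, labels in videos for y in labels]
    micro = metric(all_scores, all_labels)
    macro = sum(metric(s, y) for s, y in videos) / len(videos)
    return micro, macro

# Two hypothetical videos: (abnormality scores, ground-truth labels) per frame.
videos = [
    ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    ([0.6, 0.5, 0.4, 0.95], [0, 1, 0, 1]),
]
micro_auc, macro_auc = micro_macro(videos, roc_auc)
micro_acc, macro_acc = micro_macro(videos, accuracy)
```

In this toy case the micro-AUC (0.9375) is higher than the macro-AUC (0.875) because the single ranking error in the second video is diluted once all frames are pooled.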

3. EXPERIMENTAL RESULTS
All evaluation experiments are implemented in Python using the PyTorch deep learning framework and executed on a workstation equipped with an NVIDIA GPU with 18 GB of memory. Our models are trained for 100 epochs with early stopping, a batch size of 32, and a learning rate between $10^{-6}$ and $10^{-4}$.
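
The training setup can be summarized in a small framework-independent sketch. The epoch count, batch size, and learning-rate bounds come from the text; the patience value, the `EarlyStopping` class, and the simulated validation losses are assumptions, since the stopping criterion is not specified.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Hyperparameters stated in the text; the key names are illustrative.
config = {"max_epochs": 100, "batch_size": 32, "lr_min": 1e-6, "lr_max": 1e-4}

def run(val_losses, patience):
    """Replay a list of validation losses and report where training stops."""
    stopper = EarlyStopping(patience=patience)
    history = val_losses[: config["max_epochs"]]
    for epoch, loss in enumerate(history, start=1):
        if stopper.step(loss):
            return epoch, stopper.best
    return len(history), stopper.best

# Simulated losses: improvement stalls after epoch 2, so training halts early.
stopped_at, best_loss = run([1.0, 0.9, 0.95, 0.96, 0.97, 0.5], patience=3)
```

With a patience of 3 the replay stops at epoch 5 and never sees the later improvement, which illustrates the usual trade-off when choosing the patience value.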
The proposed method is evaluated on the challenging benchmark datasets presented in section 2.3. Two evaluation strategies are adopted: single-dataset evaluation and cross-dataset evaluation. In single-dataset evaluation, each dataset is divided into training and testing splits according to its original protocol. In cross-dataset evaluation, one dataset is used entirely for training, while another is used for testing to examine the model's cross-domain generalization capability. The proposed framework is evaluated under both strategies, and the results are presented in the following sections.
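
The two strategies differ only in how training and testing data are paired, as the sketch below shows. The dataset names are taken from the text, but enumerating every ordered pair is an assumption made for illustration; which train/test pairs were actually evaluated is not stated here.

```python
from itertools import permutations

DATASETS = ["UBNormal", "ShanghaiTech", "CUHK Avenue", "UMN", "UCSD Ped2", "EPUAbN"]

def single_dataset_runs(names):
    """Train and test on the same dataset, using its own split protocol."""
    return [(name, name) for name in names]

def cross_dataset_runs(names):
    """Train on one dataset and test on a different one (all ordered pairs)."""
    return [(train, test) for train, test in permutations(names, 2)]

single_runs = single_dataset_runs(DATASETS)
cross_runs = cross_dataset_runs(DATASETS)
```

Six datasets give six single-dataset runs and up to thirty ordered cross-dataset pairs, which is why cross-dataset studies usually report only a selected subset.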

3.1. Single-dataset evaluation
The single-dataset evaluation was conducted using the AUC [24] and accuracy [26], [27] metrics across six benchmark anomaly detection datasets: UBNormal, ShanghaiTech, CUHK Avenue, UMN, UCSD Ped2, and the proposed EPUAbN dataset. Figures 2 and 3 report the corresponding micro and macro results for AUC and accuracy, respectively, comparing the proposed TMA-Net model with prior methods including ROHAC [13], ROHAC-KD [12], ROHAC V2 [13], and ROHAC-KD V2 [13].

3.1.1. AUC-based evaluation
As illustrated in Figure 2, the proposed TMA-Net achieves the highest AUC values across all datasets, consistently outperforming the existing ROHAC-based frameworks. In particular, on UBNormal and ShanghaiTech, TMA-Net yields improvements of approximately 0.9% micro-AUC and 1.5% macro-AUC over ROHAC V2 [13], highlighting its enhanced sensitivity to subtle abnormal motion cues in complex synthetic and real-world crowd scenes. For CUHK Avenue, UMN, and UCSD Ped2, the AUC scores of TMA-Net are nearly saturated, reaching 97% to 99%, comparable to the highest results reported in the literature. On the EPUAbN dataset, which was collected under diverse outdoor conditions, TMA-Net achieves 99.5% micro-AUC and 99.6% macro-AUC, confirming its robustness to illumination changes, motion clutter, and scale variations. These AUC results demonstrate that TMA-Net effectively captures multi-level spatiotemporal dependencies through its dual attention fusion strategy. The consistent gain across heterogeneous datasets indicates a strong capacity for generalizing both spatial structures from static scenes and dynamic motion correlations from temporal patterns.

Figure 2. The micro-AUC (%) and macro-AUC (%) results on single-dataset evaluation
3
.
1
.
2
.
Acc
ura
cy
-
ba
s
ed
e
v
a
lu
a
t
io
n
T
h
e
a
cc
u
r
ac
y
r
esu
lts
in
Fig
u
r
e
3
r
ei
n
f
o
r
ce
th
e
AUC
f
in
d
in
g
s
.
T
MA
-
Net
ag
ain
attain
s
th
e
b
e
s
t
o
v
er
all
p
er
f
o
r
m
an
ce
,
with
m
icr
o
-
ac
cu
r
ac
y
an
d
m
ac
r
o
-
ac
cu
r
ac
y
e
x
ce
ed
in
g
th
o
s
e
o
f
all
b
aselin
e
m
o
d
els.
Sp
ec
if
ically
,
it
ac
h
iev
es
9
5
.
1
to
9
6
.
7
%
o
n
U
B
No
r
m
al,
9
4
.
6
to
9
5
.
2
%
o
n
S
h
an
g
h
aiT
ec
h
,
an
d
u
p
to
9
8
.
9
%
to
1
0
0
%
o
n
UM
N,
UC
SD
Ped
2
,
an
d
E
PUAb
N
d
atasets
.
C
o
m
p
ar
ed
to
R
OHAC
V2
,
T
MA
-
Net
im
p
r
o
v
es
b
y
a
n
av
er
ag
e
o
f
1
.
2
to
2
.
5
%,
wh
ile
m
ain
tain
in
g
s
tab
l
e
r
esu
lts
ac
r
o
s
s
all
d
o
m
ain
s
.
T
h
e
g
ain
in
ac
cu
r
a
c
y
d
em
o
n
s
tr
ates
th
at
th
e
p
r
o
p
o
s
e
d
f
u
s
io
n
ar
ch
itectu
r
e
n
o
t o
n
l
y
en
h
an
ce
s
an
o
m
aly
d
is
cr
im
in
atio
n
at
th
e
f
ea
tu
r
e
lev
el
b
u
t a
ls
o
y
ield
s
m
o
r
e
r
eliab
le
f
in
al
class
if
icatio
n
d
ec
is
io
n
s
.
No
tab
ly
,
T
MA
-
Net
ac
h
iev
es
p
er
f
ec
t
ac
cu
r
ac
y
at
1
0
0
%
o
n
UC
SD
Ped
2
an
d
EPUAbN, suggesting that its attention-based feature alignment successfully preserves temporal coherence even in simpler or well-structured environments.

The superior results of TMA-Net across both AUC and accuracy metrics confirm its ability to extract discriminative representations and generalize across diverse data distributions. The improvements are particularly prominent in challenging datasets such as UBNormal and ShanghaiTech, where scene complexity, camera angles, and human density vary significantly. By combining spatial, temporal, and motion energy cues through dual-stage attention, TMA-Net mitigates overfitting to dataset-specific contexts and ensures consistent performance on unseen scenes.

Overall, the single dataset evaluation results clearly indicate that TMA-Net outperforms all existing ROHAC variants in both frame-level detection precision and video-level stability. This establishes TMA-Net as a robust and scalable framework for “hand-in-wild” abnormal behavior detection across multiple environments and data domains.

Figure 3. The micro accuracy (%) and macro accuracy (%) results on single dataset evaluation

3.2. Cross-dataset evaluation
Cross-dataset experiments were conducted to evaluate the generalization capability of the proposed models under domain shifts between training and testing datasets. Following the same experimental setup as in [28], one dataset was used for training while another was reserved for testing. Each experiment was repeated five times, and the average micro-AUC and macro-AUC scores were reported. The results on three benchmark datasets (CUHK Avenue, ShanghaiTech, and UCSD Ped2) are summarized in Tables 1 to 3. Overall, both ROHAC V2 [13] and the proposed TMA-Net demonstrate consistent superiority compared with previous methods, including Georgescu et al. [28] and ROHAC [12]. Across all dataset pairs, TMA-Net achieves the highest scores in both micro-AUC and macro-AUC, confirming its strong adaptability across heterogeneous environments.
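
The protocol above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code. It assumes the convention common in video anomaly detection, where micro-AUC pools the frames of all test videos before computing one AUC and macro-AUC averages the per-video AUCs. All function names and the toy scores are hypothetical.

```python
from statistics import mean

def auc(scores, labels):
    """ROC AUC via pairwise comparison of abnormal vs. normal frames (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one normal and one abnormal frame")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def micro_auc(videos):
    """Assumed convention: pool the frames of all test videos, then compute one AUC."""
    scores = [s for v_scores, _ in videos for s in v_scores]
    labels = [y for _, v_labels in videos for y in v_labels]
    return auc(scores, labels)

def macro_auc(videos):
    """Assumed convention: compute one AUC per video and average them."""
    return mean(auc(s, y) for s, y in videos)

def average_over_runs(runs):
    """Average micro- and macro-AUC over repeated runs (five in the text)."""
    return mean(micro_auc(r) for r in runs), mean(macro_auc(r) for r in runs)

# Toy example: each video is (frame anomaly scores, frame labels), 1 = abnormal.
videos = [
    ([0.10, 0.90, 0.80], [0, 1, 1]),
    ([0.95, 0.97, 0.20], [0, 1, 0]),
]
# Hypothetical: the same predictions reused for five runs, only to show the averaging.
runs = [videos] * 5
avg_micro, avg_macro = average_over_runs(runs)
```

In this toy case each video is ranked perfectly on its own, so the macro-AUC is 1.0, while the pooled micro-AUC is lower because the second video's normal frame scores above abnormal frames of the first. This is why the two metrics are reported side by side.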

As shown in Table 1, when trained on ShanghaiTech or UCSD Ped2 and tested on CUHK Avenue, the proposed TMA-Net achieves 97.2% micro-AUC and 97.5% macro-AUC, surpassing [28] by 4.9 and 7.1 percentage points, respectively. Compared with ROHAC V2 [13], the gains are smaller but consistent, indicating that TMA-Net preserves the stability of ROHAC while improving cross-domain feature generalization. This improvement suggests that the dual-stage attention mechanism effectively captures high-level spatiotemporal semantics invariant to dataset-specific differences.
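
The margins quoted for Table 1 are absolute differences between AUC values. The sketch below reproduces them from the CUHK Avenue columns of Table 1. The dictionary and function names are illustrative and not taken from the paper.

```python
# CUHK Avenue columns of Table 1: (micro-AUC, macro-AUC) in percent.
table1_avenue = {
    "Georgescu et al. [28]": (92.3, 90.4),
    "ROHAC [12]": (93.7, 94.8),
    "ROHAC V2 [13]": (96.1, 95.7),
    "TMA-Net": (97.2, 97.5),
}

def gains_over(baseline, proposed="TMA-Net", table=table1_avenue):
    """Absolute gain of `proposed` over `baseline` in percentage points (one decimal)."""
    return tuple(round(p - b, 1) for p, b in zip(table[proposed], table[baseline]))

gain_vs_28 = gains_over("Georgescu et al. [28]")  # (micro gain, macro gain)
gain_vs_v2 = gains_over("ROHAC V2 [13]")
```

Here `gain_vs_28` evaluates to (4.9, 7.1), matching the figures in the text, and `gain_vs_v2` to (1.1, 1.8), the smaller margin over ROHAC V2.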

Table 1. The micro-AUC and macro-AUC (%) results on cross dataset evaluation of CUHK Avenue dataset

Method                   CUHK Avenue             ShanghaiTech            UCSD Ped2
                         Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC
Georgescu et al. [28]    92.3       90.4         83.6       81           -          -
ROHAC [12]               93.7       94.8         93.8       94.5         92.5       95.2
ROHAC V2 [13]            96.1       95.7         95.4       97.2         95.3       96.7
Ours (TMA-Net)           97.2       97.5         96.4       98.1         96.9       97.1

When evaluating on the ShanghaiTech dataset (Table 2), the domain gap becomes more challenging due to complex crowd dynamics and diverse camera viewpoints. Despite this, TMA-Net achieves 96.4% micro-AUC and 98.1% macro-AUC, outperforming Georgescu et al. [28] by 12.8 and 17.1 percentage points, respectively, and exceeding ROHAC V2 [13] by approximately 1 percentage point. These results demonstrate the robustness of TMA-Net in modeling global-local motion correlations and its superior ability to transfer knowledge between scenes with different densities and motion distributions.

Table 2. The micro-AUC and macro-AUC (%) results on cross dataset evaluation of ShanghaiTech dataset

Method                   ShanghaiTech            CUHK Avenue             UCSD Ped2
                         Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC
Georgescu et al. [28]    82.7       89.3         76.3       86.3         -          -
ROHAC [12]               92.4       94.8         91.9       90.1         92.3       89.9
ROHAC V2 [13]            95.1       96.2         93.2       92.6         95.1       94.7
Ours (TMA-Net)           96.3       97.8         94.6       94.8         96.8       95.9

Table 3 further shows that when trained on other datasets and tested on UCSD Ped2, TMA-Net continues to yield near-saturated results, achieving 96.9% micro-AUC and 97.1% macro-AUC. This consistency highlights the model’s capacity to generalize even in simpler surveillance environments with lower scene variability. Moreover, the marginal difference between ROHAC V2 and TMA-Net indicates that both frameworks maintain stable detection accuracy while improving inter-dataset adaptability.

Table 3. The micro-AUC and macro-AUC (%) results on cross dataset evaluation of UCSD Ped2 dataset

Method                   UCSD Ped2               CUHK Avenue             ShanghaiTech
                         Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC    Micro-AUC  Macro-AUC
Georgescu et al. [28]    98.7       99.7         87         97.2         90.6       95.7
ROHAC [12]               99.6       99.9         94.8       97.5         95.8       97.8
ROHAC V2 [13]            99.6       99.9         97.2       99.1         97.9       98.7
Ours (TMA-Net)           99.7       99.9         98.6       99.4         98.5       99.1

The cross-dataset evaluations demonstrate that the proposed TMA-Net exhibits remarkable robustness and generalization compared with both previous ROHAC variants and the SOTA method [28]. The improvements range from 5% to 17% in AUC scores across all dataset pairs. Such stability under different training and testing domains indicates that the multi-modal attention fusion mechanism enhances feature transferability and reduces overfitting to dataset-specific patterns. Therefore, TMA-Net not only performs well under intra-dataset evaluations but also maintains superior accuracy in cross-domain scenarios, a key requirement for real-world abnormal behavior detection systems deployed in diverse surveillance contexts.

4. CONCLUSION
This paper proposed TMA-Net, a multi-modal attention-based framework for abnormal behavior detection. By integrating RGB, OF, and HM modalities through a dual-stage attention fusion mechanism, TMA-Net effectively captures both spatial-temporal and motion-energy dependencies. Experimental results on six benchmark datasets (UBNormal, ShanghaiTech, CUHK Avenue, UMN, UCSD Ped2, and EPUAbN) demonstrate that TMA-Net achieves up to 97–100% AUC and accuracy, outperforming all previous ROHAC-based and SOTA methods. These results highlight its strong generalization ability, robustness, and practical potential for “hand-in-wild” intelligent surveillance and abnormal behavior detection systems.

ACKNOWLEDGMENTS
This research was supported by the implementation of the scientific research project at Electric Power University in 2025 for staff and employees, project code DTKHCN.09/2025 with title: “Research and design of a monitoring and management system for computer-based multiple-choice examination rooms”.

FUNDING INFORMATION
This research was funded by Electric Power University under the scientific research project for staff and employees in 2025, project code DTKHCN.09/2025.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author C M So Va Fo I R D O E Vi Su P Fu
Huong-Giang Doan ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Ngoc-Trung Nguyen ✓ ✓ ✓ ✓ ✓ ✓ ✓

C: Conceptualization
M: Methodology
So: Software
Va: Validation
Fo: Formal analysis
I: Investigation
R: Resources
D: Data Curation
O: Writing - Original Draft
E: Writing - Review & Editing
Vi: Visualization
Su: Supervision
P: Project administration
Fu: Funding acquisition

CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflict of interest.

INFORMED CONSENT
Informed consent is not applicable for this study as the datasets used are publicly available and contain no identifiable personal information.

ETHICAL APPROVAL
Ethical approval is not applicable for this study since no human or animal subjects were involved.

DATA AVAILABILITY
Public datasets used in this study (UMN, Crowd-11, UBNormal, ShanghaiTech, CUHK Avenue, and UCSD Ped2) are available from their original sources. The EPUAbN dataset generated during this study is available from the corresponding author upon reasonable request.

REFERENCES
[1] R. T. Ionescu, F. S. Khan, M.-I. Georgescu, and L. Shao, “Object-centric auto-encoders and dummy anomalies for abnormal event detection in video,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 7834–7843, doi: 10.1109/CVPR.2019.00803.
[2] W. Liu, W. Luo, D. Lian, and S. Gao, “Future frame prediction for anomaly detection - a new baseline,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp. 6536–6545, doi: 10.1109/CVPR.2018.00684.
[3] M. Hasan, J. Choi, J. Neumann, A. K. R.-Chowdhury, and L. S. Davis, “Learning temporal regularity in video sequences,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 733–742, doi: 10.1109/CVPR.2016.86.
[4] Y. S. Chong and Y. H. Tay, “Abnormal event detection in videos using spatiotemporal autoencoder,” Advances in Neural Networks - ISNN 2017, 2017, pp. 189–196, doi: 10.1007/978-3-319-59081-3_23.
[5] H. Idrees, I. Saleemi, C. Seibert, and M. Shah, “Multi-source multi-scale counting in extremely dense crowd images,” in 2013 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2013, pp. 2547–2554, doi: 10.1109/CVPR.2013.329.
[6] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2009, pp. 935–942, doi: 10.1109/CVPR.2009.5206641.
[7] C. Lu, J. Shi, and J. Jia, “Abnormal event detection at 150 FPS in MATLAB,” in IEEE International Conference on Computer Vision, Sydney, Australia, 2013, pp. 2720–2727, doi: 10.1109/ICCV.2013.338.
[8] Y. Chang, Z. Tu, W. Xie, and J. Yuan, “Clustering driven deep autoencoder for video anomaly detection,” in European Conference on Computer Vision (ECCV), 2020, pp. 329–345, doi: 10.1007/978-3-030-58555-6_20.
[9] J. Li, X. Liu, W. Zhang, M. Zhang, J. Song, and N. Sebe, “Spatio-temporal attention networks for action recognition and detection,” IEEE Transactions on Multimedia, vol. 22, no. 11, pp. 2990–3001, Nov. 2020, doi: 10.1109/TMM.2020.2965434.
[10] H. Chen, X. Mei, Z. Ma, X. Wu, and Y. Wei, “Spatial–temporal graph attention network for video anomaly detection,” Image and Vision Computing, vol. 131, Mar. 2023, doi: 10.1016/j.imavis.2023.104629.
[11] H. C. Liu, J. H. Chuah, A. S. M. Khairuddin, X. M. Zhao, and X. D. Wang, “Campus abnormal behavior recognition with temporal segment transformers,” IEEE Access, vol. 11, pp. 38471–38484, 2023, doi: 10.1109/ACCESS.2023.3266440.
[12] A. D. Ho, H. G. Doan, and T. T. Thuy, “Multi-modality abnormal crowd detection with self-attention and knowledge distillation,” Engineering, Technology and Applied Science Research, vol. 14, no. 5, pp. 16674–16679, 2024, doi: 10.48084/etasr.8194.
[13] A. D. Ho, H. G. Doan, and N. T. Nguyen, “Abnormal human behavior detection improvement with an efficient attention block,” Engineering, Technology and Applied Science Research, vol. 15, no. 4, pp. 25048–25054, 2025, doi: 10.48084/etasr.11463.
[14] A. Dosovitskiy et al., “An image is worth 16×16 words: transformers for image recognition at scale,” 2021, arXiv:2010.11929.
[15] C. Dupont, L. Tobias, and B. Luvison, “Crowd-11: a dataset for fine grained crowd behaviour analysis,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jul. 2017, pp. 2184–2191, doi: 10.1109/CVPRW.2017.271.
[16] J. Shao, C. C. Loy, K. Kang, and X. Wang, “Crowded scene understanding by deeply learned volumetric slices,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 3, pp. 613–623, Mar. 2017, doi: 10.1109/TCSVT.2016.2593647.
[17] J. Shao, C. C. Loy, and X. Wang, “Scene-independent group profiling in crowd,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2014, pp. 2227–2234, doi: 10.1109/CVPR.2014.285.
[18] T. Hassner, Y. Itcher, and O. K.-Gross, “Violent flows: real-time detection of violent crowd behavior,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2012, pp. 1–6, doi: 10.1109/CVPRW.2012.6239348.
[19] C. Zhang, H. Li, X. Wang, and X. Yang, “Cross-scene crowd counting via deep convolutional neural networks,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2015, pp. 833–841, doi: 10.1109/CVPR.2015.7298684.
[20] P. Allain, N. Courty, and T. Corpetti, “AGORASET: a dataset for crowd video analysis,” in 1st ICPR International Workshop on Pattern Recognition and Crowd Analysis, 2012, pp. 1–6.
[21] T. Ellis, “Performance metrics and methods for tracking in surveillance,” in 3rd IEEE Workshop on Performance Evaluation of Tracking and Surveillance, Oct. 2002, pp. 26–31.
[22] E. B. Nievas, O. D. Suarez, G. B. García, and R. Sukthankar, “Violence detection in video using computer vision techniques,” Computer Analysis of Images and Patterns, Berlin, Heidelberg: Springer, 2011, pp. 332–339, doi: 10.1007/978-3-642-23678-5_39.
[23] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, “Anomaly detection in crowded scenes,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2010, pp. 1975–1981, doi: 10.1109/CVPR.2010.5539872.
[24] A. Acsintoae et al., “UBnormal: new benchmark for supervised open-set video anomaly detection,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 20111–20121, doi: 10.1109/CVPR52688.2022.01951.
[25] W. Luo, W. Liu, and S. Gao, “A revisit of sparse coding based anomaly detection in stacked RNN framework,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct. 2017, pp. 341–349, doi: 10.1109/ICCV.2017.45.
[26] B. Y.-Meng, W. Yang, and W. S.-Shen, “Detection of abnormal human behavior in video images based on a hybrid approach,” International Journal of Advanced Computer Science and Applications, vol. 13, no. 11, 2022, doi: 10.14569/IJACSA.2022.0131138.
[27] H. Bagherinezhad and S. Y. Soltani, “Abnormal human behavior detection system in video surveillance systems,” SSRN Electronic Journal, 2022, doi: 10.2139/ssrn.4106323.
[28] M. I. Georgescu, R. Ionescu, F. S. Khan, M. Popescu, and M. Shah, “A background-agnostic framework with adversarial training for abnormal event detection in video,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 4505–4523, 2022, doi: 10.1109/TPAMI.2021.3074805.

BIOGRAPHIES OF AUTHORS
Huong-Giang Doan received the B.E. degree in Instrumentation and Industrial Informatics in 2003, the M.E. in Instrumentation and Automatic Control System in 2006, and the Ph.D. in Control Engineering and Automation in 2017, all from Hanoi University of Science and Technology, Hanoi, Vietnam. She can be contacted at email: giangdth@epu.edu.vn.

Ngoc-Trung Nguyen received the B.E. degree in Power System in 2003 and the M.E. in Electrical Engineering in 2006, both from Hanoi University of Science and Technology, Hanoi, Vietnam, and the Ph.D. in Electrical Engineering from University of Palermo, Palermo, Italy, in 2014. He can be contacted at email: trungnn@epu.edu.vn.