International Journal of Informatics and Communication Technology (IJ-ICT)
Vol. 14, No. 1, April 2025, pp. 276∼286
ISSN: 2252-8776, DOI: 10.11591/ijict.v14i1.pp276-286
Ensemble approach to rumor detection with BERT, GPT, and POS features
Barsha Pattanaik¹, Sourav Mandal¹, Rudra Mohan Tripathy¹, Arif Ahmed Sekh²
¹School of Computer Science and Engineering, XIM University, Bhubaneswar, India
²Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
Article Info
Article history:
Received Jun 26, 2024
Revised Oct 7, 2024
Accepted Nov 19, 2024
Keywords:
BERT
BiLSTM
Ensemble model
GPT
Part of speech
Rumor detection
ABSTRACT
As vast amounts of rumor content are transmitted on social media, it is very challenging to detect them. This study explores an ensemble approach to rumor detection in social media messages, leveraging the strengths of advanced natural language processing (NLP) models. Specifically, we implemented three distinct models: (i) generative pre-trained transformer (GPT) combined with a bidirectional long short-term memory (BiLSTM) network; (ii) a model integrating part-of-speech (POS) tagging with bidirectional encoder representations from transformers (BERT) and BiLSTM, and (iii) a model that merges POS tagging with GPT and BiLSTM. We included additional features from the dataset in all these models. Each model captures different linguistic, syntactical, and contextual features within the text, contributing uniquely to the classification task. To enhance the robustness and accuracy of our predictions, we employed an ensemble method using hard voting. This technique aggregates the predictions from each model, determining the final classification based on the majority vote. Our experimental results demonstrate that the ensemble approach significantly outperforms individual models, achieving superior accuracy in identifying rumors. To determine the performance of our model, we used the publicly available PHEME and Weibo datasets. We found our model gave 97.6% and 98.4% accuracy, respectively, on the datasets and has outperformed the state-of-the-art models.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Barsha Pattanaik
School of Computer Science and Engineering, XIM University
Bhubaneswar-752050, Odisha, India
Email: barsha@xustudent.edu.in
1. INTRODUCTION
In this digital age, social media platforms have become ubiquitous, serving as primary channels for information dissemination and communication. While these platforms offer numerous benefits, they also facilitate the rapid spread of misinformation and rumors, which can have significant societal impacts. The proliferation of false information on social media can lead to public panic, misinformation crises, and harm to individuals or groups. Therefore, developing effective techniques for detecting and mitigating the spread of rumors has become a critical area of research. Traditional rumor detection methods often rely on manual verification, which is time-consuming and impractical given the vast volume of data generated on social media. Automated rumor detection systems, leveraging advancements in natural language processing (NLP), machine learning (ML), and deep learning (DL), offer a promising solution to this challenge.
Recent developments in DL-based models, such as generative pre-trained transformer (GPT) and bidirectional encoder representations from transformers (BERT), have demonstrated remarkable capabilities in understanding and generating human-like text, making them suitable for tackling complex linguistic tasks.
Journal homepage: http://ijict.iaescore.com
Previous research has focused on the role of textual characteristics in rumor identification, but it has not specifically considered the effect of additional social media factors, such as the number of retweets, user follows, and user friends. These features offer essential context for comprehending the dissemination and influence of rumors. Previous studies have predominantly concentrated on language-centric models, ignoring the possibilities of social media information. This study addresses this gap by integrating text embeddings and supplementary social media data into an ensemble model, providing a more comprehensive approach to rumor identification. Furthermore, adding to the novelty, we have incorporated both BERT and GPT embeddings to study the semantic understanding and syntactic features along with contextual embeddings.
In this study, we propose an ensemble-based approach for rumor detection, integrating the strengths of multiple NLP models. For this purpose, we extracted four different types of features using NLP tools and libraries. All the features are explained below, along with an overview of our proposed methodology. Feature extraction and methodology include:
− We use pre-trained BERT embeddings to generate dense vector embeddings that capture the semantic meaning and context of the words in the messages.
− GPT embeddings, like BERT embeddings, capture the contextual meaning of a text but are unidirectional, considering context from left to right. They help the model understand language features and semantics.
− Part of speech (POS) tag features provide syntactic information crucial for understanding sentence structure and identifying patterns in rumors. Each word is tagged with its part of speech (e.g., noun or verb), and these tags are converted into one-hot encoded vectors.
− Additional features (AF) include ‘retweet count’, ‘user.followers count’, ‘user.friends count’, and ‘favorite count’, normalized using a standard scaler. These numerical metrics describe the social media post (favorites and retweets) and its author (follower and friend counts). They quantify the post’s engagement and the user’s social influence, which can be indicative of the spread and credibility of the message.
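The one-hot encoding of POS tags mentioned in the list above can be sketched as follows. This is a minimal illustration rather than the authors' code: the six-tag inventory `TAGSET` and the pre-tagged example tokens are assumptions chosen for brevity, and a real tagger would supply the tags from a much larger tag set.

```python
from typing import Dict, List

# Hypothetical, deliberately tiny tag inventory (Penn Treebank-style names).
TAGSET: List[str] = ["NN", "NNS", "VBP", "VBG", "JJ", "RB"]
TAG_INDEX: Dict[str, int] = {tag: i for i, tag in enumerate(TAGSET)}


def one_hot(tag: str) -> List[int]:
    """Return a one-hot vector for `tag`; unknown tags map to all zeros."""
    vec = [0] * len(TAGSET)
    idx = TAG_INDEX.get(tag)
    if idx is not None:
        vec[idx] = 1
    return vec


def encode_tags(tags: List[str]) -> List[List[int]]:
    """Encode a sequence of POS tags as a list of one-hot vectors."""
    return [one_hot(t) for t in tags]


# Assumed (token, tag) pairs; in practice these come from a POS tagger.
tagged = [("police", "NNS"), ("confirming", "VBG"), ("shooting", "NN")]
pos_matrix = encode_tags([tag for _, tag in tagged])
```

Each row of `pos_matrix` has exactly one non-zero entry, so the syntactic category is available to the downstream network as a sparse numerical vector.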
By using these features, we developed three distinct models as discussed in section 3.2. To enhance the overall performance, we employed an ensemble method of these three models using hard voting, wherein the majority vote from the individual models determines the final classification. This ensemble approach aims to leverage the diverse strengths of each model, resulting in improved accuracy and reliability in detecting rumors. We have used two publicly available rumor datasets, PHEME [1] and Weibo [2], for our experiments in this study. We have translated the Weibo dataset into English using Google Translator, as discussed in section 3.1. Therefore, we refer to this dataset as ‘WeiboE’. The following are our significant contributions in this study:
i) We have used multiple baseline architectures and pre-trained models, such as BERT and GPT, to create alternative neural networks for rumor detection.
ii) We have presented an ensemble classifier for rumor detection (the first of its kind), including thorough research and performance analysis and improving the baseline.
iii) Our proposed model outperformed all the previous rumor detection systems on both the standard datasets, PHEME and Weibo.
Next, we describe related work in section 2, the proposed methodology in section 3, result analysis in section 4, and conclude in section 5.
2. RELATED WORK
In literature, numerous techniques have been developed to detect fake news [3]–[5]. Recently, the research community has also focused on identifying rumors. Although fake news and rumors are distinct, the methods employed by researchers are quite similar, involving text or document classification using NLP and advanced ML or DL techniques. With the increasing prevalence of rumor content on social media, researchers have devised various DL-based models to tackle this issue. Detailed information on these methods and their performance can be found in several survey papers [6]–[11]. In this section, we summarize some of the notable research that utilized ML or DL-based approaches and demonstrated strong performance on their respective datasets.
2.1. Traditional methods for rumor detection
Early approaches to rumor detection primarily relied on traditional ML techniques combined with handcrafted features. These methods typically involved two steps. First, features such as lexical cues, metadata (e.g., user information, message propagation patterns), and temporal patterns were manually extracted. Second, ML algorithms like support vector machines (SVM), decision trees, and Naive Bayes were used to classify messages as rumors or non-rumors. For instance, Castillo et al. [12] discussed the relevance and significance of information quality, particularly the credibility of information, in the context of Twitter, one of the fastest-growing social media platforms, where both true information and false rumors are posted. They developed a method that applies SVM with factors like the number of retweets, URLs, and credibility scores of the user publishing on the Twitter platform to filter out fake news. This approach offered an improvement, but it had its downsides: the features had to be extracted tediously by hand and might not always apply to other datasets or scenarios. Their study reported an accuracy of about 86%.
2.2. Use of deep learning models
With the arrival of DL, which can extract features from raw text without manual human intervention, many researchers developed a new generation of models. Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) networks [13], emerged as an effective way of capturing the sequential structure of textual data. Ma et al. [2] proposed a novel model adopting LSTM networks, which incorporated temporal information of tweet sequences to enhance the identification of rumor tweets. The accuracy is 88.1% on the Twitter dataset and 91% on the Weibo dataset. Using both datasets, Ruchansky et al. [14] addressed fake news detection with the CSI (capture, score, integrate) model, based on RNNs and on user and comment features. RNN and convolutional neural network (CNN) [15] components are used to capture both temporal and content features, giving high accuracy, while the incorporation of user behavior significantly enhances the robustness of the model. The CSI model achieved approximately 89% accuracy on the Twitter dataset and 95.3% on the Weibo dataset.
2.3. Use of transformer-based models
Later, deeper models like BERT, GPT, and many others transformed NLP by providing better techniques for encoding context information. These models store deep semantic and syntactic features and are generally effective for text classification problems. Devlin et al. [16] presented BERT, a new model that is pre-trained on a large text corpus and achieved state-of-the-art accuracy on numerous NLP tasks by employing bidirectional context. A considerable advantage of BERT is that it recognizes the context of a word in both directions, which is especially beneficial for recognizing the language used in rumors. Anggrainingsih et al. [17] developed a BERT-based approach for rumor detection by using sentence embedding to capture the contextual meanings of the message. GPT-2 was designed by Radford et al. [18]; its excellent capacity for language generation, established through ‘guessing’ the next word in a given text, results from capturing contextual dependencies. It is one of the landmark works in this group, demonstrating that massive language models can perform a number of tasks without problem-specific training. Liu et al. [19] used different large language models such as GPT and BERT to check whether these models can detect rumors in social media by using both news and comments and their propagation information.
2.4. Use of ensemble methods
Ensemble methods combine several models so that the combination improves on each basic model. They improve overall performance by exploiting the strengths of each individual model. Hard voting [20] takes a majority vote over the categories predicted by the models, while soft voting uses the probabilities that each model has assigned to the categories. Both techniques have improved classification tasks by overcoming the weaknesses of separate models. Kotteti et al. [21] employed an ensemble of DL-based models using CNNs, RNNs, and LSTMs for processing time-series data to improve the accuracy and robustness of rumor detection. They used features from the time-series data, including tweet content, user metadata, and network propagation patterns. The extracted features were then fused to create a comprehensive representation used for classification. The model achieved a micro F1-score of 64.3% on the PHEME datasets, reported as 79% higher than the baselines. Recently, Yuan et al. [22] developed an ensemble model using two different kinds of features, image and text. For text data, the authors used BERT, BiLSTM, and gated recurrent unit (GRU) [23], and for image data, they used CNN. They then ensembled the different models and used soft voting for the final classification. Nithya et al. [24] developed a hybrid model that leverages the strengths of multiple advanced techniques in NLP and DL to effectively classify rumor texts, combining deep contextual understanding (using BERT), hierarchical feature extraction, feature importance analysis, and sequence modeling (using Bi-LSTM) into a cohesive framework.
Our work focused on developing three different models to capture different aspects of textual data, leveraging both semantic understanding and syntactic features along with contextual embeddings. Finally, an ensemble model is proposed to integrate the predictions from these models, enhancing the robustness and accuracy of the classification. We are the first to propose a model by considering different features and integrating them into an ensemble model.
3. PROPOSED METHOD
The objective of this research is to detect whether a message or post is a rumor or not. We treat this as a binary classification problem. As a baseline model, we have used BERT embedding with the BiLSTM network [25], [26], developed for sentiment analysis and entity recognition for clinical tests, followed by a dense layer and softmax for classification. The BiLSTM processes the sequence of embeddings in both forward and backward directions, capturing dependencies from the past and future contexts.
3.1. Data collection and pre-processing
For our experiment, we use two publicly available datasets, PHEME [1] and Weibo [2], focusing on messages labeled as either rumors or non-rumors. The dataset was curated to ensure a balanced representation of both classes, allowing for effective training and evaluation of our models. The PHEME dataset (5,802 samples) provides a detailed breakdown of rumor (1,972 instances) and non-rumor (3,830 instances) tweets across five events (OT: Ottawa shooting, GC: Germanwings crash, FE: Ferguson, CH: Charlie Hebdo, SY: Sydney siege). Similarly, the Weibo dataset contains 2,313 rumors and 2,351 non-rumors. The Weibo dataset contains many features, but we only used four of them for our model: ‘retweet count’, ‘user.followers count’, ‘user.friends count’, and ‘favorite count’. We translated the Weibo dataset, originally available in Chinese, into English using the “Google Translator” of the ‘deep translator’ package. We name the translated dataset WeiboE, and a link to it is available on GitHub [27] for future research. Unlike PHEME, Weibo is not segregated into events. The details of the datasets are shown as a stacked bar graph in Figure 1, and Table 1 shows sample messages from the datasets together with their AF and labels.
Data pre-processing involved several steps to clean and prepare the text data for model training. First, all text is broken down into individual tokens (words or subwords). Then, common stop-words that do not contribute significantly to the meaning of a sentence are removed. Next, lemmatization reduces each word to its base form. Annotation with POS tags captures the syntactic information. After tagging, we use padding and truncation to ensure uniform input lengths for batch processing. For example, in the case of the Ottawa shooting event, a sample of data is “Ottawa police are confirming a shooting at the War Memorial. Minutes ago. No other info. #cbcOTT #OTTnews”. First, tokenization is done. Subsequently, lowercasing transforms all tokens to lowercase to ensure uniformity. Stopwords like “are” and “at” are eliminated to emphasize more significant keywords. Punctuation and special characters, including hashtags, are removed to sanitize the data. The cleaned tokens obtained are [‘ottawa’, ‘police’, ‘confirming’, ‘shooting’, ‘war’, ‘memorial’, ‘minutes’, ‘ago’, ‘info’, ‘cbcott’, ‘ottnews’]. Moreover, POS tagging can be utilized to provide a grammatical classification for each token, offering enhanced linguistic context.
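The cleaning steps walked through in the Ottawa example (tokenization, lowercasing, stop-word removal, and stripping of punctuation and hashtags) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the whitespace tokenizer and the tiny `STOPWORDS` set are hypothetical stand-ins for a full NLP toolkit, and lemmatization and POS tagging are left out.

```python
import re
from typing import List

# Hypothetical stop-word list, just large enough for the example sentence.
STOPWORDS = {"are", "a", "at", "the", "no", "other"}


def preprocess(text: str) -> List[str]:
    """Tokenize, lowercase, drop stop-words, then strip punctuation and '#'."""
    tokens = text.split()                                    # whitespace tokenization
    tokens = [t.lower() for t in tokens]                     # lowercasing
    tokens = [t for t in tokens if t not in STOPWORDS]       # stop-word removal
    tokens = [re.sub(r"[^a-z0-9]", "", t) for t in tokens]   # remove punctuation and '#'
    return [t for t in tokens if t]                          # drop tokens that became empty


sample = ("Ottawa police are confirming a shooting at the War Memorial. "
          "Minutes ago. No other info. #cbcOTT #OTTnews")
cleaned = preprocess(sample)
```

With these assumptions, `cleaned` contains the same eleven tokens as the cleaned list given in the text, all in lowercase.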
3.2. Experimental models
3.2.1. Baseline model - BERT and BiLSTM
We use a simple BERT with BiLSTM network (BERT+BiLSTM) developed by [25], [26] as the base model for our rumor detection task. We used the PHEME dataset of five events containing social media posts as input messages. The model used a pre-trained transformer, BERT, to extract contextual information for word embedding. Next, the embedding vectors from BERT are sequentially fed to the BiLSTM network for learning bi-directional long-term dependencies of the words (vectors) across the input sentences. The concatenated and flattened vector for each sequence is then fed to the dense layer, followed by the softmax layer for classification. We introduce three new variants of this baseline model, each using different feature combinations but sharing a common BiLSTM architecture, which are explained in the sections below.
Figure 1. Rumors and non-rumors data distribution across PHEME and WeiboE datasets
Table 1. Sample data of PHEME and WeiboE datasets showing features with labels (0-non rumor, 1-rumor)
Sample texts | favorite count | retweet count | user.followers count | user.friends count | Label
Being black in this country is dangerous business. #Ferguson (PHEME) | 117 | 198 | 25565 | 1593 | 0
Rest in Peace, Cpl. Nathan Cirillo. Killed today in #OttawaShooting http://t.co/YzLXYX5JJt http://t.co/8F0qAcj9sg (PHEME) | 96 | 112 | 14793 | 1052 | 1
At 8:26 am on February 26, Li Tianyi was released on bail and is now returning home. (WeiboE) | 33 | 1272 | 977 | 499 | 1
We are all grandsons, why are our realms so different? (WeiboE) | 4 | 1073 | 32914 | 2180 | 0
3.2.2. Model-1 (GPT, AF, and BiLSTM)
In this model, we use two features: GPT embeddings and additional features (AF), as shown in Figure 2. We utilized the pre-trained GPT instead of BERT to generate contextualized embeddings for each word in the input message. GPT’s capacity to understand and generate coherent text was leveraged to capture the nuanced context within the data. The embedding vectors from GPT, along with the other features, are then fed into a BiLSTM network (GPT+AF+BiLSTM), and the rest is the same as in the baseline model.
3.2.3. Model-2 (POS, BERT, AF, and BiLSTM)
Model-2 in Figure 3 uses three features: AF from the dataset, POS features, and BERT embeddings (POS+BERT+AF+BiLSTM). POS tagging is a critical linguistic processing step that involves annotating each word in a sentence with its corresponding part of speech, such as noun, verb, or adjective. In the context of rumor detection, POS tag-based features serve several important purposes. Each token is annotated with its POS tag, providing additional syntactic information. These features can be particularly useful for capturing syntactic and grammatical nuances that purely word-based embedding techniques might miss. This enriched feature set can improve the overall performance of the models in detecting subtle cues indicative of rumors. In this model, the tokens and their POS tags are input into the BERT model to obtain rich, contextualized embedding vectors. Similar to the BERT+BiLSTM model, these embedding vectors are then processed through a BiLSTM network to capture sequential dependencies.
Figure 2. Model-1 with GPT with additional features and BiLSTM
Figure 3. Model-2 and model-3; BERT and GPT interchangeably used to make model-2 or model-3
3.2.4. Model-3 (POS, GPT, AF, and BiLSTM)
Like the previous model, the model uses GPT embeddings instead of BERT (Figure 3) along with the other two features (POS+GPT+AF+BiLSTM). POS tagging boosts rumor detection by enhancing syntactic, contextual, and semantic understanding and enriching features for deep learning. The POS-tagged tokens are processed through the GPT model to generate embeddings. The embeddings are input into a BiLSTM network to capture contextual dependencies.
3.2.5. Detailed procedure and descriptions
As described in the previous sections, we use four types of features as input to the BiLSTM. First, the tokenized and cleaned text is processed with NLTK’s ‘pos_tag’ function to obtain POS tags, assigning grammatical categories like nouns, verbs, and adjectives to each word. These POS tags are then encoded as numerical vectors using one-hot encoding. When combined with models such as BERT and GPT, POS tags allow a more linguistically grounded characterization of rumors that captures the linguistic patterns of the phenomenon more accurately. When rumors make statements about a certain topic, they tend to use certain adjectives or adverbs that inflate the fact at hand. The algorithms can learn these patterns more efficiently and enhance the accuracy of rumor detection when these POS tags are incorporated. Independently of this step, we use either the BERT or the GPT tokenizer from the Hugging Face transformers [28] library to tokenize each text. Contextual embeddings are then obtained by passing the tokenized text through BERT or GPT. The additional features (AF) are floored and ceilinged before being scaled using the ‘StandardScaler’ in ‘scikit-learn’. The BERT or GPT embedding vectors, one-hot encoded POS features, and scaled additional features are combined using ‘np.hstack’, according to each model’s requirements, into a single comprehensive feature vector for each input text. The BiLSTM [29] model takes these combined feature vectors as the input, processes them, and feeds them into the fully connected layer. The fully connected layer, followed by the softmax layer, is responsible for mapping the combined BiLSTM output to the class scores.
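The scaling and concatenation steps can be illustrated with a small standard-library sketch. It is a hedged approximation, not the authors' implementation: `scale_rows` re-implements the z-score that a standard scaler computes (column mean and population standard deviation), `combine` mimics a horizontal stack of 1-D vectors, and the four-dimensional `embedding` and three-dimensional `pos_features` are hypothetical placeholders for real BERT/GPT and POS vectors. The AF rows are the four sample rows of Table 1.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def scale_rows(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score every column: (x - column mean) / column population std."""
    scaled_cols = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        # A constant column carries no information; map it to zeros.
        scaled_cols.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def combine(embedding: Sequence[float],
            pos_features: Sequence[float],
            af: Sequence[float]) -> List[float]:
    """Concatenate the three feature groups into one flat feature vector."""
    return list(embedding) + list(pos_features) + list(af)


# AF columns: favorite count, retweet count, followers count, friends count.
af_rows = [
    [117, 198, 25565, 1593],
    [96, 112, 14793, 1052],
    [33, 1272, 977, 499],
    [4, 1073, 32914, 2180],
]
scaled_af = scale_rows(af_rows)

embedding = [0.12, -0.40, 0.33, 0.05]   # placeholder for a real embedding
pos_features = [0, 1, 0]                # placeholder one-hot POS vector
feature_vector = combine(embedding, pos_features, scaled_af[0])
```

After scaling, every AF column has zero mean and unit variance, so raw counts with very different ranges (for example followers versus favorites) contribute on a comparable scale once concatenated with the embeddings.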
3.3. Model-4: proposed ensemble method
Figure 4 depicts our proposed ensemble model for rumor detection. We employed a hard voting mechanism [20] to combine the predictions from the three models discussed in the previous subsection. Each model independently classifies a message as a rumor or non-rumor, and the final classification is determined by the majority vote among the three models. This approach leverages the strengths and compensates for the weaknesses of individual models, aiming to enhance overall accuracy. Algorithm 1 describes our ensemble model with hard voting. The detailed procedure of each model is described in section 3.2.5. By combining these three models in an ensemble, we want to capitalize on:
− The capability of GPT to effectively capture long-range dependencies and contextual continuity.
− BERT’s bidirectional contextual comprehension augmented by POS tagging.
− Additional (supplementary) features (AF) to record behavioral indicators, including tweet and retweet frequencies.
− BiLSTM’s sequential modeling, which facilitates the capturing of both forward and backward text dependencies.
Figure 4. Proposed ensemble model for rumor detection
Algorithm 1. Ensemble by vote
Require: T ▷ Text
Ensure: P_final ▷ Final prediction
procedure ENSEMBLE(T)
    p_1 = Predict_model-1(T)
    p_2 = Predict_model-2(T)
    p_3 = Predict_model-3(T)
    P_final = mode(p_i) ▷ i = 1..3
    return P_final
end procedure
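Algorithm 1 translates almost directly into Python. The sketch below is illustrative only: the three stub classifiers are hypothetical stand-ins for the trained model-1, model-2, and model-3, and labels follow the paper's convention (0 for non-rumor, 1 for rumor).

```python
from collections import Counter
from typing import Callable, Sequence


def hard_vote(predictions: Sequence[int]) -> int:
    """Return the most frequent label (the mode) among the predictions."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble(text: str, models: Sequence[Callable[[str], int]]) -> int:
    """Collect one label per model for `text` and take the majority vote."""
    return hard_vote([model(text) for model in models])


# Hypothetical stub classifiers standing in for the three trained models.
def model_1(text: str) -> int:
    return 1


def model_2(text: str) -> int:
    return 0


def model_3(text: str) -> int:
    return 1


final_label = ensemble("some social media post", [model_1, model_2, model_3])
```

Because there are three voters and two classes, a strict majority always exists, so no tie-breaking rule is needed in this setting.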
3.4. Evaluation metrics
In our studies, we employed two critical assessment criteria to evaluate the model’s performance:
− Accuracy: the ratio of correctly predicted labels to the total number of labels, serving as a comprehensive indicator of the model’s categorization efficacy.
− F1-score (weighted): the weighted F1-score considers both precision and recall, rendering it more appropriate for datasets with imbalanced classes. The weighted average F1-score is especially effective for assessing performance across all classes for our classification challenge.
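Both metrics can be computed from scratch as a check on their definitions. The following is a minimal sketch assuming the usual convention that the weighted F1-score averages the per-class F1 values with weights proportional to each class's support; the toy labels are invented for illustration and are not taken from the paper's data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * (tp + fn) / total   # (tp + fn) is the class support
    return score


# Toy, imbalanced example: three rumors (1) and five non-rumors (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc = accuracy(y_true, y_pred)
f1w = weighted_f1(y_true, y_pred)
```

Weighting by support means the majority class influences the score more, which is why the metric is reported alongside accuracy for datasets such as PHEME, where non-rumors outnumber rumors.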
3.5. Experimental setup, training and testing
For our experiment, we divided the dataset into training and testing sets using an 80-20 split: 80% of the data is allocated for training the model, and the remaining 20% is reserved for testing its performance. The model was trained with 128 BiLSTM layers with a learning rate of 0.001. We used the ‘Adam’ optimizer and the ‘CrossEntropyLoss’ loss function. The model was trained for 300 epochs. During each epoch, the loss, training accuracy, and F1-score were tracked, and the model was evaluated on the test set to monitor its performance. We used a standard scaler to normalize the numerical data, referred to as AF. We recorded loss values, accuracy, and F1-scores for every epoch and plotted the loss values, training and testing accuracy, and F1-scores.
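As a rough illustration of this setup, the sketch below collects the stated hyperparameters in a dictionary and performs a shuffled 80-20 split with the standard library. The seed, the function name `split_80_20`, and the dictionary keys are assumptions for illustration; the paper does not state how the split was randomized.

```python
import random
from typing import List, Sequence, Tuple

# Hyperparameters as stated in the text; the key names are hypothetical.
CONFIG = {
    "bilstm_size": 128,            # the text says "128 BiLSTM layers"
    "learning_rate": 0.001,
    "optimizer": "Adam",
    "loss": "CrossEntropyLoss",
    "epochs": 300,
    "train_fraction": 0.8,
}


def split_80_20(samples: Sequence, seed: int = 42) -> Tuple[List, List]:
    """Shuffle a copy of `samples` and cut it into 80% train and 20% test."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # seeded for reproducibility (assumed)
    cut = int(CONFIG["train_fraction"] * len(items))
    return items[:cut], items[cut:]


# Using sample indices for a dataset the size of PHEME (5,802 messages).
train_ids, test_ids = split_80_20(range(5802))
```

For a dataset of 5,802 messages this yields 4,641 training and 1,161 test samples; a stratified split that preserves the rumor to non-rumor ratio would be a natural refinement but is not described in the text.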
4. RESULTS AND ANALYSIS
This study explored the efficacy of integrating pre-trained language models (BERT and GPT) with POS tagging and other additional features in identifying rumors on social media. Previous studies have examined individual models such as BERT for text classification problems. However, they have not specifically considered GPT embedding and the benefits of incorporating supplementary features (e.g., ‘retweet count’, ‘user.followers count’, ‘user.friends count’, and ‘favorite count’) with ensemble learning to enhance the performance of the rumor detection model.
In this section, we discuss the performance of the different models along with the proposed ensemble model (model-4) in terms of accuracy and F1-score on the PHEME and translated Weibo datasets. The results indicate that the ensemble model consistently outperformed the individual models across all metrics. The hard voting mechanism effectively combined the strengths of the different models, leading to a more accurate and reliable rumor detection system. However, for Weibo, a comparison with other systems is not fully justified, as we have used a dataset translated into English.
Table 2 shows a comparison between all the models along with our ensemble model, which shows that the ensemble model gave an average accuracy of 97.6% on the PHEME datasets and 98.4% on the WeiboE dataset. The bold values show the best performance for an individual model on these datasets. Among the three individual models, the POS+AF+GPT+BiLSTM model (model-3) gives the best results for four events of the PHEME datasets, whereas for one event (Ferguson), the GPT+AF+BiLSTM model (model-1) gives the best results. Model-3 also gives the best result on the WeiboE dataset. The integration of additional characteristics and POS tags enhanced generalization in the task. The proposed ensemble model (model-4), combining GPT+AF+BiLSTM (model-1), POS+BERT+AF+BiLSTM (model-2), and POS+GPT+AF+BiLSTM (model-3), excels in rumor detection by leveraging semantic and syntactic features, achieving superior accuracy and robustness compared to individual models.
Table 2. Comparison between baseline models with other variants. OT: Ottawa shooting, GC: Germanwings crash, FE: Ferguson, CH: Charlie Hebdo, SY: Sydney siege (* denotes a bold value)
Models | Metric | PHEME OT | PHEME GC | PHEME FE | PHEME CH | PHEME SY | WeiboE
Baseline | Acc | 84.8% | 86.2% | 86.2% | 88.5% | 85.7% | 92.8%
Baseline | F1 | 84.8% | 86.2% | 86.7% | 88.4% | 85.7% | 92.8%
Model-1 | Acc | 89.3% | 85.1% | 86.9%* | 92.5%* | 84.9% | 93.9%
Model-1 | F1 | 89.3% | 85.1% | 86.9%* | 92.4%* | 84.9% | 93.8%
Model-2 | Acc | 83.7% | 81.9% | 82.9% | 89.2% | 84.9% | 93.2%
Model-2 | F1 | 83.2% | 81.9% | 83.3% | 89.2% | 84.9% | 93.2%
Model-3 | Acc | 89.3%* | 87.2%* | 86.4% | 92.6%* | 86.2%* | 94.3%*
Model-3 | F1 | 89.3%* | 87.2%* | 86.6% | 92.4%* | 86.1%* | 93.9%
Model-4 (proposed) | Acc | 97.5%* | 97.8%* | 97.3%* | 98.4%* | 97.1%* | 98.4%*
Model-4 (proposed) | F1 | 97.4%* | 97.7%* | 97.2%* | 98.3%* | 97.1%* | 98.3%*
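The average PHEME figures quoted for the ensemble can be reproduced from the per-event values in Table 2. The short check below assumes that the reported averages are unweighted means over the five events, which the text does not state explicitly; the per-event numbers are copied from the Model-4 rows.

```python
# Model-4 (proposed) results per PHEME event, copied from Table 2.
pheme_acc = {"OT": 97.5, "GC": 97.8, "FE": 97.3, "CH": 98.4, "SY": 97.1}
pheme_f1 = {"OT": 97.4, "GC": 97.7, "FE": 97.2, "CH": 98.3, "SY": 97.1}


def unweighted_mean(values: dict) -> float:
    """Plain arithmetic mean over the events, rounded to one decimal place."""
    return round(sum(values.values()) / len(values), 1)


mean_acc = unweighted_mean(pheme_acc)   # 488.1 / 5 = 97.62
mean_f1 = unweighted_mean(pheme_f1)     # 487.7 / 5 = 97.54
```

Under this assumption the means round to 97.6% accuracy and 97.5% F1-score, matching the overall PHEME figures reported for model-4.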
Tables 3 and 4 show the comparison with similar models in terms of accuracy and F1-score on the PHEME and WeiboE datasets. Our ensemble model-4 outperformed the strongest comparison systems by 2.8 percentage points in accuracy on PHEME (97.6% versus 94.8% for gDART) and by 2.4 percentage points on WeiboE (98.4% versus 96.0% for Bi-GCN). Integrating diverse neural architectures mitigates individual weaknesses and enhances generalization across varied datasets. However, this approach entails significant computational overhead and complexity, raising challenges in real-time applications and model maintenance. Overfitting risks and dependency on high-quality training data are notable concerns, along with difficulties in interpretability and debugging. Despite these challenges, the model's high performance on benchmarks highlights its potential, although further optimization and validation are needed for practical deployment. Our findings corroborate prior research indicating the efficacy of GPT and BERT models in processing textual data; in addition, our ensemble model revealed that integrating additional metadata and applying hard voting can significantly improve classification performance.
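The hard-voting step that produces the model-4 prediction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the label encoding (1 = rumor, 0 = non-rumor), the function names, and the example predictions are all assumptions.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the majority label among the base models' predictions for one message."""
    # With three voters and two classes, a strict majority always exists.
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(model_outputs):
    """Apply hard voting message by message.

    model_outputs holds one list of labels per base model, all of equal length.
    """
    return [hard_vote(votes) for votes in zip(*model_outputs)]


# Hypothetical labels from the three base models for five messages.
model_1 = [1, 0, 1, 0, 1]
model_2 = [1, 1, 0, 0, 1]
model_3 = [0, 0, 1, 0, 1]
final = ensemble_predict([model_1, model_2, model_3])
```

A message is labeled as a rumor only when at least two of the three base models agree, so an isolated error by one model is outvoted by the other two.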
Table 3. Comparison between different models on PHEME datasets

| Model | Accuracy | F1-score |
|---|---|---|
| gDART [30] | 94.8% | 89.7% |
| RDLNP [31] | 88.6% | 88.6% |
| CNN-IG-ACO NB [32] | 73.28% | 73.2% |
| BiLSTM-CNN [33] | 86.1% | 86.1% |
| BERT+BiLSTM [25], [26] | 86.2% | 86.1% |
| Model-4 (proposed) | 97.6% | 97.5% |
Table 4. Comparison between different models on Weibo datasets

| Model | Accuracy | F1-score |
|---|---|---|
| VAE-GCN [34] | 94.1% | 94.0% |
| PostCom2DR [35] | 95.0% | 95.0% |
| Bi-GCN [36] | 96.0% | 96.0% |
| DDGCN [37] | 94.8% | 95.2% |
| Model-4 (proposed) | 98.4% | 98.3% |

Note: We have used WeiboE (translated into English).
In this work, we assessed the efficacy of three distinct models (model-1, model-2, and model-3), each employing varied feature sets and topologies. Model-1 combined GPT embeddings, AF, and BiLSTM; model-2 employed POS tagging, BERT embeddings, AF, and BiLSTM; model-3 incorporated POS tagging, GPT embeddings, AF, and BiLSTM. Their performance fluctuated, with accuracy between 84.8% and 92.6% and F1-scores from 84.8% to 92.4% across the different events of the PHEME dataset, and accuracy from 92.8% to 94.3% with F1-scores from 92.8% to 93.9% for the Weibo dataset. The proposed ensemble model, which consolidates predictions from these models through hard voting, attained an overall accuracy of 97.6% and an F1-score of 97.5% for the PHEME dataset and an accuracy of 98.4% and an F1-score of 98.3% for the WeiboE dataset. This substantial improvement comes from utilizing the strengths of each model and improves overall classification robustness.
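As a quick consistency check, the overall PHEME figures quoted above agree with the unweighted mean of the five per-event model-4 values in Table 2. The sketch below only reproduces that arithmetic; whether the overall scores were actually obtained by averaging across events is an assumption, and the variable names are illustrative.

```python
# Model-4 results on the five PHEME events, copied from Table 2 (in percent).
model4_acc = [97.5, 97.8, 97.3, 98.4, 97.1]
model4_f1 = [97.4, 97.7, 97.2, 98.3, 97.1]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Round to one decimal place, matching the precision used in the text.
overall_acc = round(mean(model4_acc), 1)
overall_f1 = round(mean(model4_f1), 1)
```

The per-event means are 97.62% and 97.54%, which round to the reported 97.6% accuracy and 97.5% F1-score.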
The proposed ensemble model integrating GPT, BERT, POS tagging, and supplementary features exhibited enhanced performance compared to the individual models. The results strongly indicate that integrating diverse information types enhances rumor identification, with potential applications in real-time monitoring systems. Although the ensemble model showed strong performance, this work concentrated mostly on a dataset of tweets, which may limit the applicability of the findings to other types of textual data. The computational cost associated with training large models such as GPT and BERT may also be a constraint for real-time applications.
5. CONCLUSION
Our work advances rumor detection research by integrating POS tagging, additional features, and BERT- or GPT-based embeddings with BiLSTM networks using the standard PHEME and Weibo datasets. We developed three predictive models by combining these features and ultimately proposed an ensemble method. This approach aims to leverage the strengths of the individual models in an ensemble, resulting in a robust and accurate rumor detection system. Our method addresses the limitations of traditional techniques and standalone deep learning models, offering a comprehensive solution. The experimental studies show that our ensemble method ranks better than the other methods in terms of rumor detection accuracy. By using multiple models that encode different aspects of language and context, the possibility that the deficiencies of any single method are reflected in the final output is significantly reduced. Future work will examine other ensemble algorithms, e.g., soft voting or stacking, and will include new features such as temporal data or user metadata in order to improve detection rates.
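To illustrate how the soft-voting alternative mentioned as future work differs from hard voting, the minimal Python sketch below averages per-model rumor probabilities instead of counting labels. It assumes each base model exposes a probability for the rumor class, which is not something reported in this paper; the function name, the 0.5 threshold, and the example probabilities are hypothetical.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the per-model rumor probabilities and threshold the mean."""
    avg = sum(probabilities) / len(probabilities)
    return (1 if avg >= threshold else 0), avg


# Hypothetical rumor probabilities from three base models for one message.
probs = [0.95, 0.40, 0.45]
soft_label, avg = soft_vote(probs)

# Hard voting on the same outputs: threshold each model first, then count votes.
hard_votes = [1 if p >= 0.5 else 0 for p in probs]
hard_label = 1 if sum(hard_votes) >= 2 else 0
```

In this example one very confident model pulls the average to 0.6, so soft voting labels the message a rumor, while hard voting sides with the two weakly negative models. Which behavior is preferable for rumor detection would have to be established experimentally.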
REFERENCES
[1] A. Zubiaga, M. Liakata, R. Procter, G. W. S. Hoi, and P. Tolmie, “Analysing how people orient to and spread rumours in social media by looking at conversational threads,” PLoS ONE, vol. 11, no. 3, p. e0150989, Nov. 2016, doi: 10.1371/journal.pone.0150989.
[2] J. Ma et al., “Detecting rumors from microblogs with recurrent neural networks,” in IJCAI International Joint Conference on Artificial Intelligence, 2016, pp. 3818–3824.
[3] M. Celliers and M. Hattingh, “A systematic review on fake news themes reported in literature,” in Responsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, 2020, vol. 12067 LNCS, pp. 223–234, doi: 10.1007/978-3-030-45002-1_19.
[4] X. Zhou and R. Zafarani, “A survey of fake news: fundamental theories, detection methods, and opportunities,” ACM Computing Surveys, vol. 53, no. 5, pp. 1–40, Sep. 2021, doi: 10.1145/3395046.
[5] D. de Beer and M. Matthee, “Approaches to identify fake news: a systematic literature review,” Integrated science in digital age 2020, vol. 136, pp. 13–22, 2021, doi: 10.1007/978-3-030-49264-9_2.
[6] A. Zubiaga, A. Aker, K. Bontcheva, M. Liakata, and R. Procter, “Detection and resolution of rumours in social media,” ACM Computing Surveys, vol. 51, no. 2, pp. 1–36, Mar. 2018, doi: 10.1145/3161603.
[7] M. Al-Sarem, W. Boulila, M. Al-Harby, J. Qadir, and A. Alsaeedi, “Deep learning-based rumor detection on microblogging platforms: a systematic review,” IEEE Access, vol. 7, pp. 152788–152812, 2019, doi: 10.1109/ACCESS.2019.2947855.
[8] A. Bondielli and F. Marcelloni, “A survey on fake news and rumour detection techniques,” Information Sciences, vol. 497, pp. 38–55, Sep. 2019, doi: 10.1016/j.ins.2019.05.035.
[9] D. Varshney and D. K. Vishwakarma, “A review on rumour prediction and veracity assessment in online social network,” Expert Systems with Applications, vol. 168, p. 114208, Apr. 2021, doi: 10.1016/j.eswa.2020.114208.
[10] L. Tan, G. Wang, F. Jia, and X. Lian, “Research status of deep learning methods for rumor detection,” Multimedia Tools and Applications, vol. 82, no. 2, pp. 2941–2982, Jan. 2023, doi: 10.1007/s11042-022-12800-8.
[11] B. Pattanaik, S. Mandal, and R. M. Tripathy, “A survey on rumor detection and prevention in social media using deep learning,” Knowledge and Information Systems, vol. 65, no. 10, pp. 3839–3880, Oct. 2023, doi: 10.1007/s10115-023-01902-w.
[12] C. Castillo, M. Mendoza, and B. Poblete, “Information credibility on twitter,” in Proceedings of the 20th international conference on World Wide Web, Mar. 2011, pp. 675–684, doi: 10.1145/1963405.1963500.
[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997, doi: 10.1162/neco.1997.9.8.1735.
[14] N. Ruchansky, S. Seo, and Y. Liu, “CSI: a hybrid deep model for fake news,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 797–806.
[15] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, “Object recognition with gradient-based learning,” in Shape, Contour and Grouping in Computer Vision, Springer, 1999.
[16] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
[17] R. Anggrainingsih, G. M. Hassan, and A. Datta, “BERT based classification system for detecting rumours on Twitter,” arXiv preprint arXiv:2109.02975, 2021, [Online]. Available: http://arxiv.org/abs/2109.02975.
[18] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language models are unsupervised multitask learners,” OpenAI Blog, vol. 1, no. 8, p. 9, 2018.
[19] Q. Liu, X. Tao, J. Wu, S. Wu, and L. Wang, “Can large language models detect rumors on social media?,” arXiv preprint arXiv:2402.03916, 2024, [Online]. Available: http://arxiv.org/abs/2402.03916.
[20] A. Chakraborty, S. Joardar, and A. A. Sekh, “Ensemble classifier for Hindi hostile content detection,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 23, no. 1, pp. 1–17, 2024, doi: 10.1145/3591353.
[21] C. M. M. Kotteti, X. Dong, and L. Qian, “Ensemble deep learning on time-series representation of tweets for rumor detection in social media,” Applied Sciences (Switzerland), vol. 10, no. 21, pp. 1–21, 2020, doi: 10.3390/app10217541.
[22] L. Yuan, J. Wang, and X. Zhang, “YNU-HPCC at SemEval-2020 Task 8: using a parallel-channel model for memotion analysis,” in 14th International Workshops on Semantic Evaluation, SemEval 2020 - co-located 28th International Conference on Computational Linguistics, COLING 2020, Proceedings, 2020, pp. 916–921, doi: 10.18653/v1/2020.semeval-1.116.
[23] K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” in EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 2014, pp. 1724–1734, doi: 10.3115/v1/d14-1179.
[24] K. Nithya, M. Krishnamoorthi, S. V. Easwaramoorthy, C. R. Dhivyaa, S. Yoo, and J. Cho, “Hybrid approach of deep feature extraction using BERT–OPCNN & FIAC with customized Bi-LSTM for rumor text classification,” Alexandria Engineering Journal, vol. 90, pp. 65–75, 2024, doi: 10.1016/j.aej.2024.01.056.
[25] R. Cai et al., “Sentiment analysis about investors and consumers in energy market based on BERT-BILSTM,” IEEE Access, vol. 8, pp. 171408–171415, 2020, doi: 10.1109/ACCESS.2020.3024750.
[26] Z. Zhu and L. Wang, “BERT-BiLSTM model for entity recognition in clinical text,” Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2022) co-located with the Conference of the Spanish Society for Natural Language Processing (SEPLN 2022), vol. 3202, 2022.
[27] B. Pattanaik, “WeiboE,” GitHub, 2024. https://github.com/barshapattanaik/WeiboE (accessed Jun. 27, 2024).
[28] “GPT-2 documentation,” Hugging Face. https://huggingface.co/docs/transformers/main/en/model_doc/gpt2 (accessed Nov. 29, 2024).
[29] M. M. Rahman, Y. Watanobe, and K. Nakamura, “A bidirectional LSTM language model for code evaluation and repair,” Symmetry, vol. 13, no. 2, p. 247, 2021, doi: 10.3390/sym13020247.
[30] S. Roy, M. Bhanu, S. Saxena, S. Dandapat, and J. Chandra, “gDART: improving rumor verification in social media with discrete attention representations,” Information Processing & Management, vol. 59, no. 3, p. 102927, May 2022, doi: 10.1016/j.ipm.2022.102927.
[31] A. Lao, C. Shi, and Y. Yang, “Rumor detection with field of linear and non-linear propagation,” in Proceedings of the Web Conference 2021, Apr. 2021, pp. 3178–3187, doi: 10.1145/3442381.3450016.