IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 6, December 2025, pp. 4814–4827
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i6.pp4814-4827
Machine and deep learning classifiers for binary and multi-class network intrusion detection systems
Ahmad Aloqaily¹, Emad Eddien Abdallah¹, Esraa Abu Elsoud², Yazan Hamdan³, Khaled Jallad³
¹Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, Zarqa, Jordan
²Department of Cybersecurity and Cloud Computing, Faculty of Information Technology, Applied Science Private University, Amman, Jordan
³Department of Computer Information Systems, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University, Zarqa, Jordan
Article Info
Article history:
Received Sep 23, 2024
Revised Jun 24, 2025
Accepted Oct 18, 2025

Keywords:
Cyber attacks
Cyber security
Deep learning
Intrusion detection
Machine learning
ABSTRACT
The rapid proliferation of the internet and advancements in communication technologies have significantly improved networking and increased data volume. This growth has, in turn, given rise to a multitude of novel attacks, presenting significant challenges for network security and for intrusion detection systems (IDS). Moreover, the ongoing threat from authorized entities who try to carry out various types of attacks on the network is a concern that must be handled seriously. IDS that employ machine learning (ML) and deep learning (DL) algorithms are used to preserve network availability, confidentiality, and integrity. This research aimed to study the impact of binary versus multi-class attack labeling by establishing an IDS that leverages hybrid algorithms, including artificial neural networks (ANN), random forest (RF), and logistic model trees (LMTs). The paper addresses challenges such as data preprocessing, feature selection, and imbalanced datasets by applying the synthetic minority oversampling technique (SMOTE) and Pearson's correlation. The IDS was tested on the network security laboratory knowledge discovery datasets (NSL-KDD) and the Canadian Institute for Cybersecurity intrusion detection system (CIC-IDS-2017) datasets, achieving an average F1-score of 96% for binary classification on NSL-KDD and 85% for binary classification on CIC-IDS-2017. For multi-class classification, the proposed model achieved an average F1-score of 82% and 96% on NSL-KDD and CIC-IDS-2017, respectively.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ahmad Aloqaily
Department of Information Technology, Faculty of Prince Al-Hussein Bin Abdullah II for Information Technology, The Hashemite University
P.O. Box 330127, Zarqa 13133, Jordan
Email: aloqaily@hu.edu.jo

Journal homepage: http://ijai.iaescore.com

1. INTRODUCTION
The exponential increase in internet usage in daily life has led to an increase in cyberattacks, and incidents such as the SolarWinds breach in 2020 have highlighted the increasing sophistication of network intrusions. According to the Internet Security Threat Report (ISTR), malware is found in one of every thirteen web queries. A cyberattack starts with target reconnaissance and ends with exploiting vulnerabilities to carry out a harmful operation.
These cyberattacks result in system intrusions, which are characterized as unauthorized system access that compromises the confidentiality, integrity, and availability (CIA) of the security measures protecting computer or network resources. In recent years, numerous new cyberattacks have emerged, including cross-site scripting, brute force, botnets, and distributed denial of service; in 2023, the worldwide number of malware attacks reached 6.06 billion, an increase of 10% compared to the preceding year [1]. These intrusions have raised more serious cybersecurity concerns than ever before [2].
Securing networks has therefore become essential. One of the most effective ways to identify these threats is an intrusion detection system (IDS), which relies on analyzing and monitoring network traffic.
A host intrusion detection system (HIDS) is an IDS approach that identifies possible intrusions from system activities recorded in a variety of log files created on the local host computer; these log files are collected through local sensors [3]. A network intrusion detection system (NIDS), on the other hand, analyzes the contents of packets within network traffic streams, whereas an HIDS primarily employs data derived from log files, system logs, sensor logs, file system data, disk resource allocation, and other relevant information from each system. Many organizations use a hybrid approach that combines both NIDS and HIDS techniques [4].
Network traffic flows are analyzed using stateful protocol analysis, anomaly detection, and signature detection techniques. Signature detection uses pre-established signatures and filtering algorithms to identify attacks, and it depends on human involvement to keep the signature database continuously updated. This methodology works well for identifying known threats, but it is completely ineffective against unknown attacks. Anomaly detection, in contrast, often leads to a significantly higher percentage of false positives. Most organizations therefore choose hybrid approaches to obtain a more effective detection model [5].
Within the standard TCP/IP communication model, protocol analysis at the network, transport, and application layers is the most powerful technique for detecting potential threats [6].
Machine learning (ML) methods have shown excellent results in achieving high detection accuracy. However, they have limitations, such as difficulty handling raw, unlabeled, high-dimensional data and reliance on manual feature extraction, that affect the accuracy of IDS [7]; deep learning (DL) emerged to address these drawbacks.
This research aims to enhance security through IDSs by applying both ML and DL algorithms to the network security laboratory knowledge discovery datasets (NSL-KDD) and the Canadian Institute for Cybersecurity intrusion detection system (CIC-IDS-2017) datasets, with the goal of improving overall system architecture and detection performance. These datasets provide a foundation of benign and attack network traffic, although they have shortcomings such as labeling issues, duplicate flows, and insufficient attack variation. The proposed model seeks to address these limitations and develop a more resilient IDS through a comprehensive two-phase experiment that studies the effects of binary-class and multi-class labeling on IDS performance.
Several researchers have highlighted this issue because of its importance to IDS performance [8], [9]. Additionally, we identify a major gap in the literature regarding the integration of ML and DL approaches in the context of HIDS and NIDS. While previous studies have examined the effectiveness of different detection strategies, they often do not explicitly investigate how detection accuracy could be improved by combining ML and DL to work together on different attack vectors. This is especially true when it comes to reducing high false positive rates and the challenges that come with data labeling.
The remainder of this paper is organized as follows: section 2 reviews recent studies on IDS. Section 3 outlines the research methodology. Section 4 presents the results and discussion. Lastly, section 5 contains the conclusion.
2. LITERATURE REVIEW
Many researchers have studied IDS and proposed different approaches, which may differ in the technology used, the dataset, the feature selection techniques, and many other criteria that affect the performance of the proposed model. In this section, we illustrate these differences by discussing some of these studies.
The poor performance of conventional intrusion detection techniques prompted the research in [10] to propose a neural network methodology. A multi-layer convolutional neural network (CNN) is used for feature extraction and selection, and a softmax classifier is used to categorize the network attacks. For additional analysis of network intrusions, a multi-layer deep neural network (DNN) is utilized. Two commonly used benchmark intrusion detection datasets, NSL-KDD and KDDCUP'99, were used in the investigations. Four performance metrics (accuracy, recall, F1-score, and precision) are used to evaluate the suggested model's performance. Compared with other IDSs, the testing findings demonstrate that the suggested method attained an accuracy of 99%.
The research in [11] examined the applicability of DL to internet of things (IoT) data security and conducted a comparative analysis of three DL models: CNN, long short-term memory (LSTM), and DNN. Based on the results, DNN achieves 94.61% accuracy, while CNN and LSTM achieve 98.61% and 97.67%, respectively. Through this comparative study and a literature review, the authors established that DL models perform better in the IoT IDS setting than other approaches. Although the DL models exhibit better accuracy, the authors suggest that future work should focus on creating a hybrid DL model for IoT IDS that can anticipate attacks more accurately while experimenting with real-time datasets. Such a hybrid model would cover both the IoT IDS installation strategy and its detection techniques.
Patil et al. [12] presented an IDS model that uses ML algorithms such as support vector machine (SVM), random forests (RF), and decision trees. Following model training, an ensemble method known as a voting classifier was included, attaining an accuracy of 96.25%. The study argues that trust is necessary for human-machine interactions to be productive. Local interpretable model-agnostic explanation (LIME) is an extendable, modular technique that provides concise, comprehensible descriptions of predictions. An explanation of a prediction is highly useful for selecting representative models; LIME is employed in model selection, trust evaluation, improvement of unreliable models, and prediction analysis for both system experts and non-experts. To make the model's predictions understandable, the paper suggests deploying a LIME explainability framework on top of the ensemble of ML models, which showed an improved accuracy of 96.25%.
Meng [13] examines the use of supervised and unsupervised learning methods to improve cybersecurity threat detection accuracy. The study also emphasizes the use of reinforcement learning in adaptive threat modeling; this approach helps systems discover the best ways to respond to threats, making them more adaptive to changing cyber threats. The article also addresses real-time threat identification using neural networks and DL algorithms.
Hnamte and Hussain [14] describe an advanced and efficient NIDS that uses DL techniques to detect attacks. The model was trained on two real-time datasets, CIC-IDS-2018 and Edge IIoT. Its performance was examined using multiclass classification, and the results show accuracies of 100% and 99.64%.
Qazi et al. [15] likewise implemented a hybrid DL-based NIDS that leverages neural network architectures; applied to the CIC-IDS-2018 dataset, it attained an accuracy of 98.9%.
Musleh et al. [16] present a comprehensive study of ML-based IDS in the IoT context, employing various feature extraction techniques and ML algorithms to enhance their proposed model. The investigation evaluates an array of feature extractors, including image filtering techniques and transfer learning frameworks, and culminates in an assessment on the IEEE Dataport dataset, achieving an accuracy of 98.3%.
Turning to research on the effectiveness of binary and multi-class classification in IDS, Acharya et al. [8] created a novel and reliable heterogeneous ensemble ML model to identify abnormalities in NIDS. To address the class-imbalance issue of NIDS datasets, the model first applies subsampling. Min-Max normalization then maps the input data into the 0–1 range, reducing overfitting and promoting convergence. Feature reduction, often performed with meta-heuristic techniques, decreases the number of features while retaining the most appropriate ones and avoiding computational overhead. Finally, to accomplish both two-class and multi-class classification on the feature-selected NSL-KDD, KDD99, and UNSW-NB15 datasets, the NIDS approach builds a heterogeneous ensemble learning model using J48, k-nearest neighbors (k-NN), SVM, Bagging, AdaBoost, and RF as base classifiers.
Bacevicius and Taraseviciene [17] address the difficulties that arise when testing multi-class classification performance for network intrusions on highly imbalanced raw data, such as the CIC-IDS-2017 and CSE-CIC-IDS-2018 datasets. The main objective of the study is to examine several ML models, such as CNNs, artificial neural networks (ANN), RF, decision trees, and logistic regression. It also uses explainable artificial intelligence (XAI) tools to examine potential interpretations of the data. The results showed that decision trees using the classification and regression trees (CART) strategy performed better than other methods on the 28-class classification task, with an average macro F1-score of 0.96878.
Tseng and Chang [18] presented an ensemble feature selection framework that combines three feature scoring techniques (classification and regression tree, random forest, and extra tree) with two different feature selection methodologies to produce six distinct feature sets. The framework determines the best feature set for each binary model based on accuracy. By using random sampling and a customized sample size based on the target class dimensions in each binary model, the proposed ensemble data balancing technique significantly improves on conventional data balancing approaches. Random sampling, the synthetic minority oversampling technique (SMOTE), and Tomek Link methods are all included in this framework. It also incorporates four encoder modes to identify the best feature extraction configuration for each binary model. Experimental findings demonstrate that the ensemble binary detection models achieve higher accuracy in identifying three types of wireless attacks in the Aegean Wi-Fi intrusion dataset (AWID) than similar studies using traditional multi-class detection frameworks [18].
In addition, a data resampling method based on the adaptive synthetic sampling (ADASYN) and Tomek links algorithms is presented in [2], combined with several DL models. Using the benchmark NSL-KDD dataset, the proposed model is evaluated through accuracy, precision, recall, and F-score metrics. Experimental results indicate that the approach achieves 99.8% accuracy in binary classification, outperforming existing models. Its performance in multi-class classification also improves, surpassing state-of-the-art accuracy levels of 99.9%.
3. RESEARCH METHODOLOGY
Our proposed methodology consists of five phases, illustrated in Figure 1. We used two datasets, NSL-KDD [19] and CIC-IDS-2017 [20], and cleaned the data by removing noisy instances and any duplicated records. The third phase converts features into numerical data using an ordinal encoder. To ensure that features are treated equally during training, MinMax scaling maps the data to the standard range between 0 and 1.
Furthermore, we used Pearson's correlation coefficient to evaluate the linear relationships between features in both the NSL-KDD and CIC-IDS-2017 datasets. A correlation coefficient threshold of 0.8 (in absolute value) was chosen to identify highly correlated features. Features whose correlation coefficients exceeded this threshold were considered redundant and removed, as they did not provide additional information for model training. This threshold was selected to balance reducing dimensionality against retaining informative features.
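The correlation-based redundancy filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, and because the paper does not say which feature of a highly correlated pair is dropped, the sketch greedily keeps whichever feature appears first in column order.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        # Constant column: r is undefined; treated as uncorrelated in this sketch.
        return 0.0
    return cov / (ss_x * ss_y)


def drop_correlated(columns, threshold=0.8):
    """Return the names of the features to keep.

    columns: dict mapping feature name -> list of values (insertion order matters).
    A feature is dropped if |r| > threshold with any feature already kept
    (hypothetical tie-breaking rule: the earlier column wins).
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


features = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],  # perfectly correlated with "a", so it is dropped
    "c": [4, 1, 3, 2],  # r = -0.4 with "a", so it is kept
}
print(drop_correlated(features))
```

The greedy pass is order-dependent; a different tie-breaking rule (for example, keeping the feature more correlated with the label) could retain a different subset.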
To ensure that features were treated equally during model training, we applied MinMax scaling to scale all features to the range [0, 1].
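A minimal sketch of min-max scaling, X_scaled = (X − min(X)) / (max(X) − min(X)), is shown below. As an assumption the paper does not spell out, the minimum and maximum are learned from the training split only and reused for the test split, which is the usual way to avoid leaking test-set statistics; the helper names are illustrative.

```python
def fit_min_max(train_column):
    """Learn the scaling range from the training data only."""
    return min(train_column), max(train_column)


def transform_min_max(column, lo, hi):
    """Map values to [0, 1] using a range learned on the training split."""
    span = hi - lo
    if span == 0:
        # Constant feature: every value maps to 0 (such columns are normally dropped earlier).
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


train = [0.0, 5.0, 10.0]
test = [2.5, 15.0]
lo, hi = fit_min_max(train)
print(transform_min_max(train, lo, hi))
print(transform_min_max(test, lo, hi))
```

Note that a test value outside the training range (15.0 here) maps outside [0, 1]; scikit-learn's `MinMaxScaler` behaves the same way unless its `clip` option is enabled.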
In our study, we applied SMOTE after feature selection to ensure that the generated synthetic data were based on relevant features. The technique was crucial in improving the classifiers' performance, particularly for detecting rare attack types in the NSL-KDD and CIC-IDS-2017 datasets, which were otherwise underrepresented.
SMOTE is a powerful technique for addressing class imbalance by generating synthetic samples for the underrepresented class. The algorithm selects a sample from the minority class, finds its k nearest neighbors (k-NN) within the minority class, and then creates synthetic instances by interpolating between the selected sample and its neighbors. This approach enlarges the decision region of the minority class, thus improving the classifier's ability to distinguish between the classes.
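The interpolation step can be illustrated with a small pure-Python sketch. It is a simplified variant for exposition, assuming Euclidean distance and a configurable k (the paper does not report its SMOTE settings), and it picks base samples at random rather than cycling through every minority sample as the original formulation does. In practice, a library implementation such as the `SMOTE` class in imbalanced-learn (whose `k_neighbors` parameter defaults to 5) would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by interpolating between minority samples
    and their k nearest minority-class neighbors (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest neighbors of the base sample within the minority class
        neighbors = sorted(others, key=lambda m: math.dist(base, m))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical, already-scaled feature vectors of a rare attack class
rare_attack = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.12, 0.18]]
new_rows = smote(rare_attack, n_synthetic=6, k=2)
print(len(new_rows), new_rows[0])
```

Because every new point lies on a segment between two real minority samples, the synthetic rows stay inside the region the minority class already occupies rather than duplicating existing rows.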
Figure 1. Proposed methodology
According to Table 1 and Figure 2, benign traffic is disproportionately predominant (454,495 instances), indicating a major imbalance in the distribution. There are many DDoS attacks (25,545 instances), DoS Hulk attacks (45,887 instances), and PortScan attacks (31,702 instances), but very few Heartbleed attacks (2 instances), infiltration attacks (9 instances), and SQL injection attacks (5 instances). Because there is insufficient data to train the algorithms, this imbalance makes it difficult to classify attacks accurately, especially for under-represented attack types. Overall, the approach outperforms other classifiers in binary and multi-class cases, especially when handling rare attack types, making it the most effective model for the dataset.
Table 1. Number of instances for each attack type
Class | Number of instances
BENIGN | 454,495
Bot | 388
DDoS | 25,545
DoS GoldenEye | 2,020
DoS Hulk | 45,887
DoS Slowhttptest | 1,140
DoS Slowloris | 1,180
FTP-Patator | 1,620
Heartbleed | 2
Infiltration | 9
PortScan | 31,702
SSH-Patator | 1,164
Brute force | 281
SQL injection | 5
XSS | 142
Figure 2. Data distribution for BENIGN and attack classes
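To quantify the imbalance, the Table 1 counts can be turned into class shares and an imbalance ratio (the majority-class count divided by each class's count). The snippet below is only arithmetic on the reported counts; the variable names are illustrative.

```python
TABLE1_COUNTS = {
    "BENIGN": 454495, "Bot": 388, "DDoS": 25545, "DoS GoldenEye": 2020,
    "DoS Hulk": 45887, "DoS Slowhttptest": 1140, "DoS Slowloris": 1180,
    "FTP-Patator": 1620, "Heartbleed": 2, "Infiltration": 9, "PortScan": 31702,
    "SSH-Patator": 1164, "Brute force": 281, "SQL injection": 5, "XSS": 142,
}

total = sum(TABLE1_COUNTS.values())
majority = max(TABLE1_COUNTS.values())

for name, count in sorted(TABLE1_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * count / total
    ratio = majority / count  # BENIGN rows per row of this class
    print(f"{name:<17} {count:>7}  {share:7.3f}%  imbalance {ratio:,.1f}:1")
```

Roughly 80% of the rows counted in Table 1 are BENIGN, while Heartbleed is outnumbered by more than 200,000 to 1, which is why the minority classes need oversampling before training.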
Finally, we applied different classification algorithms to the training data to evaluate the performance of the proposed model. The methodology phases can be outlined in the following five steps:
i) Data extraction: the experiments depend on two datasets, NSL-KDD and CIC-IDS-2017, which are summarized in Table 2. For model training and evaluation, we divided each dataset into training and test sets. For the NSL-KDD dataset, we used 80% of the instances for training (100,000 instances) and reserved the remaining 20% (25,000 instances) for testing. Similarly, for the CIC-IDS-2017 dataset, 80% of the instances (2,264,594) were allocated for training, and the remaining 20% (566,149 instances) were used for testing.
Table 2. Summary of NSL-KDD and CIC-IDS-2017 datasets
Dataset name | Number of instances | Number of features | Attacks
NSL-KDD | 125,000 | 41 | DoS, Probe, R2L, and U2R
CIC-IDS-2017 | 2,830,743 | 79 | Brute force FTP, Brute force SSH, DoS, Heartbleed, Web attack, Infiltration, Botnet, and DDoS
ii) Preprocessing: the initial step in preparing the data is to remove constant features that add no meaningful value to the dataset. Data encoding is then applied to convert non-numeric properties into numeric representations, which is especially useful for ordinal data, i.e., categorical data with a particular order. After encoding, the data are normalized using MinMaxScaler, which scales features to a predetermined range (usually 0–1) while preserving the shape of the original distribution [21]. By ensuring that every variable contributes equally to the model, this normalization helps to prevent bias and improves the stability and speed of DL and ML algorithms during training. MinMaxScaler applies (1) to the feature values to fit them into the specified range [22].
$$X_{\text{scaled}} = \frac{X - \min(X)}{\max(X) - \min(X)} \tag{1}$$
iii) Feature selection: Pearson's correlation coefficient is used to determine the correlations between the variables in the datasets in order to select features. This statistical tool evaluates the linear relationship between two continuous variables and produces a coefficient ranging from -1 to +1 [23], [24]. A coefficient close to ±1 indicates a strong linear relationship, while a coefficient close to 0 indicates no linear association. The method assumes that the variables are approximately normally distributed, that the observations are independent, and that the relationship is linear. Table 3 shows the features identified with this procedure.
Table 3. Top features from NSL-KDD and CIC-IDS-2017 datasets
NSL-KDD features: duration, protocol type, service, flag, src bytes, dst bytes, land, wrong fragment, urgent, hot, num failed logins, logged in, num compromised, root shell, su attempted, num root, num file creations, num shells, num access files, is host login, is guest login, count, srv count, serror rate, srv serror rate, rerror rate, srv rerror rate, same srv rate, diff srv rate, srv diff host rate, dst host count, dst host srv count, dst host same srv rate, dst host diff srv rate, dst host same src port rate, dst host srv diff host rate, dst host serror rate, dst host srv serror rate, dst host rerror rate, dst host srv rerror rate
CIC-IDS-2017 features: Flow IAT Std, Max Packet Length, Init Win bytes forward, act data pkt fwd, Subflow Fwd Bytes, Total Backward Packets, Flow IAT Mean, ACK Flag Count, Avg Bwd Segment Size, URG Flag Count, Fwd Packet Length Max, ECE Flag Count, Packet Length Std, Idle Mean, Init Win bytes backward, Packet Length Mean, RST Flag Count, Fwd Header Length, Bwd Packet Length Max, min seg size forward, Idle Max, Bwd Packets/s, Total Fwd Packets, Fwd Packet Length Mean, Fwd Header Length.1, Fwd Packet Length Std, PSH Flag Count, Fwd IAT Max, Active Mean, Idle Min, Bwd Packet Length Mean, Average Packet Size, Fwd PSH Flags, Total Length of Fwd Packets, Fwd IAT Std, Flow IAT Max, Bwd Packet Length Std, Avg Fwd Segment Size, Flow Packets/s, Down/Up Ratio, Destination Port, Packet Length Variance, Subflow Fwd Packets, SYN Flag Count
iv) Oversampling: the datasets' class imbalance was addressed using SMOTE. Rather than simply copying samples from the existing dataset, this technique generates new, synthetic samples. We specifically employed SMOTE to increase the size of the CIC-IDS-2017 dataset to 24,607,475 instances and the NSL-KDD dataset to 308,830 instances.
v) Classifiers: in the context of ML, a classifier is an algorithm that automatically sorts or groups data into one or more classes according to specific features [25]. In this research, we used three classifiers: multi-layer perceptron (MLP), RF, and logistic model trees (LMTs).
– MLP is a type of ANN that consists of multiple layers of interconnected nodes called neurons. It is one of the simplest and most widely used neural network architectures [26]. For binary classification tasks, the output layer of the MLP typically uses the sigmoid activation function, which outputs a value between 0 and 1 that can be interpreted as the probability of the instance belonging to the positive class. A threshold of 0.5 is commonly used to assign the class label: values above 0.5 are classified as class 1, and values below 0.5 as class 0. In the hidden layers, the rectified linear unit (ReLU) is often employed to introduce non-linearity, helping the model learn complex patterns in the data.
– RF is a popular ML algorithm in the ensemble learning category. It is used for both classification and regression tasks and is built from an ensemble of decision trees [27].
– LMTs combine decision tree structures with logistic regression functions. An LMT divides the instance space into discrete regions, each represented by a leaf node; each leaf stores a logistic regression model that is used to classify the instances falling into that region [28].
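The MLP decision rule described above (ReLU hidden units, a sigmoid output, and a 0.5 threshold) can be sketched as a single forward pass. The weights below are hand-picked toy values, not trained parameters, and the layer sizes are arbitrary; an output of exactly 0.5 is assigned to class 0 here, an arbitrary choice since the text leaves that case open.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_predict(x, w_hidden, b_hidden, w_out, b_out, threshold=0.5):
    """Return (probability of class 1, predicted label) for one input vector."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    p = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return p, 1 if p > threshold else 0


# Toy parameters for a 2-input, 2-hidden-unit network (illustrative only).
W_H = [[1.0, -1.0], [-1.0, 1.0]]
B_H = [0.0, 0.0]
W_O = [2.0, -2.0]
B_O = 0.0

for x in ([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]):
    print(x, mlp_predict(x, W_H, B_H, W_O, B_O))
```

In a trained IDS the weights would be learned by backpropagation; the sketch only shows how the activation functions and the threshold turn one feature vector into a label.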
4. RESULTS AND DISCUSSION
We evaluate the classification models using binary and multi-class labels to identify the most effective IDS model. In the binary classification setup, all attack instances are labeled as 1, and all normal instances are labeled as 0. For multi-class classification, on the other hand, the targeted attack instances are labeled as 1, other types of attack instances are labeled as 2, and all normal instances are labeled as 0. We illustrate the effect of the labeling scheme on performance through an extensive analysis of the models on the NSL-KDD and CIC-IDS-2017 datasets.
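The two labeling schemes can be expressed as small mapping functions. This is a sketch: the set of label strings treated as normal traffic ("normal" for NSL-KDD and "BENIGN" for CIC-IDS-2017) is an assumption about how the raw labels are spelled, and the function names are hypothetical.

```python
NORMAL_LABELS = {"normal", "BENIGN"}  # assumed spellings of the normal class


def binary_label(label):
    """Binary scheme: normal -> 0, any attack -> 1."""
    return 0 if label in NORMAL_LABELS else 1


def multiclass_label(label, target_attack):
    """Multi-class scheme: normal -> 0, targeted attack -> 1, any other attack -> 2."""
    if label in NORMAL_LABELS:
        return 0
    return 1 if label == target_attack else 2


raw = ["BENIGN", "DDoS", "PortScan", "DDoS", "BENIGN"]
print([binary_label(y) for y in raw])
print([multiclass_label(y, target_attack="DDoS") for y in raw])
```

Under this scheme the multi-class labels depend on which attack is targeted, which is consistent with the per-attack columns reported in the result tables.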
Table 4 presents the results of applying the classifiers described in the previous section to the NSL-KDD dataset.
Table 4. Performance metrics for binary and multi-class NSL-KDD
Classifier | Metric | Binary | U2R (multi-class) | DoS (multi-class) | R2L (multi-class) | Probe (multi-class)
MLP | Precision | 0.99 | 0.84 | 0.99 | 0.24 | 0.99
MLP | Recall | 0.99 | 0.94 | 0.99 | 0.43 | 0.99
MLP | F1-score | 0.99 | 0.89 | 0.99 | 0.31 | 0.99
RF | Precision | 0.99 | 0.91 | 0.99 | 0.65 | 0.99
RF | Recall | 0.99 | 0.94 | 0.99 | 0.62 | 0.99
RF | F1-score | 0.99 | 0.93 | 0.99 | 0.63 | 0.99
LMT | Precision | 0.88 | 0.34 | 0.98 | 0.02 | 0.76
LMT | Recall | 0.94 | 0.88 | 0.97 | 0.71 | 0.94
LMT | F1-score | 0.91 | 0.49 | 0.97 | 0.04 | 0.84
Table 4 reports precision, recall, and F1-score for three classifiers, MLP, RF, and LMT, evaluated on the binary and multi-class NSL-KDD datasets. In binary classification, RF and MLP perform almost optimally with similar metrics: both achieve an F1-score, precision, and recall of 0.99, indicating an exceptional ability to identify instances with minimal errors. LMT performs somewhat worse, with a precision of 0.88, a recall of 0.94, and an F1-score of 0.91. This suggests that although it detects most true positives, it produces more false positives than the other classifiers, as shown in Figure 3.
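As a consistency check, the F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below computes the metrics from confusion-matrix counts and recomputes several F1 values in Table 4 from their reported precision and recall; the helper names are illustrative.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# (classifier, column, precision, recall, reported F1) taken from Table 4
rows = [
    ("LMT", "Binary", 0.88, 0.94, 0.91),
    ("MLP", "R2L", 0.24, 0.43, 0.31),
    ("LMT", "U2R", 0.34, 0.88, 0.49),
    ("LMT", "R2L", 0.02, 0.71, 0.04),
]
for clf, col, p, r, reported in rows:
    print(clf, col, round(f1_from(p, r), 2), "reported:", reported)
```

Because the harmonic mean is dominated by the smaller of the two values, LMT's very low R2L precision (0.02) drags its F1-score down to 0.04 despite a recall of 0.71.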
Figure 3. Binary NSL-KDD classification
Within multi-class classification, performance differs significantly among the four attack categories (U2R, DoS, R2L, and Probe). MLP performs effectively in most classes, achieving high metrics for DoS and Probe (all approximately 0.99); however, it has significant issues with R2L, as indicated by a low F1-score of 0.31, which stems from poor precision (0.24) and recall (0.43). RF, on the other hand, typically outperforms MLP in multi-class scenarios, attaining high precision, recall, and F1-scores in most categories. It records an F1-score of 0.93 for U2R and performs well in the DoS and Probe classes, although it still faces moderate difficulties with R2L (F1-score of 0.63).

LMT shows a noticeable drop in metrics for U2R and R2L and inconsistent performance across many multi-class categories. In U2R, it achieves a comparatively high recall (0.88) but low precision (0.34), resulting in a lower F1-score of 0.49. The R2L category presents significant challenges for LMT: a precision of 0.02 and an F1-score of 0.04 indicate that LMT is almost useless for correctly identifying R2L instances. Despite these difficulties, LMT performs satisfactorily in the DoS and Probe categories, especially in recall.

Overall, while RF and MLP both perform exceptionally well in binary classification, RF is the more robust model in multi-class classification, especially when dealing with a variety of attack types, whereas LMT clearly shows deficiencies, particularly for less prevalent attack classes. Figure 4 illustrates the multi-class classification performance on the NSL-KDD dataset.
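One way to summarize the multi-class results of Table 4 in a single number per classifier is the unweighted (macro) average of the per-class F1-scores. The sketch below applies that averaging to the four attack classes; it is an illustrative summary only, assuming equal class weights, and is not necessarily the averaging behind the aggregate figures quoted elsewhere in the paper.

```python
TABLE4_MULTICLASS_F1 = {
    "MLP": {"U2R": 0.89, "DoS": 0.99, "R2L": 0.31, "Probe": 0.99},
    "RF": {"U2R": 0.93, "DoS": 0.99, "R2L": 0.63, "Probe": 0.99},
    "LMT": {"U2R": 0.49, "DoS": 0.97, "R2L": 0.04, "Probe": 0.84},
}


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1-scores: every class counts equally."""
    return sum(per_class_f1.values()) / len(per_class_f1)


for name, scores in TABLE4_MULTICLASS_F1.items():
    weakest = min(scores, key=scores.get)
    print(f"{name}: macro-F1 = {macro_f1(scores):.3f} (weakest class: {weakest})")
```

Because every class counts equally, the macro average exposes the weak R2L results, whereas a frequency-weighted average would give the large DoS class more influence.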
Figure 4. Multi-class NSL-KDD classification
For the CIC-IDS-2017 dataset, Tables 5 and 6 provide a comprehensive comparison of MLP, RF, and LMT across different attack categories and general types. In the binary classification setting of the CIC-IDS-2017 dataset, MLP exhibits strong results, with a precision of 0.92, a recall of 0.99, and an F1-score of 0.95, implying that it is generally good at distinguishing between legitimate and malicious traffic. However, its precision is lower than that of RF. The RF classifier performs better on all evaluation measures, achieving an F1-score, precision, and recall of 0.99, which indicates that RF can classify instances as malicious or legitimate with remarkable precision and reliability. LMT performs significantly worse, with a precision of 0.52, a recall of 0.80, and an F1-score of 0.63. The suboptimal precision and F1-score imply that LMT has greater difficulty classifying instances accurately while sustaining a balance between precision and recall, as shown in Figure 5.
Table 5. Performance metrics for binary and multi-class CIC-IDS-2017
Classifier | Metric | Binary | Bot | DDoS | DoS GoldenEye | DoS Hulk | DoS Slowhttptest | DoS Slowloris
MLP | Precision | 0.92 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99
MLP | Recall | 0.99 | 0.67 | 0.99 | 0.99 | 1.00 | 0.99 | 0.98
MLP | F1-score | 0.95 | 0.80 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99
RF | Precision | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99
RF | Recall | 0.99 | 0.96 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00
RF | F1-score | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
LMT | Precision | 0.52 | 0.00 | 0.99 | 0.91 | 0.97 | 0.88 | 0.87
LMT | Recall | 0.80 | 0.00 | 0.97 | 0.85 | 0.95 | 0.71 | 0.81
LMT | F1-score | 0.63 | 0.00 | 0.98 | 0.88 | 0.96 | 0.79 | 0.84
Table 6. Performance metrics for binary and multi-class CIC-IDS-2017
Classifier | Metric | FTP-Patator | Heartbleed | Infiltration | PortScan | SSH-Patator | Brute Force | SQL Injection | XSS
MLP | Precision | 1.00 | 0.00 | 1.00 | 1.00 | 0.98 | 0.61 | 0.00 | 0.70
MLP | Recall | 1.00 | 0.00 | 0.22 | 1.00 | 0.99 | 0.19 | 0.00 | 0.10
MLP | F1-score | 1.00 | 0.00 | 0.36 | 1.00 | 0.99 | 0.29 | 0.00 | 0.10
RF | Precision | 1.00 | 1.00 | 0.83 | 1.00 | 1.00 | 0.71 | 1.00 | 0.50
RF | Recall | 1.00 | 1.00 | 0.56 | 1.00 | 1.00 | 0.76 | 0.20 | 0.40
RF | F1-score | 1.00 | 1.00 | 0.67 | 1.00 | 1.00 | 0.73 | 0.33 | 0.40
LMT | Precision | 0.84 | 0.50 | 0.00 | 0.90 | 0.87 | 0.00 | 0.00 | 0.00
LMT | Recall | 1.00 | 1.00 | 0.00 | 1.00 | 0.51 | 0.00 | 0.00 | 0.00
LMT | F1-score | 0.91 | 0.67 | 0.00 | 0.94 | 0.64 | 0.00 | 0.00 | 0.00
Figure 5. Binary CIC-IDS-2017 classification
In multi-class classification, the MLP classifier achieves high precision (≥ 0.99) in most categories; in Heartbleed, Brute Force, SQL Injection, and XSS, however, its precision drops sharply, to between 0.00 and 0.70. The MLP's recall varies more: it performs well in the DDoS category (0.99) and the DoS attack categories (0.98 to 1.00), but poorly in the Heartbleed, Infiltration, and XSS categories (0.00 to 0.22). The MLP's F1-scores are high in several categories (0.99 to 1.00) but much lower in Heartbleed, SQL Injection, and XSS (0.00 to 0.10), indicating difficulty in balancing precision and recall for these attack types. RF maintains high recall (≥ 0.96) in most categories, although its recall falls in Infiltration (0.56), Brute Force (0.76), SQL Injection (0.20), and XSS (0.40), indicating that it still misses a share of these rare attacks. The F1-scores of RF are likewise high (≥ 0.98) in most categories but lower in Infiltration (0.67), Brute Force (0.73), SQL Injection (0.33), and XSS (0.40), which are the categories where RF fails to balance precision and recall.
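The per-class scores in Tables 5 and 6 follow one-vs-rest definitions: for a class c, precision is the fraction of samples predicted as c that truly belong to c, and recall is the fraction of true c samples that are predicted as c. The sketch below uses a small hypothetical label set (not drawn from CIC-IDS-2017) to show how a rare class that the classifier never predicts ends up with precision, recall, and F1 all equal to 0.00, the pattern reported for Heartbleed and SQL Injection under MLP in Table 6. Treating 0/0 as 0 mirrors a common convention (for example, scikit-learn's `zero_division=0`); whether the authors' tooling used the same convention is an assumption.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """One-vs-rest precision, recall and F1 for every label; 0/0 is treated as 0."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
    predicted = Counter(y_pred)  # TP + FP per label
    actual = Counter(y_true)     # TP + FN per label (the class support)
    metrics = {}
    for label in labels:
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / actual[label] if actual[label] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical toy labels: the single rare "XSS" sample is misclassified as BENIGN.
y_true = ["BENIGN"] * 6 + ["DDoS"] * 3 + ["XSS"] * 1
y_pred = ["BENIGN"] * 6 + ["DDoS"] * 3 + ["BENIGN"] * 1

for label, (p, r, f) in per_class_metrics(y_true, y_pred).items():
    print(f"{label:<7} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

One missed rare sample is enough to zero out that class entirely, while it only slightly lowers the precision of the majority class that absorbed it.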
LMT displays a much more uneven precision profile, with high scores in DDoS, DoS GoldenEye, DoS Hulk, and FTP-Patator (0.84 to 0.99) and very low scores in Bot, Heartbleed, Infiltration, and XSS (0.00 to 0.50). Its recall is 0.00 in Bot, SQL Injection, and XSS, but good in DDoS, DoS GoldenEye, and DoS Hulk (0.85 to 0.97). The F1-scores of LMT are relatively high in DDoS, DoS GoldenEye, DoS Hulk, and FTP-Patator (0.88 to 0.98) but fall to 0.00 in Bot, Infiltration, Brute Force, SQL Injection, and XSS, indicating that LMT has significant difficulty handling less frequent or rare attack scenarios.
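Failures on rare attack classes are easy to hide once per-class scores are aggregated into a single number. As an illustration only (the paper does not report such averages here), the sketch below contrasts a macro average, in which every class counts equally, with a support-weighted average, in which each class is weighted by its number of true samples. The class names echo Table 6, but the F1-scores and sample counts are hypothetical.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally, however rare it is."""
    return sum(scores.values()) / len(scores)


def weighted_average(scores, support):
    """Mean weighted by each class's number of true samples (its support)."""
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in scores) / total


# Hypothetical per-class F1-scores and supports for a classifier that handles
# frequent attacks well but misses the rare ones entirely.
f1 = {"DoS Hulk": 0.99, "PortScan": 0.99, "SQL Injection": 0.0, "XSS": 0.0}
support = {"DoS Hulk": 9000, "PortScan": 6000, "SQL Injection": 5, "XSS": 45}

print(f"macro F1    = {macro_average(f1):.3f}")
print(f"weighted F1 = {weighted_average(f1, support):.3f}")
```

The weighted figure stays close to 0.99 even though two classes are never detected, whereas the macro average drops to about half; reporting per-class results, as Tables 5 and 6 do, avoids this masking effect.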
Figure 6 presents the performance metrics for multi-class classification on the CIC-IDS-2017 dataset. Figure 6(a) shows the precision for each attack type across the classifiers, Figure 6(b) shows the corresponding recall, and Figure 6(c) shows the F1-scores, which summarize the balance between precision and recall. Overall, the RF classifier achieved consistently higher scores across most attack categories.
Figure 6. Performance metrics for multi-class classification on the CIC-IDS-2017 dataset: (a) precision, (b) recall, and (c) F1-score
IDS performance can differ notably between binary and multi-class classification. By concentrating on differentiating between benign and malicious communications, binary classification often simplifies the detection task and can increase the detection rates of minority classes, because rare attacks are pooled into a single malicious class. Multi-class classification, on the other hand, seeks to distinguish between different types of attacks, which can make it difficult to reliably identify attacks that occur rarely.
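The two framings differ only in how the label column is interpreted. A minimal sketch, assuming CIC-IDS-2017-style string labels in which normal flows are tagged "BENIGN" (the exact label strings in the authors' preprocessed data are an assumption), derives the binary target by collapsing every attack type into a single malicious class while keeping the multi-class target unchanged. The sample counts are hypothetical; they show how a rare attack such as Heartbleed is a tiny class in the multi-class task but is absorbed into a much larger malicious class in the binary task.

```python
from collections import Counter

BENIGN_LABEL = "BENIGN"  # assumed label string for normal traffic


def to_binary(labels):
    """Collapse multi-class attack labels into 0 (benign) / 1 (malicious)."""
    return [0 if label == BENIGN_LABEL else 1 for label in labels]


# Hypothetical label column with one very rare attack type.
multi_class = (["BENIGN"] * 80 + ["DDoS"] * 12 + ["PortScan"] * 7
               + ["Heartbleed"] * 1)
binary = to_binary(multi_class)

print("multi-class counts:", Counter(multi_class))
print("binary counts:     ", Counter(binary))
```

In the binary view the single Heartbleed sample only has to be recognized as "malicious", not as Heartbleed specifically, which is one reason binary detection rates for rare attacks can look better than their multi-class scores.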
Table 7 compares our findings with other recent studies that analyze the effects of binary and multi-class classification on IDS performance. Using DL models, Singh et al. [29] presented two state-of-the-art IDSs: the first employs bidirectional long short-term memory (Bi-LSTM) and LuNet, while the second combines a temporal convolutional network (TCN), CNN, and Bi-LSTM. Both systems outperformed conventional ML models in tests on the NSL-KDD and UNSW-NB15 datasets. Classification accuracy of up to 99% was achieved by using ensemble