IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 5, October 2025, pp. 4171∼4180
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i5.pp4171-4180 ❒ 4171
BonoNet: a deep convolutional neural network for recognizing bangla compound characters
Kazi Rifat Ahmed1, Nusrat Jahan2,3, Adiba Masud1,4, Nusrat Tasnim5, Sazia Sharmin6, Nusrat Jahan Mim1, Imran Mahmud1
1Department of Software Engineering, Faculty of Science and Information Technology, Daffodil International University, Dhaka, Bangladesh
2Department of Information Technology and Management, Faculty of Science and Information Technology, Daffodil International University, Dhaka, Bangladesh
3Faculty of Electronic Engineering and Technology (FKTEN), Universiti Malaysia Perlis, Arau, Malaysia
4Department of Computer Science, College of AI, Cyber and Computing, University of Texas at San Antonio, San Antonio, United States
5Department of Information and Communication Technology, Bangladesh University of Professionals, Dhaka, Bangladesh
6Department of Computer Science, Faculty of Science and Technology, American International University, Dhaka, Bangladesh
Article Info

Article history:
Received Aug 11, 2024
Revised Jun 28, 2025
Accepted Aug 6, 2025

Keywords:
Bangla
BonoNet
Compound characters
Deep convolutional neural network
Handwritten
Optical character recognition

ABSTRACT
The bangla alphabet includes vowels, consonants, and compound symbols. Compound characters arise from combining two or more root bangla characters into a single glyph. They are difficult to differentiate because they have sophisticated geometric shapes and appear in an immense variety of scripts used by different places and individuals. This is one of the greatest challenges in creating effective optical character recognition (OCR) systems for bangla. In this paper, a deep convolutional neural network (DCNN)-based system is presented to identify bangla compound characters with high precision. The model was trained on the AIBangla dataset, which contains 171 classes of bangla compound characters. A DCNN system, BonoNet, was designed to classify compound characters. BonoNet outperformed state-of-the-art architectures on the test set. By accurately identifying these complex compound characters, BonoNet will greatly improve the automation and analysis of the bangla language.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Nusrat Jahan
Department of Information Technology and Management, Faculty of Science and Information Technology
Daffodil International University
Dhaka, Bangladesh
Email: nusrat.swe@diu.edu.bd
1. INTRODUCTION
Bangla is the seventh most widely spoken language on earth, spoken by nearly 300 million people in South Asia's Bengali region. It is the official and national language of Bangladesh and is spoken by close to 98% of Bangladesh's population. The script of the language has vowels, consonants, and complex letters with distinctive and separate visual structures, leading to a distinctive and elaborate system of writing. This complexity has proven difficult for optical character recognition (OCR) to handle, especially in the case of character recognition of handwritten documents on physical media. OCR technology has long been prized as an invaluable resource for digitizing written materials, but bangla's compound characters pose unique challenges owing to their structural complexity and diversity.

Journal homepage: http://ijai.iaescore.com
Various computational solutions exist for handwritten character recognition. These include machine learning techniques, artificial neural networks (ANN), multilayer perceptrons (MLP), support vector machines (SVM), and a growing emphasis on deep learning models such as convolutional neural networks (CNNs) [1], [2]. CNNs themselves have long been in demand for their greater precision and lower reliance on human-crafted feature extraction. Their ability to learn visual features automatically at a hierarchical level makes them extremely resourceful in image recognition and classification tasks.

Bangla script consists of 50 simple characters, of which 11 are vowels and 39 are consonants [3]. Together they create more than 171 compound characters through combinations of simple ones. Despite remarkable advancements in studies, the majority of earlier research aimed to recognize simple characters only. Compound characters, due to their unpredictability and strong variability, have been less explored in existing OCR models [4]. There thus remains enormous scope for systems capable of handling such complexity with robust accuracy and generalizability. Figure 1 shows examples of the simple and compound characters used in this work.
Figure 1. Bangla basic and compound characters example
In the recent past, several deep learning-inspired models have been introduced for bangla character recognition. Ahmed et al. [5] introduced a deep convolutional neural network (DCNN) with 76,000 training images for character classification, while Ashiquzzaman et al. [6] employed exponential linear unit (ELU)-based methods to enhance performance on the CMATERDB 3.1.3.3 dataset. Azad et al. [7] introduced DConvAENNet, an autoencoder-DCNN combination, on datasets such as BanglaLekha-Isolated and Ekush. Uddin et al. [8] used a hybrid ConvLSTM to show good performance in identifying bangla handwritten digits. Begum et al. [9] used longest run (LR) + chain code histogram (CH) features, whereas Chakraborty and Paul [10] performed bidirectional conversion from simple to compound characters and vice versa. Chowdhury et al. [11] achieved improved accuracy using a CNN with data augmentation, whereas Hasan et al. [12], [13] experimented with VGG-16, ResNet-50, and DenseNet, identifying DenseNet as particularly effective for simple as well as compound characters on the AIBangla dataset.

Other approaches followed handcrafted features and combination strategies. Kibria et al. [14] employed SVM and MLP classifiers with local receptive field (LRF), histogram of oriented gradients (HOG), and diagonal features, and Khan et al. [15] achieved high performance on the BanglaLekha-Exclusive dataset using SE-ResNeXt. Mukherjee et al. [16] experimented with various learning methods on 10,000 bangla web images. Saha et al. [17], [18] introduced BBCNet-15 for improved basic character recognition and compared local binary pattern (LBP)-based descriptors across various classes. Sarika et al. [19] demonstrated VGG-16 performance for Telugu script, and Rabbi et al. [20] demonstrated excellent results for KDANet on BanglaLekha. Pramanik and Bag [21] used chain-code features for compound character recognition on the ICDAR and CMATERdb databases. Koiso et al. [22] extended OCR research to Japanese script. Separately, Jishan et al. [1] integrated NLP with hybrid neural networks for text image recognition, utilizing grammar analysis and language modeling techniques; other researchers also used NLP to recognize different channels from images and texts [23]–[25].

Despite such a heterogeneous body of work, most studies still emphasize isolated character recognition. Handwritten compound characters remain difficult to classify since they are visually and contextually variable. To address this issue, this work introduces a shallow DCNN architecture named BonoNet that is specifically targeted towards the accurate recognition of bangla compound characters. BonoNet outperforms the state-of-the-art models ResNet and DenseNet on the AIBangla dataset. Unlike other methods, BonoNet automates feature extraction and tackles the high intra-class variability and inter-class similarity common in compound bangla characters.
2. METHOD
This section explains the approach to selecting the bangla compound characters to be utilized. A DCNN, 'BonoNet', has been designed to identify bangla compound characters efficiently. The suggested approach is illustrated in Figure 2.

Figure 2. Proposed methodology for compound character recognition
2.1. Data collection
The proposed methodology uses the AIBangla dataset created by Hasan et al. [12]. The dataset consists of handwritten bangla characters submitted by more than 2,000 individuals from various institutes in Bangladesh. It is a new benchmark in the field with a holistic use case and performance benchmark. The AIBangla dataset has a large bangla character set, including 249,911 images of compound characters and 80,403 images of simple characters in 50 classes. Although it does not contain any numeral data, AIBangla gathered 330,314 images across 221 classes in total. It includes a set of 171 bangla compound characters, which we will use. Samples from the AIBangla dataset are shown in Figure 3.
Figure 3. A few examples from the dataset
2.2. Data preprocessing
Data preprocessing is beneficial for improving accuracy and reducing the complexity of an image. Python OpenCV was utilized to implement the preprocessing steps, which are shown in Figure 4. Preprocessing was initiated by transforming RGB images into gray-scale to lower their dimensionality and mitigate the load on the model; the gray-scale transformation also cancels tone variability, and a Gaussian blur removes noise. Image thresholding further simplifies analysis by transforming images into binary black and white. In particular, multi-Otsu thresholding was utilized, which classifies pixels into classes according to their gray-level intensity. To locate the handwritten compound characters precisely, the unnecessary parts of the image are eliminated: contour detection from Python's OpenCV is used to detect the edges of bangla compound characters, and after detection the image is cropped to the size of the character. Image resizing is another critical step, which accelerates neural network training by reducing the number of pixels. In our case, images are resized to 28 × 28, leading to better model results, as presented in Figure 5. The results before preprocessing are shown in Figure 5(a) and after preprocessing in Figure 5(b).
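The pipeline above can be sketched with NumPy alone. This is a minimal illustration, not the paper's code: a single-threshold Otsu stands in for the multi-Otsu step, and a bounding-box crop stands in for OpenCV contour detection.

```python
import numpy as np

def preprocess(rgb, out_size=28):
    """Grayscale -> threshold -> crop to the character -> resize to 28 x 28."""
    # 1) Grayscale via the usual luminance weights (the same ones cv2.cvtColor uses).
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

    # 2) Otsu's threshold: choose the cut that maximizes between-class variance.
    hist, _ = np.histogram(gray, bins=256, range=(0, 255))
    p = hist / hist.sum()
    cum, cum_mean = np.cumsum(p), np.cumsum(p * np.arange(256))
    mean_total = cum_mean[-1]
    best_t, best_var = 0, -1.0
    for t in range(1, 255):
        w0, w1 = cum[t], 1 - cum[t]
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = cum_mean[t] / w0, (mean_total - cum_mean[t]) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    binary = (gray <= best_t).astype(np.uint8)  # ink = 1 (dark strokes)

    # 3) Crop to the character's bounding box (stand-in for contour-based cropping).
    ys, xs = np.nonzero(binary)
    crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # 4) Nearest-neighbour resize to out_size x out_size.
    r = np.arange(out_size) * crop.shape[0] // out_size
    c = np.arange(out_size) * crop.shape[1] // out_size
    return crop[np.ix_(r, c)]

# Demo: a dark 40 x 40 square on a white background becomes a 28 x 28 ink mask.
img = np.full((100, 100, 3), 255.0)
img[30:70, 30:70] = 0
print(preprocess(img).shape)  # (28, 28)
```

The real pipeline uses OpenCV's Gaussian blur before thresholding; the blur is omitted here since the synthetic demo image is noise-free.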
Figure 4. Data preprocessing steps of train and test data

Figure 5. The dataset (a) before preprocessing and (b) after preprocessing
The data have been divided into training, test, and validation sets. To be specific, 80% of the data were devoted to training, 10% to testing, and the remaining 10% to validation. Complex characters are the center of attention, further classified into 171 categories. A total of 199,803 samples are available for training the model, 25,123 samples for testing the model's performance, and 24,908 samples for validating the model's performance while training it. With this provision, the model can be exhaustively trained, tested, and validated.
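A split like the one described can be sketched in plain Python. This is illustrative only: a single shuffled 80/10/10 split over the 249,834 compound-character samples (the sum of the reported subset sizes); the paper's exact counts differ slightly, presumably because its split was performed per class.

```python
import random

def split_indices(n, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle sample indices and split them 80/10/10 into train/test/validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train_idx = idx[:n_train]
    test_idx = idx[n_train:n_train + n_test]
    val_idx = idx[n_train + n_test:]          # remainder, roughly 10%
    return train_idx, test_idx, val_idx

train_idx, test_idx, val_idx = split_indices(249_834)
print(len(train_idx), len(test_idx), len(val_idx))  # 199867 24983 24984
```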
2.3. Proposed method: BonoNet architecture
This DCNN takes 28 × 28 images as input and has 7 convolutional layers, 3 fully connected layers, and 5 dropout layers. The first and second convolutional layers use a kernel of size 3 × 3 and 32 filters. In the model, a batch-normalization layer is used after each convolutional layer, and max-pooling layers use a pool size of 2 × 2 without an explicit strides value. Rectified linear unit (ReLU) is the activation function for all convolutional layers, and max pooling is skipped after the first layer. To reduce overfitting, a dropout layer is applied after the second layer, followed by a max-pooling layer. The third and fourth convolutional layers each consist of 64 filters with a ReLU activation function, and a max-pooling layer is applied after the fourth layer. Batch normalization is applied everywhere except the output layer, and a dropout layer after the max-pooling layer completes the block. The 5th, 6th, and 7th convolutional layers have 128, 128, and 256 filters, with batch normalization followed by max pooling and dropout in each. Then a flatten layer prepares the features for classification, followed by three fully connected layers of 512, 512, and 171 neurons. A batch-normalization layer is applied on the last layer, with dropout in the lower two, and softmax activation is used in the final layer. Figure 6 shows the proposed DCNN model.
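As a sanity check on the convolutional stack described above, the trainable weights of the seven convolutional layers (3 × 3 kernels, single-channel 28 × 28 gray-scale input, filter counts 32, 32, 64, 64, 128, 128, 256) can be counted in plain Python; the totals below follow from the layer description, not from any figures reported in the paper.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolutional layer."""
    return (k * k * in_ch + 1) * out_ch

filters = [32, 32, 64, 64, 128, 128, 256]  # the seven BonoNet conv layers
in_ch, total = 1, 0                         # single gray-scale input channel
for out_ch in filters:
    total += conv_params(in_ch, out_ch)
    in_ch = out_ch

print(total)  # 581600 trainable parameters in the conv layers alone
```

Batch-normalization and dense-layer parameters would be counted separately; the point of the sketch is that even the "shallow" convolutional trunk carries on the order of half a million weights.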
2.4. Model breakdown
The BonoNet architecture is organized into three main components: a feature extractor, a classifier, and the training parameters that optimize the overall performance of the model.
2.4.1. Feature extractor
Feature extraction lies at the foundation of the BonoNet architecture, dealing with input image data efficiently. Raw inputs are converted into structured features through the application of several layers. Processing starts with an input layer, followed by the convolutional layers that recognize the features.
– Input layer: the model begins with an input layer that takes in the image data.
– Convolutional layers (Conv2D): multiple convolutional layers extract features from the image. Each layer applies filters to detect patterns like edges, textures, and shapes at different levels of abstraction.
– ReLU activation: ReLU is used to introduce non-linearity, helping the model capture complex patterns.
– Batch normalization: this layer normalizes the output of the previous layer.
– Max pooling: max pooling downsamples the feature maps, reducing dimensionality and computational cost while preserving important information.
– Dropout: dropout randomly deactivates a portion of neurons during training, preventing overfitting.
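The pooling step in the list above is easy to make concrete. A minimal 2 × 2 max pooling over a NumPy feature map (a toy stand-in for the framework's pooling layer) halves each spatial dimension while keeping the strongest responses:

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 on an (H, W) feature map (H, W even)."""
    h, w = x.shape
    # Group the map into 2 x 2 blocks and take the max of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 1, 5, 6],
                 [1, 0, 7, 8]])
print(max_pool_2x2(fmap))  # [[4 1]
                           #  [1 8]]
```

Each 2 × 2 block collapses to its maximum, which is what gives the pooled representation its small translation invariance.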
2.4.2. Classifier
Dense layers are fully connected layers that use the extracted features to determine the class of the image. The architecture includes an initial dense layer with 512 units, followed by a layer with 1,024 units. Finally, there is an output layer with a number of units equal to the number of classes.
2.4.3. Model training parameters
The proposed model was trained using a set of designated parameters optimized to ensure effective convergence and generalization. Table 1 shows the parameters used in the proposed methodology. These parameters were carefully selected to enhance the model's training performance.
Figure 6. BonoNet architecture
Table 1. Training parameters used for BonoNet
Parameter                 Value
Learning rate             0.0001
Decay factor              0.2
Early stopping patience   2
Loss function             Categorical cross-entropy
Total epochs              100
Epochs before stopping    36
Evaluation dataset        Fresh validation set
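The patience and decay entries in Table 1 interact in the way the usual Keras-style callbacks do. The pure-Python simulation below is illustrative only (it is not the training code, and the loss values are invented): it shows how a run can halt well before the 100-epoch budget, as BonoNet's did at epoch 36.

```python
def simulate_training(val_losses, patience=2, lr=1e-4, decay=0.2):
    """Early stopping: halt after `patience` epochs without improvement.
    The learning rate is multiplied by `decay` whenever the loss stalls."""
    best, bad_epochs = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            lr *= decay                  # decay factor 0.2 from Table 1
            if bad_epochs >= patience:   # patience 2 from Table 1
                return epoch, lr
    return len(val_losses), lr

# Invented validation-loss curve: improves for three epochs, then stalls twice.
epoch, lr = simulate_training([0.9, 0.5, 0.4, 0.41, 0.42, 0.3])
print(epoch)  # 5 -- training stops before the last (better) epoch is reached
```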
2.5. Benefits for image processing
The structure of a CNN is exactly what is needed to solve image processing problems because it can be trained to learn hierarchical representations of visual features. The convolutional layers detect local features, and the pooling layers offer dimensionality reduction and translation invariance. The dense layers then combine these features to enable accurate prediction. The use of batch normalization and dropout improves training efficiency and generalization, making the model robust to variation in the image data.
3. RESULTS AND DISCUSSION
3.1. Setup and environment
The research made use of the following resources and specifications. The experiment was conducted using an Intel(R) Core(TM) i5-8265U CPU running at 1.60 GHz (up to 1.80 GHz) with 8 GB of RAM. The device is equipped with Intel(R) UHD Graphics 620 and an NVIDIA GeForce MX110 GPU, a Toshiba MQ04ABF100 HDD, and the Windows 11 Home operating system. The program was executed using Jupyter Notebook 6.4.12 on the Anaconda platform version 2.3.
3.2. Experiment results
The 'BonoNet' model was tested with 10% of the images from the dataset. Across the 171 classes of image classification, the model acquired around 90.01% training accuracy and 89.99% validation accuracy. The model also obtained 90.01% precision on the training set and 90.01% on the validation set. Overall, it provided 90.01% accuracy in recognizing bangla compound characters. Table 2 shows the results achieved in classification.

The evaluation criteria for the model depend on its performance on the validation dataset, which contains 24,908 samples. The model's precision, recall, and F1-score are all 0.90, indicating 90% accuracy in correctly identifying the class in its predictions (precision), in the actual instances present (recall), and in overall performance considering both precision and recall (F1-score). This shows that the model is reliable and effective in accurately predicting the correct classes for the validation data.
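The three metrics discussed above are simple functions of per-class counts; under the standard definitions (an assumption here, since the paper reports scikit-learn-style output), they reduce to:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard per-class metrics from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp)                         # correct among predicted
    recall = tp / (tp + fn)                            # correct among actual
    f1 = 2 * precision * recall / (precision + recall) # harmonic mean
    return precision, recall, f1

# Illustrative counts only (not from the paper): 900 of 1,000 samples of a
# class are recovered, with 100 spurious predictions of that class.
p, r, f1 = precision_recall_f1(tp=900, fp=100, fn=100)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.9 0.9
```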
Figure 7 shows the training and validation accuracy of the 'BonoNet' model for identifying the bangla compound characters.
Table 2. Detailed metrics of the BonoNet model
Class          Precision   Recall   F1-score   Support
class 0        0.89        0.89     0.89       1169
class 1        0.90        0.90     0.90       1170
...            ...         ...      ...        ...
class 170      0.89        0.89     0.89       1169
micro avg      0.89        0.90     0.90       199803
macro avg      0.90        0.89     0.89       199803
weighted avg   0.90        0.90     0.90       199803
Figure 7. BonoNet model training and validation accuracy and loss
The charts display how well the model performed over 35 epochs. The model's accuracy increases as it trains, as shown in the left plot. Both training and validation accuracy begin at a low level and quickly improve, reaching a plateau of about 90%, indicating improved predictive abilities of the model. The model's loss, which is a measure of error, is displayed in the plot on the right. Both training and validation loss start quite high and drop off significantly until both stabilize at a low level. This shows that the model is learning effectively, with errors decreasing as it undergoes training.

In general, the data indicates that the model performs uniformly well on both the training and validation sets, achieving high accuracy with minimal errors. The 'BonoNet' model successfully classified compound characters with intricate structures, leading to lower errors compared to simple and numeral characters. Table 3 displays the comparison between the accuracy and class categorization of the 'BonoNet' model and current models.
Table 3. Comparison with existing and proposed models
Name                         Number of classes/images   Accuracy (%)
Chakraborty and Paul [10]    300,000                    89.20
Hasan et al. [12]            171                        81.83
Kibria et al. [14]           171                        85.91
Pramanik and Bag [21]        171                        88.74
Saha et al. [17]             171                        73.3
Proposed model (BonoNet)     171                        90.01
Here, Table 3 illustrates the accuracy of various models in identifying bangla compound characters. In their study, Chakraborty and Paul [10] obtained 89.20% accuracy using a vast dataset containing 300,000 images. Hasan et al. [12], Kibria et al. [14], Pramanik and Bag [21], and Saha et al. [17] utilized datasets comprising 171 classes and attained accuracies of 81.83%, 85.91%, 88.74%, and 73.3%, respectively. The proposed BonoNet model likewise utilized a dataset of 171 categories and reached an accuracy of 90.01%, the highest among all models. This indicates that the BonoNet model is more accurate than the other models for this particular task, surpassing various models to achieve improved results in recognizing compound characters.
4. CONCLUSION
Recently, CNNs have gained much notice due to their advanced ability to categorize images effectively, and the approach remains consistently relevant. The 'BonoNet' model, developed with a CNN, outperformed prior models in accurately recognizing bangla compound characters, achieving strong recognition accuracy for the accurate identification of bangla compound characters. Conclusions were verified against the generated graphs: accuracy and loss curves were produced for each training cycle. The proposed model achieved a 90.01% level of accuracy. The model's accuracy could be enhanced in the future in several ways. Using more advanced and operational devices than our training device can improve the accuracy of the proposed model, and the training potential of the dataset will grow as time is saved. Increasing the size of the training and validation datasets could also potentially improve results. Alternatively, the model can be trained using larger image input sizes to potentially improve results. The suggested method applies only to individual bangla compound characters. In the future, we aim to combine simple and complex bangla characters to recognize a complete bangla word within a sentence.
ACKNOWLEDGMENTS
We would like to thank all the authors for their contribution. Daffodil International University has provided great support by providing the environment to do this research.
FUNDING INFORMATION
No financial support was received for the completion of this study.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author (roles: C, M, So, Va, Fo, I, R, D, O, E, Vi, Su, P, Fu)
Kazi Rifat Ahmed: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Nusrat Jahan: ✓ ✓ ✓ ✓ ✓
Adiba Masud: ✓ ✓ ✓ ✓ ✓ ✓
Nusrat Tasnim: ✓ ✓ ✓ ✓ ✓ ✓
Sazia Sharmin: ✓ ✓ ✓ ✓ ✓ ✓
Nusrat Jahan Mim: ✓ ✓ ✓ ✓ ✓ ✓
Imran Mahmud: ✓ ✓ ✓ ✓

C: Conceptualization, M: Methodology, So: Software, Va: Validation, Fo: Formal Analysis, I: Investigation, R: Resources, D: Data Curation, O: Writing - Original Draft, E: Writing - Review & Editing, Vi: Visualization, Su: Supervision, P: Project Administration, Fu: Funding Acquisition
CONFLICT OF INTEREST STATEMENT
The authors affirm that this study was conducted without any conflicting interests.
DATA AVAILABILITY
The data supporting this research are directly available on Kaggle via https://www.kaggle.com/datasets/awmium/handwritten-bangla-characterdataset-aaibangla, originally published by the dataset authors in association with the paper available at https://doi.org/10.1109/ICBSLP47725.2019.201481. The dataset, titled "Handwritten bangla character dataset (AI-Bangla)", was used under the terms specified by its public release.
REFERENCES
[1] M. A. Jishan, K. R. Mahmud, A. K. Al Azad, M. R. A. Rashid, B. Paul, and M. S. Alam, "Bangla language textual image description by hybrid neural network model," Indonesian Journal of Electrical Engineering and Computer Science, vol. 21, no. 2, pp. 757–767, 2020, doi: 10.11591/ijeecs.v21.i2.pp757-767.
[2] M. G. Hussain, B. Sultana, M. Rahman, and M. R. Hasan, "Comparison analysis of Bangla news articles classification using support vector machine and logistic regression," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 21, no. 3, pp. 584–591, 2023, doi: 10.12928/TELKOMNIKA.v21i3.23416.
[3] A. Hasan, M. H. Jobayer, M. A. A. M. Pias, T. Alam, and R. Khan, "Bangla sign language recognition with multimodal deep learning fusion," Engineering Reports, vol. 7, no. 4, 2025, doi: 10.1002/eng2.70139.
[4] M. Kabir, O. B. Mahfuz, S. R. Raiyan, H. Mahmud, and M. K. Hasan, "BanglaBook: A large-scale bangla dataset for sentiment analysis from book reviews," arXiv-Computer Science, 2023, doi: 10.48550/arXiv.2305.06595.
[5] S. Ahmed, F. Tabsun, A. S. Reyadh, A. I. Shafi, and F. M. Shah, "Bengali handwritten alphabet recognition using deep convolutional neural network," 5th International Conference on Computer, Communication, Chemical, Materials and Electronic Engineering (IC4ME2), 2019, doi: 10.1109/IC4ME247184.2019.9036572.
[6] A. Ashiquzzaman, A. K. Tushar, S. Dutta, and F. Mohsin, "An efficient method for improving classification accuracy of handwritten Bangla compound characters using DCNN with dropout and ELU," 2017 3rd IEEE International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), pp. 147–152, 2017, doi: 10.1109/ICRCICN.2017.8234497.
[7] M. A. Azad, H. S. Singha, and M. M. H. Nahid, "Bangla handwritten character recognition using deep convolutional autoencoder neural network," 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), 2020, doi: 10.1109/ICAICT51780.2020.9333472.
[8] A. H. Uddin, J. Khatun, M. A. Meghna, and P. Mahmud, "Bangla handwritten digit recognition using RNN-CNN hybrid approach," 2022 25th International Conference on Computer and Information Technology (ICCIT), pp. 288–293, 2022, doi: 10.1109/ICCIT57492.2022.10055089.
[9] H. Begum, A. Rad, and M. M. Islam, "Recognition of bangla handwritten characters using feature combinations," 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), 2018, doi: 10.1109/UPCON.2018.8597076.
[10] S. Chakraborty and S. Paul, "Bengali handwritten character transformation: basic to compound and compound to basic using convolutional neural network," International Conference on Robotics, Electrical and Signal Processing Techniques, pp. 142–146, 2021, doi: 10.1109/ICREST51555.2021.9331247.
[11] R. R. Chowdhury, M. S. Hossain, R. U. Islam, K. Andersson, and S. Hossain, "Bangla handwritten character recognition using convolutional neural network with data augmentation," 2019 Joint 8th International Conference on Informatics, Electronics and Vision (ICIEV) and 2019 3rd International Conference on Imaging, Vision and Pattern Recognition (icIVPR), pp. 318–323, 2019, doi: 10.1109/ICIEV.2019.8858545.
[12] M. M. Hasan, M. M. Abir, M. Ibrahim, M. Sayem, and S. Abdullah, "AIBangla: A benchmark dataset for isolated bangla handwritten basic and compound character recognition," 2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019, doi: 10.1109/ICBSLP47725.2019.201481.
[13] M. N. Hasan, R. I. Sultan, and M. Kasedullah, "An automated system for recognizing isolated handwritten bangla characters using deep convolutional neural network," ISCAIE 2021 - IEEE 11th Symposium on Computer Applications and Industrial Electronics (ISCAIE), pp. 13–18, 2021, doi: 10.1109/ISCAIE51753.2021.9431799.
[14] M. R. Kibria, A. Ahmed, Z. Firdawsi, and M. A. Yousuf, "Bangla compound character recognition using support vector machine (SVM) on advanced feature sets," 2020 IEEE Region 10 Symposium (TENSYMP), pp. 965–968, 2020, doi: 10.1109/TENSYMP50017.2020.9230609.
[15] M. M. Khan, M. S. Uddin, M. Z. Parvez, and L. Nahar, "A squeeze and excitation ResNeXt-based deep learning model for Bangla handwritten compound character recognition," Journal of King Saud University - Computer and Information Sciences, vol. 34, no. 6, pp. 3356–3364, 2022, doi: 10.1016/j.jksuci.2021.01.021.
[16] P. Mukherjee, S. Sen, K. Roy, and R. Sarkar, "Recognition of online handwritten Bangla characters using supervised and unsupervised learning approaches," International Journal of Computer Vision and Image Processing (IJCVIP), vol. 10, no. 3, pp. 18–30, 2020, doi: 10.4018/ijcvip.2020070102.
[17] C. Saha, R. H. Faisal, and M. M. Rahman, "Bangla handwritten basic character recognition using deep convolutional neural network," 2019 Joint 8th International Conference on Informatics, Electronics and Vision (ICIEV) and 3rd International Conference on Imaging, Vision and Pattern Recognition (icIVPR), pp. 190–195, 2019, doi: 10.1109/ICIEV.2019.8858575.
[18] C. Saha, R. H. Faisal, and M. M. Rahman, "Bangla handwritten character recognition using local binary pattern and its variants," 2018 International Conference on Innovations in Science, Engineering and Technology (ICISET), pp. 236–241, 2018, doi: 10.1109/ICISET.2018.8745645.
[19] N. Sarika, N. Sirisala, and M. S. Velpuru, "CNN based optical character recognition and applications," Proceedings of the 6th International Conference on Inventive Computation Technologies (ICICT), pp. 666–672, 2021, doi: 10.1109/ICICT50816.2021.9358735.
[20] K. K. Rabbi, A. Hossain, P. Dev, A. Sadman, D. Z. Karim, and A. A. Rasel, "KDANet: Handwritten character recognition for Bangla language using deep learning," 2022 25th International Conference on Computer and Information Technology (ICCIT), pp. 651–656, 2022, doi: 10.1109/ICCIT57492.2022.10054708.
[21] R. Pramanik and S. Bag, "Shape decomposition-based handwritten compound character recognition for Bangla OCR," Journal of Visual Communication and Image Representation, vol. 50, pp. 123–134, Jan. 2018, doi: 10.1016/j.jvcir.2017.11.016.
[22] N. Koiso, Y. Takemoto, Y. Ishikawa, and M. Takata, "Proposed method of acquiring train data for early-modern Japanese printed character recognizers," Journal of Supercomputing, vol. 81, no. 6, 2025, doi: 10.1007/s11227-024-06866-4.
[23] F. M. Rusli, K. A. Adhiguna, and H. Irawan, "Indonesian ID card extractor using optical character recognition and natural language post-processing," 2021 9th International Conference on Information and Communication Technology (ICoICT), pp. 621–626, 2021, doi: 10.1109/ICoICT52021.2021.9527510.
[24] H. Moussaoui, N. E. Akkad, and M. Benslimane, "License plate text recognition using deep learning, NLP, and image processing techniques," Statistics, Optimization and Information Computing, vol. 12, no. 3, pp. 685–696, 2024, doi: 10.19139/SOIC-2310-5070-1966.
[25] S. Rajendran, M. A. Kumar, R. Rajalakshmi, V. Dhanalakshmi, P. Balasubramanian, and K. P. Soman, "Tamil NLP technologies: challenges, state of the art, trends and future scope," Communications in Computer and Information Science, pp. 73–98, 2023, doi: 10.1007/978-3-031-33231-9_6.
BIOGRAPHIES OF AUTHORS
Kazi Rifat Ahmed completed his B.Sc. in Software Engineering from Daffodil International University and his M.Sc. from the Institute of Information Technology, Jahangirnagar University. He is currently working as a lecturer in the Department of Software Engineering, Daffodil International University. His research interests are machine learning, deep learning, NLP, and computer vision. He has published in high-impact journals and conferences, aiming to advance AI-driven solutions in healthcare and security. He can be contacted at email: rifat.swe@diu.edu.bd.
Nusrat Jahan is working as an assistant professor and head of the Department of Information Technology & Management at Daffodil International University, Bangladesh. She completed her M.Sc. and B.Sc. in Information Technology from the Institute of Information Technology, Jahangirnagar University. She is pursuing her Ph.D. in the Department of Computer Engineering, Universiti Malaysia Perlis (UniMAP). She is interested in technology management, computer networks, machine learning, and artificial intelligence. She can be contacted at email: nusrat.swe@diu.edu.bd.
Adiba Masud is currently pursuing her Ph.D. in the Department of Computer Science, University of Texas at San Antonio, Texas, USA. She completed her B.Sc. and M.Sc. from the Institute of Information Technology, Jahangirnagar University. She is currently on study leave as a lecturer in the Department of Software Engineering, Daffodil International University. Her research interests are machine learning, deep learning, NLP, and computer vision. She can be contacted at email: adiba.swe@diu.edu.bd.
Nusrat Tasnim completed her B.Sc. and M.Sc. from the Institute of Information Technology, Jahangirnagar University. She is currently working as a lecturer in the Department of Information and Communication Technology, Bangladesh University of Professionals. She was formerly a lecturer in the Department of Software Engineering, Daffodil International University. Her research interests are machine learning, deep learning, NLP, and computer vision. She can be contacted at email: nusrattasnim17@gmail.com.
Sazia Sharmin completed her B.Sc. and M.Sc. from the Institute of Information Technology, Jahangirnagar University. She is currently working as a lecturer in the Department of Computer Science at the American International University, and she was previously a lecturer in the Department of Software Engineering at Daffodil International University. Her research interests are machine learning, deep learning, NLP, and computer vision. She can be contacted at email: sazia.sharmin@aiub.edu.
Nusrat Jahan Mim completed her B.Sc. and M.Sc. from the Department of Software Engineering, Daffodil International University. She is currently working as a lecturer in the Department of Software Engineering, Daffodil International University. Her research interests are machine learning, deep learning, NLP, and computer vision. She can be contacted at email: nusratjahan.swe@diu.edu.bd.
Imran Mahmud received the master's degree in software engineering from the University of Hertfordshire, U.K., in 2008, and the Ph.D. degree in technology management from Universiti Sains Malaysia, in 2017. He is currently the head and a professor with the Department of Software Engineering, Daffodil International University, Bangladesh. He is also a visiting professor with the Graduate School of Business, Universiti Sains Malaysia, where he was previously a senior lecturer. He was a visiting lecturer with the Institute of Technology, Bandung, Indonesia, and the Hong Kong Management Association, Hong Kong. He has received several awards, including the Hall of Fame and Prestigious Publication Award from Universiti Sains Malaysia, a Young Researcher award from Kasetsart University, Thailand, and a Young Scientist in Technology Management award from the Venus International Foundation, India. He can be contacted at email: imranmahmud@daffodilvarsity.edu.bd.