IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 5, October 2025, pp. 3656–3666
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i5.pp3656-3666
Inverse-Mel scale spectrograms for high-frequency feature extraction and audio anomaly detection in industrial machines
Kader Basha Tajuddin Shaikh¹, Naresh P. Jawarkar², Vasif Ahmed³, Nadir Nizar Ali Charniya⁴
¹Department of Automation and Robotics Engineering, Vivekanand Education Society's Institute of Technology, Mumbai, India
²Department of Electrical and Power Engineering, Government College of Engineering, Amravati, India
³Department of Artificial Intelligence and Data Science, Babasaheb Naik College of Engineering, Pusad, India
⁴Department of Electronics and Telecommunication Engineering, Vivekanand Education Society's Institute of Technology, Mumbai, India
Article Info
Article history:
Received Mar 19, 2025
Revised Jun 30, 2025
Accepted Jul 13, 2025
Keywords:
Audio anomaly detection
Domain generalization
High-frequency feature extraction
Inverse-Mel scale
Machine health monitoring
ABSTRACT
Unlike the sounds that human hearing is tuned to, the energies in industrial machine sounds (IMS) vary across a wide range of frequencies. Mel scales, which were developed to model human auditory perception, fail to capture the complete information present in IMS. To improve performance, we propose using an inverse-Mel scale, along with the concatenation and combination of Mel and inverse-Mel scale based spectrograms, as feature vectors for audio anomaly detection (AAD) in industrial machines. The Librosa Python package and the DCASE 2022 Challenge Task 2 baseline system are adapted to construct inverse-Mel scale spectrograms. Experiments are conducted on the malfunctioning industrial machine investigation and inspection for domain generalization (MIMII DG) datasets. In the target domain area under the curve (AUC) score, systems based on the inverse-Mel scale achieve a maximum improvement of up to 37% for the bearing machine and an average improvement of about 9% across all machines in the MIMII DG datasets. The proposed features also enhance domain generalization (DG), overcoming the effects of environmental and operational domain shifts caused by variations in recording setup, load, background noise, and operational patterns. The official challenge evaluator assessed the proposed system on the evaluation datasets, ranking it three positions higher than the baseline system.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Kader Basha Tajuddin Shaikh
Department of Automation and Robotics Engineering, Vivekanand Education Society's Institute of Technology
Mumbai 400074, India
Email: kader.shaikh@ves.ac.in
1. INTRODUCTION
Industrial machine sounds (IMS) convey considerable information about the status of a machine [1]–[3]. Through astute listening and careful observation, an operator can quickly assess the health of the machine. An experienced operator can easily identify faults that may arise in an otherwise healthy working machine, and this expertise enables the anticipation and prevention of potential crises. Audio anomaly detection (AAD) systems for industrial machines mimic the behavior of operators to identify machine health conditions and operational anomalies. AAD for fault diagnosis and prognosis in industrial machines is widely researched and has been one of several tasks in every edition of the DCASE challenge since 2020 [4]–[7].
Journal homepage: http://ijai.iaescore.com
Several researchers have focused on the analysis of high-frequency regions in IMS. Liu et al. [8] explored fault analysis in belt conveyor idlers. The effective distinguishing frequency bands for various fault conditions, caused by damaged cages, raceway slots, and large pits in the inner/outer races of the rolling element, were found to be concentrated in the medium- to high-frequency (6–20 kHz) range. Guochao et al. [3] examined the audible sounds produced by milling machines and found that the sound signals spanned the full audible range. The authors identified low-frequency sound signals generated by tool holder vibrations, mid-range frequency sounds from metal deformation processes, and high-frequency sounds from friction mechanisms. Liu et al. [9] proposed a lightweight fault diagnosis network called MPNet for identifying bearing faults in rotating machinery. The authors pointed out that Mel-frequency cepstral coefficients (MFCC) are sensitive only to low-frequency information and instead used linear spectrograms constructed with the short-time Fourier transform as features. Liu et al. [10] observed high-frequency components in the audio signals of belt conveyors, specifically in the range of 1 to 5 kHz; the impacts and vibrations from defective rollers contribute to the generation of these high-frequency audio signals. Zhou et al. [11] noted that acoustic signals generated by bulge conditions in tire endurance tests conducted on a drum testing machine produce high-energy peaks in the high-frequency regions. Zhao et al. [12] noted that features extracted from high-frequency regions of vibration signals are more effective in characterizing faults in power end bearings. Ma et al. [13] proposed the fusion of MFCCs, inverted Mel-scale frequency cepstrum coefficients (IMFCCs), Gammatone frequency cepstral coefficients (GFCCs), and linear prediction cepstral coefficients (LPCCs) to create a hybrid cepstral feature known as Mel-inverted-Gammatone-linear cepstral coefficients (MIGLCCs). This feature encapsulated the individual advantages of each constituent feature. Their findings indicated that the fusion of MFCCs and IMFCCs yielded the best results among all dual-feature combinations tested.
The above studies emphasize the importance of the energy present in higher-frequency regions, and the work in [13] highlights the benefits of the inverse-Mel scale frequency warping technique. However, the application of inverse-Mel scale based spectrograms to AAD in industrial machines has not been considered.
Based on this investigation, this research constructs inverse-Mel scale spectrograms, as well as combinations of Mel and inverse-Mel scale spectrograms, as front-end features for extracting the energy distribution across the complete range of frequencies in IMS. The spectrograms are constructed by adapting the Librosa Python library. These spectrograms serve as input to an autoencoder-based AAD system designed to identify anomalous operations in industrial machines. Experiments conducted on the malfunctioning industrial machine investigation and inspection for domain generalization (MIMII DG) dataset [4] demonstrate that AAD systems using inverse-Mel scale spectrograms perform better than those using Mel scale spectrograms.
This work is motivated by the DCASE Challenge 2022 Task 2 [4]–[7], which focuses on AAD and domain generalization (DG) techniques in industrial machines. A total of 31 teams submitted 81 entries to the challenge. Most participants used Mel scale based acoustic features such as Mel energies, log-Mel energies, MFCC, Mel spectrograms, and log-Mel spectrograms in their systems [7]. This research proposes the use of inverse-Mel scale based acoustic features for AAD and DG on the challenge datasets, and it is the first of its kind to propose the inverse-Mel scale for DCASE Challenge 2022 Task 2. Compared with the published challenge scores [7], the results presented in this research correspond to a relative rank of 21st, which is three positions higher than the official ranking of the baseline system.
The rest of the paper is organized as follows. Section 2 describes the materials and methods employed in this experimentation, including the methods for constructing spectrograms, the MIMII DG dataset, and the experimental setup along with the evaluation metrics for the DCASE Challenge 2022 Task 2. Section 3 presents and discusses the results, including the performance scores and improvements observed on both the development and evaluation datasets. Section 4 summarizes the conclusions drawn from this research.
2. MATERIALS AND METHODS
2.1. Sound database of industrial machines
MIMII DG [4], a public database shared as the development and evaluation dataset for Task 2 of the DCASE Challenge 2022 [7], is used in this work. This dataset includes normal and anomalous operating sounds from five different industrial machines and is designed for the development and evaluation of AAD and DG techniques in industrial machines. The dataset is divided into source and target domain data. The source domain data contains only the normal and anomalous operating sounds of the machine under test, whereas the target domain data is generated by synthetically infusing these sounds with operational and environmental domain shifts commonly encountered in industrial setups. The source domain data is employed for evaluating AAD, while the target domain data is used for evaluating DG.
2.2. Construction of inverse-Mel scale spectrograms
2.2.1. Equations of inverse-Mel scale
The two commonly used implementations for transformation between linear and Mel scale frequencies are the hidden Markov model toolkit 3 (HTK) [14] and Slaney [15]. Slaney implementations apply a linear formula for frequencies up to 1 kHz and a logarithmic or anti-logarithmic formula for conversions above 1 kHz. HTK implementations follow a logarithmic or anti-logarithmic formula for the entire range of frequencies. HTK implementations are used in this work.
The relationship between the linear frequency scale ($f_{Hz}$) and the Mel frequency scale ($f_{mel}$) is given in (1) and (2).

$$f_{mel} = 2595 \cdot \log_{10}\left(1 + \frac{f_{Hz}}{700}\right) \tag{1}$$

$$f_{Hz} = 700 \cdot \left(10^{f_{mel}/2595} - 1\right) \tag{2}$$
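As a quick numerical check of (1) and (2), the following minimal Python sketch implements the HTK-style conversions with the standard `math` module. The function names are illustrative and are not part of Librosa; for the same inputs the results should agree with `librosa.hz_to_mel(f, htk=True)` and `librosa.mel_to_hz(m, htk=True)`. With these formulas, 1 kHz maps to roughly 1000 mel, and equal steps in Hz shrink on the Mel axis as frequency rises.

```python
import math


def hz_to_mel_htk(f_hz: float) -> float:
    """Eq. (1): HTK-style Mel value of a linear frequency in Hz."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz_htk(f_mel: float) -> float:
    """Eq. (2): inverse of Eq. (1), from Mel back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


if __name__ == "__main__":
    for f in (0.0, 1000.0, 4000.0, 8000.0):
        m = hz_to_mel_htk(f)
        # The round trip through Eq. (2) recovers the input frequency.
        print(f"{f:7.1f} Hz -> {m:8.2f} mel -> {mel_to_hz_htk(m):7.1f} Hz")
```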
Several researchers [13], [16]–[22] defined the inverse-Mel scale as the complement of the Mel scale. The authors suggested flipping the original Mel filterbank around its midpoint to derive the inverse-Mel filterbank. Mathematical relationships between the linear frequency scale ($f_{Hz}$) and the inverse-Mel frequency scale ($f_{iMel}$) have been proposed by Chakroborty [23], [24], Sharma [25], Latha [16], Lalitha [18], and Ma [13]. Latha [16] and Lalitha [18] introduced (3), Ma [13] proposed (4), and Chakroborty [23], [24] and Sharma [25] presented (5).
$$f_{iMel} = 2146.1 - 2595 \cdot \log_{10}\left(1 + \frac{4000 - f_{Hz}}{700}\right) \tag{3}$$

$$f_{iMel} = 2146.1 - 1127 \cdot \log_{10}\left(1 + \frac{4000 - f_{Hz}}{700}\right) \tag{4}$$

$$f_{iMel} = 2195.286 - 2595 \cdot \log_{10}\left(1 + \frac{4031.25 - f_{Hz}}{700}\right) \tag{5}$$
Equation (5) is employed in this research. In the works of Chakroborty [23], [24] and Sharma [25], the sampling frequency is 8 kHz, whereas the sampling frequency of the MIMII DG [4] database is 16 kHz, which doubles the Nyquist frequency from 4 kHz to 8 kHz. Hence, the constant terms are changed from 2195.286 to 2844.06 and from 4031.25 to 8031.25. The modified equations used in this work to convert between the linear frequency scale ($f_{Hz}$) and the inverse-Mel frequency scale ($f_{iMel}$) are presented in (6) and (7).
$$f_{iMel} = 2844.06 - 2595 \cdot \log_{10}\left(1 + \frac{8031.25 - f_{Hz}}{700}\right) \tag{6}$$

$$f_{Hz} = 8031.25 - 700 \cdot \left(10^{(2844.06 - f_{iMel})/2595} - 1\right) \tag{7}$$
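Equations (6) and (7) can be checked the same way. The sketch below, again using only `math` and illustrative function names, evaluates the 16 kHz inverse-Mel mapping. Evaluating (6) at 0 Hz gives a value within about 0.01 of zero, and at 8031.25 Hz it returns exactly the constant 2844.06, so the scale starts near zero just as the Mel scale does. Because the logarithm is applied to the distance from the top of the band, the lower half of the band (0 to 4 kHz) occupies only about a quarter of the inverse-Mel range, which mirrors the behavior of the Mel scale.

```python
import math

# Constants of Eqs. (6)-(7), adapted in the text for 16 kHz audio (Nyquist frequency 8 kHz).
IMEL_CONST = 2844.06
HZ_CONST = 8031.25


def hz_to_imel(f_hz: float) -> float:
    """Eq. (6): inverse-Mel value of a linear frequency in Hz (valid for f_hz < HZ_CONST + 700)."""
    return IMEL_CONST - 2595.0 * math.log10(1.0 + (HZ_CONST - f_hz) / 700.0)


def imel_to_hz(f_imel: float) -> float:
    """Eq. (7): inverse of Eq. (6), from inverse-Mel back to Hz."""
    return HZ_CONST - 700.0 * (10.0 ** ((IMEL_CONST - f_imel) / 2595.0) - 1.0)


if __name__ == "__main__":
    top = hz_to_imel(8000.0)
    for f in (0.0, 2000.0, 4000.0, 6000.0, 8000.0):
        m = hz_to_imel(f)
        # Share of the 0-8 kHz inverse-Mel range lying below frequency f.
        print(f"{f:7.1f} Hz -> {m:8.2f} iMel ({m / top:5.1%} of the 0-8 kHz range)")
```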
Figure 1 shows the center frequencies of all filters on the Mel and inverse-Mel scales. Center frequencies represent the midpoints of the frequency bands used in the Mel and inverse-Mel transformations. The Mel scale follows a logarithmic scale, whereas the inverse-Mel scale follows an anti-logarithmic scale.
Figure 1. Center frequencies in Mel and inverse-Mel scales
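The contrast shown in Figure 1 can be reproduced numerically. The sketch below assumes, as in Librosa's Mel filterbank, that the n + 2 edge points of n triangular filters are spaced uniformly on the warped scale between 0 and 8 kHz, and it takes the middle n points as the center frequencies. The choice of 128 filters is an assumption made for illustration. With these settings, the Mel scale of (1) and (2) places most centers below 4 kHz, while the inverse-Mel scale of (6) and (7) places most of them above 4 kHz.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # Eq. (1)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # Eq. (2)


def hz_to_imel(f):
    return 2844.06 - 2595.0 * math.log10(1.0 + (8031.25 - f) / 700.0)  # Eq. (6)


def imel_to_hz(m):
    return 8031.25 - 700.0 * (10.0 ** ((2844.06 - m) / 2595.0) - 1.0)  # Eq. (7)


def center_frequencies(n_filters, to_warped, from_warped, fmin=0.0, fmax=8000.0):
    """Centers (Hz) of n triangular filters whose n + 2 edges are uniform on a warped scale."""
    lo, hi = to_warped(fmin), to_warped(fmax)
    step = (hi - lo) / (n_filters + 1)
    # Edge points 0 and n + 1 are the outer filter edges; points 1..n are the centers.
    return [from_warped(lo + k * step) for k in range(1, n_filters + 1)]


if __name__ == "__main__":
    n = 128  # assumed number of filters
    mel_centers = center_frequencies(n, hz_to_mel, mel_to_hz)
    imel_centers = center_frequencies(n, hz_to_imel, imel_to_hz)
    print("centers below 4 kHz, Mel:", sum(c < 4000.0 for c in mel_centers))
    print("centers below 4 kHz, inverse-Mel:", sum(c < 4000.0 for c in imel_centers))
```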
2.2.2. Variants of spectrograms
The following types of spectrograms are constructed in this work.
– Mel scale spectrogram: constructed using the standard equations of the Mel scale. The functions for constructing Mel spectrograms defined in the Librosa Python package are employed.
– Inverse-Mel scale spectrogram: constructed using the inverse-Mel scale equations described in section 2.2.1. The adaptation made to the Librosa Python package for constructing inverse-Mel scale spectrograms is described in section 2.2.3.
– Concatenated spectrogram: constructed by vertically stacking Mel and inverse-Mel spectrograms. The Mel spectrogram captures the lower frequencies, from 0 to 4 kHz, while the inverse-Mel spectrogram captures the higher frequencies, from 4 to 8 kHz.
– Combinational spectrograms: constructed by aggregating Mel and inverse-Mel spectrograms across the entire frequency range. The value at a specific frequency is determined by applying maximum, minimum, or average pooling to the Mel and inverse-Mel values. Consequently, this work develops three types of combinational spectrograms: maximum, minimum, and average value spectrograms.
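The four variants reduce to simple array operations. The following NumPy sketch is one possible reading of the descriptions above, not the authors' `common.py`. It assumes that both spectrograms are arrays of shape (bands, frames) with bands in ascending frequency order and that the center frequency of each band is known. The hypothetical `concatenate` helper keeps the Mel bands centered below 4 kHz and the inverse-Mel bands centered at or above 4 kHz.

```python
import numpy as np


def combine(mel_spec: np.ndarray, imel_spec: np.ndarray, mode: str) -> np.ndarray:
    """Pixel-wise maximum, minimum, or average of two spectrograms of equal shape."""
    if mel_spec.shape != imel_spec.shape:
        raise ValueError("Mel and inverse-Mel spectrograms must have the same shape")
    if mode == "max":
        return np.maximum(mel_spec, imel_spec)
    if mode == "min":
        return np.minimum(mel_spec, imel_spec)
    if mode == "average":
        return (mel_spec + imel_spec) / 2.0
    raise ValueError(f"unknown mode: {mode!r}")


def concatenate(mel_spec, mel_centers, imel_spec, imel_centers, split_hz=4000.0):
    """Stack low-frequency Mel bands under high-frequency inverse-Mel bands (rows = bands)."""
    low_rows = np.asarray(mel_centers) < split_hz      # Mel bands below the split frequency
    high_rows = np.asarray(imel_centers) >= split_hz   # inverse-Mel bands at or above it
    return np.vstack([mel_spec[low_rows], imel_spec[high_rows]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.random((4, 3))
    imel = rng.random((4, 3))
    for mode in ("max", "min", "average"):
        print(mode, combine(mel, imel, mode).shape)
    print(concatenate(mel, [300, 1200, 3500, 6000], imel, [2500, 5000, 7000, 7900]).shape)
```

In this sketch, the height of the concatenated spectrogram depends on how many filters fall on each side of the split frequency, whereas the combinational spectrograms keep the shape of their inputs.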
2.2.3. Adaptations in Librosa package and DCASE 2022 baseline system for construction of inverse-Mel scale spectrograms
Adaptations have been made in several source files of the Librosa package [26] for the construction and presentation of inverse-Mel spectrograms. Two additional parameters, “isInverseMel” and “isHTK”, are included as arguments in the melspectrogram, mel, and mel_frequencies functions in the ‘filters.py’ and ‘spectral.py’ files of the Librosa package. The “isInverseMel” parameter toggles between the Mel and inverse-Mel scale formulas, while the “isHTK” parameter selects either the Slaney or the HTK implementation. The concatenation and combination of Mel and inverse-Mel spectrograms are performed in the ‘common.py’ file of the DCASE 2022 baseline system. The adapted source files are available for download under the GNU General Public License at https://github.com/KaderShaikhVESIT/inverse-Mel.
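For readers who prefer not to patch Librosa, the following NumPy sketch shows one way an inverse-Mel filterbank could be built. It is not the authors' adapted code. The function names are hypothetical, the triangular-filter construction follows the edge-point scheme used by Librosa's Mel filters but omits the area normalization that Librosa applies to those filters by default, and the parameters (16 kHz sampling rate, 1024-point FFT, 128 filters) are assumptions. Multiplying the filterbank by a power spectrogram with 1 + n_fft/2 frequency rows gives an inverse-Mel spectrogram.

```python
import numpy as np


def hz_to_imel(f_hz):
    """Eq. (6), vectorised."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 2844.06 - 2595.0 * np.log10(1.0 + (8031.25 - f_hz) / 700.0)


def imel_to_hz(f_imel):
    """Eq. (7), vectorised."""
    f_imel = np.asarray(f_imel, dtype=float)
    return 8031.25 - 700.0 * (10.0 ** ((2844.06 - f_imel) / 2595.0) - 1.0)


def imel_filterbank(sr=16000, n_fft=1024, n_filters=128, fmin=0.0, fmax=8000.0):
    """Return (weights, centers_hz); weights has shape (n_filters, 1 + n_fft // 2)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, 1 + n_fft // 2)
    # n_filters + 2 edge points, uniformly spaced on the inverse-Mel scale.
    lo, hi = float(hz_to_imel(fmin)), float(hz_to_imel(fmax))
    edges = imel_to_hz(np.linspace(lo, hi, n_filters + 2))
    weights = np.zeros((n_filters, fft_freqs.size))
    for i in range(n_filters):
        lower, center, upper = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lower) / (center - lower)
        falling = (upper - fft_freqs) / (upper - center)
        weights[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at 1
    return weights, edges[1:-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    power_spec = rng.random((513, 10))  # stand-in for |STFT|**2 (513 = 1 + 1024 // 2)
    fb, centers = imel_filterbank()
    imel_spec = fb @ power_spec  # shape (128, 10)
    log_imel = 10.0 * np.log10(imel_spec + 1e-10)  # dB scaling, as is usual for log-Mel features
    print(fb.shape, log_imel.shape, int((centers > 4000.0).sum()))
```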
2.3. Experimental set-up and evaluation metrics
The focus of this work is to introduce the inverse-Mel scale and discuss its implications. Hence, this work uses the baseline system of DCASE 2022 Challenge Task 2 [4], [6], [7] as the detector. The baseline detector is a deep autoencoder. Each 10-second audio clip is converted into a spectrogram that acts as the input feature vector for the autoencoder. The development and evaluation datasets of DCASE Challenge 2022 Task 2 are used for training and testing the detector. The confusion matrix, precision, recall, F1 score, and area under the curve (AUC) are calculated for both source and target domain data, whereas the partial area under the curve (pAUC) is calculated for the combined source and target data. The equations for calculating the AUC and pAUC scores are defined in [4], [6]. System evaluation and ranking are performed with the official evaluator shared by the organizers [27].
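The AUC and pAUC definitions referenced in [4], [6] count correctly ordered pairs of anomaly scores. The following pure-Python sketch implements that pairwise form under two assumptions: anomalous clips should receive higher anomaly scores than normal clips, and the pAUC is restricted to the floor(p × N_normal) highest-scoring normal clips, with p = 0.1 assumed as the default. It also shows the harmonic mean used to summarize the scores in Table 3. Results from the official evaluator [27] may differ slightly, for example in tie handling, or if it relies on scikit-learn's standardized partial AUC (`roc_auc_score` with `max_fpr`).

```python
import math
from statistics import harmonic_mean


def auc(normal_scores, anomaly_scores):
    """Fraction of (anomalous, normal) pairs where the anomalous clip scores strictly higher."""
    hits = sum(1 for a in anomaly_scores for n in normal_scores if a > n)
    return hits / (len(anomaly_scores) * len(normal_scores))


def pauc(normal_scores, anomaly_scores, p=0.1):
    """Pairwise AUC restricted to the floor(p * N_normal) highest-scoring normal clips."""
    k = max(1, math.floor(p * len(normal_scores)))  # keep at least one clip (assumption)
    hardest_normals = sorted(normal_scores, reverse=True)[:k]
    hits = sum(1 for a in anomaly_scores for n in hardest_normals if a > n)
    return hits / (len(anomaly_scores) * k)


if __name__ == "__main__":
    normal = [0.1, 0.2, 0.3, 0.4]  # toy anomaly scores of normal clips
    anomalous = [0.35, 0.5]        # toy anomaly scores of anomalous clips
    a, pa = auc(normal, anomalous), pauc(normal, anomalous, p=0.5)
    print(a, pa, harmonic_mean([a, pa]))
```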
3. RESULTS AND DISCUSSION
3.1. Inferences on all spectrograms
All spectrograms of a typical machine sound recording from the slide rail machine (section_00_source_train_normal_0010_vel_1100.wav) in the MIMII DG dataset are shown in Figure 2. The spectrograms use a blue-white-red (BWR) colormap, in which bright red indicates higher amplitude or activity and blue indicates lower amplitude.
Figure 2(a) presents the spectrogram on a linear frequency scale based on the short-time Fourier transform (STFT); it exhibits bright red spots in both low- and high-frequency regions, suggesting that sound energy is distributed across the entire frequency range. In Figure 2(b), the Mel spectrogram emphasizes the lower-frequency regions while suppressing the higher-frequency regions. Frequencies above 2 kHz are depicted primarily in white and blue, indicating suppression of the high-frequency components. This limitation suggests that the Mel scale spectrogram fails to capture and present the complete information inherent in IMS. In contrast, the inverse-Mel scale spectrogram shown in Figure 2(c) enhances the high-frequency regions, effectively revealing the energy content that is otherwise suppressed in the Mel scale spectrogram.
Energy components above 6 kHz, which are often obscured in Mel spectrograms, are vividly displayed here. The concatenated spectrogram shown in Figure 2(d) merges the Mel and inverse-Mel spectrograms at the midpoint frequency of 4 kHz, capturing prominent characteristics from both. This concatenated spectrogram effectively captures and represents the regions of high amplitude and activity present in both types of spectrograms. Figures 2(e) to 2(g) show pixel-wise combinations of Mel and inverse-Mel spectrograms using average, maximum, and minimum aggregation, respectively. These spectrograms capture the shapes and vivid colors characteristic of both Mel scale and inverse-Mel scale spectrograms, and the intensity of the colors varies with the aggregation formula used in their construction. Thus, the inverse-Mel scale enables a complete representation of the information present in IMS, and the concatenated and combinational spectrograms further support this representation. With these spectrograms, this research is able to delve into unexplored regions of IMS.
Figure 2. Spectrogram representations of a typical slide rail machine sound from the MIMII DG dataset: (a) linear-frequency spectrogram using STFT, (b) Mel scale spectrogram, (c) inverse-Mel scale spectrogram, (d) concatenated Mel and inverse-Mel spectrograms, (e) combined average spectrogram, (f) combined maximum spectrogram, and (g) combined minimum spectrogram
3.2. Inferences on the experiment results
Evaluations are conducted for all machine types, sections, and domains in the MIMII DG development and evaluation datasets [4]. Tables 1 and 2 present the scores and percentage improvements observed on the development datasets, and Tables 3 and 4 present the scores and percentage improvements observed on the evaluation datasets. The source domain AUC, target domain AUC, and pAUC scores for all machine types, sections, and domains in the development datasets are listed in Tables 1(a) and 1(b). Tables 2(a) and 2(b) list the percentage improvements in these scores relative to the results obtained with Mel scale spectrograms.
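The percentages in Table 2 are consistent with the relative change of each score with respect to the corresponding Mel scale score in Table 1. A minimal sketch of that calculation, using two pairs of values taken from Table 1(a), is shown below; small differences in the last digit elsewhere in the tables may come from rounding of the underlying scores.

```python
def pct_improvement(mel_score: float, new_score: float) -> float:
    """Relative change of new_score with respect to the Mel scale score, in percent."""
    return (new_score - mel_score) / mel_score * 100.0


if __name__ == "__main__":
    # Bearing, section 1, target domain AUC (Table 1(a)): Mel 0.5547, inverse-Mel 0.7608
    print(round(pct_improvement(0.5547, 0.7608), 2))  # 37.16, as reported in Table 2(a)
    # Bearing, average target domain AUC (Table 1(a)): Mel 0.57254, inverse-Mel 0.65994
    print(round(pct_improvement(0.57254, 0.65994), 2))  # 15.27, as reported in Table 2(a)
```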
Table 1. AUC and pAUC scores of all machines on development dataset (best values are highlighted) with (a) Mel scale, inverse Mel scale, and combination maximum; and (b) concatenated, combination average, and combination minimum

(a)
| Machine | Section | Mel scale: source AUC | Mel scale: target AUC | Mel scale: pAUC | Inverse-Mel scale: source AUC | Inverse-Mel scale: target AUC | Inverse-Mel scale: pAUC | Combination maximum: source AUC | Combination maximum: target AUC | Combination maximum: pAUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bearing | 0 | 0.5504 | 0.6048 | 0.50737 | 0.5613 | 0.6915 | 0.48922 | 0.5322 | 0.6103 | 0.50316 |
| | 1 | 0.7176 | 0.5547 | 0.54869 | 0.5293 | 0.7608 | 0.4979 | 0.751 | 0.6068 | 0.60395 |
| | 2 | 0.4563 | 0.5581 | 0.52316 | 0.415 | 0.5275 | 0.52764 | 0.4972 | 0.5695 | 0.55132 |
| | Average | 0.57477 | 0.57254 | 0.52641 | 0.50187 | 0.65994 | 0.50492 | 0.59347 | 0.59554 | 0.55281 |
| Fan | 0 | 0.778 | 0.343 | 0.59158 | 0.7338 | 0.3745 | 0.59053 | 0.7969 | 0.3397 | 0.59237 |
| | 1 | 0.7096 | 0.4577 | 0.51843 | 0.6691 | 0.4386 | 0.505 | 0.7721 | 0.4377 | 0.53395 |
| | 2 | 0.7744 | 0.6346 | 0.62764 | 0.7712 | 0.5825 | 0.56606 | 0.8985 | 0.6093 | 0.64369 |
| | Average | 0.754 | 0.47844 | 0.57922 | 0.7247 | 0.4652 | 0.55386 | 0.8225 | 0.46224 | 0.59 |
| Gearbox | 0 | 0.6558 | 0.6555 | 0.61369 | 0.707 | 0.7604 | 0.63869 | 0.6088 | 0.6981 | 0.61079 |
| | 1 | 0.6605 | 0.5803 | 0.535 | 0.6866 | 0.6241 | 0.52737 | 0.6599 | 0.5707 | 0.51369 |
| | 2 | 0.7744 | 0.6623 | 0.61711 | 0.8108 | 0.6928 | 0.66053 | 0.7484 | 0.6589 | 0.6079 |
| | Average | 0.6969 | 0.6327 | 0.5886 | 0.7348 | 0.69244 | 0.60886 | 0.67237 | 0.64257 | 0.57746 |
| Slider | 0 | 0.8068 | 0.5681 | 0.61843 | 0.751 | 0.6088 | 0.68264 | 0.8469 | 0.5944 | 0.61237 |
| | 1 | 0.6841 | 0.4969 | 0.53895 | 0.7755 | 0.5775 | 0.54632 | 0.678 | 0.4657 | 0.54579 |
| | 2 | 0.8709 | 0.3866 | 0.53658 | 0.8809 | 0.4324 | 0.56158 | 0.8838 | 0.3431 | 0.525 |
| | Average | 0.78727 | 0.48387 | 0.56465 | 0.80247 | 0.53957 | 0.59685 | 0.8029 | 0.46774 | 0.56106 |
| Valve | 0 | 0.5408 | 0.5182 | 0.52474 | 0.5991 | 0.5506 | 0.51158 | 0.5195 | 0.504 | 0.51974 |
| | 1 | 0.5257 | 0.5313 | 0.50106 | 0.5808 | 0.5951 | 0.49527 | 0.5388 | 0.5083 | 0.49606 |
| | 2 | 0.5187 | 0.4422 | 0.49395 | 0.5891 | 0.5008 | 0.49711 | 0.5635 | 0.4461 | 0.4879 |
| | Average | 0.5284 | 0.49724 | 0.50658 | 0.58967 | 0.54884 | 0.50132 | 0.5406 | 0.48614 | 0.50123 |
| Average overall | | 0.66827 | 0.53296 | 0.55309 | 0.6707 | 0.5812 | 0.55316 | 0.68637 | 0.53084 | 0.55651 |

(b)
| Machine | Section | Concatenated: source AUC | Concatenated: target AUC | Concatenated: pAUC | Combination average: source AUC | Combination average: target AUC | Combination average: pAUC | Combination minimum: source AUC | Combination minimum: target AUC | Combination minimum: pAUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bearing | 0 | 0.4945 | 0.6248 | 0.49369 | 0.4959 | 0.6138 | 0.5 | 0.5617 | 0.68 | 0.49158 |
| | 1 | 0.6664 | 0.6121 | 0.57369 | 0.7009 | 0.6333 | 0.57685 | 0.5748 | 0.6511 | 0.55632 |
| | 2 | 0.5327 | 0.6153 | 0.59474 | 0.4817 | 0.5834 | 0.49158 | 0.4791 | 0.5662 | 0.48922 |
| | Average | 0.56454 | 0.6294 | 0.55404 | 0.5595 | 0.61017 | 0.52281 | 0.53854 | 0.63244 | 0.51237 |
| Fan | 0 | 0.7862 | 0.3522 | 0.58948 | 0.7808 | 0.3805 | 0.59343 | 0.6314 | 0.4412 | 0.59343 |
| | 1 | 0.6763 | 0.4758 | 0.51527 | 0.6775 | 0.4517 | 0.5129 | 0.6876 | 0.402 | 0.52183 |
| | 2 | 0.7506 | 0.5375 | 0.60974 | 0.7364 | 0.5882 | 0.6 | 0.5619 | 0.6061 | 0.59343 |
| | Average | 0.7377 | 0.45517 | 0.5715 | 0.73157 | 0.47347 | 0.56878 | 0.62697 | 0.4831 | 0.56957 |
| Gearbox | 0 | 0.6615 | 0.6905 | 0.58895 | 0.4991 | 0.5832 | 0.49843 | 0.6484 | 0.7284 | 0.57237 |
| | 1 | 0.66 | 0.5972 | 0.54106 | 0.6162 | 0.5455 | 0.52316 | 0.6709 | 0.5925 | 0.52474 |
| | 2 | 0.7841 | 0.6822 | 0.61685 | 0.7656 | 0.678 | 0.63211 | 0.8228 | 0.6682 | 0.625 |
| | Average | 0.70187 | 0.65664 | 0.58229 | 0.62697 | 0.60224 | 0.55123 | 0.71404 | 0.66304 | 0.57404 |
| Slider | 0 | 0.8127 | 0.5875 | 0.63685 | 0.7914 | 0.5787 | 0.64106 | 0.7855 | 0.6671 | 0.65027 |
| | 1 | 0.7487 | 0.5592 | 0.55869 | 0.7096 | 0.5207 | 0.55316 | 0.7244 | 0.6876 | 0.60922 |
| | 2 | 0.8766 | 0.4179 | 0.56711 | 0.8632 | 0.4327 | 0.57737 | 0.8689 | 0.387 | 0.54922 |
| | Average | 0.81267 | 0.52154 | 0.58755 | 0.78807 | 0.5107 | 0.59053 | 0.79294 | 0.50857 | 0.6029 |
| Valve | 0 | 0.5599 | 0.5498 | 0.52632 | 0.564 | 0.5305 | 0.52185 | 0.5346 | 0.5224 | 0.5129 |
| | 1 | 0.5188 | 0.5393 | 0.50053 | 0.5277 | 0.5314 | 0.50343 | 0.5155 | 0.5315 | 0.50237 |
| | 2 | 0.5286 | 0.4699 | 0.49422 | 0.5494 | 0.4794 | 0.49685 | 0.5659 | 0.4973 | 0.49843 |
| | Average | 0.53577 | 0.51967 | 0.50702 | 0.54704 | 0.51377 | 0.50737 | 0.53867 | 0.51707 | 0.50457 |
| Average overall | | 0.67051 | 0.55648 | 0.56048 | 0.65063 | 0.54207 | 0.54815 | 0.64223 | 0.57524 | 0.55269 |
Table 2. Percentage improvements in scores on development dataset (best mean values are highlighted) with (a) inverse Mel scale, combination maximum, and concatenated; and (b) combination average and combination minimum

(a)
| Machine | Section | Inverse-Mel scale: source AUC | Inverse-Mel scale: target AUC | Inverse-Mel scale: pAUC | Combination maximum: source AUC | Combination maximum: target AUC | Combination maximum: pAUC | Concatenated: source AUC | Concatenated: target AUC | Concatenated: pAUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bearing | 0 | 1.99 | 14.34 | -3.58 | -3.31 | 0.91 | -0.83 | -10.16 | 3.31 | -2.7 |
| | 1 | -26.25 | 37.16 | -9.26 | 4.66 | 9.4 | 10.08 | -7.14 | 10.35 | 4.56 |
| | 2 | -9.06 | -5.49 | 0.86 | 8.97 | 2.05 | 5.39 | 16.75 | 16.7 | 13.69 |
| | Average | -12.69 | 15.27 | -4.09 | 3.26 | 4.02 | 5.02 | -1.78 | 9.94 | 5.25 |
| Fan | 0 | -5.69 | 9.19 | -0.18 | 2.43 | -0.97 | 0.14 | 1.06 | 2.69 | -0.36 |
| | 1 | -5.71 | -4.18 | -2.6 | 8.81 | -4.37 | 3 | -4.7 | 3.96 | -0.61 |
| | 2 | -0.42 | -8.21 | -9.82 | 16.03 | -3.99 | 2.56 | -3.08 | -15.31 | -2.86 |
| | Average | -3.89 | -2.77 | -4.38 | 9.09 | -3.39 | 1.87 | -2.17 | -4.87 | -1.34 |
| Gearbox | 0 | 7.81 | 16.01 | 4.08 | -7.17 | 6.5 | -0.48 | 0.87 | 5.34 | -4.04 |
| | 1 | 3.96 | 7.55 | -1.43 | -0.1 | -1.66 | -3.99 | -0.08 | 2.92 | 1.14 |
| | 2 | 4.71 | 4.61 | 7.04 | -3.36 | -0.52 | -1.5 | 1.26 | 3.01 | -0.05 |
| | Average | 5.44 | 9.45 | 3.45 | -3.52 | 1.56 | -1.9 | 0.72 | 3.79 | -1.08 |
| Slider | 0 | -6.92 | 7.17 | 10.39 | 4.98 | 4.63 | -0.98 | 0.74 | 3.42 | 2.98 |
| | 1 | 13.37 | 16.23 | 1.37 | -0.9 | -6.28 | 1.27 | 9.45 | 12.54 | 3.67 |
| | 2 | 1.15 | 11.85 | 4.66 | 1.49 | -11.26 | 2.16 | 0.66 | 8.1 | 5.69 |
| | Average | 1.94 | 11.52 | 5.71 | 1.99 | -3.34 | -0.64 | 3.23 | 7.79 | 4.06 |
| Valve | 0 | 10.79 | 6.26 | -2.51 | -3.94 | -2.29 | -0.96 | 3.54 | 6.1 | 0.31 |
| | 1 | 10.49 | 12.01 | -1.16 | 2.5 | -4.33 | -1 | -1.32 | 1.51 | -0.11 |
| | 2 | 13.58 | 13.26 | 0.64 | 8.64 | -0.89 | -1.23 | 1.91 | 6.27 | 0.06 |
| | Average | 11.6 | 10.38 | -1.04 | 2.31 | -2.24 | -1.06 | 1.4 | 4.52 | 0.09 |
| Average over all machines | | 0.37 | 9.06 | 0.02 | 2.71 | -0.4 | 0.62 | 0.34 | 4.42 | 1.34 |

(b)
| Machine | Section | Combination average: source AUC | Combination average: target AUC | Combination average: pAUC | Combination minimum: source AUC | Combination minimum: target AUC | Combination minimum: pAUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Bearing | 0 | -9.91 | 1.49 | -1.46 | 2.06 | 12.44 | -3.12 |
| | 1 | -2.33 | 14.17 | 5.14 | -19.9 | 17.38 | 1.4 |
| | 2 | 5.57 | 4.54 | -6.04 | 5 | 1.46 | -6.49 |
| | Average | -2.66 | 6.58 | -0.69 | -6.31 | 10.47 | -2.67 |
| Fan | 0 | 0.36 | 10.94 | 0.32 | -18.85 | 28.63 | 0.32 |
| | 1 | -4.53 | -1.32 | -1.07 | -3.11 | -12.17 | 0.66 |
| | 2 | -4.91 | -7.32 | -4.41 | -27.45 | -4.5 | -5.46 |
| | Average | -2.98 | -1.04 | -1.81 | -16.85 | 0.98 | -1.67 |
| Gearbox | 0 | -23.9 | -11.03 | -18.79 | -1.13 | 11.13 | -6.74 |
| | 1 | -6.71 | -6 | -2.22 | 1.58 | 2.11 | -1.92 |
| | 2 | -1.14 | 2.38 | 2.44 | 6.25 | 0.9 | 1.28 |
| | Average | -10.04 | -4.82 | -6.35 | 2.46 | 4.8 | -2.48 |
| Slider | 0 | -1.91 | 1.87 | 3.66 | -2.65 | 17.43 | 5.15 |
| | 1 | 3.73 | 4.79 | 2.64 | 5.9 | 38.38 | 13.04 |
| | 2 | -0.89 | 11.93 | 7.61 | -0.23 | 0.11 | 2.36 |
| | Average | 0.11 | 5.55 | 4.59 | 0.73 | 19.99 | 6.78 |
| Valve | 0 | 4.29 | 2.38 | -0.56 | -1.15 | 0.82 | -2.26 |
| | 1 | 0.39 | 0.02 | 0.48 | -1.95 | 0.04 | 0.27 |
| | 2 | 5.92 | 8.42 | 0.59 | 9.1 | 12.47 | 0.91 |
| | Average | 3.53 | 3.33 | 0.16 | 1.95 | 3.99 | -0.4 |
| Average over all machines | | -2.64 | 1.71 | -0.9 | -3.9 | 7.94 | -0.08 |
Table 3. AUC and pAUC scores of all machines on evaluation dataset (best values are highlighted)

| Spectrogram | AUC (harmonic mean over all machine types, sections, and domains) | pAUC (harmonic mean over all machine types, sections, and domains) | Official score |
| --- | --- | --- | --- |
| Mel scale | 0.476997654 | 0.503527942 | 0.485524897 |
| Inverse-Mel scale | 0.476953026 | 0.509330358 | 0.487278196 |
| Combination maximum | 0.490115307 | 0.507202091 | 0.495681532 |
| Concatenated | 0.475036924 | 0.507135059 | 0.485275108 |
| Combination average | 0.46975013 | 0.506436574 | 0.481373758 |
| Combination minimum | 0.475282691 | 0.509755228 | 0.486243539 |
Table 4. Percentage improvements in scores on evaluation dataset (best mean values are highlighted)

| Spectrogram | AUC (harmonic mean over all machine types, sections, and domains) | pAUC (harmonic mean over all machine types, sections, and domains) | Official score |
| --- | --- | --- | --- |
| Inverse-Mel scale | -0.01 | 1.16 | 0.37 |
| Combination maximum | 2.76 | 0.73 | 2.1 |
| Concatenated | -0.42 | 0.72 | -0.06 |
| Combination average | -1.52 | 0.58 | -0.86 |
| Combination minimum | -0.36 | 1.24 | 0.15 |
The use of plain inverse-Mel scale spectrograms enhanced the target domain AUC for all machines except the fan machine. The most significant improvement, approximately 37%, is observed in the target domain AUC for the type 2 domain shift condition of the bearing machine. On average, the target domain AUC increases by about 9% across all machines. Experiments conducted under various domain shift conditions show that the target domain AUC improves by 5–36% for all machines except the fan machine. Commonly occurring domain shifts, such as changes in microphone location (bearing machine section 2), varying loads (gearbox machine section 2), fluctuations in operational voltages (gearbox machine section 1), differences in operational speeds (bearing machine section 1), variations in operational velocity (slide rail machine section 1), changes in operational acceleration (slide rail machine section 2), differing operational patterns (valve machine section 1), and the mixing of various factory noises at different indexes (slide rail machine section 3), are effectively identified by inverse-Mel scales. These improvements highlight the effectiveness of the inverse-Mel scale in accurately detecting the operational and environmental domain shifts commonly encountered in IMS.
The use of plain inverse-Mel scale spectrograms also improved the source domain AUC and pAUC for the gearbox, slide rail, and valve machines. Average improvements of approximately 6% and 3% in source domain AUC and pAUC, respectively, are observed across these three machines. These improvements indicate the advantage of the inverse-Mel scale in detecting anomalous behavior from IMS.
Combinational maximum spectrograms are observed to enhance the source domain AUC and pAUC scores for both the bearing and fan machines, for which plain inverse-Mel scale spectrograms performed poorly. This is because bearing and fan machines produce a low level of sound energy, with the emitted energy concentrated primarily in the low-frequency regions. Nevertheless, combinational maximum spectrograms demonstrated improved detection accuracy for these machines: on average, the source domain AUC improves by approximately 6% and the pAUC by approximately 4% across both machines.
Additionally, concatenated spectrograms are observed to enhance the pAUC scores for the bearing, slide rail, and valve machines, yielding an average pAUC improvement of around 3% across these three machines. These results suggest that plain inverse-Mel scales may not always yield optimal results; however, combinational or concatenated spectrograms can improve the performance of the detection system.
Evaluations are also conducted on the evaluation dataset. The official DCASE 2022 Challenge evaluator [27] is executed with the anomaly scores and decision results generated by the trained models. The harmonic means of the AUC and pAUC scores, calculated across all machine types, sections, and domains of the evaluation datasets, are presented in Table 3, together with the official scores computed by the official evaluator. The official scores are used to rank the participating systems and teams. Table 4 lists the percentage improvements in all of these scores relative to the results obtained with Mel scale spectrograms.
Among all the proposed methods, the combinational maximum spectrograms yield the best harmonic mean AUC and the best official score across all machine types, sections, and domains, with improvements of approximately 3% in AUC and 1% in pAUC. The official score of the combinational maximum spectrograms is about 2% higher than the official score of the Mel scale spectrograms. This enhanced score corresponds to a rank of 21st relative to the official ranking released for the DCASE Challenge 2022 Task 2 [7], which is three positions higher than that of the baseline system.
4.
CONCLUSION
In
this
w
ork,
in
v
erse-Mel
scales
are
used
to
capture
the
ener
gy
present
in
the
high
frequencies
of
IMS.
This
approach
captures
the
information
ne
glected
by
standard
Mel
scales.
An
autoencoder
emplo
ying
in
v
erse-Mel
scales,
as
well
as
the
concatenation
and
combination
of
Mel
and
in
v
erse-Mel
scale
spectrograms
as
front-end
features,
is
implemented
for
AAD
in
industrial
machines.
Experiments
are
conducted
on
all
machines
in
the
MIMII
DG
datasets.
The
use
of
in
v
erse-Mel
scales,
along
with
combinational
maximum
and
concatenated
spectrograms,
has
been
sho
wn
to
enhance
source
domain
A
UC,
tar
get
domain
A
UC,
and
pA
UC
scores
by
8%,
9%,
and
2%,
respecti
v
ely
,
across
all
machines.
The improvement in target domain AUC is particularly significant, as it demonstrates the effectiveness of the proposed method in detecting anomalies under challenging operational and environmental domain shifts.
The higher ranking awarded by the official challenge evaluator on the evaluation datasets reflects the system's capability to handle domain shifts effectively.
The results indicate that IMS contain a considerable amount of energy in higher frequency ranges that standard Mel scales fail to capture. Inverse-Mel scales are more effective at capturing these high-frequency components and are therefore recommended for AAD in industrial machines.
FUNDING INFORMATION
Authors state no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.
Name of Author: C M So Va Fo I R D O E Vi Su P Fu
Kader Basha Tajuddin Shaikh: ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Naresh P. Jawarkar: ✓ ✓ ✓ ✓ ✓ ✓ ✓
Vasif Ahmed: ✓ ✓ ✓
Nadir Nizar Ali Charniya: ✓ ✓
C: Conceptualization
M: Methodology
So: Software
Va: Validation
Fo: Formal Analysis
I: Investigation
R: Resources
D: Data Curation
O: Writing - Original Draft
E: Writing - Review & Editing
Vi: Visualization
Su: Supervision
P: Project Administration
Fu: Funding Acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study are openly available at https://zenodo.org/record/6529888 [4].
REFERENCES
[1] K. B. T. Shaikh, N. P. Jawarkar, and V. Ahmed, “Machine diagnosis using acoustic analysis: a review,” in 2021 IEEE Conference on Norbert Wiener in the 21st Century (21CW), Chennai, India: IEEE, Jul. 2021, pp. 1–6, doi: 10.1109/21CW48944.2021.9532537.
[2] T. Salm, K. Tatar, and J. Chilo, “Real-time acoustic measurement system for cutting-tool analysis during stainless steel machining,” Machines, vol. 12, no. 12, Dec. 2024, doi: 10.3390/machines12120892.
[3] G. Li, X. Shang, L. Sun, B. Fu, L. Yang, and H. Zhou, “Application of audible sound signals in tool wear monitoring: a review,” Advanced Manufacturing Science and Technology, vol. 5, no. 1, 2025, doi: 10.51393/j.jamst.2025003.
[4] K. Dohi et al., “MIMII DG: Sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task,” in Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), Nancy, France, Nov. 2022, pp. 1–5.
[5] K. Dohi et al., “Description and discussion on DCASE 2022 challenge Task 2: unsupervised anomalous sound detection for machine condition monitoring applying domain generalization techniques,” Detection and Classification of Acoustic Scenes and Events 2022, Nov. 2022, pp. 1–5.
[6] N. Harada, D. Niizumi, D. Takeuchi, Y. Ohishi, M. Yasuda, and S. Saito, “ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions,” arXiv-Electrical Engineering and Systems Science, pp. 1–5, Jun. 2021, doi: 10.48550/arXiv.2106.02369.
[7] K. Dohi et al., “Unsupervised anomalous sound detection for machine condition monitoring applying domain generalization techniques,” DCASE Community, 2022. Accessed: Aug. 20, 2025. [Online]. Available: https://dcase.community/challenge2022/task-unsupervised-anomalous-sound-detection-for-machine-condition-monitoring.
[8] Y. Liu, C. Miao, X. Li, J. Ji, and D. Meng, “Research on the fault analysis method of belt conveyor idlers based on sound and thermal infrared image features,” Measurement, vol. 186, Dec. 2021, doi: 10.1016/j.measurement.2021.110177.
[9] Y. Liu, Y. Chen, X. Li, X. Zhou, and D. Wu, “MPNet: A lightweight fault diagnosis network for rotating machinery,” Measurement, vol. 239, Jan. 2025, doi: 10.1016/j.measurement.2024.115498.
[10] J. Liu, S. Fu, F. Liu, and X. Cheng, “Intelligent fault diagnosis of belt conveyor rollers using a polar KNN algorithm with audio features,” Engineering Failure Analysis, vol. 168, Feb. 2025, doi: 10.1016/j.engfailanal.2024.109101.
[11] H. Zhou, Z. Gao, H. Li, and Y. Zhang, “State identifying method for rolling tire in lab test using acoustic signal,” Applied Acoustics, vol. 231, Mar. 2025, doi: 10.1016/j.apacoust.2024.110487.
[12] Y. Zhao, B. Qin, Y. Zhou, and X. Xu, “Bearing fault diagnosis based on inverted Mel-scale frequency cepstral coefficients and deformable convolution networks,” Measurement Science and Technology, vol. 34, no. 5, Feb. 2023, doi: 10.1088/1361-6501/acb0ea.
[13] L. Ma, A. Jiang, and W. Jiang, “The intelligent diagnosis of a hydraulic plunger pump based on the MIGLCC-DLSTM method using sound signals,” Machines, vol. 12, no. 12, Nov. 2024, doi: 10.3390/machines12120869.
[14] S. Young et al., The HTK book, Cambridge, United Kingdom: Cambridge University Engineering Department, 2002.
[15] M. Slaney, “Auditory toolbox: a MATLAB toolbox for auditory modeling work,” Interval Research Corporation, pp. 1–41, 1998.
[16] Latha, “Robust speaker identification incorporating high frequency features,” Procedia Computer Science, vol. 89, pp. 804–811, 2016, doi: 10.1016/j.procs.2016.06.064.
[17] H. K. Kathania, S. Shahnawazuddin, W. Ahmad, and N. Adiga, “Role of linear, mel and inverse-mel filterbanks in automatic recognition of speech from high-pitched speakers,” Circuits Systems Signal Process, vol. 38, no. 10, pp. 4667–4682, Oct. 2019, doi: 10.1007/s00034-019-01072-7.
[18] S. Lalitha, S. Tripathi, and D. Gupta, “Enhanced speech emotion detection using deep neural networks,” International Journal of Speech Technology, vol. 22, pp. 497–510, Sept. 2019, doi: 10.1007/s10772-018-09572-8.
[19] Z. Wang, J. Yan, Y. Wang, and X. Wang, “Speech emotion feature extraction method based on improved MFCC and IMFCC fusion features,” in 2023 IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Feb. 2023, pp. 1917–1924, doi: 10.1109/EEBDA56825.2023.10090810.
[20] S. Aziz and S. Shahnawazuddin, “Effective preservation of higher-frequency contents in the context of short utterance based children’s speaker verification system,” Applied Acoustics, vol. 209, Jun. 2023, doi: 10.1016/j.apacoust.2023.109420.
[21] S. Aziz and S. Shahnawazuddin, “Experimental studies for improving the performance of children’s speaker verification system using short utterances,” Applied Acoustics, vol. 216, Jan. 2024, doi: 10.1016/j.apacoust.2023.109783.
[22] S. Aziz and S. Shahnawazuddin, “Role of data augmentation and effective conservation of high-frequency contents in the context children’s speaker verification system,” Circuits Systems Signal Process, vol. 43, pp. 3139–3159, May 2024, doi: 10.1007/s00034-024-02598-1.
[23] S. Chakroborty, A. Roy, S. Majumdar, and G. Saha, “Capturing complementary information via reversed filter bank and parallel implementation with MFCC for improved text-independent speaker identification,” in 2007 International Conference on Computing: Theory and Applications (ICCTA’07), Mar. 2007, pp. 463–467, doi: 10.1109/ICCTA.2007.35.
[24] S. Chakroborty, A. Roy, and G. Saha, “Improved closed set text-independent speaker identification by combining MFCC with evidence from flipped filter banks,” International Journal of Electronics and Communication Engineering, vol. 2, no. 11, pp. 2554–2561, 2008.
[25] D. Sharma and I. Ali, “A modified MFCC feature extraction technique for robust speaker recognition,” in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Aug. 2015, pp. 1052–1057, doi: 10.1109/ICACCI.2015.7275749.
[26] B. McFee et al., “Librosa: 0.10.0.post2,” GitHub, 2023. [Online]. Available: https://github.com/librosa/librosa/releases/tag/0.10.0.post2
[27] K. Dohi, “DCASE2022 task 2 evaluator,” GitHub, 2022. Accessed: Aug. 20, 2025. [Online]. Available: https://github.com/Kota-Dohi/dcase2022_evaluator