IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 14, No. 5, October 2025, pp. 3656–3666
ISSN: 2252-8938, DOI: 10.11591/ijai.v14.i5.pp3656-3666

Inverse-Mel scale spectrograms for high-frequency feature extraction and audio anomaly detection in industrial machines

Kader Basha Tajuddin Shaikh¹, Naresh P. Jawarkar², Vasif Ahmed³, Nadir Nizar Ali Charniya⁴
¹Department of Automation and Robotics Engineering, Vivekanand Education Society's Institute of Technology, Mumbai, India
²Department of Electrical and Power Engineering, Government College of Engineering, Amravati, India
³Department of Artificial Intelligence and Data Science, Babasaheb Naik College of Engineering, Pusad, India
⁴Department of Electronics and Telecommunication Engineering, Vivekanand Education Society's Institute of Technology, Mumbai, India

Article Info
Article history: Received Mar 19, 2025; Revised Jun 30, 2025; Accepted Jul 13, 2025
Keywords: Audio anomaly detection; Domain generalization; High-frequency feature extraction; Inverse-Mel scale; Machine health monitoring

ABSTRACT
Unlike sounds produced by humans, the energy in industrial machine sounds (IMS) varies across a wide range of frequencies. Mel scales, which were developed to model human perception of audio, fail to capture the complete information present in IMS. To improve performance, we propose using an inverse-Mel scale, along with the concatenation and combination of Mel and inverse-Mel scale based spectrograms, as feature vectors for audio anomaly detection (AAD) in industrial machines. The Librosa Python package and the DCASE 2022 Challenge Task 2 baseline system are adapted to construct inverse-Mel scale spectrograms. Experiments are conducted using the malfunctioning industrial machine investigation and inspection for domain generalization (MIMII DG) datasets.
Systems based on the inverse-Mel scale achieve improvements of up to 37% for the bearing machine and up to 9% on average in the area under the curve (AUC) score across all machines in the MIMII DG datasets. The proposed features also enhance domain generalization (DG), overcoming the effects of environmental and operational domain shifts caused by variations in recording setup, load, background noise, and operational patterns. The official challenge evaluator assessed the proposed system against the evaluation datasets, ranking it three positions higher than the baseline system.

This is an open access article under the CC BY-SA license.

Corresponding Author:
Kader Basha Tajuddin Shaikh
Department of Automation and Robotics Engineering, Vivekanand Education Society's Institute of Technology
Mumbai 400074, India
Email: kader.shaikh@ves.ac.in

Journal homepage: http://ijai.iaescore.com

1. INTRODUCTION
Industrial machine sounds (IMS) convey considerable information about the status of a machine [1]–[3]. Through astute listening and careful observation, an operator can quickly assess the health of the machine. An experienced operator can easily identify faults that may arise in an otherwise healthy working machine. The operator's expertise enables the anticipation and prevention of potential crises. Audio anomaly detection (AAD) systems for industrial machines mimic the behavior of operators to identify machine health conditions and operational anomalies. AAD for fault diagnosis and prognosis in industrial machines is being widely researched and has been one of several tasks in all editions of the DCASE challenges since 2020 [4]–[7].
Several researchers have focused on the analysis of high-frequency regions in IMS. Liu et al. [8] explored fault analysis in belt conveyor idlers. The effective distinguishing frequency bands for various fault conditions, due to damaged cages, raceway slots, and large pits in the inner/outer races on the rolling element, were found to be concentrated in the medium to high-frequency (6–20 kHz) ranges. Li et al. [3] examined the audible sounds produced by milling machines and found that the sound signals spanned the full audible range. The authors identified low-frequency sound signals generated by tool holder vibrations, mid-range frequency sounds from metal deformation processes, and high-frequency sounds from friction mechanisms. Liu et al. [9] proposed a lightweight fault diagnosis network called MPNet for identifying bearing faults in rotating machinery. The authors outlined the limitation that Mel-frequency cepstral coefficients (MFCC) are sensitive only to low-frequency information and instead used linear spectrograms constructed with the short-time Fourier transform as features. Liu et al. [10] observed high-frequency components in the audio signals of belt conveyors, specifically in the range of 1 to 5 kHz. The impacts and vibrations from defective rollers contribute to the generation of these high-frequency audio signals. Zhou et al. [11] noted that acoustic signals generated by bulge conditions in tire endurance tests conducted on a drum testing machine produce high energy peaks in the high-frequency regions. Zhao et al. [12] noted that features extracted from high-frequency regions of vibration signals are more effective in characterizing faults in power end bearings. Ma et al.
[13] proposed the fusion of MFCCs, inverted Mel-scale frequency cepstrum coefficients (IMFCCs), Gammatone frequency cepstral coefficients (GFCCs), and linear prediction cepstral coefficients (LPCCs) to create a hybrid cepstral feature known as Mel-inverted-Gammatone-linear cepstral coefficients (MIGLCCs). This feature encapsulated the individual advantages of each constituent feature. Their findings indicated that the fusion of MFCCs and IMFCCs yielded the best results among all dual feature combinations tested.

All of the above research emphasizes the importance of focusing on the energy present in higher frequency regions and highlights the benefits achieved through the inverse-Mel scale frequency warping technique. However, the application of inverse-Mel scale based spectrograms for AAD in industrial machines has not been considered. Building on this gap, this research constructs inverse-Mel scale spectrograms, as well as combinations of Mel and inverse-Mel scale spectrograms, as front-end features for extracting the energy distribution across the complete range of frequencies in IMS. The spectrograms are constructed by adapting the Librosa Python library. These spectrograms serve as input for an autoencoder-based AAD system designed to identify anomalous operations in industrial machines. Experiments conducted on the malfunctioning industrial machine investigation and inspection for domain generalization (MIMII DG) dataset [4] demonstrate that AAD systems with inverse-Mel scale spectrograms perform better.

This work is motivated by the DCASE Challenge 2022 Task 2 [4]–[7], which focuses on AAD and domain generalization (DG) techniques in industrial machines. A total of 31 teams submitted 81 entries to the challenge. Most participants used Mel scale based acoustic features such as Mel energies, log-Mel energies, MFCC, Mel spectrograms, and log-Mel spectrograms in their systems [7].
This research proposes the use of inverse-Mel scale based acoustic features for AAD and DG on the challenge datasets. It is the first of its kind to propose the use of the inverse-Mel scale for DCASE Challenge 2022 Task 2. Comparison with the published challenge scores [7] places the results presented in this research at a relative position of 21st. This ranking is three positions higher than the official ranking of the baseline system.

The rest of the paper is organized as follows. Section 2 describes the materials and methods employed in this experimentation. It includes the methods for construction of spectrograms, details of the MIMII DG dataset, and the experimental setup along with the evaluation metrics for the DCASE Challenge 2022 Task 2. Section 3 presents and discusses the results, including performance scores and improvements observed on both the development and evaluation datasets. Section 4 summarizes the conclusions drawn from this research.

2. MATERIALS AND METHODS
2.1. Sound database of industrial machines
MIMII DG [4], a public database shared as the development and evaluation dataset for Task 2 of the DCASE Challenge 2022 [7], is used in this work. This dataset includes normal and anomalous operating sounds from five different industrial machines. It is designed for the development and evaluation of AAD and DG techniques in industrial machines. The dataset is divided into source and target domain data. The source domain data contains only the normal and anomalous operating sounds of the machine under test, whereas operational and environmental domain shifts commonly encountered in industrial setups are synthetically infused into these
sounds to generate the target domain data. The source domain data is employed for evaluating AAD, while the target domain data is used for evaluating DG.

2.2. Construction of inverse-Mel scale spectrograms
2.2.1. Equations of inverse-Mel scale
The two commonly used implementations of the transformation between linear and Mel scale frequencies are hidden Markov toolkit 3 (HTK) [14] and Slaney [15]. Slaney implementations apply a linear formula for frequencies up to 1 kHz and a logarithmic or anti-logarithmic formula for conversions above 1 kHz. HTK implementations follow a logarithmic or anti-logarithmic formula for the entire range of frequencies. HTK implementations are used in this work. The relationship between the linear frequency scale ($f_{Hz}$) and the Mel-frequency scale ($f_{mel}$) is given in (1) and (2).

$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f_{Hz}}{700}\right) \tag{1}$$

$$f_{Hz} = 700\left(10^{f_{mel}/2595} - 1\right) \tag{2}$$

Several researchers [13], [16]–[22] defined the inverse-Mel scale as the complement of the Mel scale. The authors suggested flipping the original Mel filterbank around its midpoint to derive the inverse-Mel filterbank. Mathematical relationships between the linear frequency scale ($f_{Hz}$) and the inverse-Mel frequency scale ($f_{iMel}$) are proposed by Chakroborty [23], [24], Sharma [25], Latha [16], Lalitha [18], and Ma [13]. Latha [16] and Lalitha [18] introduced (3), Ma [13] proposed (4), and Chakroborty [23], [24] and Sharma [25] presented (5).

$$f_{iMel} = 2146.1 - 2595 \log_{10}\left(1 + \frac{4000 - f_{Hz}}{700}\right) \tag{3}$$

$$f_{iMel} = 2146.1 - 1127 \log_{10}\left(1 + \frac{4000 - f_{Hz}}{700}\right) \tag{4}$$

$$f_{iMel} = 2195.286 - 2595 \log_{10}\left(1 + \frac{4031.25 - f_{Hz}}{700}\right) \tag{5}$$

Equation (5) is employed in this research. In the works of Chakroborty [23], [24] and Sharma [25], the sampling frequency is 8 kHz, whereas the sampling frequency in the MIMII DG [4] database is 16 kHz. Hence, the constant terms are changed from 2195.286 to 2844.06 and from 4031.25 to 8031.25.
The modified equations used in this work to convert between the linear frequency scale ($f_{Hz}$) and the inverse-Mel frequency scale ($f_{iMel}$) are presented in (6) and (7).

$$f_{iMel} = 2844.06 - 2595 \log_{10}\left(1 + \frac{8031.25 - f_{Hz}}{700}\right) \tag{6}$$

$$f_{Hz} = 8031.25 - 700\left(10^{(2844.06 - f_{iMel})/2595} - 1\right) \tag{7}$$

Figure 1 shows a plot of the center frequencies of all filters in the Mel and inverse-Mel scales. Center frequencies represent the midpoints of the frequency bands used in the Mel and inverse-Mel transformations. The Mel scale follows a logarithmic scale, whereas the inverse-Mel scale follows an anti-logarithmic scale.

Figure 1. Center frequencies in Mel and inverse-Mel scales
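As a minimal sketch, the conversions in (6) and (7) can be written as two small Python functions. The function names and the helper that spaces filter center frequencies uniformly on the inverse-Mel axis are illustrative assumptions and are not taken from the adapted Librosa sources; the constants 2844.06 and 8031.25 are the ones given for the 16 kHz sampling rate.

```python
import math

# Constants of equations (6) and (7), valid for a 16 kHz sampling rate.
IMEL_OFFSET = 2844.06
HZ_SHIFT = 8031.25


def hz_to_imel(f_hz):
    """Equation (6): linear frequency in Hz to inverse-Mel frequency."""
    return IMEL_OFFSET - 2595.0 * math.log10(1.0 + (HZ_SHIFT - f_hz) / 700.0)


def imel_to_hz(f_imel):
    """Equation (7): inverse-Mel frequency back to linear frequency in Hz."""
    return HZ_SHIFT - 700.0 * (10.0 ** ((IMEL_OFFSET - f_imel) / 2595.0) - 1.0)


def imel_center_frequencies(n_filters, f_min=0.0, f_max=8000.0):
    """Hypothetical helper: centers equally spaced on the inverse-Mel axis."""
    lo, hi = hz_to_imel(f_min), hz_to_imel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [imel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    for center in imel_center_frequencies(8):
        print(f"{center:8.1f} Hz")
```

Equal steps on the inverse-Mel axis map to progressively smaller steps in Hz, so the center frequencies crowd toward the upper end of the band, which is consistent with the high-frequency emphasis described for the inverse-Mel scale.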
2.2.2. Variants of spectrograms
The following types of spectrograms are constructed in this work.
- Mel scale spectrogram: constructed using the standard equations of the Mel scale. Functions for constructing Mel spectrograms, as defined in the Librosa Python package, are employed.
- Inverse-Mel scale spectrogram: constructed using the inverse-Mel scale equations described in section 2.2.1. The adaptation made to the Librosa Python package for constructing inverse-Mel scale spectrograms is described in section 2.2.3.
- Concatenated spectrogram: constructed by vertically stacking Mel and inverse-Mel spectrograms. The Mel spectrogram captures lower frequencies, ranging from 0 to 4 kHz, while the inverse-Mel spectrogram captures higher frequencies from 4 to 8 kHz.
- Combinational spectrograms: constructed by aggregating Mel and inverse-Mel spectrograms across the entire frequency range. The value at a specific frequency is determined by applying maximum, minimum, or average pooling to the Mel and inverse-Mel values. Consequently, this work develops three types of combinational spectrograms: maximum, minimum, and average value spectrograms.

2.2.3. Adaptations in Librosa package and DCASE 2022 baseline system for construction of inverse-Mel scale spectrograms
Adaptations have been made in several source files of the Librosa package [26] for the construction and presentation of inverse-Mel spectrograms. Two additional parameters, "isInverseMel" and "isHTK", are included as arguments in the melspectrogram, mel, and mel_frequencies functions in the 'filters.py' and 'spectral.py' files of the Librosa package. The "isInverseMel" parameter allows for toggling between the Mel and inverse-Mel scale formulas, while the "isHTK" parameter enables the selection of either the Slaney or the HTK implementation.
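The concatenated and combinational variants listed in section 2.2.2 reduce to a few array operations once the Mel and inverse-Mel spectrograms are available. The sketch below assumes both are NumPy arrays of identical shape (frequency bins by time frames). The function names are hypothetical, and the plain vertical stack does not reproduce any band-limiting to 0–4 kHz and 4–8 kHz that the authors' implementation may apply.

```python
import numpy as np


def concatenate_spectrograms(mel_spec, imel_spec):
    """Vertically stack the two spectrograms along the frequency axis."""
    return np.vstack([mel_spec, imel_spec])


def combine_spectrograms(mel_spec, imel_spec, mode="maximum"):
    """Element-wise maximum, minimum, or average of two equal-shape arrays."""
    if mel_spec.shape != imel_spec.shape:
        raise ValueError("spectrograms must have the same shape")
    if mode == "maximum":
        return np.maximum(mel_spec, imel_spec)
    if mode == "minimum":
        return np.minimum(mel_spec, imel_spec)
    if mode == "average":
        return (mel_spec + imel_spec) / 2.0
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Toy 2x2 arrays standing in for real spectrograms (illustrative values).
    mel = np.array([[1.0, 4.0], [3.0, 2.0]])
    imel = np.array([[2.0, 0.0], [3.0, 6.0]])
    print(concatenate_spectrograms(mel, imel).shape)
    print(combine_spectrograms(mel, imel, mode="average"))
```

Keeping the three pooling modes behind one function makes it easy to swap the front-end feature without touching the detector.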
The concatenation and combination of Mel and inverse-Mel spectrograms are performed in the 'common.py' file of the DCASE 2022 baseline system. The adapted source files are available for download under the GNU General Public License at https://github.com/KaderShaikhVESIT/inverse-Mel.

2.3. Experimental set-up and evaluation metrics
The focus of this work is to introduce the inverse-Mel scale and discuss its implications. Hence, this work utilizes the baseline system of DCASE 2022 Challenge Task 2 [4], [6], [7] as the detector. The baseline detector is a deep autoencoder. Each 10-second audio clip is converted into a spectrogram that acts as the input feature vector for the autoencoder. The development and evaluation datasets of DCASE Challenge 2022 Task 2 are used for training and testing the detector. Confusion matrix, precision, recall, F1 score, and area under the curve (AUC) are calculated for both source and target domain data, whereas the partial area under the curve (pAUC) is calculated for the combined source and target data. Equations for the calculation of AUC and pAUC scores are defined in [4], [6]. System evaluation and ranking are done using the official evaluator shared by the organizers [27].

3. RESULTS AND DISCUSSION
3.1. Inferences on all spectrograms
All spectrograms of a typical machine sound recording from the slide rail machine (section_00_source_train_normal_0010_vel_1100.wav) in the MIMII DG dataset are shown in Figure 2. The spectrograms use a blue-white-red (BWR) colormap, in which bright red indicates higher amplitude or activity and blue indicates lower amplitude. Figure 2(a) presents the spectrogram using a linear frequency scale based on the short-time Fourier transform (STFT), which exhibits bright red spots in both low and high-frequency regions, suggesting that sound energy is distributed across the entire frequency range.
In Figure 2(b), the Mel spectrogram emphasizes lower frequency regions while suppressing the higher frequency regions. Frequencies above 2 kHz are primarily depicted in white-blue color, indicating a suppression of high-frequency components. This limitation suggests that the Mel scale spectrogram fails to capture and present the complete information inherent in IMS. In contrast, the inverse-Mel scale spectrogram shown in Figure 2(c) enhances the high-frequency regions, effectively revealing the energy content that is otherwise suppressed in the Mel scale spectrogram. Energy components above 6 kHz, which are often obscured in Mel spectrograms, are vividly displayed here. The concatenated spectrogram shown in Figure 2(d) merges Mel and inverse-Mel spectrograms at the midpoint
frequency of 4 kHz, capturing prominent characteristics from both. This concatenated spectrogram effectively captures and represents regions of high amplitude and activity present in both types of spectrograms. Figures 2(e) to 2(g) show pixel-wise combinations of Mel and inverse-Mel spectrograms using average, maximum, and minimum aggregation methods, respectively. These spectrograms capture the shape and vivid colors characteristic of both Mel scale and inverse-Mel scale spectrograms. The intensity of the colors varies depending on the aggregation formula used in their construction. Thus, the use of the inverse-Mel scale enables a more complete representation of the information present in IMS. The concatenation and combination spectrograms further support this representation. With these spectrograms, this research is able to delve into unexplored regions of IMS.

Figure 2. Spectrogram representations of a typical slide rail machine sound from the MIMII DG dataset: (a) linear-frequency spectrogram using STFT, (b) Mel scale spectrogram, (c) inverse-Mel scale spectrogram, (d) concatenated Mel and inverse-Mel spectrograms, (e) combined average spectrogram, (f) combined maximum spectrogram, and (g) combined minimum spectrogram

3.2. Inferences on the experiment results
Evaluations are conducted for all machine types, sections, and domains in the MIMII DG development and evaluation datasets [4]. Tables 1 and 2 present the scores and percentage improvements observed in the development datasets. Tables 3 and 4 present the scores and percentage improvements observed in the evaluation datasets. The source domain AUC, target domain AUC, and pAUC scores for all machine types, sections, and domains in the development datasets are listed in Tables 1(a) and 1(b).
Tables 2(a) and 2(b) list the percentage improvements for these scores relative to the results from Mel scale spectrograms.
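The percentage improvements can be reproduced from the AUC scores with a relative-change formula. The sketch below assumes the improvement is computed as the change relative to the Mel scale score; this formula is inferred from the tabulated values rather than stated in the text, and the function name is illustrative.

```python
def percent_improvement(mel_score, proposed_score):
    """Relative change of a score with respect to the Mel scale score, in %."""
    return (proposed_score - mel_score) / mel_score * 100.0


if __name__ == "__main__":
    # Bearing, section 1, target domain AUC: Mel 0.5547, inverse-Mel 0.7608.
    print(f"{percent_improvement(0.5547, 0.7608):.2f}%")
```

For the bearing machine, section 1, the target domain AUC rises from 0.5547 with the Mel scale to 0.7608 with the inverse-Mel scale, which evaluates to about 37.16%, the value listed in Table 2(a).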
Table 1. AUC and pAUC scores of all machines on development dataset with (a) Mel scale, inverse-Mel scale, and combination maximum; and (b) concatenated, combination average, and combination minimum

(a)

| Machine | Section | Mel: source | Mel: target | Mel: partial | Inverse-Mel: source | Inverse-Mel: target | Inverse-Mel: partial | Comb. max: source | Comb. max: target | Comb. max: partial |
|---|---|---|---|---|---|---|---|---|---|---|
| Bearing | 0 | 0.5504 | 0.6048 | 0.50737 | 0.5613 | 0.6915 | 0.48922 | 0.5322 | 0.6103 | 0.50316 |
| Bearing | 1 | 0.7176 | 0.5547 | 0.54869 | 0.5293 | 0.7608 | 0.4979 | 0.751 | 0.6068 | 0.60395 |
| Bearing | 2 | 0.4563 | 0.5581 | 0.52316 | 0.415 | 0.5275 | 0.52764 | 0.4972 | 0.5695 | 0.55132 |
| Bearing | Average | 0.57477 | 0.57254 | 0.52641 | 0.50187 | 0.65994 | 0.50492 | 0.59347 | 0.59554 | 0.55281 |
| Fan | 0 | 0.778 | 0.343 | 0.59158 | 0.7338 | 0.3745 | 0.59053 | 0.7969 | 0.3397 | 0.59237 |
| Fan | 1 | 0.7096 | 0.4577 | 0.51843 | 0.6691 | 0.4386 | 0.505 | 0.7721 | 0.4377 | 0.53395 |
| Fan | 2 | 0.7744 | 0.6346 | 0.62764 | 0.7712 | 0.5825 | 0.56606 | 0.8985 | 0.6093 | 0.64369 |
| Fan | Average | 0.754 | 0.47844 | 0.57922 | 0.7247 | 0.4652 | 0.55386 | 0.8225 | 0.46224 | 0.59 |
| Gearbox | 0 | 0.6558 | 0.6555 | 0.61369 | 0.707 | 0.7604 | 0.63869 | 0.6088 | 0.6981 | 0.61079 |
| Gearbox | 1 | 0.6605 | 0.5803 | 0.535 | 0.6866 | 0.6241 | 0.52737 | 0.6599 | 0.5707 | 0.51369 |
| Gearbox | 2 | 0.7744 | 0.6623 | 0.61711 | 0.8108 | 0.6928 | 0.66053 | 0.7484 | 0.6589 | 0.6079 |
| Gearbox | Average | 0.6969 | 0.6327 | 0.5886 | 0.7348 | 0.69244 | 0.60886 | 0.67237 | 0.64257 | 0.57746 |
| Slider | 0 | 0.8068 | 0.5681 | 0.61843 | 0.751 | 0.6088 | 0.68264 | 0.8469 | 0.5944 | 0.61237 |
| Slider | 1 | 0.6841 | 0.4969 | 0.53895 | 0.7755 | 0.5775 | 0.54632 | 0.678 | 0.4657 | 0.54579 |
| Slider | 2 | 0.8709 | 0.3866 | 0.53658 | 0.8809 | 0.4324 | 0.56158 | 0.8838 | 0.3431 | 0.525 |
| Slider | Average | 0.78727 | 0.48387 | 0.56465 | 0.80247 | 0.53957 | 0.59685 | 0.8029 | 0.46774 | 0.56106 |
| Valve | 0 | 0.5408 | 0.5182 | 0.52474 | 0.5991 | 0.5506 | 0.51158 | 0.5195 | 0.504 | 0.51974 |
| Valve | 1 | 0.5257 | 0.5313 | 0.50106 | 0.5808 | 0.5951 | 0.49527 | 0.5388 | 0.5083 | 0.49606 |
| Valve | 2 | 0.5187 | 0.4422 | 0.49395 | 0.5891 | 0.5008 | 0.49711 | 0.5635 | 0.4461 | 0.4879 |
| Valve | Average | 0.5284 | 0.49724 | 0.50658 | 0.58967 | 0.54884 | 0.50132 | 0.5406 | 0.48614 | 0.50123 |
| Average overall | | 0.66827 | 0.53296 | 0.55309 | 0.6707 | 0.5812 | 0.55316 | 0.68637 | 0.53084 | 0.55651 |

(b)

| Machine | Section | Concatenated: source | Concatenated: target | Concatenated: partial | Comb. average: source | Comb. average: target | Comb. average: partial | Comb. min: source | Comb. min: target | Comb. min: partial |
|---|---|---|---|---|---|---|---|---|---|---|
| Bearing | 0 | 0.4945 | 0.6248 | 0.49369 | 0.4959 | 0.6138 | 0.5 | 0.5617 | 0.68 | 0.49158 |
| Bearing | 1 | 0.6664 | 0.6121 | 0.57369 | 0.7009 | 0.6333 | 0.57685 | 0.5748 | 0.6511 | 0.55632 |
| Bearing | 2 | 0.5327 | 0.6153 | 0.59474 | 0.4817 | 0.5834 | 0.49158 | 0.4791 | 0.5662 | 0.48922 |
| Bearing | Average | 0.56454 | 0.6294 | 0.55404 | 0.5595 | 0.61017 | 0.52281 | 0.53854 | 0.63244 | 0.51237 |
| Fan | 0 | 0.7862 | 0.3522 | 0.58948 | 0.7808 | 0.3805 | 0.59343 | 0.6314 | 0.4412 | 0.59343 |
| Fan | 1 | 0.6763 | 0.4758 | 0.51527 | 0.6775 | 0.4517 | 0.5129 | 0.6876 | 0.402 | 0.52183 |
| Fan | 2 | 0.7506 | 0.5375 | 0.60974 | 0.7364 | 0.5882 | 0.6 | 0.5619 | 0.6061 | 0.59343 |
| Fan | Average | 0.7377 | 0.45517 | 0.5715 | 0.73157 | 0.47347 | 0.56878 | 0.62697 | 0.4831 | 0.56957 |
| Gearbox | 0 | 0.6615 | 0.6905 | 0.58895 | 0.4991 | 0.5832 | 0.49843 | 0.6484 | 0.7284 | 0.57237 |
| Gearbox | 1 | 0.66 | 0.5972 | 0.54106 | 0.6162 | 0.5455 | 0.52316 | 0.6709 | 0.5925 | 0.52474 |
| Gearbox | 2 | 0.7841 | 0.6822 | 0.61685 | 0.7656 | 0.678 | 0.63211 | 0.8228 | 0.6682 | 0.625 |
| Gearbox | Average | 0.70187 | 0.65664 | 0.58229 | 0.62697 | 0.60224 | 0.55123 | 0.71404 | 0.66304 | 0.57404 |
| Slider | 0 | 0.8127 | 0.5875 | 0.63685 | 0.7914 | 0.5787 | 0.64106 | 0.7855 | 0.6671 | 0.65027 |
| Slider | 1 | 0.7487 | 0.5592 | 0.55869 | 0.7096 | 0.5207 | 0.55316 | 0.7244 | 0.6876 | 0.60922 |
| Slider | 2 | 0.8766 | 0.4179 | 0.56711 | 0.8632 | 0.4327 | 0.57737 | 0.8689 | 0.387 | 0.54922 |
| Slider | Average | 0.81267 | 0.52154 | 0.58755 | 0.78807 | 0.5107 | 0.59053 | 0.79294 | 0.50857 | 0.6029 |
| Valve | 0 | 0.5599 | 0.5498 | 0.52632 | 0.564 | 0.5305 | 0.52185 | 0.5346 | 0.5224 | 0.5129 |
| Valve | 1 | 0.5188 | 0.5393 | 0.50053 | 0.5277 | 0.5314 | 0.50343 | 0.5155 | 0.5315 | 0.50237 |
| Valve | 2 | 0.5286 | 0.4699 | 0.49422 | 0.5494 | 0.4794 | 0.49685 | 0.5659 | 0.4973 | 0.49843 |
| Valve | Average | 0.53577 | 0.51967 | 0.50702 | 0.54704 | 0.51377 | 0.50737 | 0.53867 | 0.51707 | 0.50457 |
| Average overall | | 0.67051 | 0.55648 | 0.56048 | 0.65063 | 0.54207 | 0.54815 | 0.64223 | 0.57524 | 0.55269 |
Table 2. Percentage improvements in scores on development dataset with (a) inverse-Mel scale, combination maximum, and concatenated; and (b) combination average and combination minimum

(a)

| Machine | Section | Inverse-Mel: source | Inverse-Mel: target | Inverse-Mel: partial | Comb. max: source | Comb. max: target | Comb. max: partial | Concatenated: source | Concatenated: target | Concatenated: partial |
|---|---|---|---|---|---|---|---|---|---|---|
| Bearing | 0 | 1.99 | 14.34 | -3.58 | -3.31 | 0.91 | -0.83 | -10.16 | 3.31 | -2.7 |
| Bearing | 1 | -26.25 | 37.16 | -9.26 | 4.66 | 9.4 | 10.08 | -7.14 | 10.35 | 4.56 |
| Bearing | 2 | -9.06 | -5.49 | 0.86 | 8.97 | 2.05 | 5.39 | 16.75 | 16.7 | 13.69 |
| Bearing | Average | -12.69 | 15.27 | -4.09 | 3.26 | 4.02 | 5.02 | -1.78 | 9.94 | 5.25 |
| Fan | 0 | -5.69 | 9.19 | -0.18 | 2.43 | -0.97 | 0.14 | 1.06 | 2.69 | -0.36 |
| Fan | 1 | -5.71 | -4.18 | -2.6 | 8.81 | -4.37 | 3 | -4.7 | 3.96 | -0.61 |
| Fan | 2 | -0.42 | -8.21 | -9.82 | 16.03 | -3.99 | 2.56 | -3.08 | -15.31 | -2.86 |
| Fan | Average | -3.89 | -2.77 | -4.38 | 9.09 | -3.39 | 1.87 | -2.17 | -4.87 | -1.34 |
| Gearbox | 0 | 7.81 | 16.01 | 4.08 | -7.17 | 6.5 | -0.48 | 0.87 | 5.34 | -4.04 |
| Gearbox | 1 | 3.96 | 7.55 | -1.43 | -0.1 | -1.66 | -3.99 | -0.08 | 2.92 | 1.14 |
| Gearbox | 2 | 4.71 | 4.61 | 7.04 | -3.36 | -0.52 | -1.5 | 1.26 | 3.01 | -0.05 |
| Gearbox | Average | 5.44 | 9.45 | 3.45 | -3.52 | 1.56 | -1.9 | 0.72 | 3.79 | -1.08 |
| Slider | 0 | -6.92 | 7.17 | 10.39 | 4.98 | 4.63 | -0.98 | 0.74 | 3.42 | 2.98 |
| Slider | 1 | 13.37 | 16.23 | 1.37 | -0.9 | -6.28 | 1.27 | 9.45 | 12.54 | 3.67 |
| Slider | 2 | 1.15 | 11.85 | 4.66 | 1.49 | -11.26 | 2.16 | 0.66 | 8.1 | 5.69 |
| Slider | Average | 1.94 | 11.52 | 5.71 | 1.99 | -3.34 | -0.64 | 3.23 | 7.79 | 4.06 |
| Valve | 0 | 10.79 | 6.26 | -2.51 | -3.94 | -2.29 | -0.96 | 3.54 | 6.1 | 0.31 |
| Valve | 1 | 10.49 | 12.01 | -1.16 | 2.5 | -4.33 | -1 | -1.32 | 1.51 | -0.11 |
| Valve | 2 | 13.58 | 13.26 | 0.64 | 8.64 | -0.89 | -1.23 | 1.91 | 6.27 | 0.06 |
| Valve | Average | 11.6 | 10.38 | -1.04 | 2.31 | -2.24 | -1.06 | 1.4 | 4.52 | 0.09 |
| Average over all machines | | 0.37 | 9.06 | 0.02 | 2.71 | -0.4 | 0.62 | 0.34 | 4.42 | 1.34 |

(b)

| Machine | Section | Comb. average: source | Comb. average: target | Comb. average: partial | Comb. min: source | Comb. min: target | Comb. min: partial |
|---|---|---|---|---|---|---|---|
| Bearing | 0 | -9.91 | 1.49 | -1.46 | 2.06 | 12.44 | -3.12 |
| Bearing | 1 | -2.33 | 14.17 | 5.14 | -19.9 | 17.38 | 1.4 |
| Bearing | 2 | 5.57 | 4.54 | -6.04 | 5 | 1.46 | -6.49 |
| Bearing | Average | -2.66 | 6.58 | -0.69 | -6.31 | 10.47 | -2.67 |
| Fan | 0 | 0.36 | 10.94 | 0.32 | -18.85 | 28.63 | 0.32 |
| Fan | 1 | -4.53 | -1.32 | -1.07 | -3.11 | -12.17 | 0.66 |
| Fan | 2 | -4.91 | -7.32 | -4.41 | -27.45 | -4.5 | -5.46 |
| Fan | Average | -2.98 | -1.04 | -1.81 | -16.85 | 0.98 | -1.67 |
| Gearbox | 0 | -23.9 | -11.03 | -18.79 | -1.13 | 11.13 | -6.74 |
| Gearbox | 1 | -6.71 | -6 | -2.22 | 1.58 | 2.11 | -1.92 |
| Gearbox | 2 | -1.14 | 2.38 | 2.44 | 6.25 | 0.9 | 1.28 |
| Gearbox | Average | -10.04 | -4.82 | -6.35 | 2.46 | 4.8 | -2.48 |
| Slider | 0 | -1.91 | 1.87 | 3.66 | -2.65 | 17.43 | 5.15 |
| Slider | 1 | 3.73 | 4.79 | 2.64 | 5.9 | 38.38 | 13.04 |
| Slider | 2 | -0.89 | 11.93 | 7.61 | -0.23 | 0.11 | 2.36 |
| Slider | Average | 0.11 | 5.55 | 4.59 | 0.73 | 19.99 | 6.78 |
| Valve | 0 | 4.29 | 2.38 | -0.56 | -1.15 | 0.82 | -2.26 |
| Valve | 1 | 0.39 | 0.02 | 0.48 | -1.95 | 0.04 | 0.27 |
| Valve | 2 | 5.92 | 8.42 | 0.59 | 9.1 | 12.47 | 0.91 |
| Valve | Average | 3.53 | 3.33 | 0.16 | 1.95 | 3.99 | -0.4 |
| Average over all machines | | -2.64 | 1.71 | -0.9 | -3.9 | 7.94 | -0.08 |
Table 3. AUC and pAUC scores of all machines on evaluation dataset

| Feature | Metric | Harmonic mean over all machine types, sections, and domains | Official score |
|---|---|---|---|
| Mel scale | AUC | 0.476997654 | 0.485524897 |
| Mel scale | pAUC | 0.503527942 | |
| Inverse-Mel scale | AUC | 0.476953026 | 0.487278196 |
| Inverse-Mel scale | pAUC | 0.509330358 | |
| Combination maximum | AUC | 0.490115307 | 0.495681532 |
| Combination maximum | pAUC | 0.507202091 | |
| Concatenated | AUC | 0.475036924 | 0.485275108 |
| Concatenated | pAUC | 0.507135059 | |
| Combination average | AUC | 0.46975013 | 0.481373758 |
| Combination average | pAUC | 0.506436574 | |
| Combination minimum | AUC | 0.475282691 | 0.486243539 |
| Combination minimum | pAUC | 0.509755228 | |

Table 4. Percentage improvements in scores on evaluation dataset

| Feature | Metric | Harmonic mean over all machine types, sections, and domains | Official score |
|---|---|---|---|
| Inverse-Mel scale | AUC | -0.01 | 0.37 |
| Inverse-Mel scale | pAUC | 1.16 | |
| Combination maximum | AUC | 2.76 | 2.1 |
| Combination maximum | pAUC | 0.73 | |
| Concatenated | AUC | -0.42 | -0.06 |
| Concatenated | pAUC | 0.72 | |
| Combination average | AUC | -1.52 | -0.86 |
| Combination average | pAUC | 0.58 | |
| Combination minimum | AUC | -0.36 | 0.15 |
| Combination minimum | pAUC | 1.24 | |

The use of plain inverse-Mel scale spectrograms enhances the target domain AUC in all machines except the fan machine. The most significant improvement, approximately 37%, is noted in the target domain AUC for the type 2 domain shift condition of the bearing machine. On average, there is about a 9% increase in the target domain AUC across all machines. Experiments conducted under various domain shift conditions show that the target domain AUC improves within a range of 5–36% for all machines, excluding the fan machine.
Commonly occurring domain shifts are effectively identified by inverse-Mel scales. These include changes in microphone location (bearing machine section 2), varying loads (gearbox machine section 2), fluctuations in operational voltages (gearbox machine section 1), differences in operational speeds (bearing machine section 1), variations in operational velocity (slide rail machine section 1), changes in operational acceleration (slide rail machine section 2), differing operational patterns (valve machine section 1), and the mixing of various factory noises at different indexes (slide rail machine section 3). These improvements highlight the effectiveness of the inverse-Mel scale in handling the operational and environmental domain shifts commonly encountered in IMS.

The use of plain inverse-Mel scale spectrograms also improves the source domain AUC and pAUC in the gearbox, slide rail, and valve machines. An average improvement of approximately 6% and 3% in source domain AUC and pAUC, respectively, is observed across these three machines. These improvements demonstrate the advantage of the inverse-Mel scale in detecting anomalous behavior from IMS.

Combinational maximum spectrograms are observed to enhance source domain AUC and pAUC scores in both the bearing and fan machines. The use of plain inverse-Mel scale spectrograms resulted in poor performance for these machines. This is because bearing and fan machines produce a low level of sound energy, with the emitted energy primarily concentrated in the low-frequency regions. Nevertheless, the use of combinational maximum spectrograms improves detection accuracy. On average, there is an improvement of approximately 6% in source domain AUC and 4% in pAUC across both machines.
Additionally, concatenated spectrograms are observed to enhance pAUC scores in the bearing, slide rail, and valve machines, yielding an average improvement of around 3% in pAUC across these three machines. These results suggest that plain inverse-Mel scales may not always yield optimal results; however, the use of combinational or concatenated spectrograms can improve the performance of the detection system.

Evaluations are also conducted on the evaluation dataset. The official DCASE 2022 Challenge evaluator [27] is executed with the anomaly scores and decision results generated by the trained models. Harmonic
means of AUC and pAUC scores calculated across all machine types, sections, and domains of the evaluation datasets are presented in Table 3. The official scores, as computed by the official evaluator, are also listed in Table 3. The official scores are used to rank the participating systems and teams. Table 4 lists the percentage improvements in all the aforementioned scores in comparison to the results obtained from Mel scale spectrograms.

Among all the proposed methods, the combinational maximum spectrograms generate the best harmonic mean of AUC scores across all machine types, sections, and domains, and also improve the harmonic mean of pAUC scores. Improvements of approximately 3% in AUC scores and 1% in pAUC scores are observed. The official score for the combinational maximum spectrograms indicates an improvement of about 2% compared to the official score of Mel scale spectrograms. This enhanced score results in a rank of 21st in the official ranking released by the DCASE Challenge 2022 Task 2 [7]. This ranking is three positions higher than that of the baseline system.

4. CONCLUSION
In this work, inverse-Mel scales are used to capture the energy present in the high frequencies of IMS. This approach captures the information neglected by standard Mel scales. An autoencoder employing inverse-Mel scales, as well as the concatenation and combination of Mel and inverse-Mel scale spectrograms as front-end features, is implemented for AAD in industrial machines. Experiments are conducted on all machines in the MIMII DG datasets. The use of inverse-Mel scales, along with combinational maximum and concatenated spectrograms, is shown to enhance source domain AUC, target domain AUC, and pAUC scores by 8%, 9%, and 2%, respectively, across all machines.
The improvement in target domain AUC is particularly significant, as it demonstrates the effectiveness of the proposed method under challenging operational and environmental domain shifts. The higher ranking awarded by the official challenge evaluator on the evaluation datasets reflects the system's capability to cope with domain shifts. The results indicate that IMS contain a considerable amount of energy in higher frequency ranges that standard Mel scales fail to capture. Inverse-Mel scales are more effective in capturing these high-frequency components and are hence recommended for AAD in industrial machines.

FUNDING INFORMATION
Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT
This journal uses the Contributor Roles Taxonomy (CRediT) to recognize individual author contributions, reduce authorship disputes, and facilitate collaboration.

Name of Author: C, M, So, Va, Fo, I, R, D, O, E, Vi, Su, P, Fu
Kader Basha Tajuddin Shaikh
Naresh P. Jawarkar
Vasif Ahmed
Nadir Nizar Ali Charniya

C: Conceptualization; M: Methodology; So: Software; Va: Validation; Fo: Formal Analysis; I: Investigation; R: Resources; D: Data Curation; O: Writing - Original Draft; E: Writing - Review & Editing; Vi: Visualization; Su: Supervision; P: Project Administration; Fu: Funding Acquisition

CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The supporting data of this study are openly available at https://zenodo.org/record/6529888 [4].

REFERENCES
[1] K. B. T. Shaikh, N. P. Jawarkar, and V. Ahmed, “Machine diagnosis using acoustic analysis: a review,” in 2021 IEEE Conference on Norbert Wiener in the 21st Century (21CW), Chennai, India: IEEE, Jul. 2021, pp. 1–6, doi: 10.1109/21CW48944.2021.9532537.
[2] T. Salm, K. Tatar, and J. Chilo, “Real-time acoustic measurement system for cutting-tool analysis during stainless steel machining,” Machines, vol. 12, no. 12, Dec. 2024, doi: 10.3390/machines12120892.
[3] G. Li, X. Shang, L. Sun, B. Fu, L. Yang, and H. Zhou, “Application of audible sound signals in tool wear monitoring: a review,” Advanced Manufacturing Science and Technology, vol. 5, no. 1, 2025, doi: 10.51393/j.jamst.2025003.
[4] K. Dohi et al., “MIMII DG: Sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task,” in Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), Nancy, France, Nov. 2022, pp. 1–5.
[5] K. Dohi et al., “Description and discussion on DCASE 2022 challenge Task 2: unsupervised anomalous sound detection for machine condition monitoring applying domain generalization techniques,” Detection and Classification of Acoustic Scenes and Events 2022, Nov. 2022, pp. 1–5.
[6] N. Harada, D. Niizumi, D. Takeuchi, Y. Ohishi, M. Yasuda, and S. Saito, “ToyADMOS2: Another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions,” arXiv-Electrical Engineering and Systems Science, pp. 1–5, Jun. 2021, doi: 10.48550/arXiv.2106.02369.
[7] K. Dohi et al., “Unsupervised anomalous sound detection for machine condition monitoring applying domain generalization techniques,” DCASE Community, 2022. Accessed: Aug. 20, 2025. [Online]. Available: https://dcase.community/challenge2022/task-unsupervised-anomalous-sound-detection-for-machine-condition-monitoring
[8] Y. Liu, C. Miao, X. Li, J. Ji, and D. Meng, “Research on the fault analysis method of belt conveyor idlers based on sound and thermal infrared image features,” Measurement, vol. 186, Dec. 2021, doi: 10.1016/j.measurement.2021.110177.
[9] Y. Liu, Y. Chen, X. Li, X. Zhou, and D. Wu, “MPNet: A lightweight fault diagnosis network for rotating machinery,” Measurement, vol. 239, Jan. 2025, doi: 10.1016/j.measurement.2024.115498.
[10] J. Liu, S. Fu, F. Liu, and X. Cheng, “Intelligent fault diagnosis of belt conveyor rollers using a polar KNN algorithm with audio features,” Engineering Failure Analysis, vol. 168, Feb. 2025, doi: 10.1016/j.engfailanal.2024.109101.
[11] H. Zhou, Z. Gao, H. Li, and Y. Zhang, “State identifying method for rolling tire in lab test using acoustic signal,” Applied Acoustics, vol. 231, Mar. 2025, doi: 10.1016/j.apacoust.2024.110487.
[12] Y. Zhao, B. Qin, Y. Zhou, and X. Xu, “Bearing fault diagnosis based on inverted Mel-scale frequency cepstral coefficients and deformable convolution networks,” Measurement Science and Technology, vol. 34, no. 5, Feb. 2023, doi: 10.1088/1361-6501/acb0ea.
[13] L. Ma, A. Jiang, and W. Jiang, “The intelligent diagnosis of a hydraulic plunger pump based on the MIGLCC-DLSTM method using sound signals,” Machines, vol. 12, no. 12, Nov. 2024, doi: 10.3390/machines12120869.
[14] S. Young et al., The HTK book, Cambridge, United Kingdom: Cambridge University Engineering Department, 2002.
[15] M. Slaney, “Auditory toolbox: a MATLAB toolbox for auditory modeling work,” Interval Research Corporation, pp. 1–41, 1998.
[16] Latha, “Robust speaker identification incorporating high frequency features,” Procedia Computer Science, vol. 89, pp. 804–811, 2016, doi: 10.1016/j.procs.2016.06.064.
[17] H. K. Kathania, S. Shahnawazuddin, W. Ahmad, and N. Adiga, “Role of linear, mel and inverse-mel filterbanks in automatic recognition of speech from high-pitched speakers,” Circuits Systems Signal Process, vol. 38, no. 10, pp. 4667–4682, Oct. 2019, doi: 10.1007/s00034-019-01072-7.
[18] S. Lalitha, S. Tripathi, and D. Gupta, “Enhanced speech emotion detection using deep neural networks,” International Journal of Speech Technology, vol. 22, pp. 497–510, Sept. 2019, doi: 10.1007/s10772-018-09572-8.
[19] Z. Wang, J. Yan, Y. Wang, and X. Wang, “Speech emotion feature extraction method based on improved MFCC and IMFCC fusion features,” in 2023 IEEE 2nd International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Feb. 2023, pp. 1917–1924, doi: 10.1109/EEBDA56825.2023.10090810.
[20] S. Aziz and S. Shahnawazuddin, “Effective preservation of higher-frequency contents in the context of short utterance based children’s speaker verification system,” Applied Acoustics, vol. 209, Jun. 2023, doi: 10.1016/j.apacoust.2023.109420.
[21] S. Aziz and S. Shahnawazuddin, “Experimental studies for improving the performance of children’s speaker verification system using short utterances,” Applied Acoustics, vol. 216, Jan. 2024, doi: 10.1016/j.apacoust.2023.109783.
[22] S. Aziz and S. Shahnawazuddin, “Role of data augmentation and effective conservation of high-frequency contents in the context children’s speaker verification system,” Circuits Systems Signal Process, vol. 43, pp. 3139–3159, May 2024, doi: 10.1007/s00034-024-02598-1.
[23] S. Chakroborty, A. Roy, S. Majumdar, and G. Saha, “Capturing complementary information via reversed filter bank and parallel implementation with MFCC for improved text-independent speaker identification,” in 2007 International Conference on Computing: Theory and Applications (ICCTA’07), Mar. 2007, pp. 463–467, doi: 10.1109/ICCTA.2007.35.
[24] S. Chakroborty, A. Roy, and G. Saha, “Improved closed set text-independent speaker identification by combining MFCC with evidence from flipped filter banks,” International Journal of Electronics and Communication Engineering, vol. 2, no. 11, pp. 2554–2561, 2008.
[25] D. Sharma and I. Ali, “A modified MFCC feature extraction technique for robust speaker recognition,” in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Aug. 2015, pp. 1052–1057, doi: 10.1109/ICACCI.2015.7275749.
[26] B. McFee et al., “Librosa: 0.10.0.post2,” GitHub, 2023. [Online]. Available: https://github.com/librosa/librosa/releases/tag/0.10.0.post2
[27] K. Dohi, “Dcase2022 task 22 evaluator,” GitHub, 2022. Accessed: Aug. 20, 2025. [Online]. Available: https://github.com/Kota-Dohi/dcase2022_evaluator