IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 15, No. 2, April 2026, pp. 1733-1745
ISSN: 2252-8938, DOI: 10.11591/ijai.v15.i2.pp1733-1745

ResNet based deep learning approach for chronic obstructive pulmonary disease prediction using lung sound analysis

Babitha Sudhakar Ullal¹, Veena Kalludi Narasimhaiah¹, Rithul Kamesh²
¹School of Electronics and Communication Engineering, REVA University, Bengaluru, India
²Department of Electronics and Communication Engineering, PES University, Bengaluru, India

Article history: Received Aug 21, 2025; Revised Jan 15, 2026; Accepted Feb 6, 2026

Keywords: Audio signal processing; Chronic obstructive pulmonary disease; Convolutional neural network; Long short-term memory; Residual networks

ABSTRACT
Chronic obstructive pulmonary disease (COPD) affects around 300-400 million people worldwide, representing a critical healthcare challenge that requires early detection for effective intervention. This work introduces chronic lung analysis via audio signal prediction (CLASP), a novel framework achieving 97.90% accuracy in predicting COPD automatically through respiratory audio signal analysis. The method integrates advanced signal processing with deep learning architectures, comparing long short-term memory (LSTM), convolutional neural network (CNN), and residual network (ResNet) models for optimal performance. The ResNet architecture exhibits superior diagnostic capability, with precision of 98.72%, recall of 96.86%, and an area under the curve (AUC) of 0.9937, exceeding existing methods by significant margins. These results establish a new benchmark for noninvasive COPD detection, enabling practical deployment in clinical settings, improving patient outcomes through early detection, and reducing healthcare costs.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Babitha Sudhakar Ullal
School of Electronics and Communication Engineering, REVA University
Bengaluru, India
Email: babitharoshan91@gmail.com
Journal homepage: http://ijai.iaescore.com

1. INTRODUCTION
Chronic obstructive pulmonary disease (COPD) is one of the critical challenges in present-day respiratory medicine, globally affecting around 384 million people, and is estimated to become the world's third most common cause of death by 2030 [1], [2]. The disease is progressive in nature [3] and is characterized by persistent respiratory problems and airflow limitation [4], [5]. It demands early detection techniques that can identify patients before irreversible lung damage occurs [6], [7]. Spirometry and clinical evaluation, the current diagnostic approaches, often fail to detect COPD in its early stages [8], where medical intervention could be most effective in improving patient outcomes, creating a pressing need for more sensitive and accessible screening methods. Machine learning approaches to the detection of various respiratory diseases have shown remarkable promise [9]-[13] through audio signal analysis, exploiting the rich temporal and spectral information present in breath sounds [14], [15]. Recent developments in deep learning architectures, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and residual networks (ResNets), have shown exceptional capability in pattern recognition tasks on biomedical signals. These technologies provide the
necessary computational foundation to implement automated diagnostic systems that can analyse complex respiratory audio patterns with high accuracy while remaining noninvasive.

The specific challenge addressed in this work is the implementation of an automated system that distinguishes COPD-related respiratory patterns from other normal or abnormal breathing patterns with accuracy acceptable for clinical deployment. Traditional machine learning approaches have achieved modest success in this area, with accuracies typically varying between 82.5% and 93% [16], [17], performance levels that fall short of the standards required for clinical deployment. The proposed approach introduces chronic lung analysis via audio signal prediction (CLASP), a complete framework combining advanced signal processing techniques with state-of-the-art deep learning models to deliver high accuracy in automated COPD detection. The key innovations of the work are as follows. First, development of an optimized audio preprocessing pipeline incorporating Mel-frequency cepstral coefficients (MFCC) along with first derivative (delta) and second derivative (delta-delta) features to enable enhanced temporal pattern capture. Second, comparative evaluation of three distinct deep learning models: long short-term memory (LSTM), CNN, and ResNet. Third, implementation of threshold optimization techniques to enhance clinical utility.

2. RELATED WORK
Lee et al. [16] designed a model that uses thermal imaging to capture respiratory patterns, focusing on four primary features: total volume of respiration, average expiration distance, average inspiration distance, and respiration rate. Z-score normalization was applied to these features, which were then combined through weighted summation to generate a composite score used for classification. The accuracy of the model is 82.5%.
The model's high recall indicates its ability to identify individuals with COPD accurately, minimizing false negatives. Siddiqui et al. [17] explored a non-invasive method, ultra-wideband (UWB) radar, to differentiate COPD patients from healthy individuals. Data was collected from 70 subjects (35 with COPD and 35 healthy controls). The researchers extracted respiration data and incorporated further features such as patient age, smoking history, and gender to enhance accuracy. Several machine learning classifiers, including support vector machine (SVM), naïve Bayes (NB), adaptive boosting (AdaBoost), k-nearest neighbor (KNN), and random forest (RF), and deep learning models such as CNN and LSTM networks were employed. Among these, the highest accuracy of 93% was achieved by the LSTM model, demonstrating the potential of combining UWB radar and machine learning as a non-invasive and effective method for COPD detection. Abineza et al. [18] utilized time-stamped electronic health records from COPD patients to develop an LSTM model that predicts subsequent exacerbation by analyzing symptoms, patterns, and arterial oxygen saturation levels over time. Experiments were run with time windows ranging from one to six prior days to forecast the likelihood of an exacerbation on the following day. The model performed best with a one-day time window, achieving testing accuracy of 85%, training accuracy of 87%, and area under the curve (AUC) of 0.83. These results were obtained from a dataset comprising only 54 patients, and peripheral oxygen saturation (SpO2) was the only clinical variable used as a factor influencing COPD exacerbations. Mei et al. [19] introduced DeepSpiro, a novel deep learning architecture designed to enhance detection as well as early prediction of COPD using spirogram data.
DeepSpiro comprises four primary components: SpiroSmoother, SpiroEncoder, SpiroExplainer, and SpiroPredictor. The model achieved an AUC of 0.8328 in distinguishing individuals with COPD from those without. In early prediction tasks, DeepSpiro effectively differentiated between low-risk and high-risk groups, observing substantial differences in future COPD development. This underscores the model's potential in forecasting long-term COPD progression, although its accuracy depends on the quality of the spirogram data. Yin et al. [20] used fractional-order dynamics and deep learning techniques to predict COPD. Thorax breathing effort, respiratory rate, and oxygen saturation levels were extracted to obtain fractional dynamic signatures used to train a deep neural network (DNN). The model accuracy was 94.01% when trained on the WestRo Porti COPD dataset and tested on the WestRo COPD dataset, and 90.13% in the reverse scenario. However, the relatively small number of unique patients may limit the model's generalizability to broader populations.
Bairagi and Kanwade [21] used a non-invasive technique analyzing surface electromyography (sEMG) signals from the sternomastoid muscle, a primary respiratory muscle, aiming to overcome limitations of traditional spirometry. EMG signals were examined in the time domain, frequency domain, and time-frequency domain. A slope-based onset detection algorithm was employed to identify muscle activation periods, enhancing the accuracy of feature extraction. Features were extracted using the continuous wavelet transform (CWT) at single frequencies of 7, 8, and 10 Hz, facilitating COPD classification by severity grade. With this technique, a classification accuracy of 85.89% across different COPD severity grades was achieved. Kumar et al. [22] presented an innovative approach to diagnosing COPD by integrating computed tomography (CT) scan images with lung audio samples to enhance the diagnostic process. Features extracted from scan images and audio samples include texture, histogram intensity, Gaussian scale space, chroma, and MFCCs. To assess patient severity level through early classification, the extracted features are fed into an ensemble learning technique. The proposed framework achieved accuracies of 97.50% for fusion-based early diagnosis, 98% for early diagnosis using the CT diagnostic model, and 95.30% for early diagnosis using the cough sample model. These high accuracies draw not only on the audio signals but also on CT images. Ullah et al. [23] employed a dataset from Kaggle (respiratory sound database) and chest wall lung sounds. To ensure uniformity and fixed-length duration, raw signals were resampled to 4 kHz and later zero-padded.
Segmentation was then performed to prepare the data for feature extraction using MFCC (13 features) and the short-term Fourier transform (STFT) (1,000 features). SVM, artificial neural network (ANN), KNN, RF, and decision tree (DT) machine learning algorithms were employed. The models were trained on 70% of the data and validated on 30%. The STFT+MFCC-ANN combination yielded the best accuracy. Nunavath et al. [24] explored deep learning architectures to predict exacerbation in COPD patients. The authors employed LSTM to analyze patient data (only 94 patients) and identify patterns that indicate the likelihood of future exacerbation. The deep neural network outperformed traditional machine learning approaches in predicting COPD exacerbation, with the LSTM model showing 92.86% accuracy. Jenefa et al. [25] presented a novel approach to predicting COPD by integrating CNN and LSTM. This approach leverages the strengths of both architectures: spatial feature extraction using CNN and capture of temporal dependencies in sequential data using LSTM. The model efficiently captured complex COPD patterns, leading to more effective predictions, and identified early stages of COPD with accuracy greater than 95%. However, these approaches often struggle to capture long-term temporal patterns in the audio signals, which are critical for accurate COPD diagnosis. CLASP builds upon these works, using an LSTM model, a CNN model, and a ResNet model to capture long-term and short-term patterns in respiratory audio signals and comparing their performance.

3. METHOD
The CLASP framework uses a systematic approach to detect COPD through the analysis of respiratory audio signals. The methodology includes three primary components. First, computational signal processing techniques for audio preprocessing and feature extraction. Second, experimental deep learning architectures for pattern recognition.
Third, comprehensive evaluation protocols for clinical validation. The proposed methodology utilizes a publicly available dataset hosted on Kaggle at https://www.kaggle.com/code/mariammagdy22/pulmonary-diseases-detection-system/input.

3.1. Computational techniques
Audio signals are processed using a sampling frequency of 22,050 Hz with windowed segmentation. MFCC extraction includes 13 static coefficients augmented with delta and delta-delta features to capture temporal dynamics. The foundation of the MFCC computation involves STFT analysis, Mel-scale filterbank application, and discrete cosine transform (DCT) processing. The proposed implementation uses the librosa library with optimized parameters validated through preliminary testing on the International Conference on Biomedical Health Informatics (ICBHI) dataset [15]. This approach yields robust feature representations that capture the spectral characteristics and temporal variations required for respiratory pattern analysis. A spectrogram grid showcasing respiratory audio signals used in the CLASP framework is shown in Figure 1.
3.2. Experimental techniques
Three different deep learning architectures were implemented and methodically compared: LSTM networks with attention mechanisms, CNN with global pooling strategies, and ResNet with skip connections. Each architecture was designed explicitly for the 39-dimensional MFCC feature vectors, with careful consideration of temporal dependencies in respiratory audio signals. Training protocols used stratified data splitting with an 80-20 train-test split, ensuring balanced representation of COPD and non-COPD classes. The Adam optimizer is used for model optimization with a learning rate of 0.001. To prevent overfitting, early stopping and learning rate reduction strategies are used, which also improve convergence stability.

3.3. Error analysis and validation
The evaluation protocols addressed both random and systematic error sources through metric assessment including accuracy, precision, specificity, F1-score, recall, and area under the curve (AUC). Threshold optimization techniques are used to enhance clinical utility, prioritizing sensitivity for medical screening applications. Cross-validation strategies and confusion matrix analysis provide detailed insights into model performance characteristics. Training time analysis quantified computational efficiency trade-offs, and statistical significance testing confirmed the robustness of performance differences across architectures. The CLASP pipeline architecture with three deep learning models (CNN, ResNet, and LSTM with attention) is shown in Figure 2.

Figure 1. Spectrogram grid used in the CLASP framework
Figure 2. CLASP pipeline architecture

4. SYSTEM ARCHITECTURE AND EXPERIMENTAL SET-UP
The CLASP framework consists of three main components optimized for respiratory audio signal analysis: pre-processing, feature extraction, and neural network-based prediction. Each component is
designed to work seamlessly with the others, forming a comprehensive analysis pipeline. The audio signal pre-processing, MFCC computation, dynamic feature computation, and the three models (LSTM, CNN, and ResNet architectures) are discussed in detail in the following sections.

4.1. Audio signal pre-processing
The pre-processing phase converts raw audio signals into a format suitable for analysis through the following steps: i) Sampling frequency (f_s): audio signals are sampled at 22,050 Hz for high resolution. ii) Window size (w_s): a window size of 20 ms corresponds to 441 samples. iii) Step size (s_s): a step size of 10 ms ensures a 50% overlap between consecutive windows. iv) Fixed number of windows: each segment has 10 consecutive windows, for a total of 110 ms per segment.

To segment the signals for analysis, we apply a windowing function. Mathematically, the segmentation process is expressed as (1):

x_i[n] = x[n + i·s_s],  0 ≤ n < N_w  (1)

where x_i[n] represents the i-th segment and N_w is the window size in samples. Spectral leakage at the segment boundaries is minimized by using a Hamming window function to taper each segment. This function is given as (2):

w[n] = 0.54 − 0.46·cos(2πn / (N_w − 1)),  0 ≤ n < N_w  (2)

The trade-off between temporal precision and frequency resolution was balanced by a suitable choice of window function and overlap, decided through extensive experimentation. The Hamming window, in particular, was chosen for its superior frequency resolution compared to rectangular windows while maintaining good temporal resolution.

4.2. Feature extraction
This section details MFCC computation and dynamic feature computation.
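The segmentation and Hamming tapering of section 4.1, per (1) and (2), can be sketched in NumPy as follows (the function name is illustrative):

```python
import numpy as np


def segment_signal(x, fs=22050, win_ms=20, step_ms=10):
    """Slice a raw signal into overlapping Hamming-tapered frames:
    x_i[n] = x[n + i*s_s] for 0 <= n < N_w, tapered by
    w[n] = 0.54 - 0.46*cos(2*pi*n / (N_w - 1))."""
    n_w = int(fs * win_ms / 1000)      # 441 samples at 22,050 Hz
    s_s = int(fs * step_ms / 1000)     # ~220 samples -> ~50% overlap
    hamming = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n_w) / (n_w - 1))
    n_frames = 1 + (len(x) - n_w) // s_s
    return np.stack([x[i * s_s : i * s_s + n_w] * hamming
                     for i in range(n_frames)])
```

Each row of the returned array is one tapered 20 ms frame, ready for the STFT step of section 4.2.1.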
STFT, Mel filterbank application, DCT, and feature composition are involved in MFCC computation, whereas dynamic feature computation includes delta (rate of change of the cepstral coefficients) and delta-delta (rate of change of the delta features).

4.2.1. Mel-frequency cepstral coefficients computation
The MFCC computation process, capturing different aspects of the audio signal, is as follows: i) STFT: the STFT converts each windowed signal into its frequency representation as (3):

X_i[k] = Σ_{n=0}^{N_w−1} y_i[n] e^{−j2πkn/N}  (3)

where y_i[n] represents the windowed signal and X_i[k] gives its frequency components. ii) Mel filterbank application: a set of Mel-scale filters is applied to the STFT magnitudes. This step maps the frequency components to the Mel scale, which better represents human auditory perception, as (4):

S_i[m] = Σ_{k=0}^{N/2} |X_i[k]|² H_m[k],  0 ≤ m < M  (4)

where H_m[k] represents the m-th Mel-filter response, M is the total number of Mel filters in the filterbank, and S_i[m] is the Mel-filtered spectral energy for the m-th filter at time frame i. iii) DCT: the MFCCs are computed using the DCT as (5):

c_i[n] = Σ_{m=0}^{M−1} log(S_i[m]) cos(πn(m + 1/2)/M),  0 ≤ n < 13  (5)

where c_i[n] is the n-th MFCC for the i-th time frame. iv) Feature composition: the final feature vector comprises 13 static MFCCs, 13 delta coefficients, and 13 delta-delta coefficients.
4.2.2. Dynamic feature computation
Temporal variations in the audio signal are captured by computing dynamic features (deltas) from the MFCCs, as defined in (6). These features represent the rate of change of the cepstral coefficients and indicate how the spectral characteristics of the audio change over time:

Δf_t = ( Σ_{θ=1}^{Θ} θ (f_{t+θ} − f_{t−θ}) ) / ( 2 Σ_{θ=1}^{Θ} θ² )  (6)

where Θ = 3 defines the computation window width, θ represents the time lag, f_t is the cepstral coefficient at time frame t, and Δf_t is the first-order derivative of f_t. Similarly, the rate of change of the delta features, Δ²f_t, is given by (7):

Δ²f_t = ( Σ_{θ=1}^{Θ} θ (Δf_{t+θ} − Δf_{t−θ}) ) / ( 2 Σ_{θ=1}^{Θ} θ² )  (7)

4.3. Model architectures
4.3.1. Long short-term memory-based architecture
The LSTM model processes sequential MFCC features using bidirectional long short-term memory (BiLSTM) layers followed by an attention mechanism to identify the most important segments in the respiratory audio signals. Each of the two BiLSTM layers is followed by batch normalization and dropout layers. The attention mechanism helps the model focus on important parts of the sequence. The final layers include dense layers and a sigmoid activation for binary classification. The Adam optimizer with a learning rate of 0.001 is used to train the model. The LSTM model architecture with attention mechanism is shown in Figure 3.

Figure 3. LSTM model architecture with attention mechanism
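The BiLSTM-with-attention design of section 4.3.1 could be sketched in Keras roughly as follows. Layer widths, dropout rates, and the additive form of the attention are illustrative assumptions; the paper fixes only the structure (two BiLSTM layers, batch norm, dropout, attention, dense + sigmoid) and the Adam learning rate of 0.001.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_lstm_model(timesteps, n_features=39, units=64):
    """Sketch of the BiLSTM + attention binary classifier."""
    inputs = layers.Input(shape=(timesteps, n_features))
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    # simple additive attention: score each timestep, softmax over time,
    # then take the attention-weighted sum of the BiLSTM outputs
    scores = layers.Dense(1, activation="tanh")(x)
    weights = layers.Softmax(axis=1)(scores)
    context = layers.Dot(axes=1)([weights, x])      # (batch, 1, 2*units)
    context = layers.Flatten()(context)
    x = layers.Dense(32, activation="relu")(context)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```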
4.3.2. Convolutional neural network-based architecture
The CNN model reshapes sequential MFCC features into an image-like format and applies convolutional blocks to extract spatial and temporal features. This approach treats the time-frequency representation of audio as a 2D image to leverage the pattern recognition strength of CNNs. The architecture consists of three convolutional blocks, each followed by batch normalization, max pooling, and dropout layers to prevent overfitting. For binary classification, the final layers consist of global average pooling, dense layers, and a sigmoid activation. This model is trained using the Adam optimizer with a learning rate of 0.001. The architecture of the CNN model is shown in Figure 4.

Figure 4. CNN model architecture

4.3.3. ResNet-based architecture
The ResNet model uses residual connections to enable deeper networks to learn complex feature representations while avoiding vanishing gradients, effectively capturing subtle respiratory audio patterns that distinguish COPD from other conditions. The architecture begins with a convolutional layer followed by three stages of residual blocks. Each block includes two convolutional layers, batch normalization followed by rectified linear unit (ReLU) activation, and a shortcut connection. The model concludes with global average pooling, dense layers, and a sigmoid activation. The skip connections in the residual blocks allow the gradient to flow more easily during backpropagation, reducing the vanishing gradient problem. The ResNet architecture is shown in Figure 5 and the ResNet residual block structure in Figure 6, where the solid red arrow shows the projection pathway when dimensions change (1×1 convolution) and the dashed red arrow shows the direct skip connection when dimensions match.
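The residual block of Figure 6 might look as follows in Keras. The use of 1D convolutions over the MFCC sequence, the kernel size of 3, and stride-2 downsampling are assumptions not fixed by the paper; the two-convolution body, batch norm + ReLU, and the identity-versus-1×1-projection shortcut follow the description above.

```python
from tensorflow.keras import layers


def residual_block(x, filters, downsample=False):
    """One residual block: two conv layers with batch normalization,
    plus a shortcut that is a 1x1 projection when dimensions change
    (solid arrow in Figure 6) or an identity skip when they match
    (dashed arrow)."""
    stride = 2 if downsample else 1
    shortcut = x
    y = layers.Conv1D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv1D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if downsample or x.shape[-1] != filters:
        # projection pathway: 1x1 convolution to match dimensions
        shortcut = layers.Conv1D(filters, 1, strides=stride, padding="same")(x)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```

Stacking three stages of such blocks, followed by global average pooling and dense layers, reproduces the overall structure described for Figure 5.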
The model is trained using the Adam optimizer with a learning rate of 0.001 and can be trained with focal loss to address class imbalance.

4.4. Training and evaluation
4.4.1. Dataset
The dataset is taken from Kaggle (respiratory sound database), which contains 920 audio recordings from 126 subjects. It includes samples from healthy individuals and patients with various respiratory conditions, including COPD. This dataset was augmented with signal transformations, resulting in a total of 800 recordings for COPD and non-COPD, with an 80-20 split between training and testing sets.

4.4.2. Training methodology
This section outlines the training methodology of the CLASP framework. The discussion is organized into three main components, described as follows. i) Class imbalance handling: the CLASP framework implements three complementary strategies to address class imbalance, a common issue in medical datasets. Data augmentation is done using time-domain augmentation, feature-domain augmentation, and sample mixing. Balanced training sets and focal loss with false negative weighting are discussed in the following sections.
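The false-negative-weighted focal loss mentioned above, defined formally in (8) and (9) below, might be implemented along these lines. The α and γ values are common defaults assumed here, not stated in the paper; fn_weight = 3.0 follows the paper's default.

```python
import tensorflow as tf


def fn_weighted_focal_loss(alpha=0.25, gamma=2.0, fn_weight=3.0):
    """Focal loss per (8), with the extra false-negative penalty of (9):
    the loss is multiplied by fn_weight when y = 1 but y_hat < 0.5."""
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # p_t: predicted probability of the correct class
        p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
        fl = -alpha * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t)
        # missed COPD cases: positive label predicted below 0.5
        fn_mask = tf.cast((y_true == 1.0) & (y_pred < 0.5), tf.float32)
        return tf.reduce_mean(fl * (1.0 + (fn_weight - 1.0) * fn_mask))
    return loss
```

Passing this function as the `loss` argument to `model.compile` is all that is needed to use it with any of the three architectures.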
Data augmentation: the audio processor component applies targeted augmentation to increase minority-class representation. Time-domain augmentation: time stretching/compression (±10%) and pitch shifting (±2 semitones). Feature-domain augmentation: spectral masking, random feature scaling, and additive Gaussian noise. Sample mixing: linear combination of similar-class samples with random weights.

Balanced training sets: for each training epoch, the framework dynamically creates balanced mini-batches: minority-class samples are upsampled to match the majority-class frequency; a create-balanced-subset function ensures equal class representation while maintaining diversity within classes; and random state initialization ensures reproducibility across training runs.

Focal loss with false negative weighting: a specialized loss function FL(p_t) is employed to prioritize correct classification of COPD cases, as in (8):

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)  (8)

where p_t is the model's estimated probability for the correct class, α_t is a balancing factor, and γ is the focusing parameter. For COPD detection specifically, the loss function is further modified to assign a higher penalty to false negatives, as in (9):

FNFL(y, ŷ) = FL(ŷ) if y = 0;  FL(ŷ)·fn_weight if y = 1 and ŷ < 0.5  (9)

where fn_weight is set to 3.0 by default, effectively tripling the loss for missed COPD cases.

Figure 5. ResNet architecture

ii) Threshold optimization: CLASP uses sensitivity-weighted optimization to find the optimal classification threshold instead of the standard value of 0.5, allowing fine-tuning of the specificity-sensitivity trade-off. A comparison of classification thresholds for each model is shown in Figure 7. The sensitivity-specificity threshold selection method is as follows: multiple candidate thresholds are evaluated on validation data, and for each threshold a weighted score is computed as in (10):
Score(θ) = (sensitivity(θ)·w + specificity(θ)) / (w + 1)  (10)
where w is the sensitivity weight (default = 2.0). The threshold maximizing this weighted score is selected as optimal.

Figure 6. ResNet residual block structure
Figure 7. Comparison of classification thresholds

iii) Training configuration: the models are trained with the following configuration. Optimizer: Adam with a learning rate of 0.001. Batch normalization: stabilizes training when applied after each major layer. Dropout: strategic application with increasing rates deeper in the network (20% to 50%). Early stopping: based on validation loss with a patience of 10-15 epochs. Learning rate reduction: factor of 0.2 when validation loss plateaus. Epochs: up to 50 for LSTM and ResNet, 40 for CNN (typically terminating earlier due to early stopping). Validation split: 15% of training data. These configuration choices were selected through extensive experimentation and specifically tuned for respiratory audio classification.

5. RESULTS AND DISCUSSION
The proposed CLASP framework shows clear improvements over existing approaches and is validated through performance evaluations. A systematic comparison of three deep learning architectures reveals optimal
performance characteristics, underscoring its effectiveness. The following section discusses the precision-recall curve comparison, receiver operating characteristic (ROC) curve, performance metric comparison, training time comparison, and state-of-the-art comparison.

5.1. Model performance comparison
The precision-recall curves for the three models are shown in Figure 8. The graphs show ResNet with the highest average precision (AP) of 0.994, followed by the LSTM model with an AP of 0.991 and the CNN model with an AP of 0.968. The ROC curves for the LSTM, CNN, and ResNet models are shown in Figure 9, which again show the ResNet model with the highest AUC of 0.994, outperforming LSTM with an AUC of 0.988 and CNN with an AUC of 0.958. The performance comparison of the three models using accuracy, recall, precision, F1-score, specificity, and AUC as metrics is given in Figure 10 and Table 1. The ResNet architecture demonstrates exceptional diagnostic capability with an accuracy of 97.90%, significantly exceeding the performance of existing approaches, which typically achieve 82% to 93% accuracy. A precision of 98.72% indicates exceptional specificity in positive COPD identification, while the recall of 96.86% ensures minimal missed cases, a critical consideration for clinical screening applications where false negatives have severe consequences for patient outcomes. The training time comparison of all three models is shown in Figure 11, with ResNet taking the maximum training time of 32.2 seconds, versus 11.3 seconds for CNN and 10.9 seconds for LSTM. Considering the other metrics (as in Table 1), this delay is negligible given the substantial accuracy improvements and the non-real-time nature of diagnostic screening applications. A comparison between the proposed ResNet architecture and existing approaches that use only audio signals as model input and achieve accuracies above 90% is shown in Figure 12.
The accuracy of 93% is the work of [17], the 92.86% accuracy is the work of [24], and 95% is the work of [25]. COPD prediction model evaluation: the confusion matrix values for the LSTM, CNN, and ResNet models provide valuable insight into the classification performance of each model, as follows. LSTM: true negatives: 171, false positives: 3, false negatives: 7, and true positives: 152. CNN: true negatives: 170, false positives: 4, false negatives: 20, and true positives: 139. ResNet: true negatives: 172, false positives: 2, false negatives: 5, and true positives: 154. During validation, the optimal threshold values obtained are 0.15 for LSTM, 0.29 for CNN, and 0.36 for ResNet. The learning curves for all three models showed appropriate convergence behavior, with validation metrics closely tracking training metrics, suggesting good generalization without significant overfitting. The ResNet model in particular demonstrated excellent stability in both loss and accuracy metrics during training. The 0.9937 AUC score indicates near-perfect discriminative capability, while the balanced precision-recall characteristics provide optimal trade-offs for clinical deployment. Based on this comprehensive evaluation, the ResNet architecture is recommended for deployment in the CLASP system, as it provides the best diagnostic accuracy and reliability for clinical applications, with 96.86% recall and 98.85% specificity, which is important in clinical use to minimize missed cases, despite its computational requirements.

Figure 8. Precision-recall curve comparison
Figure 9. ROC curve comparison
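As a consistency check, the headline ResNet metrics follow directly from the confusion matrix counts reported above:

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Derive the headline metrics from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1


# ResNet counts reported above: TN=172, FP=2, FN=5, TP=154
acc, prec, rec, spec, f1 = metrics_from_confusion(172, 2, 5, 154)
print(f"accuracy={acc:.4f} precision={prec:.4f} "
      f"recall={rec:.4f} specificity={spec:.4f}")
# -> accuracy=0.9790 precision=0.9872 recall=0.9686 specificity=0.9885
```

These derived values match the 97.90% accuracy, 98.72% precision, 96.86% recall, and 98.85% specificity reported for the ResNet model.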