Investigation on low-performance tuned-regressor of inhibitory concentration targeting the SARS-CoV-2 polyprotein 1ab
International Journal of Artificial Intelligence
 
Abstract
Hyperparameter tuning is a key optimization strategy in machine learning (ML). It is often performed with GridSearchCV, which exhaustively evaluates every hyperparameter combination in a predefined grid to find the best-performing one. This study aimed to predict the half-maximal inhibitory concentration (IC50) of small molecules targeting the SARS-CoV-2 replicase polyprotein 1ab (pp1ab) by optimizing three ML algorithms: histogram gradient boosting regressor (HGBR), light gradient boosting regressor (LGBR), and random forest regressor (RFR). Bioactivity data containing duplicate entries were processed in three ways: left untreated, aggregated by their quantitative bioactivity values, and deduplicated. Molecular features were encoded using twelve types of molecular fingerprints. The models were then optimized by hyperparameter tuning with GridSearchCV over a broad parameter space. Despite this comprehensive tuning, model performance remained inconsistent. Further analysis showed that Murcko fragments were unevenly distributed between the training and testing datasets: key fragments were underrepresented in the testing dataset, which led to mismatched model predictions. The study demonstrates that hyperparameter tuning alone may not be sufficient to achieve high predictive performance when the distribution of molecular fragments is unbalanced between training and testing datasets. Ensuring fragment diversity across datasets is therefore crucial for improving model reliability in drug discovery applications.
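The abstract names two data-handling steps that shape the results: the three treatments of duplicate bioactivity records and the comparison of Murcko fragment distributions between the training and testing splits. The Python sketch below illustrates both under stated assumptions. It uses the median as the aggregation statistic, keeps the first measurement per compound as the removal rule, and treats precomputed scaffold SMILES strings as the fragment labels (in practice these might be generated with RDKit's MurckoScaffold module, which is not shown here). All function names, the min_gap threshold, and the example data are hypothetical and do not come from the study.

```python
from collections import Counter
from statistics import median


def handle_duplicates(records, mode):
    """Apply one of three duplicate-handling strategies to (compound_id, ic50) pairs.

    "untreated": keep every measurement, duplicates included.
    "aggregate": collapse repeated measurements into their median (assumed statistic).
    "remove":    keep only the first measurement per compound (assumed rule).
    """
    if mode == "untreated":
        return list(records)
    grouped = {}  # compound_id -> list of IC50 values, in first-seen order
    for compound_id, ic50 in records:
        grouped.setdefault(compound_id, []).append(ic50)
    if mode == "aggregate":
        return [(cid, median(values)) for cid, values in grouped.items()]
    if mode == "remove":
        return [(cid, values[0]) for cid, values in grouped.items()]
    raise ValueError(f"unknown mode: {mode!r}")


def fragment_frequencies(fragments):
    """Relative frequency of each fragment label within one data split."""
    if not fragments:
        return {}
    counts = Counter(fragments)
    return {frag: n / len(fragments) for frag, n in counts.items()}


def underrepresented_fragments(train_fragments, test_fragments, min_gap=0.2):
    """Fragments whose training frequency exceeds their testing frequency by more than min_gap."""
    train_freq = fragment_frequencies(train_fragments)
    test_freq = fragment_frequencies(test_fragments)
    return sorted(
        frag for frag, freq in train_freq.items()
        if freq - test_freq.get(frag, 0.0) > min_gap
    )


def coverage_of_test_split(train_fragments, test_fragments):
    """Share of test compounds whose fragment also occurs in the training split."""
    if not test_fragments:
        return 0.0
    seen = set(train_fragments)
    return sum(frag in seen for frag in test_fragments) / len(test_fragments)


if __name__ == "__main__":
    # Hypothetical IC50 measurements (nM); compounds A and C were measured repeatedly.
    records = [("A", 10.0), ("A", 30.0), ("B", 5.0), ("C", 7.0), ("C", 9.0), ("C", 20.0)]
    print(handle_duplicates(records, "aggregate"))  # [('A', 20.0), ('B', 5.0), ('C', 9.0)]
    print(handle_duplicates(records, "remove"))     # [('A', 10.0), ('B', 5.0), ('C', 7.0)]

    # Hypothetical scaffold SMILES, one per compound, in each split.
    train = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "C1CCCCC1"]
    test = ["c1ccncc1", "C1CCOC1", "C1CCOC1"]
    print(underrepresented_fragments(train, test))  # ['C1CCCCC1', 'c1ccccc1']
    print(coverage_of_test_split(train, test))      # 0.3333333333333333
```

Comparing per-fragment frequencies, rather than only checking whether a fragment appears in both splits, flags scaffolds that are common in training but rare in testing, which is the kind of imbalance the abstract describes.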