Document Type


Publication Date


Publication Title

Cement and Concrete Composites


Machine learning (ML)-based prediction of non-linear composition-strength relationship in concretes requires a large, complete, and consistent dataset. However, the availability of such datasets is limited as the datasets often suffer from incompleteness because of missing data corresponding to different input features, which makes the development of robust ML-based predictive models challenging. Besides, as the degree of complexity in these ML models increases, the interpretation of the results becomes challenging. These interpretations of results are critical towards the development of efficient materials design strategies for enhanced materials performance. To address these challenges, this paper implements different data imputation approaches for enhanced dataset completeness. The imputed dataset is leveraged to predict the compressive and tensile strength of concrete using various hyperparameter-optimized ML approaches. Among all the approaches, Extreme Gradient Boosted Decision Trees (XGBoost) showed the highest prediction efficacy when the dataset is imputed using k-nearest neighbors (kNN) with a 10-neighbor configuration. To interpret the predicted results, SHapley Additive exPlanations (SHAP) is employed. Overall, by implementing efficient combinations of data imputation approach, machine learning, and data interpretation, this paper develops an efficient approach to evaluate the composition- strength relationship in concrete. This work, in turn, can be used as a starting point toward the design and development of various performance-enhanced and sustainable concretes.


Machine learning, Concrete strength, Missing data, Data imputation, SHAP







Originally published in Cement and Concrete Composites, Volume 128, April 2022, 104414.