According to a new study, a machine learning model could serve as an early-stage sustainability assessment tool for one of the world’s most emissions-intensive industries by predicting carbon emissions from construction projects, explaining more than 73 per cent of the variation in those emissions.
The study found that a Random Forest model was the best at predicting carbon emissions during the construction phase, achieving an R-squared value of 0.734 on previously unseen project data.
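R-squared (the coefficient of determination) is 1 minus the ratio of the residual sum of squares to the total sum of squares. It measures the share of the variation in observed values that a model’s predictions account for. A minimal Python sketch follows. The four emission figures in it are invented for illustration and are not from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviations of the observations from their mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1 - ss_res / ss_tot


# Invented emissions (tonnes) for four projects; not study data
actual = [1_000, 5_000, 20_000, 60_000]
predicted = [1_500, 4_000, 25_000, 55_000]
print(round(r_squared(actual, predicted), 3))
```

Read this way, the study’s R-squared of 0.734 means the Random Forest’s predictions accounted for roughly 73 per cent of the variation in emissions across the unseen test projects.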
Random Forest is an ensemble machine learning method that builds many decision trees, each trained on a random sample of the data, and aggregates their outputs to produce a more reliable and accurate result.
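The following small pure-Python sketch illustrates the Random Forest technique. Each regression tree is grown on a bootstrap sample of the projects and considers only a random subset of features at each split. The forest then averages the trees’ predictions. This is an illustration under stated assumptions, not the study’s implementation. The tree count, features per split, depth limit and toy data are all hypothetical. In practice a library class such as scikit-learn’s RandomForestRegressor would normally be used.

```python
import random
from statistics import mean


def build_tree(rows, targets, n_features, max_depth, rng):
    """Grow a small regression tree using greedy splits that minimise squared error.

    Only a random subset of `n_features` features is considered at each node,
    which is what makes the trees in a random forest differ from one another.
    """
    if max_depth == 0 or len(rows) < 2 or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the mean target of its samples
    n_total = len(rows[0])
    candidates = rng.sample(range(n_total), min(n_features, n_total))
    best = None  # (squared error, feature index, threshold)
    for f in candidates:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:  # no candidate feature could separate the samples
        return mean(targets)
    _, f, threshold = best
    go_left = [r[f] <= threshold for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_t = [t for t, g in zip(targets, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_t = [t for t, g in zip(targets, go_left) if not g]
    return (
        f,
        threshold,
        build_tree(left_rows, left_t, n_features, max_depth - 1, rng),
        build_tree(right_rows, right_t, n_features, max_depth - 1, rng),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=50, n_features=2, max_depth=3, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement).

    The default hyperparameters are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        forest.append(build_tree(sample_rows, sample_targets, n_features, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Aggregate the trees by averaging their predictions (regression)."""
    return mean(predict_tree(tree, row) for tree in forest)


# Invented toy data: [energy use, waste, duration] -> emissions; not from the study
rows = [[1, 2, 3], [2, 1, 4], [5, 6, 8], [6, 5, 9], [9, 9, 12], [10, 8, 14]]
targets = [10, 12, 50, 55, 95, 100]
forest = fit_forest(rows, targets)
print(predict_forest(forest, [5.5, 5.5, 8.5]))
```

Averaging many decorrelated trees reduces the variance of any single tree. This is why the ensemble tends to generalise better than one deep tree.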
The same model also classified projects into low-, medium-, and high-emission groups with the highest accuracy of the models tested. The results, published in “Scientific Reports”, were based on survey data from 150 construction projects in major Saudi cities. The study examined how supervised machine learning can support early sustainability decisions in the construction industry, a major source of waste, energy use, and carbon emissions.
According to the study’s lead author, Ahmed Ali A. Shohan, “accurate prediction at early stages can significantly improve decision-making and sustainability outcomes in construction projects.” The analysis compared three supervised machine learning models: Extreme Gradient Boosting (EGB), Random Forest, and Support Vector Machine (SVM).
SVM and EGB are both machine learning algorithms used for regression and classification, and both aim to minimise prediction error on unseen data. Each model was trained on 19 project and sustainability indicators to forecast total carbon emissions and to assign projects to emission-level groups.
The Random Forest model consistently outperformed the other models. It achieved the highest regression accuracy and the best classification results, predicting the correct emission category for 78 per cent of projects in the test set.
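The article gives neither the emission thresholds behind the low, medium, and high groups nor whether the classifiers worked by thresholding a regression estimate. The Python sketch below therefore assumes hypothetical cut-offs of 10,000 and 100,000 tonnes. Its purpose is to show how category labels and a test-set accuracy figure, like the study’s 78 per cent, are computed. All values are invented.

```python
def emission_category(tonnes, low_cutoff=10_000, high_cutoff=100_000):
    """Map an emission figure (tonnes) to a category.

    The cut-offs are hypothetical; the study's thresholds are not reported here.
    """
    if tonnes < low_cutoff:
        return "low"
    if tonnes < high_cutoff:
        return "medium"
    return "high"


def accuracy(true_labels, predicted_labels):
    """Share of test projects whose category was predicted correctly."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("need two non-empty, equal-length label lists")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Invented test-set values (tonnes); not study data
actual_tonnes = [2_000, 8_000, 15_000, 40_000, 90_000, 150_000, 300_000, 450_000]
predicted_tonnes = [3_000, 12_000, 14_000, 45_000, 120_000, 140_000, 260_000, 380_000]
true_labels = [emission_category(t) for t in actual_tonnes]
predicted_labels = [emission_category(t) for t in predicted_tonnes]
print(f"Test-set accuracy: {accuracy(true_labels, predicted_labels):.0%}")  # 6 of 8 correct
```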
EGB came second, with an R-squared value of 0.728 on the independent test set. SVM performed worst, particularly when estimating emissions for large-scale projects.
All models were trained and validated within a hierarchical cross-validation framework to reduce overfitting and obtain more reliable estimates of how well they generalise. Final testing was carried out on a 30 per cent holdout set.
Beyond prediction accuracy, the study examined which factors had the greatest influence on carbon emissions during the construction phase. Waste generation, energy use, and project duration emerged as the most significant predictors across all models.
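The article mentions a 30 per cent holdout and cross-validation but does not detail the hierarchical scheme. The sketch below therefore shows only a generic version: a shuffled 70/30 holdout split of the 150 projects, followed by k-fold splits of the training portion for fitting and tuning. The seed and k = 5 are hypothetical choices, and the study’s actual nesting may differ.

```python
import random


def holdout_split(n_samples, test_fraction=0.30, seed=42):
    """Shuffle sample indices and reserve `test_fraction` of them as a final test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is a hypothetical choice
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds (k is assumed)."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = holdout_split(150)
print(len(train_idx), len(test_idx))  # 105 45
for fold_no, (fit_idx, val_idx) in enumerate(k_fold(train_idx), start=1):
    # Model fitting and tuning would happen here using training data only;
    # the 45 held-out projects are used once, for the final evaluation.
    print(fold_no, len(fit_idx), len(val_idx))
```

Keeping the holdout set out of every cross-validation round is what lets the final score act as an estimate on genuinely unseen projects.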
According to Shohan, waste generation, energy use, and project duration “consistently showed the highest contribution to emission predictions.” Waste generation had the strongest positive correlation with carbon emissions, followed by energy consumption, which reflects construction’s heavy reliance on electricity and diesel-powered equipment.
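Pearson correlation is one simple way to rank predictors against emissions. The article does not say whether the study relied on correlation, model-based feature importance, or both, so this Python sketch is only an illustration. The predictor names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / (sx * sy)


def rank_predictors(features, target):
    """Sort named predictors by their correlation with the target, most positive first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented values for five projects (hypothetical names); not study data
emissions = [1_200, 3_400, 8_000, 15_000, 42_000]
features = {
    "waste_tonnes": [10, 30, 70, 140, 400],
    "energy_mwh": [50, 90, 300, 320, 900],
    "duration_months": [12, 10, 24, 18, 30],
}
for name, r in rank_predictors(features, emissions):
    print(f"{name}: r = {r:.2f}")
```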
Longer project durations were also associated with higher emissions, underscoring the link between construction-phase environmental performance and effective scheduling. Emissions in the dataset ranged widely, from roughly 1,000 tonnes to more than 450,000 tonnes, reflecting large differences in project scale and operational intensity.
Prediction errors increased markedly for high-emission projects, especially those at the upper end of the range. Even so, the ensemble models (Random Forest and EGB) handled this variability better than the kernel-based SVM.
The dataset was built from a structured questionnaire completed by contractors, consultants, and clients working on projects in Riyadh, Jeddah, Makkah, Dammam, Abha, and Madinah. After statistical screening for multicollinearity and redundancy, 19 verified indicators were retained.
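The article does not describe how the multicollinearity screening was done. One common approach drops any indicator that is too strongly correlated with one already kept. The Python sketch below does this greedily with a hypothetical |r| cut-off of 0.8. The study may have used another method, such as variance inflation factors. The indicator names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant indicator")
    return cov / spread


def screen_collinear(indicators, threshold=0.8):
    """Keep indicators in order, dropping any whose |r| with a kept one exceeds `threshold`.

    The 0.8 cut-off is a hypothetical choice for illustration.
    """
    kept = []
    for name, values in indicators.items():
        if all(abs(pearson(values, indicators[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Invented indicator values for five projects (hypothetical names); not study data
indicators = {
    "waste_generated": [10, 20, 30, 40, 50],
    "waste_disposed": [11, 19, 31, 42, 49],  # almost a copy of waste_generated
    "energy_use": [5, 3, 8, 2, 9],
}
print(screen_collinear(indicators))  # ['waste_generated', 'energy_use']
```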
Respondents rated environmental factors such as waste management, energy use, and carbon emissions as highly important. Economic and social factors also received high ratings, suggesting broad recognition across the sector of integrated sustainability concerns.












