Influence of Geospatial Features on Performance of Machine Learning Models for Predicting Water Consumption by Poor Urban Households
Abstract
Several studies have applied various techniques to model and predict water consumption in urban areas, but none was found that integrates geospatial technology with machine learning to predict water consumption by poor urban households. Combining geospatial technology and machine learning, this study examines the influence of geospatial features (GSF) on the performance of machine learning models for predicting the volume of water consumed by poor urban households. Historical data on the daily volume of water consumed were gathered through questionnaires and integrated with socioeconomic, weather, property and geospatial data using geospatial technologies. Pearson correlation was applied to the combined dataset to select a small set of features that correlate with the target variable. The selected features were used as inputs to four predictive models, each built with a different machine learning technique: Multilinear Regression (MLR), Random Forest (RF), Support Vector Regression (SVR), and Artificial Neural Networks (ANN). Three metrics, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and the R-squared (R2) score, were used to measure model performance. The results show that including GSF improved the training performance of all four models: RMSE decreased from 139 litres to 116 litres for MLR, 77 litres to 57 litres for RF, 130 litres to 110 litres for SVR, and 110 litres to 53 litres for ANN. However, a significance test at the 95% confidence level shows that the improvement in model performance from including GSF as input is not significant for MLR, RF and SVR.
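The feature-selection and evaluation steps summarised above can be illustrated with a minimal Python sketch. It is not the study's implementation: the feature names, the toy values and the 0.3 correlation cutoff are all hypothetical, chosen only to show how a Pearson filter and the three metrics are computed.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, threshold=0.3):
    """Keep features whose |r| with the target meets the cutoff.

    The 0.3 threshold is an assumption for illustration only.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )

def r2_score(y_true, y_pred):
    """Coefficient of determination (R-squared)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up toy data (NOT from the study): daily consumption in litres
# and three hypothetical candidate features.
target = [100, 200, 300, 400]
columns = {
    "household_size": [2, 4, 6, 8],
    "distance_to_water_point_m": [500, 400, 200, 100],
    "noise": [1, -1, -1, 1],
}

selected = select_features(columns, target)
print(selected)
```

On this toy data the filter keeps the two features that vary linearly with consumption and drops the uncorrelated one. The same metric functions could then be applied to the predictions of models trained with and without the geospatial features to compare their errors.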