Transactions on Machine Learning and Artificial Intelligence - Vol. 9, No. 4
Publication Date: August 25, 2021
DOI: 10.14738/tmlai.94.10281
Namaswa, S., Githiri, J., Mariita, N., K’Orowe, M. O., & Njiru, N. (2021). Application of Data-Driven Discovery Machine Learning
Algorithms in Predicting Geothermal Reservoir Temperature from Geophysical Resistivity Method. Transactions on Machine
Learning and Artificial Intelligence, 9(4), 01-12.
Services for Science and Education – United Kingdom
Application of Data-Driven Discovery Machine Learning
Algorithms in Predicting Geothermal Reservoir Temperature
from Geophysical Resistivity Method
Solomon Namaswa
Physics Department, Jomo Kenyatta
University of Agriculture and Technology, Kenya
Department of Physics, Multimedia University of Kenya
John Githiri
Physics Department, Jomo Kenyatta
University of Agriculture and Technology, Kenya
Nicholas Mariita
Geothermal Training and Research Institute
Dedan Kimathi University of Technology, Kenya
Maurice O. K’Orowe
Physics Department, Jomo Kenyatta
University of Agriculture and Technology, Kenya
Nicholas Njiru
Department of Computer Science, Multimedia University of Kenya
ABSTRACT
Geophysical methods including seismology, resistivity, gravity, magnetic and
electromagnetic methods have been used for geothermal resource mapping at the Great
Olkaria Geothermal field for decades. Reservoir temperature distribution and the
electrical conductivity of rocks mainly depend on the same parameters, such as
permeability, porosity, fluid salinity and temperature. This research focused on the
integration of Olkaria Domes geothermal well testing temperature and geophysical
electromagnetic resistivity data with the aim of establishing an alternative
method for estimating reservoir temperature through machine learning
analytics. To achieve this, a Data-Driven Discovery predictive model was
built using the Python programming language on the Anaconda framework. The open-source,
web-based application Jupyter Notebook was used for coding and visualization.
Decision Tree Regression (DTR), Adaptive Boosting (AdaBoost) Regression, Support Vector
Regression and Random Forest Regression were used. Model performance was
evaluated using the R² score and Mean Absolute Error (MAE) metrics. Based on these
performance scores, the best performing model was suggested for predicting subsurface
temperature from resistivity. Training the model using the DTR algorithm
gave the best results, with an R² of 0.81 and the lowest MAE of 29.8. The DTR
algorithm could be implemented in the determination of subsurface temperature from
resistivity in high-temperature hydrothermal fields.
Key words: Geothermal Reservoir, Temperature, Resistivity, Data-driven modelling,
Machine Learning
INTRODUCTION
The earth’s subsurface contains a large volume of heat-bearing formations that can supply a
considerable amount of energy if suitable extraction techniques are applied (Hu, 2016). For
proper exploitation of this resource, a comprehensive and integrated reservoir analysis, from
exploration to production monitoring, needs to be implemented (Aragón et al., 2019).
Prediction and mapping of reservoir properties in the subsurface requires the integration of
knowledge, reflecting the recognized importance and advantages of integrated reservoir
studies (Zhu et al., 2020).
Some of the approaches used to determine subsurface properties, such as well
drilling for subsurface temperature measurements, are time-consuming and expensive.
Relating and integrating these reservoir properties with geophysical rock resistivity can be
highly effective, since the reservoir temperature distribution and the electrical conductivity of
rocks mainly depend on the same parameters, such as porosity, permeability, tortuosity and
pore geometry (Adebayo et al., 2019). The electrical conductivity of reservoir rocks and
aquifers can be measured at the surface using geophysical survey methods such as
electromagnetic methods. However, subsurface temperature can only be obtained through well
logging after drilling the well. Therefore, a method for reliably estimating subsurface
temperatures from surface electrical conductivity measurements would be beneficial.
DATA-DRIVEN DISCOVERY PREDICTIVE MODELING
Data-Driven Discovery Predictive Modeling is defined as the construction of robust empirical
models of real or complex systems with the intention of helping decision makers establish the
relationship between the input and output parameters of the system without requiring a full
understanding of the physical behavior of the system (Montáns et al., 2019). This is illustrated
in Figure 1.
Figure 1: Data Driven Modelling Approach (Abrahart et al., 2008)
The technique builds models that characterize the behavior of the corresponding
physical processes by studying data from the systems of interest and
inferring the dependencies between system inputs and outputs using learning
algorithms, without building complex physical models (Lee et al., 2019).
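As a minimal sketch of this idea, the Python snippet below predicts an output purely from observed input-output pairs, with no equation describing the underlying system. It uses a k-nearest-neighbour average, chosen here only because it is one of the simplest learning algorithms; it is not the method used in this study. The function name knn_predict and the toy numbers are hypothetical and synthetic, not Olkaria measurements.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Predict the output for `query` as the mean output of its k nearest inputs."""
    # Rank the observed (input, output) pairs by distance from the query input.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = ranked[:k]
    # The "model" is just the data: average the outputs of the closest samples.
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic toy observations (illustrative only, not field data).
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
outputs = [10.0, 20.0, 30.0, 40.0, 50.0]

# Estimate the output at an unseen input from its two nearest observations.
estimate = knn_predict(inputs, outputs, query=2.4, k=2)
print(estimate)
```

The dependency between input and output is inferred entirely from the samples, so the quality of the estimate depends on how well the observations cover the input range.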
Machine Learning
Machine learning (ML) is the process of iteratively training models to learn from data through
experience without being explicitly programmed and without human intervention, unlike
traditional programming. ML models continue to improve as they are given new, unseen data,
and diverse ML algorithms help non-experts in ML with decision making. Machine learning
algorithms are classified into three types: supervised, unsupervised and reinforcement learning
algorithms (Schmidt et al., 2019).
Supervised learning is further divided into classification and regression algorithms.
Classification algorithms are used for data that are categorical in nature, that is, the expected
output is one of a set of classes and the model predicts the class. In regression, the trained
model is applied to continuous data, where a continuous real-valued prediction is expected.
Regression problems are further divided into linear and non-linear regression (Lundkvist, 2014).
Linear regression is applied where the data to be learned exhibit a linear relationship; its main
goal is to fit a linear equation to the given data in an n-dimensional space. Non-linear
regression, also known as curvilinear regression, is used for learning from data that exhibit
non-linearity. Curvilinear models are obtained by including higher-order terms of the predictor
variables. In supervised learning algorithms, the training model is provided with example data
that are labeled by humans. Examples of common supervised learning algorithms are Linear
Regression, Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Classification and
Regression Trees (CART), Decision Trees and K-Nearest Neighbors (Daniya et al., 2020).
Unsupervised learning algorithms are provided with unlabeled data and uncover the hidden
structure or patterns in the data. Unsupervised learning is divided into clustering,
dimensionality reduction and association. Clustering is a technique used to group observations
that are similar to each other into natural groupings called clusters (Njiru et al., 2018).
Dimensionality reduction is a technique used to reduce the number of features in a dataset
while ensuring that the most important information is retained. The dataset is mapped from a
high-dimensional to a low-dimensional space, because a very high-dimensional space makes
visualization of the features very poor. Association is a technique used to find the probability
of co-occurrence of items in a dataset. Examples of unsupervised algorithms are K-means,
Principal Component Analysis (PCA), Apriori algorithms, Linear Discriminant Analysis, Latent
Dirichlet Allocation (LDA), Hierarchical clustering, Fuzzy C-means, Singular Value
Decomposition (SVD) and Independent Component Analysis (ICA).
Reinforcement learning is an approach in which the machine learning model learns from a
dynamic environment by trial and error, with rewards and penalties given to the learner based
on its performance (Mirza et al., 2019).
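The distinction between linear and curvilinear regression described above can be sketched in a few lines of Python. This is an illustrative example on synthetic data, not the model built in this study. The helper names fit_line and mean_abs_error are hypothetical. The curvilinear fit reuses ordinary least squares, applied to a higher-order term (the square) of the predictor.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic data with a purely quadratic trend (assumed for illustration).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [2.0 + 3.0 * x ** 2 for x in xs]

# Linear regression on the raw predictor.
b0, b1 = fit_line(xs, ys)
linear_pred = [b0 + b1 * x for x in xs]

# Curvilinear regression: the same fit on a higher-order term of the predictor.
squared = [x ** 2 for x in xs]
c0, c1 = fit_line(squared, ys)
curvilinear_pred = [c0 + c1 * s for s in squared]

linear_mae = mean_abs_error(ys, linear_pred)
curvilinear_mae = mean_abs_error(ys, curvilinear_pred)
print(linear_mae, curvilinear_mae)
```

On data of this shape the straight line cannot follow the curvature, while the model with the squared term recovers the trend, which is why the error of the curvilinear fit is much smaller.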
Regression is a predictive modeling technique that has been scientifically proven for
forecasting and causal inference. It is a long-established statistical procedure that has
been adopted in AI and ML because its models are understandable. The choice of a machine
learning algorithm depends on the type of data. Regression is suited to numerical
data where the prediction is a single value, hence the choice of regression algorithms in this
research.
Decision Tree Regression (DTR) is a technique used to fit a tree-structured model
where a continuous value is being sought. Decision trees can be used for building classification or
regression models. The model observes the features of a dataset to learn the tree structure in
the dataset for predicting a real value output. The dataset is broken down into decision nodes