
Transactions on Machine Learning and Artificial Intelligence - Vol. 9, No. 4

Publication Date: August 25, 2021

DOI: 10.14738/tmlai.94.10281. Namaswa, S., Githiri, J., Mariita, N., K’Orowe, M. O., & Njiru, N. (2021). Application of Data-Driven Discovery Machine Learning Algorithms in Predicting Geothermal Reservoir Temperature from Geophysical Resistivity Method. Transactions on Machine Learning and Artificial Intelligence, 9(4), 01-12.

Services for Science and Education – United Kingdom

Application of Data-Driven Discovery Machine Learning Algorithms in Predicting Geothermal Reservoir Temperature from Geophysical Resistivity Method

Solomon Namaswa

Physics Department, Jomo Kenyatta University of Agriculture and Technology, Kenya

Department of Physics, Multimedia University of Kenya

John Githiri

Physics Department, Jomo Kenyatta University of Agriculture and Technology, Kenya

Nicholas Mariita

Geothermal Training and Research Institute, Dedan Kimathi University of Technology, Kenya

Maurice O. k’Orowe

Physics Department, Jomo Kenyatta University of Agriculture and Technology, Kenya

Nicholas Njiru

Department of Computer Science, Multimedia University of Kenya

ABSTRACT

Geophysical methods, including seismology, resistivity, gravity, magnetic and electromagnetic methods, have been used for geothermal resource mapping at the Great Olkaria Geothermal field for decades. Reservoir temperature distribution and the electrical conductivity of rocks depend mainly on the same parameters, such as permeability, porosity, fluid salinity and temperature. This research integrated well-testing temperature data from the Olkaria Domes geothermal field with geophysical electromagnetic resistivity data, with the aim of establishing an alternative method for estimating reservoir temperature through machine learning analytics. To achieve this, a Data-Driven Discovery Predictive Model algorithm was built in the Python programming language on the Anaconda framework, using the open-source, web-based Jupyter Notebook application for coding and visualization. Decision Tree Regression (DTR), Adaptive Boosting (AdaBoost) Regression, Support Vector Regression and Random Forest Regression were used. Model performance was evaluated using the R² score and Mean Absolute Error (MAE) metrics, and on the basis of these scores the best-performing model was suggested for predicting subsurface temperature from resistivity. Training the model with the DTR algorithm gave superior results, with an R² of 0.81 and the lowest MAE, 29.8. The DTR algorithm could be implemented to determine subsurface temperature from resistivity in high-temperature hydrothermal fields.


Keywords: Geothermal Reservoir, Temperature, Resistivity, Data-driven modelling, Machine Learning

INTRODUCTION

The earth’s subsurface contains a large volume of heat-bearing formations that can supply a considerable amount of energy if suitable extraction techniques are applied (Hu, 2016). Proper exploitation of this resource requires a comprehensive, integrated reservoir analysis that spans exploration through production monitoring (Aragón et al., 2019). Because the importance and advantages of integrated reservoir studies are now recognized, predicting and mapping subsurface reservoir properties calls for the integration of knowledge (Zhu et al., 2020).

Some approaches used to determine subsurface properties, such as drilling wells to measure subsurface temperature, are time-consuming and expensive. Relating and integrating these reservoir properties with geophysical rock resistivity can be highly effective, because reservoir temperature distribution and the electrical conductivity of rocks depend mainly on the same parameters, such as porosity, permeability, tortuosity and pore geometry (Adebayo et al., 2019). The electrical conductivity of reservoir rocks and aquifers can be measured from the surface using geophysical survey methods such as electromagnetic methods. Subsurface temperature, however, can only be obtained through well logging after a well has been drilled. A method for reliably estimating subsurface temperatures from surface electrical conductivity measurements would therefore be beneficial.

DATA-DRIVEN DISCOVERY PREDICTIVE MODELING

Data-Driven Discovery Predictive Modeling is defined as the construction of robust empirical models of real or complex systems, with the intention of helping decision makers establish the relationship between the input and output parameters of a system without requiring a detailed understanding of its physical behavior (Montáns et al., 2019). This is illustrated in Figure 1.

Figure 1: Data Driven Modelling Approach (Abrahart et al., 2008)

The technique builds models that characterize the behavior of the corresponding physical processes by studying data that describe the systems of interest. It infers the dependencies between system inputs and outputs with learning algorithms instead of constructing complex physical models (Lee et al., 2019).
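As an illustration of inferring input-output dependencies purely from data, the following minimal Python sketch (not the authors' implementation) predicts an output by averaging the observed outputs of the k nearest stored inputs, i.e. k-nearest-neighbour regression, one of the supervised algorithms discussed in this paper. The function name `knn_predict`, the one-dimensional inputs and all sample values are hypothetical assumptions chosen for readability; no physical equation of the system appears anywhere in the code.

```python
# A minimal data-driven model: k-nearest-neighbour regression.
# The "model" is only the stored input-output pairs; no physics is encoded.
# All names and data here are hypothetical illustrations, not study data.

def knn_predict(train_x, train_y, query, k=3):
    """Predict the output at `query` as the mean output of the k training
    inputs closest to it (1-D inputs, absolute-difference distance)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank training samples by their distance from the query point.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    # Average the observed outputs of the k nearest samples.
    return sum(train_y[i] for i in nearest) / k


# Hypothetical observations of an unknown system (inputs and measured outputs).
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
outputs = [2.0, 4.0, 6.0, 8.5, 10.0, 12.5]

print(knn_predict(inputs, outputs, 3.4, k=2))  # averages the outputs at x = 3 and x = 4
```

Choosing k trades smoothness against locality: k = 1 reproduces the nearest observation exactly, while a larger k averages over more samples and is less sensitive to a single noisy measurement.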


Machine Learning

Machine learning is the process of iteratively training models to learn from data through experience, without being explicitly programmed and without human intervention, unlike traditional programming. ML models improve as they are given new, unseen data, and the diverse ML algorithms available help non-experts in ML with decision making. Machine learning algorithms are classified into three types: supervised, unsupervised and reinforcement learning algorithms (Schmidt et al., 2019).
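To make "iteratively training a model to learn from data" concrete, here is a hedged sketch of one common form of iterative learning: batch gradient descent on a single-feature linear model. It is a generic illustration, not the procedure used in this study; the function `train_linear`, the learning rate, the number of epochs and the data are all hypothetical. The parameters are never set by hand: they are adjusted step by step to reduce the prediction error on the labelled examples.

```python
def train_linear(xs, ys, lr=0.01, epochs=5000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # start from an uninformed model
    n = len(xs)
    for _ in range(epochs):
        # Errors of the current model on every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error**2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient; many small steps are the "experience".
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical labelled examples generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = train_linear(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")  # prints: w = 2.000, b = 1.000
```

The learning rate must be small enough for the updates to converge on this data; too large a value makes the error grow instead of shrink.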

Supervised learning is further divided into classification and regression algorithms. Classification algorithms are used when the data are categorical in nature, that is, when the expected output falls into classes, and they are used to predict those classes. In regression, a supervised ML approach, the trained models are applied to continuous data where a continuous real-valued prediction is expected. Regression problems are further divided into linear and non-linear regression (Lundkvist, 2014). Linear regression is applied where the data to be learned exhibit a linear relationship; its main goal is to fit a linear equation to the given data in an n-dimensional space. Non-linear regression, also known as curvilinear regression, is used to learn data that exhibit non-linearity; curvilinear models are obtained by including higher-order terms of the predictor variables.

In supervised learning, the training algorithm is provided with example data that have been labeled by humans. Common supervised learning algorithms include Linear Regression, Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Classification and Regression Trees (CART), Decision Trees and K-Nearest Neighbors (Daniya et al., 2020).

Unsupervised learning algorithms are provided with unlabeled data so that they can discover hidden structure or patterns in the data. Unsupervised learning is in turn divided into clustering, dimensionality reduction and association.

Clustering is a technique used to group observations into natural groupings, or clusters, whose members are similar to each other (Njiru et al., 2018). Dimensionality reduction is a technique used to reduce the number of features in a dataset while retaining the most important information; the dataset is mapped from a high-dimensional to a low-dimensional space, since a very high-dimensional feature space makes the features difficult to visualize. Association is an approach used to find the probability of co-occurrence of items in a dataset. Examples of unsupervised algorithms are K-means, Principal Component Analysis (PCA), Apriori algorithms, Linear Discriminant Analysis, Latent Dirichlet Allocation (LDA), Hierarchical clustering, Fuzzy C-means, Singular Value Decomposition (SVD) and Independent Component Analysis (ICA).

Reinforcement learning is an approach in which the model learns from a dynamic environment by trial and error, receiving rewards and penalties based on its performance (Mirza et al., 2019).
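K-means, listed above among the unsupervised algorithms, gives a compact example of clustering. The sketch below is a plain Python version of Lloyd's iteration under simplifying assumptions: centroids are seeded with the first k points (library implementations typically use smarter seeding such as k-means++), distances are squared Euclidean, and the function name `kmeans` and the 2-D data are hypothetical. Note that the points carry no labels; the grouping emerges from the data alone.

```python
def kmeans(points, k, iters=100):
    """Basic k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its members,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else cen
            for members, cen in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D observations forming two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
for cen, members in zip(centroids, clusters):
    print([round(v, 2) for v in cen], members)
```

With this seeding, the first pass misassigns one low point, but the update step pulls the centroids apart and the algorithm settles on the two natural groups within a few iterations.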

Regression is a predictive modeling technique that is well established in science for forecasting and causal inference. It is a long-established statistical procedure that has been adopted in AI and ML because its models are easy to understand. The choice of a machine learning algorithm depends on the type of data. Regression suits numerical data where the prediction is a single continuous value, hence its choice for this research.
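The distinction drawn earlier between linear and curvilinear regression can be shown with a small least-squares sketch: a curvilinear model is still fitted linearly in its coefficients, but the predictor x is supplemented with higher-order terms such as x². The helper names `polyfit` and `sse`, the normal-equation solver and the noise-free data are assumptions for illustration only, not the models used in this study.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., cd] for y = c0 + c1*x + ... + cd*x**d."""
    m = degree + 1
    # Each row holds powers of x: degree 1 gives ordinary linear regression,
    # degree >= 2 adds higher-order terms (curvilinear regression).
    rows = [[x ** p for p in range(m)] for x in xs]
    # Normal equations (A^T A) c = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(m)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(m)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution.
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = aug[i][m] - sum(aug[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = s / aug[i][i]
    return coef


def sse(xs, ys, coef):
    """Sum of squared residuals of a polynomial fit."""
    return sum((y - sum(c * x ** p for p, c in enumerate(coef))) ** 2
               for x, y in zip(xs, ys))


# Hypothetical noise-free data generated from y = 1 + 0.5x + 2x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 0.5 * x + 2.0 * x ** 2 for x in xs]

lin = polyfit(xs, ys, 1)   # straight-line (linear) fit
quad = polyfit(xs, ys, 2)  # adds the x^2 predictor term (curvilinear fit)
print(lin, sse(xs, ys, lin))
print(quad, sse(xs, ys, quad))
```

On data with genuine curvature, the degree-2 fit recovers the generating coefficients (approximately 1, 0.5 and 2) with a near-zero residual, while the straight line leaves a clearly larger sum of squared errors. Solving the normal equations directly is acceptable for a tiny example like this; for larger or ill-conditioned problems, a QR- or SVD-based solver is numerically safer.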

Decision Tree Regression (DTR) is a technique used where a continuous value is sought; it approximates the target with a piecewise-constant function, as in the familiar example of fitting a sine curve. Decision trees can be used to build both classification and regression models. The model examines the features of a dataset to learn a tree structure that predicts a real-valued output. The dataset is broken down into decision nodes