Machine Learning (ML) Demystified


Machine Learning (ML) techniques, often perceived as complex, have been increasingly discussed across various industries, including mining. Despite some skepticism regarding their effectiveness, ML offers valuable solutions when correctly understood and applied. Key ML methodologies used in geological modelling include Neural Networks (NNs), Support Vector Machines (SVMs), and Gaussian Processes (GPs). Each of these has unique features; for instance, GPs are akin to Kriging, allowing for variance estimation between known data points. SVMs efficiently separate data categories using hyperplanes. Meanwhile, NNs, while computationally intensive to train, are excellent for function approximation once trained. However, a major challenge is integrating ML with geological knowledge, given ML techniques don't inherently understand geological parameters like anisotropy. Thus, success relies on adept data preparation and geological understanding. By incorporating variography with ML, geologists could optimize their modelling efforts, thereby demonstrating the continued relevance of traditional geological techniques in conjunction with contemporary ML tools.


Everywhere on the internet there is talk about AI and Machine Learning, and many have promoted the use of both within the mining industry. Up until now, however, there has been little to no significant breakthrough, which means the technology is increasingly questioned or even acquiring negative connotations. This is understandable, but not completely warranted. Partly it is because of the overwhelming number of possible techniques (see the next section) and a lack of understanding of them; partly it is due to language issues, with the ML community using different labels than the geosciences.

In this article we will try to translate 'ML speak' and go into a bit more depth about various Machine Learning techniques, with their pros and cons, to lift some of the mist that surrounds ML. Hopefully this will help you better understand how ML can be used as part of geological modelling.

Three main techniques

Within ML there are many variants for training, as illustrated by the list on the Scikit-learn webpage (https://scikit-learn.org/stable/supervised_learning.html#supervised-learning). For our purposes, geological modelling, three main training techniques stand out and will be the ones we focus on here: Neural Networks (NNs), Support Vector Machines (SVMs) and Gaussian Processes (GPs). Searching the net you'll find that all three are in turn closely related. The mathematics is beyond the scope of this article, and to some extent even beyond our capabilities to understand in depth. However, we'll highlight some of the similarities and differences with existing modelling techniques to put ML into perspective.

Before we delve into the three main techniques, just a quick word on what we are actually trying to do in geological modelling. With some abstraction we can distinguish two main goals: classification (modelling categorical data like lithology) and numeric interpolation, also called regression (e.g. for assays).

Let’s focus on classification first

In geology, we try to model geological features, like rock type or certain zones, in 3D space. Traditionally this is done by defining a boundary between units. A typical approach would connect the boundary points (found in drill hole logs) of the unit to model, let's call it unit A. To model unit A, the contact points with any other unit would be extracted and connected, using either digitizing or implicit modelling techniques.

Now, let's return to our ML techniques. Modelling categorical data as described above amounts to a classification between unit A and everything else. Any of the three techniques is capable of classifying data, but there are important differences.

We’ll start with Gaussian Processes (GPs)

Gaussian Processes (also called Gaussian Process Regression) are very powerful and have a unique capability compared to the other two techniques mentioned above: not only can they estimate the boundary (the contact) between a unit A and other units, they also produce the variance between known points. The best comparison is with implicit modelling, where a surface is created through all contact points. What happens in between the points is the interesting bit. With implicit modelling we only obtain a single surface, assumed to be the best fit. Gaussian Processes also produce this boundary, but in addition give a sense of reliability in between the known points… But hold on… does that not seem a little familiar? Do you know of a technique in geological modelling or geostats that is very similar?

Let me help you here. Within geostats there is something called Conditional Simulation: a best guess is created using Kriging, and the variance around that best fit is used. In conditional simulation we produce estimates, e.g. inside blocks, that add some 'noise' around the Kriging estimate to better re-create reality. To this end the Kriging estimate is used with noise added based on the estimated variance. Here, 'conditional' refers to the condition that the known points need to be honoured. In 1D, Kriging with variance would look like the image on the right.

If you search a bit, you'll find that GPs are at the very least very similar to Kriging, possibly with conditional simulation (https://scikit-learn.org/stable/modules/gaussian_process.html#gaussian-process-regression-gpr).

To illustrate, we also give a 1D example for Gaussian Process Regression.
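For readers who want to try this themselves, below is a minimal sketch of such a 1D example using scikit-learn's GaussianProcessRegressor. The sample locations, values and kernel settings are illustrative assumptions, not data from this article.

# Minimal 1D Gaussian Process Regression sketch (illustrative values only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Known data points, e.g. values down a drill hole (made up here).
X_known = np.array([[0.0], [1.5], [3.0], [4.5], [6.0]])
y_known = np.array([0.2, 0.8, 0.3, 0.9, 0.5])

# An RBF (squared-exponential) kernel plays a role similar to the
# variogram model in Kriging.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_known, y_known)

# Predict along the line: the mean is the Kriging-like best estimate,
# and the standard deviation grows between and away from known points.
X_grid = np.linspace(0.0, 6.0, 50).reshape(-1, 1)
mean, std = gp.predict(X_grid, return_std=True)

# Drawing posterior samples mimics conditional simulation: each sample
# honours the known points but adds variability in between.
samples = gp.sample_y(X_grid, n_samples=5, random_state=0)

The posterior samples at the end play the role of conditional simulation: each realisation honours the known points but varies in between them, in proportion to the estimated variance.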

If you follow the link to the foundational webpage on Gaussian Processes (http://gaussianprocess.org), notice how on the main page Kriging is listed as one of the recommended books.

But, moreover, check the wikipedia page on Kriging itself:

The first line is: “In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression“…

So, yes, this "Machine Learning technique" is nothing new to geology! It seems to be derived from techniques that have been used in the industry for decades!

OK, well, now you know that you have been doing a form of Machine Learning for most of your modelling life. I bet you will sleep a lot better already…

Now, let us focus on Support Vector Machines (SVMs)

Above, we already mentioned implicit modelling as one of the traditional methods to define a boundary between categorical data. To this end the categorical data is typically converted to numeric data. In this conversion process the unit to model is labelled with positive indicator points, the boundary points with a zero value, and all other units are converted to negative values. In most cases the inside values will be converted to +1 and the outside values to -1. This might not always be obvious, but it is what actually happens inside the software (some implementations use positive and negative distance values rather than indicator values, but the principle is the same). It means the categorical data is turned into classification data [-1, 0, +1].
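As a minimal sketch of this conversion (the lithology codes and unit label are hypothetical), the indicator labelling could look like this:

# Sketch of the indicator conversion described above (hypothetical data).
import numpy as np

# Logged lithology codes along drill holes; 'A' is the unit to model.
lithology = np.array(["A", "A", "B", "C", "A", "B"])

# Inside the unit -> +1, everything else -> -1.
indicator = np.where(lithology == "A", 1, -1)
# Contact points between unit A and the other units would additionally
# be assigned a value of 0, yielding the [-1, 0, +1] classification data.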

In implicit modelling, so-called Radial Basis Functions (RBFs) are fitted through all the points with their values, as if the values were on a continuous scale. It was shown as early as 1996 that fitting RBFs in this way is akin to Kriging.

The most important part to remember is that a radial basis function is used at each point in the data set. Such a function is typically a Gaussian or another smooth function. A single RBF produces a value that depends on the distance from one point in the original data to another point in that data set. Doing this for each point against every other point, you end up with many RBFs. This becomes inefficient for points far away, so clever techniques have been developed to make it a lot faster.

The aim is then to fit, in other words learn, the weights of each of these functions (so, one weight per function) to reproduce the original data.
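A compact sketch of this fitting step is given below; the 1D point locations, indicator values and Gaussian kernel width are illustrative assumptions.

# Sketch: fit one weight per Gaussian RBF so the sum reproduces the data.
import numpy as np

def gaussian_rbf(r, width=1.0):
    # Value depends only on the distance r between two points.
    return np.exp(-(r / width) ** 2)

# Hypothetical 1D point locations with indicator values [-1, 0, +1].
points = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
values = np.array([1.0, 1.0, 0.0, -1.0, -1.0])

# One RBF centred on every point: build the pairwise distance matrix...
distances = np.abs(points[:, None] - points[None, :])
Phi = gaussian_rbf(distances)

# ...and solve for the weights so the interpolant honours the data.
weights = np.linalg.solve(Phi, values)

def interpolate(x):
    # Weighted sum of all RBFs evaluated at location x.
    return gaussian_rbf(np.abs(x - points)) @ weights

Evaluating interpolate at a new location then gives the learned value; the zero-contour of this interpolant is the modelled boundary.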

Now compare that to Support Vector Machines (SVMs). SVMs also use Radial Basis Functions (RBFs) to obtain a relation between each point in a data set and every other point in that data set. However, in the case of SVMs the RBF is called a kernel. Similar to fitting RBFs, for each point in the data the kernel is used to calculate a measure related to the distance to every other point. All these values are treated as a (very) long vector associated with that particular point, which means you have a large set of very long vectors. Thanks to efficient mathematics involving dot products (a dot product of two vectors returns the 'angle' between them), points that have high similarity (a small angle) are used to separate the groups of points, and the others are discarded. Using these vectors a so-called hyperplane can be identified that separates the +1 points from the -1 points. This hyperplane is guaranteed to be the best-fit plane separating the groups (compare it a bit to least-squares fitting). So, although the technique is slightly different from direct fitting of RBFs, the basis is roughly the same, especially when optimized RBFs are considered that actually filter or group points together (e.g. for fast RBF implementations; https://www.researchgate.net/publication/2931421_Reconstruction_and_Representation_of_3D_Objects_With_Radial_Basis_Functions).

The important difference to remember is that RBFs (without speed improvements) will fit exactly through the known points, most notably the boundary. SVMs, on the other hand, will produce a boundary that is the best fit between two groups of points. This means the boundary might not always honour the points on the boundary exactly if inside and outside points are not equally far from it. The advantage of SVMs is that they 'learn' really quickly, and they also produce very good estimates in areas where the contact is unclear or uncertain. Furthermore, in SVMs a penalty can be applied for wrongfully classifying a number of points. This controls the accuracy of the separation, just as with RBFs we do not always fit exactly through the points in order to improve speed (by setting an accuracy value).
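A minimal scikit-learn sketch of such a classifier is shown below; the sample coordinates are made up, and C is the misclassification penalty mentioned above.

# Minimal SVM classification sketch with an RBF kernel (made-up data).
import numpy as np
from sklearn.svm import SVC

# Hypothetical 3D sample locations, labelled +1 (inside unit A) or -1 (outside).
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
              [3, 3, 3], [4, 3, 3], [3, 4, 3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

# The RBF kernel compares every point to every other point; C is the
# penalty for misclassified points and controls how strictly the
# boundary honours the data (larger C = stricter separation).
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X, y)

# Classify a new location: which side of the learned boundary is it on?
print(clf.predict([[2.0, 2.0, 2.0]]))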

To illustrate a bit further, on the right we show examples of separating two groups of points (https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47). The top one shows a number of possible separations; the bottom one shows only the optimal separation. It almost becomes like a Kriging estimate (the optimal plane) with conditional simulation (the other possible planes).

The optimal separation plane, when converted back to 3D, is the estimated contact between two units.