If you search a bit, you will find that GPs are at the very least very similar to Kriging, possibly with conditional simulation (https://scikit-learn.org/stable/modules/gaussian_process.html#gaussian-process-regression-gpr).
The first line of a typical search result reads: “In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression”…
To illustrate, we also give a 1D example of Gaussian Process Regression below.
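As a minimal sketch of such a 1D example, scikit-learn’s GaussianProcessRegressor can be used directly; the data points and kernel settings below are made-up assumptions for illustration only, not the figure originally shown here.

```python
# Minimal 1D Gaussian Process Regression sketch (illustrative data only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# A handful of 1D observations, e.g. values along a drillhole (made up)
X_train = np.array([[1.0], [3.0], [5.0], [6.0], [8.0]])
y_train = np.sin(X_train).ravel()

# A Gaussian (RBF) kernel, the same family of smooth functions discussed below
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
gpr.fit(X_train, y_train)

# The mean plays the role of the Kriging estimate, the standard deviation of the
# Kriging variance, and sample_y() gives conditional simulations.
X_test = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
y_mean, y_std = gpr.predict(X_test, return_std=True)
y_samples = gpr.sample_y(X_test, n_samples=5, random_state=0)
```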
OK, well, now you know that you have been doing a form of Machine Learning for most of your modelling life. I bet you will already sleep a lot better…
Now, let us focus on Support Vector Machines (SVMs).
Above, we already mentioned implicit modelling as one of the traditional methods to define a boundary between categorical data. To this end, the categorical data is typically converted to numeric data. In this conversion, the unit to be modelled is labelled with positive indicator values, the boundary points with a zero value, and all other units with negative values. In most cases the inside values are converted to +1 and the outside values to -1. This might not always be obvious, but it is what actually happens inside the software (some packages use positive and negative distance values rather than indicator values, but the principle is the same). It means the categorical data is turned into classification data [-1, 0, +1].
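As a small sketch of that conversion (the unit names and coordinates below are invented for illustration):

```python
# Sketch of turning categorical unit labels into [-1, 0, +1] indicator values.
import numpy as np

# Each row is an (x, y, z) sample location; the unit observed there is listed separately
points = np.array([
    [0.0, 0.0, 1.0],   # inside the unit being modelled
    [1.0, 0.5, 0.8],   # inside
    [2.0, 1.0, 0.0],   # mapped contact (boundary) point
    [3.0, 1.5, -0.5],  # outside, another unit
    [4.0, 2.0, -1.0],  # outside, another unit
])
units = np.array(["ore", "ore", "contact", "waste", "waste"])

# Categorical data -> classification data in [-1, 0, +1]
indicator = np.where(units == "ore", 1.0,
                     np.where(units == "contact", 0.0, -1.0))
# indicator is now [ 1.  1.  0. -1. -1.]
```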
In implicit modelling, so-called Radial Basis Functions (RBFs) are then fitted through all these points, treating the values as if they were on a continuous scale. It was shown as early as 1996 that fitting RBFs in this way is akin to Kriging.
The most important thing to remember is that a radial basis function is centred on each point in the data set. Such a function is typically a Gaussian or another smooth function, and a single RBF produces a value that depends only on the distance from its centre point to another point. Doing this for each point against every other point, you end up with many RBFs. Evaluating all of them, including for points far away, becomes inefficient, so clever techniques have been developed to make this a lot faster.
The aim is then to fit, in other words learn, the weight of each of these functions (one weight per function) so that the summed functions reproduce the original data.
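A minimal sketch of that fitting step, assuming a plain Gaussian RBF and a direct linear solve (real implementations use much faster, approximate schemes):

```python
# Fit one weight per Gaussian RBF so the summed RBFs reproduce the data exactly.
import numpy as np

def gaussian_rbf(r, eps=1.0):
    """Gaussian radial basis function of distance r (shape parameter eps is assumed)."""
    return np.exp(-(eps * r) ** 2)

# Data locations and their [-1, 0, +1] indicator values (made up, as above)
xyz = np.array([[0.0, 0.0, 1.0], [1.0, 0.5, 0.8], [2.0, 1.0, 0.0],
                [3.0, 1.5, -0.5], [4.0, 2.0, -1.0]])
values = np.array([1.0, 1.0, 0.0, -1.0, -1.0])

# Distance from every point to every other point -> one RBF per data point
dists = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
A = gaussian_rbf(dists)

# Learn one weight per function so that A @ weights reproduces the values
weights = np.linalg.solve(A, values)

def interpolate(query_xyz):
    """Evaluate the fitted RBF sum at new locations; its zero level is the boundary."""
    d = np.linalg.norm(query_xyz[:, None, :] - xyz[None, :, :], axis=-1)
    return gaussian_rbf(d) @ weights
```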
Now compare that to Support Vector Machines (SVMs). In SVMs, Radial Basis Functions (RBFs) are also used to relate each point in a data set to every other point in that data set; in the case of SVMs, however, the RBF is called a kernel. Similar to fitting RBFs, the kernel is used to calculate, for each point, a distance-related measure to every other point. All these values are treated as one (very) long vector associated with that particular point, so you end up with a large set of very long vectors. Thanks to efficient mathematics involving dot products (the dot product of two vectors is a measure of the ‘angle’ between them), points with high similarity (a small angle) are used to separate the groups of points, and the others are discarded. Using these vectors, a so-called hyperplane can be identified that separates the +1 points from the -1 points. This hyperplane is guaranteed to be the optimal separating plane, the one with the largest margin between the two groups (compare it a bit to least-squares fitting). So, although the technique differs slightly from directly fitting RBFs, the basis is roughly the same, especially when optimized RBF implementations are considered that filter or group points together (e.g. fast RBF implementations; https://www.researchgate.net/publication/2931421_Reconstruction_and_Representation_of_3D_Objects_With_Radial_Basis_Functions).
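A minimal sketch of the SVM side, using scikit-learn’s SVC with an RBF kernel; the points, gamma and C values are assumptions for illustration, not what any particular modelling package uses:

```python
# Classify +1 (inside) vs -1 (outside) points with an RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC

xyz = np.array([[0.0, 0.0, 1.0], [1.0, 0.5, 0.8],
                [3.0, 1.5, -0.5], [4.0, 2.0, -1.0]])
labels = np.array([1, 1, -1, -1])          # +1 inside, -1 outside

clf = SVC(kernel="rbf", gamma=0.5, C=10.0)
clf.fit(xyz, labels)

# Only the support vectors end up defining the hyperplane; decision_function()
# returns a signed measure of distance to it, and its zero level is the boundary.
print(clf.support_vectors_)
print(clf.decision_function(np.array([[2.0, 1.0, 0.1]])))
```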
The important difference to remember is that RBFs (without speed improvements) fit exactly through the known points, most notably the boundary points. SVMs, on the other hand, produce a boundary that is the best fit between the two groups of points. This means the boundary might not honour the points on the boundary exactly if the inside and outside points are not equally far from it. The advantage of SVMs is that they ‘learn’ really quickly, but they also produce very good estimates in areas where the contact is unclear or uncertain. In addition, SVMs allow a penalty to be applied for wrongly classifying a number of points. This penalty controls the accuracy of the separation, just as with RBFs we do not always fit exactly through the points (by setting an accuracy value) in order to improve speed.
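To show the effect of that penalty, here is a rough sketch comparing a forgiving and a strict misclassification penalty C; the data and parameter values are invented:

```python
# The penalty C controls how strictly the fitted boundary honours the points:
# a small C tolerates misclassified points (smoother boundary), a large C does not.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=40) > 0, 1, -1)  # noisy contact

loose = SVC(kernel="rbf", C=0.1).fit(X, y)     # forgiving: simpler separation
strict = SVC(kernel="rbf", C=100.0).fit(X, y)  # strict: tries to honour every point
print(loose.n_support_, strict.n_support_)     # number of support vectors per class
```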
To illustrate a bit further, on the right we show examples of separating two groups of points (https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47). The top one shows a number of possible separations; the bottom one shows only the optimal separation. It almost becomes like a Kriging estimate (the optimal plane) with conditional simulation (the other possible planes).
The optimal separation plane, when converted back to 3D, is the estimated contact between two units.
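As a final sketch of that last step (the grid extents, parameters and the use of scikit-image’s marching cubes are all assumptions here), the fitted decision function can be evaluated on a regular 3D grid and its zero iso-surface extracted as the contact:

```python
# Turn the fitted SVM back into a 3D contact surface via its zero iso-surface.
import numpy as np
from sklearn.svm import SVC
from skimage.measure import marching_cubes

xyz = np.array([[0.0, 0.0, 1.0], [1.0, 0.5, 0.8],
                [3.0, 1.5, -0.5], [4.0, 2.0, -1.0]])
labels = np.array([1, 1, -1, -1])
clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(xyz, labels)

# Regular grid covering the model volume (extents chosen arbitrarily)
xs, ys, zs = np.linspace(-1, 5, 30), np.linspace(-1, 3, 30), np.linspace(-2, 2, 30)
gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
grid = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

# Signed "distance" to the separating hyperplane at every grid node
field = clf.decision_function(grid).reshape(gx.shape)

# The zero level set of this field is the estimated contact between the two units.
# Note: marching_cubes returns vertices in grid-index coordinates, so they still
# need rescaling back to real-world x, y, z.
verts, faces, _, _ = marching_cubes(field, level=0.0)
```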