dGB Earth Sciences - XGBoost NPHI Prediction

Written by: Paul de Groot; Published: 23 February 2023

In today’s post we show how Machine Learning can be used to predict a missing log. The model is called XGBoost (eXtreme Gradient Boosting). This is an extremely fast ensemble technique in which the overall performance of a base model is improved through boosting.

In this case the base model is a Random Forest (RF), an ensemble of decision trees whose outputs are averaged. In boosting, a stronger predictor (the ensemble of RF models) is created by combining weaker predictors (the individual RF models) sequentially. Each new RF model concentrates on what the models before it got wrong: in gradient boosting it is fitted to the errors (residuals) left by the ensemble so far, and its weighted output is added to the running prediction. After numerous cycles, the boosting method combines these weak rules into a single powerful prediction rule.
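The sequential idea can be illustrated with a minimal sketch of boosting for a regression target. It uses the residual-fitting form of gradient boosting with squared error, and one-split decision stumps stand in for the RF models. The function names, the synthetic data, the number of rounds and the learning rate are all assumptions made for illustration; this is not the XGBoost implementation.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump: pick the threshold on x that
    minimises the squared error of the two leaf means."""
    best = None
    # The largest x is skipped as a threshold so the right leaf is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Add weak learners one at a time, each fitted to the current residuals."""
    base = mean(ys)                      # start from a constant prediction
    learners = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in learners)

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic one-dimensional data, purely for illustration.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

one_round = boost(xs, ys, n_rounds=1)
many_rounds = boost(xs, ys, n_rounds=50)
print(mse(one_round, xs, ys), mse(many_rounds, xs, ys))
```

Each round only has to correct what is still wrong, which is why many weak learners can add up to a strong one.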

In our example, the XGBoost model has learned to predict a neutron porosity (NPHI) log from sonic (DT), density (RhoB) and gamma-ray (GR) logs plus measured depth (MD). Each Random Forest consists of 100 estimators with a maximum depth of 50 successive splits.
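As a rough illustration, the stated forest size and tree depth might map onto the native parameters of the xgboost Python package as shown here. The mapping of "100 estimators" to num_parallel_tree, the objective, and the number of boosting rounds are assumptions; the post does not state how the model was configured in OpendTect.

```python
# Hypothetical parameter set; only the values 100 and 50 come from the post.
params = {
    "objective": "reg:squarederror",  # assumed: squared-error regression on NPHI
    "num_parallel_tree": 100,         # trees grown per boosting round (one forest)
    "max_depth": 50,                  # maximum depth of each tree
}

# The number of boosting rounds is not given in the post; 10 is a placeholder.
NUM_BOOST_ROUND = 10

# With xgboost installed and a DMatrix `dtrain` built from the training wells,
# training would look roughly like:
#   import xgboost as xgb
#   booster = xgb.train(params, dtrain, num_boost_round=NUM_BOOST_ROUND)
```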

The training data set is a subset of a harmonized well database we created from wells in the Dutch sector of the North Sea. We selected 255 wells with coverage of DT, RhoB, GR, MD and NPHI curves. NPHI is the target log, while the first four curves serve as input to the model. The XGBoost model was trained on 219 of the wells and validated on 36, both sets properly representing the characteristics of the complete dataset. The data was extracted in a sliding window of 21 samples for each of the input curves at a sampling interval of 0.1524 m, with the target being the NPHI sample at the center of the sliding window. The correlation coefficient, R, of the predicted neutron porosity logs with the measured logs on the validation set is 0.93, indicating an extremely good fit. The video shows the results of the trained model on well G13-01, a well from the validation set.
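The windowing and the correlation measure can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the OpendTect extraction code: the function names, the feature ordering and the synthetic well are hypothetical, and only the window length (21), the curve names and the 0.1524 m interval come from the post.

```python
from math import sqrt

WINDOW = 21                              # samples per input curve (from the post)
HALF = WINDOW // 2                       # 10 samples on either side of the centre
INPUT_CURVES = ("DT", "RHOB", "GR", "MD")


def make_examples(curves, target):
    """Build (features, target) pairs for one well.

    curves: dict mapping curve name to a list of samples on a regular grid
    target: list of NPHI samples on the same grid
    Each feature vector concatenates the 21-sample window of every input
    curve; the label is the NPHI sample at the window centre.
    """
    examples = []
    for centre in range(HALF, len(target) - HALF):
        features = []
        for name in INPUT_CURVES:
            features.extend(curves[name][centre - HALF:centre + HALF + 1])
        examples.append((features, target[centre]))
    return examples


def pearson_r(a, b):
    """Pearson correlation coefficient R between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


# Synthetic stand-in for one well, sampled every 0.1524 m (made-up values).
n = 30
well = {
    "DT": [100.0 + i for i in range(n)],
    "RHOB": [2.0 + 0.01 * i for i in range(n)],
    "GR": [50.0 + (i % 5) for i in range(n)],
    "MD": [1000.0 + i * 0.1524 for i in range(n)],
}
nphi = [0.30 - 0.002 * i for i in range(n)]

examples = make_examples(well, nphi)
print(len(examples), len(examples[0][0]))
```

With four input curves and a 21-sample window, each training example carries 84 input values. Samples within 10 positions of either end of a log have no complete window and are skipped in this sketch.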

This trained model has been added to the library of trained models that is available to all users of the OpendTect Machine Learning solution. We expect the model to be applicable in the entire Southern North Sea. We encourage our users to test the applicability of this model in other parts of the North Sea and even in other basins.

Feedback on test results will be highly appreciated!

