{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 26 Working with Well Data from the Geological Survey NRW\n", "\n", "This notebook demonstrates how to extract borehole data (well locations and stratigraphy) from logs provided by the Geological Survey NRW.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set File Paths and Download Tutorial Data\n", "\n", "If you downloaded the latest `GemGIS` version from the GitHub repository, append the path so that the package can be imported successfully. Otherwise, it is recommended to install `GemGIS` via `pip install gemgis` and import it using `import gemgis as gg`. In addition, the path to the folder in which the data is stored is set. The tutorial data is downloaded using Pooch (https://www.fatiando.org/pooch/latest/index.html) and stored in the specified folder. Use `pip install pooch` if Pooch is not yet installed on your system." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-03-17T11:21:58.167948Z", "start_time": "2021-03-17T11:21:55.976221Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`\n", "C:\\Users\\ale93371\\Anaconda3\\envs\\test_gempy\\lib\\site-packages\\theano\\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory\n", " warnings.warn(\"DeprecationWarning: there is no c++ compiler.\"\n", "WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. 
To remove this warning, set Theano flags cxx to an empty string.\n", "WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.\n" ] } ], "source": [ "import gemgis as gg\n", "\n", "file_path = 'data/26_working_with_well_data_from_GD_NRW/'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-03-17T11:21:58.230549Z", "start_time": "2021-03-17T11:21:58.218285Z" } }, "outputs": [], "source": [ "gg.download_gemgis_data.download_tutorial_data(filename=\"26_working_with_well_data_from_GD_NRW.zip\", dirpath=file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the Well Data\n", "\n", "The data used in this notebook was obtained from the Geological Survey NRW. It is used under the Datenlizenz Deutschland – Namensnennung – Version 2.0 (https://www.govdata.de/dl-de/by-2-0).\n", "\n", "The PDF files can be loaded as strings with ``load_pdf(...)``, which uses PyPDF2." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2021-01-01T10:11:06.485271Z", "start_time": "2021-01-01T10:11:06.415431Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 37.16it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "../../../../gemgis_data/data/26_working_with_well_data_from_GD_NRW/test_data.txt successfully saved\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/plain": [ "'Stammdaten - 2521/ 5631/ 1 - Bnum: 196747 . . Objekt / Name :B. 19 ESCHWEILER\\n\\n Bohrungs- / Aufschluß-Nr. :19\\n\\n Archiv-Nr. :\\n Endteufe [m] :70.30\\n\\n Stratigraphie der Endteufe :Karbon\\n . 
TK 25 :Eschweiler [TK 5103]\\n\\n Ort / Gemarkung :Eschweiler/Weißweiler\\n\\n GK Rechtswert/Hochwert [m] :2521370.00 / 5631910.00\\n\\n UTM East/North [m] :32310019.32 / 5633520.32\\n\\n Hoehe des Ansatzpunktes [mNN] :130.00\\n\\n Koordinatenbestimmung :ungeprüfte Angabe aus dem Bohrarch'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = gg.misc.load_pdf(path=file_path + 'test_data.pdf')\n", "data[:500]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting Meta Data From the Well Data\n", "\n", "The meta data or 'Stammdaten' of the wells can be extracted using ``get_meta_data_df(...)``. Any duplicate wells will be removed automatically." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.149429Z", "start_time": "2020-12-17T10:07:42.105866Z" } }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Index</th><th>DABO No.</th><th>Name</th><th>Number</th><th>Depth</th><th>X</th><th>Y</th><th>Z</th><th>X_GK</th><th>Y_GK</th><th>...</th><th>Kind</th><th>Procedure</th><th>Confidentiality</th><th>Record Type</th><th>Lithlog Version</th><th>Quality</th><th>Drilling Period</th><th>Remarks</th><th>Availability Lithlog</th><th>geometry</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>GD0001</td><td>DABO_196747</td><td>B.19ESCHWEILER</td><td>19</td><td>70.30</td><td>32310019.32</td><td>5633520.32</td><td>130.00</td><td>2521370.00</td><td>5631910.00</td><td>...</td><td>Bohrung</td><td></td><td>vertraulich, offen nach Einzelfallprüfung;</td><td>Übertragung eines alten Archivbestandes</td><td>1</td><td>Schichtdaten von guter Qualität; genaue strati...</td><td></td><td></td><td>Original-Schichtenverzeichnis liegt vor</td><td>POINT (32310019.320 5633520.320)</td></tr>\n",
"    <tr><th>1</th><td>GD0002</td><td>DABO_196748</td><td>B.16ESCHWEILER</td><td>16</td><td>37.61</td><td>32310327.14</td><td>5632967.35</td><td>122.00</td><td>2521700.00</td><td>5631370.00</td><td>...</td><td>Bohrung</td><td></td><td>vertraulich, offen nach Einzelfallprüfung;</td><td>Übertragung eines alten Archivbestandes</td><td>1</td><td>Schichtdaten von guter Qualität; genaue strati...</td><td></td><td></td><td>Original-Schichtenverzeichnis liegt vor</td><td>POINT (32310327.140 5632967.350)</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>2 rows × 26 columns</p>
\n", "
" ], "text/plain": [ " Index DABO No. Name Number Depth X Y \\\n", "0 GD0001 DABO_196747 B.19ESCHWEILER 19 70.30 32310019.32 5633520.32 \n", "1 GD0002 DABO_196748 B.16ESCHWEILER 16 37.61 32310327.14 5632967.35 \n", "\n", " Z X_GK Y_GK ... Kind Procedure \\\n", "0 130.00 2521370.00 5631910.00 ... Bohrung \n", "1 122.00 2521700.00 5631370.00 ... Bohrung \n", "\n", " Confidentiality \\\n", "0 vertraulich, offen nach Einzelfallprüfung; \n", "1 vertraulich, offen nach Einzelfallprüfung; \n", "\n", " Record Type Lithlog Version \\\n", "0 Übertragung eines alten Archivbestandes 1 \n", "1 Übertragung eines alten Archivbestandes 1 \n", "\n", " Quality Drilling Period Remarks \\\n", "0 Schichtdaten von guter Qualität; genaue strati... \n", "1 Schichtdaten von guter Qualität; genaue strati... \n", "\n", " Availability Lithlog geometry \n", "0 Original-Schichtenverzeichnis liegt vor POINT (32310019.320 5633520.320) \n", "1 Original-Schichtenverzeichnis liegt vor POINT (32310327.140 5632967.350) \n", "\n", "[2 rows x 26 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = gg.misc.get_meta_data_df(data=data,\n", "                              name='GD')\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot Data\n", "\n", "The locations of the wells can easily be plotted using Matplotlib or the built-in GeoPandas functions." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.306548Z", "start_time": "2020-12-17T10:07:42.150423Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "plt.scatter(df['X'], df['Y'])\n", "plt.grid()\n", "plt.xlabel('X [m]')\n", "plt.ylabel('Y [m]')\n", "# Label each well with its name\n", "for x, y, name in zip(df['X'], df['Y'], df['Name']):\n", "    plt.text(x, y, name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extracting Stratigraphic Data from Well Data\n", "\n", "The stratigraphic data can be extracted using ``get_stratigraphic_data_df(...)``. Two auxiliary files have to be loaded beforehand: a file containing the symbols that will be filtered out of the logs and a file containing the classification of the different formations." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.352546Z", "start_time": "2020-12-17T10:07:42.308549Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 66.64it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "../../../../gemgis_data/data/26_working_with_well_data_from_GD_NRW/test_data.txt successfully saved\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/plain": [ "'Stammdaten - 2521/ 5631/ 1 - Bnum: 196747 . . Objekt / Name :B. 19 ESCHWEILER\\n\\n Bohrungs- / Aufschluß-Nr. :19\\n\\n Archiv-Nr. :\\n Endteufe [m] :70.30\\n\\n Stratigraphie der Endteufe :Karbon\\n . 
TK 25 :Eschweiler [TK 5103]\\n\\n Ort / Gemarkung :Eschweiler/Weißweiler\\n\\n GK Rechtswert/Hochwert [m] :2521370.00 / 5631910.00\\n\\n UTM East/North [m] :32310019.32 / 5633520.32\\n\\n Hoehe des Ansatzpunktes [mNN] :130.00\\n\\n Koordinatenbestimmung :ungeprüfte Angabe aus dem Bohrarch'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = gg.misc.load_pdf(path=file_path + 'test_data.pdf',\n", "                        save_as_txt=True)\n", "\n", "data[:500]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Well Data from txt-file\n", "\n", "To save time, the data can be loaded from the text file so that the original PDF does not have to be parsed again." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.368544Z", "start_time": "2020-12-17T10:07:42.354544Z" } }, "outputs": [], "source": [ "with open(file_path + 'test_data.txt', \"r\") as text_file:\n", "    data = text_file.read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Symbols from txt-file\n", "\n", "The symbols that will be removed from the well data can be loaded from a text file. Each symbol is paired with an empty replacement string." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.384546Z", "start_time": "2020-12-17T10:07:42.372544Z" } }, "outputs": [ { "data": { "text/plain": [ "[('.m ', ''),\n", " (', ', ''),\n", " ('; ', ''),\n", " (': ', ''),\n", " ('/ ', ''),\n", " ('? ', ''),\n", " ('! 
', ''),\n", " ('-\"- ', ''),\n", " ('\" ', ''),\n", " ('% ', ''),\n", " ('< ', ''),\n", " ('> ', ''),\n", " ('= ', ''),\n", " ('~ ', ''),\n", " ('_ ', ''),\n", " ('° ', ''),\n", " (\"' \", '')]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open(file_path + 'symbols.txt', \"r\") as text_file:\n", "    symbols = [(i, '') for i in text_file.read().splitlines()]\n", "\n", "symbols" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-12-17T09:52:34.116241Z", "start_time": "2020-12-17T09:52:34.102205Z" } }, "source": [ "### Load Formations from txt-file\n", "\n", "The classified formations can be loaded from a text file. Each entry pairs a formation name as it appears in the logs with the formation it is assigned to." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.400551Z", "start_time": "2020-12-17T10:07:42.386547Z" } }, "outputs": [ { "data": { "text/plain": [ "[('UnterdevonKalltalFormation', 'KalltalFM'),\n", " ('nullLöss', 'Quaternary'),\n", " ('QuartärFlugsand', 'Quaternary'),\n", " ('QuartärHauptterrassen', 'Quaternary'),\n", " ('QuartärSandlöss', 'Quaternary'),\n", " ('QuartärHochflutablagerungen', 'Quaternary'),\n", " ('QuartärAnthropogeneBildungen(künstlicheAufschüttung)', 'Quaternary'),\n", " ('QuartärVerschwemmungsablagerungenFrostbodenbildungenundRutschmassen',\n", " 'Quaternary'),\n", " ('QuartärLösslehm', 'Quaternary'),\n", " ('QuartärHochflutlehm', 'Quaternary')]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open(file_path + 'formations.txt', \"rb\") as text_file:\n", "    formations = text_file.read().decode(\"UTF-8\").split()\n", "\n", "# Pair each raw formation name with its classified formation\n", "formations = [(formations[i], formations[i + 1]) for i in range(0, len(formations) - 1, 2)]\n", "formations[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting the Stratigraphic Data\n", "\n", "After loading the symbols and formations, the stratigraphic data can be extracted. 
The (Geo-)DataFrame contains the index, the well name, X, Y and Z coordinates, the altitudes, the depths, the formations and a geometry column. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.462572Z", "start_time": "2020-12-17T10:07:42.402552Z" } }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Index</th><th>Name</th><th>X</th><th>Y</th><th>Z</th><th>Altitude</th><th>Depth</th><th>formation</th><th>geometry</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>GD0001</td><td>B.19ESCHWEILER</td><td>32310019.32</td><td>5633520.32</td><td>125.30</td><td>130.00</td><td>70.30</td><td>Quaternary</td><td>POINT (32310019.320 5633520.320)</td></tr>\n",
"    <tr><th>1</th><td>GD0001</td><td>B.19ESCHWEILER</td><td>32310019.32</td><td>5633520.32</td><td>66.50</td><td>130.00</td><td>70.30</td><td>Miocene</td><td>POINT (32310019.320 5633520.320)</td></tr>\n",
"    <tr><th>2</th><td>GD0001</td><td>B.19ESCHWEILER</td><td>32310019.32</td><td>5633520.32</td><td>60.90</td><td>130.00</td><td>70.30</td><td>Oligocene</td><td>POINT (32310019.320 5633520.320)</td></tr>\n",
"    <tr><th>3</th><td>GD0001</td><td>B.19ESCHWEILER</td><td>32310019.32</td><td>5633520.32</td><td>59.70</td><td>130.00</td><td>70.30</td><td>Carboniferous</td><td>POINT (32310019.320 5633520.320)</td></tr>\n",
"    <tr><th>4</th><td>GD0002</td><td>B.16ESCHWEILER</td><td>32310327.14</td><td>5632967.35</td><td>117.80</td><td>122.00</td><td>37.61</td><td>Quaternary</td><td>POINT (32310327.140 5632967.350)</td></tr>\n",
"    <tr><th>5</th><td>GD0002</td><td>B.16ESCHWEILER</td><td>32310327.14</td><td>5632967.35</td><td>84.40</td><td>122.00</td><td>37.61</td><td>Miocene</td><td>POINT (32310327.140 5632967.350)</td></tr>\n",
"    <tr><th>6</th><td>GD0002</td><td>B.16ESCHWEILER</td><td>32310327.14</td><td>5632967.35</td><td>84.39</td><td>122.00</td><td>37.61</td><td>Carboniferous</td><td>POINT (32310327.140 5632967.350)</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Index Name X Y Z Altitude Depth \\\n", "0 GD0001 B.19ESCHWEILER 32310019.32 5633520.32 125.30 130.00 70.30 \n", "1 GD0001 B.19ESCHWEILER 32310019.32 5633520.32 66.50 130.00 70.30 \n", "2 GD0001 B.19ESCHWEILER 32310019.32 5633520.32 60.90 130.00 70.30 \n", "3 GD0001 B.19ESCHWEILER 32310019.32 5633520.32 59.70 130.00 70.30 \n", "4 GD0002 B.16ESCHWEILER 32310327.14 5632967.35 117.80 122.00 37.61 \n", "5 GD0002 B.16ESCHWEILER 32310327.14 5632967.35 84.40 122.00 37.61 \n", "6 GD0002 B.16ESCHWEILER 32310327.14 5632967.35 84.39 122.00 37.61 \n", "\n", " formation geometry \n", "0 Quaternary POINT (32310019.320 5633520.320) \n", "1 Miocene POINT (32310019.320 5633520.320) \n", "2 Oligocene POINT (32310019.320 5633520.320) \n", "3 Carboniferous POINT (32310019.320 5633520.320) \n", "4 Quaternary POINT (32310327.140 5632967.350) \n", "5 Miocene POINT (32310327.140 5632967.350) \n", "6 Carboniferous POINT (32310327.140 5632967.350) " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = gg.misc.get_stratigraphic_data_df(data=data,\n", "                                       name='GD',\n", "                                       symbols=symbols,\n", "                                       formations=formations,\n", "                                       return_gdf=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting Data\n", "\n", "The well locations stored in the resulting GeoDataFrame can be plotted using the built-in GeoPandas plotting function." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2020-12-17T10:07:42.589064Z", "start_time": "2020-12-17T10:07:42.464564Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.plot()\n", "plt.grid()" ] } ], "metadata": { "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }