Basics

geotech-pandas is accessed mainly through the geotech accessor on DataFrame objects. When the accessor is accessed, geotech-pandas validates the DataFrame against several minimum requirements, which are discussed in the following sections.

As is customary, we import the necessary libraries before beginning the guide:

In [1]: import pandas as pd

In [2]: import geotech_pandas

Minimum requirements

Columns

The minimum required columns for geotech-pandas are point_id and bottom. The point_id column holds the ID of the point (the group) to which each layer belongs, whereas the bottom column holds the bottom depth of each layer. For more information, see General Columns.

If you try to access geotech with the following DataFrame,

In [3]: df = pd.DataFrame(
   ...:     {
   ...:         "point_id": ["BH-1", "BH-1", "BH-1"],
   ...:     }
   ...: )
   ...: 

In [4]: df.geotech()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 df.geotech()

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
   6292 if (
   6293     name not in self._internal_names_set
   6294     and name not in self._metadata
   6295     and name not in self._accessors
   6296     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297 ):
   6298     return self[name]
-> 6299 return object.__getattribute__(self, name)

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
    229 object.__setattr__(obj, self._name, accessor_obj)

File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:27, in GeotechDataFrameAccessor.__init__(self, df)
     24 def __init__(self, df: pd.DataFrame):
     25     self._obj = df
---> 27     self._validate_columns()
     28     self._validate_monotony()
     29     self._validate_duplicates()

File /workspaces/geotech-pandas/src/geotech_pandas/base.py:45, in GeotechPandasBase._validate_columns(self, columns)
     42 missing_columns = [column for column in columns if column not in self._obj.columns]
     44 if len(missing_columns) > 0:
---> 45     raise AttributeError(
     46         f"The DataFrame must have: {', '.join(missing_columns)} "
     47         f"column{'s' if len(missing_columns) > 1 else ''}."
     48     )

AttributeError: The DataFrame must have: bottom column.

An AttributeError is raised stating that the DataFrame is missing the bottom column.
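To catch a missing column before touching the accessor, the same check can be sketched in plain pandas. The helper name `missing_required_columns` and the constant `REQUIRED_COLUMNS` are hypothetical and not part of geotech-pandas; they only mirror the list comprehension visible in the traceback above.

```python
import pandas as pd

# Required column names, as stated in this guide.
REQUIRED_COLUMNS = ["point_id", "bottom"]


def missing_required_columns(df: pd.DataFrame) -> list:
    """Return the required columns absent from df (hypothetical helper)."""
    return [column for column in REQUIRED_COLUMNS if column not in df.columns]


df = pd.DataFrame({"point_id": ["BH-1", "BH-1", "BH-1"]})
missing = missing_required_columns(df)

# Adding the missing column satisfies the column requirement.
df["bottom"] = [1.0, 2.0, 3.0]
still_missing = missing_required_columns(df)
```

Here `missing` names the bottom column, and `still_missing` is empty once the column has been added.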

Arrangement

Another requirement is that the bottom depth values for each point_id be monotonically increasing, as most methods assume that the layers of each point are listed in order of depth, one directly after the other.

If you try to access geotech with the following DataFrame,

In [5]: df = pd.DataFrame(
   ...:     {
   ...:         "point_id": ["BH-1", "BH-1", "BH-1"],
   ...:         "bottom": [0.0, 2.0, 1.0],
   ...:     }
   ...: )
   ...: 

In [6]: df.geotech()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[6], line 1
----> 1 df.geotech()

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
   6292 if (
   6293     name not in self._internal_names_set
   6294     and name not in self._metadata
   6295     and name not in self._accessors
   6296     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297 ):
   6298     return self[name]
-> 6299 return object.__getattribute__(self, name)

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
    229 object.__setattr__(obj, self._name, accessor_obj)

File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:28, in GeotechDataFrameAccessor.__init__(self, df)
     25 self._obj = df
     27 self._validate_columns()
---> 28 self._validate_monotony()
     29 self._validate_duplicates()

File /workspaces/geotech-pandas/src/geotech_pandas/base.py:63, in GeotechPandasBase._validate_monotony(self)
     61 check_list = check_df[~check_df["bottom"]]["point_id"].to_list()
     62 if ~check_df["bottom"].all():
---> 63     raise AttributeError(
     64         f"Elements in the bottom column must be monotonically increasing for:"
     65         f" {', '.join(check_list)}."
     66     )

AttributeError: Elements in the bottom column must be monotonically increasing for: BH-1.

An AttributeError is raised listing which points contain the erroneous arrangement.
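The offending points can also be located, and often repaired, with plain pandas before the accessor is used. The following is a minimal sketch under two assumptions: the second point, BH-2, is invented here for illustration, and `Series.is_monotonic_increasing` is used as the test even though this guide does not state exactly how geotech-pandas performs its check (the pandas property tolerates equal neighbouring values).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "point_id": ["BH-1", "BH-1", "BH-1", "BH-2", "BH-2"],
        "bottom": [0.0, 2.0, 1.0, 1.0, 2.0],
    }
)

# One boolean per point: True when its bottom depths never decrease.
is_ordered = (
    df.groupby("point_id")["bottom"]
    .apply(lambda depths: depths.is_monotonic_increasing)
    .astype(bool)
)
unordered_points = is_ordered[~is_ordered].index.to_list()

# Sorting by point and then by depth restores the expected arrangement,
# provided the rows were merely shuffled and every row carries its own data.
sorted_df = df.sort_values(["point_id", "bottom"]).reset_index(drop=True)
```

Sorting is only appropriate when the depths themselves are correct; if a depth was mistyped, the value needs to be fixed at the source instead.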

Uniqueness

The point_id and bottom pairs must also be unique, as most methods assume that each layer appears only once per point.

If you try to access geotech with the following DataFrame,

In [7]: df = pd.DataFrame(
   ...:     {
   ...:         "point_id": ["BH-1", "BH-1", "BH-1"],
   ...:         "bottom": [0.0, 1.0, 1.0],
   ...:     }
   ...: )
   ...: 

In [8]: df.geotech()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 df.geotech()

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
   6292 if (
   6293     name not in self._internal_names_set
   6294     and name not in self._metadata
   6295     and name not in self._accessors
   6296     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6297 ):
   6298     return self[name]
-> 6299 return object.__getattribute__(self, name)

File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
    229 object.__setattr__(obj, self._name, accessor_obj)

File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:29, in GeotechDataFrameAccessor.__init__(self, df)
     27 self._validate_columns()
     28 self._validate_monotony()
---> 29 self._validate_duplicates()

File /workspaces/geotech-pandas/src/geotech_pandas/base.py:82, in GeotechPandasBase._validate_duplicates(self)
     78 duplicate_list = self._obj[self._obj[["point_id", "bottom"]].duplicated()][
     79     "point_id"
     80 ].to_list()
     81 if len(duplicate_list) > 0:
---> 82     raise AttributeError(
     83         "The DataFrame contains duplicate point_id and bottom:"
     84         f" {', '.join(duplicate_list)}."
     85     )

AttributeError: The DataFrame contains duplicate point_id and bottom: BH-1.

An AttributeError is raised listing which points contain duplicate values.
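The error message names the affected points but not the rows. A minimal sketch for finding them with pandas' own `DataFrame.duplicated` follows; keeping the first occurrence with `drop_duplicates` is shown as one possible resolution, not as the approach recommended by geotech-pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "point_id": ["BH-1", "BH-1", "BH-1"],
        "bottom": [0.0, 1.0, 1.0],
    }
)

# keep=False marks every row that takes part in a duplicated pair.
duplicate_mask = df.duplicated(subset=["point_id", "bottom"], keep=False)
duplicate_rows = df[duplicate_mask]

# Keep only the first occurrence of each point_id and bottom pair.
deduplicated = df.drop_duplicates(subset=["point_id", "bottom"], keep="first")
```

Whether the first or the last duplicate should be kept depends on the other columns of the data, so it is worth inspecting `duplicate_rows` before dropping anything.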

Subaccessors

The geotech accessor itself offers no methods other than the validation methods, which are called automatically when the accessor is initialized, as shown in the preceding sections.

The geotech accessor serves as a parent namespace for the various scopes provided in geotech-pandas. These scopes are themselves accessors that can be reached from geotech like so:

In [9]: df = pd.DataFrame(
   ...:     {
   ...:         "point_id": ["BH-1", "BH-1", "BH-1"],
   ...:         "bottom": [0.0, 1.0, 2.0],
   ...:     }
   ...: )
   ...: 

In [10]: df.geotech.point
Out[10]: <geotech_pandas.point.PointDataFrameAccessor at 0x7fb472dfde50>

Here, we access the point accessor, which provides the point-related methods. Succeeding guides demonstrate the usage of each subaccessor in geotech-pandas.
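For readers curious about how a parent accessor with nested subaccessors can be put together, the following is a generic sketch built on pandas' documented `register_dataframe_accessor` decorator. The names `demo`, `DemoAccessor`, `DemoPointAccessor`, and `ids` are hypothetical and chosen only for illustration; this is not the geotech-pandas implementation.

```python
import pandas as pd


class DemoPointAccessor:
    """Hypothetical subaccessor holding point-related methods."""

    def __init__(self, df: pd.DataFrame):
        self._obj = df

    def ids(self) -> list:
        """Return the unique point IDs in order of appearance."""
        return self._obj["point_id"].unique().tolist()


@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    """Hypothetical parent accessor that validates the DataFrame on creation."""

    def __init__(self, df: pd.DataFrame):
        # Validation runs as soon as the accessor is first accessed.
        if "point_id" not in df.columns:
            raise AttributeError("The DataFrame must have: point_id column.")
        self._obj = df

    @property
    def point(self) -> DemoPointAccessor:
        # The parent only acts as a namespace that hands out subaccessors.
        return DemoPointAccessor(self._obj)


df = pd.DataFrame(
    {"point_id": ["BH-1", "BH-1", "BH-2"], "bottom": [1.0, 2.0, 1.0]}
)
point_ids = df.demo.point.ids()
```

In this sketch the parent accessor performs the validation once in its constructor, so each subaccessor can assume the DataFrame already meets the minimum requirements.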