Basics#
geotech-pandas is mainly accessed through the geotech accessor on DataFrame objects. When the accessor is accessed, geotech-pandas validates the current DataFrame against several minimum requirements, which are discussed in the following sections.
Customarily, we import the necessary libraries before we begin the guide:
In [1]: import pandas as pd
In [2]: import geotech_pandas
Minimum requirements#
Columns#
The minimum required columns for geotech-pandas are the point_id and bottom columns. The point_id column represents the ID or group to which each layer belongs, whereas the bottom column represents the bottom depth of each layer. For more information, see General Columns.
If you try to access geotech with the following DataFrame:
In [3]: df = pd.DataFrame(
...: {
...: "point_id": ["BH-1", "BH-1", "BH-1"],
...: }
...: )
...:
In [4]: df.geotech()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 df.geotech()
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
6292 if (
6293 name not in self._internal_names_set
6294 and name not in self._metadata
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
221 if obj is None:
222 # we're accessing the attribute of the class, i.e., Dataset.geo
223 return self._accessor
--> 224 accessor_obj = self._accessor(obj)
225 # Replace the property with the accessor object. Inspired by:
226 # https://www.pydanny.com/cached-property.html
227 # We need to use object.__setattr__ because we overwrite __setattr__ on
228 # NDFrame
229 object.__setattr__(obj, self._name, accessor_obj)
File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:27, in GeotechDataFrameAccessor.__init__(self, df)
24 def __init__(self, df: pd.DataFrame):
25 self._obj = df
---> 27 self._validate_columns()
28 self._validate_monotony()
29 self._validate_duplicates()
File /workspaces/geotech-pandas/src/geotech_pandas/base.py:45, in GeotechPandasBase._validate_columns(self, columns)
42 missing_columns = [column for column in columns if column not in self._obj.columns]
44 if len(missing_columns) > 0:
---> 45 raise AttributeError(
46 f"The DataFrame must have: {', '.join(missing_columns)} "
47 f"column{'s' if len(missing_columns) > 1 else ''}."
48 )
AttributeError: The DataFrame must have: bottom column.
An AttributeError is raised stating that the DataFrame is missing the bottom column.
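To find out which required columns are absent before touching the accessor, a plain-pandas pre-check along these lines may help. This is a minimal sketch that mirrors the list comprehension visible in the traceback above; the `REQUIRED_COLUMNS` tuple and the `missing_required_columns` helper are hypothetical names and are not part of geotech-pandas.

```python
import pandas as pd

# Assumption: the two minimum columns named in this guide.
REQUIRED_COLUMNS = ("point_id", "bottom")


def missing_required_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from ``df``.

    Hypothetical helper for illustration; not part of geotech-pandas.
    """
    return [column for column in required if column not in df.columns]


df = pd.DataFrame({"point_id": ["BH-1", "BH-1", "BH-1"]})
missing = missing_required_columns(df)
print(missing)  # ['bottom']
```

An empty list means the column requirement is met, although the other checks described in this guide still apply.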
Arrangement#
Another requirement is that the bottom depth values for each point_id must be monotonically increasing, as most methods assume that the layers of each point follow one another in order of depth.
If you try to access geotech with the following DataFrame:
In [5]: df = pd.DataFrame(
...: {
...: "point_id": ["BH-1", "BH-1", "BH-1"],
...: "bottom": [0.0, 2.0, 1.0],
...: }
...: )
...:
In [6]: df.geotech()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[6], line 1
----> 1 df.geotech()
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
6292 if (
6293 name not in self._internal_names_set
6294 and name not in self._metadata
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
221 if obj is None:
222 # we're accessing the attribute of the class, i.e., Dataset.geo
223 return self._accessor
--> 224 accessor_obj = self._accessor(obj)
225 # Replace the property with the accessor object. Inspired by:
226 # https://www.pydanny.com/cached-property.html
227 # We need to use object.__setattr__ because we overwrite __setattr__ on
228 # NDFrame
229 object.__setattr__(obj, self._name, accessor_obj)
File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:28, in GeotechDataFrameAccessor.__init__(self, df)
25 self._obj = df
27 self._validate_columns()
---> 28 self._validate_monotony()
29 self._validate_duplicates()
File /workspaces/geotech-pandas/src/geotech_pandas/base.py:63, in GeotechPandasBase._validate_monotony(self)
61 check_list = check_df[~check_df["bottom"]]["point_id"].to_list()
62 if ~check_df["bottom"].all():
---> 63 raise AttributeError(
64 f"Elements in the bottom column must be monotonically increasing for:"
65 f" {', '.join(check_list)}."
66 )
AttributeError: Elements in the bottom column must be monotonically increasing for: BH-1.
An AttributeError is raised listing the points whose bottom values are not monotonically increasing.
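The same kind of check can be approximated with plain pandas to locate the offending points in a larger table. The sketch below is not the library's implementation. It relies on pandas' `is_monotonic_increasing`, which allows equal consecutive values; whether geotech-pandas uses the same non-strict definition is not stated in this guide. The `BH-2` rows are made-up sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "point_id": ["BH-1", "BH-1", "BH-1", "BH-2", "BH-2"],
        "bottom": [0.0, 2.0, 1.0, 1.0, 3.0],
    }
)

# True for each point whose bottom depths never decrease from row to row.
is_monotonic = (
    df.groupby("point_id")["bottom"]
    .apply(lambda depths: depths.is_monotonic_increasing)
    .astype(bool)
)
offending_points = is_monotonic[~is_monotonic].index.to_list()
print(offending_points)  # ['BH-1']

# One possible fix: sort the layers by point and depth.
df_sorted = df.sort_values(["point_id", "bottom"]).reset_index(drop=True)
```

Sorting is only appropriate when the rows are merely out of order. If an out-of-order depth comes from a data-entry error, it should be corrected at the source instead.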
Uniqueness#
It is also required that each point_id and bottom pair be unique, as most methods assume that each layer is unique within its point.
If you try to access geotech with the following DataFrame:
In [7]: df = pd.DataFrame(
...: {
...: "point_id": ["BH-1", "BH-1", "BH-1"],
...: "bottom": [0.0, 1.0, 1.0],
...: }
...: )
...:
In [8]: df.geotech()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 df.geotech()
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/generic.py:6299, in NDFrame.__getattr__(self, name)
6292 if (
6293 name not in self._internal_names_set
6294 and name not in self._metadata
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)
File /opt/geotech-pandas-env/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
221 if obj is None:
222 # we're accessing the attribute of the class, i.e., Dataset.geo
223 return self._accessor
--> 224 accessor_obj = self._accessor(obj)
225 # Replace the property with the accessor object. Inspired by:
226 # https://www.pydanny.com/cached-property.html
227 # We need to use object.__setattr__ because we overwrite __setattr__ on
228 # NDFrame
229 object.__setattr__(obj, self._name, accessor_obj)
File /workspaces/geotech-pandas/src/geotech_pandas/accessor.py:29, in GeotechDataFrameAccessor.__init__(self, df)
27 self._validate_columns()
28 self._validate_monotony()
---> 29 self._validate_duplicates()
File /workspaces/geotech-pandas/src/geotech_pandas/base.py:82, in GeotechPandasBase._validate_duplicates(self)
78 duplicate_list = self._obj[self._obj[["point_id", "bottom"]].duplicated()][
79 "point_id"
80 ].to_list()
81 if len(duplicate_list) > 0:
---> 82 raise AttributeError(
83 "The DataFrame contains duplicate point_id and bottom:"
84 f" {', '.join(duplicate_list)}."
85 )
AttributeError: The DataFrame contains duplicate point_id and bottom: BH-1.
An AttributeError is raised listing which points contain duplicate values.
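Since the error message names only the points, it can be useful to see the duplicated rows themselves. The following sketch uses only standard pandas calls (`DataFrame.duplicated` and `DataFrame.drop_duplicates`). Keeping the first row of each duplicated pair is an assumption made for illustration, not a recommendation from geotech-pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "point_id": ["BH-1", "BH-1", "BH-1"],
        "bottom": [0.0, 1.0, 1.0],
    }
)

# Mark every row whose (point_id, bottom) pair appears more than once.
is_duplicate = df.duplicated(subset=["point_id", "bottom"], keep=False)
print(df[is_duplicate])

# One possible cleanup: keep only the first row of each duplicated pair.
df_unique = df.drop_duplicates(subset=["point_id", "bottom"], keep="first")
```

When the duplicated rows carry different values in other columns, decide which row is correct before dropping anything.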
Subaccessors#
There are no methods available under the geotech accessor other than the validation methods, which are called automatically when the accessor is initialized, as shown in the preceding sections.
The geotech accessor serves as a parent namespace for the various scopes provided in geotech-pandas. These scopes are themselves accessors that can be reached from geotech like so:
In [9]: df = pd.DataFrame(
...: {
...: "point_id": ["BH-1", "BH-1", "BH-1"],
...: "bottom": [0.0, 1.0, 2.0],
...: }
...: )
...:
In [10]: df.geotech.point
Out[10]: <geotech_pandas.point.PointDataFrameAccessor at 0x7fb472dfde50>
Here, we access the point accessor, which provides the point-related methods. Succeeding guides demonstrate the usage of each subaccessor in geotech-pandas.
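For readers curious how a parent accessor can validate on access and then expose child namespaces, the pattern can be sketched with pandas' public `register_dataframe_accessor` decorator. This is an illustrative sketch only, not the geotech-pandas source. The `geotech_sketch` name, both classes, and the `ids` method are hypothetical.

```python
import pandas as pd


class PointSketchAccessor:
    """Hypothetical child accessor holding point-related helpers."""

    def __init__(self, df):
        self._obj = df

    def ids(self):
        """Return the unique point IDs in order of appearance."""
        return self._obj["point_id"].drop_duplicates().to_list()


@pd.api.extensions.register_dataframe_accessor("geotech_sketch")
class GeotechSketchAccessor:
    """Hypothetical parent accessor: validate first, then expose children."""

    def __init__(self, df):
        self._obj = df
        # Validation runs as soon as the accessor is created.
        missing = [c for c in ("point_id", "bottom") if c not in df.columns]
        if missing:
            raise AttributeError(f"The DataFrame must have: {', '.join(missing)}.")

    @property
    def point(self):
        # Child namespace reached as df.geotech_sketch.point.
        return PointSketchAccessor(self._obj)


df = pd.DataFrame({"point_id": ["BH-1", "BH-1", "BH-2"], "bottom": [1.0, 2.0, 1.5]})
print(df.geotech_sketch.point.ids())  # ['BH-1', 'BH-2']
```

Because validation lives in the parent's `__init__`, every child namespace can assume that the required columns are present.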