Contribution Guide#
(The contents of this guide are largely inspired by Pandas and GeoPandas)
Overview#
Contributions to geotech-pandas are very welcome. They are likely to be accepted more quickly if they follow these guidelines.
At this stage of geotech-pandas development, the priorities are to define a simple, usable, and stable API and to have clean, maintainable, readable code. Performance matters, but not at the expense of those goals.
In general, geotech-pandas follows the conventions of the pandas project where applicable.
In particular, when submitting a pull request (PR):
All existing tests should pass. Please make sure that the test suite passes, both locally and on GitHub Actions (GHA). The GHA status is visible on the PR. GHA is also automatically enabled on your own fork; to trigger a check, make a PR to your own fork.
New functionality should include tests. Please write reasonable tests for your code and make sure that they pass on your PR.
Classes, methods, functions, etc. should have docstrings. The first line of a docstring should be a standalone summary. Parameters and return values should be documented explicitly.
Follow PEP8 when possible. We use Ruff to ensure a consistent code format throughout the project.
Follow the Conventional Commits standard when writing commits. We use Commitizen to ensure consistent commit messages, which are then used in automated semantic versioning and changelog generation.
The geotech-pandas project supports Python 3.9+ only, usually in sync with the supported versions of Pandas.
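As a quick sanity check before setting up, the running interpreter can be compared against the 3.9 minimum stated above. This is only a minimal sketch; the helper name `is_supported_python` is hypothetical and is not part of geotech-pandas.

```python
import sys

# Minimum version stated in this guide; adjust if the project's requirement changes.
MINIMUM_PYTHON = (3, 9)


def is_supported_python(version_info=sys.version_info):
    """Return True if the given version tuple meets the stated minimum."""
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if __name__ == "__main__":
    print("supported" if is_supported_python() else "unsupported")
```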
Bug reports and feature requests#
Bug reports and enhancement requests are an important part of making geotech-pandas more stable. These are curated through GitHub issues. When reporting an issue or request, please select the appropriate category and fill out the issue form completely so that others and the core development team can understand the scope of the issue.
The issue will then show up in GitHub issues and be open to comments/ideas from others.
Finding an issue to contribute to#
If you are brand new to geotech-pandas or open-source development, we recommend searching the GitHub “issues” tab to find issues that interest you. Unassigned issues labeled documentation and good first issue are typically good for newer contributors.
Once you’ve found an interesting issue, it’s a good idea to assign the issue to yourself, so nobody else duplicates the work on it.
If for whatever reason you are not able to continue working on the issue, please unassign it, so other people know it’s available again. You can also check the list of assigned issues, since people may not be working on them anymore. If you want to work on one that is assigned, feel free to kindly ask the current assignee if you can take it. Please allow at least a week of inactivity before considering work on the issue discontinued.
General contribution workflow#
Once you have chosen an issue to work on, follow the basic steps in the sections below to contribute to geotech-pandas.
Read through the Recommended prerequisites first; each step is then detailed in its own section.
Recommended prerequisites#
Set up Git to use SSH
Generate an SSH key and add the SSH key to your GitHub account.
Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config
Host *
  AddKeysToAgent yes
  IgnoreUnknown UseKeychain
  UseKeychain yes
EOF
Install Docker
Enable Use Docker Compose V2 in Docker Desktop’s preferences window.
For Linux installations:
Export your user’s user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc
export UID=$(id --user)
export GID=$(id --group)
EOF
Install VS Code or PyCharm
Install VS Code and VS Code’s Dev Containers extension. Alternatively, install PyCharm.
Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Fork the git repository#
You will need your own fork to work on the code. Go to the geotech-pandas project page and hit the Fork button.
Clone your fork#
To clone your fork to your local machine:
git clone git@GitHub.com:your-user-name/geotech-pandas.git geotech-pandas-yourname
cd geotech-pandas-yourname
git remote add upstream https://github.com/fraserdominicdavid/geotech-pandas.git
git fetch upstream
This creates the directory geotech-pandas-yourname and connects your repository to the upstream (main project) geotech-pandas repository.
Set up a development environment#
Thanks to Dev Containers, it is now easier to set up a consistent environment for the development of this project. The following development environments are supported:
GitHub Codespaces:
From your GitHub fork, click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
VS Code:
Clone your fork, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
PyCharm:
Clone your fork, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
Terminal:
Clone your fork, open it with your terminal, and run:
docker compose up --detach dev
to start a Dev Container in the background, and then run:
docker compose exec dev zsh
to open a shell prompt in the Dev Container.
Create a feature branch#
Your local main branch should always reflect the current state of the geotech-pandas repository. First ensure that it’s up-to-date with the main repository:
git checkout main
git pull upstream main --ff-only
Then, create a feature branch for making your changes. For example:
git checkout -b shiny-new-feature
This changes your working branch from main to shiny-new-feature. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to geotech-pandas. You can have many feature branches and switch between them using the git checkout command.
To update this branch, you need to retrieve the changes from the main branch:
git fetch upstream
git rebase upstream/main
This will replay your commits on top of the latest geotech-pandas main branch. If this leads to merge conflicts, you must resolve these before submitting your PR. If you have uncommitted changes, you will need to git stash them prior to updating. This stores your changes so that they can be reapplied after updating.
Make changes to code#
The geotech-pandas project is serious about testing and strongly encourages contributors to embrace test-driven development (TDD). This development process “relies on the repetition of a very short development cycle: first the developer writes an, initially failing, automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test.” So, before actually writing any code, you should write your tests. Often the test can be taken from the original GitHub issue. However, it is always worth considering additional use cases and writing corresponding tests.
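The test-first cycle described above might look like the following sketch. The function `layer_thickness` and its test are hypothetical and are not part of geotech-pandas; they only illustrate the order of work, with the test written before the code that satisfies it.

```python
# Step 1: write the test first. Before layer_thickness exists, running this
# test fails with a NameError, which is the expected starting point in TDD.
def test_layer_thickness():
    assert layer_thickness(top=1.5, bottom=3.0) == 1.5


# Step 2: write the minimum amount of code that makes the test pass.
def layer_thickness(top, bottom):
    """Return the thickness of a layer from its top and bottom depths."""
    return bottom - top


# Step 3: rerun the test; it should now pass without raising.
test_layer_thickness()
```

In a real contribution, the test would live in the tests directory and be collected by pytest rather than called directly.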
To see all the changes you’ve currently made, run:
git status
Add reasonable tests#
Adding tests is one of the most common requests after code is pushed to geotech-pandas. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue.
The pytest testing system and convenient extensions in the pandas._testing module are used in geotech-pandas.
All tests should go into the tests directory. This folder contains many current examples of tests, and we suggest looking to these for inspiration.
The pandas._testing module has some special assert functions that make it easier to make statements about whether Series or DataFrame objects are equivalent. The easiest way to verify that your code is correct is to explicitly construct the result you expect, then compare the actual result to the expected correct result, using the appropriate assert functions from pandas._testing.
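The "construct the expected result, then compare" pattern can be sketched as follows. The helper `add_thickness` and the column names `top`, `bottom`, and `thickness` are assumptions made up for this illustration; only `assert_frame_equal` comes from pandas.

```python
import pandas as pd
import pandas._testing as tm


def add_thickness(df):
    """Return a copy of df with a thickness column (hypothetical helper)."""
    result = df.copy()
    result["thickness"] = result["bottom"] - result["top"]
    return result


def test_add_thickness():
    df = pd.DataFrame({"top": [0.0, 1.5], "bottom": [1.5, 3.0]})
    # Explicitly construct the result we expect.
    expected = pd.DataFrame(
        {"top": [0.0, 1.5], "bottom": [1.5, 3.0], "thickness": [1.5, 1.5]}
    )
    result = add_thickness(df)
    # Compares values, dtypes, index, and column labels.
    tm.assert_frame_equal(result, expected)


test_add_thickness()
```

Comparing whole objects this way tends to catch dtype or index mismatches that a value-by-value check would miss.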
The tests can then be run directly inside your Git clone by typing:
poe test
Lint and format code#
The PEP8 standard is followed in geotech-pandas with the help of Ruff to ensure a consistent code format throughout the project.
Continuous Integration (CI) with GHA runs lint checking tools and reports any stylistic errors in your code. It is therefore helpful to run the checks yourself before submitting code:
poe lint
This command also auto-formats your code. Additionally, the Dev Container supplied in this project applies these tools as you edit files.
Update the documentation#
The geotech-pandas documentation resides in the docs folder of the repository. Changes to the documentation are made by modifying the appropriate file in that folder. The documentation uses the reStructuredText syntax rendered using Sphinx. For more information, see reStructuredText Primer.
The docstrings, on the other hand, follow the NumPy Docstring Standard.
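A docstring in that style might look like the sketch below: a standalone summary line, followed by explicit Parameters and Returns sections. The function `effective_stress` is hypothetical and is used only to show the layout.

```python
def effective_stress(total_stress, pore_pressure):
    """Return the effective stress.

    Parameters
    ----------
    total_stress : float
        Total vertical stress, in kPa.
    pore_pressure : float
        Pore water pressure, in kPa.

    Returns
    -------
    float
        Effective stress, in kPa.
    """
    return total_stress - pore_pressure


# The first docstring line stands alone as the summary.
print(effective_stress.__doc__.splitlines()[0])
```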
We highly encourage you to follow the Google developer documentation style guide when updating or creating new documentation.
Once you have made your changes, you can check that they render correctly by building the docs with Sphinx. To do so, run:
poe docs
The resulting HTML pages will be located in docs/_build/html.
If you wish to render a “clean” build, run:
poe docs -O "-E -a"
This ensures that Sphinx rebuilds and rewrites all of the output files.
A “cleaner” build can also be done by removing the _build and api-reference/api folders before building:
rm -r docs/api-reference/api docs/_build
poe docs
Commit changes#
After making your changes and running the tests, linter, and formatter, stage your changes using:
git add path/to/files-to-be-added-or-changed.py
After running git status, the output should look similar to:
On branch shiny-new-feature
modified: /relative/path/to/file-to-be-added-or-changed.py
Note that this project follows the Conventional Commits standard with the help of Commitizen. So, to commit using Commitizen, run:
cz commit
This command will guide you through the process of writing an appropriate commit message.
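For orientation, a Conventional Commits header has the shape `<type>[(scope)][!]: <description>`. The sketch below checks a header against a simplified version of that shape. It is not how Commitizen validates messages, and both the pattern and the name `parse_header` are assumptions made for illustration.

```python
import re

# Simplified pattern for a Conventional Commits header:
#   <type>[(scope)][!]: <description>
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)"           # e.g. feat, fix, docs
    r"(\((?P<scope>[^()\s]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>\S.*)$"    # colon, space, then the description
)


def parse_header(header):
    """Return the header's parts as a dict, or None if it does not match."""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None


print(parse_header("feat(api): add new accessor"))
print(parse_header("updated stuff"))
```

The second call returns None because the header has no type followed by a colon and a space.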
There are also pre-commit hooks set up to ensure that lint checks and code tests are run before committing, but, again, it is always helpful to run these yourself first before committing with:
poe test
and:
poe lint
Push your changes#
When you want your changes to appear publicly on your GitHub page, push your forked feature branch’s commits:
git push origin shiny-new-feature
Here, origin is the default name given to your remote repository on GitHub. You can list the remote repositories with:
git remote -v
If you added the upstream repository as described above you will see something like:
origin git@GitHub.com:your-user-name/geotech-pandas.git (fetch)
origin git@GitHub.com:your-user-name/geotech-pandas.git (push)
upstream https://github.com/fraserdominicdavid/geotech-pandas.git (fetch)
upstream https://github.com/fraserdominicdavid/geotech-pandas.git (push)
Now your code is on GitHub, but it is not yet a part of the geotech-pandas project. For that to happen, a PR needs to be submitted on GitHub.
Submit a pull request#
Once you’ve made changes and pushed them to your forked repository, you then submit a PR to have them integrated into the geotech-pandas code base.
You can find a PR tutorial in GitHub’s Help Docs.
Update your pull request#
Based on the review you get on your PR, you will probably need to make some changes to the code. You can follow the above steps again to address any feedback and update your PR.
It is also important that updates in the geotech-pandas main branch are reflected in your pull request. To update your feature branch with changes in the geotech-pandas main branch, run:
git checkout shiny-new-feature
git fetch upstream
git merge upstream/main
If there are no conflicts, or if they could be fixed automatically, a file with a default commit message will open, and you can simply save and quit this file.
If there are merge conflicts, you need to solve those conflicts. For more information, see Resolving a merge conflict using the command line.
Once the conflicts are resolved, run:
git add -u
to stage any files you’ve updated, and then run:
git commit
to finish the merge.
Note
If you have uncommitted changes at the moment you want to update the branch with main, you will need to stash them prior to updating. This stores your changes so that they can be reapplied after updating. For more information, see Stashing and Cleaning.
After the feature branch has been updated locally, you can update your PR by pushing to the branch on GitHub:
git push origin shiny-new-feature
Any git push will automatically update your PR with your branch’s changes and restart the CI checks.
Update your development environment#
It is important to periodically update your local main branch with updates from the geotech-pandas main branch and update your development environment to reflect any changes to the various packages that are used during development. To do that, run:
git checkout main
git fetch upstream
git merge upstream/main
If there are any updates to the dependencies, for instance, if the pyproject.toml and/or poetry.lock files are changed, then you must also rebuild your Dev Container:
GitHub Codespaces:
Run Ctrl/⌘ + ⇧ + P → Codespaces: Rebuild Container.
VS Code:
Run Ctrl/⌘ + ⇧ + P → Dev Containers: Rebuild Container.
PyCharm:
See PyCharm’s documentation about Docker Compose.
Terminal:
Open your fork with your terminal, and run:
docker compose up --build --detach dev
to start and rebuild a Dev Container in the background, and then run:
docker compose exec dev zsh
to open a shell prompt in the Dev Container.
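Whether a rebuild is needed after merging could be checked with a small script like the sketch below. The function names are hypothetical, and the sketch assumes that both dependency files sit at the repository root and that the script runs inside your clone right after the merge, when Git's ORIG_HEAD still points at the pre-merge commit.

```python
import subprocess

# Files named in this guide as requiring a Dev Container rebuild when changed.
DEPENDENCY_FILES = {"pyproject.toml", "poetry.lock"}


def needs_rebuild(changed_files):
    """Return True if any changed path is one of the dependency files."""
    return any(path in DEPENDENCY_FILES for path in changed_files)


def files_changed_by_last_merge():
    """List paths that differ between ORIG_HEAD and HEAD in the current clone."""
    output = subprocess.run(
        ["git", "diff", "--name-only", "ORIG_HEAD", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return output.splitlines()
```

After `git merge upstream/main`, calling `needs_rebuild(files_changed_by_last_merge())` would indicate whether the merge touched either file.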
Tips for a successful pull request#
Once you have submitted a PR, one of the core contributors may take a look. Please note, however, that only a handful of people are responsible for reviewing all of the contributions, which can often lead to bottlenecks.
To improve the chances of your PR being reviewed, you should:
Reference an open issue for non-trivial changes to clarify the PR’s purpose;
Ensure you have reasonable tests and present them in the PR;
Keep your PRs as simple as possible to make it easier to review the PR;
Ensure that the GHA status is green; otherwise, reviewers may not even take a look; and
Keep updating your PR, either by the request of a reviewer or every few days to keep up-to-date with the current geotech-pandas codebase.