Contribution Guide

(The contents of this guide are largely inspired by Pandas and GeoPandas)

Overview

Contributions to geotech-pandas are very welcome. They are likely to be accepted more quickly if they follow these guidelines.

At this stage of geotech-pandas development, the priorities are to define a simple, usable, and stable API and to have clean, maintainable, readable code. Performance matters, but not at the expense of those goals.

In general, geotech-pandas follows the conventions of the pandas project where applicable.

In particular, when submitting a pull request (PR):

  • All existing tests should pass. Please make sure that the test suite passes both locally and on GitHub Actions (GHA). The GHA status is visible on the PR. GHA is also enabled automatically on your own fork; to trigger a check there, make a PR to your own fork.

  • New functionality should include tests. Please write reasonable tests for your code and make sure that they pass on your PR.

  • Classes, methods, functions, etc. should have docstrings. The first line of a docstring should be a standalone summary. Parameters and return values should be documented explicitly.

  • Follow PEP8 when possible. We use Ruff to ensure a consistent code format throughout the project.

  • Follow the Conventional Commits standard when writing commits. We use Commitizen to ensure consistent commit messages, which are then used in automated semantic versioning and changelog generation.

  • The geotech-pandas project supports Python 3.9+ only, usually in sync with the supported versions of Pandas.
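To illustrate the docstring guideline in the list above, here is a minimal sketch of a function documented with a standalone summary line and explicit parameter and return sections. The function `layer_thickness` and its parameters are hypothetical and used only for illustration; they are not part of the geotech-pandas API.

```python
def layer_thickness(top: float, bottom: float) -> float:
    """Return the thickness of a soil layer.

    Parameters
    ----------
    top : float
        Depth to the top of the layer, in meters.
    bottom : float
        Depth to the bottom of the layer, in meters.

    Returns
    -------
    float
        The layer thickness, in meters.

    Raises
    ------
    ValueError
        If ``bottom`` is less than ``top``.
    """
    # Hypothetical helper for illustration only; not part of geotech-pandas.
    if bottom < top:
        raise ValueError("bottom must be greater than or equal to top")
    return bottom - top
```

The first line stands on its own as a summary, and each parameter, return value, and raised exception gets its own entry.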

Bug reports and feature requests

Bug reports and enhancement requests are an important part of making geotech-pandas more stable. These are curated through GitHub issues. When reporting an issue or request, please select the appropriate category and fill out the issue form fully so that others and the core development team can understand the scope of the issue.

The issue will then show up in GitHub issues and be open to comments/ideas from others.

Finding an issue to contribute to

If you are brand new to geotech-pandas or open-source development, we recommend searching the GitHub “issues” tab to find issues that interest you. Unassigned issues labeled documentation and good first issue are typically good for newer contributors.

Once you’ve found an interesting issue, it’s a good idea to assign the issue to yourself, so nobody else duplicates the work on it.

If for whatever reason you are not able to continue working on the issue, please unassign it, so other people know it’s available again. You can also check the list of assigned issues, since people may not be working on them anymore. If you want to work on one that is assigned, feel free to kindly ask the current assignee if you can take it. Please allow at least a week of inactivity before considering work on the issue discontinued.

General contribution workflow

Once you have chosen an issue to work on, follow the basic steps below to contribute to geotech-pandas:

  1. Fork the git repository

  2. Set up a development environment

  3. Create a feature branch

  4. Make changes to code

  5. Add reasonable tests

  6. Lint and format code

  7. Update the documentation

  8. Commit changes

  9. Push your changes

  10. Submit a pull request

  11. Update your pull request

  12. Update your development environment

After reading through the Recommended prerequisites, follow the sections below, which detail each of these steps.

Fork the git repository

You will need your own fork to work on the code. Go to the geotech-pandas project page and hit the Fork button.

Clone your fork

To clone your fork to your local machine:

git clone git@github.com:your-user-name/geotech-pandas.git geotech-pandas-yourname
cd geotech-pandas-yourname
git remote add upstream https://github.com/fraserdominicdavid/geotech-pandas.git
git fetch upstream

This creates the directory geotech-pandas-yourname and connects your repository to the upstream (main project) geotech-pandas repository.

Set up a development environment

Thanks to Dev Containers, it is now easier to set up a consistent environment for the development of this project. The following development environments are supported:

  1. GitHub Codespaces:

    From your GitHub fork, click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.

  2. VS Code:

    Clone your fork, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.

  3. PyCharm:

    Clone your fork, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.

  4. Terminal:

    Clone your fork, open it with your terminal, and run:

    docker compose up --detach dev
    

    to start a Dev Container in the background, and then run:

    docker compose exec dev zsh
    

    to open a shell prompt in the Dev Container.

Create a feature branch

Your local main branch should always reflect the current state of the geotech-pandas repository. First ensure that it’s up-to-date with the main repository:

git checkout main
git pull upstream main --ff-only

Then, create a feature branch for making your changes. For example:

git checkout -b shiny-new-feature

This changes your working branch from main to shiny-new-feature. Keep any changes in this branch specific to one bug or feature so it is clear what the branch brings to geotech-pandas. You can have many feature branches and switch between them using the git checkout command.

To update this branch, you need to retrieve the changes from the main branch:

git fetch upstream
git rebase upstream/main

This will replay your commits on top of the latest geotech-pandas git main. If this leads to merge conflicts, you must resolve these before submitting your PR. If you have uncommitted changes, you will need to git stash them prior to updating. This will effectively store your changes and they can be reapplied after updating.

Make changes to code

The geotech-pandas project is serious about testing and strongly encourages contributors to embrace test-driven development (TDD). This development process “relies on the repetition of a very short development cycle: first the developer writes an, initially failing, automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test.” So, before actually writing any code, you should write your tests. Often the test can be taken from the original GitHub issue. However, it is always worth considering additional use cases and writing corresponding tests.
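As a minimal sketch of that cycle, the example below writes the test first and only then adds the smallest implementation that satisfies it. The function `midpoint_depth` is a hypothetical name chosen for illustration and is not part of geotech-pandas.

```python
# Step 1: write the test first. Until midpoint_depth is defined, calling this
# test fails with a NameError, which is the expected initial failure.
def test_midpoint_depth():
    assert midpoint_depth(top=0.0, bottom=1.5) == 0.75
    assert midpoint_depth(top=2.0, bottom=2.0) == 2.0


# Step 2: write the minimum amount of code that makes the test pass.
# Hypothetical helper for illustration only; not part of geotech-pandas.
def midpoint_depth(top: float, bottom: float) -> float:
    """Return the depth halfway between ``top`` and ``bottom``."""
    return (top + bottom) / 2


# Step 3: run the test again; it now passes.
test_midpoint_depth()
```

In the real project, the test would live in the tests directory and be collected by pytest rather than called by hand.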

To see all the changes you’ve currently made, run:

git status

Add reasonable tests

Adding tests is one of the most common requests after code is pushed to geotech-pandas. Therefore, it is worth getting in the habit of writing tests ahead of time so this is never an issue.

The pytest testing system and convenient extensions in the pandas._testing module are used in geotech-pandas.

All tests should go into the tests directory. This folder contains many current examples of tests, and we suggest looking to these for inspiration.

The pandas._testing module has some special assert functions that make it easier to make statements about whether Series or DataFrame objects are equivalent. The easiest way to verify that your code is correct is to explicitly construct the result you expect, then compare the actual result to the expected correct result, using the appropriate assert functions from pandas._testing.
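The construct-and-compare pattern described above might look like the following sketch. The helper `add_thickness` and the column names `top`, `bottom`, and `thickness` are assumptions made for this example; they do not describe the actual geotech-pandas API.

```python
import pandas as pd
import pandas._testing as tm


def add_thickness(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with an added ``thickness`` column."""
    # Hypothetical helper for illustration only; not part of geotech-pandas.
    result = df.copy()
    result["thickness"] = result["bottom"] - result["top"]
    return result


def test_add_thickness():
    df = pd.DataFrame({"top": [0.0, 1.0], "bottom": [1.0, 2.5]})

    # Explicitly construct the result you expect.
    expected = pd.DataFrame(
        {"top": [0.0, 1.0], "bottom": [1.0, 2.5], "thickness": [1.0, 1.5]}
    )

    # Compare the actual result with the expected one.
    result = add_thickness(df)
    tm.assert_frame_equal(result, expected)
```

The same assert function is also available publicly as `pandas.testing.assert_frame_equal`, which may be preferable if you want to avoid importing a private module.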

The tests can then be run directly inside your Git clone by typing:

poe test

Lint and format code

The PEP8 standard is followed in geotech-pandas with the help of Ruff to ensure a consistent code format throughout the project.

Continuous Integration (CI) with GHA runs lint checks and reports any stylistic errors in your code. It is therefore helpful to run the checks yourself before submitting code:

poe lint

This command also auto-formats your code. Additionally, the Dev Container supplied in this project applies these tools as you edit files.

Update the documentation

The geotech-pandas documentation resides in the docs folder of the repository. Changes to the documentation are made by modifying the appropriate file in that folder. The documentation uses the reStructuredText syntax rendered using Sphinx. For more information, see reStructuredText Primer.

Docstrings, on the other hand, follow the NumPy Docstring Standard.

We highly encourage you to follow the Google developer documentation style guide when updating or creating new documentation.

Once you have made your changes, you can check that they render correctly by building the docs with Sphinx. To do so, run:

poe docs

The resulting HTML pages will be located in docs/_build/html.

If you wish to render a “clean” build, run:

poe docs -O "-E -a"

This ensures that Sphinx rebuilds and saves all output files from scratch.

A “cleaner” build can also be done by removing the _build and api-reference/api folders first before building:

rm -r docs/api-reference/api docs/_build
poe docs

Commit changes

After making your changes, adding tests, and linting and formatting the code, you can stage your changes using:

git add path/to/files-to-be-added-or-changed.py

If you then run git status, the output should look like:

On branch shiny-new-feature

     modified:   /relative/path/to/file-to-be-added-or-changed.py

Note that this project follows the Conventional Commits standard with the help of Commitizen. So, to commit using Commitizen, run:

cz commit

This command will guide you through the process of writing an appropriate commit message.
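For orientation, a Conventional Commits header has the shape `<type>[optional scope][optional !]: <description>`. The sketch below checks a header against a simplified version of that shape. The specification itself defines `feat` and `fix` and permits other types; the remaining types listed here are an assumption of this sketch, and the pattern is not the one Commitizen uses internally.

```python
import re

# Simplified pattern for a Conventional Commits header:
#   <type>[optional scope][optional !]: <description>
# The list of types is an assumption made for this sketch.
HEADER_PATTERN = re.compile(
    r"^(?P<type>build|chore|ci|docs|feat|fix|perf|refactor|revert|style|test)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def is_conventional_header(header: str) -> bool:
    """Return True if ``header`` matches the simplified pattern above."""
    return HEADER_PATTERN.match(header) is not None


print(is_conventional_header("feat(docs): add contribution guide"))  # True
print(is_conventional_header("fix!: drop deprecated option"))  # True
print(is_conventional_header("Added some stuff"))  # False
```

In practice, `cz commit` builds the message for you, so a check like this is only useful for understanding the format.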

There are also pre-commit hooks set up to ensure that lint checks and code tests are run before committing, but, again, it is always helpful to run these yourself first before committing with:

poe test

and:

poe lint

Push your changes

When you want your changes to appear publicly on your GitHub page, push your forked feature branch’s commits:

git push origin shiny-new-feature

Here origin is the default name given to your remote repository on GitHub. You can list the remote repositories with:

git remote -v

If you added the upstream repository as described above you will see something like:

origin  git@github.com:your-user-name/geotech-pandas.git (fetch)
origin  git@github.com:your-user-name/geotech-pandas.git (push)
upstream        https://github.com/fraserdominicdavid/geotech-pandas.git (fetch)
upstream        https://github.com/fraserdominicdavid/geotech-pandas.git (push)

Now your code is on GitHub, but it is not yet a part of the geotech-pandas project. For that to happen, a PR needs to be submitted on GitHub.

Submit a pull request

Once you’ve made changes and pushed them to your forked repository, you then submit a PR to have them integrated into the geotech-pandas code base.

You can find a PR tutorial in GitHub’s Help Docs.

Update your pull request

Based on the review you get on your PR, you will probably need to make some changes to the code. You can follow the above steps again to address any feedback and update your PR.

It is also important that updates in the geotech-pandas main branch are reflected in your pull request. To update your feature branch with changes in the geotech-pandas main branch, run:

git checkout shiny-new-feature
git fetch upstream
git merge upstream/main

If there are no conflicts, or if they could be fixed automatically, a file with a default commit message will open, and you can simply save and quit this file.

If there are merge conflicts, you need to resolve them. For more information, see Resolving a merge conflict using the command line.

Once the conflicts are resolved, run:

  1. git add -u to stage any files you’ve updated;

  2. git commit to finish the merge.

Note

If you have uncommitted changes at the moment you want to update the branch with main, you will need to stash them prior to updating. This will effectively store your changes and they can be reapplied after updating. For more information, see Stashing and Cleaning.

After the feature branch has been updated locally, you can update your PR by pushing to the branch on GitHub:

git push origin shiny-new-feature

Any git push will automatically update your PR with your branch’s changes and restart the CI checks.

Update your development environment

It is important to periodically update your local main branch with updates from the geotech-pandas main branch and update your development environment to reflect any changes to the various packages that are used during development. To do that, run:

git checkout main
git fetch upstream
git merge upstream/main

If there are any updates to the dependencies, for instance, if the pyproject.toml and/or poetry.lock files are changed, then you must also rebuild your Dev Container:

  1. GitHub Codespaces:

    Run Ctrl/⌘ + ⇧ + P → Codespaces: Rebuild Container.

  2. VS Code:

    Run Ctrl/⌘ + ⇧ + P → Dev Containers: Rebuild Container.

  3. PyCharm:

    See PyCharm’s documentation about Docker Compose.

  4. Terminal:

    Open your fork with your terminal, and run:

    docker compose up --build --detach dev
    

    to start and rebuild a Dev Container in the background, and then run:

    docker compose exec dev zsh
    

    to open a shell prompt in the Dev Container.

Tips for a successful pull request

Once you have submitted a PR, one of the core contributors may take a look. Please note, however, that only a handful of people are responsible for reviewing all of the contributions, which can often lead to bottlenecks.

To improve the chances of your PR being reviewed, you should:

  • Reference an open issue for non-trivial changes to clarify the PR’s purpose;

  • Ensure you have reasonable tests and present them in the PR;

  • Keep your PRs as simple as possible to make them easier to review;

  • Ensure that the GHA status is green; otherwise, reviewers may not even take a look; and

  • Keep updating your PR, either at the request of a reviewer or every few days, to stay up-to-date with the current geotech-pandas codebase.