Day 42 of 50 Days of Python: Setting Up CI/CD for Python Data Projects
Part of Week 6: Advanced Topics
Welcome to the final day of Week 6: Advanced Topics. I’ve decided to end this week with CI/CD because it leads nicely into the final week of learning, “Python in Production”. Naturally, at this point the subject becomes more than just Python — it pulls in the other pieces of the puzzle:
Infrastructure as Code (IaC)
Git
Docker
YAML files
What Is CI/CD?
Continuous Integration (CI) is the practice of automatically building and testing every change pushed to your shared repository. It catches integration errors early and keeps the main branch in a releasable state.
Continuous Delivery (CD) extends CI by packaging the application (e.g., Docker image, Python wheel) and pushing it to an artifact registry or staging environment so that it can be deployed with a single click. Continuous Deployment goes one step further by automatically promoting those artifacts all the way to production once they pass the pipeline.
Applied to data projects, CI/CD means that ETL code, notebooks, model training scripts, and configuration are version‑controlled, unit‑tested, and released through the same automated pipeline. This ensures reproducibility, faster feedback loops, and safer rollbacks.
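To make that concrete, here is a minimal sketch of the kind of unit test a CI pipeline runs on every push. `clean_orders` is a hypothetical ETL transform invented for illustration — in a real project this would live in `src/` with the test in `tests/`:

```python
def clean_orders(rows):
    """Drop records with a missing order_id and lower-case the keys.

    A toy ETL transform: rows is a list of dicts, as you might get
    from csv.DictReader or an API response.
    """
    cleaned = []
    for row in rows:
        # Normalise keys so downstream code can rely on one naming style
        row = {key.strip().lower(): value for key, value in row.items()}
        if row.get("order_id") is not None:
            cleaned.append(row)
    return cleaned


def test_clean_orders_drops_missing_ids():
    raw = [
        {"Order_ID": 1, "Amount": 10.0},
        {"Order_ID": None, "Amount": 5.0},  # should be dropped
    ]
    assert clean_orders(raw) == [{"order_id": 1, "amount": 10.0}]
```

Because the transform is pure Python with no I/O, pytest can run it in seconds on every push — exactly the fast feedback loop CI is meant to provide.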
What We’ll Cover
The CI/CD lifecycle: trigger → build → test → package → deploy → notify.
Best‑practice folder layout & pyproject.toml for data projects.
Writing a reusable GitHub Actions workflow (.github/workflows/ci.yml).
Adding coverage, caching pip wheels, and matrix jobs (Python 3.9‑3.12).
Deploying Docker images to GitHub Container Registry or AWS ECR.
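Before the pipeline itself, it helps to fix the repository layout the workflows below assume. One common convention for data projects looks like this (the package and file names are illustrative):

```
data-project/
├── pyproject.toml        # project metadata + tool config (ruff, pytest)
├── requirements.txt      # runtime dependencies
├── dev-requirements.txt  # test/lint dependencies (pytest, ruff, coverage)
├── Dockerfile
├── src/
│   └── data_project/
│       ├── __init__.py
│       └── etl.py
├── tests/
│   └── test_etl.py
└── .github/
    └── workflows/
        └── ci.yml
```

The src/ layout keeps the package importable only after installation, which means your tests exercise the installed code the same way production will.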
Key Concepts
→ Trigger: When a pipeline runs
→ Python Matrix: Essentially testing across different versions of Python
→ Cache Key: For speeding up installs
→ Fail Fast: Cancel the remaining matrix jobs as soon as one fails (set fail-fast: false to let them all finish)
→ Environment: Where you’re going to deploy (dev, test and prod)
→ Secrets: Credentials for registries, environments and cloud resources.
→ Approval Gate: Fancy way of getting human approval before prod deployment
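As a quick illustration of that last concept, in GitHub Actions an approval gate falls out of the environment feature: if the environment named in a job has required reviewers configured in the repository settings, the job pauses until a human approves it. A hypothetical sketch (job names and URL are placeholders):

```yaml
deploy-prod:
  needs: docker-deploy
  runs-on: ubuntu-latest
  environment:
    name: production        # reviewers approve before this job starts
    url: https://example.com
  steps:
    - run: echo "Deploying after manual approval"
```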
Hands On: GitHub Actions Pipeline
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: "0 3 * * 1"  # weekly run, Monday 03:00 UTC

jobs:
  build-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # keep the other matrix jobs running if one fails
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"  # built-in cache for downloaded wheels
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r dev-requirements.txt
      - name: Lint with Ruff
        run: ruff check src tests
      - name: Run unit tests
        run: pytest -q --cov=src --cov-report=xml
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          token: ${{ secrets.CODECOV_TOKEN }}  # required for most repos on v4

  docker-deploy:
    needs: build-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main' && success()
    permissions:
      contents: read
      packages: write  # lets GITHUB_TOKEN push to GHCR
    steps:
      - uses: actions/checkout@v4
      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build & push image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
      - name: Deploy to staging via SSH
        uses: appleboy/ssh-action@v1
        with:
          host: ${{ secrets.STAGING_HOST }}
          username: ubuntu
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            docker pull ghcr.io/${{ github.repository }}:latest
            docker compose -f /srv/compose/data-api.yml up -d
      - name: Notify Slack
        uses: slackapi/slack-github-action@v1
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}  # incoming-webhook URL
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
        with:
          payload: |
            {"text":"Data API deployed to *staging* (<${{ github.server_url }}/${{ github.repository }}|view build>)"}
Run the workflow by pushing to main or opening a pull request. You’ll see matrix jobs for each Python version, followed by a single deploy job.
GitLab CI Alternative
stages: [lint, test, deploy]

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

lint:
  stage: lint
  image: python:3.11
  script:
    - pip install ruff
    - ruff check src tests

pytest:
  stage: test
  image: python:3.11
  script:
    - pip install -r requirements.txt -r dev-requirements.txt
    - pytest -q

docker-build:
  stage: deploy
  image: docker:latest
  services: [docker:dind]
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest
  environment:
    name: staging
    url: https://staging.example.com
  only: [main]
GitLab’s built‑in Container Registry stores the image; an Environment panel tracks deployments.
TL;DR
CI: GitHub Actions runs lint → tests → coverage on every push (Python 3.9‑3.12 matrix).
CD: On main, builds and pushes Docker image to GHCR, then deploys to a staging server via SSH.
Secrets live in repo settings; use encrypted variables & short‑lived tokens.
Alternatives: replicate the flow in GitLab CI.
Next Up: Day 43 - Logging and Debugging Best Practices.
I’ll take you through logging and efficient debugging techniques, which are the backbone of production-ready pipelines and code.
See you for the next one and as always… Happy coding!