A Beginner’s Guide to CI/CD for ML Models (GitHub Actions + Docker + Kubernetes)

How to automate testing, container builds, and Kubernetes deployment for machine learning inference services

Posted by Perivitta on January 10, 2026 · 20 mins read

Deploying a machine learning model is very different from training it. Training usually happens in a notebook or a local script, but deployment requires an engineering workflow that ensures the model is stable, testable, scalable, and reproducible.

In real production environments, ML models are not deployed once. They are deployed repeatedly, because:

  • new datasets are collected.
  • feature engineering logic changes.
  • hyperparameters are tuned.
  • models are retrained periodically.
  • dependencies are upgraded.
  • bugs are fixed in the inference service.

Without a CI/CD pipeline, ML deployment becomes manual and error-prone. The most common result is inconsistent deployments, broken environments, and confusion about which model version is running in production.

This blog post provides a beginner-friendly but detailed step-by-step guide to implementing CI/CD for ML models using:

  • GitHub Actions for CI/CD automation.
  • Docker for packaging the inference service.
  • Kubernetes for scalable deployments.

1. What CI/CD Means in Machine Learning

CI/CD stands for Continuous Integration and Continuous Deployment. In normal software projects, CI/CD ensures code changes are tested and deployed automatically. In machine learning projects, the concept is similar, but it includes additional ML components such as model artifacts and preprocessing pipelines.

Continuous Integration (CI)

CI ensures every push to the repository is validated automatically. A good ML CI pipeline typically checks:

  • Python dependency installation.
  • unit tests for the inference code (see the pytest sketch after this list).
  • model file existence and successful loading.
  • basic prediction sanity tests.
  • optional performance validation (accuracy threshold).
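
As a concrete example, here is a minimal pytest sketch covering the unit-test and sanity-check items. It assumes the FastAPI inference app from Section 5 lives in app/main.py, that pytest and httpx are added as test dependencies, and that the tests are run from the project root so models/model.pkl resolves:

# tests/test_api.py
from fastapi.testclient import TestClient

from app.main import app  # the inference API defined later in this guide

client = TestClient(app)

def test_health_check():
    # the root endpoint should report a healthy status
    response = client.get("/")
    assert response.status_code == 200
    assert response.json()["status"] == "ok"

def test_predict_returns_a_prediction():
    # basic prediction sanity test with a dummy feature vector
    response = client.post("/predict", json={"features": [50000, 35, 720]})
    assert response.status_code == 200
    assert "prediction" in response.json()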

Continuous Deployment (CD)

CD automates deployment after CI passes. A standard ML CD pipeline typically:

  • builds a Docker image.
  • pushes the Docker image to a container registry.
  • deploys the image to Kubernetes.
  • performs rolling updates with minimal downtime.

2. Why ML CI/CD Is More Complex Than Software CI/CD

In normal software, deployment artifacts are mostly code. In ML, deployment artifacts include:

  • model weights (e.g., model.pkl, model.pt, model.onnx).
  • feature engineering / preprocessing logic.
  • training configuration.
  • dependency versions (NumPy, scikit-learn, PyTorch, etc.).
  • hardware assumptions (CPU vs GPU environments).

A CI/CD pipeline ensures these artifacts are deployed consistently. This is a major part of modern MLOps (Machine Learning Operations).


3. Target Architecture

The pipeline we want to implement follows a standard modern architecture:

  • GitHub Repository: stores the inference code, model artifact, Dockerfile, and Kubernetes manifests.
  • GitHub Actions: runs CI tests, builds the Docker image, and deploys to Kubernetes.
  • Docker: packages code, dependencies, and the model into a portable container.
  • Container Registry (GHCR): stores the built Docker images.
  • Kubernetes: runs the inference service at scale and supports rolling updates.

The high-level deployment workflow is:

  1. Push changes to GitHub.
  2. GitHub Actions runs CI tests.
  3. Docker image is built.
  4. Docker image is pushed to registry.
  5. Kubernetes deployment is updated automatically.

4. Example Project Structure

A clean project structure makes automation easier. A recommended structure is:

ml-cicd-project/
├── app/
│   └── main.py
├── models/
│   └── model.pkl
├── requirements.txt
├── Dockerfile
├── k8s/
│   ├── deployment.yaml
│   └── service.yaml
└── .github/
    └── workflows/
        └── cicd.yaml

This structure separates:

  • app/: inference API code.
  • models/: trained model artifact.
  • k8s/: Kubernetes deployment configuration.
  • .github/workflows/: GitHub Actions pipeline definition.

5. Building an Inference API (FastAPI Example)

In most real ML deployments, the model is wrapped in a web API. A common approach is to use FastAPI because it is lightweight, fast, and supports automatic API documentation.

Inference API code

from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI(title="ML Inference API")

# Load model at startup
model = joblib.load("models/model.pkl")

@app.get("/")
def health_check():
    return {"status": "ok", "message": "ML API is running"}

@app.post("/predict")
def predict(payload: dict):
    # Expected payload format:
    # {"features": [feature1, feature2, feature3]}
    features = payload["features"]
    X = np.array(features).reshape(1, -1)

    prediction = model.predict(X)

    return {"prediction": prediction.tolist()}

The API expects a JSON request body like:

{
  "features": [50000, 35, 720]
}
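
To try the service before containerizing it, you can start uvicorn directly from the project root (so that models/model.pkl resolves correctly):

pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000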

FastAPI also provides built-in Swagger documentation at:

http://localhost:8000/docs

6. Dockerizing the ML Model Service

Docker solves one major issue in ML deployment: environment reproducibility. Instead of manually installing dependencies on a server, Docker ensures the same environment runs everywhere.

requirements.txt

fastapi
uvicorn
numpy
joblib
scikit-learn
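
For stricter reproducibility, it is worth pinning exact versions. The numbers below are purely illustrative; running pip freeze in your working environment gives the real ones:

fastapi==0.110.0
uvicorn==0.29.0
numpy==1.26.4
joblib==1.3.2
scikit-learn==1.4.2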

Dockerfile

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app/ app/
COPY models/ models/

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

This Dockerfile performs the following:

  • Uses a minimal Python base image.
  • Installs dependencies.
  • Copies inference code and model file.
  • Starts the API server using uvicorn.

Local Docker testing

docker build -t ml-api .
docker run -p 8000:8000 ml-api

If the container runs successfully, you can test the API endpoint:

curl http://localhost:8000/
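
You can also send a prediction request to the running container:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"features": [50000, 35, 720]}'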

7. Deploying the Container on Kubernetes

Docker solves packaging, but Kubernetes solves deployment management. Kubernetes is designed for running containers at scale, providing:

  • replication (multiple pods).
  • load balancing.
  • self-healing (restart crashed pods).
  • rolling updates (deploy new versions gradually).

Kubernetes Deployment YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: ml-api
          image: ghcr.io/YOUR_USERNAME/ml-api:latest
          ports:
            - containerPort: 8000

Explanation:

  • replicas: 2 ensures two instances of the API are running.
  • image defines which Docker image Kubernetes should pull.
  • containerPort specifies the port used inside the container.
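
One practical note: if the GHCR image is private, the cluster also needs credentials to pull it. A common approach (the secret name below is only an example) is to create a docker-registry secret and reference it from the pod spec:

kubectl create secret docker-registry ghcr-pull-secret \
  --docker-server=ghcr.io \
  --docker-username=YOUR_USERNAME \
  --docker-password=YOUR_TOKEN

Then reference it under spec.template.spec in deployment.yaml:

      imagePullSecrets:
        - name: ghcr-pull-secret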

Kubernetes Service YAML

apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer

The Service exposes the pods behind a stable endpoint. In cloud environments, LoadBalancer will provide a public IP.
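
Before automating anything, it is worth applying the manifests once by hand and confirming that the pods come up:

kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
kubectl get pods
kubectl get service ml-api-service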


8. GitHub Actions CI/CD Pipeline Setup

GitHub Actions allows us to automate the pipeline so deployment happens automatically on every push to the main branch.

Create a workflow file:

.github/workflows/cicd.yaml

GitHub Actions Workflow

name: CI/CD for ML Model Deployment

on:
  push:
    branches:
      - main

jobs:
  build-test-deploy:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Checkout repository
      - name: Checkout code
        uses: actions/checkout@v4

      # Step 2: Setup Python
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # Step 3: Install dependencies
      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      # Step 4: Validate model artifact exists and loads correctly
      - name: Validate model artifact
        run: |
          python -c "import joblib; joblib.load('models/model.pkl')"

      # Step 5: Login to GitHub Container Registry (GHCR)
      - name: Login to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin

      # Step 6: Build Docker image
      - name: Build Docker image
        run: |
          docker build -t ghcr.io/${{ github.repository_owner }}/ml-api:latest .

      # Step 7: Push Docker image
      - name: Push Docker image
        run: |
          docker push ghcr.io/${{ github.repository_owner }}/ml-api:latest

      # Step 8: Install kubectl
      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "latest"

      # Step 9: Configure kubeconfig (Kubernetes access)
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBECONFIG_DATA }}" | base64 --decode > $HOME/.kube/config

      # Step 10: Deploy to Kubernetes
      - name: Deploy to Kubernetes
        run: |
          kubectl apply -f k8s/deployment.yaml
          kubectl apply -f k8s/service.yaml

This pipeline automatically performs:

  • dependency installation.
  • model loading validation.
  • Docker build + push.
  • Kubernetes deployment update.

9. Setting Up Kubernetes Authentication (KUBECONFIG)

GitHub Actions cannot access your Kubernetes cluster unless you provide authentication credentials. Kubernetes access is typically controlled using a kubeconfig file.

On your local machine, your kubeconfig is usually stored at:

~/.kube/config

Convert it into a base64 string:

cat ~/.kube/config | base64

Then store it in GitHub repository secrets:

  • KUBECONFIG_DATA → paste the base64 output

In the GitHub Actions workflow, it is decoded back into a kubeconfig file so that kubectl works.
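
An optional but useful sanity check is a workflow step, placed right after the kubeconfig is decoded, that confirms the runner can actually reach the cluster:

      - name: Verify cluster access
        run: kubectl get nodes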


10. Best Practice: Use Image Versioning (Avoid "latest")

Using the latest tag is not recommended for real production deployments. It becomes difficult to track which model version is running.

A better strategy is tagging images using the Git commit hash:

${{ github.sha }}

Improved Docker build step

- name: Build and push Docker image with SHA tag
  run: |
    IMAGE_TAG=${{ github.sha }}
    docker build -t ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG .
    docker push ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG

After building the image, update Kubernetes dynamically:

- name: Update Kubernetes deployment image
  run: |
    IMAGE_TAG=${{ github.sha }}
    kubectl set image deployment/ml-api ml-api=ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG

This ensures:

  • every deployment is traceable.
  • rollback is easier.
  • you can identify exactly which commit is in production.

11. Adding Readiness and Liveness Probes

Kubernetes supports health checks to automatically restart broken pods. ML services may crash due to corrupted model files, memory issues, or unexpected requests.

Add probes to your deployment configuration:

readinessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 20

Explanation:

  • Readiness probe ensures the service only receives traffic after it is ready.
  • Liveness probe ensures Kubernetes restarts the container if it becomes unresponsive.

12. ML-Specific CI Validation (Recommended)

In ML deployment, a pipeline should validate not only code correctness but also basic model validity. Otherwise, a broken or low-quality model can still pass CI.

A minimal validation step can include:

  • check that the model loads successfully.
  • run a dummy prediction.
  • ensure output shape is correct.

Example validation script

# validate_model.py
import joblib
import numpy as np

model = joblib.load("models/model.pkl")

dummy_input = np.array([[50000, 35, 720]])
prediction = model.predict(dummy_input)

# one input row should yield exactly one prediction
assert prediction.shape[0] == 1, "Unexpected prediction output shape"

print("Prediction output:", prediction)

Then add to GitHub Actions:

- name: Run model validation
  run: |
    python validate_model.py

In real pipelines, you can extend validation to enforce accuracy thresholds:

if accuracy < 0.85:
    raise Exception("Model performance too low. Deployment blocked.")
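
A fuller version of that gate might look like the sketch below. It assumes pandas is available and that a held-out test set is stored at a hypothetical path such as data/test.csv, with the label in a target column:

# validate_accuracy.py (illustrative quality gate)
import sys

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

THRESHOLD = 0.85  # minimum acceptable accuracy

model = joblib.load("models/model.pkl")

test_df = pd.read_csv("data/test.csv")  # hypothetical held-out test set
X_test = test_df.drop(columns=["target"])
y_test = test_df["target"]

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")

if accuracy < THRESHOLD:
    print("Model performance too low. Deployment blocked.")
    sys.exit(1)  # a non-zero exit code fails the CI job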

13. Rollback Strategy in Kubernetes

A strong reason for using Kubernetes is rollback capability. If a newly deployed model version causes failures, you can revert quickly.

Check rollout status

kubectl rollout status deployment/ml-api
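
List available revisions (kubectl keeps a rollout history for each deployment)

kubectl rollout history deployment/ml-api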

Rollback to previous version

kubectl rollout undo deployment/ml-api

This is significantly safer than manually deploying containers on a VM.


14. Summary: What This CI/CD Pipeline Achieves

After implementing GitHub Actions + Docker + Kubernetes, you achieve:

  • automatic validation of model artifacts.
  • reproducible inference environments.
  • automated container builds and publishing.
  • automated Kubernetes deployments.
  • scalable inference services using replicas.
  • safe rolling updates and easy rollback.

This pipeline represents a strong foundation for real-world ML deployment workflows and is a practical first step into MLOps.


15. Next Improvements for Production-Level MLOps

This CI/CD workflow can be improved further using advanced tools:

  • MLflow Model Registry for managing model versions and approvals.
  • ArgoCD GitOps for Kubernetes deployment automation.
  • Canary deployments to deploy new models to a small percentage of traffic first.
  • Monitoring using Prometheus and Grafana.
  • Data drift detection to identify when model performance degrades over time.

These additions help build a complete ML production lifecycle system.


Final Thoughts

CI/CD is a standard practice in software engineering, and machine learning systems should follow the same discipline. A trained model is not enough: it must be packaged, tested, deployed, versioned, and monitored.

Using GitHub Actions, Docker, and Kubernetes provides a scalable and maintainable way to deploy machine learning models, enabling teams to ship updates faster while reducing deployment risks.

Once this foundation is implemented, teams can focus on improving model performance and reliability rather than manually deploying artifacts.

