CI/CD for ML Models using GitHub Actions, Docker, and Kubernetes
Deploying a machine learning model is very different from training it. Training usually happens in a notebook or a local script, but deployment requires an engineering workflow that ensures the model is stable, testable, scalable, and reproducible.
In real production environments, ML models are not deployed once. They are deployed repeatedly, because:
- new datasets are collected.
- feature engineering logic changes.
- hyperparameters are tuned.
- models are retrained periodically.
- dependencies are upgraded.
- bugs are fixed in the inference service.
Without a CI/CD pipeline, ML deployment becomes manual and error-prone. The most common result is inconsistent deployments, broken environments, and confusion about which model version is running in production.
This blog post provides a beginner-friendly but detailed step-by-step guide to implementing CI/CD for ML models using:
- GitHub Actions for CI/CD automation.
- Docker for packaging the inference service.
- Kubernetes for scalable deployments.
1. What CI/CD Means in Machine Learning
CI/CD stands for Continuous Integration and Continuous Deployment. In normal software projects, CI/CD ensures code changes are tested and deployed automatically. In machine learning projects, the concept is similar, but it includes additional ML components such as model artifacts and preprocessing pipelines.
Continuous Integration (CI)
CI ensures every push to the repository is validated automatically. A good ML CI pipeline typically checks:
- Python dependency installation.
- unit tests for inference code (see the pytest sketch after this list).
- model file existence and successful loading.
- basic prediction sanity tests.
- optional performance validation (accuracy threshold).
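A minimal sketch of such a test, assuming pytest is added to the CI dependencies and that the model in models/model.pkl accepts three numeric features (the same shape used in the example payload later in this post):

# tests/test_model.py -- hypothetical location; run with `pytest` in CI
import os
import joblib
import numpy as np

def test_model_file_exists():
    # CI should fail fast if the artifact is missing from the repository
    assert os.path.exists("models/model.pkl")

def test_model_loads_and_predicts():
    model = joblib.load("models/model.pkl")
    # Dummy input matching the assumed feature count (three features)
    dummy = np.array([[50000, 35, 720]])
    prediction = model.predict(dummy)
    # Basic sanity check: one prediction per input row
    assert len(prediction) == 1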
Continuous Deployment (CD)
CD automates deployment after CI passes. A standard ML CD pipeline typically:
- builds a Docker image.
- pushes the Docker image to a container registry.
- deploys the image to Kubernetes.
- performs rolling updates with minimal downtime.
2. Why ML CI/CD Is More Complex Than Software CI/CD
In normal software, deployment artifacts are mostly code. In ML, deployment artifacts include:
- model weights (e.g., model.pkl, model.pt, model.onnx).
- feature engineering / preprocessing logic.
- training configuration.
- dependency versions (NumPy, scikit-learn, PyTorch, etc.).
- hardware assumptions (CPU vs GPU environments).
A CI/CD pipeline ensures these artifacts are deployed consistently. This is a major part of modern MLOps (Machine Learning Operations).
3. Target Architecture
The pipeline we want to implement follows a standard modern architecture:
| Component | Purpose |
|---|---|
| GitHub Repository | Stores inference code, model artifact, Dockerfile, Kubernetes manifests |
| GitHub Actions | Runs CI tests, builds Docker image, deploys to Kubernetes |
| Docker | Packages code + dependencies + model into a portable container |
| Container Registry (GHCR) | Stores built Docker images |
| Kubernetes | Runs inference service at scale and supports rolling updates |
The high-level deployment workflow is:
- Push changes to GitHub.
- GitHub Actions runs CI tests.
- Docker image is built.
- Docker image is pushed to registry.
- Kubernetes deployment is updated automatically.
4. Example Project Structure
A clean project structure makes automation easier. A recommended structure is:
ml-cicd-project/
│── app/
│   ├── main.py
│── models/
│   ├── model.pkl
│── requirements.txt
│── Dockerfile
│── k8s/
│   ├── deployment.yaml
│   ├── service.yaml
│── .github/
│   ├── workflows/
│   │   ├── cicd.yaml
This structure separates:
- app/: inference API code.
- models/: trained model artifact.
- k8s/: Kubernetes deployment configuration.
- .github/workflows/: GitHub Actions pipeline definition.
5. Building an Inference API (FastAPI Example)
In most real ML deployments, the model is wrapped in a web API. A common approach is to use FastAPI because it is lightweight, fast, and supports automatic API documentation.
Inference API code
from fastapi import FastAPI
import joblib
import numpy as np

app = FastAPI(title="ML Inference API")

# Load model at startup
model = joblib.load("models/model.pkl")

@app.get("/")
def health_check():
    return {"status": "ok", "message": "ML API is running"}

@app.post("/predict")
def predict(payload: dict):
    # Expected payload format:
    # {"features": [feature1, feature2, feature3]}
    features = payload["features"]
    X = np.array(features).reshape(1, -1)
    prediction = model.predict(X)
    return {"prediction": prediction.tolist()}
The API expects a JSON request body like:
{
"features": [50000, 35, 720]
}
FastAPI also provides built-in Swagger documentation at:
http://localhost:8000/docs
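To quickly exercise the /predict endpoint from a script, a small client like the following can be used (a sketch, assuming the requests library is installed and the API is running locally on port 8000):

# send a sample prediction request to the locally running API
import requests

payload = {"features": [50000, 35, 720]}
response = requests.post("http://localhost:8000/predict", json=payload)

print(response.status_code)  # expect 200
print(response.json())       # e.g. {"prediction": [...]}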
6. Dockerizing the ML Model Service
Docker solves one major issue in ML deployment: environment reproducibility. Instead of manually installing dependencies on a server, Docker ensures the same environment runs everywhere.
requirements.txt
fastapi
uvicorn
numpy
joblib
scikit-learn
Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ app/
COPY models/ models/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
This Dockerfile performs the following:
- Uses a minimal Python base image.
- Installs dependencies.
- Copies inference code and model file.
- Starts the API server using uvicorn.
Local Docker testing
docker build -t ml-api .
docker run -p 8000:8000 ml-api
If the container runs successfully, you can test the API endpoint:
curl http://localhost:8000/
7. Deploying the Container on Kubernetes
Docker solves packaging, but Kubernetes solves deployment management. Kubernetes is designed for running containers at scale, providing:
- replication (multiple pods).
- load balancing.
- self-healing (restart crashed pods).
- rolling updates (deploy new versions gradually).
Kubernetes Deployment YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: ml-api
          image: ghcr.io/YOUR_USERNAME/ml-api:latest
          ports:
            - containerPort: 8000
Explanation:
- replicas: 2 ensures two instances of the API are running.
- image defines which Docker image Kubernetes should pull.
- containerPort specifies the port used inside the container.
Kubernetes Service YAML
apiVersion: v1
kind: Service
metadata:
  name: ml-api-service
spec:
  selector:
    app: ml-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
  type: LoadBalancer
The Service exposes the pods behind a stable endpoint.
In cloud environments, LoadBalancer will provide a public IP.
8. GitHub Actions CI/CD Pipeline Setup
GitHub Actions allows us to automate the pipeline so deployment happens automatically on every push to the main branch.
Create a workflow file:
.github/workflows/cicd.yaml
GitHub Actions Workflow
name: CI/CD for ML Model Deployment

on:
  push:
    branches:
      - main

jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      # Step 1: Checkout repository
      - name: Checkout code
        uses: actions/checkout@v4

      # Step 2: Setup Python
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      # Step 3: Install dependencies
      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      # Step 4: Validate model artifact exists and loads correctly
      - name: Validate model artifact
        run: |
          python -c "import joblib; joblib.load('models/model.pkl')"

      # Step 5: Login to GitHub Container Registry (GHCR)
      - name: Login to GHCR
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin

      # Step 6: Build Docker image
      - name: Build Docker image
        run: |
          docker build -t ghcr.io/${{ github.repository_owner }}/ml-api:latest .

      # Step 7: Push Docker image
      - name: Push Docker image
        run: |
          docker push ghcr.io/${{ github.repository_owner }}/ml-api:latest

      # Step 8: Install kubectl
      - name: Setup kubectl
        uses: azure/setup-kubectl@v4
        with:
          version: "latest"

      # Step 9: Configure kubeconfig (Kubernetes access)
      - name: Configure kubeconfig
        run: |
          mkdir -p $HOME/.kube
          echo "${{ secrets.KUBECONFIG_DATA }}" | base64 --decode > $HOME/.kube/config

      # Step 10: Deploy to Kubernetes
      - name: Deploy to Kubernetes
        run: |
          kubectl apply -f k8s/deployment.yaml
          kubectl apply -f k8s/service.yaml
This pipeline automatically performs:
- dependency installation.
- model loading validation.
- Docker build + push.
- Kubernetes deployment update.
9. Setting Up Kubernetes Authentication (KUBECONFIG)
GitHub Actions cannot access your Kubernetes cluster unless you provide authentication credentials.
Kubernetes access is typically controlled using a kubeconfig file.
On your local machine, your kubeconfig is usually stored at:
~/.kube/config
Convert it into a base64 string:
cat ~/.kube/config | base64
Then store it in GitHub repository secrets:
- KUBECONFIG_DATA → paste the base64 output
In the GitHub Actions workflow, it is decoded back into a kubeconfig file so that kubectl works.
10. Best Practice: Use Image Versioning (Avoid "latest")
Using the latest tag is not recommended for real production deployments.
It becomes difficult to track which model version is running.
A better strategy is tagging images using the Git commit hash:
${{ github.sha }}
Improved Docker build step
- name: Build and push Docker image with SHA tag
  run: |
    IMAGE_TAG=${{ github.sha }}
    docker build -t ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG .
    docker push ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG
After building the image, update Kubernetes dynamically:
- name: Update Kubernetes deployment image
  run: |
    IMAGE_TAG=${{ github.sha }}
    kubectl set image deployment/ml-api ml-api=ghcr.io/${{ github.repository_owner }}/ml-api:$IMAGE_TAG
This ensures:
- every deployment is traceable.
- rollback is easier.
- you can identify exactly which commit is in production.
11. Adding Readiness and Liveness Probes
Kubernetes supports health checks to automatically restart broken pods. ML services may crash due to corrupted model files, memory issues, or unexpected requests.
Add probes to the container section of your deployment configuration:
readinessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 20
Explanation:
- Readiness probe ensures the service only receives traffic after it is ready.
- Liveness probe ensures Kubernetes restarts the container if it becomes unresponsive.
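The probes above reuse the root health endpoint from section 5. A slightly stricter readiness check (a sketch, not part of the original API) could run a tiny dummy prediction so a pod only receives traffic once the loaded model actually responds; the readinessProbe path would then be /ready instead of /:

# Optional addition to app/main.py (np and model are already defined there)
from fastapi import Response

@app.get("/ready")
def readiness_check(response: Response):
    try:
        # Tiny dummy prediction; the three-feature shape is an assumption from section 5
        model.predict(np.array([[0, 0, 0]]))
        return {"status": "ready"}
    except Exception:
        # 503 tells Kubernetes not to route traffic to this pod yet
        response.status_code = 503
        return {"status": "not ready"}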
12. ML-Specific CI Validation (Recommended)
In ML deployment, a pipeline should validate not only code correctness but also basic model validity. Otherwise, a broken or low-quality model can still pass CI.
A minimal validation step can include:
- check that the model loads successfully.
- run a dummy prediction.
- ensure output shape is correct.
Example validation script
# validate_model.py
import joblib
import numpy as np
model = joblib.load("models/model.pkl")
dummy_input = np.array([[50000, 35, 720]])
prediction = model.predict(dummy_input)
print("Prediction output:", prediction)
Then add to GitHub Actions:
- name: Run model validation
  run: |
    python validate_model.py
In real pipelines, you can extend validation to enforce accuracy thresholds:
if accuracy < 0.85:
    raise Exception("Model performance too low. Deployment blocked.")
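A fuller version of that gate might look like the following sketch, assuming pandas is available in CI and a small labeled holdout set lives in the repository at data/holdout.csv (a hypothetical path) with the label in a column named target:

# validate_accuracy.py -- hypothetical CI gate, run after validate_model.py
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # illustrative value; tune to your use case

model = joblib.load("models/model.pkl")
holdout = pd.read_csv("data/holdout.csv")  # assumed layout: feature columns plus "target"

X = holdout.drop(columns=["target"])
y = holdout["target"]

accuracy = accuracy_score(y, model.predict(X))
print(f"Holdout accuracy: {accuracy:.3f}")

if accuracy < ACCURACY_THRESHOLD:
    raise SystemExit("Model performance too low. Deployment blocked.")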
13. Rollback Strategy in Kubernetes
A strong reason for using Kubernetes is rollback capability. If a newly deployed model version causes failures, you can revert quickly.
Check rollout status
kubectl rollout status deployment/ml-api
Rollback to previous version
kubectl rollout undo deployment/ml-api
This is significantly safer than manually deploying containers on a VM.
14. Summary: What This CI/CD Pipeline Achieves
After implementing GitHub Actions + Docker + Kubernetes, you achieve:
- automatic validation of model artifacts.
- reproducible inference environments.
- automated container builds and publishing.
- automated Kubernetes deployments.
- scalable inference services using replicas.
- safe rolling updates and easy rollback.
This pipeline represents a strong foundation for real-world ML deployment workflows and is a practical first step into MLOps.
15. Next Improvements for Production-Level MLOps
This CI/CD workflow can be improved further using advanced tools:
- MLflow Model Registry for managing model versions and approvals.
- ArgoCD GitOps for Kubernetes deployment automation.
- Canary deployments to deploy new models to a small percentage of traffic first.
- Monitoring using Prometheus and Grafana.
- Data drift detection to identify when model performance degrades over time.
These additions help build a complete ML production lifecycle system.
Final Thoughts
CI/CD is a standard practice in software engineering, and machine learning systems should follow the same discipline. A trained model is not enough; it must be packaged, tested, deployed, versioned, and monitored.
Using GitHub Actions, Docker, and Kubernetes provides a scalable and maintainable way to deploy machine learning models, enabling teams to ship updates faster while reducing deployment risks.
Once this foundation is implemented, teams can focus on improving model performance and reliability rather than manually deploying artifacts.