KPI Benchmark
Benchmark methodology and results demonstrating CI Sizer’s resource optimization and energy savings.
Overview
The KPI Benchmark validates CI Sizer’s effectiveness through a controlled experiment measuring resource utilization, energy consumption, scheduling density, and reliability across multiple workload types and sizing conditions.
Repository: edp.buildth.ing/DevFW/kpi-benchmark
Methodology
Experimental Design
The benchmark uses a full factorial design in which every condition is run against every workload, with 30 repetitions per combination:
- 5 conditions × 3 workloads × 30 runs = 450 total runs
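As a rough illustration of how the run matrix expands, the sketch below enumerates every combination. The condition and workload names are the ones listed in this document. The function name `build_run_matrix` and the 1-based run index are assumptions made for the example, not part of the benchmark repository.

```python
from itertools import product

# Names taken from the Conditions and Workloads tables in this document.
CONDITIONS = ["STATIC", "GARM_BARE", "GARM_OBSERVE", "GARM_ENFORCE", "GARM_WARM"]
WORKLOADS = ["carbon-burner", "memory-stress", "go-build"]
RUNS_PER_CELL = 30


def build_run_matrix():
    """Return one (condition, workload, run_index) tuple per benchmark run.

    Hypothetical helper: the real harness may order or label runs differently.
    """
    return list(product(CONDITIONS, WORKLOADS, range(1, RUNS_PER_CELL + 1)))


if __name__ == "__main__":
    matrix = build_run_matrix()
    print(len(matrix))  # 5 * 3 * 30 = 450
```

Each of the 15 condition/workload cells receives the same number of runs, which keeps per-cell comparisons balanced.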
Conditions
| Condition | Description |
|---|---|
| STATIC | Fixed resource allocations (baseline, no sizer) |
| GARM_BARE | GARM runner provisioning without sizer |
| GARM_OBSERVE | Sizer in observe mode (recommendations computed, not enforced) |
| GARM_ENFORCE | Sizer in enforce mode (recommendations applied as K8s requests/limits) |
| GARM_WARM | Sizer enforce mode with pre-warmed historical data |
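The difference between observe and enforce mode can be sketched as follows. This is not CI Sizer's actual code: the function names, the `limit_factor` parameter, and the memory values are hypothetical. Only the `requests`/`limits` layout follows the Kubernetes container `resources` stanza, and the 500m and 41m CPU figures come from the Key Results table.

```python
import math


def to_k8s_resources(cpu_millicores, memory_mib, limit_factor=1.0):
    """Build a Kubernetes-style `resources` stanza from a recommendation.

    `limit_factor` is a hypothetical knob: limits = requests * limit_factor.
    """
    if cpu_millicores <= 0 or memory_mib <= 0 or limit_factor < 1.0:
        raise ValueError("values must be positive and limit_factor >= 1")
    return {
        "requests": {"cpu": f"{cpu_millicores}m", "memory": f"{memory_mib}Mi"},
        "limits": {
            "cpu": f"{math.ceil(cpu_millicores * limit_factor)}m",
            "memory": f"{math.ceil(memory_mib * limit_factor)}Mi",
        },
    }


def resources_for_mode(mode, static, recommended):
    """Observe mode keeps the static allocation; enforce mode applies the recommendation."""
    if mode == "observe":
        return static
    if mode == "enforce":
        return recommended
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    static = to_k8s_resources(500, 512)      # 512 MiB is a made-up value
    recommended = to_k8s_resources(41, 128)  # 128 MiB is a made-up value
    print(resources_for_mode("observe", static, recommended)["requests"]["cpu"])
    print(resources_for_mode("enforce", static, recommended)["requests"]["cpu"])
```

In this sketch, observe mode still receives the recommendation but returns the static allocation unchanged, which matches the "computed, not enforced" description.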
Workloads
| Workload | Description |
|---|---|
| carbon-burner | CPU stress workload |
| memory-stress | Variable memory allocation workload |
| go-build | Real-world multi-package Go compilation |
Statistical Approach
- Bootstrap BCa confidence intervals for resource metrics
- Fisher’s exact test for OOM rate comparisons
- Paired Wilcoxon signed-rank tests for duration comparisons
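To make the OOM rate comparison concrete, here is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. The function name and the counts in the usage example are hypothetical: the document reports completion rates but not the underlying run counts, so the example assumes 30 runs per group purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7)
    )


if __name__ == "__main__":
    # Hypothetical counts: rows are (without sizer, with sizer),
    # columns are (OOM-killed, completed).
    p = fisher_exact_two_sided(12, 18, 0, 30)
    print(f"p = {p:.2e}")
```

If SciPy is available, `scipy.stats.fisher_exact`, `scipy.stats.bootstrap` (with `method="BCa"`), and `scipy.stats.wilcoxon` cover the three tests listed above; whether the benchmark itself uses SciPy is not stated here.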
Key Results
Resource Optimization
| Metric | Improvement |
|---|---|
| CPU oversizing reduction | 79.64% |
| Memory oversizing reduction | 88.59% |
| Scheduling density improvement | 12.2× (500m → 41m CPU requests) |
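The 12.2× density figure follows from the ratio of the two CPU requests. The sketch below reproduces that arithmetic and shows what it could mean for pod packing on a single node. The 4000m allocatable CPU is an assumed node size, and the helper ignores memory, per-pod limits, and system reservations.

```python
def density_gain(request_before_m, request_after_m):
    """Ratio of CPU requests, i.e. how many more pods fit per unit of CPU."""
    return request_before_m / request_after_m


def pods_per_node(allocatable_millicores, request_millicores):
    """Number of pods that fit on a node when only CPU requests are considered."""
    return allocatable_millicores // request_millicores


if __name__ == "__main__":
    print(round(density_gain(500, 41), 1))  # 12.2

    node_cpu_m = 4000  # assumed allocatable CPU for illustration
    print(pods_per_node(node_cpu_m, 500))   # 8
    print(pods_per_node(node_cpu_m, 41))    # 97
```

Because pods are packed in whole units, the realized gain on a specific node (97 / 8 ≈ 12.1 here) can differ slightly from the ideal ratio.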
Baseline Waste
Without CI Sizer, the benchmarked CI workloads use only part of the CPU allocated to them; the unused share is reported as waste:
| Workload | CPU Utilization | Waste |
|---|---|---|
| Batch (carbon-burner) | 12.8% | 87% |
| Go build | 48.4% | 52% |
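The Waste column appears to be the complement of CPU utilization, rounded to a whole percent. A short sketch of that reading (an assumption, since the document does not give the formula):

```python
def waste_percent(cpu_utilization_percent):
    """Share of allocated CPU left unused, assuming waste = 100 - utilization."""
    if not 0 <= cpu_utilization_percent <= 100:
        raise ValueError("utilization must be between 0 and 100")
    return 100 - cpu_utilization_percent


if __name__ == "__main__":
    for name, utilization in [("carbon-burner", 12.8), ("go-build", 48.4)]:
        print(name, round(waste_percent(utilization)))  # 87 and 52
```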
Energy
- Per-run energy: 0.095–0.323 mWh (measured via the CCF methodology)
- Projected savings at scale: 65–90% lower energy use, driven by a reduction in node-hours
Reliability
| Scenario | Completion Rate |
|---|---|
| Without sizer (variable-memory workloads) | 60% |
| With sizer | 100% |
Without the sizer, an out-of-memory (OOM) condition causes node eviction and collateral damage to co-located pods. With the sizer, failures are contained at the cgroup boundary, so other pods on the node are not affected.
Performance Overhead
| Mode | Overhead |
|---|---|
| Observe mode | <1% duration overhead, zero resource modification |
| Enforce mode (go-build) | 1–5% faster (tighter limits reduce scheduling contention) |
| GARM lifecycle (lightweight workloads) | 7–20% duration overhead |
Formal KPIs
The benchmark validates the following IPCEI-CIS work package objectives:
| KPI | Work Package | Objective | Target | Result |
|---|---|---|---|---|
| Resource utilization | WP e.1 | OB 45/46 | ≥10% improvement | 79–89% improvement |
| Sustainability | WP e.2 | OB 47/48 | ≥10% improvement | 65–90% projected |