KPI Benchmark

Benchmark methodology and results demonstrating CI Sizer’s resource optimization and energy savings.

Overview

The KPI Benchmark validates CI Sizer’s effectiveness through a controlled experiment measuring resource utilization, energy consumption, scheduling density, and reliability across multiple workload types and sizing conditions.

Repository: edp.buildth.ing/DevFW/kpi-benchmark

Methodology

Experimental Design

The benchmark uses a full factorial design: every condition is run against every workload, with 30 repetitions per combination.

5 conditions × 3 workloads × 30 runs = 450 total runs

Conditions

Condition	Description
`STATIC`	Fixed resource allocations (baseline, no sizer)
`GARM_BARE`	GARM runner provisioning without sizer
`GARM_OBSERVE`	Sizer in observe mode (recommendations computed, not enforced)
`GARM_ENFORCE`	Sizer in enforce mode (recommendations applied as K8s requests/limits)
`GARM_WARM`	Sizer enforce mode with pre-warmed historical data

Workloads

Workload	Description
`carbon-burner`	CPU stress workload
`memory-stress`	Variable memory allocation workload
`go-build`	Real-world multi-package Go compilation
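The run matrix implied by the design above can be enumerated as a minimal sketch. The condition and workload names come from the tables. The function name `run_matrix` and the iteration order are illustrative assumptions, and the actual benchmark may interleave or randomize runs.

```python
from itertools import product

# Names taken from the Conditions and Workloads tables.
CONDITIONS = ["STATIC", "GARM_BARE", "GARM_OBSERVE", "GARM_ENFORCE", "GARM_WARM"]
WORKLOADS = ["carbon-burner", "memory-stress", "go-build"]
RUNS_PER_CELL = 30


def run_matrix():
    """Yield (condition, workload, run_index) for every cell of the design.

    Hypothetical helper: the ordering here is for illustration only.
    """
    for condition, workload in product(CONDITIONS, WORKLOADS):
        for run_index in range(1, RUNS_PER_CELL + 1):
            yield condition, workload, run_index


matrix = list(run_matrix())
print(len(matrix))  # 450
```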

Statistical Approach

Bootstrap BCa confidence intervals for resource metrics
Fisher’s exact test for OOM rate comparisons
Paired Wilcoxon signed-rank tests for duration comparisons
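To illustrate one of these tests, here is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table of OOM versus non-OOM runs. The counts in the example are hypothetical and are not benchmark data. The benchmark repository may well use a statistics library instead of a hand-written function.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of seeing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows are conditions, columns are (OOM, no OOM).
p_value = fisher_exact_two_sided(3, 7, 0, 10)
print(round(p_value, 4))  # 0.2105
```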

Key Results

Resource Optimization

Metric	Improvement
CPU oversizing reduction	79.64%
Memory oversizing reduction	88.59%
Scheduling density improvement	12.2× (500m → 41m CPU requests)
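The scheduling density figure appears to be the ratio of the two CPU requests (500m / 41m ≈ 12.2). The sketch below reproduces that arithmetic and shows what it would mean on a node. The 4000m allocatable capacity is a hypothetical value, and the calculation considers CPU requests only, ignoring memory and per-node pod limits.

```python
def pods_per_node(allocatable_millicores: int, request_millicores: int) -> int:
    """Number of pods that fit on a node when only CPU requests constrain it."""
    return allocatable_millicores // request_millicores


# CPU requests from the results table, in millicores.
before_m, after_m = 500, 41
density_ratio = before_m / after_m
print(f"{density_ratio:.1f}x")  # 12.2x

# Hypothetical node with 4 allocatable cores.
node_m = 4000
print(pods_per_node(node_m, before_m), pods_per_node(node_m, after_m))  # 8 97
```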

Baseline Waste

Without CI Sizer, typical CI workloads exhibit significant resource waste:

Workload	CPU Utilization	Waste (unused CPU)
`carbon-burner` (batch)	12.8%	87%
`go-build`	48.4%	52%
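The waste column is consistent with the complement of CPU utilization, rounded to whole percent. A minimal sketch of that relationship follows. The function name is illustrative, and the exact definition used by the benchmark is assumed rather than confirmed.

```python
def cpu_waste_percent(utilization_percent: float) -> float:
    """Share of the allocated CPU that went unused, in percent."""
    if not 0.0 <= utilization_percent <= 100.0:
        raise ValueError("utilization must be between 0 and 100")
    return 100.0 - utilization_percent


# Utilization values from the table above.
waste = {
    name: cpu_waste_percent(util)
    for name, util in [("carbon-burner", 12.8), ("go-build", 48.4)]
}
for name, value in waste.items():
    print(f"{name}: {value:.1f}% unused")
```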

Energy

Per-run energy: 0.095–0.323 mWh (measured via CCF methodology)
Projected savings at scale: 65–90% energy reduction via node-hour reduction
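Assuming CCF refers to the Cloud Carbon Footprint methodology, compute energy is estimated by interpolating linearly between idle and full-load power and multiplying by vCPU-hours. The sketch below shows that estimate alongside a simple node-hour savings ratio. All coefficients and node-hour figures are hypothetical placeholders, not values from the benchmark.

```python
def average_watts(min_watts: float, max_watts: float, cpu_utilization: float) -> float:
    """Interpolate between idle (min) and full-load (max) power per vCPU."""
    return min_watts + cpu_utilization * (max_watts - min_watts)


def energy_wh(min_watts: float, max_watts: float,
              cpu_utilization: float, vcpu_hours: float) -> float:
    """Estimated compute energy in watt-hours."""
    return average_watts(min_watts, max_watts, cpu_utilization) * vcpu_hours


def node_hour_savings(node_hours_before: float, node_hours_after: float) -> float:
    """Fraction of node-hours (and, to a first approximation, energy) saved."""
    return 1.0 - node_hours_after / node_hours_before


# Hypothetical coefficients: 1 W idle, 4 W full load, 50% utilization, 2 vCPU-hours.
estimate = energy_wh(1.0, 4.0, 0.5, 2.0)
print(estimate)  # 5.0

# Hypothetical fleet: 100 node-hours before, 25 after denser packing.
savings = node_hour_savings(100.0, 25.0)
print(f"{savings:.0%}")  # 75%
```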

Reliability

Scenario	Completion Rate
Without sizer (variable-memory workloads)	60%
With sizer	100%

Without the sizer, an out-of-memory (OOM) condition leads to node-level eviction and collateral damage to co-located pods. With the sizer, the failure is contained at the cgroup boundary of the offending workload.

Performance Overhead

Mode	Overhead
Observe mode	<1% duration overhead, zero resource modification
Enforce mode (go-build)	1–5% faster, i.e. negative overhead (tighter limits reduce scheduling contention)
GARM lifecycle (lightweight workloads)	7–20% duration overhead

Formal KPIs

The benchmark validates the following IPCEI-CIS work package objectives:

KPI	Work Package	Objective	Target	Result
Resource utilization	WP e.1	OB 45/46	≥10% improvement	79.6–88.6% improvement (CPU and memory oversizing reduction)
Sustainability	WP e.2	OB 47/48	≥10% improvement	65–90% projected