KPI Benchmark

Benchmark methodology and results demonstrating CI Sizer’s resource optimization and energy savings.

Overview

The KPI Benchmark validates CI Sizer’s effectiveness through a controlled experiment measuring resource utilization, energy consumption, scheduling density, and reliability across multiple workload types and sizing conditions.

Repository: edp.buildth.ing/DevFW/kpi-benchmark

Methodology

Experimental Design

The benchmark uses a factorial design:

  • 5 conditions × 3 workloads × 30 runs = 450 total runs
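
A minimal Python sketch of the run matrix this design implies. The condition and workload names come from the tables in this section. The helper name `build_run_matrix` and the seeded shuffle of run order are hypothetical, because the benchmark's actual run scheduling is not described here.

```python
import random
from itertools import product

# Names taken from the Conditions and Workloads tables; everything else is illustrative.
CONDITIONS = ["STATIC", "GARM_BARE", "GARM_OBSERVE", "GARM_ENFORCE", "GARM_WARM"]
WORKLOADS = ["carbon-burner", "memory-stress", "go-build"]
RUNS_PER_CELL = 30


def build_run_matrix(seed: int = 0) -> list[tuple[str, str, int]]:
    """Return every (condition, workload, replicate) cell of the factorial design."""
    # Full factorial: every condition x workload pair, 30 replicates each.
    runs = list(product(CONDITIONS, WORKLOADS, range(RUNS_PER_CELL)))
    # Hypothetical: shuffle run order so time-dependent drift is not confounded with condition.
    random.Random(seed).shuffle(runs)
    return runs


runs = build_run_matrix()
print(len(runs))  # 450
```

Randomizing the order is one common way to keep cluster-level drift (cache warmth, noisy neighbours) from lining up with a single condition. Whether the benchmark does this is an assumption.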

Conditions

Condition | Description
STATIC | Fixed resource allocations (baseline, no sizer)
GARM_BARE | GARM runner provisioning without the sizer
GARM_OBSERVE | Sizer in observe mode (recommendations computed but not enforced)
GARM_ENFORCE | Sizer in enforce mode (recommendations applied as Kubernetes requests/limits)
GARM_WARM | Sizer in enforce mode with pre-warmed historical data

Workloads

Workload | Description
carbon-burner | CPU stress workload
memory-stress | Workload with variable memory allocation
go-build | Real-world multi-package Go compilation

Statistical Approach

  • Bootstrap bias-corrected and accelerated (BCa) confidence intervals for resource metrics
  • Fisher's exact test for comparing OOM rates
  • Paired Wilcoxon signed-rank tests for comparing run durations
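
The BCa interval corrects the plain percentile bootstrap for bias (z0) and skew (the acceleration a). The sketch below is a dependency-free version of the standard Efron construction, not the benchmark's own code. `bca_interval` is a hypothetical helper, and in practice `scipy.stats.bootstrap(..., method='BCa')` is a maintained alternative.

```python
import random
from statistics import NormalDist, fmean
from typing import Callable, Sequence


def bca_interval(
    data: Sequence[float],
    stat: Callable[[Sequence[float]], float] = fmean,
    confidence: float = 0.95,
    n_boot: int = 2000,
    seed: int = 0,
) -> tuple[float, float]:
    """Bias-corrected and accelerated (BCa) bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    norm = NormalDist()
    n = len(data)
    theta_hat = stat(data)

    # 1. Bootstrap replicates: resample with replacement and recompute the statistic.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )

    # 2. Bias correction: share of replicates below the point estimate, on the z scale.
    share_below = sum(b < theta_hat for b in boots) / n_boot
    share_below = min(max(share_below, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = norm.inv_cdf(share_below)

    # 3. Acceleration: skewness of the jackknife (leave-one-out) estimates.
    jack = [stat(list(data[:i]) + list(data[i + 1:])) for i in range(n)]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    # 4. Shift the nominal percentiles by z0 and a, then read them off the replicates.
    def adjusted(q: float) -> float:
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q: float) -> float:
        return boots[min(n_boot - 1, max(0, round(q * (n_boot - 1))))]

    alpha = (1 - confidence) / 2
    return pick(adjusted(alpha)), pick(adjusted(1 - alpha))
```

For example, `bca_interval([0.12, 0.15, 0.11, 0.14, 0.13])` returns a `(low, high)` tuple for the mean of these hypothetical CPU-utilization samples. When z0 and a are both zero, the adjustment reduces to the ordinary percentile interval.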

Key Results

Resource Optimization

Metric | Improvement
CPU oversizing reduction | 79.64%
Memory oversizing reduction | 88.59%
Scheduling density improvement | 12.2× (CPU requests reduced from 500m to 41m)
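
The density figure is the ratio of CPU requests: 500m / 41m ≈ 12.2. The sketch below reproduces that arithmetic. For a hypothetical node with 4000m allocatable CPU, it also shows what the ratio means for bin packing when CPU requests are the only constraint. Both helper names are illustrative.

```python
def density_gain(old_request_m: int, new_request_m: int) -> float:
    """How many more pods fit per unit of CPU when the per-pod request shrinks."""
    return old_request_m / new_request_m


def pods_per_node(allocatable_m: int, request_m: int) -> int:
    """Upper bound on pods per node when CPU requests are the only constraint."""
    return allocatable_m // request_m


print(round(density_gain(500, 41), 1))  # 12.2
# Hypothetical node with 4000m allocatable CPU
print(pods_per_node(4000, 500), pods_per_node(4000, 41))  # 8 97
```

Integer packing yields 97 / 8 ≈ 12.1 rather than exactly 12.2. Real nodes also reserve CPU for system daemons and cap the pod count (the kubelet's default `maxPods` is 110), so the per-node gain can fall somewhat below the raw request ratio.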

Baseline Waste

Without CI Sizer, typical CI workloads exhibit significant resource waste:

Workload | CPU utilization | Waste
Batch (carbon-burner) | 12.8% | 87%
Go build | 48.4% | 52%
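
In this table, waste is the complement of utilization (100% − 12.8% ≈ 87%, 100% − 48.4% ≈ 52%). The sketch below assumes utilization means mean CPU usage divided by the CPU request, because the page does not give the exact definition. The usage samples are made up. The 500m request mirrors the baseline request quoted above.

```python
from statistics import fmean


def cpu_utilization(usage_samples_m: list[float], request_m: float) -> float:
    """Mean CPU usage as a fraction of the CPU request (both in millicores)."""
    return fmean(usage_samples_m) / request_m


def waste_fraction(utilization: float) -> float:
    """Share of the requested CPU that sat idle."""
    return 1.0 - utilization


# Hypothetical job requesting 500m that averages 64m of actual usage
util = cpu_utilization([60.0, 64.0, 68.0], 500)
print(f"{util:.1%} utilized, {waste_fraction(util):.1%} wasted")  # 12.8% utilized, 87.2% wasted
```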

Energy

  • Per-run energy: 0.095–0.323 mWh (estimated using the Cloud Carbon Footprint (CCF) methodology)
  • Projected savings at scale: 65–90% energy reduction, driven by fewer node-hours
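
The CCF methodology estimates compute energy in three steps. It interpolates each vCPU's power draw linearly between a minimum and a maximum wattage according to average utilization, multiplies by vCPU-hours, and scales by the data center's PUE. The sketch below implements that model and adds the node-hour proportionality behind the projected savings. The coefficients are placeholders, not CCF's published figures. The assumption that energy scales with node-hours comes from the bullet above, not from benchmark code.

```python
def ccf_energy_mwh(avg_util: float, vcpus: float, seconds: float,
                   min_watts: float, max_watts: float, pue: float) -> float:
    """Estimate compute energy (mWh) with a CCF-style linear power model."""
    # Per-vCPU power interpolated between idle and full load by average utilization.
    avg_watts = min_watts + avg_util * (max_watts - min_watts)
    vcpu_hours = vcpus * seconds / 3600
    # Scale by PUE for data-center overhead, then convert Wh to mWh.
    return avg_watts * vcpu_hours * pue * 1000


def savings_fraction(node_hours_before: float, node_hours_after: float) -> float:
    """Energy saved under the assumption that energy scales with node-hours."""
    return 1.0 - node_hours_after / node_hours_before


# Placeholder coefficients and durations, for illustration only.
print(ccf_energy_mwh(avg_util=0.13, vcpus=0.5, seconds=120,
                     min_watts=0.7, max_watts=3.5, pue=1.2))
print(savings_fraction(node_hours_before=10.0, node_hours_after=3.5))
```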

Reliability

Scenario | Completion rate
Without sizer (variable-memory workloads) | 60%
With sizer | 100%

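Without the sizer, an out-of-memory (OOM) condition can escalate to node-pressure eviction, causing collateral damage to co-located pods. With the sizer, enforced memory limits contain failures at the container's cgroup boundary, so only the offending container is OOM-killed.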
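
Completion rates like these are compared with the Fisher's exact test named under Statistical Approach. The code below is a pure-Python two-sided version for a 2×2 table of (OOM, completed) counts. The counts are hypothetical. They assume 30 runs per arm and that every failed run was an OOM, chosen only to mirror the 60% vs. 100% completion rates above. `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are arms (without sizer, with sizer); columns are (OOM, completed).
    """
    row1, n = a + b, a + b + c + d
    col1 = a + c
    total = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table that is no more likely than the observed one (small relative tolerance).
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: 12 OOM / 18 completed without the sizer, 0 / 30 with it.
p = fisher_exact_two_sided(12, 18, 0, 30)
print(f"p = {p:.2e}")
```

With these made-up counts the p-value is well below 0.001. An exact test suits this comparison because one arm has zero failures, a case where large-sample approximations such as the chi-squared test are unreliable.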

Performance Overhead

Mode | Effect on run duration
Observe mode | <1% overhead; resource requests and limits left unmodified
Enforce mode (go-build) | 1–5% faster (tighter limits reduce scheduling contention)
GARM lifecycle (lightweight workloads) | 7–20% overhead
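
The figures in this table are relative duration changes. For paired runs, one way to compute them is the median per-pair percentage change. This sketch assumes run i of a mode is paired with run i of its baseline. The pairing rule, the use of the median, and the helper name are all assumptions. Significance would still come from the Wilcoxon signed-rank test, for example `scipy.stats.wilcoxon`.

```python
from statistics import median


def relative_change_pct(baseline_s: list[float], treated_s: list[float]) -> float:
    """Median per-pair duration change in percent (negative = treated runs were faster)."""
    if len(baseline_s) != len(treated_s):
        raise ValueError("runs must be paired one-to-one")
    return median((t - b) / b * 100 for b, t in zip(baseline_s, treated_s))


# Hypothetical durations in seconds, paired by replicate index.
print(relative_change_pct([100.0, 120.0, 90.0], [97.0, 118.0, 88.0]))
```

The median of per-pair changes is less sensitive to a single slow outlier run than the ratio of mean durations.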

Formal KPIs

The benchmark validates the following IPCEI-CIS work package objectives:

KPI | Work package | Objective | Target | Result
Resource utilization | WP e.1 | OB 45/46 | ≥10% improvement | 79–89% improvement
Sustainability | WP e.2 | OB 47/48 | ≥10% improvement | 65–90% (projected)