Sizing Algorithm
Overview
CI Sizer analyses historical resource usage to recommend right-sized Kubernetes resource requests and limits for each container in a CI pod. The goal is to find the smallest allocation that safely completes the job — reducing waste without causing failures.
Methodology
The sizer computes recommendations by aggregating the N most recent clean (non-OOM) runs for a given workflow/job combination. The aggressiveness of the recommendation depends on the current confidence phase.
Confidence Phases
Every workflow/job progresses through three confidence phases as clean samples accumulate:
| Phase | Clean Samples | Behaviour |
|---|---|---|
| unknown | 0 | Returns bootstrap default: 4Gi memory, 500m CPU |
| learning | 1–2 | Applies 3× headroom above observed peak (conservative) |
| confident | ≥3 | Full algorithm with tight staircase buffer |
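The phase boundaries in the table can be expressed as a small classifier. This is a minimal sketch, not the sizer's actual code; the function name `confidence_phase` is hypothetical.

```python
def confidence_phase(clean_samples: int) -> str:
    """Map a clean (non-OOM) sample count to a confidence phase.

    Thresholds follow the table above: 0 -> unknown, 1-2 -> learning,
    3 or more -> confident.
    """
    if clean_samples < 0:
        raise ValueError("clean_samples cannot be negative")
    if clean_samples == 0:
        return "unknown"
    if clean_samples <= 2:
        return "learning"
    return "confident"
```

Because OOM-suspect runs are not counted as clean samples, a job that keeps hitting its memory limit stays in its current phase rather than advancing.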
In the confident phase, the full algorithm below applies:
- Collect the N most recent runs (configurable via the ?runs= query parameter, 1–100)
- Per container, across runs:
- CPU request — take the selected percentile (default: p95) of each run’s CPU usage, then take the maximum across runs
- Memory request — take the peak memory of each run, then take the maximum across runs
- Apply buffers to add headroom above observed values
- Apply floor values to ensure minimum viable allocations
- Apply a memory ceiling — no single container can exceed the total pod memory observed across all runs (plus buffer)
- Round limits to clean values: CPU rounds up to the nearest 0.5 cores; memory rounds up to the next power of 2 in MiB
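The aggregation and rounding steps above can be sketched as follows. This is an illustration under stated assumptions, not the sizer's implementation: the function names are hypothetical, per-run CPU statistics are assumed to be precomputed, and a memory value that is already a power of two is assumed to stay unchanged.

```python
import math


def aggregate_runs(runs):
    """Combine per-run stats for one container.

    `runs` is a list of (cpu_stat_cores, peak_memory_mib) tuples, where the
    CPU stat is the already-selected percentile for that run (p95 by default).
    Both values are reduced by taking the maximum across runs.
    """
    if not runs:
        raise ValueError("at least one run is required")
    cpu = max(cpu for cpu, _ in runs)
    memory = max(mem for _, mem in runs)
    return cpu, memory


def round_cpu_limit(cores: float) -> float:
    """Round a CPU limit up to the nearest 0.5 cores."""
    return math.ceil(cores * 2) / 2


def round_memory_limit_mib(mib: float) -> int:
    """Round a memory limit up to a power of two, in MiB.

    Assumption: exact powers of two are returned unchanged.
    """
    n = max(1, math.ceil(mib))
    return 1 << (n - 1).bit_length()
```

For example, `round_cpu_limit(0.84)` gives `1.0` and `round_memory_limit_mib(732)` gives `1024`.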
For full details on confidence phases and OOM recovery, see OOM Detection.
Query Parameters
| Parameter | Default | Description |
|---|---|---|
| runs | 5 | Number of recent runs to analyse (1–100) |
| buffer | 20 | CPU headroom percentage (memory uses the staircase below) |
| cpu_percentile | p95 | CPU stat to use: peak, p99, p95, p75, p50, avg |
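A client might validate these parameters before building the query string. The sketch below is hypothetical: the helper name `sizing_query` is not part of CI Sizer, and only the constraints stated in the table (the 1–100 range for runs and the listed CPU stats) are checked.

```python
from urllib.parse import urlencode

# CPU stats listed in the table above.
VALID_CPU_PERCENTILES = {"peak", "p99", "p95", "p75", "p50", "avg"}


def sizing_query(runs=5, buffer=20, cpu_percentile="p95"):
    """Build a query string using the documented defaults."""
    if not 1 <= runs <= 100:
        raise ValueError("runs must be between 1 and 100")
    if cpu_percentile not in VALID_CPU_PERCENTILES:
        raise ValueError(f"unsupported cpu_percentile: {cpu_percentile}")
    return urlencode(
        {"runs": runs, "buffer": buffer, "cpu_percentile": cpu_percentile}
    )
```

Calling `sizing_query(runs=10, cpu_percentile="p99")` yields `runs=10&buffer=20&cpu_percentile=p99`, which can be appended to the sizing endpoint documented in the API Reference.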
Thresholds and Floors
Every container receives a minimum viable allocation even if it was completely idle in all observed runs:
| Resource | Request Floor | Limit Floor |
|---|---|---|
| CPU | 10m | 500m |
| Memory | 32Mi | 128Mi |
Request and limit floors are intentionally asymmetric: a low request allows efficient scheduling bin-packing, while a higher limit prevents OOM kills or severe throttling if a previously-idle container becomes active.
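Applying the floors amounts to clamping each value from below. A minimal sketch, assuming values are already expressed in millicores and MiB; the constant and function names are illustrative only.

```python
# Floors from the table above (CPU in millicores, memory in MiB).
CPU_REQUEST_FLOOR_M = 10
CPU_LIMIT_FLOOR_M = 500
MEM_REQUEST_FLOOR_MI = 32
MEM_LIMIT_FLOOR_MI = 128


def apply_floors(cpu_request_m, cpu_limit_m, mem_request_mi, mem_limit_mi):
    """Raise each value to its floor; values above the floor are unchanged."""
    return (
        max(cpu_request_m, CPU_REQUEST_FLOOR_M),
        max(cpu_limit_m, CPU_LIMIT_FLOOR_M),
        max(mem_request_mi, MEM_REQUEST_FLOOR_MI),
        max(mem_limit_mi, MEM_LIMIT_FLOOR_MI),
    )
```

A container that was idle in every run, `apply_floors(0, 0, 0, 0)`, ends up with `(10, 500, 32, 128)`.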
Staircase Buffer
CPU uses a flat configurable buffer (default: 20%). Memory uses a staircase buffer whose percentage shrinks as the observed peak grows: larger allocations are inherently more stable, and over-provisioning them wastes more cluster resources.
| Observed Peak Memory | Buffer |
|---|---|
| < 1 GiB | 20% |
| 1 – 4 GiB | 10% |
| > 4 GiB | 5% |
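The staircase can be sketched as a lookup on the observed peak. This is an illustration only: the function names are hypothetical, values are taken in MiB, and the result is rounded up to a whole MiB as an assumption. Following the table, peaks of exactly 1 GiB and exactly 4 GiB fall in the 10% band.

```python
def memory_buffer_percent(peak_mib: int) -> int:
    """Return the staircase buffer percentage for an observed peak in MiB."""
    if peak_mib < 1024:        # below 1 GiB
        return 20
    if peak_mib <= 4 * 1024:   # 1 GiB to 4 GiB inclusive
        return 10
    return 5                   # above 4 GiB


def buffered_memory_mib(peak_mib: int) -> int:
    """Apply the staircase buffer, rounding up to a whole MiB."""
    pct = memory_buffer_percent(peak_mib)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-peak_mib * (100 + pct) // 100)
```

A 512 MiB peak becomes 615 MiB, a 2 GiB peak becomes 2253 MiB, and an 8 GiB peak becomes 8602 MiB before limits are rounded to clean values.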
CPU vs Memory Enforcement
Kubernetes treats CPU and memory differently, and the sizer reflects this:
- CPU is compressible — exceeding the limit causes throttling, not failure. The job continues, just slower.
- Memory is incompressible — exceeding the limit triggers an OOM kill. The job fails immediately.
Memory limits are therefore always enforced. CPU enforcement is opt-in via --cpu-sizing-mode:
| Mode | Description |
|---|---|
| observe (default) | Compute CPU recommendations and report them, but mark enforced: false. The provider uses its own defaults. |
| enforce | Apply CPU recommendations as Kubernetes requests/limits (enforced: true). |
Memory QoS
The --memory-qos flag controls the memory QoS class:
| Mode | Description |
|---|---|
| guaranteed (default) | Memory request equals memory limit (Guaranteed QoS class). Prevents overcommit. |
| burstable | Memory request is less than limit (Burstable QoS class). Allows burst above the request. |
Sizing Overrides
Operators can pin CPU and/or memory values at any scope instead of relying on the algorithm. Overrides are useful for known-heavy jobs, cost caps, or bootstrapping new workflows before enough historical data exists.
Scope Hierarchy
Overrides resolve by specificity; the most specific scope wins:
job > workflow > repo > org
Fields left null in an override are inherited from the next parent scope (or the algorithm). This means you can override only memory at the org level and let CPU continue to be computed from data.
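Per-field inheritance can be sketched as a walk from the most specific scope to the least specific one. The data shapes below (a dict of scope name to `{"cpu": ..., "memory": ...}`) and the function name are assumptions made for illustration; they are not the sizer's internal representation.

```python
# Most specific first, matching the hierarchy above.
SCOPES = ("job", "workflow", "repo", "org")
FIELDS = ("cpu", "memory")


def resolve_overrides(overrides, computed):
    """Resolve each field independently.

    `overrides` maps a scope name to a dict whose fields may be None.
    `computed` holds the algorithm's values, used when no scope pins a field.
    Returns (values, sources), where sources records where each value came from.
    """
    values, sources = {}, {}
    for field in FIELDS:
        values[field] = computed[field]
        sources[field] = "algorithm"
        for scope in SCOPES:
            pinned = overrides.get(scope, {}).get(field)
            if pinned is not None:
                values[field] = pinned
                sources[field] = scope
                break  # most specific non-null value wins
    return values, sources
```

With an org-level override of `{"cpu": None, "memory": "8Gi"}` and no other overrides, memory resolves to `8Gi` from the org scope while CPU falls through to the computed value.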
Override API
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/sizing/overrides | List all overrides |
| PUT | /api/v1/sizing/overrides/{org} | Upsert org-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo} | Upsert repo-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo}/{workflow} | Upsert workflow-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo}/{workflow}/{job} | Upsert job-level override |
| DELETE | Same paths as PUT | Remove override at that scope |
When an override is active, the sizing response includes override_scope in the meta block indicating which level matched (job, workflow, repo, org). When no override matched, the value is "global".
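The override paths follow a regular pattern, so a client can build them from whichever scope components it has. The helper below is a hypothetical sketch; percent-encoding each segment is an assumption, and the request body format is not shown because it is not specified here.

```python
from urllib.parse import quote

BASE_PATH = "/api/v1/sizing/overrides"


def override_path(org, repo=None, workflow=None, job=None):
    """Build the override path for the deepest scope supplied.

    Scopes must be contiguous: a workflow requires a repo, a job requires
    a workflow.
    """
    parts = [org, repo, workflow, job]
    # Drop trailing None values, then reject gaps such as org + job only.
    while parts and parts[-1] is None:
        parts.pop()
    if not parts or any(p is None for p in parts):
        raise ValueError("scope components must be contiguous, starting at org")
    return BASE_PATH + "/" + "/".join(quote(p, safe="") for p in parts)
```

For example, `override_path("acme", "web")` returns `/api/v1/sizing/overrides/acme/web`, the path used for both PUT and DELETE at the repo scope.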
OOM-Aware Sizing
When OOM events are detected (via cgroup v2 memory.events or the 95%-of-limit heuristic), the sizer applies special handling:
- OOM-suspect samples are excluded from the clean sample count — they do not advance the confidence phase
- Exponential backoff on consecutive OOMs: limit × 2^consecutiveOOMs
- Node ceiling cap: backoff is bounded by the node ceiling (90% of node RAM or --max-memory)
This ensures the sizer recovers gracefully from memory exhaustion without unbounded growth. For full details, see OOM Detection.
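The backoff rule can be sketched in a few lines. The function name is hypothetical, and the node ceiling is taken as an input because how the sizer chooses between 90% of node RAM and --max-memory is not restated here.

```python
GIB = 1024 ** 3


def backoff_memory_limit(limit_bytes, consecutive_ooms, node_ceiling_bytes):
    """Double the limit once per consecutive OOM, capped at the node ceiling."""
    if consecutive_ooms < 0:
        raise ValueError("consecutive_ooms cannot be negative")
    grown = limit_bytes * 2 ** consecutive_ooms
    return min(grown, node_ceiling_bytes)
```

Starting from a 1 GiB limit with a 16 GiB ceiling, two consecutive OOMs raise the limit to 4 GiB; a 4 GiB limit after three consecutive OOMs would reach 32 GiB but is capped at 16 GiB.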
For the full sizing API response format, see the API Reference.