Sizing Algorithm

How CI Sizer calculates resource sizing recommendations for runners.

Overview

CI Sizer analyses historical resource usage to recommend right-sized Kubernetes resource requests and limits for each container in a CI pod. The goal is to find the smallest allocation that safely completes the job — reducing waste without causing failures.

Methodology

The sizer computes recommendations by aggregating the N most recent clean (non-OOM) runs for a given workflow/job combination. The aggressiveness of the recommendation depends on the current confidence phase.

Confidence Phases

Every workflow/job progresses through three confidence phases as clean samples accumulate:

Phase	Clean Samples	Behaviour
unknown	0	Returns bootstrap default: `4Gi` memory, `500m` CPU
learning	1–2	Applies 3× headroom above observed peak (conservative)
confident	≥3	Full algorithm with tight staircase buffer
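
The phase table above can be read as a simple threshold function. The following is a minimal sketch, not the sizer's actual code; the names `Phase`, `confidence_phase`, and `BOOTSTRAP_DEFAULT` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Phase(str, Enum):
    UNKNOWN = "unknown"
    LEARNING = "learning"
    CONFIDENT = "confident"


# Bootstrap default returned while no clean samples exist (values from the table).
BOOTSTRAP_DEFAULT = {"memory": "4Gi", "cpu": "500m"}


def confidence_phase(clean_samples: int) -> Phase:
    """Map a count of clean (non-OOM) samples to a confidence phase."""
    if clean_samples < 0:
        raise ValueError("clean_samples must be non-negative")
    if clean_samples == 0:
        return Phase.UNKNOWN
    if clean_samples <= 2:
        return Phase.LEARNING
    return Phase.CONFIDENT
```

Only clean samples count here, so a job that keeps hitting OOM stays in its current phase no matter how many runs it accumulates.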

In the confident phase, the full algorithm below applies:

1. Collect the N most recent runs (configurable via the `?runs=` query parameter, 1–100)
2. Per container, across runs:
   - CPU request: take the selected percentile (default: p95) of each run’s CPU usage, then take the maximum across runs
   - Memory request: take the peak memory of each run, then take the maximum across runs
3. Apply buffers to add headroom above observed values
4. Apply floor values to ensure minimum viable allocations
5. Apply a memory ceiling: no single container can exceed the total pod memory observed across all runs (plus buffer)
6. Round limits to clean values: CPU rounds up to the nearest 0.5 cores; memory rounds up to the next power of 2 in MiB
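
The aggregation and rounding steps above can be sketched as follows. This is an illustration under stated assumptions rather than the sizer's implementation: the function names are hypothetical, the percentile uses the nearest-rank method (the source does not say which method is used), and a value that is already a power of two is assumed to stay unchanged when rounded.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile (assumed method, not confirmed by the source)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate_cpu(runs_cpu_millicores, pct=95):
    """Take the percentile within each run, then the maximum across runs."""
    return max(percentile(run, pct) for run in runs_cpu_millicores)


def aggregate_memory(runs_mem_mib):
    """Take the peak within each run, then the maximum across runs."""
    return max(max(run) for run in runs_mem_mib)


def round_cpu_limit(millicores):
    """Round up to a multiple of 0.5 cores (500m)."""
    return math.ceil(millicores / 500) * 500


def round_memory_limit(mib):
    """Round up to a power of two in MiB; exact powers of two are kept (assumption)."""
    return 1 << max(0, math.ceil(mib) - 1).bit_length()
```

For example, two runs whose per-run p95 CPU values are 400m and 250m aggregate to 400m, and a buffered memory value of 700 MiB rounds to a 1024 MiB limit.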

For full details on confidence phases and OOM recovery, see OOM Detection.

Query Parameters

Parameter	Default	Description
`runs`	`5`	Number of recent runs to analyse (1–100)
`buffer`	`20`	CPU headroom percentage (memory uses the staircase below)
`cpu_percentile`	`p95`	CPU stat to use: `peak`, `p99`, `p95`, `p75`, `p50`, `avg`
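
As an illustration of the parameter table, the sketch below parses and validates the three query parameters with their documented defaults. `SizingQuery` and `parse_sizing_query` are hypothetical names, and rejecting out-of-range values (rather than clamping them) is an assumption; the real service may behave differently.

```python
from dataclasses import dataclass

# Allowed CPU statistics, as listed in the parameter table.
CPU_PERCENTILES = ("peak", "p99", "p95", "p75", "p50", "avg")


@dataclass(frozen=True)
class SizingQuery:
    runs: int = 5
    buffer: int = 20
    cpu_percentile: str = "p95"


def parse_sizing_query(params: dict[str, str]) -> SizingQuery:
    """Build a SizingQuery from raw query-string values, applying defaults."""
    runs = int(params.get("runs", 5))
    if not 1 <= runs <= 100:
        raise ValueError("runs must be between 1 and 100")

    buffer = int(params.get("buffer", 20))
    if buffer < 0:  # assumption: negative headroom is not meaningful
        raise ValueError("buffer must be non-negative")

    cpu_percentile = params.get("cpu_percentile", "p95")
    if cpu_percentile not in CPU_PERCENTILES:
        raise ValueError(f"cpu_percentile must be one of {CPU_PERCENTILES}")

    return SizingQuery(runs=runs, buffer=buffer, cpu_percentile=cpu_percentile)
```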

Thresholds and Floors

Every container receives a minimum viable allocation even if it was completely idle in all observed runs:

Resource	Request Floor	Limit Floor
CPU	`10m`	`500m`
Memory	`32Mi`	`128Mi`

Request and limit floors are intentionally asymmetric: a low request allows efficient scheduling bin-packing, while a higher limit prevents OOM kills or severe throttling if a previously-idle container becomes active.
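
A minimal sketch of how the floors could be applied, using the values from the table. The `FLOORS` layout and `apply_floors` name are hypothetical, and keeping the limit at or above the request is an added safeguard assumed here, not something the source states.

```python
# Floor values from the table above: CPU in millicores, memory in MiB.
FLOORS = {
    "cpu": {"request": 10, "limit": 500},
    "memory": {"request": 32, "limit": 128},
}


def apply_floors(resource, request, limit):
    """Raise a request/limit pair to the documented minimums."""
    floor = FLOORS[resource]
    request = max(request, floor["request"])
    # Assumption: a limit is never allowed to fall below its request.
    limit = max(limit, floor["limit"], request)
    return request, limit
```

A container that was idle in every observed run therefore ends up with a `10m`/`500m` CPU pair and a `32Mi`/`128Mi` memory pair, while values already above the floors pass through unchanged.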

Staircase Buffer

CPU uses a flat configurable buffer (default: 20%). Memory uses a staircase buffer whose percentage shrinks as the observed peak grows, because larger allocations are inherently more stable and over-provisioning them wastes more cluster resources:

Observed Peak Memory	Buffer
< 1 GiB	20%
1 – 4 GiB	10%
> 4 GiB	5%
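
The staircase can be expressed as a small lookup. This is a sketch with hypothetical function names; values are in MiB, and a peak of exactly 4 GiB is placed in the 10% band, following the table's `> 4 GiB` wording for the last step.

```python
def memory_buffer_pct(peak_mib):
    """Return the staircase buffer percentage for an observed peak in MiB."""
    if peak_mib < 1024:      # below 1 GiB
        return 20
    if peak_mib <= 4096:     # 1 to 4 GiB inclusive
        return 10
    return 5                 # above 4 GiB


def buffered_memory(peak_mib):
    """Observed peak plus its staircase buffer, in MiB."""
    return peak_mib * (100 + memory_buffer_pct(peak_mib)) / 100
```

With these bands, a 500 MiB peak gains 100 MiB of headroom, while an 8 GiB peak gains roughly 410 MiB: a smaller percentage, but a larger absolute margin.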

CPU vs Memory Enforcement

Kubernetes treats CPU and memory differently, and the sizer reflects this:

- CPU is compressible: exceeding the limit causes throttling, not failure. The job continues, just slower.
- Memory is incompressible: exceeding the limit triggers an OOM kill. The job fails immediately.

Memory limits are therefore always enforced. CPU enforcement is opt-in via `--cpu-sizing-mode`:

Mode	Description
`observe` (default)	Compute CPU recommendations and report them, but mark `enforced: false`. The provider uses its own defaults.
`enforce`	Apply CPU recommendations as Kubernetes requests/limits (`enforced: true`).
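
To illustrate the two modes, the sketch below builds the CPU part of a recommendation and sets the `enforced` marker from the mode. Only `enforced` comes from the table above; the `request` and `limit` keys and the function name are hypothetical, and the real response format is described in the API Reference.

```python
CPU_SIZING_MODES = ("observe", "enforce")


def cpu_recommendation(request_millicores, limit_millicores, mode="observe"):
    """Report a CPU recommendation; only 'enforce' marks it for application."""
    if mode not in CPU_SIZING_MODES:
        raise ValueError(f"mode must be one of {CPU_SIZING_MODES}")
    return {
        "request": f"{request_millicores}m",   # hypothetical field name
        "limit": f"{limit_millicores}m",       # hypothetical field name
        "enforced": mode == "enforce",
    }
```

In `observe` mode the numbers are still computed and returned, so operators can compare them against the provider's defaults before switching to `enforce`.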

Memory QoS

The `--memory-qos` flag controls the memory QoS class:

Mode	Description
`guaranteed` (default)	Memory request equals memory limit (Guaranteed QoS class). Prevents overcommit.
`burstable`	Memory request is less than limit (Burstable QoS class). Allows burst above the request.
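
One way to picture the two modes is as a final adjustment to a computed request/limit pair. This is a hedged sketch: `memory_resources` is a hypothetical name, and raising the request to the limit in `guaranteed` mode (rather than lowering the limit) is an assumption consistent with memory limits always being enforced.

```python
def memory_resources(request_mib, limit_mib, qos="guaranteed"):
    """Return the (request, limit) pair in MiB for the chosen memory QoS mode."""
    if qos == "guaranteed":
        # Request equals limit, so the scheduler reserves the full limit.
        return limit_mib, limit_mib
    if qos == "burstable":
        # Request stays at the computed value and may burst up to the limit.
        return min(request_mib, limit_mib), limit_mib
    raise ValueError("qos must be 'guaranteed' or 'burstable'")
```

Note that Kubernetes assigns the Guaranteed QoS class to a pod only when every container's CPU request also equals its CPU limit, so the memory setting alone does not determine the pod's class.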

Sizing Overrides

Operators can pin CPU and/or memory values at any scope instead of relying on the algorithm. Overrides are useful for known-heavy jobs, cost caps, or bootstrapping new workflows before enough historical data exists.

Scope Hierarchy

Overrides resolve from the most specific scope to the least specific, and the most specific match wins:

job > workflow > repo > org

Fields left null in an override are inherited from the next parent scope (or the algorithm). This means you can override only memory at the org level and let CPU continue to be computed from data.
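
The field-by-field inheritance described above can be sketched as a walk from the most specific scope to the least specific. The names `resolve_overrides`, `cpu`, and `memory` are hypothetical, as is the dictionary layout; a result of `None` stands for "no override, fall back to the algorithm".

```python
from typing import Optional

# Most specific first, matching: job > workflow > repo > org
SCOPES = ("job", "workflow", "repo", "org")
FIELDS = ("cpu", "memory")  # hypothetical field names


def resolve_overrides(
    overrides: dict[str, dict[str, Optional[str]]],
) -> tuple[dict[str, Optional[str]], dict[str, Optional[str]]]:
    """Resolve each field independently; return (values, scope each came from)."""
    values: dict[str, Optional[str]] = {f: None for f in FIELDS}
    sources: dict[str, Optional[str]] = {f: None for f in FIELDS}
    for scope in SCOPES:
        entry = overrides.get(scope) or {}
        for field in FIELDS:
            # A null field at this scope is inherited from the next parent scope.
            if values[field] is None and entry.get(field) is not None:
                values[field] = entry[field]
                sources[field] = scope
    return values, sources
```

For instance, an org-level override that sets only memory, combined with a job-level override that sets only CPU, resolves to the job's CPU value and the org's memory value.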

Override API

Method	Path	Description
`GET`	`/api/v1/sizing/overrides`	List all overrides
`PUT`	`/api/v1/sizing/overrides/{org}`	Upsert org-level override
`PUT`	`/api/v1/sizing/overrides/{org}/{repo}`	Upsert repo-level override
`PUT`	`/api/v1/sizing/overrides/{org}/{repo}/{workflow}`	Upsert workflow-level override
`PUT`	`/api/v1/sizing/overrides/{org}/{repo}/{workflow}/{job}`	Upsert job-level override
`DELETE`	Same paths as PUT	Remove override at that scope
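
Because the four `PUT` paths differ only in how many scope segments they carry, a client can derive the path from whichever scopes it has. The helper below is a sketch with a hypothetical name; percent-encoding each segment is an assumption about how the server expects names containing special characters, and the request body format is not shown because the source does not specify it.

```python
from urllib.parse import quote

BASE_PATH = "/api/v1/sizing/overrides"


def override_path(org, repo=None, workflow=None, job=None):
    """Build the override path for the deepest scope provided."""
    if not org:
        raise ValueError("org is required")
    parts = [org, repo, workflow, job]
    given = [p for p in parts if p is not None]
    # Scopes nest, so a deeper segment requires every shallower one.
    if parts[: len(given)] != given:
        raise ValueError("scopes must be given without gaps: org, repo, workflow, job")
    return BASE_PATH + "/" + "/".join(quote(p, safe="") for p in given)
```

The same path serves `PUT` (upsert) and `DELETE` (remove) at that scope.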

The sizing response includes `override_scope` in the meta block, indicating which level matched (`job`, `workflow`, `repo`, or `org`). When no override matched, the value is `"global"`.

OOM-Aware Sizing

When OOM events are detected (via cgroup v2 `memory.events` or the 95%-of-limit heuristic), the sizer applies special handling:

- OOM-suspect samples are excluded from the clean sample count, so they do not advance the confidence phase
- Exponential backoff on consecutive OOMs: limit × 2^consecutiveOOMs
- Node ceiling cap: backoff is bounded by the node ceiling (90% of node RAM or `--max-memory`)
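
The backoff and its cap can be sketched as below. Function names are hypothetical, values are in MiB, and treating an explicit `--max-memory` value as a replacement for the 90% default is an assumption; the source does not say how the two interact.

```python
def node_ceiling_mib(node_ram_mib, max_memory_mib=None):
    """Node ceiling: 90% of node RAM, or the --max-memory value when given."""
    if max_memory_mib is not None:
        return max_memory_mib  # assumption: an explicit value takes precedence
    return node_ram_mib * 90 // 100


def backoff_limit_mib(limit_mib, consecutive_ooms, ceiling_mib):
    """limit × 2^consecutiveOOMs, bounded by the node ceiling."""
    return min(limit_mib * 2 ** consecutive_ooms, ceiling_mib)
```

Starting from a 1024 MiB limit on a 16 GiB node, one OOM raises the limit to 2048 MiB, and four consecutive OOMs would reach 16384 MiB but are capped at the 14745 MiB ceiling.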

This ensures the sizer recovers gracefully from memory exhaustion without unbounded growth. For full details, see OOM Detection.

For the full sizing API response format, see the API Reference.