Sizing Algorithm

How CI Sizer calculates resource sizing recommendations for runners.

Overview

CI Sizer analyses historical resource usage to recommend right-sized Kubernetes resource requests and limits for each container in a CI pod. The goal is to find the smallest allocation that safely completes the job — reducing waste without causing failures.

Methodology

The sizer computes recommendations by aggregating the N most recent clean (non-OOM) runs for a given workflow/job combination. The aggressiveness of the recommendation depends on the current confidence phase.

Confidence Phases

Every workflow/job progresses through three confidence phases as clean samples accumulate:

| Phase | Clean Samples | Behaviour |
| --- | --- | --- |
| unknown | 0 | Returns bootstrap default: 4Gi memory, 500m CPU |
| learning | 1–2 | Applies 3× headroom above observed peak (conservative) |
| confident | ≥3 | Full algorithm with tight staircase buffer |
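
The phase selection can be sketched in a few lines of Python. This is an illustration only, not CI Sizer's implementation: the function names are hypothetical, and "3× headroom" is assumed here to mean three times the observed peak.

```python
from typing import Optional

BOOTSTRAP_MEMORY_MIB = 4 * 1024   # 4Gi bootstrap default
BOOTSTRAP_CPU_MILLICORES = 500    # 500m bootstrap default
LEARNING_HEADROOM = 3.0           # assumption: 3x the observed peak


def confidence_phase(clean_samples: int) -> str:
    """Map a clean-sample count to the phase names used in the table."""
    if clean_samples <= 0:
        return "unknown"
    if clean_samples < 3:
        return "learning"
    return "confident"


def early_phase_memory_mib(clean_samples: int, observed_peak_mib: float) -> Optional[float]:
    """Memory recommendation for the first two phases.

    Returns None in the confident phase, where the full algorithm applies.
    """
    phase = confidence_phase(clean_samples)
    if phase == "unknown":
        return BOOTSTRAP_MEMORY_MIB
    if phase == "learning":
        return observed_peak_mib * LEARNING_HEADROOM
    return None
```

For example, a job with two clean samples and an observed peak of 100 MiB would receive 300 MiB under this reading.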

In the confident phase, the full algorithm below applies:

  1. Collect the N most recent runs (configurable via ?runs= query parameter, 1–100)
  2. Per container, across runs:
    • CPU request — take the selected percentile (default: p95) of each run’s CPU usage, then take the maximum across runs
    • Memory request — take the peak memory of each run, then take the maximum across runs
  3. Apply buffers to add headroom above observed values
  4. Apply floor values to ensure minimum viable allocations
  5. Apply a memory ceiling — no single container can exceed the total pod memory observed across all runs (plus buffer)
  6. Round limits to clean values: CPU rounds up to the next multiple of 0.5 cores; memory rounds up to the next power of 2 in MiB
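
Steps 2 and 6 can be sketched as follows. The run record layout (`cpu_millicores` and `memory_mib` sample lists), the nearest-rank percentile method, and the choice to leave exact powers of two unchanged are all assumptions made for illustration; the document does not specify them.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (assumed method)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate_container(runs, cpu_pct=95):
    """Step 2: per-run statistic first, then the maximum across runs.

    `runs` is a list of dicts with hypothetical keys 'cpu_millicores'
    and 'memory_mib', each holding that run's usage samples.
    """
    cpu = max(percentile(r["cpu_millicores"], cpu_pct) for r in runs)
    memory = max(max(r["memory_mib"]) for r in runs)
    return cpu, memory


def round_cpu_limit_cores(cores):
    """Step 6: round up to the next multiple of 0.5 cores."""
    return math.ceil(cores * 2) / 2


def round_memory_limit_mib(mib):
    """Step 6: round up to a power of 2 in MiB.

    Assumption: a value that is already a power of 2 is kept as-is.
    """
    return 1 << max(0, math.ceil(mib) - 1).bit_length()


runs = [
    {"cpu_millicores": [100, 200, 300, 400], "memory_mib": [300, 512]},
    {"cpu_millicores": [150, 250, 900, 350], "memory_mib": [280, 640]},
]
cpu_request, memory_request = aggregate_container(runs)
```

With these two sample runs, the p95 CPU value is 900m (taken from the second run) and the peak memory is 640 MiB; a 640 MiB value rounds to a 1024 MiB limit.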

For full details on confidence phases and OOM recovery, see OOM Detection.

Query Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| runs | 5 | Number of recent runs to analyse (1–100) |
| buffer | 20 | CPU headroom percentage (memory uses the staircase below) |
| cpu_percentile | p95 | CPU stat to use: peak, p99, p95, p75, p50, avg |
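
As a rough illustration of how these parameters and their defaults fit together, the sketch below parses a query string and applies the documented ranges. `parse_sizing_query` is a hypothetical helper, not part of CI Sizer, and how the real service reports invalid values is not stated here.

```python
from urllib.parse import parse_qs

ALLOWED_CPU_STATS = ("peak", "p99", "p95", "p75", "p50", "avg")


def parse_sizing_query(query: str) -> dict:
    """Parse ?runs=, ?buffer= and ?cpu_percentile= with the documented defaults."""
    # parse_qs returns lists; keep the last value if a key is repeated.
    raw = {key: values[-1] for key, values in parse_qs(query.lstrip("?")).items()}

    runs = int(raw.get("runs", 5))
    if not 1 <= runs <= 100:
        raise ValueError("runs must be between 1 and 100")

    buffer_pct = float(raw.get("buffer", 20))  # CPU headroom percentage

    cpu_stat = raw.get("cpu_percentile", "p95")
    if cpu_stat not in ALLOWED_CPU_STATS:
        raise ValueError(f"cpu_percentile must be one of {ALLOWED_CPU_STATS}")

    return {"runs": runs, "buffer": buffer_pct, "cpu_percentile": cpu_stat}
```

Calling `parse_sizing_query("?runs=10&cpu_percentile=p99")` keeps the default 20% buffer and overrides the other two values.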

Thresholds and Floors

Every container receives a minimum viable allocation even if it was completely idle in all observed runs:

| Resource | Request Floor | Limit Floor |
| --- | --- | --- |
| CPU | 10m | 500m |
| Memory | 32Mi | 128Mi |

Request and limit floors are intentionally asymmetric: a low request allows efficient scheduling bin-packing, while a higher limit prevents OOM kills or severe throttling if a previously-idle container becomes active.
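
A minimal sketch of the floor step, using the values from the table above. The dictionary layout and the `apply_floors` name are illustrative assumptions.

```python
# Floors from the table, in millicores and MiB.
FLOORS = {
    "cpu_millicores": {"request": 10, "limit": 500},
    "memory_mib": {"request": 32, "limit": 128},
}


def apply_floors(resource: str, request: float, limit: float):
    """Raise a computed request/limit pair to at least the floor values."""
    floor = FLOORS[resource]
    return max(request, floor["request"]), max(limit, floor["limit"])
```

A container that was idle in every run (0m CPU) would still end up with a 10m request and a 500m limit, while values already above the floors pass through unchanged.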

Staircase Buffer

CPU uses a flat configurable buffer (default: 20%). Memory uses a staircase buffer whose percentage shrinks as the observed peak grows: larger allocations are inherently more stable, and over-provisioning them wastes more cluster resources.

| Observed Peak Memory | Buffer |
| --- | --- |
| < 1 GiB | 20% |
| 1 – 4 GiB | 10% |
| > 4 GiB | 5% |
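
The staircase can be expressed as a small lookup. The sketch below assumes that a peak of exactly 1 GiB or exactly 4 GiB falls in the 10% tier; the table does not say which side of the boundary those values land on.

```python
GIB_IN_MIB = 1024


def memory_buffer_pct(peak_mib: float) -> int:
    """Return the staircase buffer percentage for an observed peak."""
    if peak_mib < 1 * GIB_IN_MIB:
        return 20
    if peak_mib <= 4 * GIB_IN_MIB:  # assumption: boundaries belong to the 10% tier
        return 10
    return 5


def buffered_memory_mib(peak_mib: float) -> float:
    """Observed peak plus its staircase headroom, in MiB."""
    return peak_mib * (100 + memory_buffer_pct(peak_mib)) / 100
```

Under these assumptions, a 512 MiB peak becomes 614.4 MiB, a 2 GiB peak becomes 2252.8 MiB, and an 8 GiB peak becomes 8601.6 MiB.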

CPU vs Memory Enforcement

Kubernetes treats CPU and memory differently, and the sizer reflects this:

  • CPU is compressible — exceeding the limit causes throttling, not failure. The job continues, just slower.
  • Memory is incompressible — exceeding the limit triggers an OOM kill. The job fails immediately.

Memory limits are therefore always enforced. CPU enforcement is opt-in via --cpu-sizing-mode:

| Mode | Description |
| --- | --- |
| observe (default) | Compute CPU recommendations and report them, but mark enforced: false. The provider uses its own defaults. |
| enforce | Apply CPU recommendations as Kubernetes requests/limits (enforced: true). |
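
The difference between the two modes comes down to a single flag on the CPU recommendation. The sketch below is illustrative only: apart from `enforced`, the field names are hypothetical and do not describe the actual response format.

```python
CPU_SIZING_MODES = ("observe", "enforce")


def cpu_recommendation(mode: str, request_millicores: int, limit_millicores: int) -> dict:
    """Attach the enforcement flag to a computed CPU recommendation."""
    if mode not in CPU_SIZING_MODES:
        raise ValueError(f"unknown --cpu-sizing-mode: {mode}")
    return {
        "request": f"{request_millicores}m",  # hypothetical field name
        "limit": f"{limit_millicores}m",      # hypothetical field name
        "enforced": mode == "enforce",        # false in observe mode
    }
```

In observe mode the recommendation is still computed and reported, so operators can compare it with the provider defaults before switching to enforce.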

Memory QoS

The --memory-qos flag controls the memory QoS class:

| Mode | Description |
| --- | --- |
| guaranteed (default) | Memory request equals memory limit (Guaranteed QoS class). Prevents overcommit. |
| burstable | Memory request is less than limit (Burstable QoS class). Allows burst above the request. |
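
One way to picture the two modes is shown below. How the burstable request is actually derived is not specified in this document; the sketch assumes it is the buffered observation while the limit is the rounded value, and both parameter names are hypothetical.

```python
def memory_request_and_limit(mode: str, buffered_mib: int, rounded_limit_mib: int):
    """Return (request, limit) in MiB for a --memory-qos mode."""
    if mode == "guaranteed":
        # Request equals limit, so the memory cannot be overcommitted.
        return rounded_limit_mib, rounded_limit_mib
    if mode == "burstable":
        # Assumption: request is the buffered observation, capped at the limit.
        return min(buffered_mib, rounded_limit_mib), rounded_limit_mib
    raise ValueError(f"unknown --memory-qos mode: {mode}")
```

With a buffered value of 614 MiB and a rounded limit of 1024 MiB, guaranteed mode yields 1024/1024 and burstable mode yields 614/1024.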

Sizing Overrides

Operators can pin CPU and/or memory values at any scope instead of relying on the algorithm. Overrides are useful for known-heavy jobs, cost caps, or bootstrapping new workflows before enough historical data exists.

Scope Hierarchy

Overrides resolve by specificity; the most specific matching scope wins:

job > workflow > repo > org

Fields left null in an override are inherited from the next parent scope (or the algorithm). This means you can override only memory at the org level and let CPU continue to be computed from data.
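
Per-field inheritance can be sketched as a walk from the most specific scope to the least specific one. The data layout (a dict of scope name to field values) and the `resolve_overrides` name are assumptions for illustration.

```python
SCOPES = ("job", "workflow", "repo", "org")  # most specific first


def resolve_overrides(overrides: dict, fields=("cpu", "memory")) -> dict:
    """Resolve each field independently across scopes.

    `overrides` maps a scope name to a dict of field values, where None
    means "inherit". A field resolves to (value, scope), or (None, None)
    when no scope pins it and the algorithm should compute it.
    """
    resolved = {}
    for field in fields:
        resolved[field] = (None, None)
        for scope in SCOPES:
            value = (overrides.get(scope) or {}).get(field)
            if value is not None:
                resolved[field] = (value, scope)
                break
    return resolved


example = resolve_overrides({
    "org": {"cpu": None, "memory": "8Gi"},
    "job": {"cpu": "2", "memory": None},
})
```

Here `example` pins CPU from the job scope and memory from the org scope, because each field is resolved on its own.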

Override API

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/sizing/overrides | List all overrides |
| PUT | /api/v1/sizing/overrides/{org} | Upsert org-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo} | Upsert repo-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo}/{workflow} | Upsert workflow-level override |
| PUT | /api/v1/sizing/overrides/{org}/{repo}/{workflow}/{job} | Upsert job-level override |
| DELETE | Same paths as PUT | Remove override at that scope |

When an override is active, the sizing response includes override_scope in the meta block indicating which level matched (job, workflow, repo, org). When no override matched, the value is "global".
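
The PUT and DELETE paths follow one pattern, which a client could build with a helper like the one below. `override_path` is hypothetical, and percent-encoding each segment is an assumption about how names with special characters should be sent; the request body format is not covered here.

```python
from urllib.parse import quote

OVERRIDES_BASE = "/api/v1/sizing/overrides"


def override_path(org, repo=None, workflow=None, job=None) -> str:
    """Build the override path for the deepest scope provided."""
    parts = [org, repo, workflow, job]
    given = [p for p in parts if p is not None]
    # Scopes nest, so a deeper segment requires every parent segment.
    if not given or parts[:len(given)] != given:
        raise ValueError("scopes must be given in order: org, repo, workflow, job")
    return "/".join([OVERRIDES_BASE] + [quote(p, safe="") for p in given])
```

For instance, `override_path("acme", "web", "ci.yml", "build")` yields the job-level path, and `override_path("acme")` yields the org-level path.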

OOM-Aware Sizing

When OOM events are detected (via cgroup v2 memory.events or the 95%-of-limit heuristic), the sizer applies special handling:

  • OOM-suspect samples are excluded from the clean sample count — they do not advance the confidence phase
  • Exponential backoff on consecutive OOMs: limit × 2^consecutiveOOMs
  • Node ceiling cap — backoff is bounded by the node ceiling (90% of node RAM or --max-memory)
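
The backoff and cap above can be sketched as follows. The function names are hypothetical, and two details are assumptions: that an explicit --max-memory value takes precedence over the 90% rule, and that the heuristic flags a run when its peak reaches or exceeds 95% of the limit.

```python
def node_ceiling_mib(node_ram_mib: float, max_memory_mib=None) -> float:
    """Node ceiling: --max-memory if set (assumed precedence), else 90% of node RAM."""
    if max_memory_mib is not None:
        return max_memory_mib
    return node_ram_mib * 0.9


def backoff_limit_mib(limit_mib: float, consecutive_ooms: int, ceiling_mib: float) -> float:
    """limit x 2^consecutiveOOMs, bounded by the node ceiling."""
    return min(limit_mib * 2 ** consecutive_ooms, ceiling_mib)


def is_oom_suspect(peak_mib: float, limit_mib: float, threshold: float = 0.95) -> bool:
    """95%-of-limit heuristic (assumed to be inclusive of the threshold)."""
    return peak_mib >= threshold * limit_mib


ceiling = node_ceiling_mib(16 * 1024)  # 16 GiB node
```

On a 16 GiB node, a 1024 MiB limit grows to 2048 MiB after one OOM and 4096 MiB after two, while a 4096 MiB limit with three consecutive OOMs is capped at the ceiling of roughly 14745.6 MiB instead of reaching 32768 MiB.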

This ensures the sizer recovers gracefully from memory exhaustion without unbounded growth. For full details, see OOM Detection.

For the full sizing API response format, see the API Reference.