OOM Detection & Confidence-Gated Sizing

How CI Sizer detects out-of-memory events and adapts sizing recommendations through confidence phases.

Overview

CI Sizer v0.7.0 introduces confidence-gated sizing, a system that adjusts how aggressive a recommendation is according to how much data is available for a given workflow/job. Combined with OOM detection, the sizer can recover automatically from memory exhaustion events: it raises the memory limit with exponential backoff and notifies the source forge via a commit status.

Confidence Phases

Every workflow/job combination progresses through three confidence phases as the sizer accumulates clean (non-OOM) samples:

Phase	Condition	Behaviour
unknown	0 clean samples	Returns a bootstrap default of `4Gi` memory. API responds with HTTP 200 and `meta.confidence_phase == "unknown"`.
learning	1–2 clean samples	Applies 3× headroom above observed peak. Conservative to avoid OOMs while data is sparse.
confident	≥3 clean samples	Uses the tight staircase buffer (20%/10%/5%). Full algorithm precision.
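
The thresholds in the table can be read as a simple mapping from clean sample count to phase. The following is a minimal sketch under that reading; the function name phaseFor and the constants are hypothetical and are not taken from internal/receiver/sizing/confidence.go.

```go
package main

import "fmt"

// Phase names mirror the values listed in the table above.
const (
	PhaseUnknown   = "unknown"
	PhaseLearning  = "learning"
	PhaseConfident = "confident"
)

// phaseFor maps a clean (non-OOM) sample count to a confidence phase.
// Thresholds come from the table; the function itself is illustrative.
func phaseFor(cleanSamples int) string {
	switch {
	case cleanSamples <= 0:
		return PhaseUnknown
	case cleanSamples < 3:
		return PhaseLearning
	default:
		return PhaseConfident
	}
}

func main() {
	for _, n := range []int{0, 1, 2, 3, 10} {
		fmt.Printf("%d clean samples -> %s\n", n, phaseFor(n))
	}
}
```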

Client Note

The unknown phase (0 clean samples) now returns HTTP 200 instead of 404. Clients should check meta.confidence_phase to distinguish the bootstrap default from data-driven recommendations.
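
On the client side, the check might look like the sketch below. It assumes the response body is JSON with a top-level meta object; the type sizingResponse and the function isBootstrapDefault are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sizingResponse models only the field this check needs (assumed shape).
type sizingResponse struct {
	Meta struct {
		ConfidencePhase string `json:"confidence_phase"`
	} `json:"meta"`
}

// isBootstrapDefault reports whether a sizing response body carries the
// bootstrap default rather than a data-driven recommendation.
func isBootstrapDefault(body []byte) (bool, error) {
	var resp sizingResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, fmt.Errorf("decode sizing response: %w", err)
	}
	return resp.Meta.ConfidencePhase == "unknown", nil
}

func main() {
	body := []byte(`{"meta": {"confidence_phase": "unknown"}}`)
	bootstrap, err := isBootstrapDefault(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bootstrap default:", bootstrap)
}
```

Because the status code is 200 in every phase, a client that previously treated 404 as "no data yet" would need a check like this instead.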

OOM Detection

Cgroup v2 Detection

The collector sidecar reads the cgroup v2 memory.events file and monitors the oom_kill counter. When the counter increments during a run, the sample is marked as an OOM event.

Source: internal/cgroup/oom.go
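
To illustrate the idea, the sketch below parses the oom_kill counter from the text of a memory.events file and compares two readings. It is not the code in internal/cgroup/oom.go; the function names parseOOMKill and oomOccurred are hypothetical, and the sample contents are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOOMKill extracts the oom_kill counter from memory.events content,
// which consists of "key value" pairs, one per line.
func parseOOMKill(content string) (uint64, error) {
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "oom_kill" {
			return strconv.ParseUint(fields[1], 10, 64)
		}
	}
	return 0, fmt.Errorf("oom_kill counter not found")
}

// oomOccurred reports whether the counter incremented between two readings.
func oomOccurred(before, after uint64) bool {
	return after > before
}

func main() {
	// Sample file contents at the start and end of a run (made-up values).
	start := "low 0\nhigh 0\nmax 3\noom 0\noom_kill 0\n"
	end := "low 0\nhigh 0\nmax 9\noom 1\noom_kill 1\n"

	before, err := parseOOMKill(start)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	after, err := parseOOMKill(end)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("OOM during run:", oomOccurred(before, after))
}
```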

Heuristic Detection

For environments where the oom_kill counter is not available (e.g., cgroup v1), the sizer applies a heuristic: if the observed peak memory reaches ≥95% of the configured limit, the sample is marked as OOM-suspect. OOM-suspect samples are excluded from the clean sample count used for confidence phase progression.
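
The 95% rule can be expressed as a single comparison. This is a sketch only: the name isOOMSuspect is hypothetical, and treating a limit of 0 as "no limit configured" is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// suspectThresholdPercent is the 95% threshold described above.
const suspectThresholdPercent = 95

// isOOMSuspect reports whether the observed peak reached at least 95% of
// the configured limit. Integer arithmetic avoids floating-point rounding:
// peak/limit >= 95/100 is equivalent to peak*100 >= limit*95.
func isOOMSuspect(peakBytes, limitBytes uint64) bool {
	if limitBytes == 0 {
		return false // assumption: 0 means no limit, so nothing to compare
	}
	return peakBytes*100 >= limitBytes*suspectThresholdPercent
}

func main() {
	const MiB = 1 << 20
	// Peak at roughly 95.02% of a 2048 MiB limit.
	fmt.Println(isOOMSuspect(1946*MiB, 2048*MiB))
	// Peak at 50% of the limit.
	fmt.Println(isOOMSuspect(1024*MiB, 2048*MiB))
}
```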

Exponential Backoff

When consecutive OOMs are detected for a workflow/job, the sizer applies exponential backoff to the memory limit:

new_limit = current_limit × 2^consecutiveOOMs

The backoff is capped at the node ceiling to prevent unbounded growth.
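
As a worked sketch of the formula and the cap, the function below doubles the limit once per consecutive OOM and stops at the ceiling. It assumes current_limit is the limit in effect before the first OOM of the streak; the name backoffLimit and the example values are hypothetical.

```go
package main

import "fmt"

const GiB = 1 << 30

// backoffLimit returns current * 2^consecutiveOOMs, capped at ceiling.
// Doubling step by step lets the cap apply before any overflow can occur.
func backoffLimit(current, ceiling uint64, consecutiveOOMs uint) uint64 {
	limit := current
	for i := uint(0); i < consecutiveOOMs; i++ {
		if limit > ceiling/2 {
			return ceiling // the next doubling would exceed the ceiling
		}
		limit *= 2
	}
	if limit > ceiling {
		return ceiling
	}
	return limit
}

func main() {
	// Example: a 4Gi limit on a node whose ceiling is 28Gi.
	for n := uint(0); n <= 4; n++ {
		fmt.Printf("%d consecutive OOMs -> %dGi\n", n, backoffLimit(4*GiB, 28*GiB, n)/GiB)
	}
}
```

With these example values the limit grows 4Gi, 8Gi, 16Gi and then stays at the 28Gi ceiling.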

Node Ceiling

The maximum memory allocation is bounded by the node ceiling, which is determined by:

Auto-detection — reads /proc/meminfo and uses 90% of total node RAM
Manual override — configurable via --max-memory flag

Similarly, --max-cpu caps the maximum CPU allocation.
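
The auto-detection step could be sketched as follows: read the MemTotal line from the contents of /proc/meminfo (reported in kB) and take 90% of it. The function names are hypothetical, and giving the manual override precedence over the detected value is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// autoCeilingBytes returns 90% of MemTotal, given /proc/meminfo content.
// The MemTotal line looks like "MemTotal:       32658468 kB".
func autoCeilingBytes(meminfo string) (uint64, error) {
	for _, line := range strings.Split(meminfo, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kib, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse MemTotal: %w", err)
			}
			totalBytes := kib * 1024
			return totalBytes / 10 * 9, nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

// effectiveCeiling prefers a manual override (as from --max-memory) when
// one is set. Assumption: an override of 0 means "not set".
func effectiveCeiling(autoBytes, overrideBytes uint64) uint64 {
	if overrideBytes > 0 {
		return overrideBytes
	}
	return autoBytes
}

func main() {
	// Sample content; a real caller would read /proc/meminfo instead.
	sample := "MemTotal:       32658468 kB\nMemFree:         1234567 kB\n"
	auto, err := autoCeilingBytes(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("auto ceiling (bytes):", auto)
	fmt.Println("effective ceiling (bytes):", effectiveCeiling(auto, 0))
}
```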

Commit Status Notifications

When an OOM event is detected, the receiver posts a commit status notification to the source forge, alerting developers that their CI run was killed due to memory exhaustion.

Forgejo / GitHub

POST /api/v1/repos/{owner}/{repo}/statuses/{sha}

Note: the /api/v1 prefix is the Forgejo route. GitHub's hosted REST API serves the equivalent endpoint at POST /repos/{owner}/{repo}/statuses/{sha} on api.github.com.

GitLab

POST /api/v4/projects/{id}/statuses/{sha}

Authentication uses the PRIVATE-TOKEN header for GitLab and a Bearer token for Forgejo/GitHub.
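
The sketch below builds (but does not send) a status request for the Forgejo and GitLab endpoints shown above. It is not the code in internal/receiver/notify/notify.go: the statusTarget type, the forge identifiers, the description text, and the choice of the "failure" and "failed" state values are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// statusTarget describes where to post a commit status (hypothetical type).
type statusTarget struct {
	Forge   string // "forgejo" or "gitlab" (illustrative identifiers)
	BaseURL string
	Repo    string // "owner/repo" for Forgejo, project ID for GitLab
	SHA     string
	Token   string
}

// newStatusRequest builds the POST request with forge-specific URL and auth.
func newStatusRequest(t statusTarget) (*http.Request, error) {
	base := strings.TrimRight(t.BaseURL, "/")
	var endpoint, state string
	switch t.Forge {
	case "forgejo":
		endpoint = fmt.Sprintf("%s/api/v1/repos/%s/statuses/%s", base, t.Repo, t.SHA)
		state = "failure" // assumed state value
	case "gitlab":
		endpoint = fmt.Sprintf("%s/api/v4/projects/%s/statuses/%s", base, url.PathEscape(t.Repo), t.SHA)
		state = "failed" // assumed state value
	default:
		return nil, fmt.Errorf("unsupported forge %q", t.Forge)
	}
	payload, err := json.Marshal(map[string]string{
		"state":       state,
		"description": "CI run killed: out of memory",
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	if t.Forge == "gitlab" {
		req.Header.Set("PRIVATE-TOKEN", t.Token)
	} else {
		req.Header.Set("Authorization", "Bearer "+t.Token)
	}
	return req, nil
}

func main() {
	req, err := newStatusRequest(statusTarget{
		Forge: "forgejo", BaseURL: "https://forge.example",
		Repo: "acme/app", SHA: "abc123", Token: "s3cret",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
}
```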

Configuration

Flag	Environment Variable	Description	Default
`--notify-enabled`	`RECEIVER_NOTIFY_ENABLED`	Enable commit status notifications	`true`
`--notify-base-url`	`RECEIVER_NOTIFY_BASE_URL`	Forge base URL (auto-detected from push metadata if unset)	—
`--notify-token`	`RECEIVER_NOTIFY_TOKEN`	API token for forge commit status API	—

In most deployments, only --notify-token is required. The base URL and node ceiling are auto-detected.

Source: internal/receiver/notify/notify.go
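
One common way to wire flags and environment variables together is to use the environment value as the flag's default, so an explicit flag wins. The sketch below shows that pattern with Go's standard flag package; whether the receiver resolves precedence this way is an assumption, and notifyConfig and parseNotifyConfig are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// notifyConfig holds the three notification settings from the table.
type notifyConfig struct {
	Enabled bool
	BaseURL string
	Token   string
}

// parseNotifyConfig resolves settings from args, falling back to the
// environment (via getenv) and then to the documented defaults.
func parseNotifyConfig(args []string, getenv func(string) string) (notifyConfig, error) {
	enabled := true // documented default
	if v := getenv("RECEIVER_NOTIFY_ENABLED"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return notifyConfig{}, fmt.Errorf("RECEIVER_NOTIFY_ENABLED: %w", err)
		}
		enabled = b
	}

	var cfg notifyConfig
	fs := flag.NewFlagSet("receiver", flag.ContinueOnError)
	fs.BoolVar(&cfg.Enabled, "notify-enabled", enabled, "enable commit status notifications")
	fs.StringVar(&cfg.BaseURL, "notify-base-url", getenv("RECEIVER_NOTIFY_BASE_URL"), "forge base URL")
	fs.StringVar(&cfg.Token, "notify-token", getenv("RECEIVER_NOTIFY_TOKEN"), "forge API token")
	if err := fs.Parse(args); err != nil {
		return notifyConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseNotifyConfig(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Avoid printing the token itself.
	fmt.Printf("enabled=%t base_url=%q token_set=%t\n", cfg.Enabled, cfg.BaseURL, cfg.Token != "")
}
```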

Web UI Indicators

The web dashboard surfaces OOM information through several visual elements:

Confidence badges — displayed per workflow/job showing the current phase (unknown, learning, confident)
OOM banners — warning banners on affected workflow/job pages
Red markers on charts — individual OOM’d runs are highlighted with red markers on the timeline chart

Sizing Response

When OOM detection is active, the sizing API response includes additional fields in the meta block:

{
  "meta": {
    "confidence_phase": "learning",
    "clean_samples": 2,
    "consecutive_ooms": 1,
    "node_ceiling_memory": "28Gi",
    "node_ceiling_cpu": "14"
  }
}
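
A client could decode these fields into a typed struct and convert the memory ceiling to bytes. The sketch below assumes the memory value uses binary suffixes such as Gi, as in the example; the Meta type and the parseMemory helper are hypothetical, and the sample values are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Meta mirrors the fields of the meta block (field names from the example).
type Meta struct {
	ConfidencePhase   string `json:"confidence_phase"`
	CleanSamples      int    `json:"clean_samples"`
	ConsecutiveOOMs   int    `json:"consecutive_ooms"`
	NodeCeilingMemory string `json:"node_ceiling_memory"`
	NodeCeilingCPU    string `json:"node_ceiling_cpu"`
}

// parseMemory converts a quantity such as "28Gi" to bytes. Only the Ki, Mi,
// Gi and Ti suffixes are handled; a bare number is treated as bytes.
func parseMemory(q string) (uint64, error) {
	units := []struct {
		suffix string
		factor uint64
	}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}, {"Ti", 1 << 40}}
	for _, u := range units {
		if strings.HasSuffix(q, u.suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(q, u.suffix), 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse %q: %w", q, err)
			}
			return n * u.factor, nil
		}
	}
	return strconv.ParseUint(q, 10, 64)
}

func main() {
	body := []byte(`{"meta": {"confidence_phase": "learning", "clean_samples": 2,
		"consecutive_ooms": 1, "node_ceiling_memory": "28Gi", "node_ceiling_cpu": "14"}}`)

	var resp struct {
		Meta Meta `json:"meta"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		fmt.Println("error:", err)
		return
	}
	ceiling, err := parseMemory(resp.Meta.NodeCeilingMemory)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("phase=%s clean=%d ooms=%d ceiling_bytes=%d\n",
		resp.Meta.ConfidencePhase, resp.Meta.CleanSamples, resp.Meta.ConsecutiveOOMs, ceiling)
}
```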

Source Files

File	Purpose
`internal/cgroup/oom.go`	Cgroup v2 OOM detection via `memory.events`
`internal/receiver/sizing/confidence.go`	Confidence phase logic and phase transitions
`internal/receiver/notify/notify.go`	Commit status notification dispatch