Calculating OEE with Overlapping Maintenance Windows

A scheduled maintenance window that intersects an unplanned breakdown, a shift transition, or a microstop is one of the most common ways an OEE pipeline quietly produces wrong numbers. This is a specific case of event-to-downtime mapping: raw telemetry from PLCs, OPC-UA servers, and edge gateways never arrives as clean, mutually exclusive time buckets. It arrives as asynchronous state transitions, heartbeat gaps, and overlapping event flags that must be temporally aligned before any aggregation runs. When two downtime intervals overlap and a naive routine adds their durations, it double-counts the shared minutes, deflates Availability below its true value, and corrupts every performance baseline downstream. The fix is set-theoretic: compute the union of overlapping intervals, attribute each merged block to exactly one priority-ranked category, then feed clean, non-overlapping windows into the OEE math.

The Availability term this protects is:

$A = \frac{\text{Planned Production Time} - \text{Downtime}}{\text{Planned Production Time}}$

where Downtime is the total of deduplicated unplanned intervals and scheduled maintenance is excluded from Planned Production Time entirely. Get the interval algebra wrong and both the numerator and denominator drift.
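To make the drift concrete, the formula can be evaluated directly. The sketch below is illustrative only: the function name `availability` and the shift figures (a 480-minute shift, 60 minutes of scheduled maintenance, and a 30-minute breakdown that falls entirely inside the maintenance window) are assumptions, not values from a real line.

```python
def availability(shift_min: float, maintenance_min: float, downtime_min: float) -> float:
    """A = (PPT - Downtime) / PPT, with scheduled maintenance removed from PPT."""
    planned_production = shift_min - maintenance_min
    if planned_production <= 0:
        raise ValueError("no planned production time in this window")
    return (planned_production - downtime_min) / planned_production


# Hypothetical shift: the breakdown is nested inside the maintenance window,
# so after deduplication it contributes no unplanned downtime at all.
deduplicated = availability(shift_min=480, maintenance_min=60, downtime_min=0)

# A naive pipeline still charges the nested 30 minutes as unplanned downtime.
double_counted = availability(shift_min=480, maintenance_min=60, downtime_min=30)

print(round(deduplicated, 4))    # 1.0
print(round(double_counted, 4))  # 0.9286
```

Under these assumed figures, the naive path understates Availability by roughly seven percentage points even though no production time was lost outside the planned window.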

Variant 1: Interval Merging with the Sweep-Line Union

Before computing Availability, Performance, or Quality, every event stream must be normalized to a single UTC timeline — stripping DST artifacts, applying monotonic clock corrections to eliminate backward timestamp drift, and aligning PLC heartbeat signals to a consistent sampling frequency. The same clock drift correction discipline used upstream applies here verbatim; interval algebra on non-monotonic timestamps produces phantom overlaps and negative durations. Standard-library Python datetime primitives handle the timezone normalization once the timestamps are monotonic.
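A minimal sketch of that normalization step, using only the standard library. The helper names `to_utc` and `enforce_monotonic` are hypothetical, and clamping a backward step to the running maximum is just one possible correction policy, assumed here for brevity; a real pipeline might re-estimate drift or quarantine the sample instead.

```python
from datetime import datetime, timedelta, timezone


def to_utc(ts: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC; reject naive ones."""
    if ts.tzinfo is None or ts.utcoffset() is None:
        raise ValueError("naive timestamp: source timezone is unknown")
    return ts.astimezone(timezone.utc)


def enforce_monotonic(stamps: list[datetime]) -> list[datetime]:
    """Clamp any timestamp that steps backward to the running maximum.

    Assumption: clamping is an acceptable policy for this sketch.
    """
    out: list[datetime] = []
    for ts in stamps:
        if out and ts < out[-1]:
            ts = out[-1]
        out.append(ts)
    return out


# Hypothetical PLC samples reported with a fixed +02:00 offset; the third
# sample steps backward by five seconds.
plant_tz = timezone(timedelta(hours=2))
raw = [
    datetime(2024, 3, 1, 9, 0, 0, tzinfo=plant_tz),
    datetime(2024, 3, 1, 9, 0, 10, tzinfo=plant_tz),
    datetime(2024, 3, 1, 9, 0, 5, tzinfo=plant_tz),
]
clean = enforce_monotonic([to_utc(ts) for ts in raw])
```

After this pass every timestamp is in UTC and the sequence never decreases, so later interval arithmetic cannot produce a negative duration from these samples.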

Once normalized, the pipeline merges overlapping windows into a single contiguous block instead of summing them as separate fragments. A sweep-line pass over start-sorted intervals is a standard, deterministic way to do this.

Suppose two overlapping windows have individual durations that total 90 minutes but together cover only 60 minutes of the timeline. A naive sum yields 90 minutes of downtime; the merged union is 60 minutes, attributed to the higher-priority Maintenance category. The merge sorts by start time, then by priority, so resolution is deterministic regardless of arrival order:

from dataclasses import dataclass
from datetime import datetime


@dataclass
class EventInterval:
    start: datetime
    end: datetime
    category: str
    priority: int  # lower integer = higher priority


def merge_overlapping_intervals(
    intervals: list[EventInterval],
) -> list[EventInterval]:
    """Collapse overlapping intervals into their union.

    Overlapping windows are extended to a single contiguous block and
    attributed to the highest-priority category present, so a planned
    maintenance window bleeding into a breakdown counts once, not twice.
    """
    if not intervals:
        return []

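    # Sort by start, then priority, so resolution is deterministic.
    ordered = sorted(intervals, key=lambda x: (x.start, x.priority))
    # Copy every interval we keep: the merge below rewrites end/category/
    # priority in place, and the caller's objects must not be mutated.
    first = ordered[0]
    merged: list[EventInterval] = [
        EventInterval(first.start, first.end, first.category, first.priority)
    ]

    for current in ordered[1:]:
        last = merged[-1]
        if current.start < last.end:  # overlap (half-open [start, end))
            last.end = max(last.end, current.end)
            if current.priority < last.priority:
                last.category = current.category
                last.priority = current.priority
        else:
            merged.append(
                EventInterval(
                    current.start, current.end, current.category, current.priority
                )
            )
    return merged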

Treat intervals as half-open [start, end) so a window ending at exactly 09:30:00 and another starting at 09:30:00 are adjacent, not overlapping — this prevents zero-width phantom overlaps at boundaries.
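The 90-versus-60 arithmetic can be reproduced with a stripped-down version of the same sweep. This is a sketch under stated assumptions: the two windows (a 60-minute maintenance block with a 30-minute breakdown nested inside it) are hypothetical values chosen to match those totals, and `naive_minutes` and `union_minutes` are illustrative helper names, not part of the pipeline above.

```python
from datetime import datetime

Span = tuple[datetime, datetime]  # half-open [start, end)


def naive_minutes(spans: list[Span]) -> float:
    """Sum every duration independently (double-counts overlaps)."""
    return sum((end - start).total_seconds() for start, end in spans) / 60


def union_minutes(spans: list[Span]) -> float:
    """Sweep start-sorted spans and sum the length of their union."""
    total = 0.0
    block_start = block_end = None
    for start, end in sorted(spans):
        if block_end is None or start >= block_end:  # gap or adjacent: new block
            if block_end is not None:
                total += (block_end - block_start).total_seconds()
            block_start, block_end = start, end
        else:  # overlap: extend the open block
            block_end = max(block_end, end)
    if block_end is not None:
        total += (block_end - block_start).total_seconds()
    return total / 60


day = (2024, 3, 1)
spans = [
    (datetime(*day, 9, 0), datetime(*day, 10, 0)),   # Maintenance, 60 min
    (datetime(*day, 9, 15), datetime(*day, 9, 45)),  # Breakdown, 30 min, nested
]
print(naive_minutes(spans))  # 90.0
print(union_minutes(spans))  # 60.0
```

Because the comparison is `start >= block_end`, two windows that merely touch at 09:30:00 open a new block rather than merging, which is the half-open behaviour described above.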

Variant 2: Priority Masking for Nested and Partial Overlaps

The harder case is a nested overlap: a scheduled maintenance window spans T1–T4 while an unplanned breakdown fires at T2 and clears at T3 entirely inside it. The pipeline must keep the union (T1–T4) and attribute the overlap to the higher-priority class, not split it into three separately counted fragments. Production pipelines enforce a strict priority hierarchy during classification:

1. Scheduled / planned maintenance: excluded from the OEE denominator (Planned Production Time).
2. Unplanned breakdowns and changeovers: deducted from Availability.
3. Microstops and speed losses: deducted from Performance, not Availability.
4. Running state: lowest priority, the baseline.

Applying this mask after the union merge guarantees a lower-priority event can never overwrite a higher-priority state. The mask resolves every timeline cell to one owner before any duration is summed, which is what keeps root-cause attribution auditable and aligns with the canonical state contract defined in event-to-downtime mapping.

from datetime import timedelta


def split_by_oee_dimension(
    merged: list[EventInterval],
) -> dict[str, timedelta]:
    """Aggregate deduplicated intervals into OEE buckets.

    Maintenance reduces Planned Production Time; breakdowns reduce
    Availability; microstops reduce Performance. Because intervals are
    already a non-overlapping union, summation cannot double-count.
    """
    buckets: dict[str, timedelta] = {
        "planned_downtime": timedelta(),
        "availability_loss": timedelta(),
        "performance_loss": timedelta(),
    }
    routing = {
        "Maintenance": "planned_downtime",
        "Breakdown": "availability_loss",
        "Changeover": "availability_loss",
        "Microstop": "performance_loss",
    }
    for iv in merged:
        bucket = routing.get(iv.category)
        if bucket is None:
            continue  # Running / no-demand: no loss attributed
        buckets[bucket] += iv.end - iv.start
    return buckets
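Note that `merge_overlapping_intervals` assigns the whole merged block to its highest-priority category. When a breakdown only partially overlaps a maintenance window and the non-overlapping remainder should still count against Availability, one option is to resolve ownership per timeline segment instead. The sketch below is an assumption about how that could look, not the pipeline's confirmed method; `resolve_owners` and the tuple layout are hypothetical names introduced for illustration.

```python
from datetime import datetime

# (start, end, category, priority); a lower priority integer wins.
Event = tuple[datetime, datetime, str, int]


def resolve_owners(events: list[Event]) -> list[tuple[datetime, datetime, str]]:
    """Cut the timeline at every start/end and give each piece one owner."""
    cuts = sorted({t for start, end, _, _ in events for t in (start, end)})
    resolved: list[tuple[datetime, datetime, str]] = []
    for left, right in zip(cuts, cuts[1:]):
        # Each elementary piece lies fully inside or fully outside an event.
        covering = [e for e in events if e[0] <= left and right <= e[1]]
        if not covering:
            continue  # nothing active in this slice
        owner = min(covering, key=lambda e: e[3])[2]
        if resolved and resolved[-1][2] == owner and resolved[-1][1] == left:
            resolved[-1] = (resolved[-1][0], right, owner)  # extend same owner
        else:
            resolved.append((left, right, owner))
    return resolved


day = (2024, 3, 1)
events = [
    (datetime(*day, 9, 0), datetime(*day, 10, 0), "Maintenance", 1),
    (datetime(*day, 9, 40), datetime(*day, 10, 20), "Breakdown", 2),
]
for start, end, owner in resolve_owners(events):
    print(start.time(), end.time(), owner)
# 09:00:00 10:00:00 Maintenance
# 10:00:00 10:20:00 Breakdown
```

The 20 minutes of shared time go to Maintenance and only the 20-minute tail after 10:00 is left as Breakdown. This version scans every event for every segment, which is fine for a sketch but would need an active-set sweep for large event volumes.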

Variant 3: Microstop Threshold Guards Against Maintenance Bleed

Microstop classification is acutely sensitive to threshold configuration. A 45-second PLC fault occurring during a maintenance bleed-over can be misclassified as both a microstop and a maintenance extension if the pipeline lacks temporal guards. The rule is to evaluate microstop thresholds only after scheduled maintenance windows are explicitly masked out — a two-pass strategy:

Pass 1 — exclusion mask. Truncate or discard any interval that overlaps a scheduled maintenance block, so maintenance time can never leak into the Performance denominator.
Pass 2 — residual scan. Scan the remaining timeline for stoppages exceeding the configured microstop threshold (typically 15–60 s, calibrated to the machine’s theoretical cycle time). Machine-specific calibration is covered in threshold tuning for microstops.

If a gap exceeds the heartbeat timeout but falls below the microstop threshold, classify it as Transient Noise rather than forcing it into the Performance denominator — and route genuine telemetry voids to bounded gap-filling algorithms that record every synthesized value instead of silently bridging the gap.

def apply_maintenance_mask(
    candidates: list[EventInterval],
    maintenance: list[EventInterval],
) -> list[EventInterval]:
    """Subtract maintenance windows from each candidate stoppage.

    Run this BEFORE microstop threshold evaluation so a fault during a
    bleed-over is not counted twice (as both microstop and maintenance).
    A candidate straddling a maintenance window is split into the
    surviving segments on either side; one fully inside is dropped.
    """
    blocks = merge_overlapping_intervals(list(maintenance))  # non-overlapping mask
    survivors: list[EventInterval] = []

    for c in candidates:
        cursor = c.start
        for m in blocks:
            if m.end <= cursor or m.start >= c.end:
                continue  # no overlap with the remaining candidate span
            if m.start > cursor:  # surviving segment before this mask block
                survivors.append(
                    EventInterval(cursor, m.start, c.category, c.priority)
                )
            cursor = max(cursor, m.end)  # jump past the masked region
        if cursor < c.end:  # trailing surviving segment
            survivors.append(
                EventInterval(cursor, c.end, c.category, c.priority)
            )
    return survivors
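Pass 2 can then run over whatever survives the mask. The following is a minimal sketch, assuming stoppages long enough to be breakdowns were already classified upstream; `classify_residual` is a hypothetical helper, the 30-second threshold is an illustrative value inside the 15–60 s range mentioned above, and treating a stoppage exactly equal to the threshold as a microstop is an assumption.

```python
from datetime import datetime, timedelta

Span = tuple[datetime, datetime]  # half-open [start, end)


def classify_residual(
    survivors: list[Span],
    microstop_threshold: timedelta,
) -> list[tuple[datetime, datetime, str]]:
    """Label each surviving stoppage by comparing it with the threshold."""
    labelled = []
    for start, end in survivors:
        duration = end - start
        # Assumption: a duration equal to the threshold counts as a microstop.
        if duration >= microstop_threshold:
            label = "Microstop"
        else:
            label = "Transient Noise"
        labelled.append((start, end, label))
    return labelled


day = (2024, 3, 1)
survivors = [
    (datetime(*day, 10, 0, 0), datetime(*day, 10, 0, 45)),  # 45 s fault
    (datetime(*day, 10, 5, 0), datetime(*day, 10, 5, 10)),  # 10 s blip
]
labels = [label for _, _, label in classify_residual(survivors, timedelta(seconds=30))]
print(labels)  # ['Microstop', 'Transient Noise']
```

Because the input is already the output of the maintenance mask, a fault that occurred during a bleed-over never reaches this function and cannot be counted twice.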

Variant 4: Shift Boundaries and Production-Calendar Alignment

When a maintenance window spans two shifts, a naive pipeline splits the downtime and calculates Availability against the wrong Planned Production Time on each side of the handover. Anchor the denominator to a validated production calendar rather than arbitrary 8-hour blocks: Planned Production Time = Total Shift Duration − Scheduled Downtime (aligned to calendar). Treat shift handovers as continuous intervals unless staggered maintenance is explicitly configured — the full slicing contract lives in shift boundary logic, and standardized downtime categories should map to ISA-95 Part 11 so shift-level reporting stays consistent across rolling windows.

Slice merged intervals at shift boundaries after the union and priority mask, so each fragment is attributed to the correct shift’s denominator without re-introducing overlaps. The final Availability and OEE composite produced this way is what downstream OEE formula validation checks against its sanity bounds, and any residual float error is governed by the precision and rounding contract for the ratios.
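The slicing step itself is small. Here is a sketch assuming shift boundaries arrive as a plain list of UTC datetimes from the production calendar; `slice_at_boundaries` and the 06:00 / 14:00 / 22:00 shift pattern are hypothetical, chosen only to illustrate the cut.

```python
from datetime import datetime

Span = tuple[datetime, datetime]  # half-open [start, end)


def slice_at_boundaries(
    start: datetime,
    end: datetime,
    boundaries: list[datetime],
) -> list[Span]:
    """Split [start, end) at every boundary strictly inside it."""
    if start >= end:
        return []
    inner = sorted({b for b in boundaries if start < b < end})
    cuts = [start, *inner, end]
    return list(zip(cuts, cuts[1:]))


day = (2024, 3, 1)
shift_changes = [datetime(*day, 6, 0), datetime(*day, 14, 0), datetime(*day, 22, 0)]

# Hypothetical maintenance window that straddles the 14:00 handover.
pieces = slice_at_boundaries(
    datetime(*day, 13, 30), datetime(*day, 14, 45), shift_changes
)
for piece_start, piece_end in pieces:
    print(piece_start.time(), piece_end.time())
# 13:30:00 14:00:00
# 14:00:00 14:45:00
```

The fragments are half-open and share only their cut point, so their durations add up to the original window and no overlap is reintroduced.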

Gotchas & Anti-Patterns

Summing durations before merging. Adding breakdown_minutes + maintenance_minutes double-counts every overlapped second. Always compute the union first, then sum the non-overlapping result.
Closed intervals at boundaries. Treating intervals as [start, end] (inclusive) makes back-to-back windows register a one-tick overlap, inflating downtime. Use half-open [start, end).
Thresholding microstops before masking maintenance. Evaluating the microstop threshold before applying the maintenance exclusion mask lets a fault inside a bleed-over be counted as both a microstop and a maintenance extension.
Letting low priority win. Applying the priority mask before merging (or with the wrong sort order) lets a lower-priority state overwrite a Maintenance cell: a Running cell puts planned maintenance time back into Planned Production Time, and a Breakdown cell leaks it into Availability loss.
Dropping heartbeat gaps silently. A telemetry void longer than the heartbeat timeout that is silently discarded shrinks the denominator and flatters OEE. Classify it as Unknown Downtime instead of dropping it.
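For the last item, a minimal sketch of recording heartbeat voids instead of discarding them. The helper name `find_heartbeat_gaps`, the 5-second heartbeat cadence, and the 15-second timeout are assumptions for illustration; later classification may still relabel a short void as Transient Noise.

```python
from datetime import datetime, timedelta


def find_heartbeat_gaps(
    heartbeats: list[datetime],
    timeout: timedelta,
) -> list[tuple[datetime, datetime, str]]:
    """Return every void longer than the timeout as an explicit interval."""
    ordered = sorted(heartbeats)
    return [
        (previous, current, "Unknown Downtime")
        for previous, current in zip(ordered, ordered[1:])
        if current - previous > timeout
    ]


day = (2024, 3, 1)
# Hypothetical 5-second heartbeat with a two-minute void after 09:00:10.
beats = [
    datetime(*day, 9, 0, 0),
    datetime(*day, 9, 0, 5),
    datetime(*day, 9, 0, 10),
    datetime(*day, 9, 2, 10),
    datetime(*day, 9, 2, 15),
]
gaps = find_heartbeat_gaps(beats, timeout=timedelta(seconds=15))
print(len(gaps))                             # 1
print(gaps[0][0].time(), gaps[0][1].time())  # 09:00:10 09:02:10
```

Each void becomes a labelled interval that can go through the same merge and priority mask as any other event, so the timeline stays fully accounted for.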

Quick Reference: Overlap Resolution Matrix

| Overlap scenario | Resolution rule | Counts against |
| --- | --- | --- |
| Maintenance fully contains a breakdown | Union = maintenance span; breakdown absorbed | Planned downtime only (excluded from denominator) |
| Breakdown partially overlaps maintenance | Union of both; overlap → maintenance (higher priority) | Planned downtime for overlap, Availability for remainder |
| Microstop inside a maintenance window | Mask out before threshold scan | Nothing (suppressed) |
| Two unplanned breakdowns overlap | Single merged interval = their union | Availability loss, counted once |
| Maintenance spans a shift boundary | Slice at boundary after merge; per-shift denominator | Planned downtime, split per calendar |
| Gap > heartbeat, < microstop threshold | Classify as `Transient Noise` | Nothing (not a stoppage) |
| Gap > heartbeat, source unknown | Classify as `Unknown Downtime` | Availability loss (never dropped) |

Related

Event-to-Downtime Mapping for OEE-Accurate Machine State Pipelines: the parent topic this overlap case sits inside
Threshold Tuning for Microstops: calibrate the microstop cutoff used in the two-pass mask
Shift Boundary Logic: slice merged intervals against a validated production calendar
OEE Formula Validation: sanity-check the Availability and OEE values these intervals produce
Clock Drift Correction: guarantee the monotonic timestamps interval algebra depends on
