Shift Boundary Logic for OEE Availability Windows and Downtime Attribution

Shift boundary logic is the temporal contract that decides which crew, which day, and which planned-production window every machine-state interval belongs to, and it is one of the core subsystems of Downtime Classification & OEE Calculation. The problem is deceptively narrow: a factory floor runs on local wall-clock shift schedules, but telemetry arrives stamped in UTC, so the boundary between Shift A and Shift B is never a fixed offset — it drifts twice a year with daylight saving time, repeats or skips an hour, and lands mid-fault on a live line. This page specifies the shift-calendar contract, a production-grade Python materializer and interval slicer, the calendar-arithmetic failure modes that silently corrupt availability, and the reconciliation queries that prove every second of loaded time is attributed exactly once. It assumes upstream signal hygiene from PLC tag standardization and that downtime windows already exist from event-to-downtime mapping; this layer only slices and attributes them.

Get the boundary wrong and the damage is invisible: a fault that straddles a 22:00 handover is counted twice, a spring-forward night shift reports 8 hours of planned production where only 7 elapsed, and cross-site benchmarking compares a Detroit day against a São Paulo day that began at a different absolute instant. The output still looks like a plausible percentage, which is exactly why boundary bugs survive quarterly audits.

The shift-calendar contract this layer enforces is summarized below. Every materialized shift row is a half-open UTC interval [shift_start, shift_end) carrying the metadata that OEE attribution and crew reporting depend on.

| Field | Type | Rule | Why it matters |
| --- | --- | --- | --- |
| `shift_id` | UUID / deterministic hash | Unique per facility + day + shift slot | Idempotent upserts; stable joins |
| `facility_id` | string | Maps to one IANA timezone | Wall-clock → UTC conversion is per facility |
| `shift_start` / `shift_end` | `TIMESTAMPTZ` | Half-open `[start, end)`, monotonic, no overlap | Prevents double-counting at handovers |
| `crew_id` | string | Resolved from rotation pattern (DuPont, 4-on-4-off) | Crew-level downtime attribution |
| `planned_production_sec` | int | Shift span minus scheduled breaks/no-demand | Denominator of the Availability term |
| `is_scheduled` | bool | False on holidays / unstaffed days | Excludes unscheduled time from loaded time |

The Availability term these windows feed is the canonical identity used across this section:

$A = \frac{\text{Planned Production Time} - \text{Downtime}}{\text{Planned Production Time}}$

Both terms in that fraction are scoped to a single shift window, so if the window is mis-bounded, both numerator and denominator are wrong and the error does not cancel.
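A quick worked example makes the non-cancelling error concrete. The sketch below is illustrative only: the `availability` helper and the shift figures are made-up assumptions, not measurements from a real line.

```python
def availability(planned_sec: float, downtime_sec: float) -> float:
    """Availability = (planned - downtime) / planned."""
    if planned_sec <= 0:
        raise ValueError("planned production time must be positive")
    return (planned_sec - downtime_sec) / planned_sec


# Hypothetical spring-forward night shift: 7 h really elapsed, 30 min down.
true_a = availability(7 * 3600, 30 * 60)

# Same shift mis-bounded as a hardcoded 8 h window that also swallows
# 10 min of a fault belonging to the next shift.
wrong_a = availability(8 * 3600, 40 * 60)

print(f"true={true_a:.4f} mis-bounded={wrong_a:.4f}")
# true=0.9286 mis-bounded=0.9167
```

Both inputs moved, yet the result is off by more than a percentage point while still looking plausible.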

Core concept and design contract

A shift is defined by humans in local wall-clock time — “nights run 22:00 to 06:00” — but every comparison, join, and aggregation downstream must happen in a single monotonic time domain. The first contract is therefore wall-clock definition, UTC materialization: shift slots are authored once as (local_start_time, duration) tuples plus an IANA timezone (e.g. America/Chicago), and a materializer expands them into concrete UTC [start, end) intervals for every calendar day. Never store shifts as fixed UTC offsets; an offset that is correct in January is one hour wrong from March to November.
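The fixed-offset trap is easy to demonstrate with the standard library. This is a minimal sketch; `local_boundary_to_utc` is a hypothetical helper name, and the dates are arbitrary winter and summer days.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def local_boundary_to_utc(day: date, local_start: time, tz_name: str) -> datetime:
    """Resolve one wall-clock shift boundary to its UTC instant."""
    local = datetime.combine(day, local_start, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc)


winter = local_boundary_to_utc(date(2026, 1, 15), time(22, 0), "America/Chicago")
summer = local_boundary_to_utc(date(2026, 7, 15), time(22, 0), "America/Chicago")

print(winter.isoformat())  # 2026-01-16T04:00:00+00:00
print(summer.isoformat())  # 2026-07-16T03:00:00+00:00
```

The same 22:00 handover lands on a different UTC hour in each season, which is why the slot is stored as local time plus a zone name rather than as a UTC offset.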

The second contract is half-open, gap-free, overlap-free coverage. Adjacent shifts must share an instant such that the day partitions cleanly: Shift A ends at exactly the instant Shift B begins, and the interval is closed on the left and open on the right, [start, end). Counting both endpoints as inclusive double-counts the boundary second across two shifts; leaving a gap drops it. This mirrors the ISA-95 separation between Level 1/2 device time and the Level 3/4 schedule that consumes it — the device emits absolute instants, and the schedule defines how those instants roll up into accountable production periods.
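The endpoint rule can be checked with a few lines of Python. In this sketch the `owners` helper and the two example windows are assumptions chosen for illustration; the handover is 22:00 America/Chicago on a summer day, expressed in UTC.

```python
from datetime import datetime, timezone

Window = tuple[str, datetime, datetime]  # (shift name, start, end)


def owners(ts: datetime, windows: list[Window], *, inclusive_end: bool = False) -> list[str]:
    """Return every shift that claims `ts` under the chosen endpoint rule."""
    return [
        name
        for name, start, end in windows
        if start <= ts and (ts <= end if inclusive_end else ts < end)
    ]


utc = timezone.utc
handover = datetime(2026, 6, 27, 3, 0, tzinfo=utc)  # 22:00 local, summer
windows = [
    ("B", datetime(2026, 6, 26, 19, 0, tzinfo=utc), handover),
    ("C", handover, datetime(2026, 6, 27, 11, 0, tzinfo=utc)),
]

print(owners(handover, windows))                      # ['C']
print(owners(handover, windows, inclusive_end=True))  # ['B', 'C']
```

Under the half-open rule the handover instant has exactly one owner; with inclusive endpoints it has two.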

The third contract is DST-correct duration accounting. A standard day is 86,400 seconds, but the spring-forward day is 82,800 (an hour skipped) and the fall-back day is 90,000 (an hour repeated). Planned Production Time must reflect the time that actually elapsed while the crew worked, not a hardcoded 8 hours, or Availability is systematically biased on exactly two days a year. Resolving the absolute boundary instants depends on the same discipline as clock drift correction: the schedule’s notion of “now” and the telemetry’s notion of “now” must be reconciled to the same reference before they are compared.
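One Python detail makes this contract easy to violate: subtracting two aware datetimes that share the same `tzinfo` is wall-clock arithmetic and ignores DST. The sketch below, with the hypothetical helper `night_shift_hours`, contrasts that with elapsed time computed in UTC for a 22:00 start plus 8 wall-clock hours.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Chicago")


def night_shift_hours(year: int, month: int, day: int) -> tuple[float, float]:
    """Return (wall-clock hours, elapsed hours) for a 22:00 start plus 8 h."""
    start = datetime(year, month, day, 22, 0, tzinfo=TZ)
    end = start + timedelta(hours=8)  # wall-clock arithmetic: 06:00 next day
    # Same-tzinfo subtraction ignores DST, so this is always 8.0.
    wall = (end - start).total_seconds() / 3600
    # Converting both sides to UTC first gives the time that really elapsed.
    elapsed = (
        end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    ).total_seconds() / 3600
    return wall, elapsed


print(night_shift_hours(2026, 3, 7))    # (8.0, 7.0) spring-forward night
print(night_shift_hours(2026, 6, 26))   # (8.0, 8.0) ordinary night
print(night_shift_hours(2026, 10, 31))  # (8.0, 9.0) fall-back night
```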

The fourth contract is deterministic crew attribution. Rotating patterns (DuPont, 4-on-4-off, continental) assign a different crew to the same clock slot on different days. The materializer resolves crew from a rotation function keyed on the calendar date, so a downtime event is always attributable to the team that was actually on the line.
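A rotation function can be as small as integer arithmetic on the calendar date. The sketch below is a deliberately simplified, hypothetical 4-on-4-off pattern: the anchor date, the crew names, and the fixed pairing of crews to slots are all assumptions, and a real roster would come from the plant's own schedule.

```python
from datetime import date

ANCHOR = date(2026, 1, 5)                       # hypothetical first day of block 0
ROTATION = {"D": ("A", "B"), "N": ("C", "D")}   # hypothetical crew pairs per slot
BLOCK_DAYS = 4                                  # four days on, four days off


def crew_for(slot_name: str, day: date) -> str:
    """Resolve the crew on a slot purely from the calendar date."""
    block = (day - ANCHOR).days // BLOCK_DAYS   # floor division also handles earlier dates
    crews = ROTATION[slot_name]
    return crews[block % len(crews)]


print(crew_for("D", date(2026, 1, 8)))   # A (last day of the first block)
print(crew_for("D", date(2026, 1, 9)))   # B (second block begins)
print(crew_for("N", date(2026, 1, 9)))   # D
```

Because the result depends only on the slot and the date, re-materializing a day always yields the same crew.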

Implementation

The materializer below expands wall-clock shift definitions into UTC interval rows for a date range, using zoneinfo so DST transitions are handled by the IANA database rather than by hand. It emits half-open intervals and computes per-shift planned production time net of scheduled breaks.

from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo

import pandas as pd


@dataclass(frozen=True)
class ShiftSlot:
    """A wall-clock shift definition, authored once per facility."""
    name: str                 # "A", "B", "C"
    local_start: time         # e.g. time(22, 0) for nights
    duration: timedelta       # wall-clock span, e.g. 8h
    break_sec: int = 0        # scheduled non-production within the shift


def _to_utc(local_dt_naive: datetime, tz: ZoneInfo, *, fold: int) -> datetime:
    """Localize a naive wall-clock datetime to UTC, resolving DST gaps/folds.

    `fold=0` picks the first occurrence of an ambiguous local time (pre fall-back),
    `fold=1` the second. Nonexistent spring-forward times are normalized forward
    by comparing the round-trip; this keeps boundaries monotonic.
    """
    aware = local_dt_naive.replace(tzinfo=tz, fold=fold)
    utc = aware.astimezone(ZoneInfo("UTC"))
    # Detect a spring-forward gap: the local time did not actually exist.
    if utc.astimezone(tz).replace(tzinfo=None) != local_dt_naive:
        # Shift the wall clock forward by the DST gap (typically 1h).
        aware = (local_dt_naive + timedelta(hours=1)).replace(tzinfo=tz, fold=fold)
        utc = aware.astimezone(ZoneInfo("UTC"))
    return utc


def materialize_shift_calendar(
    facility_id: str,
    tz_name: str,
    slots: list[ShiftSlot],
    start: date,
    end: date,
    crew_for: callable,            # (slot_name, calendar_date) -> crew_id
) -> pd.DataFrame:
    """Expand wall-clock shift slots into UTC [start, end) rows for [start, end]."""
    tz = ZoneInfo(tz_name)
    rows: list[dict] = []
    day = start
    while day <= end:
        for slot in slots:
            local_start = datetime.combine(day, slot.local_start)
            shift_start = _to_utc(local_start, tz, fold=0)
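            # End is start + wall-clock duration, re-localized so DST is applied
            # across the span (a night shift over fall-back is 9h elapsed).
            # Resolve it with the same fold as the start: the next slot's start
            # is this slot's end, and a different fold would overlap the two
            # windows by the repeated hour when a boundary lands inside it.
            local_end = local_start + slot.duration
            shift_end = _to_utc(local_end, tz, fold=0)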
            span_sec = int((shift_end - shift_start).total_seconds())
            rows.append({
                "shift_id": f"{facility_id}:{day.isoformat()}:{slot.name}",
                "facility_id": facility_id,
                "crew_id": crew_for(slot.name, day),
                "slot": slot.name,
                "shift_start": shift_start,
                "shift_end": shift_end,
                "planned_production_sec": span_sec - slot.break_sec,
                "is_scheduled": True,
            })
        day += timedelta(days=1)
    cal = pd.DataFrame(rows).sort_values("shift_start").reset_index(drop=True)
    return cal

With the calendar materialized, downtime windows produced upstream are clipped to each overlapping shift. The slicer uses half-open interval intersection so a boundary-straddling fault is split into contiguous segments whose durations sum to the original — no leakage, no double-count.

def slice_events_to_shifts(events: pd.DataFrame, calendar: pd.DataFrame) -> pd.DataFrame:
    """Clip each event interval to every shift it overlaps; attribute per segment.

    events:   columns asset_id, state, start_utc, end_utc (tz-aware, half-open)
    calendar: output of materialize_shift_calendar for the events' facility only
    Guarantees: sum(segment durations) == original event duration (no leakage),
    provided the calendar covers the whole event without gaps.
    """
    cal = calendar.sort_values("shift_start")
    shift_iv = pd.IntervalIndex.from_arrays(
        cal["shift_start"], cal["shift_end"], closed="left"
    )

    segments: list[dict] = []
    for evt in events.itertuples(index=False):
        probe = pd.Interval(evt.start_utc, evt.end_utc, closed="left")
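        # Positions of overlapping shifts, taken straight from the boolean mask
        # so identical windows in the calendar are never matched twice.
        for pos in shift_iv.overlaps(probe).nonzero()[0]: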
            s_start, s_end = cal.iloc[pos]["shift_start"], cal.iloc[pos]["shift_end"]
            seg_start = max(evt.start_utc, s_start)
            seg_end = min(evt.end_utc, s_end)
            if seg_start >= seg_end:        # touching boundary, no overlap
                continue
            segments.append({
                "asset_id": evt.asset_id,
                "state": evt.state,
                "shift_id": cal.iloc[pos]["shift_id"],
                "crew_id": cal.iloc[pos]["crew_id"],
                "start_utc": seg_start,
                "end_utc": seg_end,
                "duration_sec": (seg_end - seg_start).total_seconds(),
            })
    return pd.DataFrame(segments)

The half-open closed="left" interval is the load-bearing detail: an event ending exactly at 22:00:00 belongs entirely to the outgoing shift, and the night shift starting at 22:00:00 owns that instant and everything after it. Plant policy occasionally requires whole-interval attribution (the shift in which a fault started owns it entirely) rather than proportional splitting; expose that as a config flag rather than branching the math inline.
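One way to keep the policy out of the slicing math is to apply it as a separate step over the already-sliced segments. The following is a sketch under assumptions: the `attribute` function, the policy names, and the `event_id` field (which the slicer above does not emit) are hypothetical.

```python
from collections import defaultdict


def attribute(segments: list[dict], policy: str = "proportional") -> dict[str, float]:
    """Return downtime seconds per shift_id under the chosen policy.

    segments: dicts with event_id, shift_id, start (sortable), duration_sec.
    """
    totals: dict[str, float] = defaultdict(float)
    if policy == "proportional":
        # Each shift keeps exactly the seconds that fell inside its window.
        for seg in segments:
            totals[seg["shift_id"]] += seg["duration_sec"]
    elif policy == "start_shift":
        # The shift in which the event began owns the whole event.
        by_event: dict[str, list[dict]] = defaultdict(list)
        for seg in segments:
            by_event[seg["event_id"]].append(seg)
        for segs in by_event.values():
            owner = min(segs, key=lambda s: s["start"])["shift_id"]
            totals[owner] += sum(s["duration_sec"] for s in segs)
    else:
        raise ValueError(f"unknown attribution policy: {policy!r}")
    return dict(totals)


segments = [
    {"event_id": "e1", "shift_id": "A", "start": 0, "duration_sec": 120.0},
    {"event_id": "e1", "shift_id": "B", "start": 120, "duration_sec": 300.0},
]
print(attribute(segments))                        # {'A': 120.0, 'B': 300.0}
print(attribute(segments, policy="start_shift"))  # {'A': 420.0}
```

Note that whole-interval attribution can charge a shift with more downtime than its own window contains, so reconciliation tolerances need to account for the chosen policy.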

Edge cases and failure modes

Calendar arithmetic is where boundary logic breaks on real floors. Each of these classes must be handled explicitly, not discovered when a controller’s monthly OEE refuses to reconcile.

Spring-forward gap (nonexistent local time). If a shift is defined to start at 02:30 in a timezone that jumps 02:00 → 03:00, that wall-clock instant never occurs. Naive localization either raises (pytz with is_dst=None) or, as with zoneinfo, silently returns a UTC instant that maps back to a different local time. The _to_utc helper detects the gap by round-tripping and rolls the boundary forward, keeping shifts monotonic and the day at 23 hours.

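Fall-back ambiguity (repeated local time). When the clock falls back, one local hour occurs twice: 01:00 to 02:00 in US zones such as America/Chicago, 02:00 to 03:00 in Central Europe. A boundary inside that hour, such as 01:30 in Chicago, is ambiguous. The fold attribute disambiguates: fold=0 is the first (pre-transition) occurrence, fold=1 the second. Whichever occurrence a plant picks, the outgoing shift's end and the incoming shift's start must resolve with the same fold, or the two windows overlap by the repeated hour. A night shift spanning fall-back genuinely elapses 9 hours, and planned_production_sec must reflect that or Availability is understated for the crew that worked the extra hour.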
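Both transition cases can be detected up front rather than discovered in a report. The sketch below uses only `zoneinfo`; `classify_local` is a hypothetical helper, and the dates are the 2026 US transitions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local(naive: datetime, tz: ZoneInfo) -> str:
    """Return 'ambiguous', 'nonexistent', or 'normal' for a naive wall-clock time."""
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # The offsets differ only at a transition. A time that survives a
    # round trip through UTC exists (twice); one that does not was skipped.
    round_trip = first.astimezone(timezone.utc).astimezone(tz).replace(tzinfo=None)
    return "ambiguous" if round_trip == naive else "nonexistent"


tz = ZoneInfo("America/Chicago")
print(classify_local(datetime(2026, 3, 8, 2, 30), tz))   # nonexistent
print(classify_local(datetime(2026, 11, 1, 1, 30), tz))  # ambiguous
print(classify_local(datetime(2026, 11, 1, 2, 30), tz))  # normal

# The two occurrences of 01:30 are a full hour apart in UTC.
repeated = datetime(2026, 11, 1, 1, 30)
first = repeated.replace(tzinfo=tz, fold=0).astimezone(timezone.utc)
second = repeated.replace(tzinfo=tz, fold=1).astimezone(timezone.utc)
print(first.time(), second.time())  # 06:30:00 07:30:00
```

A materializer could run such a check over every authored slot and log the affected boundaries, so the fold decision is explicit rather than incidental.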

Overnight shifts crossing midnight. A 22:00–06:00 night shift belongs to two calendar dates. Decide and document the attribution key (typically the date the shift started) so dashboards, crew reports, and the time-series database sync layer all roll the same hours into the same business day.
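If the attribution key is the date the shift started, it can be derived from the shift row itself so every consumer agrees. This is a minimal sketch; `business_date` is a hypothetical helper and the start-date convention is the assumption stated above.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo


def business_date(shift_start_utc: datetime, tz_name: str) -> date:
    """Business day = local calendar date on which the shift started."""
    return shift_start_utc.astimezone(ZoneInfo(tz_name)).date()


# Night shift starting 22:00 on 26 June in Chicago, stored as UTC.
night_start = datetime(2026, 6, 27, 3, 0, tzinfo=timezone.utc)
print(business_date(night_start, "America/Chicago"))  # 2026-06-26
```

A fault at 04:00 local on 27 June inside that shift is then reported under 26 June, because the key comes from the shift and not from the event timestamp.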

Clock drift at the boundary. A PLC whose local clock has drifted 4 seconds against the MES stamps a fault at 21:59:58 that the floor experienced as 22:00:02 — landing it in the wrong shift. Boundary logic cannot fix uncorrected drift; it must run after clock drift correction has aligned every source to a common reference.

Boundary-straddling microstops. Short stoppages cluster around handovers as crews change over, and splitting one across two shifts fragments it below the microstop threshold in both, making it vanish from each. Resolve microstop merging before slicing, in coordination with threshold tuning for microstops, so a genuine 4-minute stop is not erased by the boundary.
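The ordering problem is visible with a single stop. In this sketch the 180-second minimum, the helper names, and the use of seconds since local midnight are assumptions made to keep the example small; it shows a duration threshold applied before versus after slicing.

```python
MIN_STOP_SEC = 180  # hypothetical minimum reportable stop

Span = tuple[int, int]  # (start, end) in seconds, half-open


def split_at(start: int, end: int, boundary: int) -> list[Span]:
    """Split a half-open span at a shift boundary if it straddles it."""
    if start < boundary < end:
        return [(start, boundary), (boundary, end)]
    return [(start, end)]


def filter_then_slice(stops: list[Span], boundary: int) -> list[Span]:
    """Judge each stop on its full duration, then slice the survivors."""
    kept = [s for s in stops if s[1] - s[0] >= MIN_STOP_SEC]
    return [seg for s in kept for seg in split_at(*s, boundary)]


def slice_then_filter(stops: list[Span], boundary: int) -> list[Span]:
    """Slice first, then judge each fragment on its own (the failure mode)."""
    segs = [seg for s in stops for seg in split_at(*s, boundary)]
    return [seg for seg in segs if seg[1] - seg[0] >= MIN_STOP_SEC]


stop = (79_080, 79_320)  # 21:58:00 to 22:02:00, a 4-minute stop
handover = 79_200        # 22:00:00

print(filter_then_slice([stop], handover))  # [(79080, 79200), (79200, 79320)]
print(slice_then_filter([stop], handover))  # []
```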

Mid-period schedule changes. When a plant moves from two shifts to three, the calendar must be versioned with an effective date; regenerating history under the new pattern silently rewrites past OEE. Materialize forward only, and keep prior rows immutable.
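Versioning can be sketched as a lookup of the pattern in force on a given day. The `VERSIONS` table, its dates, and `slots_for` below are hypothetical; the point is only that a day before the change keeps resolving to the old pattern.

```python
from bisect import bisect_right
from datetime import date

# (effective_from, slot names), sorted ascending. Hypothetical history:
# the plant moved from two shifts to three on 1 April 2026.
VERSIONS: list[tuple[date, tuple[str, ...]]] = [
    (date(2025, 1, 1), ("A", "B")),
    (date(2026, 4, 1), ("A", "B", "C")),
]


def slots_for(day: date) -> tuple[str, ...]:
    """Return the shift pattern that was in force on `day`."""
    idx = bisect_right([effective for effective, _ in VERSIONS], day) - 1
    if idx < 0:
        raise LookupError(f"no schedule version covers {day.isoformat()}")
    return VERSIONS[idx][1]


print(slots_for(date(2026, 3, 31)))  # ('A', 'B')
print(slots_for(date(2026, 4, 1)))   # ('A', 'B', 'C')
```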

Off-by-one endpoint counting. Treating both endpoints as inclusive adds one extra second to every shift and double-counts every handover instant. Always use half-open intervals end to end — materializer, slicer, and SQL range joins alike.

Verification and testing

Shift boundary logic is deterministic, so the highest-value tests assert invariants: total coverage, DST-day duration, and zero overlap. Start with the two days a year that break naive code.

from datetime import date, time, timedelta
from zoneinfo import ZoneInfo


def test_dst_days_have_correct_duration():
    """Spring-forward day = 82,800s of shifts; fall-back day = 90,000s."""
    slots = [ShiftSlot("A", time(0, 0), timedelta(hours=24))]  # one 24h slot
    cal = materialize_shift_calendar(
        "P1", "America/Chicago", slots,
        date(2026, 3, 8), date(2026, 3, 8),       # US spring forward
        crew_for=lambda s, d: "X",
    )
    assert int(cal.iloc[0]["shift_end"].timestamp()
               - cal.iloc[0]["shift_start"].timestamp()) == 82_800

    cal = materialize_shift_calendar(
        "P1", "America/Chicago", slots,
        date(2026, 11, 1), date(2026, 11, 1),     # US fall back
        crew_for=lambda s, d: "X",
    )
    assert int(cal.iloc[0]["shift_end"].timestamp()
               - cal.iloc[0]["shift_start"].timestamp()) == 90_000


def test_cross_boundary_fault_splits_without_leakage():
    """A fault straddling 14:00 splits into two segments that sum to the whole."""
    import pandas as pd
    tz = ZoneInfo("America/Chicago")
    cal = materialize_shift_calendar(
        "P1", "America/Chicago",
        [ShiftSlot("A", time(6, 0), timedelta(hours=8)),
         ShiftSlot("B", time(14, 0), timedelta(hours=8))],
        date(2026, 6, 26), date(2026, 6, 26),
        crew_for=lambda s, d: s,
    )
    start = pd.Timestamp("2026-06-26 13:58", tz=tz).tz_convert("UTC")
    end = pd.Timestamp("2026-06-26 14:05", tz=tz).tz_convert("UTC")
    evt = pd.DataFrame([{"asset_id": "M1", "state": "Fault",
                         "start_utc": start, "end_utc": end}])
    segs = slice_events_to_shifts(evt, cal)
    assert len(segs) == 2
    assert set(segs["shift_id"].str[-1]) == {"A", "B"}
    assert segs["duration_sec"].sum() == (end - start).total_seconds()  # no leakage
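The zero-overlap and total-coverage invariants can be asserted directly on the windows. The sketch below is standalone and assumes a facility that runs around the clock, so any gap between adjacent shifts is a defect; `boundary_defects` is a hypothetical helper that takes plain `(shift_id, start, end)` tuples rather than the DataFrame above.

```python
from datetime import datetime, timedelta, timezone

Window = tuple[str, datetime, datetime]  # (shift_id, start, end), half-open


def boundary_defects(windows: list[Window]) -> list[tuple[str, str, float]]:
    """Return (prev_id, next_id, delta_sec) for every imperfect handover.

    delta_sec > 0 is a gap, delta_sec < 0 is an overlap.
    """
    ordered = sorted(windows, key=lambda w: w[1])
    defects = []
    for (prev_id, _, prev_end), (next_id, next_start, _) in zip(ordered, ordered[1:]):
        delta = (next_start - prev_end).total_seconds()
        if delta != 0:
            defects.append((prev_id, next_id, delta))
    return defects


t0 = datetime(2026, 6, 26, 11, 0, tzinfo=timezone.utc)
h8 = timedelta(hours=8)
good = [("A", t0, t0 + h8), ("B", t0 + h8, t0 + 2 * h8), ("C", t0 + 2 * h8, t0 + 3 * h8)]
bad = [("A", t0, t0 + h8 + timedelta(seconds=1)), ("B", t0 + h8, t0 + 2 * h8)]

print(boundary_defects(good))  # []
print(boundary_defects(bad))   # [('A', 'B', -1.0)]
```

The one-second overlap in `bad` is exactly what inclusive endpoints would produce at every handover.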

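Beyond unit tests, reconcile the sliced ledger against the shift calendar to prove gap-free, overlap-free coverage of loaded time. The query below only balances if event_segments holds every machine-state interval, running as well as stopped, with scheduled breaks excluded, so that a correctly sliced shift sums to its planned_production_sec. Run this against the TimescaleDB hypertable that stores the materialized calendar; any uncovered or double-covered second is a boundary defect.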

-- Every shift's accounted seconds must equal its planned span (± tolerance).
-- Flags both leakage (sum < span) and double-counting (sum > span).
SELECT c.shift_id,
       c.planned_production_sec,
       COALESCE(SUM(e.duration_sec), 0)                       AS accounted_sec,
       COALESCE(SUM(e.duration_sec), 0) - c.planned_production_sec AS delta_sec
FROM   shift_calendar c
LEFT   JOIN event_segments e ON e.shift_id = c.shift_id
WHERE  c.is_scheduled
GROUP  BY c.shift_id, c.planned_production_sec
HAVING ABS(COALESCE(SUM(e.duration_sec), 0) - c.planned_production_sec)
       > 0.01 * c.planned_production_sec;       -- > 1% drift = audit

Performance and scale considerations

At hundreds of assets and millions of state samples per minute, boundary resolution must move from per-row Python to set-based, partition-friendly execution. Four practices keep it cheap and correct.

Materialize the calendar once, ahead of time. Generate shift rows for the coming horizon (e.g. a rolling 90 days) in a nightly job and store them in a dedicated table. Boundary slicing then becomes a range join, not a per-event timezone computation.
Enforce non-overlap at the storage layer. A Postgres exclusion constraint makes overlapping windows physically impossible, turning a whole class of boundary bugs into insert-time errors rather than silent metric corruption.

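-- btree_gist supplies the GiST operator class that `facility_id WITH =` needs.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE shift_calendar (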
    shift_id               TEXT PRIMARY KEY,
    facility_id            TEXT NOT NULL,
    crew_id                TEXT NOT NULL,
    shift_start            TIMESTAMPTZ NOT NULL,
    shift_end              TIMESTAMPTZ NOT NULL,
    planned_production_sec INTEGER NOT NULL,
    is_scheduled           BOOLEAN NOT NULL DEFAULT TRUE,
    CHECK (shift_end > shift_start),
    EXCLUDE USING GIST (
        facility_id WITH =,
        tstzrange(shift_start, shift_end, '[)') WITH &&   -- half-open, no overlap
    )
);

Align stream windows to the shift offset. When boundary slicing runs in Flink or Kafka Streams, configure tumbling windows whose offset matches the facility’s UTC shift start and enable exactly-once checkpointing, so a pipeline restart never re-emits a boundary segment. Because that UTC start moves by an hour at each DST transition, a fixed offset constant is only correct between transitions; derive the window edges from the materialized calendar where the engine allows it. This complements idempotent shift_id keys: replays upsert rather than duplicate.
Partition the segment ledger by facility_id and shift date. Bounded partitions keep shift-report rollups to a single range scan and let retention drop old shifts wholesale. They also cap how many float durations a single rollup adds up, which limits the IEEE 754 rounding error that accumulates when summing over an unbounded fact table (see precision and rounding limits for why summing many small durations in float drifts).
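The benefit of a pre-materialized, sorted, non-overlapping calendar is that finding the shifts an event touches becomes a binary search plus a short walk instead of a scan or a timezone computation. The sketch below illustrates that lookup in plain Python with integer timestamps; `overlapping_shifts` is a hypothetical helper, and in production the equivalent work would normally be a range join in the database.

```python
from bisect import bisect_right


def overlapping_shifts(starts: list[int], ends: list[int],
                       ev_start: int, ev_end: int) -> list[int]:
    """Indices of shifts that overlap the half-open event [ev_start, ev_end).

    Assumes shifts are sorted by start and do not overlap one another.
    """
    # Last shift starting at or before the event start (or the first shift).
    i = max(bisect_right(starts, ev_start) - 1, 0)
    hits = []
    while i < len(starts) and starts[i] < ev_end:
        if ends[i] > ev_start:
            hits.append(i)
        i += 1
    return hits


starts = [0, 100, 200]
ends = [100, 200, 300]

print(overlapping_shifts(starts, ends, 90, 210))   # [0, 1, 2]
print(overlapping_shifts(starts, ends, 100, 150))  # [1]
print(overlapping_shifts(starts, ends, 50, 100))   # [0]
```

The last call shows the half-open rule surviving the optimization: an event ending exactly on a boundary touches only the outgoing shift.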

Treated as a versioned, materialized, set-based system rather than ad-hoc datetime math, shift boundary logic gives every downtime second a single owner — the right crew, the right day, the right planned-production window — so the Availability term it feeds into OEE formula validation is a number an engineer can defend in an audit.

Related

Downtime Classification & OEE Calculation — parent section and end-to-end pipeline overview
Event-to-Downtime Mapping — produces the downtime windows this layer slices
Threshold Tuning for Microstops — merge short stops before boundary slicing fragments them
OEE Formula Validation — bounding and reconciling the Availability term these windows feed
Clock Drift Correction — align every clock to a common reference before boundaries are evaluated
Time-Series Database Sync — where the materialized calendar and segment ledger live
