Implementing Linear Interpolation for Missing Sensor Values in IIoT Pipelines

Transient network partitions, MQTT broker backpressure, and asynchronous polling misalignments are inherent realities of industrial telemetry. This page is the concrete, code-first recipe under gap-filling algorithms for the most common case: reconstructing short, interior holes in a continuous analog tag — spindle torque, thermal gradient, hydraulic pressure differential — with horizon-bounded piecewise linear reconstruction. When programmable logic controllers (PLCs) or edge gateways drop packets, raw time-series streams develop temporal discontinuities that directly corrupt downstream Availability, Performance, and Quality (APQ) metrics. Naive forward-fill introduces artificial plateaus; statistical imputation can mask genuine machine degradation. Linear interpolation provides a deterministic, physics-aligned bridge for short-duration voids, preserving the expected rate-of-change for electromechanical variables over sub-second intervals.

The goal is not merely to fill a gap, but to fill it honestly: every synthesized point must be reproducible, bounded by a verified anchor on each side, never bridged across a real fault, and flagged so the OEE engine never silently counts fabricated data as uptime.

Prerequisite: temporal alignment before any fill

Interpolation is mathematically sound only when applied to a strictly monotonic, uniformly spaced time index. Edge devices routinely exhibit oscillator drift of 10–50 ms/hour from unsynchronized NTP polling or hardware variance, and interpolating against drifting timestamps injects phase-shift artifacts that compound during OEE cycle-time aggregation and state-transition analysis. Establishing a synchronized, monotonic index is the job of clock drift correction, which must run upstream of this stage; the snippet below is the minimal alignment guard the fill assumes.

Before any gap-filling logic executes, raw telemetry undergoes deterministic resampling. The recommended approach anchors every stream to a fixed UTC epoch grid aligned to the PLC scan cycle. Drift correction computes the cumulative offset between device-local timestamps and a synchronized reference clock, then applies a monotonic adjustment to prevent timestamp inversion.

import pandas as pd


def correct_clock_drift(df: pd.DataFrame, ts_col: str, ref_col: str) -> pd.DataFrame:
    """Align device-local timestamps to a synchronized reference clock.

    Prevents the non-monotonic sequences that silently break interpolation
    kernels (pandas reindexes on a sorted axis and will mis-pair anchors).
    """
    df = df.sort_values(ts_col).reset_index(drop=True)
    drift = (pd.to_datetime(df[ref_col]) - pd.to_datetime(df[ts_col])).dt.total_seconds() * 1000.0
    # Rolling median filters transient network-jitter spikes from the offset
    smoothed_drift = drift.rolling(window=5, center=True, min_periods=1).median()
    df["ts_aligned"] = pd.to_datetime(df[ts_col]) + pd.to_timedelta(smoothed_drift, unit="ms")
    return df.set_index("ts_aligned").sort_index()

Once aligned, the series is resampled to a deterministic grid (for example df.resample("100ms").mean(), substituting your own scan cycle). Grid cells with no reading come back as NaN, creating a clean canvas for the interpolation pass.
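The resampling step can be sketched as follows. This is a minimal illustration, assuming a 100 ms scan cycle and a hypothetical torque_nm column; the mean() aggregation is one reasonable choice for an analog tag, not the only one.

```python
import pandas as pd

# Hypothetical aligned readings: the samples at 200 ms and 300 ms were dropped.
raw = pd.DataFrame(
    {"torque_nm": [10.0, 10.4, 11.6, 12.0]},
    index=pd.to_datetime(
        [
            "2024-01-01 00:00:00.000",
            "2024-01-01 00:00:00.100",
            "2024-01-01 00:00:00.400",
            "2024-01-01 00:00:00.500",
        ],
        utc=True,
    ),
)

# Snap to a fixed 100 ms grid; bins that received no reading come back as NaN.
grid = raw.resample("100ms").mean()
missing_cells = int(grid["torque_nm"].isna().sum())

print(grid)
print("missing grid cells:", missing_cells)
```

The two empty bins are now explicit NaN cells on a uniform index, which is the shape the interpolation pass expects.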

The core vectorized implementation

Linear interpolation for discrete sampling relies on piecewise linear reconstruction between two verified anchor points. Given a missing sample at timestamp $t_m$, bounded by the last valid observation $(t_p, v_p)$ and the next valid observation $(t_n, v_n)$, the interpolated value $v_i$ is:

$v_i = v_p + (v_n - v_p) \cdot \frac{t_m - t_p}{t_n - t_p}$

This formulation assumes linear change between samples, which is a good approximation of the physical inertia of most manufacturing actuators over sub-second intervals. In Python automation pipelines, avoid iterative row-wise evaluation: a Python-level loop over rows collapses throughput, and its result can come to depend on processing order once rows are handled concurrently. Use the vectorized kernels described in the pandas interpolation reference (Series.interpolate) or numpy.interp instead.
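As a worked instance of the formula, the sketch below evaluates it directly and cross-checks the result against numpy.interp. The anchor values and millisecond timestamps are made up for illustration, and interpolate_point is a hypothetical helper name.

```python
import numpy as np


def interpolate_point(t_m, t_p, v_p, t_n, v_n):
    """Closed form: v_i = v_p + (v_n - v_p) * (t_m - t_p) / (t_n - t_p)."""
    return v_p + (v_n - v_p) * (t_m - t_p) / (t_n - t_p)


# Anchors at 0 ms and 400 ms; three grid cells in between are missing.
t_p, v_p = 0.0, 10.0
t_n, v_n = 400.0, 12.0
t_missing = np.array([100.0, 200.0, 300.0])

closed_form = interpolate_point(t_missing, t_p, v_p, t_n, v_n)
vectorized = np.interp(t_missing, [t_p, t_n], [v_p, v_n])

print(closed_form)
print(vectorized)
```

Both paths give 10.5, 11.0 and 11.5 (up to floating-point rounding): each missing cell sits a quarter, a half, and three quarters of the way between the anchors.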

import numpy as np
import pandas as pd


def bounded_linear_interpolation(
    series: pd.Series, max_gap_seconds: float, scan_cycle_ms: int
) -> pd.Series:
    """Linear interpolation with a strict horizon to prevent smoothing over real faults."""
    # Maximum allowable consecutive NaNs derived from the scan cycle
    max_consecutive_nans = int(np.ceil(max_gap_seconds / (scan_cycle_ms / 1000.0)))

    # limit_area="inside" fills only NaNs that have a valid anchor on both sides,
    # so pre-startup and trailing voids are left untouched.
    interpolated = series.interpolate(method="linear", limit_area="inside")

    # The `limit` argument alone would partially fill an over-long gap, so measure
    # each NaN run and restore NaN wherever the run exceeds the horizon.
    is_nan = series.isna()
    run_id = is_nan.astype(int).diff().ne(0).cumsum()
    run_length = is_nan.groupby(run_id).transform("sum")
    return interpolated.mask(is_nan & (run_length > max_consecutive_nans))

The horizon (max_consecutive_nans, the counterpart of pandas' limit) is derived from the PLC scan cycle and the maximum tolerable fill duration per sensor tag, which itself comes from the tag metadata established during PLC tag standardization. A vibration accelerometer sampled at 1 kHz with a 0.5 s tolerance yields a horizon of 500 samples; a slow thermocouple on a 1 s grid might tolerate 30. Passing limit alone is not enough, because pandas would fill the first limit samples of a longer gap and leave a fabricated partial ramp, so the function measures each run of NaNs instead. Gaps exceeding the threshold stay NaN in full, triggering downstream alerting rather than silent data fabrication. Note also the IEEE 754 caveat covered in floating-point drift in sensor readings: the same fill must produce bit-identical results on every replay, so never split a single gap across chunk boundaries where partial-sum order can change.
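The conversion from a per-tag tolerance to a sample count can be sketched as below. The helper name horizon_in_samples and the TAG_HORIZONS dictionary are hypothetical stand-ins for whatever the tag metadata store provides; the arithmetic is done in integer milliseconds so that the result does not depend on floating-point division.

```python
def horizon_in_samples(max_gap_seconds: float, scan_cycle_ms: int) -> int:
    """Convert a fill tolerance in seconds to a count of consecutive grid cells."""
    if scan_cycle_ms <= 0:
        raise ValueError("scan_cycle_ms must be positive")
    # Integer milliseconds avoid float artifacts such as 1.1 / 0.1 == 11.000000000000002
    max_gap_ms = round(max_gap_seconds * 1000)
    # Ceiling division on integers: -(-a // b)
    return -(-max_gap_ms // scan_cycle_ms)


# Hypothetical per-tag metadata (names and values are illustrative only)
TAG_HORIZONS = {
    "vibration_accel": {"max_gap_seconds": 0.5, "scan_cycle_ms": 1},
    "thermocouple": {"max_gap_seconds": 30.0, "scan_cycle_ms": 1000},
}

limits = {tag: horizon_in_samples(**meta) for tag, meta in TAG_HORIZONS.items()}
print(limits)  # {'vibration_accel': 500, 'thermocouple': 30}
```

Keeping the tolerance in seconds in the metadata and deriving the sample count at run time means a change of scan rate never silently stretches the fill horizon.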

Masking faults so interpolation never bridges them

Linear interpolation must never span a genuine process anomaly, sensor saturation, or hard dropout that manifests as an extreme outlier. Filling blindly across a fault boundary smooths the exact diagnostic signal a predictive-maintenance model needs and corrupts root-cause analysis. A robust pipeline runs outlier detection methods before the fill: a rolling median absolute deviation (MAD) mask flags anomalous readings as NaN so they become gaps to be reconstructed, not anchors to interpolate from.

def mask_and_interpolate(
    df: pd.DataFrame,
    value_col: str,
    window: int = 15,
    threshold: float = 3.5,
    max_gap_seconds: float = 2.0,
    scan_cycle_ms: int = 100,
) -> pd.Series:
    """Detect outliers with rolling MAD, mask them, then bounded-interpolate the rest."""
    rolling_median = df[value_col].rolling(window=window, center=True, min_periods=1).median()
    mad = (df[value_col] - rolling_median).abs().rolling(
        window=window, center=True, min_periods=1
    ).median()

    # Modified Z-score (0.6745 = 0.75 quantile of the standard normal) for robustness.
    # A zero MAD (flat window) is replaced by NaN so it never flags a point as an outlier.
    modified_z = 0.6745 * (df[value_col] - rolling_median) / mad.replace(0, np.nan)
    outlier_mask = modified_z.abs() > threshold

    clean_series = df[value_col].where(~outlier_mask)
    # The horizon is passed in by the caller (per-tag metadata) rather than hard-coded
    return bounded_linear_interpolation(
        clean_series, max_gap_seconds=max_gap_seconds, scan_cycle_ms=scan_cycle_ms
    )

This two-stage order — mask, then fill — guarantees interpolation only reconstructs telemetry lost to network jitter or polling misalignment, while genuine fault signatures survive intact for the maintenance models downstream.
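The effect of the ordering can be seen on a toy trace. The sketch below is a simplification: it uses a single global median and MAD instead of the rolling window, and the values are invented, but it shows how an unmasked spike becomes the right-hand anchor of a fill.

```python
import numpy as np
import pandas as pd

# Hypothetical torque trace on a uniform grid: two dropped samples sit next to a
# saturation spike (500.0) that is not a physical reading.
raw = pd.Series([10.0, 10.2, np.nan, np.nan, 500.0, 10.8, 11.0, 11.2])

# Fill first: the spike is treated as a valid anchor and the fill ramps toward it.
fill_only = raw.interpolate(method="linear", limit_area="inside")

# Mask first: global modified z-score (a simplification of the rolling version).
median = raw.median()
mad = (raw - median).abs().median()
modified_z = 0.6745 * (raw - median) / mad
masked = raw.where(~(modified_z.abs() > 3.5))
mask_then_fill = masked.interpolate(method="linear", limit_area="inside")

print(pd.DataFrame({"raw": raw, "fill_only": fill_only, "mask_then_fill": mask_then_fill}))
```

In the fill-only column the two synthesized points climb into the hundreds; in the mask-then-fill column the spike has become part of the gap and the reconstruction runs smoothly from 10.2 to 10.8.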

Running the fill in async batches at scale

High-frequency deployments ingest millions of rows per minute across hundreds of tags. Synchronous interpolation blocks the event loop and starves memory. An asynchronous pattern — the contract behind async batch processing — decouples ingestion, cleaning, and persistence, enabling horizontal scaling and backpressure management. The CPU-bound NumPy/pandas work is offloaded to a thread pool so the loop stays responsive to MQTT and WebSocket control traffic.

import asyncio
from concurrent.futures import ThreadPoolExecutor
from collections.abc import AsyncIterator


async def process_sensor_batch(
    batch_df: pd.DataFrame, executor: ThreadPoolExecutor
) -> pd.DataFrame:
    """Offload CPU-bound interpolation to a thread pool while keeping async I/O flowing."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        executor, lambda: mask_and_interpolate(batch_df, "torque_nm")
    )


async def run_async_pipeline(
    stream_generator: AsyncIterator[pd.DataFrame], batch_size: int = 50_000
) -> AsyncIterator[pd.DataFrame]:
    executor = ThreadPoolExecutor(max_workers=4)
    try:
        async for batch in stream_generator:
            if batch.empty:
                continue
            yield await process_sensor_batch(batch, executor)
    finally:
        executor.shutdown(wait=True)

Because each batch is yielded before the next one is processed, only one batch is resident at a time, which keeps memory bounded during continuous 24/7 operation. Partition by asset_id and keep each gap whole within a batch; otherwise a missing left or right anchor breaks both correctness and replay reproducibility.
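One way to keep every gap whole is to cut batches only on valid samples and let consecutive batches share that sample as an anchor. The sketch below is an illustration under that assumption; split_at_valid_anchors and fill are hypothetical helpers, not part of the pipeline above, and the series is assumed to belong to a single asset.

```python
import numpy as np
import pandas as pd


def split_at_valid_anchors(series: pd.Series, batch_size: int) -> list[pd.Series]:
    """Cut a gridded series into batches that only end on a valid (non-NaN) sample.

    Consecutive batches share one anchor row, so every interior gap has both its
    left and right anchor inside a single batch.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    valid = series.notna().to_numpy()
    n = len(series)
    batches, start = [], 0
    while start < n:
        end = min(start + batch_size, n)  # exclusive upper bound
        # Push the cut forward until the batch ends on a valid sample (or data ends)
        while end < n and not valid[end - 1]:
            end += 1
        batches.append(series.iloc[start:end])
        if end >= n:
            break
        start = end - 1  # reuse the closing anchor as the next batch's left anchor
    return batches


def fill(batch: pd.Series) -> pd.Series:
    return batch.interpolate(method="linear", limit_area="inside")


series = pd.Series([1.0, 2.0, np.nan, np.nan, 5.0, 6.0, np.nan, 8.0, 9.0])
batches = split_at_valid_anchors(series, batch_size=3)
filled = [fill(b) for b in batches]

# Drop the duplicated anchor row at the head of every batch after the first
recombined = pd.concat([filled[0]] + [b.iloc[1:] for b in filled[1:]])
whole = fill(series)
print(len(batches), np.allclose(recombined, whole))
```

The trade-off is that a very long gap stretches its batch past batch_size, so a production version would likely cap the extension and leave such a gap as NaN.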

Gotchas and anti-patterns

Interpolating before masking outliers. If a stuck-at-zero fault or a saturation spike is still in the series, it becomes an anchor and the synthesized segment ramps toward a lie. Always run the MAD/IQR mask first.
Filling leading or trailing gaps. A limit_direction="both" or an unguarded bfill invents data before the first real reading or after the last, fabricating uptime at shift edges. Require a verified anchor on both sides.
A horizon set in samples, not seconds. Hard-coding limit=50 couples the fill to one scan rate: on a 100 ms grid it means 5 s, but re-tune the line to 500 ms and that 5 s tolerance silently becomes 25 s. Derive the limit from max_gap_seconds / scan_cycle.
Bridging a shift change or recipe changeover. Interpolation across a process boundary blends two unrelated operating regimes. Reset the fill at boundaries defined by shift boundary logic.
Forgetting the audit flag. A synthesized value that is indistinguishable from a measured one will eventually be counted as real production. Emit an is_interpolated column and let OEE formula validation decay its confidence weight by gap duration.
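The audit flag from the last item can be emitted alongside the fill itself. The following is a minimal sketch, assuming a uniform grid and a horizon already expressed in samples; fill_with_audit_flag and the gap_samples column are hypothetical names, and the confidence decay applied by the OEE layer is out of scope here.

```python
import numpy as np
import pandas as pd


def fill_with_audit_flag(series: pd.Series, max_consecutive_nans: int) -> pd.DataFrame:
    """Return filled values plus an is_interpolated flag and the gap length in samples."""
    is_nan = series.isna()
    run_id = is_nan.astype(int).diff().ne(0).cumsum()
    # Length of the NaN run each row belongs to (0 for measured rows)
    gap_samples = is_nan.groupby(run_id).transform("sum")

    filled = series.interpolate(method="linear", limit_area="inside")
    # Gaps longer than the horizon stay NaN rather than being partially filled
    filled = filled.mask(is_nan & (gap_samples > max_consecutive_nans))

    return pd.DataFrame(
        {
            "value": filled,
            "is_interpolated": is_nan & filled.notna(),
            "gap_samples": gap_samples,
        }
    )


series = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0, np.nan])
out = fill_with_audit_flag(series, max_consecutive_nans=2)
print(out)
```

Only the single-sample gap is filled and flagged; the three-sample gap exceeds the horizon and the trailing gap has no right anchor, so both stay NaN and unflagged.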

Quick reference: when linear interpolation applies

| Scenario | Signal class | Gap duration | Use linear interpolation? | Alternative |
| --- | --- | --- | --- | --- |
| Spindle torque, hydraulic pressure | Continuous analog, low inertia | ≤ 5 s | Yes: closed-form, reproducible | - |
| Furnace / coolant temperature | Continuous analog, high inertia | 5–60 s | Borderline: curvature lost | Cubic spline / Akima |
| Machine state, valve position, alarm | Discrete state | Any | No: toggles a real state | Bounded forward-fill |
| Any tag past its tolerance horizon | Any | > horizon | No: extrapolation is fabrication | Flag as `MISSING` |
| Leading / trailing gap (no anchor) | Any | Any | No: no anchor to bound | Leave `NaN`, alert |

Related

Gap-filling algorithms (parent overview): method selection by signal physics and the auditability contract
Clock drift correction: the temporal alignment this fill depends on
Outlier detection methods: quarantining anomalies so they become gaps, not anchors
Z-score filtering for vibration anomalies: the masking pass applied to rotating-asset telemetry
Async batch processing: sealing the contiguous windows this stage fills
