Using Celery for High-Throughput MQTT Ingestion

When a single ingest process can no longer keep up with thousands of concurrent device connections, the fix is to fan telemetry out across a worker pool. This is the horizontal-scaling recipe behind async batch processing: sealed batches of MQTT payloads are handed to a distributed Celery cluster, cleaned in-flight, and persisted without blocking the broker subscription. The scenario it solves is specific: bursty, high-frequency PLC and CNC telemetry that must reach a time-series store and feed Overall Equipment Effectiveness (OEE) math with zero silent message loss. Get the worker contract wrong and a single pod restart drops payloads mid-deserialization, quietly corrupting the Availability denominator. The goal here is to decouple broker subscription from downstream processing while keeping at-least-once delivery end to end. A worker pool does not preserve arrival order, so order is restored per asset from an embedded sequence ID inside the worker.

Choosing and configuring the broker

Celery’s distributed task model maps cleanly onto MQTT topic hierarchies, but the choice of Celery message broker (a separate component from the MQTT broker the devices publish to) dictates your delivery guarantees. Redis is lightweight and low-latency, yet for industrial ingestion RabbitMQ is the safer default: its native AMQP 0-9-1 semantics give you persistent queues, dead-letter exchanges, and per-message acknowledgment tracking, which the Redis transport only approximates with a visibility timeout. On a shop floor where a dropped batch silently skews an OEE report, those guarantees are not optional.

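The two settings that matter most for telemetry are the prefetch multiplier and late acknowledgment. A worker_prefetch_multiplier of 1 limits each worker process to reserving one task at a time (the default is 4), so a worker cannot greedily reserve a backlog it cannot process during a burst, which is what causes memory exhaustion on high-frequency sensor arrays. Setting task_acks_late=True moves the acknowledgment from receipt to completion: the message is acked after the task has run, so a worker that loses its broker connection during deserialization or schema validation leaves the message unacknowledged and RabbitMQ redelivers it instead of losing it. By default, a late-acked task whose worker child process dies abruptly is still acknowledged and marked failed, so pair it with task_reject_on_worker_lost=True so an evicted or OOM-killed worker re-queues its in-flight task rather than dropping it. Because both settings can run a task more than once, every task in the pipeline must be idempotent.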

# celery_config.py — broker, durability, and routing
broker_url = "amqp://celery:secure_password@rabbitmq.internal:5672/iot_ingest"
result_backend = "redis://redis.internal:6379/0"

# Resilience & throughput tuning
worker_prefetch_multiplier = 1            # critical for bursty MQTT streams
task_acks_late = True                     # ack after the task runs, not on receipt
task_reject_on_worker_lost = True         # re-queue on worker OOM / eviction
task_serializer = "json"
task_default_queue = "mqtt_raw"
task_default_exchange_type = "direct"

# Bound runaway cleaning tasks so a stuck worker cannot starve the queue
task_soft_time_limit = 25                 # raises SoftTimeLimitExceeded for cleanup
task_time_limit = 30                      # hard kill backstop

# Route by topic class so sensor and alarm traffic scale independently
task_routes = {
    "ingest.tasks.assemble_and_dispatch": {"queue": "sensor_processing"},
    "ingest.tasks.process_alarm_batch": {"queue": "alarm_processing"},
}

Routing by topic class lets you scale alarm processing and bulk sensor processing on separate worker pools with separate concurrency, so a flood of nuisance alarms never delays the OEE-critical sensor stream. Topic design upstream follows the same discipline described in MQTT topic hierarchies.

Bridging paho-mqtt to Celery without losing messages

The MQTT client should run as a thin edge daemon whose only job is to receive payloads and enqueue them, never to do real work inside the network callback. paho-mqtt invokes on_message on its network loop thread, so a blocking callback stops the client from servicing keepalive pings; the broker then drops the connection, and on a QoS 1 session every reconnect brings a storm of redeliveries. Subscribe with QoS 1 for at-least-once delivery, accept that duplicates and out-of-order arrival are normal, and hand each payload straight to Celery.

# bridge.py — paho-mqtt edge daemon that only enqueues
import json
import logging
import time

import paho.mqtt.client as mqtt

from ingest.tasks import assemble_and_dispatch

logger = logging.getLogger("mqtt_bridge")

# A staging buffer batches messages before dispatch to amortize task overhead.
# Callbacks run on the client's network loop thread, so only that thread touches it.
_staging: list[dict] = []
BATCH_SIZE = 750


def _now_ms() -> int:
    """Current wall-clock time in milliseconds since the epoch."""
    return int(time.time() * 1000)


def on_message(client: mqtt.Client, userdata: object, msg: mqtt.MQTTMessage) -> None:
    """Fast path only: decode, stage, and dispatch a batch. No heavy work here."""
    try:
        payload = json.loads(msg.payload)
    except (json.JSONDecodeError, UnicodeDecodeError):
        logger.warning("undecodable payload on %s; dropping", msg.topic)
        return
    if not isinstance(payload, dict):
        # Valid JSON that is not an object cannot carry asset_id / seq_id.
        logger.warning("non-object payload on %s; dropping", msg.topic)
        return

    # Stamp arrival time unless the payload already carries one.
    payload["broker_ts"] = payload.get("broker_ts") or _now_ms()
    _staging.append(payload)

    if len(_staging) >= BATCH_SIZE:
        # .delay() only publishes the task message; the work runs in a worker.
        assemble_and_dispatch.delay(_staging.copy())
        _staging.clear()


def on_connect(client: mqtt.Client, userdata, flags, rc, properties=None) -> None:
    # Subscribing here re-establishes the subscription after every reconnect.
    client.subscribe("plant/+/line/+/sensor/#", qos=1)   # QoS 1 = at-least-once

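Staging into a small buffer before calling .delay() keeps one Celery task from representing a single MQTT message, which would crush the broker with task overhead at high message rates. The buffer flushes on size here; in production also flush on a short wall-clock timer so a quiet line still drains promptly. The buffer is also the one place where this bridge can lose data: with paho's default automatic acknowledgment, a QoS 1 message is acknowledged to the MQTT broker once on_message returns, so anything still staged is gone if the bridge process dies. Keep the batch size and flush interval small enough that this window is acceptable, or acknowledge manually after dispatch if your client version supports it.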
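The size-or-timer flush described above can be sketched as a small thread-safe buffer. This is a minimal illustration, not the bridge's actual implementation: the class name `StagingBuffer`, its parameters, and the injected `dispatch` callable are all hypothetical. A lock is needed here because the timer runs on a different thread from the MQTT callback.

```python
import threading
import time
from typing import Callable, Optional


class StagingBuffer:
    """Batch items and flush on size or age, whichever comes first."""

    def __init__(self, dispatch: Callable[[list], None],
                 max_size: int = 750, max_age_s: float = 1.0) -> None:
        self._dispatch = dispatch          # e.g. assemble_and_dispatch.delay
        self._max_size = max_size
        self._max_age_s = max_age_s
        self._items: list = []
        self._oldest: Optional[float] = None   # monotonic time of first staged item
        self._lock = threading.Lock()

    def add(self, item: dict) -> None:
        """Called from on_message: stage one payload, flush if the batch is full."""
        with self._lock:
            if not self._items:
                self._oldest = time.monotonic()
            self._items.append(item)
            batch = self._take() if len(self._items) >= self._max_size else None
        if batch:
            self._dispatch(batch)          # dispatch outside the lock

    def flush_if_stale(self) -> None:
        """Called periodically from a timer thread: flush a batch that waited too long."""
        with self._lock:
            stale = (self._oldest is not None
                     and time.monotonic() - self._oldest >= self._max_age_s)
            batch = self._take() if stale else None
        if batch:
            self._dispatch(batch)

    def _take(self) -> list:
        # Caller holds the lock. Swap the list out so a fresh batch can start.
        batch, self._items, self._oldest = self._items, [], None
        return batch
```

In the bridge, on_message would call `add(payload)` and a daemon thread would call `flush_if_stale()` every few hundred milliseconds. The age is measured from the first staged item, so a slow trickle of messages cannot postpone the flush indefinitely.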

Batched task assembly and sequence validation

Inside the worker, the first task amortizes serialization and database round-trips by validating and ordering a whole batch at once. Because the transport is at-least-once and brokers reorder under failover, the task must sort on an embedded sequence ID, drop duplicates, and detect gaps before anything downstream runs. A detected gap should not block the batch: log it, route the missing window to a recovery queue, and let the clean records proceed. This mirrors the order-invariance contract that all async batch processing windows must honor.

# tasks.py — assembly, dedup, and sequence validation
from collections import defaultdict

from celery import Celery
from celery.utils.log import get_task_logger

app = Celery("ingest")
app.config_from_object("celery_config")
logger = get_task_logger(__name__)


@app.task(bind=True, max_retries=3, default_retry_delay=2)
def assemble_and_dispatch(self, raw_messages: list[dict]) -> None:
    """Per asset: drop QoS 1 duplicates, order by sequence, flag gaps, hand off to cleaning."""
    # Local import: cleaning.py needs `app` from this module, so a top-level
    # import here would be circular. The module path is assumed.
    from ingest.cleaning import run_cleaning_pipeline

    # Group by asset first. Sequence counters are per asset, and one bridge batch
    # mixes every asset matched by the wildcard subscription.
    by_asset: dict[str, dict[int, dict]] = defaultdict(dict)
    for m in raw_messages:
        # Keying on (asset_id, seq_id) and keeping the first copy drops the
        # repeats that at-least-once delivery can produce.
        by_asset[m["asset_id"]].setdefault(m["seq_id"], m)

    for asset_id, by_seq in by_asset.items():
        seq_ids = sorted(by_seq)
        ordered = [by_seq[s] for s in seq_ids]

        # Detect sequence gaps within this asset's observed range.
        missing = sorted(set(range(seq_ids[0], seq_ids[-1] + 1)) - set(seq_ids))
        if missing:
            logger.warning("sequence gap on %s: %s", asset_id, missing)
            app.send_task("ingest.tasks.recover_gap", args=[asset_id, missing])

        # Hand this asset's clean, ordered batch to the cleaning stage.
        run_cleaning_pipeline.delay(ordered, window_start=ordered[0]["ts"])

Deduplicating on (asset_id, seq_id) is what prevents double-counted parts and inflated performance numbers when a reconnecting client replays its QoS 1 backlog. The unique asset key depends on consistent PLC tag standardization upstream so that one accumulator can serve heterogeneous assets without collisions.

In-task cleaning before persistence

Cleaning belongs inside the worker, not in a later job, so dirty data never reaches the OEE aggregator. Three transforms run in a fixed order on the sealed batch. First, clock drift correction realigns device timestamps against broker arrival time, because edge devices rarely hold NTP sync and skew produces phantom state transitions. Second, outlier detection methods using a rolling Median Absolute Deviation (MAD) mask physically impossible readings from EMI or wiring faults. MAD is preferred over a plain Z-score because the median and MAD are barely moved by the spikes they are meant to catch, whereas a spike inflates the mean and standard deviation that a Z-score is measured against; MAD also makes no Gaussian assumption. Detection only sets values to NaN; replacement is owned by the next stage. Third, gap-filling algorithms reconstruct the masked and missing samples: linear interpolation for continuous process variables, forward-fill for discrete state codes.
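The robustness argument for MAD is easiest to see on a tiny window. The sketch below uses only the standard library and illustrative readings. The function names are hypothetical, and it applies the statistics to one fixed window rather than the rolling window used in the pipeline. With eight samples, a single spike can never reach a Z-score of 3 (the ceiling is the square root of n - 1, about 2.65), so the Z-score test misses it while the MAD test flags it.

```python
import statistics


def zscore_outliers(values: list, threshold: float = 3.0) -> list:
    """Indices whose distance from the mean exceeds threshold * standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > threshold * std]


def mad_outliers(values: list, threshold: float = 3.0) -> list:
    """Indices whose distance from the median exceeds threshold * MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [i for i, v in enumerate(values) if abs(v - med) > threshold * mad]


# Seven plausible readings and one EMI-style spike (illustrative numbers).
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 500.0]

print(zscore_outliers(readings))  # [] : the spike inflates the std and hides itself
print(mad_outliers(readings))     # [7] : median and MAD ignore the spike
```

The unscaled MAD with a factor of 3.0 mirrors the threshold used in the cleaning task; many references multiply MAD by 1.4826 to make it comparable to a standard deviation, which would loosen the cut-off accordingly.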

# cleaning.py — deterministic in-task cleaning
import numpy as np
import pandas as pd

from ingest.tasks import app   # shared Celery app; module path assumed from bridge.py


@app.task
def run_cleaning_pipeline(messages: list[dict], window_start: float) -> None:
    df = pd.DataFrame(messages)
    df["ts"] = pd.to_datetime(df["ts"], unit="ms", utc=True)
    df.set_index("ts", inplace=True)

    # 1. Clock drift: align edge time to broker arrival via median offset.
    df["broker_ts"] = pd.to_datetime(df["broker_ts"], unit="ms", utc=True)
    drift_offset = (df["broker_ts"] - df.index).median()
    df.index = df.index + drift_offset
    df.drop(columns=["broker_ts"], inplace=True)
    df.sort_index(inplace=True)

    # 2. Outlier masking with rolling MAD (detection only, set NaN).
    for col in ("value", "current"):
        if col not in df.columns:
            continue
        med = df[col].rolling(window="2s", min_periods=1).median()
        mad = (df[col] - med).abs().rolling(window="2s", min_periods=1).median()
        df.loc[(df[col] - med).abs() > 3.0 * mad, col] = np.nan

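    # 3. Gap filling: linear for continuous, forward-fill for discrete state.
    if "value" not in df.columns:
        return                      # nothing to persist without a process value
    df["value"] = df["value"].interpolate(method="linear", limit=5)
    if "status" in df.columns:
        df["status"] = df["status"].ffill(limit=3)

    # Samples left unfilled stay NaN and are dropped, never fabricated. Note that
    # `limit` caps the fill length: the first 5 samples of a longer gap are still filled.
    df.dropna(subset=["value"], inplace=True)
    persist_to_timeseries(df)       # project-specific writer, not shown in this recipe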

Bounding the interpolation limit is what keeps a genuine machine fault from being smoothed into a fabricated value; the detailed reasoning lives in implementing linear interpolation for missing sensor values. The cleaned frame is then written to a time-series database with a deterministic (asset_id, window_start) key so replays stay idempotent and OEE recomputes identically.
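One subtlety is worth checking against the pandas documentation: `interpolate(limit=n)` caps how many consecutive NaNs are filled, so the first n samples of a longer gap are still interpolated rather than the whole gap being skipped. If the intent is to leave wide gaps entirely untouched, the rule has to be applied per gap. The following is a minimal standard-library sketch of that stricter rule, assuming evenly spaced samples; the function name `fill_short_gaps` is hypothetical.

```python
import math


def fill_short_gaps(values: list, max_gap: int) -> list:
    """Linearly interpolate interior NaN runs of at most max_gap samples.

    Longer runs, and runs touching either end of the series, are left as NaN.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        start = i                      # first NaN of this run
        while i < n and math.isnan(out[i]):
            i += 1
        end = i                        # first valid index after the run, or n
        run = end - start
        if start == 0 or end == n or run > max_gap:
            continue                   # edge gap or too wide: do not fabricate
        left, right = out[start - 1], out[end]
        step = (right - left) / (run + 1)
        for k in range(run):
            out[start + k] = left + step * (k + 1)
    return out


nan = float("nan")
print(fill_short_gaps([1.0, nan, 3.0, nan, nan, nan, 7.0], max_gap=2))
# The single-sample gap becomes 2.0; the three-sample gap stays NaN.
```

Leaving a wide gap fully empty keeps the fault visible to downstream consumers, at the cost of dropping a few samples that a partial fill would have kept.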

Resilience, dead-letter queues, and diagnostics

Dead-letter queues are non-negotiable in production. task_reject_on_worker_lost=True, set above, only re-queues a task whose worker process died; it does not dead-letter anything. Dead-lettering has to be configured on the RabbitMQ side: declare the work queues with a dead-letter exchange, and have tasks reject poison messages without requeue (which requires late acknowledgment) so they land in a dedicated mqtt_ingest_dlq instead of looping through the workers. A separate reconciliation worker inspects them, attempts schema repair, and re-injects valid records so a transient partition or a single malformed payload never permanently corrupts a KPI. Instrument three layers: broker (queue depth, consumer count, unacked message age via the RabbitMQ Management API or a Prometheus exporter), worker (task latency, retry rate, per-process memory), and task (a correlation ID threaded from the on_message callback through to the database write). Correlation IDs turn “a payload vanished” into a traceable path; the soft and hard time limits configured earlier prevent a stuck cleaning task from becoming a zombie that starves queue capacity.

# reconcile.py — drain the DLQ, repair, and re-inject
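from ingest.tasks import app, assemble_and_dispatch, logger

# repair_schema, SchemaUnrecoverable, and archive_for_audit are project-specific
# helpers that this recipe does not show.


@app.task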
def reconcile_dlq(payloads: list[dict]) -> None:
    for raw in payloads:
        try:
            repaired = repair_schema(raw)          # coerce types, backfill required keys
        except SchemaUnrecoverable:
            logger.error("unrecoverable payload archived: %s", raw.get("asset_id"))
            archive_for_audit(raw)                  # keep for forensics, do not re-queue
            continue
        assemble_and_dispatch.delay([repaired])

The acknowledgment flow that makes this safe is the same one that protects metrics across pod restarts: a batch is acked only after OEE-relevant processing succeeds, and the final factor definitions it feeds are pinned by OEE formula validation so that two engineers reading the same shift get the same number.
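Dead-letter routing itself needs explicit configuration. The sketch below shows one way it could look, under stated assumptions: the exchange name `mqtt_ingest_dlx`, the `validate_batch` task, and the `check_schema` validator are hypothetical, and the dead-letter exchange and the queue bound to it are assumed to be declared in RabbitMQ outside this snippet. The work queues carry RabbitMQ's dead-letter arguments, and the task raises Celery's `Reject` with `requeue=False` so the broker dead-letters the message instead of redelivering it. `Reject` only takes effect on tasks with late acknowledgment enabled.

```python
from celery import Celery
from celery.exceptions import Reject
from kombu import Exchange, Queue

app = Celery("ingest")

# Hypothetical names. The dead-letter exchange and the DLQ bound to it
# are assumed to exist in RabbitMQ already.
DEAD_LETTER_ARGS = {
    "x-dead-letter-exchange": "mqtt_ingest_dlx",
    "x-dead-letter-routing-key": "mqtt_ingest_dlq",
}

app.conf.task_queues = (
    Queue("sensor_processing", Exchange("sensor_processing", type="direct"),
          routing_key="sensor_processing", queue_arguments=DEAD_LETTER_ARGS),
    Queue("alarm_processing", Exchange("alarm_processing", type="direct"),
          routing_key="alarm_processing", queue_arguments=DEAD_LETTER_ARGS),
)


def check_schema(messages: list) -> None:
    """Placeholder validator: require the keys the pipeline relies on."""
    for m in messages:
        if "asset_id" not in m or "seq_id" not in m:
            raise ValueError("missing asset_id or seq_id")


@app.task(bind=True, acks_late=True)
def validate_batch(self, raw_messages: list) -> None:
    try:
        check_schema(raw_messages)
    except ValueError as exc:
        # requeue=False asks RabbitMQ to dead-letter the message, not redeliver it.
        raise Reject(str(exc), requeue=False)
```

RabbitMQ refuses to redeclare an existing queue with different arguments, so queues that already exist would need to be recreated or given the dead-letter settings through a broker policy instead.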

Gotchas and anti-patterns

Doing real work in on_message. Any blocking call inside the MQTT callback stalls keepalive and triggers reconnects; on QoS 1 that means a redelivery storm. Decode, stage, dispatch — nothing else.
Leaving worker_prefetch_multiplier at the default. The default of 4 lets each worker reserve four tasks per process; with batched tasks that is thousands of payloads buffered in worker memory during a burst, enough to OOM-kill the process. Pin it to 1 for bursty streams.
Forgetting task_acks_late. With early acks (the default), a crash during deserialization silently drops the task and the data with it — the most common cause of unexplained OEE drift.
Deduplicating on seq_id alone. Sequence counters reset per asset; without the asset_id in the key, two assets that share a counter cross-contaminate and parts get under- or over-counted.
One Celery task per MQTT message. At thousands of messages per second the task overhead dwarfs the payload work. Always batch at the bridge and validate the whole batch in one task.
Interpolating without a gap limit. Unbounded fill bridges across genuine faults and erases the diagnostic signature predictive-maintenance models depend on.
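The prefetch gotcha above comes down to simple arithmetic: Celery's prefetch limit is the multiplier times the worker's concurrency, and each reserved task here carries a whole bridge batch. A small sketch with illustrative numbers (the helper name and the 8-process worker are assumptions):

```python
def reserved_messages(concurrency: int, prefetch_multiplier: int, batch_size: int) -> int:
    """Upper bound on MQTT payloads one worker node can hold as reserved tasks."""
    return concurrency * prefetch_multiplier * batch_size


# One worker node with 8 processes and 750-message batches.
print(reserved_messages(8, 4, 750))  # 24000 with the default multiplier of 4
print(reserved_messages(8, 1, 750))  # 6000 with the multiplier pinned to 1
```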

Quick reference: configuration decisions

Concern	Setting / choice	Recommended value	Why
Broker	`broker_url`	RabbitMQ (AMQP)	Persistent queues + dead-letter exchanges for guaranteed delivery
Burst safety	`worker_prefetch_multiplier`	`1`	Prevents memory exhaustion on bursty sensor arrays
Delivery safety	`task_acks_late`	`True`	Ack after the task runs rather than on receipt, so a crash mid-task does not lose the batch
Eviction safety	`task_reject_on_worker_lost`	`True`	Re-queue in-flight tasks from OOM-killed workers
MQTT QoS	subscription QoS	`1`	At-least-once; pair with `(asset_id, seq_id)` dedup
Batch size	bridge `BATCH_SIZE`	500–1000	Amortizes serialization and DB round-trips
Runaway tasks	`task_soft_time_limit` / `task_time_limit`	`25` / `30` s	Stops stuck cleaning tasks from starving the queue
Failure handling	DLQ routing	`mqtt_ingest_dlq`	Repair-and-replay instead of permanent corruption

Related

Async batch processing: parent overview of windowing, sealing, and in-batch OEE computation
Ingestion & cleaning workflows: the full ingestion and cleaning pipeline this recipe sits in
Implementing linear interpolation for missing sensor values: bounded gap filling for the cleaning stage
Z-score filtering for vibration anomalies: alternative in-task outlier strategy
Best practices for MQTT QoS levels in factory networks: delivery semantics the dedup logic depends on
