How to Track Manufacturing Downtime and Reduce Unplanned Stops

Published March 28, 2026 · 9 min read

The Real Cost of Downtime

Most manufacturers know that downtime is expensive. Few know how expensive it actually is. When a production line stops, the obvious costs are visible: idle labor, missed shipments, and overtime to catch up. The hidden costs are larger: expedited freight for late orders, customer dissatisfaction, lost future orders, and the ripple effect on downstream operations that were waiting for your output.

A useful rule of thumb: calculate your plant's revenue per operating hour, then multiply by 2 to 3 to account for the indirect costs. If your plant generates $500 per machine-hour in revenue, each hour of unplanned downtime actually costs $1,000 to $1,500 when you include the full impact. For a plant with 10 machines averaging 85% Availability (that is, 1.2 hours of downtime per machine per 8-hour shift), the annual cost of downtime is staggering: often more than $500,000 per year, even for small operations.
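The following Python sketch works through the rule-of-thumb arithmetic. The function name and parameters are illustrative. The schedule (one 8-hour shift on 250 days a year) and the multiplier of 2 (the low end of the range) are assumptions, not figures from any particular plant.

```python
def annual_downtime_cost(
    machines: int,
    availability: float,
    shift_hours: float,
    shifts_per_year: int,
    revenue_per_hour: float,
    indirect_multiplier: float,
) -> float:
    """Estimate annual downtime cost with the revenue-per-hour rule of thumb."""
    # Hours per shift that each machine is unavailable, e.g. 8 h x 15% = 1.2 h.
    downtime_per_shift = shift_hours * (1.0 - availability)
    # Total machine-hours of downtime across the plant for the year.
    downtime_hours = machines * downtime_per_shift * shifts_per_year
    # Full cost per downtime hour = revenue per hour x indirect-cost multiplier.
    return downtime_hours * revenue_per_hour * indirect_multiplier


if __name__ == "__main__":
    # Assumed schedule: one 8-hour shift, 250 shifts per year; multiplier of 2.
    cost = annual_downtime_cost(10, 0.85, 8, 250, 500, 2.0)
    print(f"${cost:,.0f} per year")
```

Under those assumptions the 10-machine example comes to 3,000 machine-hours of downtime, or about $3,000,000 a year, well above the $500,000 floor. The result scales linearly with the schedule and the multiplier.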

The problem is not that manufacturers do not care about downtime. The problem is that they cannot manage what they do not measure. And most plants measure downtime poorly, inconsistently, or not at all.

Manual vs. Automated Tracking

Manual Tracking: The Paper Log

The most common approach in small and mid-size plants is a paper log at each machine or a shared spreadsheet. Operators record when the machine stopped, when it restarted, and (sometimes) why it stopped. This approach has well-documented problems:

Under-reporting. Studies consistently show that manual logs capture only 40-60% of actual downtime events. Short stops (under 5 minutes) are almost never recorded. Operators who are troubleshooting a problem are not simultaneously writing down that they are troubleshooting a problem.
Inaccurate timestamps. Operators record times from memory at the end of an event or at the end of a shift. A 23-minute stop becomes "about 20 minutes." Start and end times are rounded to the nearest 5 or 10 minutes.
Inconsistent categorization. Without a standardized reason code list, one operator writes "jammed," another writes "material jam," and a third writes "feeder stuck." These are the same failure mode but show up as three separate categories in any analysis.
Delayed visibility. Paper logs are not visible to anyone until someone collects them, transcribes them, and enters them into a system. This typically happens the next day at the earliest.
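The fix for inconsistent categorization is a standard reason code list. When free-text logs already exist, a lookup table can fold synonyms into one code before analysis. The minimal Python sketch below assumes a hypothetical code, MATERIAL_JAM, and an alias table that you would maintain yourself.

```python
from collections import Counter

# Hypothetical alias table: free-text entries operators have used -> standard code.
REASON_ALIASES: dict[str, str] = {
    "jammed": "MATERIAL_JAM",
    "material jam": "MATERIAL_JAM",
    "feeder stuck": "MATERIAL_JAM",
}


def normalize_reason(free_text: str) -> str:
    """Map free text to a standard code, ignoring case and extra whitespace."""
    key = " ".join(free_text.lower().split())
    return REASON_ALIASES.get(key, "UNCLASSIFIED")


def tally_reasons(entries: list[str]) -> Counter:
    """Count log entries per standard reason code."""
    return Counter(normalize_reason(e) for e in entries)


if __name__ == "__main__":
    log = ["jammed", "Material jam", "feeder  stuck"]
    print(tally_reasons(log))
```

The three entries from the example above now collapse into one category with a count of 3, rather than three categories with one entry each. Anything the table does not recognize lands in UNCLASSIFIED. Treat that as a signal to extend the code list, not as a category to analyze.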

Automated Tracking: PLC-Based Detection

Every modern PLC already knows whether the machine is running. The machine state signal — whether it comes from a dedicated status register, a combination of drive run signals and fault bits, or a PackML state machine — is the most reliable source of downtime data. It does not forget, does not round, does not under-report, and does not require operator effort.

Automated detection captures the start time, end time, and duration of every single stop event, including the 90-second jams and 3-minute material waits that operators never record. Those micro-stops, individually insignificant, often account for 10-20% of total downtime when aggregated across a shift.
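Because automated detection records every stop, you can compute the micro-stop share directly instead of estimating it. The sketch below treats the 5-minute cutoff for short stops as an assumed definition of a micro-stop. The durations are made-up illustration values, not measured data.

```python
def micro_stop_share(durations_s: list[float], threshold_s: float = 300.0) -> float:
    """Fraction of total downtime from stops shorter than threshold_s seconds."""
    total = sum(durations_s)
    if total == 0:
        return 0.0
    short = sum(d for d in durations_s if d < threshold_s)
    return short / total


if __name__ == "__main__":
    # Illustrative shift: a 90 s jam, a 3 min material wait, a 150 s reset, a 43 min breakdown.
    shift = [90, 180, 150, 2580]
    print(f"{micro_stop_share(shift):.0%} of downtime came from micro-stops")
```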

The limitation of automated detection is that the PLC knows that the machine stopped, but it may not know why. Some PLCs provide detailed fault codes that map directly to root causes. Others simply report "not running." This is where operator input remains valuable — not to report that the machine stopped (the system already knows), but to assign a reason code after the fact.

Downtime Categories

A well-designed downtime categorization system is hierarchical. The top level separates fundamentally different types of stops. Lower levels provide the detail needed for root cause analysis.

Planned Downtime

Stops that were scheduled and expected. These are excluded from OEE Availability calculations because the equipment was not planned to produce during these periods.

Scheduled maintenance — preventive maintenance, calibration, inspection
No orders — no production demand during this period
Breaks and meals — if the machine cannot run unattended during breaks
Planned changeover — if your organization classifies changeover as planned (see note below)

Unplanned Downtime

Stops that were not scheduled. These directly impact OEE Availability and represent the primary target for improvement.

Equipment breakdown — mechanical failure, electrical fault, pneumatic/hydraulic failure
Changeover and setup — time spent switching between products or jobs (if classified as unplanned)
Material shortage — waiting for raw materials, upstream machine starving the line
Tooling — tool change, tool break, tool calibration
Quality hold — machine stopped for quality investigation or inspection
Operator unavailable — bathroom break, late return from lunch, no operator assigned
Utilities — compressed air loss, power fluctuation, coolant system failure
IT/Network — control system issue, network outage, software error
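This two-level structure maps naturally onto a nested lookup table. In the Python sketch below, the category names and sub-reasons come from the lists above, and the enum and function names are hypothetical. Changeover is deliberately left out, because its category depends on an organizational policy choice.

```python
from enum import Enum


class Category(Enum):
    PLANNED = "planned"
    UNPLANNED = "unplanned"


# Top level -> reason code -> optional detail codes, taken from the lists above.
# Changeover is omitted on purpose: its category is an organizational policy choice.
REASON_TREE: dict[Category, dict[str, list[str]]] = {
    Category.PLANNED: {
        "Scheduled maintenance": ["Preventive maintenance", "Calibration", "Inspection"],
        "No orders": [],
        "Breaks and meals": [],
    },
    Category.UNPLANNED: {
        "Equipment breakdown": ["Mechanical failure", "Electrical fault", "Pneumatic/hydraulic failure"],
        "Material shortage": ["Waiting for raw materials", "Upstream machine starving the line"],
        "Tooling": ["Tool change", "Tool break", "Tool calibration"],
        "Quality hold": ["Quality investigation", "Inspection"],
        "Operator unavailable": ["Bathroom break", "Late return from lunch", "No operator assigned"],
        "Utilities": ["Compressed air loss", "Power fluctuation", "Coolant system failure"],
        "IT/Network": ["Control system issue", "Network outage", "Software error"],
    },
}


def top_level_for(reason: str) -> Category:
    """Look up whether a reason code counts as planned or unplanned downtime."""
    for category, reasons in REASON_TREE.items():
        if reason in reasons:
            return category
    raise KeyError(f"unknown reason code: {reason!r}")
```

With the planned/unplanned split at the top of the tree, every downstream report, from Availability to the Pareto, can filter on a single field.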

A Note on Changeover

The classification of changeover time is one of the most debated topics in OEE methodology. Some organizations treat it as planned downtime (excluded from OEE) because changeovers are a necessary part of the production schedule. Others treat it as unplanned downtime (included in OEE) because it represents time the machine is not producing and is therefore an opportunity for improvement via SMED (Single Minute Exchange of Die) techniques.

There is no universally "correct" answer. What matters is consistency: pick one approach and apply it uniformly across all machines. If you exclude changeover from OEE, track it separately so you can still measure and improve it. If you include it, make sure it has its own reason code so it does not get lumped in with breakdowns.
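The classification matters because it changes the denominator of Availability, not just a label. The short Python sketch below uses illustrative minute values for one 480-minute shift and computes Availability both ways from the same stops.

```python
def availability(
    shift_min: float,
    planned_stop_min: float,
    unplanned_stop_min: float,
    changeover_min: float,
    changeover_is_planned: bool,
) -> float:
    """OEE Availability = run time / planned production time."""
    if changeover_is_planned:
        # Planned: changeover shrinks planned production time and is not a loss.
        planned_stop_min += changeover_min
    else:
        # Unplanned: changeover stays in planned production time and counts as a loss.
        unplanned_stop_min += changeover_min
    planned_production = shift_min - planned_stop_min
    run_time = planned_production - unplanned_stop_min
    return run_time / planned_production


if __name__ == "__main__":
    # Illustrative shift: 30 min break, 40 min of breakdowns, 45 min changeover.
    for planned in (True, False):
        a = availability(480, 30, 40, 45, changeover_is_planned=planned)
        print(f"changeover planned={planned}: Availability {a:.1%}")
```

Run time is 365 minutes in both cases. Yet Availability reads about 90.1% when changeover is planned and about 81.1% when it is not, so figures from two machines reported under different conventions cannot be compared.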

Building a Pareto of Downtime Reasons

The Pareto principle (80/20 rule) is remarkably consistent in downtime analysis. In most plants, 3 to 5 reason codes account for 70-80% of total downtime minutes. A Pareto chart — a bar chart sorted by descending duration with a cumulative percentage line — instantly reveals which problems deserve attention and which are noise.

To build an effective Pareto:

Use duration, not frequency. A fault that occurs 50 times but averages 30 seconds each (25 total minutes) is less impactful than a breakdown that occurs twice but lasts 90 minutes each (180 total minutes). Duration-based Pareto prioritizes the right problems.
Filter by time period and machine. A plant-wide Pareto for the entire quarter is useful for strategic decisions. A single-machine Pareto for last week is useful for the maintenance team planning their Monday morning.
Separate planned from unplanned. Mixing planned and unplanned downtime in the same Pareto obscures the signal. You cannot "fix" a scheduled break. Keep the analysis focused on losses you can actually reduce.
Update regularly. A Pareto built once and pinned to the wall for six months is a poster, not a management tool. The top reasons should be reviewed weekly and action items assigned for the top 2-3.
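A duration-based Pareto that follows these rules takes only a few lines of code. The Python sketch below assumes a hypothetical DowntimeEvent record with a planned flag. It drops planned stops, optionally filters by machine and start date, sums minutes (not counts) per reason, and appends the cumulative percentage. The example events reproduce the two faults from the first rule, plus a scheduled break.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DowntimeEvent:
    machine: str
    reason: str
    start: datetime
    minutes: float
    planned: bool


def pareto(
    events: list[DowntimeEvent],
    machine: Optional[str] = None,
    since: Optional[datetime] = None,
) -> list[tuple[str, float, float]]:
    """Return (reason, minutes, cumulative %) rows sorted by descending minutes."""
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        if e.planned:
            continue  # keep the analysis on losses that can be reduced
        if machine is not None and e.machine != machine:
            continue
        if since is not None and e.start < since:
            continue
        totals[e.reason] += e.minutes  # rank by duration, not frequency
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    rows, running = [], 0.0
    for reason, minutes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += minutes
        rows.append((reason, minutes, 100.0 * running / grand_total))
    return rows


T0 = datetime(2026, 3, 23, 6, 0)  # arbitrary example timestamp
EXAMPLE_EVENTS = (
    [DowntimeEvent("Line 3", "Feeder jam", T0, 0.5, False) for _ in range(50)]
    + [DowntimeEvent("Line 3", "Gearbox failure", T0, 90, False) for _ in range(2)]
    + [DowntimeEvent("Line 3", "Breaks and meals", T0, 30, True)]
)

if __name__ == "__main__":
    for reason, minutes, cum in pareto(EXAMPLE_EVENTS):
        print(f"{reason:<16} {minutes:>6.1f} min  {cum:5.1f}%")
```

The gearbox failure ranks first with 180 minutes (about 88% cumulative), even though the feeder jam occurred 25 times as often. The 30-minute break never appears in the chart.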

Why Operators Under-Report Downtime

Understanding why manual tracking fails is important for designing a system that works. Operators under-report downtime for several reasons:

Blame avoidance. If downtime data is used to assign blame rather than improve processes, operators learn to minimize reported stops. This is a management problem, not an operator problem. Downtime data should drive root cause analysis and process improvement, not punitive action.
Effort. Writing down every stop event is tedious, especially when the operator is simultaneously trying to troubleshoot and restart the machine. Automated detection eliminates this effort entirely.
Ambiguity. Operators are not sure what counts as "downtime." Is a 2-minute material reload a stop event? What about waiting 90 seconds for the forklift? Without clear definitions, reporting is inconsistent.
No perceived value. If operators never see what happens with the downtime data they record, they reasonably conclude it does not matter. Closing the feedback loop — showing operators the Pareto chart, sharing improvement results, and crediting them when their input leads to a fix — changes the equation.

Using PLC State Signals for Automatic Detection

The most reliable automated downtime detection uses the machine's PLC state signal, published via MQTT and Sparkplug B to the monitoring platform. The implementation pattern is straightforward:

The PLC exposes a machine state tag (e.g., an integer where 1 = Running, 2 = Idle, 3 = Faulted, 4 = Changeover, etc.)
The MQTT edge gateway publishes state changes to the cloud platform
The platform records every state transition with a millisecond-accurate timestamp
Any period where the machine state is not "Running" during planned production time is automatically classified as a downtime event
If the PLC provides fault codes, these are captured as the initial reason code
For events without PLC fault codes, the platform prompts the operator (or supervisor) to assign a reason code
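The core of this pattern, turning a stream of state changes into stop events, can be sketched in Python. The sketch assumes the Sparkplug B payloads have already been decoded into (timestamp, state, fault code) records. It uses the example encoding above (1 = Running) and treats a single time window as the planned production time. StateChange, StopEvent, detect_stops, and the fault-code table are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RUNNING = 1  # example encoding: 1 = Running, 2 = Idle, 3 = Faulted, 4 = Changeover


@dataclass
class StateChange:
    ts: datetime
    state: int
    fault_code: Optional[int] = None


@dataclass
class StopEvent:
    start: datetime
    end: datetime
    state: int
    reason: Optional[str]  # None means the operator should be prompted

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def detect_stops(
    changes: list[StateChange],
    window_start: datetime,
    window_end: datetime,
    fault_reasons: dict[int, str],
) -> list[StopEvent]:
    """Convert time-ordered state changes into non-running periods inside the window."""
    events = []
    for i, current in enumerate(changes):
        # Each state lasts until the next change, or until the window closes.
        next_ts = changes[i + 1].ts if i + 1 < len(changes) else window_end
        start, end = max(current.ts, window_start), min(next_ts, window_end)
        if current.state == RUNNING or end <= start:
            continue
        # A known PLC fault code becomes the initial reason; otherwise ask a person.
        reason = fault_reasons.get(current.fault_code) if current.fault_code is not None else None
        events.append(StopEvent(start, end, current.state, reason))
    return events


D = datetime(2026, 3, 23)
SHIFT_START, SHIFT_END = D.replace(hour=6), D.replace(hour=14)
EXAMPLE_CHANGES = [
    StateChange(D.replace(hour=6), 1),
    StateChange(D.replace(hour=7, minute=10), 3, fault_code=17),
    StateChange(D.replace(hour=7, minute=25), 1),
    StateChange(D.replace(hour=9), 2),
    StateChange(D.replace(hour=9, minute=3), 1),
    StateChange(D.replace(hour=13, minute=50), 4),
]

if __name__ == "__main__":
    for ev in detect_stops(EXAMPLE_CHANGES, SHIFT_START, SHIFT_END, {17: "Feeder jam"}):
        print(ev.start.time(), ev.duration, ev.reason or "needs reason code")
```

Of the three stops in this example, only the 15-minute fault arrives with a reason. The 3-minute idle period and the changeover that runs to the end of the window are left for an operator to classify. A production system would also need to merge consecutive non-running states, handle gaps in the data, and account for clock drift between the PLC and the gateway.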

This hybrid approach gives you the accuracy and completeness of automated detection with the contextual knowledge that only a human operator can provide. The machine knows when it stopped and for how long. The operator knows why.

Calculating the Cost of Downtime

To translate downtime minutes into dollars, you need a cost-per-minute figure for each machine or production line. The basic formula:

Downtime Cost = Downtime Minutes × (Revenue per Minute + Labor Cost per Minute + Overhead per Minute)

For a rough estimate, take the machine's annual revenue contribution, divide by annual operating minutes, and multiply by 1.5 to 2.5 to account for indirect costs (expediting, overtime, customer penalties, etc.). Even a rough cost figure transforms downtime discussions. "We had 47 minutes of downtime on Line 3" is abstract. "We lost $2,800 on Line 3 today due to downtime" gets attention.
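Both the full formula and the rough estimate are simple to encode. In the Python sketch below, every dollar figure and the 120,000 annual operating minutes (one 8-hour shift on 250 days) are hypothetical inputs chosen for illustration. The multiplier of 2.0 falls within the 1.5 to 2.5 range given above.

```python
def downtime_cost(
    minutes: float,
    revenue_per_min: float,
    labor_per_min: float,
    overhead_per_min: float,
) -> float:
    """Downtime Cost = Downtime Minutes x (revenue + labor + overhead per minute)."""
    return minutes * (revenue_per_min + labor_per_min + overhead_per_min)


def rough_cost_per_minute(
    annual_revenue: float,
    annual_operating_minutes: float,
    multiplier: float = 2.0,
) -> float:
    """Revenue per operating minute, scaled up for indirect costs."""
    return annual_revenue / annual_operating_minutes * multiplier


if __name__ == "__main__":
    # Full formula with hypothetical per-minute rates of $40, $12, and $8.
    print(downtime_cost(47, 40, 12, 8))
    # Rough estimate: $1.5M annual contribution over 120,000 operating minutes.
    rate = rough_cost_per_minute(1_500_000, 120_000)
    print(47 * rate)
```

The two approaches give different answers for the same 47 minutes ($2,820 and $1,175 here) because they rest on different inputs.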

From Tracking to Reduction

Tracking downtime is only valuable if it leads to action. The standard improvement cycle:

Measure — automated detection provides accurate, complete data
Categorize — reason codes identify the type of loss
Prioritize — Pareto analysis focuses effort on the top losses
Investigate — root cause analysis (5 Why, fishbone diagram) identifies the underlying problem
Act — implement countermeasures (preventive maintenance, design change, procedure update, training)
Verify — monitor the same Pareto to confirm the problem is reduced

This is not a one-time exercise. It is a weekly rhythm. The top 2-3 Pareto items should have assigned owners and active improvement projects at all times. As the top items are resolved, the next tier moves up and becomes the focus. Over months, this disciplined approach compounds into significant OEE gains.

How PulseMQ Tracks Downtime

PulseMQ auto-detects every downtime event from PLC machine state signals published via MQTT. Every state transition is logged with precise timestamps. The platform distinguishes between planned and unplanned downtime based on your configured shift schedules and maintenance windows.
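The article does not describe PulseMQ's internal logic. One straightforward way to separate planned from unplanned time is interval overlap: any part of a stop that falls inside a configured maintenance window or break counts as planned, and the remainder counts as unplanned. The minimal Python sketch below implements this idea. It assumes the windows do not overlap one another, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def overlap(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> timedelta:
    """Length of the intersection of two time intervals (zero if disjoint)."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta(0))


def split_planned_unplanned(
    stop_start: datetime,
    stop_end: datetime,
    planned_windows: list[tuple[datetime, datetime]],
) -> tuple[timedelta, timedelta]:
    """Return (planned, unplanned) time for one stop; windows must not overlap."""
    planned = sum(
        (overlap(stop_start, stop_end, ws, we) for ws, we in planned_windows),
        timedelta(0),
    )
    return planned, (stop_end - stop_start) - planned


if __name__ == "__main__":
    d = datetime(2026, 3, 23)
    maintenance = [(d.replace(hour=10, minute=30), d.replace(hour=11))]
    # A stop from 10:00 to 11:30 that straddles a 30-minute maintenance window.
    print(split_planned_unplanned(d.replace(hour=10), d.replace(hour=11, minute=30), maintenance))
```

Here 30 minutes of the stop are planned, and the other 60 count against Availability.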

For unplanned stops, operators can assign reason codes from a configurable list directly on the production dashboard — on a shop floor tablet, their phone, or any browser. PLC fault codes are automatically captured as the initial classification. The AI agent can suggest reason codes based on patterns it has learned from historical data.
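The article does not specify how PulseMQ's AI agent produces its suggestions. As a much simpler stand-in for learning from history, the Python sketch below suggests the reason that operators have most often assigned to the same machine and PLC fault code. It is a frequency count, not a machine learning model, and build_suggester is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable, Optional


def build_suggester(
    history: Iterable[tuple[str, Optional[int], str]],
) -> Callable[[str, Optional[int]], Optional[str]]:
    """history holds (machine, fault_code, reason) tuples already labeled by operators."""
    counts: dict[tuple[str, Optional[int]], Counter] = defaultdict(Counter)
    for machine, fault_code, reason in history:
        counts[(machine, fault_code)][reason] += 1

    def suggest(machine: str, fault_code: Optional[int]) -> Optional[str]:
        # Most frequent past reason for this machine/fault pair, or None if unseen.
        seen = counts.get((machine, fault_code))
        return seen.most_common(1)[0][0] if seen else None

    return suggest


if __name__ == "__main__":
    suggest = build_suggester([
        ("Press 2", None, "Material shortage"),
        ("Press 2", None, "Material shortage"),
        ("Press 2", None, "Operator unavailable"),
        ("Press 2", 17, "Equipment breakdown"),
    ])
    print(suggest("Press 2", None))
```

In this sketch the suggestion is only a default. Keeping the operator's confirmation in the loop preserves the human context that the hybrid approach depends on.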

Built-in Pareto analysis shows downtime by reason, by machine, by shift, and by time period. Drill down from a plant-wide view to a single machine's downtime history in two clicks. Export data for deeper analysis or integration with your CMMS (Computerized Maintenance Management System).

Because downtime tracking is integrated with OEE calculation, job tracking, and environmental monitoring, you can correlate downtime events with production context. Did the breakdown happen during a specific product run? Was ambient temperature elevated? Was the machine running a new recipe? These correlations are impossible with standalone downtime tracking tools.

Stop Guessing About Downtime

Automatic detection from PLC signals. Operator reason codes. Pareto analysis. Every stop, every machine, every shift — with zero manual data entry.
