
5 Strategies to Reduce Repeat Equipment Failures

Marcus Chen

Principal Reliability Engineer

January 15, 2024

Learn proven methodologies to identify bad actors and implement targeted solutions that cut repeat failures by up to 60%. We explore root cause analysis, preventive maintenance optimization, and data-driven decision making.

The Hidden Cost of Repeat Failures

Repeat equipment failures are among the most frustrating and costly problems in industrial operations. When the same pump seal fails every three months, when the same heat exchanger tubes corrode prematurely, or when the same compressor valve breaks down during peak production, the cumulative impact goes far beyond the direct repair costs.

Industry data shows that repeat failures account for 30-40% of total maintenance spend at many industrial facilities. Beyond the direct costs, each failure event triggers a cascade of consequences: emergency mobilization of maintenance crews, unplanned production losses, expedited parts procurement at premium prices, and the hidden cost of diverted attention from planned work.

At Integral Solutions, we have worked with gas plants, compressor stations, and refineries across North America to systematically eliminate repeat failures. The five strategies outlined below represent the core of our field-proven approach.

Strategy 1: Implement a Bad Actor Management Program

A bad actor is any equipment item that consumes a disproportionate share of maintenance resources relative to its criticality and replacement value. In our experience, 10-15% of assets typically account for 50-70% of corrective maintenance costs.

How to Identify Bad Actors:

The first step is rigorous data collection. Gather 12 to 24 months of work order history from your CMMS and calculate total maintenance cost per asset, including labour, materials, and contracted services. Then identify assets with the highest failure frequency and plot them on a cost-versus-criticality matrix.
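As a rough illustration of the cost roll-up described above, the sketch below totals labour, materials, and contracted services per asset and ranks the results. The row layout, asset tags, and dollar figures are hypothetical, not a real CMMS export format, and the work order count is used here only as a stand-in for failure frequency, which assumes the rows are corrective work orders.

```python
from collections import defaultdict

# Hypothetical CMMS export rows: (asset_id, labour, materials, contracted), in dollars.
WORK_ORDERS = [
    ("P-101", 4000, 6000, 0),
    ("P-101", 3500, 5500, 2000),
    ("K-201", 12000, 30000, 8000),
    ("E-301", 1500, 800, 0),
    ("P-102", 900, 400, 0),
]

def rank_assets(work_orders):
    """Return (asset, total_cost, work_order_count, cumulative_cost_share) rows,
    sorted from highest to lowest total cost."""
    cost = defaultdict(float)
    count = defaultdict(int)
    for asset, labour, materials, contracted in work_orders:
        cost[asset] += labour + materials + contracted
        count[asset] += 1
    grand_total = sum(cost.values())
    ranked = []
    running = 0.0
    for asset in sorted(cost, key=cost.get, reverse=True):
        running += cost[asset]
        share = running / grand_total if grand_total else 0.0
        ranked.append((asset, cost[asset], count[asset], share))
    return ranked

ranking = rank_assets(WORK_ORDERS)
for asset, total, n, share in ranking:
    print(f"{asset}: ${total:,.0f} over {n} work orders, cumulative share {share:.0%}")
```

The cumulative share column makes it easy to see how few assets account for most of the spend, which is the pattern a bad actor list is meant to expose.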

Assets that fall into the high-cost, high-criticality quadrant are your priority bad actors. But do not ignore the high-cost, low-criticality assets either. These are candidates for redesign, material upgrade, or run-to-failure strategies that can free up significant maintenance resources.
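The quadrant logic can be sketched in a few lines. Everything specific here is an assumption for illustration: the asset tags, the 1-to-5 criticality scale, and both cut-off values would need to come from your own criticality ranking and cost data.

```python
def classify(total_cost, criticality, cost_threshold, criticality_threshold):
    """Place one asset in a quadrant of the cost-versus-criticality matrix."""
    cost_band = "high-cost" if total_cost >= cost_threshold else "low-cost"
    crit_band = (
        "high-criticality" if criticality >= criticality_threshold else "low-criticality"
    )
    return f"{cost_band}, {crit_band}"

# Hypothetical inputs: (total maintenance cost in dollars, criticality score 1-5).
ASSETS = {
    "K-201": (50000, 5),
    "P-101": (21000, 2),
    "E-301": (2300, 4),
    "P-102": (1300, 1),
}
COST_THRESHOLD = 10000        # assumed cut-off, not a recommended value
CRITICALITY_THRESHOLD = 3     # assumed cut-off, not a recommended value

matrix = {
    asset: classify(cost, crit, COST_THRESHOLD, CRITICALITY_THRESHOLD)
    for asset, (cost, crit) in ASSETS.items()
}

# Priority bad actors, and candidates for redesign, upgrade, or run-to-failure.
priority = [a for a, q in matrix.items() if q == "high-cost, high-criticality"]
strategy_review = [a for a, q in matrix.items() if q == "high-cost, low-criticality"]
print("Priority bad actors:", priority)
print("Strategy review candidates:", strategy_review)
```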

Running the Program:

Establish a monthly bad actor review meeting with representation from operations, maintenance, reliability engineering, and planning. For each bad actor, assign an owner responsible for driving a targeted improvement plan. Track progress using failure frequency trends and cost reduction metrics. We typically see 40-60% reductions in repeat failures on targeted bad actors within the first 12 months.

Strategy 2: Conduct Rigorous Root Cause Failure Analysis

Standard troubleshooting typically addresses the immediate physical cause of failure: the seal failed, the bearing overheated, the pipe corroded. But without understanding why these failures occur, you are destined to repeat them.

The RCFA Framework:

Effective Root Cause Failure Analysis follows a structured methodology. Begin with clear problem definition: what failed, when did it fail, what were the operating conditions at the time, and what was the impact? Then gather physical evidence including failed components, operating data, maintenance history, and process conditions.

Use systematic analysis techniques to work from the physical root cause (what broke) through the human root cause (what action or inaction contributed) to the latent root cause (what systemic issue allowed it to happen). The 5-Why method is effective for simpler failures, while more complex events may require fault tree analysis or Ishikawa diagrams.

The 5-Why Method in Practice:

Consider a recurring centrifugal pump seal failure. Why did the seal fail? Because it ran dry. Why did it run dry? Because the minimum flow recirculation line was blocked. Why was it blocked? Because the check valve was installed backwards after the last turnaround. Why was it installed backwards? Because there was no quality verification step in the installation procedure. Why was there no verification step? Because maintenance procedures had not been updated to include QA checkpoints.

The root cause is not the seal. It is a gap in the management-of-change process for maintenance procedures: the installation procedure was never updated to include a QA checkpoint. Replacing only the seal guarantees recurrence.

Strategy 3: Optimize Your Preventive Maintenance Program

Many facilities operate with preventive maintenance programs that have grown organically over decades, accumulating tasks without systematic review. The result is often a combination of over-maintenance on some assets and under-maintenance on others.

The PM Optimization Process:

Start with a PM audit. Review every preventive maintenance task against the actual failure modes it is intended to prevent. For each task, ask three questions: Is this task technically effective at detecting or preventing the failure mode? Is the frequency appropriate based on actual failure data and Mean Time Between Failures? Is the task the most cost-effective way to manage this failure mode?

Tasks that cannot be linked to a specific failure mode should be challenged. Tasks whose frequency does not align with actual deterioration rates should be adjusted. Tasks that address failure modes better managed through condition monitoring should be converted to predictive maintenance routes.
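One way to picture the audit outcomes above is as a simple decision rule applied to each task. This is a minimal sketch, not a prescribed procedure: the task names, field names, and the order in which the checks are applied are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PMTask:
    name: str
    failure_mode: Optional[str]                   # None if no failure mode can be linked
    frequency_fits_deterioration: bool            # judged from actual failure data
    better_managed_by_condition_monitoring: bool

def audit(task: PMTask) -> str:
    """Return a suggested audit outcome for one PM task."""
    if not task.failure_mode:
        return "challenge"
    if task.better_managed_by_condition_monitoring:
        return "convert to predictive route"
    if not task.frequency_fits_deterioration:
        return "adjust frequency"
    return "keep"

# Hypothetical task list for illustration only.
TASKS = [
    PMTask("Annual panel wipe-down", None, True, False),
    PMTask("Quarterly bearing regrease", "bearing lubrication failure", False, False),
    PMTask("12-month bearing replacement", "bearing wear", True, True),
    PMTask("Monthly seal pot level check", "seal dry running", True, False),
]

results = {task.name: audit(task) for task in TASKS}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

In practice each outcome is a prompt for an engineering review rather than an automatic change to the PM program.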

Frequency Analysis Using MTBF Data:

If a bearing has a Mean Time Between Failures of 18 months, running a vibration route every 6 months gives several opportunities to detect deterioration before a typical failure, provided the defect remains detectable for longer than the route interval. By contrast, a time-based bearing replacement every 12 months is wasteful and risks introducing infant mortality failures through unnecessary maintenance interventions.
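The arithmetic behind this example is simple enough to sketch. The operating history below (54 months, 3 failures) is a made-up figure chosen to reproduce the 18-month MTBF in the text. Keep in mind that MTBF is an average, so individual failures can occur well before or after it.

```python
def mtbf_months(operating_months, failure_count):
    """Mean Time Between Failures: operating time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one recorded failure")
    return operating_months / failure_count

def routes_per_mtbf(mtbf, route_interval_months):
    """How many condition monitoring routes fall within one average failure interval."""
    if route_interval_months <= 0:
        raise ValueError("route interval must be positive")
    return mtbf / route_interval_months

# Hypothetical history: 54 months of operation with 3 bearing failures.
mtbf = mtbf_months(54, 3)
routes = routes_per_mtbf(mtbf, 6)
print(f"MTBF: {mtbf:.0f} months, vibration routes per MTBF: {routes:.0f}")
```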

Our clients typically achieve a 15-25% reduction in PM task volume while simultaneously improving equipment reliability through this optimization process.

Strategy 4: Implement Operator-Driven Reliability

Equipment operators interact with assets more frequently than any other group in the facility. They hear changes in sound, feel changes in vibration, notice changes in temperature, and observe changes in process performance. Yet in many organizations, this knowledge is not systematically captured or acted upon.

Building an Effective ODR Program:

Start by training operators on basic equipment care and condition recognition. This includes lubrication fundamentals, visual inspection techniques, and understanding of normal versus abnormal operating parameters. Then establish structured daily and weekly equipment check routines with standardized reporting formats.

The key is making it easy for operators to report observations and ensuring that their reports are acted upon promptly. Nothing kills an ODR program faster than operators who submit deficiency reports that disappear into a maintenance backlog.

Create feedback loops so operators can see the results of their observations. When an operator's early detection of a bearing defect prevents a catastrophic pump failure, make sure the entire crew knows about it. Recognition drives engagement.

Strategy 5: Develop Meaningful KPIs and Track Them Relentlessly

You cannot manage what you do not measure. But the flip side is equally true: measuring the wrong things drives the wrong behaviour. Many facilities track lagging indicators like total maintenance cost or number of breakdowns without connecting them to the leading indicators that drive improvement.

Essential Reliability KPIs:

Track Mean Time Between Failures at the asset and failure mode level, not just as a plant-wide average. Monitor planned work as a share of total maintenance work, targeting 80% or higher. Measure PM and PdM compliance rates to ensure the foundation of your reliability program is solid. Track bad actor reduction trends monthly.
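The sketch below shows one plausible way to compute these indicators. All inputs are hypothetical, and the definitions are assumptions: planned work is measured in labour hours, compliance counts tasks completed on time against tasks scheduled, and every asset is assumed to have run for the whole observation period.

```python
from collections import Counter

# Hypothetical failure log: (asset_id, failure_mode) per corrective event.
FAILURES = [
    ("P-101", "seal"),
    ("P-101", "seal"),
    ("P-101", "bearing"),
    ("K-201", "valve"),
]
# Hypothetical operating time per asset over the observation period, in months.
OPERATING_MONTHS = {"P-101": 24, "K-201": 24}

def mtbf_by_asset_and_mode(failures, operating_months):
    """MTBF in months for each (asset, failure mode) pair."""
    counts = Counter(failures)
    return {
        (asset, mode): operating_months[asset] / n
        for (asset, mode), n in counts.items()
    }

def percentage(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

mtbf = mtbf_by_asset_and_mode(FAILURES, OPERATING_MONTHS)
planned_pct = percentage(820, 820 + 180)   # planned hours vs. total hours
pm_compliance_pct = percentage(47, 50)     # PM tasks done on time vs. scheduled

print(mtbf)
print(f"Planned work: {planned_pct:.0f}%, PM compliance: {pm_compliance_pct:.0f}%")
```

Breaking MTBF out by failure mode shows, in this made-up data, that the pump's seal is failing twice as often as its bearing, a distinction that a single asset-level or plant-wide figure would hide.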

At the organizational level, measure maintenance cost as a percentage of replacement asset value, targeting 2-3% for well-maintained facilities. Monitor overall equipment availability and production losses attributable to equipment failure.
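For illustration, the two organizational measures can be computed as below. The dollar figures and hours are invented, and availability is defined here as scheduled hours minus failure-related downtime, divided by scheduled hours, which is one common convention rather than the only one.

```python
def maintenance_cost_pct_rav(annual_maintenance_cost, replacement_asset_value):
    """Annual maintenance cost as a percentage of replacement asset value (RAV)."""
    if replacement_asset_value <= 0:
        raise ValueError("replacement asset value must be positive")
    return 100.0 * annual_maintenance_cost / replacement_asset_value

def availability_pct(scheduled_hours, downtime_hours):
    """Equipment availability as a percentage of scheduled operating hours."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled hours must be positive")
    return 100.0 * (scheduled_hours - downtime_hours) / scheduled_hours

# Hypothetical facility: $4.5M annual maintenance spend on $180M of assets.
cost_pct = maintenance_cost_pct_rav(4_500_000, 180_000_000)
# Hypothetical unit: 8,760 scheduled hours with 219 hours of failure downtime.
avail_pct = availability_pct(8760, 219)

print(f"Maintenance cost: {cost_pct:.1f}% of RAV")
print(f"Availability: {avail_pct:.1f}%")
```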

The Review Cadence:

Establish weekly tactical reviews focusing on current work execution, monthly reliability reviews focusing on bad actors and RCFA progress, and quarterly strategic reviews focusing on program effectiveness and resource allocation.

Bringing It All Together

Reducing repeat failures is not about implementing one silver-bullet solution. It requires a systematic, integrated approach combining rigorous analysis, optimized maintenance strategies, engaged operations teams, and disciplined performance management.

The payoff is substantial. Our clients who fully implement these five strategies typically achieve 40-60% reductions in repeat failures, 20-30% reductions in overall maintenance costs, and significant improvements in equipment availability and production output.

The key is starting. Pick your top five bad actors, launch RCFA investigations, and build momentum from there. Small wins compound into transformational results.

