Inspectioneering Journal

The Fourth Maintenance Revolution

By Barry Snider at Small Hammer Incorporated. This article appears in the September/October 2011 issue of Inspectioneering Journal.

Maintenance has been around since prehistoric man fixed a broken, trusted spear instead of fashioning a new one. One concept of maintenance is any activity that extends the useful life or enhances the performance of an item of interest. This is a broad concept, to be sure, but for most of recorded history maintenance has been synonymous with fixing or repairing. Maintaining an item to prevent a loss or failure was reserved for only a few things, such as artworks, whiskey, and books. Until James Watt's steam engine, machines were rather simple items, with wheeled carriages, pendulum clocks, and muzzle-loading weapons representing the peak of sophistication. The true maintenance technicians of their day held titles such as blacksmith, tailor, carpenter, and cobbler, and their repair work was an extension of their manufacturing trade. Then came the industrial revolution: mankind became reliant on machines and therefore needed a much more robust strategy for maintaining the machines' performance. To keep up with the growing demand for maintaining the machines, a new craft was developed whose sole purpose was to perform maintenance.

First Maintenance Revolution - In the early decades of the industrial revolution, machines were primarily maintained by the operator. As machines broke, they were fixed by any means available, but often they were discarded and replaced. Maintenance became a separately defined activity during the Great Depression: there was no money to buy new equipment, so clever mechanics, under the title of "repair man", learned how to fix and repair what was broken. During WWII, maintenance became an independent function and, for the first time, a "Maintenance Mechanic" was called upon to fix, repair, and replace war machines and manufacturing equipment. As experience grew, techniques such as preventive maintenance and scheduled overhauls became the norm. Despite the tremendous increase in maintenance activity, two primary causes of equipment failures were not addressed: operation outside the design limitations of the equipment, and human factors, which included design, manufacture, installation, operation, and maintenance activities.

Second Maintenance Revolution - In the late 1950s and early 1960s, two processes, Total Productive Maintenance (TPM) and Reliability-Centered Maintenance (RCM), were created in the Japanese auto industry and the commercial airline industry, respectively. These processes emphasized the idea that maintenance was part of production and should be included in scheduling and planning. No longer should maintenance be seen as only a cost; it should offer value to the overall manufacturing strategy. TPM and RCM spread to other industries over the following three decades. In the oil and gas industry, RCM became the dominant maintenance strategy.

RCM is a structured process that uses failure modes and effects analysis (FMEA) plus an estimation of failure rates to create a maintenance strategy that improves reliability. Later enhancements to the process included concepts such as criticality, maintenance effectiveness, and maintenance optimization, which produced strategies that incorporated the cost of maintenance in the decisions of "what to do" and "when to do it". The use of computerized maintenance management systems (CMMS) became the norm during this period. The reliability and availability of equipment increased drastically, but there is still debate over whether this was due to RCM or to improvements in equipment designs and automation using distributed control systems (DCS) and programmable logic controllers (PLC). Despite all of these advancements, the primary causes of failures continued to be operation outside the design limitations and human factors, which were still not addressed.

Third Maintenance Revolution - In the 1990s, maintenance strategies began to use the concept of risk to decide when to perform maintenance and inspections. Risk is the combination of the probability of failure with the consequence of failure, often expressed as a product:

Risk = Probability × Consequence

Risk had commonly been used in developing safety and environmental programs but had not been used for making maintenance or operating decisions.
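
As an illustration of the risk product, here is a minimal Python sketch that scores a few assets and ranks them for inspection priority. The asset names, the annual failure probabilities, and the consequence costs are invented for the example and do not come from any study or standard; real risk based programs typically use far richer probability and consequence models.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    probability_of_failure: float  # assumed: chance of failure per year (0 to 1)
    consequence_cost: float        # assumed: cost of one failure, any currency


def risk(asset: Asset) -> float:
    """Risk = Probability x Consequence, as stated in the text."""
    return asset.probability_of_failure * asset.consequence_cost


def rank_by_risk(assets):
    """Return the assets ordered from highest to lowest risk."""
    return sorted(assets, key=risk, reverse=True)


# Hypothetical equipment list, for illustration only.
assets = [
    Asset("pump P-101", 0.20, 50_000),
    Asset("exchanger E-201", 0.02, 2_000_000),
    Asset("vessel V-301", 0.0005, 10_000_000),
]

for asset in rank_by_risk(assets):
    print(f"{asset.name}: risk = {risk(asset):,.0f}")
```

In this made-up data the pump fails most often, yet the exchanger ranks first because its consequence is much larger. That is the kind of trade-off a risk-based strategy is meant to expose.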

Other new concepts that became hallmarks of this period were refined technologies for determining failure mechanisms and the use of root cause failure analysis (RCFA). The use of risk, failure mechanisms, and RCFA led to processes such as Risk Based Inspection [14] (RBI), Predictive Maintenance, Condition Based Maintenance, and Life Cycle Costs. Once again, there were vast improvements in how equipment, now called assets, was managed, but as before, the two major causes of failures, operation outside design limitations and human factors, were still not addressed.

Fourth Maintenance Revolution - This revolution has only just begun in the oil, gas, and process industries as well as many other manufacturing sectors. The maintenance strategies during this next revolution will include all of the advancements previously developed and also fully address the two major failure causes. The question begs for an answer: Why has it taken so long to tackle the two main causes of equipment failures?

The answer is in three parts:

  1. Back in the 1930s, when maintenance became a separate function from operation, the incentive of the operators changed. Operators were directed by production managers to push the equipment to full production and maximum throughput. "Do not shut down the equipment unless absolutely necessary" was a common directive from managers and military officers all the way up the organization. This put the equipment under tremendous stress, which led to failures and high maintenance costs. But the operators were not penalized for the high maintenance costs; maintenance was. The same managers placed constant financial pressure on maintenance to cut costs. Maintenance was seen as a huge liability. Yet maintenance could only work on the equipment after it had failed and had little influence on the processes or operation that were overstressing the machines by operating outside the design limitations.
  2. The second part of the answer addresses human factors. Overcoming the persistent failures caused by human factors required knowledge of psychological and communication techniques to influence behavior. Most managers in industry did not have the knowledge and skills to do this effectively. The oil, gas, and process industries demanded higher reliability and productivity. Instead of addressing the real problem of human behavior, they embraced automation and process controls that effectively removed many of the human interactions in the operation. Reliability and productivity did improve, but there remained many opportunities for people to design, manufacture, install, operate, and maintain the equipment incorrectly.
  3. The third part of the answer involves management of change. Changes of all types, shapes, and sizes were occurring too rapidly in all industries to be able to address these two fundamental failure causes. New equipment designs, new processes, new complex operating procedures, new technologies requiring computer programmers and electronic technicians, and the ever-changing workforce were just not manageable using the techniques of RCM, RBI, TPM, etc. Some industries, such as nuclear power generation, became good at managing change. In this industry, change was limited. Following the incident at Three Mile Island [9], nuclear power plants were highly automated and operated in a steady-state mode, and the training regimen for operators and maintenance was the most intense of any industry. The management-of-change procedures for nuclear sites required multi-level reviews and approvals, the application of risk models, full testing protocols, modification of procedures, updating of manuals and drawings, and operator training using simulators. Other industries attempted similarly rigorous management programs, such as Process Safety Management [11] (PSM) and Safety Case [12], but only in the areas of safety and environment. These programs are not truly applicable to designing equipment maintenance strategies.

The Fourth Maintenance Revolution will include the techniques and strategies to monitor process changes that often place the operation of equipment outside the design limitations. The monitoring will be performed using technology but also by a combined effort of operators, maintenance, and engineering personnel. No longer will the tasks loaded into the CMMS be only for maintenance and inspections. There will be tasks for process engineers, reliability engineers, rotating equipment engineers, instrument and control engineers, as well as other disciplines necessary to actually prevent failures, not just detect the onset of failures.
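
One way to picture this kind of monitoring is a simple check of process readings against the design operating limits of the equipment, with each excursion routed to the discipline that should act on it. The sketch below is only an assumption of how such a check might look: the tag names, limit values, and owner assignments are hypothetical, and it does not represent any particular CMMS or control system interface.

```python
from dataclasses import dataclass


@dataclass
class DesignLimit:
    low: float   # lowest value allowed by the equipment design
    high: float  # highest value allowed by the equipment design
    owner: str   # discipline that would receive the follow-up task


def find_excursions(readings, limits):
    """Return (tag, value, owner) for each reading outside its design limits."""
    excursions = []
    for tag, value in readings.items():
        limit = limits.get(tag)
        if limit is None:
            continue  # no design limit recorded for this tag, nothing to compare
        if value < limit.low or value > limit.high:
            excursions.append((tag, value, limit.owner))
    return excursions


# Hypothetical design limits and one snapshot of process readings.
limits = {
    "P-101.discharge_pressure_bar": DesignLimit(2.0, 16.0, "process engineer"),
    "P-101.bearing_temp_C": DesignLimit(0.0, 90.0, "rotating equipment engineer"),
}
readings = {
    "P-101.discharge_pressure_bar": 17.2,
    "P-101.bearing_temp_C": 71.0,
}

for tag, value, owner in find_excursions(readings, limits):
    print(f"Task for {owner}: {tag} = {value} is outside design limits")
```

The point of the sketch is the routing: the excursion produces a task for an engineering discipline before any failure has occurred, not a repair order for maintenance afterward.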

Consider this: if the process is maintained within the design operating limits of the equipment and all human interaction with the process is held to a level where no incorrect actions are taken, how many failures will be prevented? The answer is most of them. Studies show that failures resulting from these two causes account for 80% to 90% [13, 14] of all failures. What were the primary causes of the Deepwater Horizon [4] explosion and environmental catastrophe? What were the primary causes of the Texas City Refinery [5] explosion? What were the main causes of Piper Alpha [6], Grangemouth [7], Phillips Chemical [8], Three Mile Island [9], Chernobyl [10], and most other high-profile failures? Most can be traced to a combination of the two main causes of failures stated above.

Equipment Maintenance Strategies must continue to utilize the beneficial activities of predictive, preventive, condition-based, failure finding, and risk based inspection that have added real value to production facilities. In addition, there must be a drastic increase in the prevention of process deviations outside the design limits of the equipment. There must also be a huge effort to increase the consistent and correct human actions in all aspects of the design, engineering, manufacturing, construction, installation, operation and maintenance of production processes and equipment.

A new term, Asset Integrity Management (AIM), will replace Equipment Maintenance and will make use of the PSM and Safety Case models. In the new AIM model, risk will be central to decision making, and the entire operating environment, production profile, facility condition, global market, AND organization will be considered when designing a strategy for managing assets. The concepts of situational awareness [13] and normalization of deviance [15] will be heavily analyzed and addressed when attempting to mitigate risks and prevent failures. Extensive use of Risk Modeling, Root Cause Failure Analysis, and Defect Elimination will dominate the workplace, somewhat relegating the notions of reliability, availability, and maintainability (RAM) to minor roles in the decision-making process. (Let's face it, the RAM model didn't play a significant role in the decision-making process anyway.)

The major changes will occur not in the improvement of maintenance techniques to monitor equipment, stock spare parts, or track mean time between failures (MTBF), but in the management of human behavior. The training, coaching, practice, and simulation techniques used by the nuclear power industry, commercial airline pilots, race car pit crews, and even professional athletes will be incorporated into overall asset integrity management and will make a step-change improvement in the prevention and reduction of high-risk failures.


References

  1. “James Watt’s steam engine”, H. W. Dickinson and Hugh Pembroke Vowles, James Watt and the Industrial Revolution (published 1943, new edition 1948, reprinted 1949; also published in Spanish and Portuguese (1944) by the British Council).
  2. “TPM”, Total Productive Maintenance, Jack Roberts, Ph.D., Department of Industrial and Engineering Technology, Texas A&M University-Commerce, 1997.
  3. “RCM”, Reliability Centered Maintenance: Gateway to World Class Maintenance, Anthony M. Smith, 1993.
  4. “Deepwater Horizon”, blowout-happened-2
  5. “Texas City Refinery”
  6. “Piper Alpha”
  7. “Grangemouth”
  8. “Phillips Chemical”
  9. “Three Mile Island”
  10. “Chernobyl”
  11. “Process Safety Management”, 2000.
  12. “Safety Case”, Safety cases and safety reports: meaning, motivation and management, Richard Maguire, 2006.
  13. “Situational Awareness and Human Error: Designing to Support Human Performance”, Mica R. Endsley, Ph.D., SA Technologies, Inc., Proceedings of the High Consequence Systems Surety Conference, 1999.
  14. “Human error is involved in over 90% of all accidents and injuries in a workplace”
  15. “Normalization of Deviance”, The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, Diane Vaughan, University of Chicago Press, Chicago and London, 1996.
