How Machine Learning is Changing the Data Organization Game

By Pinnacle. June 13, 2022

The oil and gas industry faces several data collection, processing, and handling challenges. Facilities collect large amounts of data through various techniques and processes, but organizing and analyzing that data can be a struggle. Proper technical analysis of this data can help facilities improve safety, optimize maintenance and inspection tasks, and decrease spending. Machine learning and artificial intelligence (AI) techniques promise to improve both big data storage capabilities and the efficiency of numerical calculations.

Machine learning can compensate for the natural limitations of human subject matter experts (SMEs), significantly reducing the person-hours spent on data analysis and organization. Machine learning models can quickly sort through, organize, and clean massive amounts of input data, such as temperature, pressure, metallurgy, and stream information, enabling facility leaders to make informed data organization decisions.

Adding data science to traditional reliability methods allows models to learn and evolve continually, which keeps results from becoming stagnant. Data science and machine learning also reduce the need for humans to hand-code rules, which tend to be fragile and hard to work with. One specific application of data science, Natural Language Processing (NLP), can be used to solve many data organization problems.

What is Natural Language Processing?

NLP uses AI to enable computers to understand natural language, whether written or spoken, much as people do. It takes in, analyzes, and organizes language inputs in a form the computer can work with. Two types of machine learning can be applied to NLP: supervised and unsupervised.

Supervised learning is a machine learning approach defined by its use of labeled datasets. These datasets are set up to train or “supervise” algorithms to classify data or predict outcomes accurately. The model can learn over time and measure its accuracy using labeled inputs and outputs. Unsupervised learning uses machine learning algorithms to cluster and analyze unlabeled datasets. These algorithms discover hidden patterns in data without the need for human intervention.
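To make the distinction concrete, here is a minimal sketch in Python using only the standard library. It is not a production NLP pipeline: the supervised half is a simple nearest-neighbor lookup against a handful of labeled notes, and the unsupervised half is a greedy grouping by word overlap standing in for a real clustering algorithm. The sample notes, the labels, and the 0.2 similarity threshold are all assumptions made up for illustration.

```python
def tokens(text):
    """Lowercase a note and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Supervised: a person has already labeled these (hypothetical) notes.
LABELED = [
    ("external visual inspection of shell no corrosion", "inspection"),
    ("internal inspection of drum found pitting", "inspection"),
    ("replaced gasket on flange and retorqued bolts", "maintenance"),
    ("replaced pump seal and bearing", "maintenance"),
]


def classify(text):
    """Return the label of the most similar labeled note (1-nearest neighbor)."""
    query = tokens(text)
    best = max(LABELED, key=lambda pair: jaccard(query, tokens(pair[0])))
    return best[1]


# Unsupervised: no labels, only the notes themselves.
def cluster(notes, threshold=0.2):
    """Greedily group notes: a note joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for note in notes:
        for group in groups:
            if jaccard(tokens(note), tokens(group[0])) >= threshold:
                group.append(note)
                break
        else:
            groups.append([note])
    return groups


if __name__ == "__main__":
    print(classify("internal inspection of exchanger shell"))
    print(cluster([
        "leak at pump seal",
        "gate alarm triggered at night",
        "leak at flange gasket",
        "gate alarm reset",
    ]))
```

Note that the classifier can only return labels a person supplied in advance, while the grouping function finds its own groups but cannot say what they mean. That trade-off is the practical difference between the two approaches.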

Figure 1. Example of Unsupervised and Supervised Data Distributions

Choosing the right path forward for your situation depends on how your team assesses the structure and volume of your data and the use case. When selecting the approach for your situation, consider the following:

  • Evaluate your input data: Is it labeled or unlabeled data? Do you have personnel to support additional labeling?
  • Define your goals: Do you have a well-defined and recurring problem to solve?
  • Review your options for algorithms: Do the algorithms cover the features, attributes, or characteristics needed? Can they support your data volume and structure?

Though labeling big data for supervised learning is challenging, it typically results in a more accurate and trustworthy product. In contrast, unsupervised learning is better suited to large volumes of data arriving in real time, but it can lack transparency into how data is clustered and carries a higher risk of inaccurate results. NLP also comes with its own challenges, such as:

  • Misspellings within the data, for example, “stainless” vs. “stanless”
  • Tense differentiators, such as “inspect” vs. “inspected”
  • Time – when did the data originate? Were different terms or phrasing used at that time?
  • Handwritten documents vs. typed documents

Overcoming these challenges requires various data pre-processing techniques, such as data-driven spelling correction and stemming/lemmatization. For reading handwritten documents, optical character recognition (OCR) methods are employed to convert the handwritten text into an electronic format suitable for machine processing.
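As a rough illustration of the first two techniques, the sketch below corrects misspellings against a known vocabulary and strips common suffixes so that tense variants collapse to one form. It uses Python's standard difflib module for fuzzy matching. The vocabulary, the 0.8 similarity cutoff, and the suffix rules are assumptions chosen for this example; a real project would more likely build its vocabulary from the data and use an established stemmer or lemmatizer.

```python
import difflib

# Hypothetical domain vocabulary; in practice this would be derived from the data.
VOCABULARY = ["stainless", "carbon", "steel", "inspect", "corrosion", "vessel"]


def correct_spelling(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the word unchanged if none is close."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def stem(word):
    """Strip one common suffix so that tense and plural variants share a stem.
    Deliberately naive: real stemmers handle far more cases."""
    for suffix in ("ing", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words like "stainless" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, fix spelling, then stem each word."""
    return [stem(correct_spelling(word)) for word in text.lower().split()]


if __name__ == "__main__":
    print(preprocess("Inspected stanless steel vessel"))
    # → ['inspect', 'stainless', 'steel', 'vessel']
```

After this step, “stanless” and “stainless” count as the same term, as do “inspect” and “inspected”, so downstream models see one feature instead of several.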

Industry Applications for Natural Language Processing

NLP applies machine learning to all kinds of document-related tasks, like mining U-1 forms and performing quality checks. In addition to being tedious for humans, these tasks are prone to error. Using data science techniques to have a computer perform them automatically can be a significant win on cost and accuracy. NLP can be leveraged for a broad range of applications; examples include:

  • Inspection Grading – NLP can improve inspection grading by reducing the need for people to read reports and make subjective judgments about inspection quality. With NLP, you can extract all kinds of information from a report, such as whether the inspection was internal or external, the percentage of coverage, and the method used, and then grade inspections against set criteria. This is done with supervised machine learning, where a list of terms is provided to flag grade indicators.
  • Quality Checks – NLP and supervised machine learning can audit quality across large volumes of documentation. For example, NLP can be used to conduct quality checks on material intake forms to determine if Positive Material Identification (PMI) is required. In one case, a facility that applied NLP to improve its PMI program reduced the number of documents its employees had to review from half a million to only a few hundred records.
  • U-1 Form Mining – OCR, a tool often paired with NLP, can scan both handwritten and typed documents to extract data automatically, reducing the need to have a person mine data from these documents.
  • Log Analysis – NLP can organize large incident logs in unsupervised machine learning projects. For example, some refineries use NLP to organize logs covering everything from security incidents to loss of containment issues, sorting the incoming data and routing it to the proper stakeholders.
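The inspection grading idea above, where a list of terms flags grade indicators, can be sketched in a few lines of Python. This is only an illustration: the term list, the field names, and especially the A/B/C grading rule are hypothetical and do not come from any inspection standard or from the facility examples mentioned here. A real system would also learn from labeled reports rather than rely on fixed patterns alone.

```python
import re

# Hypothetical term list mapping an inspection method to words that indicate it.
METHOD_TERMS = {
    "UT": ["ultrasonic", "ut"],
    "RT": ["radiographic", "radiography", "rt"],
    "VT": ["visual", "vt"],
}


def extract_fields(report):
    """Pull location, coverage percentage, and methods out of free-text report."""
    text = report.lower()

    if re.search(r"\binternal\b", text):
        location = "internal"
    elif re.search(r"\bexternal\b", text):
        location = "external"
    else:
        location = None

    # First number followed by a percent sign, e.g. "85%" or "85 %".
    coverage = re.search(r"(\d{1,3})\s*%", text)

    methods = sorted(
        method
        for method, terms in METHOD_TERMS.items()
        if any(re.search(rf"\b{re.escape(term)}\b", text) for term in terms)
    )

    return {
        "location": location,
        "coverage_pct": int(coverage.group(1)) if coverage else None,
        "methods": methods,
    }


def grade(fields):
    """Apply a made-up grading rule to the extracted fields."""
    coverage = fields["coverage_pct"] or 0
    if fields["location"] == "internal" and coverage >= 80 and fields["methods"]:
        return "A"
    if coverage >= 50:
        return "B"
    return "C"


if __name__ == "__main__":
    report = "Internal UT inspection of vessel V-101, approximately 85% coverage."
    fields = extract_fields(report)
    print(fields, grade(fields))
```

Because the criteria live in one small function, every report is graded the same way, which is the consistency benefit over having different people read and judge each report.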


As the industry evolves and facilities start to collect more and more data, the methodology behind data organization must also adapt. Data science and machine learning open many doors for reducing person-hours and making daunting tasks more manageable.

To dive more thoroughly into other applications of machine learning in reliability, check out the webinar Combining Subject Matter Expertise and Data Science to Optimize CML Inspection.
