Data Lake as a Powerful Tool for Artificial Intelligence Projects

By Peter Rosiepen, Managing Director at DIMATE. February 6, 2023


Many industries rely heavily on non-destructive testing (NDT) and inspection data to ensure the safety of their assets and operations. A data lake, a centralized repository that stores structured and unstructured data at any scale, is one option for storing and managing this type of data [1]. A data lake centralizes all NDT data and inspection metadata cost-effectively: large volumes of data can be stored at a fraction of the cost of traditional storage methods because there is no need for X-ray film, chemicals, paper, or archive rooms. It also cuts down on retrieval effort, because digital data, unlike physical records, can be accessed from almost anywhere.

Artificial Intelligence Projects

One of the main benefits of storing and managing NDT data and inspection metadata in a data lake is that it is a valuable source for artificial intelligence (AI) projects. The large amount of data stored in a data lake can be used to train machine learning models, which can then be used to improve the efficiency of NDT and inspection processes.

One example of how data from a data lake can be used for AI in the petrochemical industry is the development of predictive maintenance models. Using historical NDT data and inspection metadata, machine learning models can be trained to predict when equipment is likely to fail, so that organizations can schedule maintenance and repairs proactively, reduce downtime, and increase the overall efficiency of their operations.
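To make the idea concrete, the following is a minimal sketch of the simplest possible predictive model: a least-squares straight line fitted to historical wall-thickness readings and extrapolated to a minimum allowable thickness. It is a stand-in for a trained machine learning model, not a description of any particular product. The function names, the readings, and the minimum thickness are all hypothetical.

```python
from datetime import date, timedelta


def fit_linear_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readings must be taken on different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_retirement_date(readings, min_thickness_mm):
    """Estimate when the fitted wall thickness reaches min_thickness_mm.

    readings: list of (inspection_date, thickness_mm) tuples.
    Returns a date, or None if the fitted trend shows no wall loss.
    """
    first_day = min(d for d, _ in readings)
    xs = [(d - first_day).days for d, _ in readings]  # days since first reading
    ys = [t for _, t in readings]
    slope, intercept = fit_linear_trend(xs, ys)
    if slope >= 0:
        return None  # no measurable thinning, so nothing to extrapolate
    days_to_limit = (min_thickness_mm - intercept) / slope
    return first_day + timedelta(days=round(days_to_limit))


# Hypothetical readings for one pipe section (values are illustrative only).
history = [
    (date(2018, 1, 1), 10.0),
    (date(2019, 1, 1), 9.5),
    (date(2020, 1, 1), 9.0),
]
print(estimate_retirement_date(history, min_thickness_mm=8.0))
```

A real predictive maintenance model would draw on many more features from the data lake (operating conditions, material, inspection method) and would need validation against actual failure history. The sketch only shows how historical readings translate into a forward-looking estimate.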

Another example is the use of AI-based image analysis to improve the accuracy and efficiency of radiographic test evaluation (e.g., to determine the residual wall thickness of pipelines or to check for erosion or corrosion). Machine learning models can be trained with historical inspection images to identify defects and anomalies, allowing for the automation of the inspection process, thereby reducing the workload on human inspectors and increasing the overall accuracy of the inspection process.

Data Lake Challenges

Data lakes also come with some challenges: they can be complex to set up and manage, requiring a certain level of technical expertise and specialized tools and resources. Additionally, data lakes can be difficult to secure and require proper data governance and management to ensure data accuracy, consistency, and completeness.

Preventing “Data Swamps”

Although a data lake allows for storing large amounts of raw data in its original format, data standardization is an important aspect to consider when implementing a data lake for NDT data and inspection metadata in the petrochemical industry. Since data formats usually vary between NDT device manufacturers, the raw data cannot be fed directly into AI models, and the data lake can turn into a so-called “data swamp.”

Standardizing the data within the data lake can have a lot of benefits, including better data quality, governance and reusability, increased agility, greater efficiency, and integrity across multiple systems.
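As a rough illustration of what standardization means in practice, the sketch below maps records from two device formats onto one common schema. Everything here is an assumption made for the example: the vendor names, the field names, the units, and the InspectionRecord schema are hypothetical and do not represent any real device format.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class InspectionRecord:
    """Hypothetical common schema for one thickness measurement."""
    asset_id: str
    method: str          # e.g. "UT" (ultrasonic) or "RT" (radiographic)
    inspected_on: date
    thickness_mm: float


def from_vendor_a(raw: dict) -> InspectionRecord:
    # Assumed format: millimetres and ISO 8601 dates.
    return InspectionRecord(
        asset_id=raw["asset"].strip().upper(),
        method=raw["method"].upper(),
        inspected_on=date.fromisoformat(raw["date"]),
        thickness_mm=float(raw["thickness_mm"]),
    )


def from_vendor_b(raw: dict) -> InspectionRecord:
    # Assumed format: inches and month/day/year dates.
    return InspectionRecord(
        asset_id=raw["TagNo"].strip().upper(),
        method=raw["Technique"].upper(),
        inspected_on=datetime.strptime(raw["InspDate"], "%m/%d/%Y").date(),
        thickness_mm=round(float(raw["WallIn"]) * 25.4, 3),  # inches to mm
    )


CONVERTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(source: str, raw: dict) -> InspectionRecord:
    """Convert a raw record from a known source into the common schema."""
    try:
        converter = CONVERTERS[source]
    except KeyError:
        raise ValueError(f"no converter registered for source {source!r}") from None
    return converter(raw)


record = normalize(
    "vendor_b",
    {"TagNo": " p-101 ", "Technique": "ut", "InspDate": "03/15/2022", "WallIn": "0.5"},
)
print(record)
```

Once every record has the same fields and units, it can be queried, compared across devices, and used as training data without per-vendor special cases.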

ASTM International, formerly known as the American Society for Testing and Materials, recommends considering a standardized digital file format, Digital Imaging and Communication in Nondestructive Evaluation (DICONDE), which enables organizations to store data from various NDT methods, such as ultrasonic and radiographic testing, as well as inspection data from visual inspections and other sources, in a centralized location. DICONDE also ensures that data is complete, locatable, unaltered, and has a traceable history.

DICONDE is an open standard for displaying, transmitting, and storing images and digital data from industrial materials testing. It allows signals and images to be exchanged and displayed between different DICONDE-compliant systems. Through this, DICONDE provides a vendor-independent data storage and transmission protocol for non-destructive materials testing. DICONDE is developed by Subcommittee E07.11 of ASTM International, a global standards organization.[2]


In conclusion, data lakes are a powerful solution for storing and managing NDT data and inspection metadata in the petrochemical industry. They centralize large amounts of data that can serve as a valuable source for AI projects, allowing organizations to improve the efficiency of their operations through automated defect recognition and predictive maintenance. Because data lakes also come with some challenges, it may be reasonable to delegate their setup to a trustworthy software company.


  • AWS, “What is a data lake?” Amazon Web Services,
