Unlocking your traditional ML projects with GenAI

Organizations today struggle to advance their traditional machine learning (ML) projects for two reasons. First, people cannot realistically label or organize the large volumes of data the organization holds. Second, many organizations lack a data management strategy. Generative AI (GenAI) offers important advantages for overcoming these limitations, especially in the Oil & Gas sector, where predictive maintenance and the detection of failures and anomalies in industrial equipment are increasingly necessary.

Did you know that most enterprise artificial intelligence and traditional machine learning projects fail because of a handful of specific barriers? A solid AI strategy is non-negotiable in your industry if you want to gain a competitive advantage and deliver results that add real value to the business.

According to a study by MIT Sloan, 70% of companies say AI has had minimal impact on their business, despite all the use cases that AI and ML offer. The same study reports that 87% of projects never reach production. These figures are very worrying. They show the cost of not having a proper AI strategy tailored to each case, and the immense opportunities being missed. Likewise, 94% of business leaders agree that AI will be essential to success over the next five years, yet most companies are far behind in this area.

Key barriers to machine learning projects

The main barriers to progress in these projects are:

1. The ROI is not clear: if we have not defined the return on investment (ROI) target for each project, we cannot measure it. Without that measurement, we cannot evaluate the project's results.
2. The data is not usable, for several reasons:

The data that is actually needed does not exist.
The data exists but is restricted for confidentiality reasons, so the team that needs it cannot access it.
The information is poorly organized or dirty, and this lack of cleanliness and structure undermines our ability to use it.
Finally, and one of the most important factors, the data is not labeled.

3. Human labeling is very expensive: today the process is slow and costly. Analysts resent having to label hundreds of datasets by hand or trace time series back through history. Suppose we want to correlate the cost of a basic food basket with street blockades or social protests over time. Or suppose we want to correlate fluctuations in our country's dollar exchange rate with historical social conflicts. In either case, building a model that predicts the next street blockade or protest would be extremely difficult, lengthy and tedious, because the information has to be gathered from many different sources over time.

What typically happens is that, after all that manual work, we take the data to the data science team and tell them we managed to collect 200 event records. They reply that they need at least 2,000 for the model to be meaningful. All the time and effort invested in the project suddenly feels wasted.

Deep learning models that can rely on synthetic data and simulations face a radically different situation. Game-playing systems such as AlphaGo and its successors for Go and chess are a good example. They were trained largely on millions of games that the systems generated by playing against themselves, far more games than any human could play in a lifetime. In those domains, data can simply be simulated in whatever volume the model needs.

Why are models using GenAI successful?

The first thing to consider is that they are trained on enormous amounts of data, such as large volumes of text drawn from the Internet, which makes them easy to adopt. Deep learning models are also "data-hungry": above all, they need volume, and this is exactly where GenAI can help.

Meanwhile, data augmentation techniques do not usually work well yet for this kind of problem. There is also an ethical and trust issue. You cannot tamper with the data or copy and paste records to inflate a dataset, because that amounts to inventing real-life situations. The result is simply not reliable.

Supervised machine learning needs labeled data. Current techniques that use unsupervised ML to get around the lack of labels usually generate many false positives. For example, suppose we are monitoring the price of the dollar. We might introduce rules so that every unusual record or anomaly in the series is treated as a sign that an event is happening. The anomaly then becomes a proxy for the label. This usually produces many false positives, and maintaining the rules is costly.
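To make the proxy-label idea concrete, here is a minimal Python sketch of the kind of rule described above. It assumes a simple rolling z-score. The function name `proxy_labels` and the `window` and `k` parameters are our own illustrative choices, not a specific product's method. Any point that strays more than `k` standard deviations from the recent mean is treated as an "event" label.

```python
from statistics import mean, pstdev


def proxy_labels(series, window=5, k=2.0):
    """Illustrative sketch (assumption): a rolling z-score rule used as a proxy label.

    Point i is flagged as an "event" when it deviates more than k standard
    deviations from the mean of the previous `window` points. The first
    `window` points are never flagged because there is not enough history.
    """
    labels = [False] * len(series)
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as anomalous,
            # which is one easy way for false positives to appear.
            labels[i] = series[i] != mu
        else:
            labels[i] = abs(series[i] - mu) > k * sigma
    return labels
```

On a noisy real series, a low `k` or a short `window` flags many ordinary fluctuations, and a flat history flags any change at all. Each flag then needs someone to confirm or dismiss it, which is where the false positives and the maintenance cost come from.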

Supervised ML projects therefore depend on labeled data, and the project's cost and timeline depend on how that labeling gets done. Without labels, the project will not succeed, and current techniques are clearly not enough to solve the labeling problem.

GenAI for Oil & Gas Projects

Today, the energy, oil and hydrocarbon industries face clear problems in predictive maintenance, fault and anomaly detection, and asset monitoring. The opportunities and benefits of solving them are enormous.

GenAI offers a way to speed up and simplify the labeling these projects depend on.

These techniques make it possible to extract structured information from technical documents and reports, which are usually unstructured. The model can then use that information to recognize simple faults as well as more complex, rare or hard-to-detect ones, producing a classification from the data. Repairs are a good example. Every repair order generates a report, but the repairs are not necessarily tied to real-time faults. This is a common problem for energy companies that need to predict outages.
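One way to get structured records out of free-text reports is to ask the model for a fixed set of fields in JSON. Its answer is then validated before it enters the dataset. The sketch below assumes a hypothetical schema (`equipment_id`, `component`, `failure_type`, `linked_to_fault`). It leaves out the actual model call, since that depends on whether you use an open-source or a hosted model. `build_extraction_prompt` and `parse_extraction` are illustrative names.

```python
import json

# Hypothetical output schema (assumption): field name -> expected Python type.
SCHEMA = {
    "equipment_id": str,
    "component": str,
    "failure_type": str,
    "linked_to_fault": bool,  # was this repair actually triggered by a fault?
}


def build_extraction_prompt(report_text):
    """Ask the model to return only a JSON object with the SCHEMA keys."""
    fields = ", ".join(f'"{name}"' for name in SCHEMA)
    return (
        "Read the maintenance report below and answer ONLY with a JSON object "
        f"containing the keys {fields}. Use null if a value is not stated.\n\n"
        f"Report:\n{report_text}"
    )


def parse_extraction(raw_response):
    """Parse the model's reply and check it against SCHEMA.

    Returns a dict with exactly the SCHEMA keys (extra keys are dropped),
    or raises ValueError if the reply does not conform.
    """
    try:
        record = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("response must be a JSON object")
    missing = SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for name, expected in SCHEMA.items():
        value = record[name]
        # null is allowed; anything else must have the expected type.
        if value is not None and not isinstance(value, expected):
            raise ValueError(
                f"{name!r} should be {expected.__name__}, got {type(value).__name__}"
            )
    return {name: record[name] for name in SCHEMA}
```

Rejecting malformed answers, rather than guessing, keeps incomplete or invented records out of the training set. The `linked_to_fault` field is one way to capture whether a repair was actually tied to a fault.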

The reality is that no one is going to label years of failure or anomaly data from industrial plants by hand; it is practically impossible. With GenAI, however, it can be done. You can scan all that text, ask questions about it and label it in a single pass. There is no need to review records one by one. You can run several models in parallel, and once everything is labeled, you have your dataset. The approach also works with either open-source models or OpenAI's models, so the company can pick the option that best fits how information is shared internally.
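The "label everything in one pass" idea can be sketched with Python's standard `concurrent.futures` module. Here we assume that "running several models in parallel" means asking each model for a label and keeping the majority vote. Under that assumption, only the reports where the models disagree go to a human. The labelers are placeholder callables that would each wrap a model call in practice, and `label_dataset` is a hypothetical helper.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def label_dataset(reports, labelers, max_workers=8):
    """Label every report with every labeler concurrently (illustrative sketch).

    `labelers` maps a name to a callable(str) -> str; in practice each callable
    would wrap one model (an assumption, not a specific API). Returns
    (labels, needs_review): the majority label per report, and the indices of
    reports where the labelers did not all agree.
    """
    if not labelers:
        raise ValueError("at least one labeler is required")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One job per (labeler, report) pair, so all calls can run in parallel.
        futures = {
            name: [pool.submit(fn, text) for text in reports]
            for name, fn in labelers.items()
        }
        votes = {name: [f.result() for f in fs] for name, fs in futures.items()}

    labels, needs_review = [], []
    for i in range(len(reports)):
        counts = Counter(votes[name][i] for name in labelers)
        label, _ = counts.most_common(1)[0]  # majority vote
        labels.append(label)
        if len(counts) > 1:  # any disagreement -> route to a human
            needs_review.append(i)
    return labels, needs_review
```

Threads suit this workload because each model call spends most of its time waiting on I/O rather than computing.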

In the oil and gas industry, there is a lot of technical documentation related to real-time events that needs to be leveraged.

This documentation usually includes maintenance order records from SAP, or spreadsheets exported from maintenance management systems, which are linked to sensor data.
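Once orders are labeled, they still have to be joined to the sensor data to produce a supervised dataset. The minimal sketch below assumes that each order carries an equipment ID and a timestamp. It also assumes that readings in a fixed window before an order are labeled with that order's failure type. The 24-hour horizon, the `Reading` and `Order` classes and `label_readings` are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Reading:  # hypothetical sensor record
    equipment_id: str
    timestamp: datetime
    value: float


@dataclass
class Order:  # hypothetical maintenance order, already classified
    equipment_id: str
    opened_at: datetime
    failure_type: str  # e.g. the label produced by the LLM classifier


def label_readings(readings, orders, horizon=timedelta(hours=24)):
    """Attach a failure label to each reading that falls within `horizon`
    before a maintenance order on the same equipment; every other reading
    is labeled "normal". If several orders qualify, the earliest one wins."""
    by_equipment = {}
    for order in sorted(orders, key=lambda o: o.opened_at):
        by_equipment.setdefault(order.equipment_id, []).append(order)

    labeled = []
    for r in readings:
        label = "normal"
        for order in by_equipment.get(r.equipment_id, []):
            if order.opened_at - horizon <= r.timestamp < order.opened_at:
                label = order.failure_type
                break
        labeled.append((r, label))
    return labeled
```

The right horizon depends on how far in advance a failure shows up in the sensors. It is therefore a parameter to tune with the maintenance team rather than a fixed rule.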

With GenAI, labeling this documentation can now be accelerated and improved. Consider, for example, a machine with three components that fail: a filter, a pump and a transformer. Every time the filter needs to be replaced, a change order is issued along with a specific document, and the model has to classify the type of failure. To do this, you include examples of each failure type and its use cases in the prompt, so the model can classify new documents. You have to teach the LLM, because there is technical jargon it needs to understand.
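A few-shot prompt for the filter, pump and transformer example might look like the sketch below. The three example orders are invented for illustration. In practice they would be real orders labeled by a maintenance expert and written in the plant's own jargon. The model call itself is left out, and `build_few_shot_prompt` and `parse_label` are hypothetical names.

```python
# Hypothetical labeled examples (invented for illustration); in practice these
# would be real orders written in the plant's own jargon.
FEW_SHOT_EXAMPLES = [
    ("Replaced clogged element, differential pressure high across housing.", "filter"),
    ("Impeller worn, low discharge pressure, seal leaking.", "pump"),
    ("Winding overheating, oil dielectric test failed.", "transformer"),
]
LABELS = {label for _, label in FEW_SHOT_EXAMPLES}


def build_few_shot_prompt(order_text):
    """Build a prompt that shows the model one example per failure type,
    then asks it to classify the new order."""
    lines = [
        "Classify the maintenance order into exactly one failure type: "
        + ", ".join(sorted(LABELS)) + ".",
        "Answer with the failure type only.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Order: {text}", f"Failure type: {label}", ""]
    lines += [f"Order: {order_text}", "Failure type:"]
    return "\n".join(lines)


def parse_label(raw_response):
    """Normalize the model's answer; return None for anything outside LABELS
    so it is routed to a human instead of silently entering the dataset."""
    answer = raw_response.strip().lower().rstrip(".")
    return answer if answer in LABELS else None
```

Returning `None` for an unrecognized answer means the model can never add an unplanned failure type to the dataset; those orders go to a person instead.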

Of course, failures are few and the cases limited. What we usually run into is a failure that is very difficult to detect and to predict. If you have 150 breakdowns a day, you have technicians monitoring and fixing them. The problem is the rare failures, or those that are hard to correlate with other events. Few events mean few examples of each failure, which makes them hard to predict and very complicated to label. It is a supervised learning problem in which the project stalls for lack of labels.

This is where GenAI gives us a competitive advantage. It accelerates our failure-prediction models because it can extract, classify and analyze large volumes of technical information from company records, and do so in real time.

The use cases also go beyond failures. Many others can come from things that are not working well, processes that can be improved, or real-time events. Ultimately, GenAI will label what we as humans do not want to label.

Conclusion: 7Puentes as the winning card for your ML project

Do these problems resonate in your company? Do you see these limitations in your industry or in your Oil & Gas project? If you are an executive or manager who wants to use AI intensively but has not yet succeeded with your projects, contact our specialists. They can guide you in implementing GenAI and getting the most out of its possibilities and benefits.

Unleash your ML projects with GenAI now!