Natural Language Processing: The New Frontier

You have probably seen the news about the latest digital assistant that can book your next haircut appointment over the phone, and heard about the AI algorithm that can answer eighth-grade science questions better than humans. You may even have interacted with a chatbot that can answer your simple banking questions. You are possibly carrying a mobile phone that can translate your sentences into 100 different languages in real time. All these technological achievements are fueled in part by recent developments in natural language processing (NLP).

Such engineering feats have caused some anxiety among people who are concerned that real-life terminators straight out of the movies, i.e., super-intelligent AI systems out to get all humankind, are imminent. In reality, we are far from it. What we are experiencing is so-called ‘artificial narrow intelligence’, where we can engineer AI systems that achieve or surpass human-level performance in a single well-defined task. Even at this level, such AI systems can provide immeasurable benefits in improving the quality of our lives and can be game-changing for companies, with great financial impact on the bottom line of many industries, including oil and gas.

The success of any NLP project starts with finding relevant, high-quality data, which is usually not easy to come by in many industries. The hurdles to finding the right data in sufficient quantity can be regulatory, privacy-related, or copyright-related, or they can stem from the large data debt accumulated over many decades because the data is unstructured and nonuniform. As part of the digital transformation in oil and gas, the DELFI cognitive E&P environment and industry-wide collaboration at the Open Subsurface Data Universe (OSDU™) Forum enable us to take a large step in the right direction.

Digitization of decades-old reports and documents will unleash the potential for many NLP applications that can automate some manual tasks, freeing up much of our experts' precious time. It will also lead to new ways of discovering data and relevant information to make the right decisions for exploration and production. Language models are at the heart of any NLP task, and it is essential that we train our language models on data specific to the domain we design our NLP solutions in, whether that is geology, geophysics, petrophysics, or drilling and completions.
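
To make the point about domain-specific training data concrete, here is a minimal sketch in Python. It is not how production language models are built: it uses a toy bigram model with add-one smoothing, and both small corpora and the query sentence are hypothetical examples invented for illustration. The only idea it is meant to show is that a model trained on domain text finds a domain sentence less surprising (lower perplexity) than a model trained on everyday text.

```python
import math
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count unigrams and bigrams over lowercased, whitespace-split sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        for prev, curr in zip(tokens, tokens[1:]):
            bigrams[prev][curr] += 1
    return unigrams, bigrams


def perplexity(sentence, unigrams, bigrams):
    """Perplexity under add-one smoothing; lower means less surprising text."""
    vocab_size = len(unigrams) + 1  # +1 reserves probability for unseen words
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    log_prob = 0.0
    for prev, curr in zip(tokens, tokens[1:]):
        following = bigrams.get(prev, {})
        context_count = sum(following.values())
        prob = (following.get(curr, 0) + 1) / (context_count + vocab_size)
        log_prob += math.log(prob)
    return math.exp(-log_prob / (len(tokens) - 1))


# Hypothetical toy corpora, invented for this sketch only.
general_corpus = [
    "the weather is nice today",
    "i booked a haircut appointment",
    "the bank answered my question",
]
domain_corpus = [
    "the sandstone reservoir shows high porosity",
    "the shale formation has low permeability",
    "the well was drilled through the sandstone formation",
]

general_model = train_bigram_model(general_corpus)
domain_model = train_bigram_model(domain_corpus)

query = "the shale reservoir has high porosity"  # hypothetical domain sentence
general_ppl = perplexity(query, *general_model)
domain_ppl = perplexity(query, *domain_model)
print(f"general-text model perplexity: {general_ppl:.1f}")
print(f"domain-text model perplexity:  {domain_ppl:.1f}")
```

On these toy inputs the domain-trained model assigns the query a lower perplexity. Real systems use far larger corpora and neural architectures, but the same intuition motivates training on subsurface text.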

On the algorithmic front, NLP has enjoyed decades of rich history, combining linguistics research with computational methods. Progress made in the past decade is what is bringing the products we see today to the masses. First, neural networks were ingeniously introduced into language model training. Later, researchers created contextualized language models, using deep learning that can train on large amounts of unlabeled, openly available text to learn contextualized representations of that text. These developments have provided a step change, achieving human-level performance at certain tasks, such as sentiment analysis, question answering, and machine translation. A recent ICLR conference demonstrated that deep learning is at the forefront and that transformer-based language models are now the preferred architecture in research, as well as the preferred way to build many of the recent sophisticated models.

There is a push on the research front to build even larger language models, with tens of billions of parameters, to achieve state-of-the-art results on some benchmarks, and to push the boundaries through hyperparameter tuning, regularization, or simply finding better cost functions. Many internet technology companies are in a race to train even larger models, but these models no longer fit in the memory of a single GPU and require thousands of GPUs to train. While such large models can push performance to human level and beyond, they are usually not practical for anyone who wants to deploy them in production or in real-time applications. Fortunately, there is also a push to enable these models to run on CPUs or stand-alone IoT devices by distilling, compressing, or quantizing them into tiny models with comparable performance.
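
As a rough illustration of why quantization helps, the Python sketch below does two things. It estimates the memory needed just to store the weights of a model at different numeric precisions, and it applies a simple symmetric 8-bit quantization to a handful of weights. The 10-billion-parameter count, the example weight values, and the function names are assumptions chosen for illustration, and the scheme shown is one basic variant rather than the method used by any particular library.

```python
def model_size_gb(num_parameters, bytes_per_parameter):
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float weights."""
    return [q * scale for q in quantized]


# Assumed model size for illustration: 10 billion parameters.
num_parameters = 10_000_000_000
fp32_gb = model_size_gb(num_parameters, 4)  # 32-bit floats, 4 bytes each
int8_gb = model_size_gb(num_parameters, 1)  # 8-bit integers, 1 byte each
print(f"fp32 weights: {fp32_gb:.0f} GB, int8 weights: {int8_gb:.0f} GB")

# Hypothetical weight values, invented for this sketch.
weights = [0.42, -1.27, 0.05, 0.9, -0.33]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"quantized: {quantized}, max round-trip error: {max_error:.4f}")
```

The estimate covers weights only; training also needs memory for gradients, optimizer state, and activations, which is part of why the largest models are spread across many GPUs. The round-trip error of each weight is bounded by half the quantization step, which is why a carefully quantized model can often keep comparable accuracy at a quarter of the storage.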

I see great opportunities where NLP will be successful, even at the ‘artificial narrow intelligence’ level. Democratizing data and making NLP research mainstream will lower the barriers, bringing more scientists and engineers into the field. Unique to our domain, our subsurface reports and documents are almost always combined with measurements and other data modalities. NLP solutions that operate on multi-modal data will enable us to capture and produce knowledge in a continual-learning framework. Since we have come to the end of this brief blog post, we can leave these topics for another time and simply say: NLP is the new frontier in oil and gas.

Author information: From academic research on computational intelligence to semiconductor engineering at IBM, to artificial intelligence research at Schlumberger-Doll Research Center, Zikri has been at the forefront of cutting-edge research and engineering for the last 15 years. Currently, he is bringing natural language processing (NLP) applications to the oil and gas industry.

Zikri Bayraktar, Ph.D.

AI Research Scientist


Useful Links:


OSDU Data Ecosystem

DELFI Data Ecosystem