For almost a decade now I have been involved in managing innovation centers, innovation programs and a wide range of industry collaborations to envision, develop and bring new solutions based on machine learning (ML) to our customers in petroleum and renewable energy. I work for a company that pioneered artificial intelligence technology in our industry decades ago. With digital solutions such as scalable cloud compute, the Internet of Things and the next generation of data ecosystems bringing data to our fingertips, ML finally has the muscles to succeed in the big leagues.
Every day we are seeing promising and encouraging results from proofs of concept (POCs) and experimental deployments based on ML across exploration, field development, drilling, production, carbon reduction and renewable energy solutions. At the same time, companies are struggling to take promising experiments to a scale where they can have a positive impact across an organization or across multiple companies. This is what we often call “enterprise deployment” or “enterprise grade technology”.
We often see the same mistakes being repeated in many organizations, and we have made our fair share ourselves. I would therefore like to summarize some of the lessons learned from the execution of dozens of projects within our own organization and dozens more through industry collaboration projects. The recommendations may look obvious, but the reality is that they are often violated, leading to failure or suboptimal results.
Sounds obvious? Well, turns out it is not. In the quest for getting started on digital, we have seen many companies kick off ML and artificial intelligence (AI) projects without a clear definition of what problem they are going to solve or what constitutes success.
We always insist on proper framing of the project by starting with the business goals and the problem statement. We then drill down to high-level requirements and success factors. These typically fall into three buckets: efficiency gains, improved outcomes, and workflow and platform integration. Knowing the goals ensures that we have a business-first (or user-first or customer-first) approach, and not a technology-first approach. This also ensures that we look for the best algorithm, the best architecture, and the best tool for the problem at hand, and that we can quickly learn and adjust our approach when we find a better solution.
Wait a second. I just said that proper framing of the project is critical. How do we balance that with maintaining the necessary flexibility and freedom? Turns out this is not so difficult. The project framing should define the problem we are solving and how we know when we have succeeded in solving that problem. We should not prescribe a solution. We should not prescribe a specific algorithm, ML methodology or design. This gives the team the mandate to quickly try different approaches and select the best one, or to change course when a better idea comes along. For a POC, the best outcome is often the outcome that was not envisioned at the start of the project.
New solutions in ML require a wide range of competencies. It starts with domain expertise, data science expertise and software development expertise. Developing the solution and integrating it into wider workflows and digital platforms requires additional competencies such as user experience design and digital software architecture. Data science expertise is near useless without domain expertise, and domain expertise alone is not enough to make good ML solutions. It is all about putting the right people together.
Can great domain experts become great data scientists? The answer is yes in our experience. ML is being democratized, and there are tools and methods today where domain experts can implement basic ML without programming skills. At any depth, domain scientists usually have the right background and right competency to study and successfully master data science.
For experimentation and POC development, the team should be “pizza sized” to maintain full agility and a sense of urgency. This usually means a maximum of eight people. For larger efforts, it is still a good rule of thumb that the subprojects or teams should not exceed eight people.
In the energy industry we work with digital representations of a complex reality. We deal with dynamic power grids, complex subsurface geology, challenging logistics and operation in difficult locations. It is therefore important that we test drive our algorithms and ML solutions with realistic data of gradually higher complexity and reach. Access to appropriate data sets should usually be a prerequisite for starting a project in the first place.
For all applications of ML it is important not to underestimate the effort required to locate, quality assure, clean up and prepare datasets for ML. Luckily this is a major focus area for tools and automation and will improve over time.
Data science is all about data driven analysis, letting the data do the talking. However, in the energy industry, the complexity is often high, and decades of investment in research and development have resulted in a sophisticated technology and science baseline. We see enormous potential in ML; however, we have seen very little success in purely data driven projects. The real potential comes in combining domain expertise and proven physics models with the power of ML. Data science alongside domain science is bringing step change improvements in performance and outcomes. Areas of research that have previously reached a performance plateau can now see a performance boost by embedding ML into their approach. Universities have great programs to educate a new generation of data scientists, but it is equally important to train domain experts in data science. This allows for better collaboration and better solutions.
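One common way to combine a physics model with ML is to let the physics model make the baseline prediction and train a data driven model only on what the physics fails to explain. The sketch below illustrates that idea under loud assumptions: the physics relation, the synthetic measurements and all function names are hypothetical, and a simple least-squares line stands in for whatever ML model a real project would use.

```python
from statistics import mean

def physics_model(pressure_drop: float) -> float:
    """Hypothetical first-principles estimate: flow rate proportional to pressure drop."""
    return 2.0 * pressure_drop

def fit_residual_correction(xs, residuals):
    """Ordinary least squares fit of residual = slope * x + intercept."""
    x_bar, r_bar = mean(xs), mean(residuals)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (r - r_bar) for x, r in zip(xs, residuals))
    slope = sxy / sxx
    return slope, r_bar - slope * x_bar

def true_process(x: float) -> float:
    """Synthetic 'reality' (an assumption): physics plus an effect the physics model misses."""
    return 2.0 * x + 0.5 * x + 1.0

# Synthetic training measurements and separate held-out measurements.
train_xs = [1.0, 2.0, 3.0, 4.0, 5.0]
train_ys = [true_process(x) for x in train_xs]
test_xs = [6.0, 7.0]
test_ys = [true_process(x) for x in test_xs]

# Train the data driven part only on what the physics model gets wrong.
residuals = [y - physics_model(x) for x, y in zip(train_xs, train_ys)]
slope, intercept = fit_residual_correction(train_xs, residuals)

def hybrid_model(x: float) -> float:
    """Physics baseline plus the learned residual correction."""
    return physics_model(x) + slope * x + intercept

def mean_abs_error(model, xs, ys) -> float:
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

physics_mae = mean_abs_error(physics_model, test_xs, test_ys)
hybrid_mae = mean_abs_error(hybrid_model, test_xs, test_ys)
print(f"physics only MAE: {physics_mae:.3f}, hybrid MAE: {hybrid_mae:.3f}")
```

The design choice here is that the domain model carries the known science, so the learned part has a smaller and simpler job than a purely data driven model would have.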
ML and AI are trendy, promising, and exciting. But this should never be viewed as a free pass to develop mediocre science or push unproven solutions. As much as we must be open minded to new approaches and solutions, we must also demand the same rigor for solutions based on ML as we would have done for traditional technologies, before we deploy them to a large user base, buy a commercial solution, or base our critical decisions on the outcome. We recommend rigorous testing and validation, rigorous peer reviews and rigorous validation of scientific papers.
A fundamental challenge with data driven ML approaches is that even the most promising POC prototype may only apply successfully within a very restricted context. This could, for example, be a very specific geographic area with unique and significant characteristics. Sometimes it is enough to train the ML “brain” on relevant new datasets to expand reach, but very often the whole ML structure and learning algorithms must be changed to support multiple contexts and enable wider adoption. This can be time consuming and require significant development. It is important to be aware of this challenge when evaluating early results and planning next steps. In some cases, we may need a combination of multiple algorithms and models to cover a broad range of applications.
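One way to expose this kind of context dependence early is to hold out one entire region at a time, train on the rest, and check the error on the region the model has never seen. The sketch below is illustrative only: the region names, the synthetic numbers and the one-parameter model are all assumptions, chosen so that one region behaves differently from the others.

```python
from statistics import mean

# Hypothetical synthetic data: (region, feature, target).
# "offshore" deliberately follows a different relationship than the other two.
samples = [
    ("north", 1.0, 2.1), ("north", 2.0, 4.0), ("north", 3.0, 6.1),
    ("south", 1.0, 1.9), ("south", 2.0, 4.1), ("south", 3.0, 5.9),
    ("offshore", 1.0, 5.0), ("offshore", 2.0, 10.1), ("offshore", 3.0, 14.9),
]

def fit_slope(rows):
    """Least squares through the origin: target ~ slope * feature."""
    return sum(x * y for _, x, y in rows) / sum(x * x for _, x, _ in rows)

def mean_abs_error(slope, rows):
    return mean(abs(slope * x - y) for _, x, y in rows)

def leave_one_region_out(rows):
    """Train on all regions but one, then score on the region left out."""
    errors = {}
    for held_out in sorted({region for region, _, _ in rows}):
        train = [row for row in rows if row[0] != held_out]
        test = [row for row in rows if row[0] == held_out]
        errors[held_out] = mean_abs_error(fit_slope(train), test)
    return errors

errors = leave_one_region_out(samples)
for region, err in errors.items():
    print(f"{region}: held-out MAE = {err:.2f}")
```

A held-out error that is much larger for one region than for the others is a signal that retraining on new data may not be enough, and that the model structure itself may need to change before wider adoption.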
Companies, departments and individuals are sometimes very committed to, and protective of, a specific algorithm, approach, application or philosophy. However, silo thinking and a “do it alone” philosophy are usually not a good match for developing new ML solutions in our industry. We recommend working together across individuals, departments, companies and organizations. By joining forces and sharing domain expertise, data and technology development expertise, we can usually do better together. And ML solutions may not be one-size-fits-all. There are a lot of good initiatives out there, and using available solutions and collaborating with the industry can help get you to your goals much faster and with better results.
When should you buy and when should you build? In my world, buying takes precedence over building whenever possible. If a ready-made solution addressing your business needs is already available on the market, this will save you time, money and effort; a lot of time, a lot of money, and a lot of effort.
So, when should you consider building your own solutions, alone or in collaboration with others? Sometimes you need to fill gaps in the technology offering, capture unique competency and experience, and supplement and expand commercial solutions to cover unique challenges. Building also serves to educate your organization in the possibilities and limitations of ML technology, which in turn can feed into improving your digital strategy. Building may be necessary to integrate the ML solution into your workflows and your digital ecosystem. Finally, data science can be a valuable tool for ad hoc analysis of complex situations or during crisis management.
In the beginning of the digital movement, each oil company, technology company and even new middleware companies started to develop their own proprietary data infrastructure. The industry quickly realized that this was going to slow down progress in digital transformation tremendously, and we now have fast moving standardization initiatives such as OSDU™ (Open Subsurface Data Universe). We recommend building on industry standard platforms to integrate new solutions with industry workflows and technologies, and to maintain full flexibility for the future.
The hard truth for new ML solutions is that very often a lot of investment and effort is required to take a solution from POC to widespread adoption. This could involve a series of POC prototypes gradually increasing the reach and coverage of the algorithms, performance enhancements, integration with workflows and platforms, security, usability, validation, deployment and future maintenance and improvements. In each step we give end users hands-on experience, improve the algorithm and build trust in the results. Our own ML solutions have typically been through a series of innovation projects with many energy companies, each bringing something new to the table and each bringing the solution to a new level. Building on open industry platforms is essential to maintain high development velocity and to ensure integration and future flexibility in a fast moving world.
Author information: Morten has worked with software development for Schlumberger in Norway and internationally for more than two decades. In the early years, he rolled out agile software development practices for the company and managed multi-discipline projects with global coordination. He managed the software development centers in Oslo, Stavanger and Bergen dedicated to the development of industry leading technologies such as the Petrel* E&P software platform and the Ocean* open software development framework. He established and led the Software Technology Innovation Center (STIC) in Silicon Valley, California, dedicated to the digital transformation of Schlumberger products and services, before returning to Norway in 2017. In his current role Morten has global responsibility for digital innovation projects in collaboration with key strategic customers to accelerate innovation and adoption of Schlumberger digital technologies.