Simulation
Geoscience
Agent-based Model
Completed

Go with the Flow

$15,000

Modeling the flow of fluids in the subsurface is a challenging but essential task for geoscientists. Oil and gas are generated from deeply buried source rocks and then migrate up through the subsurface. Many factors affect the migration paths of hydrocarbon molecules, including rock type (lithology), rock quality, and more.

Prospects are evaluated on the likelihood of hydrocarbons successfully migrating into an area of interest. Migration studies are currently done either by hand on 2D seismic sections or in slow, bulky basin modeling software. Both approaches fall short: they are ad hoc and leave a poor record of their inputs. A dynamic, fast, 3D model is needed.

The goal of this challenge is to build an agent-based model that mimics hydrocarbon migration through the subsurface.

Agent-based models haven’t been extensively used to understand hydrocarbon migration. These models have only a few simple rules, but when applied to multiple agents, they can generate complex behavior. For example, complex fish schooling behavior can be modeled using only three rules. Hydrocarbon migration can be simplified to just two rules: buoyancy and permeability. Note that the journey of the hydrocarbon molecules is just as important as where they are trapped. 

Rules for Agents

To best match the behavior of hydrocarbon molecules, agents must follow these rules:

  1. Agents move up and sideways in a 3D environment (Figure 1). Agents can move up, sideways, and diagonally upward, but cannot move down or diagonally down. The environment contains different values in different arrangements that mimic real-world geology.

  2. Each turn, each agent moves one space. The agent decides where to move by adding its move preference array (Figure 2) to the value array of the surrounding environment, then moving to the space with the highest summed value.

  3. 10% of the time, the agent moves randomly to an adjacent space. See Figure 3 for the move probability matrix.

  4. Only one agent can exist in a space at any point in time.

  5. Each turn of the simulation, 100 agents are born: 75 are born at random locations in the bottom layer of the environment (maximum value in the z direction), and 25 are born uniformly at random throughout the environment.

  6. Agents die when they reach the top of the environment (minimum value in the z direction). Agents also die if they can’t move during a turn. Dead agents occupy their final space.

  7. The output for this competition is a CSV file with the x, y, z location of all the agents (alive and dead) at the end of 2,000 turns.

These rules are not comprehensive. Contestants are encouraged to add their own rules or change the parameters above to enhance the score and performance of their simulation. However, for scoring purposes the number of agents born each turn must remain at 100.
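The rules above can be sketched as a single move-selection step. This is an illustrative implementation, not the official one: the preference weights in `move_pref` are placeholders (the real values come from Figure 2), and the 10% random move here is drawn uniformly over legal adjacent cells rather than from the Figure 3 probability matrix.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Placeholder 3x3x3 move preference array (Figure 2 holds the real weights).
# Index 0 along the first axis is the slice above the agent (z decreases
# upward); -inf on the bottom slice forbids downward moves (rule 1).
move_pref = np.full((3, 3, 3), -np.inf)
move_pref[0] = 2.0             # moving up is strongly preferred
move_pref[1] = 1.0             # sideways moves on the same level are allowed
move_pref[1, 1, 1] = -np.inf   # staying put is not a move

def choose_move(env, pos, occupied, move_pref, random_frac=0.10, rng=rng):
    """Pick the next cell for an agent at `pos` (z, y, x) in `env`.

    Rule 2: add the move preference array to the 3x3x3 neighborhood of
    environment values and take the argmax. Rule 3 (simplified here): with
    10% probability, move uniformly at random to a legal adjacent cell.
    Rule 4: cells in `occupied` are off limits. Returns None if the agent
    is blocked, in which case it dies in place (rule 6).
    """
    z, y, x = pos
    scores = np.full((3, 3, 3), -np.inf)
    for dz in (-1, 0):                      # up or sideways only, never down
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (dz, dy, dx) == (0, 0, 0):
                    continue
                nz, ny, nx = z + dz, y + dy, x + dx
                if not (0 <= nz < env.shape[0] and 0 <= ny < env.shape[1]
                        and 0 <= nx < env.shape[2]):
                    continue
                if (nz, ny, nx) in occupied:
                    continue
                scores[dz + 1, dy + 1, dx + 1] = (
                    env[nz, ny, nx] + move_pref[dz + 1, dy + 1, dx + 1]
                )
    if not np.isfinite(scores).any():
        return None                          # blocked: agent dies this turn
    if rng.random() < random_frac:           # stochastic move
        legal = np.argwhere(np.isfinite(scores))
        dz, dy, dx = legal[rng.integers(len(legal))] - 1
    else:                                    # deterministic move: best summed value
        dz, dy, dx = np.unravel_index(np.argmax(scores), scores.shape)
        dz, dy, dx = dz - 1, dy - 1, dx - 1
    return (z + dz, y + dy, x + dx)
```

A full simulation would call this once per agent per turn, updating the occupied set as agents move and marking dead agents' cells as permanently occupied.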


Figure 1: Dimensions of environment.


Figure 2: Move preferences for Agents. Agent is in the center of B slice.

 


Figure 3: Move probabilities for Agents. Agent is in the center of B slice.

 

Example of Simulation

To illustrate how agents should move through the environment we’ve created a diagram for a 2D example (Figure 4).  Agents A and B start in different spaces in an environment where orange spaces have a higher value, blue spaces a lower value.  The environment has a geologic fold that focuses the agents towards its crest.  Agent A moves up until it reaches the blue spaces and then moves diagonally up the fold to stay in the higher value orange spaces.  Agent B starts in blue spaces and moves up at first as dictated by the move preference matrix (Figure 2, slice B only).  At Turn 6, Agent B reaches the crest of the fold first and at Turn 7 moves laterally at the crest.  For Turn 8, both agents are forced to move up into lower blue values because they can’t move into a space already occupied.

 


Figure 4: A 2D example of agents moving in an environment.
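The fold behavior in Figure 4 can be reproduced in a toy 2D script. The grid values and preference weights below are illustrative stand-ins (the real weights come from Figure 2), but they show a single agent born on the flank climbing diagonally along the high-value layer to the crest, just as Agents A and B do.

```python
import numpy as np

# Toy 2D slice (rows = depth z, columns = x; z = 0 is the surface).
# High values (5) trace a folded layer whose crest sits at x = 4; these
# grid values are illustrative, not the challenge's real environment.
env = np.ones((8, 9))
crest = 4
for x in range(9):
    layer_z = 2 + abs(x - crest)   # the layer shallows toward the crest
    env[layer_z:, x] = 5.0         # layer and everything beneath it

# 2D move preferences: straight up, then diagonal up, then sideways.
pref = {(-1, 0): 2.0, (-1, -1): 1.5, (-1, 1): 1.5, (0, -1): 0.5, (0, 1): 0.5}

def step(pos):
    """Move to the in-bounds neighbor with the highest env value + weight."""
    z, x = pos
    best, best_score = None, -np.inf
    for (dz, dx), w in pref.items():
        nz, nx = z + dz, x + dx
        if 0 <= nz < env.shape[0] and 0 <= nx < env.shape[1]:
            s = env[nz, nx] + w
            if s > best_score:
                best, best_score = (nz, nx), s
    return best

pos = (7, 0)                 # born at the base, on the fold's flank
path = [pos]
while pos[0] > 0:            # agents die at the top (rule 6)
    pos = step(pos)
    path.append(pos)
# The agent climbs diagonally along the folded layer and exits at the crest.
```

Tracing `path` shows the agent never moving down and surfacing at the crest column, mirroring the focusing effect the figure describes.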

 

Evaluation & Judging Criteria

During the challenge, a quantitative score will be used to populate the leaderboard. We will score submissions by computing a similarity score against our answer key. The answer key is a set of points in three-dimensional space, and a perfect submission reports all of those same points. The scoring algorithm computes a root mean squared error between each point in the submission and its nearest neighbor in the answer key, accruing the total error across all points in the submission.

A lower score is considered more successful.
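A rough re-implementation of the described metric follows. This is a sketch based on the prose above, not the official scoring code: details such as sum versus mean aggregation may differ, and the brute-force distance matrix is only suitable for small point sets (a KD-tree would be needed at full scale).

```python
import numpy as np

def similarity_score(submission, answer_key):
    """Hypothetical similarity metric: for every point in the submission,
    find its nearest neighbor in the answer key and accumulate the error
    as a root mean squared error. Lower is better; a perfect match scores 0.
    """
    sub = np.asarray(submission, dtype=float)   # shape (N, 3)
    key = np.asarray(answer_key, dtype=float)   # shape (M, 3)
    # Pairwise squared distances via broadcasting (fine for small N * M).
    d2 = ((sub[:, None, :] - key[None, :, :]) ** 2).sum(axis=2)
    nearest_sq = d2.min(axis=1)   # squared distance to each nearest key point
    return float(np.sqrt(nearest_sq.mean()))
```

For example, a submission identical to the answer key scores 0, and every unit of displacement from the nearest key point raises the score.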

At the end of the challenge, the top 10 submissions will be judged on quantitative and qualitative criteria. Qualitative criteria are being introduced to ensure that the behavior of the agents matches geological assumptions and that submissions can be easily deployed afterwards. The top 10 submissions will be run on three blind test volumes. The final score will be calculated from the following criteria:


Similarity

This uses the same method as the leaderboard during the competition. The score will be the average of the similarity scores across the blind test volumes. The top 20% of scores receive full points, the top 20-40% receive 25 points, the top 40-60% receive 20 points, and the bottom 40% receive 15 points.

Tracking

The path of an agent is just as important as its final location. Along with each simulation, a separate NumPy file should be generated that records the location of each agent every turn. The format for this file should be an array of shape (200000, 2000, 3), corresponding to (Agent, Turn, Position). Contestants are free to change the format of this tracking feature as long as they document their new method.
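One way to set up that tracking array is sketched below. The NaN sentinel for turns before birth or after death is our convention, not a challenge requirement, and the demo uses a deliberately tiny shape; the full-size (200000, 2000, 3) array is roughly 4.8 GB in float32.

```python
import numpy as np

def make_tracks(n_agents, n_turns):
    """Preallocate the (Agent, Turn, Position) tracking array.

    NaN marks turns before an agent is born or after it dies; this
    sentinel choice is ours, not mandated by the challenge.
    """
    return np.full((n_agents, n_turns, 3), np.nan, dtype=np.float32)

# Full-size array for the challenge (100 agents/turn * 2,000 turns):
#   tracks = make_tracks(100 * 2000, 2000)   # ~4.8 GB in float32
tracks = make_tracks(10, 5)        # tiny demo so this snippet stays light

# Example: record agent 0's (x, y, z) position on turn 0.
tracks[0, 0] = (150.0, 150.0, 1249.0)

np.save("tracks.npy", tracks)      # deliverable alongside the simulation
```

Per-turn writes then reduce to `tracks[agent_id, turn] = (x, y, z)` inside the simulation loop.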

For the assessment of this criterion, a panel of three geoscientist judges will analyze how agents move through the geological environment. Factors considered include how agents react to tilts or breaks of units in the environment. Each judge will award 30 points for accurate interaction, 20 points for fair interaction, or 10 points for a poor match to the environment. The final score will be the average of the judges' scores.

Speed

In deployment, this simulation will be run on large volumes, so run-time efficiency is critical. This score will be the average run time across the three blind test volumes, measured on a SageMaker ml.p2.xlarge instance. You can learn more about the specifications for this instance here: https://aws.amazon.com/sagemaker/pricing/instance-types/. The top 20% receive full points, the top 20-40% receive 18 points, the top 40-60% receive 16 points, and the bottom 40% receive 14 points.

Configurable script

To maximize the utility in deployment, scripts need to have configurable parameters instead of being hard coded. Configuration can be a Jupyter cell of parameters set at the beginning of the notebook or clearly labeled within a function. The following parameters and their baselines should be included:

  • Size of environment - (300,300,1250)

  • Number of turns - 2000

  • Number of agents born each turn - 100

  • Ratio of agent birth locations (i.e. base of environment vs. random) - 0.75

  • Number of agents tracked - 100%

  • Move preference matrix - (3,3,3) array with weights, see Figure 2

  • Stochastic move matrix - (3,3,3) array with weights, see Figure 3

Full points will be given if all seven parameters are present.
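One way to expose the seven baseline parameters is a single configuration dictionary at the top of the notebook. The preference and stochastic arrays below are uniform placeholders; the real weights come from Figures 2 and 3.

```python
import numpy as np

# Placeholder stochastic move weights: equal probability over the 26
# neighbors, never staying put. Figure 3 holds the official values.
stoch = np.full((3, 3, 3), 1 / 26)
stoch[1, 1, 1] = 0.0

# Hypothetical configuration cell covering the seven baseline parameters.
CONFIG = {
    "env_shape": (300, 300, 1250),           # size of environment
    "n_turns": 2000,                         # number of turns
    "agents_per_turn": 100,                  # must stay 100 for scoring
    "base_birth_ratio": 0.75,                # base vs. random birth locations
    "tracked_fraction": 1.0,                 # track 100% of agents
    "move_preference": np.ones((3, 3, 3)),   # placeholder weights (Figure 2)
    "stochastic_moves": stoch,               # placeholder weights (Figure 3)
}
```

The simulation entry point can then take `CONFIG` (or equivalent keyword arguments) so judges can rerun it on different volumes without editing code.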

Visualization

A visualization of these simulations will be necessary for deployment and we would like to see what others can build in this space. A visualization should include:

  • A 3D plot

  • The geologic environment

  • The locations of agents at the end of the simulation

  • The paths of tracked agents

PyVista and ipyvolume are two modules that can handle such a visualization. Eight points will be given if these features are present; ten points will be given if the visualization is animated.
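A bare-bones version of the requested view can be sketched with matplotlib's 3D toolkit. PyVista or ipyvolume, as suggested above, will scale far better for full (300, 300, 1250) volumes; the positions and paths here are synthetic stand-ins, not real simulation output.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend; not needed in a notebook
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def plot_simulation(final_positions, paths):
    """Minimal 3D sketch: tracked-agent paths as lines, final agent
    locations as points, with depth (z) increasing downward.
    """
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.invert_zaxis()            # z grows downward in the environment
    for p in paths:              # paths of tracked agents
        ax.plot(p[:, 0], p[:, 1], p[:, 2], lw=0.5, alpha=0.6)
    ax.scatter(final_positions[:, 0], final_positions[:, 1],
               final_positions[:, 2], s=4, c="k")  # final agent locations
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z (depth)")
    return fig, ax

# Synthetic stand-in data for demonstration only.
final = rng.uniform(0, 300, size=(50, 3))
paths = [np.cumsum(rng.normal(size=(20, 3)), axis=0) for _ in range(5)]
fig, ax = plot_simulation(final, paths)
fig.savefig("simulation.png")
```

Rendering the geologic environment itself (e.g. as semi-transparent isosurfaces) is where PyVista's volume plotting earns its keep over matplotlib.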

At the end of judging, final scores will be posted on the Xeek challenge page. If a tie occurs, the judge panel will break it by evaluating the level of code documentation.

 

Submission Process

Participants will be allowed to submit up to 5 predictions per day. Participants will submit a CSV file for real-time evaluation. The file describes the x, y, z position of each agent at the end of the simulation. A helper function is provided to generate the appropriate format.
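If the official helper function is unavailable, the submission file can be produced with the standard library. The exact header and column order the grader expects may differ; this is a plain-format sketch with hypothetical positions.

```python
import csv

def write_submission(positions, path="submission.csv"):
    """Write final (x, y, z) agent locations, one agent per row.

    `positions` is an iterable of (x, y, z) tuples for all agents,
    alive and dead, at the end of the 2,000 turns.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "z"])   # assumed header row
        writer.writerows(positions)

# Hypothetical final positions for three agents.
write_submission([(12, 45, 0), (13, 45, 0), (299, 150, 3)])
```

In practice `positions` would come straight from the simulation's final agent state (e.g. the last turn of the tracking array).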

Upon submission, the file will be scored using the summed RMSE of each row, computed against the nearest neighbor in the answer key as described under Evaluation & Judging Criteria.

This score will be used to populate the leaderboard, and only a participant's best score will appear on the leaderboard.

For final evaluation, contestants will need to submit:

  • A requirements file. Contestants' code will not be run without a separate requirements file.

  • A single Jupyter Notebook containing all code. Please use markdown to denote code belonging to the different criteria used for final judging (see criteria above).

Only one .ipynb file will be considered per participant. In the case that a participant uploads multiple .ipynb files, only the most recent will be considered for the final leaderboard.