Domains‎ > ‎

Invasive species

Invasive Species Problem

Majid Alkaee Taleghan, Mark Crowley, Thomas Dietterich
Oregon State University
2013 Reinforcement Learning Competition

The ecological and economic damage caused by invasive species has increased rapidly around the world. This has created a need for good policies for managing invasive species.

5-reach structure
This proposed RL domain captures some of the inherent difficulties of performing optimal decision making in spatial domains where there is a spatially spreading process that needs to be controlled. The domain is modelled as a river network with native and invading plant species. The Tamarisk tree ( is a major invasive plant species in many parts of the world including the western USA and Australia. It takes over river networks where it out-competes native species, reduces biodiversity, and consumes large amounts of water. The management goal is to reduce the impact of the invasion and restore the river network to mostly or entirely native plants while minimizing the management cost over time.

The management problem can be formalized as an MDP describing the state and dynamics of a tree-structured river network. The model is a generalization of one developed by Muneepeerakul et. al, 2007 [Muneepeerakul, R., Weitz, J. S., Levin, S. a, Rinaldo, A., & Rodriguez-Iturbe, I. (2007). A neutral metapopulation model of biodiversity in river networks. Journal of theoretical biology, 245(2), 351–63. doi:10.1016/j.jtbi.2006.10.005]

The river network is represented as a tree with E edges. Water starts at the leaves and flows toward the root. Each edge contains H slots ("habitats"), and each slot is in one of three states: empty, occupied by a Tamarisk tree, or occupied by a Native tree. Hence, the MDP has 3EH states.  In ecology, each edge is called a "reach".

The optimization objective is to minimize the infinite horizon, discounted sum of costs.


The state is represented as a list of integers in {1,2,3} of length EH representing the local state of each habitat slot. The numbers are specified as follows:

  1. The slot is occupied with Tamarisk plants.
  2. The slot is occupied with Native plants.
  3. The slot is empty
There are 3EH total possible states. The graph structure of the river network can be specified as a python DiGraph object, if one is not specified then a random one can be generated by calling InvasiveUtility.createRandomGraph(E+1). By default this will create a random, unbalanced tree containing the given number of reaches. The integers in the state vector are listed in order of the reach and the habitat in that reach. The ordering of the reaches in the state vector (and the action vector) corresponds to the order of edges returned by the DiGraph object. The edges of the graph can be obtained by using TaskSpec.


In each reach at each time step, a management action can be taken and applied to the habitats in that reach. The action across all reaches is represented as a list of integers in {1,2,3,4}, one for each reach. The ordering of the reaches corresponds to the order of edges returned by the DiGraph object. The available actions in each reach are:

  1. null : do nothing
  2. eradicate : eradicate Tamarisk plants in all slots in the reach
  3. restore : place Native plants in all empty slots in the reach
  4. eradicate+restore : Eradicate Tamarisk and place Native plants in all empty slots

Both the eradicate and restore parts of an action can fail stochastically, and both have a cost that scales linearly with the number of affected slots. There are 4E total global actions. Both the eradicate and restore parts of an action can fail stochastically, and both have a cost that scales linearly with the number of affected slots.

The following actions are not allowed:

  • Doing any action other than null on a reach full of Native plants
  • Eradicating an empty slot in an empty reach
  • Restoring a slot in a fully invaded reach (Tamarisk plants present in all habitats)


The dynamics are defined by stochastic transitions of spreading "propagules". In each time step, each tree dies with some probability. The surviving trees then produce a stochastic number of "propagules" (i.e., seeds). These then disperse according to a spatial process such that downstream spread is much more likely than upstream spread. The propagules that arrive at each empty slot compete stochastically (according to a competition parameter) to determine which one will occupy the site and grow. The dynamics can be represented as a complex dynamic Bayesian network. However, inference in this network is infeasible, so instead we will provide a simulator that makes random draws from the transition distribution.

Cost Function

The reward function is composed of a cost for performing each action, costs that penalize the level of invasion by Tamarisk and a budget constraint. Specifically, there is a cost for each slot that is occupied by a Tamarisk plant and an additional cost for each reach that has a non-zero number of Tamarisk plants in it. There is also a budget constraint at each time step. The budget limits the number of actions which are possible on a given state. The budget could be any value greater than zero. The default budget value is 100. If the total cost of actions exceeds the budget a large negative penalty will be returned. The penalty could be passed as an input parameter to the environment (default =-10000).


In the example code in the state is represented by a list of integers S using the format as specified in the state section. At each step, based on the current state and for any action, the domain takes those actions and transitions to another state. The returned reward is multiplied by -1 and is the summation of state cost and action cost. If the action cost is greater than the budget or the action is not allowable on the current state then a high negative value will be returned.

TaskSpec Parameters

The following parameters could be obtained by using TaskSpec from the related parts in the message:

  • Number of reaches, related TaskSpec message: "ACTIONS INTS (number of reaches value 1 4)"
  • Habitat size, related TaskSpec message: "OBSERVATIONS INTS (number of reaches value * habitat size value 1 3)"
  • Penalty value for exceeding budget or invalid action, related TaskSpec message: "REWARDS (bad action penalty value and maximum cost value)"
  • Discount factor, related TaskSpec message: "DISCOUNTFACTOR discount factor value"
  • The edges of the graph, related TaskSpec message: "EXTRA the edges"


We consider the following parameters for this model.

State Definition Parameters

  • Habitat size (default=4)
  • Number of reaches (default=7)

Dynamics Parameters

These dynamics parameters define aspects common to both species

  • Eradication rate (default=0.85)
  • Restoration rate (default=0.65)
  • Downstream spread rate (default=0.5)
  • Upstream spread rates (default=0.1)

These parameters define separate rates and probabilities for Tamarisk and Native plant species:

  • Death rate: array of length 2 [Native plant death rate, Tamarisk plant death rate] (default=[0.2,0.2])
  • Production rate: array of length 2 [Native plant death rate, Tamarisk plant death rate] (default=[200,200])
  • Exogenous arrival indicator, it could be On or Off (default=On)
  • Exogenous arrival probability, matrix of size (E x 2), an item in row e and the first (second) column gives the probability of Native (Tamarisk) plant arriving at reach e from outside the network at the current timestep
  • Exogenous arrival number, matrix of size (E x 2). If there are arrivals of Native (Tamarisk) plants to reach e from outside the network at the current timestep then an item in row e and the first (second) column specifies the number of arrivals at reach e

Cost Function Parameters

Each component of the cost function has an associated parameter:

  • Cost per invaded reach  (default=10)
  • Cost per tree  (default=0.1)
  • Cost per empty slot (default=0.05)
  • Eradication cost  (default=0.5)
  • Restoration cost (default=0.9)

The following components of the cost function depend on the action being taken and are multiplied by the number of habitat slots being treated by that action.

  • Variable eradication cost  (default=0.4)
  • Variable restoration cost for empty slot (default=0.4)
  • Variable restoration cost for invaded slot (default=0.8)

Other Parameters

  • Discount factor (default=0.9)
  • Seed for random generator (Optional, default=1)
Guangliang Li,
Apr 10, 2013, 5:24 AM
Nikolaos Tziortziotis,
May 6, 2013, 1:10 AM