Invasive Species Problem

Majid Alkaee Taleghan, Mark Crowley, Thomas Dietterich
Oregon State University
2013 Reinforcement Learning Competition

The ecological and economic damage caused by invasive species has increased rapidly around the world, creating a need for good policies for managing them. This proposed RL domain captures some of the inherent difficulties of optimal decision making in spatial domains where a spatially spreading process must be controlled. The domain is modelled as a river network with native and invading plant species. The Tamarisk tree (wikipedia.org/wiki/Tamarix) is a major invasive plant species in many parts of the world, including the western USA and Australia. It takes over river networks, where it outcompetes native species, reduces biodiversity, and consumes large amounts of water. The management goal is to reduce the impact of the invasion and restore the river network to mostly or entirely native plants while minimizing the management cost over time.

The management problem can be formalized as an MDP describing the state and dynamics of a tree-structured river network. The model is a generalization of one developed by Muneepeerakul et al., 2007 [Muneepeerakul, R., Weitz, J. S., Levin, S. A., Rinaldo, A., & Rodriguez-Iturbe, I. (2007). A neutral metapopulation model of biodiversity in river networks. Journal of Theoretical Biology, 245(2), 351–63. doi:10.1016/j.jtbi.2006.10.005].

The river network is represented as a tree with E edges. In ecology, each edge is called a "reach". Water starts at the leaves and flows toward the root. Each edge contains H slots ("habitats"), and each slot is in one of three states: empty, occupied by a Tamarisk tree, or occupied by a Native tree. Hence, the MDP has 3^{EH} states.

The optimization objective is to minimize the infinite-horizon, discounted sum of costs.
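To give a feel for how quickly the state space grows, and for what a discounted sum of costs looks like, here is a minimal Python sketch. The function names and the discount factor `gamma` are hypothetical and not part of the competition code, and the infinite horizon is approximated by a finite list of per-step costs.

```python
def num_states(num_reaches, slots_per_reach):
    """Number of MDP states, 3^(E*H): each of the E*H slots has 3 local states."""
    return 3 ** (num_reaches * slots_per_reach)


def discounted_cost(costs, gamma):
    """Discounted sum of a finite sequence of per-step costs.

    `gamma` is a hypothetical discount factor in (0, 1); the source text
    does not state the value used in the competition.
    """
    total = 0.0
    for t, cost in enumerate(costs):
        total += (gamma ** t) * cost
    return total


if __name__ == "__main__":
    # Even a small network is far too large to enumerate exhaustively.
    for e, h in [(2, 2), (3, 2), (7, 4)]:
        print(e, h, num_states(e, h))
    print(discounted_cost([1.0, 1.0, 1.0], 0.5))
```

Because the exponent is the product EH, adding a single reach multiplies the number of states by 3^H, which is why tabular methods do not scale to realistic networks.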
State
The state is represented as a list of integers in {1,2,3} of length EH, giving the local state of each habitat slot. The numbers are specified as follows:
Actions
In each reach at each time step, a management action can be taken and applied to the habitats in that reach. The action across all reaches is represented as a list of integers in {1,2,3,4}, one for each reach. The ordering of the reaches corresponds to the order of edges returned by the DiGraph object. The available actions in each reach are:
Both the eradicate and restore parts of an action can fail stochastically, and both have a cost that scales linearly with the number of affected slots. There are 4^{E} total global actions. The following actions are not allowed:
Dynamics
The dynamics are defined by stochastic transitions of spreading "propagules" (i.e., seeds). In each time step, each tree dies with some probability. The surviving trees then produce a stochastic number of propagules. These disperse according to a spatial process in which downstream spread is much more likely than upstream spread. The propagules that arrive at each empty slot compete stochastically (according to a competition parameter) to determine which one will occupy the site and grow. The dynamics can be represented as a complex dynamic Bayesian network. However, inference in this network is infeasible, so we instead provide a simulator that makes random draws from the transition distribution.

Cost Function
The reward function is composed of a cost for performing each action, costs that penalize the level of invasion by Tamarisk, and a budget constraint. Specifically, there is a cost for each slot that is occupied by a Tamarisk plant and an additional cost for each reach that contains at least one Tamarisk plant. There is also a budget constraint at each time step, which limits the actions that are possible in a given state. The budget can be any value greater than zero; the default is 100. If the total cost of the actions exceeds the budget, a large negative penalty is returned. The penalty can be passed as an input parameter to the environment (default = 10000).

Observations
In the example code in InvasiveExperiment.py, the state is represented by a list of integers S in the format specified in the State section. At each step, the domain applies the chosen action to the current state and transitions to another state. The returned reward is the sum of the state cost and the action cost, multiplied by -1. If the action cost is greater than the budget, or the action is not allowed in the current state, a large negative value is returned.
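The cost and budget rules described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the simulator's actual implementation: the function and parameter names are hypothetical, the state list is assumed to be laid out reach by reach (H consecutive slots per reach), the integer code for a Tamarisk slot is supplied by the caller, and the reward is read as the negative of the total cost. Checking whether an action is allowed in the current state is omitted.

```python
def state_cost(state, slots_per_reach, tamarisk_code,
               cost_per_tamarisk, cost_per_invaded_reach):
    """Invasion cost: a per-slot Tamarisk cost plus a per-reach cost
    for every reach holding at least one Tamarisk plant.

    Assumes `state` is a flat list grouped reach by reach (hypothetical layout).
    """
    total = 0.0
    for start in range(0, len(state), slots_per_reach):
        reach = state[start:start + slots_per_reach]
        n_tamarisk = sum(1 for slot in reach if slot == tamarisk_code)
        total += cost_per_tamarisk * n_tamarisk
        if n_tamarisk > 0:
            total += cost_per_invaded_reach
    return total


def step_reward(state, action_cost, slots_per_reach, tamarisk_code,
                cost_per_tamarisk, cost_per_invaded_reach,
                budget=100.0, penalty=10000.0):
    """Reward for one step: minus (state cost + action cost), or a large
    negative penalty when the action cost exceeds the budget."""
    if action_cost > budget:
        return -abs(penalty)
    invasion = state_cost(state, slots_per_reach, tamarisk_code,
                          cost_per_tamarisk, cost_per_invaded_reach)
    return -(invasion + action_cost)


if __name__ == "__main__":
    # Two reaches with two slots each; the code 2 for Tamarisk is a placeholder.
    example_state = [2, 2, 1, 1]
    print(step_reward(example_state, 3.0, 2, 2, 1.0, 5.0))
    print(step_reward(example_state, 200.0, 2, 2, 1.0, 5.0))
```

Keeping the budget check separate from the invasion cost makes it easy for an agent to filter out over-budget joint actions before sending them to the environment.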
TaskSpec Parameters
The following parameters can be obtained by using TaskSpec from the related parts of the message:
Parameters
We consider the following parameters for this model.

State Definition Parameters
Dynamics Parameters
These dynamics parameters define aspects common to both species:
These parameters define separate rates and probabilities for Tamarisk and Native plant species:
Cost Function Parameters
Each component of the cost function has an associated parameter:
The following components of the cost function depend on the action being taken and are multiplied by the number of habitat slots being treated by that action.
Other Parameters
