
Sampling Models

The second type of model template is a Sampling Model. This generic model takes a probability (a rate), compares it to a random number, and if the random number is smaller than the given probability (rate), it assigns the outcome as being chosen. Some examples will make the use of this model clearer. Say we want to build a household evolution model. We need to deal with aging, which we can do with a Simple Model. We also need models that predict:

• Births
• Deaths
• Children leaving home as they age
• Divorces
• Entering the labor market
• Retiring

For all of these examples, assuming that we want to base our predictions on expected rates that vary by person or household attributes, we need a more sophisticated model, which we call a Sampling Model. Since we need to assign a tangible outcome rather than a probability, we use a sampling method to assign outcomes in proportion to their probabilities. This method is also occasionally referred to as a Monte Carlo sampling algorithm.

The algorithm is simple. Take a fair coin toss, where Heads and Tails each have a probability of 0.5. A sampling model that predicts an outcome attribute of Heads takes 0.5 as the expected probability of Heads. We then draw a random number from a uniform distribution between 0 and 1 and compare it to the expected probability. If the random draw is less than the expected probability, we set the outcome to Heads; otherwise, we set it to Tails. Since the draws are uniform between 0 and 1, we expect around half of them to fall below 0.5 and half above. As the number of draws grows, the share of Heads outcomes tends to converge towards the expected probability, by the law of large numbers. With a very large number of draws, the observed share should match the expected probability closely.
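
The coin toss can be sketched in a few lines of Python. This is only an illustration, not OPUS code: the function name `sample_outcome`, the fixed seed, and the number of draws are assumptions made for the example. It uses the convention that the outcome is chosen when the uniform draw falls below the expected probability, so an outcome with probability p is chosen in a share p of the draws.

```python
import random


def sample_outcome(probability, rng):
    """Return True when the outcome is chosen (hypothetical helper, not an OPUS function)."""
    # rng.random() returns a float drawn uniformly from [0.0, 1.0),
    # so the comparison is True with the given probability.
    return rng.random() < probability


rng = random.Random(42)  # fixed seed only so the sketch is reproducible
n_draws = 100_000

# Count how many draws produce Heads for a fair coin (probability 0.5)
heads = sum(sample_outcome(0.5, rng) for _ in range(n_draws))
share_heads = heads / n_draws

print(f"Share of Heads after {n_draws} draws: {share_heads:.4f}")
```

Rerunning the sketch with a smaller `n_draws` shows a noisier share of Heads, which is the law of large numbers at work.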

To make the model useful for practical applications, we can add a means to apply different probabilities to different subsets of the data. For example, death rates or birth rates vary by gender, age, and race/ethnicity, and to some extent by income. We might want to stratify our probabilities by one or more of these attributes, and then use the sampling model to sample outcomes using the expected probabilities for each subgroup.
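
As a rough sketch of stratification, the example below looks up a rate for each person by subgroup and then samples an outcome against it. The rates, the attribute names (`gender`, `age_group`, `dies`), and the synthetic population are all hypothetical values chosen for illustration; they are not taken from OPUS or from real demographic data.

```python
import random
from collections import Counter

# Hypothetical annual death rates keyed by (gender, age group); illustrative values only
death_rates = {
    ("female", "under 65"): 0.01,
    ("female", "65 and over"): 0.05,
    ("male", "under 65"): 0.02,
    ("male", "65 and over"): 0.07,
}

rng = random.Random(7)  # fixed seed only so the sketch is reproducible

# Synthetic population: 20,000 persons in each subgroup
persons = [
    {"gender": gender, "age_group": age_group}
    for (gender, age_group) in death_rates
    for _ in range(20_000)
]

# Look up each person's rate by subgroup, then sample the outcome against it
for person in persons:
    rate = death_rates[(person["gender"], person["age_group"])]
    person["dies"] = rng.random() < rate

# Compare the sampled share in each subgroup with its expected rate
group_sizes = Counter((p["gender"], p["age_group"]) for p in persons)
deaths = Counter((p["gender"], p["age_group"]) for p in persons if p["dies"])
for group, rate in death_rates.items():
    observed = deaths[group] / group_sizes[group]
    print(group, f"expected={rate:.3f}", f"observed={observed:.3f}")
```

Each subgroup's sampled share should land near its own rate, even though every person is processed by the same comparison.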

The Sampling Model takes the following arguments:

• Outcome Dataset: the name of the dataset to receive the predicted values
• Outcome Attribute: the name of the attribute to contain the predicted outcomes
• Probability Dataset: the name of the dataset containing the probabilities
• Probability Attribute: the name of the attribute containing the probability values (or rates)
• List of Classification Attributes: attributes of Outcome Dataset that will be used to index different Probabilities (e.g. age and income in household relocation)
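
To show how these five arguments might fit together, here is a minimal sketch that represents each dataset as a list of dictionaries. The function `run_sampling_model`, its signature, the extra `rng` parameter, and the sample data are assumptions made for illustration; this is not the OPUS implementation or its API.

```python
import random


def run_sampling_model(outcome_dataset, outcome_attribute,
                       probability_dataset, probability_attribute,
                       classification_attributes, rng):
    """Hypothetical sketch: write a sampled True/False outcome onto each record."""
    # Index the probability rows by their classification attribute values
    lookup = {
        tuple(row[name] for name in classification_attributes): row[probability_attribute]
        for row in probability_dataset
    }
    for record in outcome_dataset:
        key = tuple(record[name] for name in classification_attributes)
        # The outcome is chosen when the uniform draw falls below the probability
        record[outcome_attribute] = rng.random() < lookup[key]
    return outcome_dataset


# Illustrative household relocation rates by age and income group (made-up values)
relocation_rates = [
    {"age_group": "young", "income_group": "low", "relocation_rate": 0.30},
    {"age_group": "young", "income_group": "high", "relocation_rate": 0.20},
    {"age_group": "old", "income_group": "low", "relocation_rate": 0.10},
    {"age_group": "old", "income_group": "high", "relocation_rate": 0.05},
]

households = [
    {"household_id": 1, "age_group": "young", "income_group": "low"},
    {"household_id": 2, "age_group": "old", "income_group": "high"},
    {"household_id": 3, "age_group": "young", "income_group": "high"},
]

run_sampling_model(households, "moves", relocation_rates, "relocation_rate",
                   ["age_group", "income_group"], random.Random(1))

for household in households:
    print(household["household_id"], household["moves"])
```

A record whose classification values have no matching row in the probability dataset raises a `KeyError` in this sketch; a real model would need to decide how to handle such cases.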

info (at) urbansim.org