Next: Troubleshooting Python Up: Tutorial for the urbansim Previous: Number of Agents   Index

Creating a New Model

In most cases, a model performs some operations on datasets. Opus places only two requirements on a model class:
  1. it must be a child class of the opus_core class Model, and
  2. it must have a method run().
Optionally, it can have a class attribute model_name, which appears in the log messages produced when the model runs.

Thus, the following code defines a model:

>>> from opus_core.model import Model
>>> from opus_core.logger import logger
>>> class MyModel(Model):
        model_name = "my model"
        def run(self):
            logger.log_status("I'm running!")
            return
Calling its run() method then produces the following log output:
>>> MyModel().run()
Running my model (from __main__): started on Tue Mar 28 17:41:04 2006
    I'm running!
Running my model (from __main__): completed..............................0.0 sec

The packages opus_core and urbansim implement several models that can serve as parent classes for a new model. The full model hierarchy is shown in Figure 8.1 in Section 8.4. Here we give an example of implementing a new chunk model using the opus_core class ChunkModel (see Section 7.5.2), which automatically processes the model in several chunks.

Suppose we wish to generate a certain number $ n$ of normally distributed random numbers for each agent of a dataset. The mean and variance of the distributions are agent-specific and are given by two existing attributes of the dataset. For each dataset member, the model returns the mean of the numbers generated for it. The number $ n$ of generated values is user-defined and is therefore passed as an argument. Since both $ n$ and the dataset size can be large, we choose ChunkModel as the parent class, because processing the dataset in chunks helps to limit memory usage. For this model, we only need to define the method run_chunk(), which contains the actual computation; the run() method is defined in the parent class (see Section 7.5.2 for its arguments).
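To make the computation concrete, here is a minimal NumPy-only sketch of what the model does for one group of agents, written without opus_core. The function name mean_of_normal_draws is hypothetical. Unlike the model below, it uses NumPy's Generator API and broadcasting instead of apply_along_axis. It assumes the attributes hold variances, so it takes their square roots first, because NumPy's normal() expects a standard deviation.

```python
import numpy as np


def mean_of_normal_draws(means, variances, n, rng):
    """Hypothetical helper: draw n normal values per agent, return each agent's mean.

    means, variances: 1-D sequences with one entry per agent.
    """
    means = np.asarray(means, dtype=float)
    # normal() takes a standard deviation, so convert the variances first.
    sds = np.sqrt(np.asarray(variances, dtype=float))
    # Broadcasting the per-agent loc/scale over size=(n, agents) gives an
    # n x agents matrix: row i holds the i-th draw for every agent.
    draws = rng.normal(loc=means, scale=sds, size=(n, means.size))
    # Average over the n draws (axis 0) -> one value per agent.
    return draws.mean(axis=0)


rng = np.random.default_rng(1)
result = mean_of_normal_draws([0.0, 10.0], [1.0, 5.0], n=10_000, rng=rng)
print(result)  # two values, close to 0 and 10
```

The vectorized call consumes random numbers in a different order than a per-agent loop, so the individual values differ from those of the model below even with the same seed. Only the statistical behaviour is the same.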

The model can be coded as follows:

>>> from opus_core.chunk_model import ChunkModel
>>> from numpy import apply_along_axis, sqrt
>>> from numpy.random import normal

>>> class MyChunkModel(ChunkModel):
        model_name = "my chunk model"
        def run_chunk(self, index, dataset, mean_attribute, variance_attribute, n=1):
            # Means and variances of the dataset members in this chunk.
            mean_values = dataset.get_attribute_by_index(mean_attribute, index)
            variance_values = dataset.get_attribute_by_index(variance_attribute, index)
            def draw_rn(mean_var, n):
                # mean_var holds (mean, variance) of one agent; normal()
                # expects a standard deviation, hence the square root.
                return normal(mean_var[0], sqrt(mean_var[1]), size=n)
            # Apply draw_rn to each column (agent): result has shape (n, agents).
            normal_values = apply_along_axis(draw_rn, 0, (mean_values, variance_values), n)
            # Average the n draws of each agent.
            return normal_values.mean(axis=0)
The first two arguments of run_chunk() (after self) are obligatory, as determined by the parent class; the remaining ones are application-specific. index is an index of the dataset members processed in that chunk; the parent class takes care of "chopping" the dataset into appropriate chunks. mean_attribute and variance_attribute are the names of the dataset attributes that determine the means and variances, respectively. The model extracts the means and variances of the dataset members in this chunk, generates an $ n \times m$ matrix of normally distributed random numbers, where $ m$ is the number of agents in the chunk, and returns the mean for each agent.
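The chopping done by the parent class can be pictured with the following sketch. It is not the opus_core implementation of ChunkModel. The helper run_in_chunks and its parameters are hypothetical. It uses numpy.array_split to divide the member indices into nchunks nearly equal groups, processes each group with a run_chunk-like function, and concatenates the per-chunk results in order.

```python
import numpy as np


def run_in_chunks(run_chunk, n_items, nchunks, **kwargs):
    """Hypothetical illustration of chunked processing (not ChunkModel's code).

    Splits the indices 0..n_items-1 into nchunks nearly equal parts, calls
    run_chunk(index, **kwargs) on each part and concatenates the results.
    """
    results = []
    for index in np.array_split(np.arange(n_items), nchunks):
        # Only this chunk's intermediate arrays exist during the call.
        results.append(run_chunk(index, **kwargs))
    return np.concatenate(results)


# A toy run_chunk that returns twice the values selected by the index.
values = np.arange(10, dtype=float)
out = run_in_chunks(lambda index, data: 2 * data[index],
                    n_items=10, nchunks=3, data=values)
print(out)
```

Because each chunk is processed and reduced before the next one starts, peak memory for the temporaries grows with the chunk size rather than with the size of the whole dataset.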

In order to use this model, we need a dataset with the two required attributes for the means and variances. In our example, the dataset has $ 100,000$ entries. The entries in the first half have mean 0 and variance $ 1$; those in the second half have mean $ 10$ and variance $ 5$.

>>> from numpy import arange, array
>>> from opus_core.storage_factory import StorageFactory
>>> storage = StorageFactory().get_storage('dict_storage')
>>> storage._write_dataset(out_table_name='dataset',
                           values={'id':arange(100000)+1,
                                   'means':array(50000*[0]+50000*[10]),
                                   'variances':array(50000*[1]+50000*[5])
                                  }
                         )
>>> from opus_core.datasets.dataset import Dataset
>>> mydataset = Dataset(in_storage=storage, in_table_name='dataset',
                        id_name='id', dataset_name='mydataset')

The model is run in five chunks as follows:

>>> from numpy.random import seed
>>> seed(1)
>>> results = MyChunkModel().run(chunk_specification={'nchunks':5},
                                 dataset=mydataset,
                                 mean_attribute="means",
                                 variance_attribute="variances", n=10)
Running my chunk model (from __main__): started on Wed Mar 21 12:00:53 2007
    Total number of individuals: 100000
    ChunkM chunk 1 out of 5.: started on Wed Mar 21 12:00:53 2007
        Number of agents in this chunk: 20000
    ChunkM chunk 1 out of 5.: completed..................................0.7 sec
    ChunkM chunk 2 out of 5.: started on Wed Mar 21 12:00:54 2007
        Number of agents in this chunk: 20000
    ChunkM chunk 2 out of 5.: completed..................................0.7 sec
    ChunkM chunk 3 out of 5.: started on Wed Mar 21 12:00:54 2007
        Number of agents in this chunk: 20000
    ChunkM chunk 3 out of 5.: completed..................................0.7 sec
    ChunkM chunk 4 out of 5.: started on Wed Mar 21 12:00:55 2007
        Number of agents in this chunk: 20000
    ChunkM chunk 4 out of 5.: completed..................................0.7 sec
    ChunkM chunk 5 out of 5.: started on Wed Mar 21 12:00:56 2007
        Number of agents in this chunk: 20000
    ChunkM chunk 5 out of 5.: completed..................................0.7 sec
Running my chunk model (from __main__): completed........................3.7 sec
The run() method requires the first two arguments, chunk_specification and dataset. From the parent class's point of view the remaining ones are optional; they are passed on to run_chunk(). The argument chunk_specification determines the number of chunks (see Section 4.2.3). Experimenting with different values of nchunks and n shows how quickly memory can run out.
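A rough estimate shows why nchunks and n matter. The matrix of draws for one chunk holds $ n \times m$ values. The following sketch assumes 8-byte floating-point (float64) values and ignores other temporaries, so it gives only an approximate lower bound. The helper name draws_matrix_bytes is hypothetical.

```python
def draws_matrix_bytes(n, agents_per_chunk, bytes_per_value=8):
    """Approximate size in bytes of the n x m matrix of float64 draws."""
    return n * agents_per_chunk * bytes_per_value


# Settings from the run above: n = 10, 100,000 agents split into 5 chunks.
per_chunk = draws_matrix_bytes(10, 100_000 // 5)
whole = draws_matrix_bytes(10, 100_000)
print(per_chunk, whole)  # 1600000 8000000
```

With n = 10,000 and a single chunk, the same estimate grows to 10,000 × 100,000 × 8 bytes, i.e. 8 GB. Splitting into more chunks divides that figure by nchunks.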

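Check the results, for example by computing the mean of each half. These should be close to 0 and 10, respectively, although the exact digits depend on the random seed and the NumPy version: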

>>> results[0:50000].mean()
0.0010989375305175781
>>> results[50000:].mean()
10.000937499999999
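The spread of the results can be checked as well. Each result is the mean of n independent draws, so its variance should be roughly the agent's variance divided by n. The following NumPy-only simulation illustrates that check. It is independent of opus_core and uses NumPy's Generator API rather than the seed() call above. It assumes the attributes are variances, so the standard deviation passed to normal() is their square root.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
# Same setting as the example: 50,000 agents with (mean 0, variance 1)
# and 50,000 agents with (mean 10, variance 5).
means = np.repeat([0.0, 10.0], 50_000)
variances = np.repeat([1.0, 5.0], 50_000)
# n x agents matrix of draws; each column belongs to one agent.
draws = rng.normal(means, np.sqrt(variances), size=(n, means.size))
results = draws.mean(axis=0)

# The mean of n independent draws with variance v has variance v / n.
print(results[:50_000].var(), 1.0 / n)  # both close to 0.1
print(results[50_000:].var(), 5.0 / n)  # both close to 0.5
```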


info (at) urbansim.org