
Aggregation

Suppose we have three different geographical units: gridcells, zones and neighborhoods. Data are available at the gridcell level, and we would like to aggregate them to the zone and neighborhood levels. We know the assignment of each gridcell to a zone and of each zone to a neighborhood.

First, we place the data for three neighborhoods and five zones into a dict storage:

>>> from numpy import array
>>> from opus_core.storage_factory import StorageFactory
>>> dstorage = StorageFactory().get_storage('dict_storage')
>>> dstorage.write_table(table_name='neighborhoods',
...                      table_data={"nbh_id": array([1,2,3])}
...                      )
>>> dstorage.write_table(table_name='zones',
...                      table_data={"zone_id": array([1,2,3,4,5]),
...                                  "nbh_id": array([3,3,1,2,1])}
...                      )

Then, we create the corresponding datasets:

>>> from opus_core.datasets.dataset import Dataset
>>> neighborhoods = Dataset(in_storage=dstorage, in_table_name='neighborhoods',
...                         dataset_name="neighborhood", id_name="nbh_id")
>>> zones = Dataset(in_storage=dstorage, in_table_name='zones',
...                 dataset_name="zone", id_name="zone_id")
Note that the zones carry their assignments to neighborhoods in the attribute `nbh_id'. For the gridcell set, we use the dataset `locations' defined earlier in this guide and assign each of its nine locations to a zone:
>>> locations.add_primary_attribute(name="zone_id", data=[3,5,2,2,1,1,3,5,3])
Any such assignment must be stored in an attribute with the same name as the unique identifier of the dataset being assigned to. Here the attribute is called `zone_id' because that is the `id_name' of the zones dataset.
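The id-based convention can be illustrated outside Opus with plain NumPy. In the sketch below, the helper `positions_of' is hypothetical and not part of Opus; it only shows how the `zone_id' values stored on the nine locations, and the `nbh_id' values stored on the zones, resolve to row positions in the parent datasets, which is the lookup an aggregation relies on.

```python
import numpy as np

def positions_of(child_ids, parent_ids):
    """Hypothetical helper (not part of Opus): map the parent id stored on
    each child row to the row position of that parent.
    Raises KeyError if a child refers to an id the parent does not have."""
    lookup = {int(pid): pos for pos, pid in enumerate(parent_ids)}
    return np.array([lookup[int(cid)] for cid in child_ids])

# Data from the example above.
zone_ids = np.array([1, 2, 3, 4, 5])
nbh_ids = np.array([1, 2, 3])
location_zone_id = np.array([3, 5, 2, 2, 1, 1, 3, 5, 3])  # gridcell -> zone
zone_nbh_id = np.array([3, 3, 1, 2, 1])                   # zone -> neighborhood

location_zone_pos = positions_of(location_zone_id, zone_ids)
zone_nbh_pos = positions_of(zone_nbh_id, nbh_ids)
# location_zone_pos: 2, 4, 1, 1, 0, 0, 2, 4, 2
# zone_nbh_pos:      2, 2, 0, 1, 0
```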

Next, we prepare a dataset pool for the variable computation, since the variables we compute involve more than one dataset. To keep things simple, we insert our three datasets into the pool explicitly:

>>> from opus_core.datasets.dataset_pool import DatasetPool
>>> dataset_pool = DatasetPool(package_order=['urbansim', 'opus_core'],
...                            datasets_dict={'gridcell': locations,
...                                           'zone': zones,
...                                           'neighborhood': neighborhoods})
The gridcell attribute `capacity' can be aggregated over one geographical level, from gridcells to zones, as follows:
>>> aggr_var = "aggregated_capacity = zone.aggregate(gridcell.capacity)"
>>> zones.compute_variables(aggr_var, dataset_pool=dataset_pool)
aggregated_capacity = zone.aggregate(gridcell.capacity)..................0.0 sec
array([ 4.,  5.,  4.,  0.,  2.])
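Conceptually, the default `sum' aggregation groups the gridcell values by the zone each gridcell is assigned to and adds them up; a zone without any gridcells (zone 4 above) gets 0. The NumPy sketch below reproduces that grouping with `numpy.bincount'. The capacity values are hypothetical, since the `locations' data are defined elsewhere, so the numbers differ from the output above; it illustrates the idea and is not Opus's implementation.

```python
import numpy as np

zone_ids = np.array([1, 2, 3, 4, 5])
location_zone_id = np.array([3, 5, 2, 2, 1, 1, 3, 5, 3])
# Hypothetical capacities for the nine locations (not the values used above).
capacity = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])

# Row position of each location's zone (ids need not be 1..n in general).
lookup = {int(z): pos for pos, z in enumerate(zone_ids)}
zone_pos = np.array([lookup[int(z)] for z in location_zone_id])

# Sum capacities per zone; minlength keeps zones without locations (sum 0).
aggregated_capacity = np.bincount(zone_pos, weights=capacity,
                                  minlength=len(zone_ids))
# per zone 1..5: 110, 70, 170, 0, 100
```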
By default, the aggregated values are combined with the `sum' function. A different function can be chosen by passing it as the second argument, `function', of `aggregate' in the variable definition:
>>> aggr_var = \
... "zone.aggregate(urbansim.gridcell.is_near_cbd_if_threshold_is_2, function=maximum)"
>>> zones.compute_variables(aggr_var, dataset_pool=dataset_pool)
zone.aggregate(urbansim.gridcell.is_near_cbd_if_threshold_is_2, function=maximum):
                                                            completed...0.4 sec
array([ 1.,  1.,  0.,  0.,  0.])
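With `function=maximum', each zone receives the largest value among its gridcells, so for a 0/1 indicator the result says whether any gridcell in the zone is near the CBD. A NumPy sketch using `numpy.maximum.at', with hypothetical indicator values; reporting zones without gridcells as 0 follows the output above (zone 4) and is an assumption for other data.

```python
import numpy as np

# Zone of each of the nine locations, as assigned above; zone ids run 1..5,
# so id - 1 is the zone's row position in this example.
location_zone_id = np.array([3, 5, 2, 2, 1, 1, 3, 5, 3])
zone_pos = location_zone_id - 1
n_zones = 5

# Hypothetical 0/1 "near CBD" indicator for the nine locations.
is_near_cbd = np.array([1, 0, 0, 1, 0, 0, 0, 0, 0])

# Start every zone at -inf, then keep the running maximum per zone in place.
zone_max = np.full(n_zones, -np.inf)
np.maximum.at(zone_max, zone_pos, is_near_cbd)
# Zones without any location stay at -inf; report them as 0, like zone 4 above.
zone_max[np.isneginf(zone_max)] = 0.0
# per zone 1..5: 0, 1, 1, 0, 0
```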

The aggregate method accepts the following aggregation functions: sum, mean, variance, standard_deviation, minimum, maximum and center_of_mass. These are the functions of the same names in the scipy package ndimage.
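These `scipy.ndimage' functions compute statistics per label: they take the values, an array of labels (here the zone id of each gridcell) and the labels to report on. The standalone sketch below calls SciPy directly rather than going through Opus, and uses hypothetical capacity values. It assumes a recent SciPy, where `ndimage.sum_labels' is the newer name of `ndimage.sum'. Zone 4 is left out of `index' because it has no gridcells.

```python
import numpy as np
from scipy import ndimage

location_zone_id = np.array([3, 5, 2, 2, 1, 1, 3, 5, 3])   # label of each location
capacity = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])  # hypothetical values
zones_with_cells = [1, 2, 3, 5]                            # zone 4 has no locations

sums = ndimage.sum_labels(capacity, labels=location_zone_id, index=zones_with_cells)
means = ndimage.mean(capacity, labels=location_zone_id, index=zones_with_cells)
maxima = ndimage.maximum(capacity, labels=location_zone_id, index=zones_with_cells)
# sums: 110, 70, 170, 100   maxima: 60, 40, 90, 80
```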

An aggregation over two or more levels of geography is done by passing a third argument, `intermediates', to the aggregate method. It is a list of the names of the datasets through which the values are aggregated, excluding the datasets at the lowest and the highest level. For example, the gridcell attribute `capacity' can be aggregated for the neighborhood set, via the zones, by:

>>> aggr_var2 = \
...    "neighborhood.aggregate(gridcell.capacity, function=sum, intermediates=[zone])"
>>> neighborhoods.compute_variables(aggr_var2, dataset_pool=dataset_pool)
neighborhood.aggregate(gridcell.capacity, function=sum, intermediates=[zone])
    zone.aggregate(gridcell.capacity,function=sum).......................0.0 sec
neighborhood.aggregate(gridcell.capacity, function=sum, intermediates=[zone]):
                                                            completed...0.3 sec
array([ 6.,  0.,  9.])
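The two-level case amounts to aggregating twice: gridcells into zones, then zones into neighborhoods using the zones' `nbh_id'. The NumPy sketch below is illustrative and not how Opus evaluates `intermediates'; the helper `aggregate_sum' is hypothetical. Starting from the zone totals printed earlier and the zone-to-neighborhood assignment, it arrives at the same neighborhood totals; the second part chains both steps from hypothetical gridcell capacities.

```python
import numpy as np

def aggregate_sum(values, child_parent_ids, parent_ids):
    """Hypothetical helper (not part of Opus): add up child values per parent.

    child_parent_ids holds, for every child row, the id of its parent, i.e.
    the attribute named after the parent's unique identifier.
    Parents without children get 0.
    """
    lookup = {int(p): pos for pos, p in enumerate(parent_ids)}
    pos = np.array([lookup[int(c)] for c in child_parent_ids])
    return np.bincount(pos, weights=values, minlength=len(parent_ids))

nbh_ids = np.array([1, 2, 3])
zone_nbh_id = np.array([3, 3, 1, 2, 1])

# Second step only, starting from the zone totals computed earlier in this section.
zone_capacity = np.array([4.0, 5.0, 4.0, 0.0, 2.0])
nbh_capacity = aggregate_sum(zone_capacity, zone_nbh_id, nbh_ids)
# nbh 1 = zones 3 + 5, nbh 2 = zone 4, nbh 3 = zones 1 + 2  ->  6, 0, 9

# Both steps chained from gridcells, with hypothetical capacities.
zone_ids = np.array([1, 2, 3, 4, 5])
location_zone_id = np.array([3, 5, 2, 2, 1, 1, 3, 5, 3])
capacity = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90])
chained = aggregate_sum(aggregate_sum(capacity, location_zone_id, zone_ids),
                        zone_nbh_id, nbh_ids)
# zones -> 110, 70, 170, 0, 100; neighborhoods -> 270, 0, 180
```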


info (at) urbansim.org