As mentioned in Section 6.1, a dataset has a set of attributes, such as income or persons, that are stored in a file or database. We call such characteristics primary attributes. In addition, one is usually interested in attributes that are computed, for example by applying some transformation to primary attributes. We call those attributes variables, or computed attributes. They are handled simply as additional "columns" of the dataset to which they belong, here called the "parent dataset".
In Opus, a variable is a class derived from the opus_core class Variable. (Section 7.4 gives additional details about this class.) The class name is identical to the name of the module in which it is implemented. The module is stored in a directory whose name corresponds to the name of the parent dataset. Note that the variable name must be all lower case.
The variable class must have a method compute() that returns a numpy array of variable values. The size of that array must equal the number of entries in the parent dataset. The compute() method takes an argument called dataset_pool, which contains references to the set of datasets to use for computing this variable. The parent dataset can be accessed from within compute() by calling self.get_dataset().
If the variable depends on other attributes, they must be listed in the method dependencies(), which returns a list of all variables and attributes that the variable depends on, given by their fully-qualified names (see Section 7.2.4 for details on attribute specification).
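The contract described above can be sketched without Opus itself. The following is a minimal illustration under stated assumptions, not the opus_core implementation: StubDataset and StubVariable are hypothetical stand-ins that mimic only get_attribute(), get_dataset(), dependencies() and compute(), and the dataset name household and the variable income_per_person are invented for the example.

```python
import numpy as np


class StubDataset:
    """Hypothetical stand-in for an Opus dataset (not the opus_core class)."""

    def __init__(self, name, attributes):
        self._name = name
        # Store every attribute as a numpy array, one value per dataset entry.
        self._attributes = {key: np.asarray(values) for key, values in attributes.items()}

    def get_dataset_name(self):
        return self._name

    def get_attribute(self, name):
        return self._attributes[name]


class StubVariable:
    """Hypothetical stand-in for the Variable base class."""

    def __init__(self, dataset):
        self._dataset = dataset

    def get_dataset(self):
        # Gives compute() access to the parent dataset.
        return self._dataset

    def dependencies(self):
        return []

    def compute(self, dataset_pool):
        raise NotImplementedError


class income_per_person(StubVariable):
    """Illustrative computed attribute; the class name is all lower case."""

    def dependencies(self):
        # Primary attributes are listed by their dataset-qualified names.
        return ["household.income", "household.persons"]

    def compute(self, dataset_pool):
        households = self.get_dataset()
        # The result has one value per entry of the parent dataset.
        return households.get_attribute("income") / households.get_attribute("persons")


households = StubDataset("household", {"income": [30000, 45000, 80000], "persons": [1, 3, 4]})
values = income_per_person(households).compute(dataset_pool={})
print(values.tolist())
```

In a real Opus installation the base class, the dataset and the dataset pool are supplied by opus_core, so only the subclass with its dependencies() and compute() methods would need to be written.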
As an example, consider a variable "is_in_wetland" for the gridcell dataset locations from Sections 6.2.2 and 6.2.3. The variable returns True for entries whose percentage of wetland exceeds a certain threshold, and False otherwise. The module is_in_wetland.py, containing a class is_in_wetland, is stored in the directory gridcell because
>>> locations.get_dataset_name()
'gridcell'
The class is defined as follows:
from opus_core.variables.variable import Variable

class is_in_wetland(Variable):
    def dependencies(self):
        return ["gridcell.percent_wetland"]

    def compute(self, dataset_pool):
        return self.get_dataset().get_attribute("percent_wetland") > \
            dataset_pool.get_dataset('urbansim_constant')["percent_coverage_threshold"]
The attribute that the variable depends on is a primary attribute and is therefore specified by its dataset-qualified name. For our example, we populate the primary attribute:
>>> locations.add_primary_attribute(name="percent_wetland",
                                    data=[85,20,0,90,35,51,0,10,5])
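To make the expected result concrete, the comparison performed in compute() can be reproduced with plain numpy on the values above. This is only a sketch: the threshold of 50 is an assumption chosen for illustration, since the actual value of percent_coverage_threshold comes from the urbansim_constant dataset and is not given here.

```python
import numpy as np

# Values of the primary attribute percent_wetland, one per gridcell.
percent_wetland = np.array([85, 20, 0, 90, 35, 51, 0, 10, 5])

# Assumed threshold, for illustration only; in Opus it would be read from
# dataset_pool.get_dataset('urbansim_constant').
percent_coverage_threshold = 50

# Element-wise comparison yields a boolean array with one entry per gridcell.
is_in_wetland = percent_wetland > percent_coverage_threshold
print(is_in_wetland.tolist())
```

The resulting array has the same size as the parent dataset, which is what compute() is required to return.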