This chapter lists some of the design questions, decisions, and discussions
that arose as we designed and built Opus. We hope that providing this
information will help people understand, use, and extend the Opus architecture
in a way that preserves what is good and fixes what is not so good. It may also
help explain choices that would not otherwise be obvious.
Please feel free to send us additional architectural questions, which we'll try
to add to this list.
- What data stores should we support? Make it easy to get and
store data from many different data stores, such as relational databases or
files.
We already knew that different MPOs would have different relational
databases, so we needed to allow them to use the data store best for them. In
addition, using file stores is useful for testing and for caching datastore
attributes.
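As a rough illustration of this goal, the sketch below puts an in-memory store and a file store behind one small interface, so that client code does not care which one it is given. The names `Storage`, `DictStorage`, `CsvStorage`, `write_table`, `load_table`, and `copy_table` are hypothetical and are not the Opus API.

```python
import csv
import os
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical interface: a table is a dict of column name -> list."""

    @abstractmethod
    def write_table(self, name, columns):
        """Store the given columns under the table name."""

    @abstractmethod
    def load_table(self, name):
        """Return the table as a dict of column name -> list."""


class DictStorage(Storage):
    """In-memory store, convenient for tests."""

    def __init__(self):
        self._tables = {}

    def write_table(self, name, columns):
        self._tables[name] = {k: list(v) for k, v in columns.items()}

    def load_table(self, name):
        return {k: list(v) for k, v in self._tables[name].items()}


class CsvStorage(Storage):
    """File store: one CSV file per table; values come back as strings."""

    def __init__(self, directory):
        self._directory = directory

    def _path(self, name):
        return os.path.join(self._directory, name + ".csv")

    def write_table(self, name, columns):
        headers = list(columns)
        with open(self._path(name), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(headers)
            writer.writerows(zip(*(columns[h] for h in headers)))

    def load_table(self, name):
        with open(self._path(name), newline="") as f:
            rows = list(csv.reader(f))
        headers, body = rows[0], rows[1:]
        return {h: [row[i] for row in body] for i, h in enumerate(headers)}


def copy_table(name, source, target):
    """Client code depends only on the Storage interface."""
    target.write_table(name, source.load_table(name))
```

A relational-database store would be one more subclass; nothing in `copy_table` would need to change.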
- As much as possible, allow an Opus user to modify or extend the behavior
of Opus without making any changes to the installed set of Opus packages.
- Should we create our own Opus language, in the way that R has its
own language? No.
Wherever possible, use the Python language itself to do what is necessary,
rather than creating our own language. The goal is that Opus should be a set
of Python packages and not change any of Python's default behavior. We did not
want to define our own language, as R did, since we believed we did not have a
sufficient understanding of what the language should be.
- Minimize the number of ``usage models'' or ``object models'' that a user
has to know in order to use Opus. For instance, we struggled to minimize the
number of different types of paths that may be used for naming an Opus
variable, e.g. ``gridcell.grid_id'', ``urbansim.gridcell.grid_id'', and ???.
- Favor transparency over cool code that might make the user's life
simpler. For instance, we used to specify variables with just the last part
of their name, e.g. ``average_household_income'', but changed to using a
fully-qualified path, e.g.
``urbansim.gridcell.average_household_income''. This removed the
ambiguity about which dataset this variable lived in, made it clearer when
variables were new definitions provided by a user versus ones that came with
urbansim, eliminated our need to arbitrarily decide on a search order in
which to look in different packages for this variable definition, avoided
problems associated with search paths, etc. The tradeoff was that the user
would have to type some more characters. On the other hand, we are planning
to build more tools to automatically create this text from GUIs, so the
amount of typing in the future will diminish.
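To make the benefit concrete, here is a minimal sketch of how a fully-qualified name can be resolved with no search order at all. It assumes only the three-part ``package.dataset.variable'' form shown above; `QualifiedName` and `parse_qualified_name` are hypothetical names, not part of Opus.

```python
from typing import NamedTuple


class QualifiedName(NamedTuple):
    package: str
    dataset: str
    variable: str


def parse_qualified_name(name: str) -> QualifiedName:
    """Split 'package.dataset.variable' into its three parts.

    Anything else is rejected rather than guessed at, so no search
    order over packages is ever needed.
    """
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'package.dataset.variable', got {name!r}")
    return QualifiedName(*parts)
```

With this scheme a bare name such as `average_household_income` is an error, not an invitation to search several packages for a match.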
- Where should unit tests live? In the same file as the code.
Having the unit tests in the same file as the code helps reinforce that the
tests are part of the definition of the class. It also makes it much easier
to find the tests for a class.
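A small sketch of this layout, using Python's standard `unittest` module. The class `HouseholdIncome` is a made-up example and does not come from Opus.

```python
import unittest


class HouseholdIncome:
    """Hypothetical class; the tests below live in the same file."""

    def __init__(self, incomes):
        self.incomes = list(incomes)

    def average(self):
        if not self.incomes:
            raise ValueError("no incomes to average")
        return sum(self.incomes) / len(self.incomes)


class HouseholdIncomeTests(unittest.TestCase):
    def test_average(self):
        self.assertEqual(HouseholdIncome([10, 20, 30]).average(), 20)

    def test_empty_input_is_an_error(self):
        with self.assertRaises(ValueError):
            HouseholdIncome([]).average()
```

Running `python -m unittest` on the file executes the tests, and a reader of the class sees its expected behavior immediately below it.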
- How can a modeler easily build a new model from a set of common
model ``parts''? Models may be composed of ``model steps'', where each
model step is a function on data.
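The idea of composing a model from steps can be sketched as follows. This is an illustration only: `run_model`, the two step functions, and the column names are hypothetical, and data is represented as a plain dict of columns for simplicity.

```python
def run_model(steps, data):
    """Apply each model step, in order, to the output of the previous one."""
    for step in steps:
        data = step(data)
    return data


# Two hypothetical, reusable steps. Each takes a dict of columns and
# returns a new dict, leaving its input untouched.
def add_total_value(data):
    total = [
        land + improvement
        for land, improvement in zip(data["land_value"], data["improvement_value"])
    ]
    return {**data, "total_value": total}


def flag_high_value(data, threshold=100):
    flags = [value > threshold for value in data["total_value"]]
    return {**data, "is_high_value": flags}


# A model is just an ordered list of steps.
valuation_model = [add_total_value, flag_high_value]
result = run_model(
    valuation_model,
    {"land_value": [40, 90], "improvement_value": [30, 60]},
)
```

Because each step is an ordinary function on data, a modeler can reorder steps, reuse them in another model, or test them one at a time.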
- How can a modeler add a new model without changing any of the
installed Opus code?
- How can a modeler add a new variable without changing any of the
installed Opus code?
- How can a user trust the Opus code? Make the code as
transparent as possible. Have automated, meaningful tests for all the
important parts.
This is perhaps one of the most important design questions for Opus.
- How can the user know what to do when there is a failure?
Never hide exceptions that indicate actual errors. Provide useful
error messages. Provide good debugging support.
We still need to do more work on this item.
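One way to add a useful message without hiding the underlying error is Python's exception chaining. The sketch below is an assumption about how such a wrapper might look; `VariableComputationError` and `compute_variable` are hypothetical names.

```python
class VariableComputationError(Exception):
    """Hypothetical error type that adds context without hiding the cause."""


def compute_variable(name, function, inputs):
    """Call function(**inputs); on failure, say which variable was involved."""
    try:
        return function(**inputs)
    except Exception as error:
        # `from error` chains the original exception, so its traceback
        # is still shown to the user alongside the added context.
        raise VariableComputationError(
            f"could not compute {name!r} from inputs {sorted(inputs)}"
        ) from error
```

The original exception remains available as `__cause__`, so debugging tools still see the real failure.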
- How can Opus minimize the amount of memory used?
- How can the user know the ``pedigree'' of the results?
This ``pedigree'' includes which tables were read from which database,
which version of code was used, what configuration was used for this run, etc.
- How can the user know the state of a simulation run?
- How can the user define a ``scenario'' that is a slight change to
a baseyear database?
- How can we make it easy to restart a failed simulation? Use the
run_manager/restart_run.py script to restart a run from the
information in the run_activity table.
- How can Opus know when to recompute a variable's value? An Opus
dataset keeps track of the version of each variable and of all the inputs used
to compute that variable. When getting a variable's values, the dataset
checks whether the inputs' versions have changed. If so, it recomputes the
variable. This is done recursively, to ensure that a variable is computed
exactly when needed, and never before it is needed.
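The version check described above can be sketched as follows. This is a simplified illustration under our own assumptions, not the Opus dataset class: the `Dataset` below, its method names, and the `compute_count` counter (present only to make the behavior observable) are all hypothetical.

```python
class Dataset:
    """Hypothetical sketch of version-based recomputation."""

    def __init__(self):
        self._values = {}       # name -> current value
        self._versions = {}     # name -> int, bumped on every change
        self._variables = {}    # name -> (function, tuple of input names)
        self._seen = {}         # name -> input versions used at last compute
        self.compute_count = 0  # for demonstration only

    def set_attribute(self, name, value):
        self._values[name] = value
        self._versions[name] = self._versions.get(name, 0) + 1

    def define_variable(self, name, function, inputs):
        self._variables[name] = (function, tuple(inputs))

    def get(self, name):
        if name not in self._variables:
            return self._values[name]  # primary attribute, nothing to compute
        function, inputs = self._variables[name]
        # Recursive step: bring every input up to date first.
        args = [self.get(i) for i in inputs]
        current = tuple(self._versions[i] for i in inputs)
        if self._seen.get(name) != current:
            self.set_attribute(name, function(*args))
            self._seen[name] = current
            self.compute_count += 1
        return self._values[name]


ds = Dataset()
ds.set_attribute("land_value", 40)
ds.set_attribute("improvement_value", 30)
ds.define_variable(
    "total_value", lambda a, b: a + b, ["land_value", "improvement_value"]
)
ds.define_variable("double_total", lambda t: 2 * t, ["total_value"])

first = ds.get("double_total")   # computes total_value, then double_total
again = ds.get("double_total")   # inputs unchanged: nothing is recomputed
ds.set_attribute("land_value", 50)
updated = ds.get("double_total")  # both variables are recomputed
```

Nothing is computed when a variable is defined or when an input changes; work happens only inside `get`, which is what keeps computation from running before it is needed.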
- How to make it easy to specify a new variable? Opus uses
several techniques for this.
- A clean and simple Variable class makes it easy to specify
variables.
- A simplified equation syntax to specify some variable computations in
the ``name'' of the variable, e.g.
ln(urbansim.gridcell.total_land_price) computes
the natural log of the urbansim.gridcell.total_land_price
without requiring any extra Python code.
- A simple pattern substitution syntax to allow a ``family'' of related
variables to be specified in a single variable, e.g.
urbansim.gridcell.is_plan_type_DDD will match any variable with
a sequence of digits in place of the DDD, such as
urbansim.gridcell.is_plan_type_10.
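The last two techniques can be approximated with regular expressions. The sketch below is not how Opus parses names; `FUNCTIONS`, `parse_expression`, and `match_family` are hypothetical, and only a single function call around a single name is handled.

```python
import math
import re

# Hypothetical table of functions allowed in a variable name.
FUNCTIONS = {"ln": math.log}

_CALL = re.compile(r"^(\w+)\(([\w.]+)\)$")


def parse_expression(name):
    """Return (function, inner_name); function is None for a plain name."""
    match = _CALL.match(name)
    if match is None:
        return None, name
    function_name, inner = match.groups()
    return FUNCTIONS[function_name], inner


def match_family(pattern, name):
    """Match `name` against a pattern in which DDD stands for a run of digits.

    Returns the digits as an int, or None if the name does not match.
    """
    regex = re.escape(pattern).replace("DDD", r"(\d+)")
    match = re.fullmatch(regex, name)
    return int(match.group(1)) if match else None
```

For example, `match_family("urbansim.gridcell.is_plan_type_DDD", "urbansim.gridcell.is_plan_type_10")` yields the integer 10, which a single variable definition could then use as its parameter.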
info (at) urbansim.org