You are here: Research Web>DatabaseManagementTools (19 Jun 2010)

Database Management Tools

A new set of tools to support the creation of the base year database for an UrbanSim application, and also to create and manage scenario databases, is now underway.

Requirements

The project is using the following Python libraries, which need to be installed:

  • SQLAlchemy (in the 4.3 installer)
  • Elixir (in the 4.3 installer)
  • Camelot (needs to be added to the installer or can be installed using 'easy_install camelot'
  • For new report generating tools, you also need:
    • rst2pdf, which depends on reportlab. Just use 'easy_install rst2pdf' to get all the dependencies

Database_Tools Package

There is a new 'database_tools' package under opus/src, containing the following project schemas:

  • parcel_database_manager (based on the parcel model system as used by PSRC, SEMCOG, MAG)
  • zone_dababase_manager (based on the zone model system as used by AACOG)
  • simple_zone_database_manager (based on the zone model system used by DCHCMPO - no development or price models)

Localization of Database Tools for User Applications

There is also a new 'database_tools' directory in the dev.urbansim.org repository, that can be checked out into opus/database_tools (at the same level as opus/data or opus/src), to contain subdirectories for localizations of the general data schemas referenced above.

To access this repository, we have created an 'urbansim' account with a password of 'anonymous', to enable read-access using a subversion client like Tortoisesvn to:

If someone needs to access this repository to store their own localized data schemas and data loaders, which would be very helpful to others in the user community, you will need to be added to a list of users that are registered with write access. If you already have such access to the svn.

Raw Data Loader

The localized database_tools directory also can contain a localized version of load_raw_data_to_sql.py, which will contain components to load all or most raw files available for use in developing an UrbanSim database. An initial version is under development for the Honolulu project and can be used as an example to work from to adapt to other data.

There is a helper module, 'dbf.py' stored in /src/database_tools, which is used to load and parse data from dbf files, such as in an ESRI Shapefile. ASCII CSV files are supported using the csv module, and a special-purpose set of examples have been coded for legacy fixed-format files generated by older mainframe programs using COBOL, Fortran, or SAS (for example).

At this point, data can be loaded from:

  • CSV (ASCII, comma separated)
  • ASCII, tab-delimited or space delimited
  • ASCII fixed format, even using complex layouts that require reading part of a record to determine which file layout to use to read the record
In principle, it should also be possible to load data from:
  • Data stored in other SQL databases, including MS Access and MS SQL Server by using the pyodbc connector, but this has not yet been tested.

Report Generators for Continuous and Categorical Variables

It is useful to see the distributions of the values in the 'raw' data being used for an application. To do this, we have developed a reporting system that writes results to a ReStructuredText? (a wiki dialect) format, and then can use the rst2pdf command in a shell, to convert the resulting file to a pdf. Examples of the reports generated using this approach are provided below.

Two initial scripts have been generated using the Honolulu project as a testbed, and are in the dev.urbansim.org/datatabase_tools/honolulu subversion folder:

  • report_percentile.py, which iterates over a user-specified list of tables and all of the integer or float fields, and generates a table that contains min, max, and percentile values at 5%, 10%, 25%, 50% (median), 75%, 90%, 95%. sample output
  • report_unique_values.py generates a report that contains one table per field that is a string or integer field with less than a user-specified number of unique values (e.g. to avoid generating a row per address in a parcel file!). sample output

To Do

Remaining issues, tasks:
  • Diagnose why some tables are not displaying data properly in the Camelot-generated GUI, though the data is correct in the database
  • Deal more thoroughly with field validators when loading data from a file (right now the GUI can enforce validation in interactive data entry but this does not apply to loading from a file as is done with the raw_data_to_sql.py module.
  • Figure out how to turn off generating all the tables that are defined in Camelot core classes... they just add clutter. The GUI ignores them but if you open a third-party database browser they are there...

Honolulu

  • ProjectIDs? not displaying data correctly: 1st 99 project_id's blank, and many TMKs display a negative. But data is ok in the database.

San Francisco

  • Changed activity to activity_name in business_for_estimation (but this field is not on business...why?) Must be an old version. Should have activity_id.li4, activity.iS10.
  • Do we need sector11 on business_for_estimation? Two answers. In the 2001 baseyear, No. Delete. For the 2009 baseyear, no--the sector11, with the new 11-sector control total breakdown, should be named, simply, sector (or sector_name perhaps), overriding the existing six sector wide field.
  • Table development_events_exogenous contains building_use_id, but there is no building_use table in the 2001 cache. Shall we add it? (The 2009 cache contains this table, and it has a class_id, but there is no table to which that refers) building_use table is deprecated/not actively maintained as transition model now works with building_type and building_type_classification.
  • Do we still need urbansim_constants? Mostly this is legacy from the old gridcell version. What, if any, contents from this are actually needed in the current parcel version? I am not sure they are used at all.
  • Temporarily commented out Persons table -- need to resolve some error in it.
-- PaulWaddell - 11 Jun 2010
Topic revision: r4 - 19 Jun 2010 - 15:45:53 - PaulWaddell
 
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback