agate-lookup 0.3.1

Build status PyPI downloads Version License Support Python versions

agate-lookup adds one-line access to lookup tables to agate.

Important links:

Install

To install:

pip install agate-lookup

For details on development or supported platforms see the agate documentation.

Import

agate-lookup is an agate extension. To use it, first import it and patch its functionality into agate:

import agate
import agatelookup

Importing agate-lookup adds methods to agate.Table.

Basic lookup

agate-lookup allows you to join your tables to data from the lookup project. The basic mechanism for doing this is the TableLookup.lookup() method. For example, if you have this table:

company usps
Walmart AR
Exxon TX
Chevron CA

You could add the state name to the table by running:

joined = table.lookup('usps', 'state')

The resulting table would be:

company usps state
Walmart AR Arkansas
Exxon TX Texas
Chevron CA California

If your table has different keys from the lookup table, you can specify them using the table_key argument. For example, if your table had the column name postal then you could achieve the same result by running:

joined = table.lookup('postal', 'state', lookup_key='usps')

Multi-column lookup

Some lookup tables have multiple key columns, for example year and month. To join to a table like this, pass a sequence of column names as the first argument. For example, consider this table:

usps year
AZ 1985
WY 2014
SC 1994

We can join the population of the state for each year with this code:

joined = table.lookup(['usps', 'year'], 'population')
usps year population
AZ 1985 3,183,538
WY 2014 584,153
SC 1994 3,705,397

Versioned lookup

Some lookup tables have several versions. For example, the NAICS business code classification is revised every 5 years. You can also select a particular version of the lookup table using the version argument. To join the 2012 edition of the NAICS codes, you would run:

joined = table.lookup('naics', 'description', version='2012')

Fetch a table without joining

You can also fetch a lookup table without joining it. For example, to get the Consumer Price Index by year and month:

cpi = agate.Table.from_lookup(['year', 'month'], 'cpi')
year month cpi
1947 1 21.48
1947 2 21.62
1947 3 22.00
... ... ...

Lookup tables automatically have row_names assigned. In this case the row names are a tuple of (year, month). We can use this to quickly calculate inflation-adjusted prices in another table.

BASE_CPI = cpi.rows[('2015', '12')]['cpi']

deflator = lambda r: r['price'] * cpi.rows[(r['year'], r['month'])]['cpi'] / BASE_CPI)

adjusted = table.compute([
    ('real_price', agate.Formula(agate.Number(), deflator)
])

The adjusted table will now have a real_price column with prices in December, 2015 dollars.

Using a custom repository

By default, agate-lookup will use the wireservice/lookup repository of lookup tables. Look there to see what key/value combinations and versions you can use with lookup.

You can specify your own repository of lookup tables by constructing an instance of Source and passing it into the TableLookup.lookup() method.

Caching

Lookup tables are cached each time they are downloaded. The default cache location is ~/lookup. In order to ensure you have the latest version, tables are redownloaded each time that they are used, unless a network connection can not be made. If there is a connection issue, the cached copy will be read from disk.

You can override the default caching location by creating a custom Source and passing the cache argument.

API

agatelookup.table.lookup(self, key, value, lookup_key=None, version=None, source=None, require_match=False)

Fetch a lookup table from the remote source, matches it this table by its key columns, appends the value column and returns a new table instance.

Parameters:
  • key – A column name or a sequence of such names to match in this table.
  • value – The value that is being looked up. For example 'description' or 'population'. This is the column that will be appended.
  • lookup_key – A column name or a sequence of such names to match in the lookup table. For example 'naics' or ['city', 'year']. This defaults the same values specified for key, so it only needs to be specified if the column names in this table aren’t the same.
  • version – An optional version of the lookup to use, if more than one exists. For instance '2007' for the 2007 edition of the NAICS codes or '2012' for the 2012 version.
  • source

    An instance of Source that defines where lookup tables are located. If not specified a default source will be used that points to the wireservice/lookup repository.

  • require_match – If True, an exception will be raised if there is a value in this table with no matching entry in the lookup table.
agatelookup.table.from_lookup(cls, lookup_key, value, version=None, source=None)

Fetch a lookup table, but don’t join it to anything. See TableLookup.lookup() for arguments.

class agatelookup.source.Source(root='http://wireservice.github.io/lookup', cache='~/.lookup')

A reference to an archive of lookup tables. This is a remote location with lookup table and metadata files at a known path structure.

Parameters:
  • root – The root URL to prefix all data and metadata paths.
  • cache – A path in which to store cached copies of any tables that are used, so they can continue to be used offline.
get_metadata(keys, value, version=None)

Fetches metadata related to a specific lookup table.

See Source.get_table() for parameter details.

get_table(keys, value, version=None)

Fetches and creates and agate table from a specified lookup table.

The resulting table will automatically have row names created for the key columns, thus allowing it to be used as a lookup.

Parameters:
  • keys – Either a single string or a sequence of keys that identify the “left side” of the table. For example 'fips' or ['city', 'year'].
  • value – The value that is being looked up from the given keys. For example 'state' or 'population'.
  • version – An optional version of the given lookup, if more than one exists. For instance '2007' for the 2007 edition of the NAICS codes or '2012' for the 2012 version.

Authors

The following individuals have contributed code to agate-lookup:

Changelog

0.3.1 - December 19, 2016

  • Add missing module.

0.3.0 - December 19, 2016

  • Remove monkeypatching pattern.
  • Upgrade required agate to 1.5.0.

0.2.1 - March 9, 2016

  • Reorganize docs.
  • Add caching to docs.
  • Reversed order of source and table keys in lookup method.

0.2.0 - March 8, 2016

  • Update to latest lookup metadata format.
  • Implement local caching for offline usage. (#14)
  • Remove options to Source for generating custom file paths.

0.1.1 - March 8, 2016

  • Added CPI calculation example to docs.
  • Improvements to error handling.
  • Fetched lookup tables now automatically have row names assigned.
  • TableLookup.from_table() class method implemented.

0.1.0 - February 28, 2016

  • Initial version.

License

The MIT License

Copyright (c) 2016 Christopher Groskopf and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Indices and tables