agate-lookup 0.3.3#

Build status Coverage status PyPI downloads Version License Support Python versions

agate-lookup adds one-line access to lookup tables to agate.

Important links:

Install#

To install:

pip install agate-lookup

For details on development or supported platforms see the agate documentation.

Import#

agate-lookup is an agate extension. To use it, first import it and patch its functionality into agate:

import agate
import agatelookup

Importing agate-lookup adds methods to agate.Table.

Basic lookup#

agate-lookup allows you to join your tables to data from the lookup project. The basic mechanism for doing this is the TableLookup.lookup() method. For example, if you have this table:

company

usps

Walmart

AR

Exxon

TX

Chevron

CA

You could add the state name to the table by running:

joined = table.lookup('usps', 'state')

The resulting table would be:

company

usps

state

Walmart

AR

Arkansas

Exxon

TX

Texas

Chevron

CA

California

If your table has different keys from the lookup table, you can specify them using the table_key argument. For example, if your table had the column name postal then you could achieve the same result by running:

joined = table.lookup('postal', 'state', lookup_key='usps')

Multi-column lookup#

Some lookup tables have multiple key columns, for example year and month. To join to a table like this, pass a sequence of column names as the first argument. For example, consider this table:

usps

year

AZ

1985

WY

2014

SC

1994

We can join the population of the state for each year with this code:

joined = table.lookup(['usps', 'year'], 'population')

usps

year

population

AZ

1985

3,183,538

WY

2014

584,153

SC

1994

3,705,397

Versioned lookup#

Some lookup tables have several versions. For example, the NAICS business code classification is revised every 5 years. You can also select a particular version of the lookup table using the version argument. To join the 2012 edition of the NAICS codes, you would run:

joined = table.lookup('naics', 'description', version='2012')

Fetch a table without joining#

You can also fetch a lookup table without joining it. For example, to get the Consumer Price Index by year and month:

cpi = agate.Table.from_lookup(['year', 'month'], 'cpi')

year

month

cpi

1947

1

21.48

1947

2

21.62

1947

3

22.00

Lookup tables automatically have row_names assigned. In this case the row names are a tuple of (year, month). We can use this to quickly calculate inflation-adjusted prices in another table.

BASE_CPI = cpi.rows[('2015', '12')]['cpi']

deflator = lambda r: r['price'] * cpi.rows[(r['year'], r['month'])]['cpi'] / BASE_CPI)

adjusted = table.compute([
    ('real_price', agate.Formula(agate.Number(), deflator)
])

The adjusted table will now have a real_price column with prices in December, 2015 dollars.

Using a custom repository#

By default, agate-lookup will use the wireservice/lookup repository of lookup tables. Look there to see what key/value combinations and versions you can use with lookup.

You can specify your own repository of lookup tables by constructing an instance of Source and passing it into the TableLookup.lookup() method.

Caching#

Lookup tables are cached each time they are downloaded. The default cache location is ~/lookup. In order to ensure you have the latest version, tables are redownloaded each time that they are used, unless a network connection can not be made. If there is a connection issue, the cached copy will be read from disk.

You can override the default caching location by creating a custom Source and passing the cache argument.

API#

agatelookup.table.lookup(self, key, value, lookup_key=None, version=None, source=None, require_match=False)#

Fetch a lookup table from the remote source, matches it this table by its key columns, appends the value column and returns a new table instance.

Parameters:
  • key – A column name or a sequence of such names to match in this table.

  • value – The value that is being looked up. For example 'description' or 'population'. This is the column that will be appended.

  • lookup_key – A column name or a sequence of such names to match in the lookup table. For example 'naics' or ['city', 'year']. This defaults the same values specified for key, so it only needs to be specified if the column names in this table aren’t the same.

  • version – An optional version of the lookup to use, if more than one exists. For instance '2007' for the 2007 edition of the NAICS codes or '2012' for the 2012 version.

  • source

    An instance of Source that defines where lookup tables are located. If not specified a default source will be used that points to the wireservice/lookup repository.

  • require_match – If True, an exception will be raised if there is a value in this table with no matching entry in the lookup table.

agatelookup.table.from_lookup(cls, lookup_key, value, version=None, source=None)#

Fetch a lookup table, but don’t join it to anything. See TableLookup.lookup() for arguments.

class agatelookup.source.Source(root='https://wireservice.github.io/lookup', cache='~/.lookup')#

Bases: object

A reference to an archive of lookup tables. This is a remote location with lookup table and metadata files at a known path structure.

Parameters:
  • root – The root URL to prefix all data and metadata paths.

  • cache – A path in which to store cached copies of any tables that are used, so they can continue to be used offline.

get_metadata(keys, value, version=None)#

Fetches metadata related to a specific lookup table.

See Source.get_table() for parameter details.

get_table(keys, value, version=None)#

Fetches and creates and agate table from a specified lookup table.

The resulting table will automatically have row names created for the key columns, thus allowing it to be used as a lookup.

Parameters:
  • keys – Either a single string or a sequence of keys that identify the “left side” of the table. For example 'fips' or ['city', 'year'].

  • value – The value that is being looked up from the given keys. For example 'state' or 'population'.

  • version – An optional version of the given lookup, if more than one exists. For instance '2007' for the 2007 edition of the NAICS codes or '2012' for the 2012 version.

Authors#

The following individuals have contributed code to agate-lookup:

Changelog#

0.3.3 - February 23, 2024#

  • Add Python 3.12 support.

  • Drop Python 3.7 support (end-of-life was June 27, 2023).

0.3.2 - June 13, 2023#

  • Use yaml.safe_load.

0.3.1 - December 19, 2016#

  • Add missing module.

0.3.0 - December 19, 2016#

  • Remove monkeypatching pattern.

  • Upgrade required agate to 1.5.0.

0.2.1 - March 9, 2016#

  • Reorganize docs.

  • Add caching to docs.

  • Reversed order of source and table keys in lookup method.

0.2.0 - March 8, 2016#

  • Update to latest lookup metadata format.

  • Implement local caching for offline usage. (#14)

  • Remove options to Source for generating custom file paths.

0.1.1 - March 8, 2016#

  • Added CPI calculation example to docs.

  • Improvements to error handling.

  • Fetched lookup tables now automatically have row names assigned.

  • TableLookup.from_table() class method implemented.

0.1.0 - February 28, 2016#

  • Initial version.

License#

The MIT License

Copyright (c) 2016 Christopher Groskopf and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Indices and tables#