epymorph.adrio.acs5

ACS5Year module-attribute

ACS5Year = Literal[
    2009,
    2010,
    2011,
    2012,
    2013,
    2014,
    2015,
    2016,
    2017,
    2018,
    2019,
    2020,
    2021,
    2022,
    2023,
]

A supported ACS5 data year.

ACS5_YEARS module-attribute

ACS5_YEARS: Sequence[ACS5Year] = (
    2009,
    2010,
    2011,
    2012,
    2013,
    2014,
    2015,
    2016,
    2017,
    2018,
    2019,
    2020,
    2021,
    2022,
    2023,
)

All supported ACS5 data years.
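As a minimal sketch of how these two definitions relate, the snippet below rebuilds local stand-ins (so it runs without epymorph installed) and derives the sequence of years from the `Literal` type. The helper `is_supported_year` is hypothetical and not part of the module.

```python
from typing import Literal, Sequence, get_args

# Local stand-ins mirroring the definitions above, so the sketch runs on its own.
ACS5Year = Literal[
    2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016,
    2017, 2018, 2019, 2020, 2021, 2022, 2023,
]

# get_args() returns the literal values as a tuple, in declaration order.
ACS5_YEARS: Sequence[int] = get_args(ACS5Year)


def is_supported_year(year: int) -> bool:
    """Hypothetical helper: True if `year` is a supported ACS5 data year."""
    return year in ACS5_YEARS


print(is_supported_year(2015))  # True
print(is_supported_year(2008))  # False
```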

RaceCategory module-attribute

RaceCategory = Literal[
    "White",
    "Black",
    "Native",
    "Asian",
    "Pacific Islander",
    "Other",
    "Multiple",
]

A racial category defined by ACS5.

ACS5Client

Methods for interacting with the Census API for ACS5 data. Typical usage does not require this class, but it is provided for advanced cases.

url staticmethod

url(year: int) -> str

The base request URL for a given ACS5 year.

Parameters:

  • year (int) –

    The ACS5 data vintage year.

Returns:

  • str

    The formatted base url.

get_vars cached staticmethod

get_vars(year: int) -> dict[str, dict]

Loads (and caches) ACS5 variable metadata. This metadata is published by the Census alongside the data for each year.

Parameters:

  • year (int) –

    The ACS5 data vintage year.

Returns:

  • dict[str, dict]

    A dictionary of metadata about available variables, where each key is a variable name and each value is a dictionary containing the metadata for that variable.

get_group_vars cached staticmethod

get_group_vars(
    year: int, group: str
) -> list[tuple[str, dict]]

Retrieves the variables metadata for a specific group of variables. This is equivalent to calling get_vars and then filtering to the variables in the group.

Parameters:

  • year (int) –

    The ACS5 data vintage year.

  • group (str) –

    The name of the group to fetch.

Returns:

get_group_var_names cached staticmethod

get_group_var_names(year: int, group: str) -> list[str]

Like get_group_vars but just returns the variable names in the group.

Parameters:

  • year (int) –

    The ACS5 data vintage year.

  • group (str) –

    The name of the group to fetch.

Returns:

  • list[str]

    The names of all variables in the group.
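The relationship between get_vars, get_group_vars, and get_group_var_names can be sketched with plain dictionaries. This is an illustration only: the sample metadata is abbreviated, the "group" and "label" keys are assumptions about the metadata layout, and the two helper functions are hypothetical stand-ins rather than the library's implementation.

```python
# Illustrative metadata in the described shape: variable name -> metadata dict.
# The "group" and "label" keys are assumed for this sketch.
sample_vars = {
    "B19013_001E": {"group": "B19013", "label": "Estimate!!Median household income"},
    "B01001_002E": {"group": "B01001", "label": "Estimate!!Total:!!Male:"},
    "B01001_001E": {"group": "B01001", "label": "Estimate!!Total:"},
}


def group_vars(all_vars: dict[str, dict], group: str) -> list[tuple[str, dict]]:
    """Filter the full metadata down to one group, sorted by variable name."""
    selected = [
        (name, meta) for name, meta in all_vars.items() if meta.get("group") == group
    ]
    return sorted(selected, key=lambda item: item[0])


def group_var_names(all_vars: dict[str, dict], group: str) -> list[str]:
    """Same filter, but keep only the variable names."""
    return [name for name, _ in group_vars(all_vars, group)]


print(group_var_names(sample_vars, "B01001"))  # ['B01001_001E', 'B01001_002E']
```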

make_queries staticmethod

make_queries(scope: CensusScope) -> list[dict[str, str]]

Creates one or more Census API query predicates for the given scope. These may involve the "for" and "in" request parameters. Depending on your scope and the limitations of the API, multiple queries may be required, especially when your scope represents a disjoint spatial selection or one that otherwise can't be neatly expressed in a form like "all counties within state X".

Parameters:

  • scope (CensusScope) –

    The geo scope for which to make a query.

Returns:

  • list[dict[str, str]]

    The list of queries necessary to cover the scope. As defined by the Census API, individual queries are in the form of key/value pairs of strings.
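To make the "for" and "in" predicates concrete, here is a rough sketch of how a list of queries might be turned into request URLs. The base URL pattern, the example FIPS codes, and the helpers `base_url` and `request_url` are assumptions for illustration; the values actually produced by url and make_queries may differ.

```python
from urllib.parse import urlencode


def base_url(year: int) -> str:
    # Assumed Census API pattern for ACS 5-Year data; not taken from the library.
    return f"https://api.census.gov/data/{year}/acs/acs5"


# A scope covering all counties in two states cannot be expressed as one
# "all counties within state X" query, so it needs one query per state.
queries = [
    {"for": "county:*", "in": "state:04"},
    {"for": "county:*", "in": "state:35"},
]


def request_url(year: int, variables: list[str], query: dict[str, str]) -> str:
    """Hypothetical helper: combine variables and one query into a URL."""
    params = {"get": ",".join(variables), **query}
    # Keep ':', '*', and ',' unescaped so the URL stays readable.
    return f"{base_url(year)}?{urlencode(params, safe=':*,')}"


for q in queries:
    print(request_url(2020, ["B01001_001E"], q))
```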

fetch staticmethod

fetch(
    scope: CensusScope,
    variables: list[str],
    value_dtype: type[generic],
    report_progress: Callable[[float], None] | None = None,
) -> DataFrame

Requests variables from the Census API for the given scope.

Parameters:

  • scope (CensusScope) –

    The geo scope to query.

  • variables (list[str]) –

    The list of variables to query.

  • value_dtype (type[generic]) –

    The dtype of the result array.

  • report_progress (Callable[[float], None] | None, default: None ) –

    A callback for reporting query progress; especially useful when the scope necessitates multiple queries.

Returns:

  • DataFrame

    A dataframe in "long" format, with columns: geoid, variable, and value. Geoid and variable are strings and value will be converted to the given dtype.
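The "long" format described above holds one row per (geoid, variable) pair. The sketch below uses plain Python records as a stand-in for the DataFrame and shows one way to regroup them per geoid; the numbers are made up for illustration and `to_wide` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up rows in long format: one record per geoid/variable combination.
long_rows = [
    {"geoid": "04013", "variable": "B01001_001E", "value": 100},
    {"geoid": "04013", "variable": "B01001_002E", "value": 48},
    {"geoid": "04019", "variable": "B01001_001E", "value": 60},
    {"geoid": "04019", "variable": "B01001_002E", "value": 31},
]


def to_wide(rows: list[dict]) -> dict[str, dict[str, int]]:
    """Regroup long-format rows into {geoid: {variable: value}}."""
    wide: dict[str, dict[str, int]] = defaultdict(dict)
    for row in rows:
        wide[row["geoid"]][row["variable"]] = row["value"]
    return dict(wide)


print(to_wide(long_rows)["04013"])
```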

Population

Population(
    *,
    fix_insufficient_data: FixLikeInt = False,
    fix_missing: FillLikeInt = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[int64, int64]

Loads population data from the US Census ACS 5-Year Data (variable B01001_001E). ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of integers.

Parameters:

  • fix_insufficient_data (FixLikeInt, default: False ) –

    The method to use to replace values that could not be computed due to an insufficient number of sample observations (-666666666 in the data).

  • fix_missing (FillLikeInt, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census.
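The effect of fixing insufficient-data values can be sketched as a sentinel replacement. This is only an illustration of the idea using a constant replacement; the forms accepted by FixLikeInt are not described here, the sample values are made up, and `replace_insufficient` is a hypothetical helper.

```python
# Sentinel the Census uses when an estimate could not be computed
# because there were too few sample observations.
INSUFFICIENT_DATA = -666666666


def replace_insufficient(values: list[int], replacement: int) -> list[int]:
    """Swap every sentinel value for a constant replacement."""
    return [replacement if v == INSUFFICIENT_DATA else v for v in values]


raw = [1200, INSUFFICIENT_DATA, 845]  # made-up values for three nodes
print(replace_insufficient(raw, 0))  # [1200, 0, 845]
```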

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

PopulationByAgeTable

PopulationByAgeTable(
    *,
    fix_insufficient_data: FixLikeInt = False,
    fix_missing: FillLikeInt = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[int64, int64]

Loads a table of population categorized by Census-defined age brackets from the US Census ACS 5-Year Data (group B01001). This table is most useful as the source data for one or more PopulationByAge ADRIOs, which know how to select, group, and aggregate the data for simulations. ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an NxA-shaped array of integers where A is the number of variables included in the table. For example, in 2023 there are 49 variables: 23 age brackets for male, 23 age brackets for female, the male all-ages total, the female all-ages total, and a grand total.

Parameters:

  • fix_insufficient_data (FixLikeInt, default: False ) –

    The method to use to replace values that could not be computed due to an insufficient number of sample observations (-666666666 in the data).

  • fix_missing (FillLikeInt, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census, and an example of this table for 2023.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

AgeRange

Bases: NamedTuple

Models an age range for use with ACS age-categorized data. Unlike Python integer ranges, the end of this range is inclusive. end can also be None, which models the "and over" part of ranges like "85 years and over".

start instance-attribute

start: int

The youngest age included in the range.

end instance-attribute

end: int | None

The oldest age included in the range, or None to indicate an unbounded range.

contains

contains(other: AgeRange) -> bool

Is the other range fully contained in (or coincident with) this range?

Parameters:

  • other (AgeRange) –

    The other age range to consider.

Returns:

  • bool

    True if the other range is contained in this range.

parse staticmethod

parse(label: str) -> AgeRange | None

Parse the age range of an ACS field label; e.g.: Estimate!!Total:!!Male:!!22 to 24 years

Parameters:

  • label (str) –

    A census variable label.

Returns:

  • AgeRange | None

    The AgeRange object if parsing is successful, None if not.
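The behavior described for AgeRange can be approximated with a small NamedTuple. This is a sketch under assumptions, not the library's code: the class name `AgeRangeSketch` is hypothetical, and the regular expressions cover the label forms I would expect in the B01001 table ("Under 5 years", "22 to 24 years", "18 and 19 years", "20 years", "85 years and over").

```python
import re
from typing import NamedTuple, Optional


class AgeRangeSketch(NamedTuple):
    start: int
    end: Optional[int]  # inclusive; None means "and over"

    def contains(self, other: "AgeRangeSketch") -> bool:
        if other.start < self.start:
            return False
        if self.end is None:
            return True  # unbounded ranges contain anything starting late enough
        if other.end is None:
            return False  # a bounded range cannot contain an unbounded one
        return other.end <= self.end

    @staticmethod
    def parse(label: str) -> Optional["AgeRangeSketch"]:
        # Only the last "!!"-separated segment describes the age bracket.
        part = label.split("!!")[-1]
        if m := re.fullmatch(r"(\d+) (?:to|and) (\d+) years", part):
            return AgeRangeSketch(int(m[1]), int(m[2]))
        if m := re.fullmatch(r"(\d+) years and over", part):
            return AgeRangeSketch(int(m[1]), None)
        if m := re.fullmatch(r"Under (\d+) years", part):
            return AgeRangeSketch(0, int(m[1]) - 1)
        if m := re.fullmatch(r"(\d+) years", part):
            return AgeRangeSketch(int(m[1]), int(m[1]))
        return None  # totals such as "Estimate!!Total:!!Male:" carry no age


print(AgeRangeSketch.parse("Estimate!!Total:!!Male:!!22 to 24 years"))
```

Labels that carry no age information, such as the per-sex totals, yield None, which gives callers a simple way to skip "total" columns.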

PopulationByAge

PopulationByAge(
    age_range_start: int, age_range_end: int | None
)

Bases: _ACS5Mixin, ADRIO[int64, int64]

Processes a population-by-age table to extract the population of a specified age bracket, as limited by the age brackets defined by the US Census ACS 5-Year Data (group B01001). This ADRIO does not fetch data on its own, but requires you to provide another attribute named "population_by_age_table" for it to parse. Most often, this will be provided by a PopulationByAgeTable instance. This allows the table to be reused in case you need to calculate more than one population bracket (as is common in a multi-strata model).

The result is an N-shaped array of integers.

Parameters:

  • age_range_start (int) –

    The youngest age to include in the age bracket.

  • age_range_end (int | None) –

    The oldest age to include in the age bracket, or None to indicate an unbounded range (include all ages greater than or equal to age_range_start).

Raises:

  • ValueError

    If the given age range does not line up with those ranges which are available in the source data. For instance, the Census defines an age bracket of 22-to-24 years. This makes it impossible for 23 to be either the start or end of an age range. You can view the available age ranges on data.census.gov.

See Also

The ACS 5-Year documentation from the US Census, and an example of this table for 2023.
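The select-and-aggregate step, including the alignment check that leads to a ValueError, can be sketched as follows. This is a simplified illustration for a single location: the bracket list is a small subset, the counts are made up, and `population_in_range` is a hypothetical function rather than the ADRIO's implementation.

```python
from typing import Optional

Bracket = tuple[int, Optional[int]]  # (start, inclusive end or None)

# A small subset of brackets with made-up counts for one location.
brackets: list[Bracket] = [(18, 19), (20, 20), (21, 21), (22, 24), (25, 29)]
counts = [2, 1, 1, 3, 5]


def population_in_range(
    brackets: list[Bracket], counts: list[int], start: int, end: Optional[int]
) -> int:
    """Sum the counts of all brackets that fall inside [start, end]."""
    starts = {b_start for b_start, _ in brackets}
    ends = {b_end for _, b_end in brackets}
    # The requested range must begin and end on bracket boundaries.
    if start not in starts or end not in ends:
        raise ValueError(f"Age range {start}-{end} does not align with the data.")
    total = 0
    for (b_start, b_end), count in zip(brackets, counts):
        within_end = end is None or (b_end is not None and b_end <= end)
        if b_start >= start and within_end:
            total += count
    return total


print(population_in_range(brackets, counts, 18, 24))  # 7
```

Requesting a range such as 23 to 29 raises ValueError in this sketch, because 23 falls inside the 22-to-24 bracket and cannot be separated from it.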

POP_BY_AGE_TABLE class-attribute instance-attribute

POP_BY_AGE_TABLE = AttributeDef(
    "population_by_age_table", int, NxA
)

Defines the population-by-age-table requirement of this ADRIO.

requirements class-attribute instance-attribute

requirements = (POP_BY_AGE_TABLE,)

The attribute definitions describing the data requirements for this function.

For advanced use-cases, you may specify requirements as a property if you need it to be dynamically computed.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

inspect

inspect() -> InspectResult[int64, int64]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and _validate_result last.

Returns:

age_ranges staticmethod

age_ranges(year: int) -> Sequence[AgeRange]

Lists the age ranges used by the ACS5 population by age table in definition order for the given year. Note that this does not correspond one-to-one with the values in the B01001 table -- this list omits "total" columns and duplicates.

Parameters:

  • year (int) –

    A supported ACS5 year.

Returns:

PopulationByRace

PopulationByRace(
    race: RaceCategory,
    *,
    fix_insufficient_data: FixLikeInt = False,
    fix_missing: FillLikeInt = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[int64, int64]

Loads population by race from the US Census ACS 5-Year Data (group B02001). ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of integers.

Parameters:

  • race (RaceCategory) –

    The Census-defined race category to load.

  • fix_insufficient_data (FixLikeInt, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666).

  • fix_missing (FillLikeInt, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

AverageHouseholdSize

AverageHouseholdSize(
    *,
    fix_insufficient_data: FixLikeFloat = False,
    fix_missing: FillLikeFloat = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[float64, float64]

Loads average household size data, based on the number of people living in a household, from the US Census ACS 5-Year Data (variable B25010_001E). ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of floats.

Parameters:

  • fix_insufficient_data (FixLikeFloat, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666).

  • fix_missing (FillLikeFloat, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

MedianAge

MedianAge(
    *,
    fix_insufficient_data: FixLikeFloat = False,
    fix_missing: FillLikeFloat = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[float64, float64]

Loads median age data from the US Census ACS 5-Year Data (variable B01002_001E). ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of floats.

Parameters:

  • fix_insufficient_data (FixLikeFloat, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666).

  • fix_missing (FillLikeFloat, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

MedianIncome

MedianIncome(
    *,
    fix_insufficient_data: FixLikeInt = False,
    fix_missing: FillLikeInt = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[int64, int64]

Loads median income data in whole dollars from the US Census ACS 5-Year Data (variable B19013_001E), which is adjusted for inflation to the year of the data. ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of integers.

Parameters:

  • fix_insufficient_data (FixLikeInt, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666).

  • fix_missing (FillLikeInt, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

GiniIndex

GiniIndex(
    *,
    fix_insufficient_data: FixLikeFloat = False,
    fix_missing: FillLikeFloat = False,
)

Bases: _ACS5FetchMixin, FetchADRIO[float64, float64]

Loads Gini Index data from the US Census ACS 5-Year Data (variable B19083_001E). This is a measure of income inequality on a scale from 0 (perfect equality) to 1 (perfect inequality). ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to BlockGroupScope (aggregates are computed by the Census Bureau). Data is loaded according to the scope's year, from 2009 to 2023.

The result is an N-shaped array of floats.

Parameters:

  • fix_insufficient_data (FixLikeFloat, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666).

  • fix_missing (FillLikeFloat, default: False ) –

    The method to use to fix missing values.

See Also

The ACS 5-Year documentation from the US Census, and general info on the Gini index.
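For intuition about the 0-to-1 scale, here is a sketch of the textbook Gini coefficient computed from a list of incomes as the mean absolute difference between all pairs, divided by twice the mean. This is not how the ADRIO obtains its values (it loads the index the Census has already computed), and the function name `gini` and the sample incomes are illustrative.

```python
def gini(incomes: list[float]) -> float:
    """Textbook Gini coefficient: sum of |x_i - x_j| over 2 * n^2 * mean."""
    n = len(incomes)
    mean = sum(incomes) / n
    if mean == 0:
        return 0.0  # everyone has nothing: treat as perfectly equal
    total_diff = sum(abs(x - y) for x in incomes for y in incomes)
    return total_diff / (2 * n * n * mean)


print(gini([1, 1, 1, 1]))  # 0.0, everyone earns the same
print(gini([0, 0, 0, 4]))  # 0.75, one person earns everything
```

With a finite sample of n incomes this formula tops out at (n - 1) / n, which is why the second example gives 0.75 rather than 1.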

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

validate_context

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

DissimilarityIndex

DissimilarityIndex(
    majority_pop: RaceCategory,
    minority_pop: RaceCategory,
    *,
    fix_insufficient_population: FixLikeInt = False,
    fix_missing_population: FillLikeInt = False,
    fix_not_computable: FixLikeFloat = False,
)

Bases: _ACS5Mixin, ADRIO[float64, float64]

Calculates the Dissimilarity Index using US Census ACS 5-Year Data (group B02001). The dissimilarity index is a measure of segregation comparing two races. Typically one compares a majority to a minority race and so the names of parameters reflect this, but this relationship between the races involved isn't strictly necessary. The numerical result can be interpreted as the percentage of "minority" individuals that would have to move in order for the geographic distribution of individuals within subdivisions of a location to match the distribution of individuals in the location as a whole. ACS5 data is compiled from surveys taken during a rolling five-year period, so the values are estimates.

Data is available using CensusScope geographies, from StateScope down to TractScope. Data is loaded according to the scope's year, from 2009 to 2023. This ADRIO does not support BlockGroupScope because the calculation of the index requires loading data at a finer granularity than the target granularity, and there is no ACS5 data below block groups.

The result is an N-shaped array of floats.

Parameters:

  • majority_pop (RaceCategory) –

    The race category representing the majority population within the segregation analysis.

  • minority_pop (RaceCategory) –

    The race category representing the minority population within the segregation analysis.

  • fix_insufficient_population (FixLikeInt, default: False ) –

    The method to use to fix values for which there were insufficient data to report (sentinel value: -666666666). The replacement is performed on the underlying population by race data.

  • fix_missing_population (FillLikeInt, default: False ) –

    The method to use to fix missing values. The replacement is performed on the underlying population by race data.

  • fix_not_computable (FixLikeFloat, default: False ) –

    The method to use to fix values for which we cannot compute a value because population numbers cannot be loaded for one or more of the populations involved.

See Also

The ACS 5-Year documentation from the US Census, and general information about the dissimilarity index.
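The standard dissimilarity index formula can be sketched for a single location as half the sum, over its subdivisions, of the absolute difference between each group's share of its own total. This is the general formula rather than a copy of the ADRIO's implementation; the function name and the sample counts are illustrative, and returning None for a zero total is just one way to model the "not computable" case.

```python
from typing import Optional


def dissimilarity_index(majority: list[int], minority: list[int]) -> Optional[float]:
    """D = 1/2 * sum_i |a_i / A - b_i / B| over the subdivisions i."""
    total_major = sum(majority)
    total_minor = sum(minority)
    if total_major == 0 or total_minor == 0:
        return None  # shares are undefined when a group has no population
    return 0.5 * sum(
        abs(a / total_major - b / total_minor) for a, b in zip(majority, minority)
    )


# Two subdivisions; each list holds one group's population per subdivision.
print(dissimilarity_index([100, 0], [0, 100]))  # 1.0, fully segregated
print(dissimilarity_index([50, 50], [10, 10]))  # 0.0, evenly distributed
print(dissimilarity_index([75, 25], [25, 75]))  # 0.5
```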

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_context

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

validate_result

validate_result(context: Context, result: NDArray) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

inspect

inspect() -> InspectResult[float64, float64]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and _validate_result last.

Returns:

census_api_key

census_api_key() -> str | None

Loads the API key to use for census.gov from the environment variable 'API_KEY__census.gov'. If that is not set, falls back to the legacy variable 'CENSUS_API_KEY'.

Returns:

  • str | None

    The key, or None if it's not set.
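The lookup order described above can be sketched with os.environ. The function name `census_api_key_sketch` is hypothetical, and the real function may handle details (such as empty values) differently.

```python
import os
from typing import Optional


def census_api_key_sketch() -> Optional[str]:
    """Return the preferred variable if set, else the legacy one, else None."""
    key = os.environ.get("API_KEY__census.gov")
    if key is None:
        # Fall back to the legacy variable name.
        key = os.environ.get("CENSUS_API_KEY")
    return key


print(census_api_key_sketch() is not None)
```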