epymorph.adrio.adrio

Implements the base class for all ADRIOs, as well as some general-purpose ADRIO implementations.

ResultDType module-attribute

ResultDType = TypeVar('ResultDType', bound=generic)

The result type of an ADRIO.

ProgressCallback module-attribute

ProgressCallback = Callable[
    [float, DownloadActivity | None], None
]

The type of a callback function used by ADRIO implementations to report data fetching progress.
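A minimal sketch of a function with this signature. Two things here are assumptions: that the float is a completion ratio between 0 and 1, and the `FakeDownloadActivity` class, which is a hypothetical stand-in for `DownloadActivity` so the snippet runs without epymorph. The real class may carry different fields.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FakeDownloadActivity:
    """Hypothetical stand-in for epymorph's DownloadActivity."""

    total_bytes: int


# Same shape as ProgressCallback: (progress, activity-or-None) -> None
FakeProgressCallback = Callable[[float, Optional[FakeDownloadActivity]], None]

messages: list[str] = []


def record_progress(ratio: float, activity: Optional[FakeDownloadActivity]) -> None:
    """Record one progress line; the 0-to-1 ratio is an assumption."""
    suffix = "" if activity is None else f" ({activity.total_bytes} bytes so far)"
    messages.append(f"{ratio:.0%} complete{suffix}")


callback: FakeProgressCallback = record_progress
callback(0.5, None)
callback(1.0, FakeDownloadActivity(total_bytes=2048))
```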

ResultT module-attribute

ResultT = TypeVar('ResultT', bound=generic)

The dtype of an ADRIO result.

ValueT module-attribute

ValueT = TypeVar('ValueT', bound=generic)

The dtype of an ADRIO result's values, which may differ from the result type.

ADRIOError

ADRIOError(adrio: ADRIO, context: Context, message: str)

Bases: Exception

Error while loading or processing data with an ADRIO.

Parameters:

  • adrio (ADRIO) –

    The ADRIO being evaluated.

  • context (Context) –

    The evaluation context.

  • message (str) –

    An error description.

adrio instance-attribute

adrio: ADRIO = adrio

The ADRIO being evaluated.

context instance-attribute

context: Context = context

The evaluation context.

ADRIOContextError

ADRIOContextError(
    adrio: ADRIO,
    context: Context,
    message: str | None = None,
)

Bases: ADRIOError

Error if the simulation context is invalid for evaluating the ADRIO.

Parameters:

  • adrio (ADRIO) –

    The ADRIO being evaluated.

  • context (Context) –

    The evaluation context.

  • message (str | None, default: None ) –

    An error description, or else a default message will be used.

ADRIOCommunicationError

ADRIOCommunicationError(
    adrio: ADRIO,
    context: Context,
    message: str | None = None,
)

Bases: ADRIOError

Error if the ADRIO could not communicate with the external resource.

Parameters:

  • adrio (ADRIO) –

    The ADRIO being evaluated.

  • context (Context) –

    The evaluation context.

  • message (str | None, default: None ) –

    An error description, or else a default message will be used.

ADRIOProcessingError

ADRIOProcessingError(
    adrio: ADRIO,
    context: Context,
    message: str | None = None,
)

Bases: ADRIOError

An unexpected error occurred while processing ADRIO data.

Parameters:

  • adrio (ADRIO) –

    The ADRIO being evaluated.

  • context (Context) –

    The evaluation context.

  • message (str | None, default: None ) –

    An error description, or else a default message will be used.
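Because the specialized errors all derive from ADRIOError, a single handler for the base class covers them. The sketch below uses hypothetical stand-in classes (prefixed `Toy`) so it runs without epymorph, and the default message wording is invented for illustration.

```python
from typing import Optional


class ToyADRIOError(Exception):
    """Stand-in for ADRIOError: keeps the adrio and context for debugging."""

    def __init__(self, adrio: object, context: object, message: str):
        super().__init__(message)
        self.adrio = adrio
        self.context = context


class ToyADRIOContextError(ToyADRIOError):
    """Stand-in for ADRIOContextError, whose message is optional."""

    def __init__(self, adrio: object, context: object, message: Optional[str] = None):
        # The default wording below is invented, not epymorph's.
        super().__init__(adrio, context, message or "invalid context for this ADRIO")


def load(adrio: object, context: dict) -> str:
    """Pretend to load data; fail if the context lacks a scope."""
    if "scope" not in context:
        raise ToyADRIOContextError(adrio, context)
    return "ok"


try:
    load("my-adrio", {})
except ToyADRIOError as e:  # the base class catches every subclass
    caught = (type(e).__name__, str(e), e.adrio)
```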

InspectResult dataclass

InspectResult(
    adrio: ADRIO[ResultT, ValueT],
    source: DataFrame | NDArray | None,
    result: NDArray[ResultT],
    dtype: type[ValueT],
    shape: DataShape,
    issues: Mapping[str, NDArray[bool_]],
)

Bases: Generic[ResultT, ValueT]

Inspection is the process by which an ADRIO fetches data and analyzes its quality.

The result encapsulates the source data, the processed result data, and any outstanding data issues. ADRIOs will provide methods for correcting these issues as is appropriate for the task, but often these will be optional. A result which contains unresolved data issues will be represented as a masked numpy array. Values which are not impacted by any of the data issues will be unmasked. Individual issues are tracked along with masks specific to the issue.

For example: if data is not available for every geo node requested, some values will be represented as missing. Missing values will be masked in the result, and an issue will be included (likely called "missing") with a boolean mask indicating the missing values. The ADRIO will likely provide a fill method option that lets users fill missing values, for instance with zeros. Providing a fill method and inspecting the ADRIO a second time should resolve the "missing" issue and, assuming no other issues remain, produce a non-masked numpy array as a result.
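The masking behavior described above can be illustrated with plain numpy. This is a sketch under assumptions: the values and the single "missing" issue are made up, and combining the per-issue masks with a logical OR is an assumption about how the result mask relates to them.

```python
import numpy as np

# Made-up values for five geo nodes; nodes 1 and 3 had no data.
raw = np.array([10, 0, 25, 0, 7], dtype=np.int64)
issues = {"missing": np.array([False, True, False, True, False])}

# Assumption: a value is masked if any issue affects it.
any_issue = np.logical_or.reduce(list(issues.values()))
result = np.ma.masked_array(raw, mask=any_issue)

# Number of values not affected by any issue.
unmasked_count = int(result.count())

# What a zero-fill option would produce: a regular, non-masked array.
filled = result.filled(0)
```

Here `result` plays the role of an inspection result with an unresolved issue, and `filled` the role of the result after a fill method has been applied.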

InspectResult is generic on the result and value type (ResultT and ValueT) of the ADRIO.

Parameters:

  • adrio (ADRIO[ResultT, ValueT]) –

    A reference to the ADRIO which produced this result.

  • source (DataFrame | NDArray | None) –

    The data as fetched from the source. This can be useful for debugging data issues.

  • result (NDArray[ResultT]) –

    The final result produced by the ADRIO.

  • dtype (type[ValueT]) –

    The dtype of the data values.

  • shape (DataShape) –

    The shape of the result.

  • issues (Mapping[str, NDArray[bool_]]) –

    The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.

Examples:

The simplest way to use an InspectResult is to print it!

>>> from epymorph.adrio import cdc
>>> from epymorph.kit import *
>>> result = (
...     cdc.COVIDFacilityHospitalization()
...     .with_context(
...         scope=CountyScope.in_states(["AZ"], year=2019),
...         time_frame=TimeFrame.rangex("2021-01-01", "2021-02-01"),
...     )
...     .inspect()
... )
>>> print(result)
ADRIO inspection for epymorph.adrio.cdc.COVIDFacilityHospitalization:
  Result shape: AxN (5, 15); dtype: date/value (int64); size: 75
  Date range: 2021-01-03 to 2021-01-31, period: 7 days
  Values:
    histogram: 11 █▅▂▂▂▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁ 3065
    quartiles: 54.8, 203.5, 593.0 (IQR: 538.2)
    std dev: 487.8
    percent zero: 0.0%
    percent adult_redacted: 5.3%
    percent adult_missing: 6.7%
    percent pediatric_redacted: 26.7%
    percent pediatric_missing: 6.7%
    percent unmasked: 64.0%

adrio instance-attribute

adrio: ADRIO[ResultT, ValueT]

A reference to the ADRIO which produced this result.

source instance-attribute

source: DataFrame | NDArray | None

The data as fetched from the source. This can be useful for debugging data issues. May be None if the source data isn't suitable for inclusion with the result, for example because it is too large or in an awkward format.

result instance-attribute

result: NDArray[ResultT]

The final result produced by the ADRIO.

dtype instance-attribute

dtype: type[ValueT]

The dtype of the data values.

shape instance-attribute

shape: DataShape

The shape of the result.

issues instance-attribute

issues: Mapping[str, NDArray[bool_]]

The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.

values cached property

values: NDArray[ValueT]

The values in the result. If the result is date/value tuples, the values are first extracted.

unmasked_count property

unmasked_count: int

The number of unmasked values in the result.

quantify property

quantify: Sequence[tuple[str, float]]

Quantifies properties of the data: what percentage of the values are impacted by each data issue (if any), how many are zero, and how many are "unmasked" (that is, not affected by any issues). The value is a sequence of tuples which contain the name of the quality and the percentage of values.
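As a rough illustration of the kind of summary this produces, the sketch below computes the same categories with plain Python lists. The data is made up, `quantify_sketch` is a hypothetical helper, and two details are assumptions: that values are reported as fractions of the total, and that zeros are counted only among values not hidden by an issue.

```python
from typing import Sequence

values = [0, 12, 5, 0, 9, 3, 7, 1]
issues = {
    "missing": [False, True, False, False, False, False, False, False],
    "redacted": [False, False, True, False, False, False, True, False],
}


def quantify_sketch(values, issues) -> Sequence[tuple[str, float]]:
    """Return (name, fraction) pairs for zero, each issue, and unmasked."""
    n = len(values)
    # A value is affected if any issue mask flags it.
    affected = [any(mask[i] for mask in issues.values()) for i in range(n)]
    # Assumption: zeros are counted only among unaffected values.
    zero = sum(1 for v, hidden in zip(values, affected) if v == 0 and not hidden)
    rows = [("zero", zero / n)]
    rows += [(name, sum(mask) / n) for name, mask in issues.items()]
    rows.append(("unmasked", affected.count(False) / n))
    return rows


summary = quantify_sketch(values, issues)
```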

ADRIO

Bases: SimulationFunction[NDArray[ResultT]], Generic[ResultT, ValueT]

ADRIOs (or Abstract Data Resource Interface Objects) are functions which are intended to load data from external sources for epymorph simulations. These sources may be web APIs, local files, databases, or any other source of data.

ADRIO is an abstract base class. It is generic in both the form of the result (ResultT) and the type of the values in the result (ValueT). Both represent numpy dtypes.

When the ADRIO's result is simple, like a numpy array of 64-bit integers, both ResultT and ValueT will be the same -- np.int64. If the result is a structured type, however, like with numpy arrays containing date/value tuples, ResultT will reflect the "outer" structured type and ValueT will reflect the type of the "inner" data values. As a common example, a date/value array with 64-bit integer values will have ResultT equal to [("date", np.datetime64), ("value", np.int64)] and ValueT equal to np.int64. (This complexity is necessary to work around weaknesses in Python's type system.)
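The distinction between the outer and inner types can be seen directly in numpy. This sketch uses made-up data, and the day-resolution unit (`datetime64[D]`) is a choice made for the example rather than something stated above.

```python
import numpy as np

# Structured "outer" type: what ResultT would describe.
date_value = np.dtype([("date", "datetime64[D]"), ("value", np.int64)])

data = np.array(
    [(np.datetime64("2021-01-03"), 11), (np.datetime64("2021-01-10"), 42)],
    dtype=date_value,
)

# "Inner" values: what ValueT would describe.
inner = data["value"]
```

Indexing the structured array by field name yields a plain `int64` array, which is the extraction step that the `values` property of InspectResult is described as performing for date/value results.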

Implementation Notes

Implement this class by overriding result_format to describe the expected results, validate_context to check the provided context (happens prior to loading data), and inspect to implement the data loading logic. Do not override evaluate unless you need to change the base behavior. Override estimate_data if it's possible to estimate data usage ahead of time.

When evaluating an ADRIO, call evaluate or inspect.
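The division of responsibilities can be sketched with a toy base class. Everything below is a hypothetical stand-in (plain dicts and lists instead of epymorph's Context, InspectResult, and numpy arrays), and the idea that evaluate is built on top of inspect is an assumption.

```python
from abc import ABC, abstractmethod


class ToyADRIO(ABC):
    """Hypothetical stand-in for the ADRIO base class (not epymorph's code)."""

    def __init__(self, context: dict):
        self.context = context

    @abstractmethod
    def validate_context(self, context: dict) -> None: ...

    @abstractmethod
    def inspect(self) -> dict: ...

    def validate_result(self, context: dict, result: list) -> None:
        if len(result) != len(context["nodes"]):
            raise ValueError("result does not match the number of nodes")

    def evaluate(self) -> list:
        # Assumption: evaluate delegates to inspect and returns only the data.
        return self.inspect()["result"]


class NodeNameLength(ToyADRIO):
    """Toy implementation: the 'data' is the length of each node ID."""

    def validate_context(self, context: dict) -> None:
        if not context.get("nodes"):
            raise ValueError("context must include at least one node")

    def inspect(self) -> dict:
        self.validate_context(self.context)  # check the context first
        result = [len(name) for name in self.context["nodes"]]
        self.validate_result(self.context, result)  # check the result last
        return {"result": result, "issues": {}}


lengths = NodeNameLength({"nodes": ["04013", "04019"]}).evaluate()
```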

See Also

You may prefer to extend epymorph.adrio.adrio.FetchADRIO, which provides more scaffolding for ADRIOs that fetch data from external sources like web APIs.

result_format abstractmethod property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_context abstractmethod

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

validate_result

validate_result(
    context: Context, result: NDArray[ResultT]
) -> None

Validates that the result of evaluating the ADRIO adheres to the expected result format.

Parameters:

  • context (Context) –

    The context in which the result has been evaluated.

  • result (NDArray[ResultT]) –

    The result produced by the ADRIO.

Raises:

evaluate

evaluate() -> NDArray[ResultT]

Evaluates the ADRIO in the current context.

Returns:

inspect abstractmethod

inspect() -> InspectResult[ResultT, ValueT]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and validate_result last.

Returns:

estimate_data

estimate_data() -> DataEstimate

Estimate the data usage for this ADRIO in the current context.

Returns:

  • DataEstimate

    The estimated data usage for this ADRIO's current context. If a reasonable estimate cannot be made, returns EmptyDataEstimate.

FetchADRIO

Bases: ADRIO[ResultT, ValueT]

A specialization of ADRIO that adds structure for ADRIOs that load data from an external source, such as a web API.

Implementation Notes

FetchADRIO provides an implementation of inspect, and requires that you implement methods _fetch and _process instead.
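The fetch-then-process split is a template-method arrangement, sketched below with a toy class. The source names only the two methods, so the signatures, return types, and the dict returned by `inspect` are all invented for illustration.

```python
from abc import ABC, abstractmethod


class ToyFetchADRIO(ABC):
    """Hypothetical stand-in showing the fetch-then-process split."""

    @abstractmethod
    def _fetch(self) -> list[dict]: ...

    @abstractmethod
    def _process(self, source: list[dict]) -> list[int]: ...

    def inspect(self) -> dict:
        source = self._fetch()  # step 1: get raw records
        result = self._process(source)  # step 2: shape them into the result
        return {"source": source, "result": result}


class ToyCases(ToyFetchADRIO):
    def _fetch(self) -> list[dict]:
        # A real implementation would call an external source here.
        return [{"node": "B", "cases": 4}, {"node": "A", "cases": 9}]

    def _process(self, source: list[dict]) -> list[int]:
        # Put records in node order, then keep only the values.
        ordered = sorted(source, key=lambda row: row["node"])
        return [row["cases"] for row in ordered]


inspection = ToyCases().inspect()
```

Keeping the raw records alongside the processed result mirrors the `source` and `result` fields of InspectResult.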

inspect

inspect() -> InspectResult[ResultT, ValueT]

Produce an inspection of the ADRIO's data for the current context.

Returns:

NodeID

Bases: ADRIO[str_, str_]

An ADRIO that provides the node IDs as they exist in the geo scope.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_context

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

inspect

inspect() -> InspectResult[str_, str_]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and validate_result last.

Returns:

Scale

Scale(parent: ADRIO[float64, float64], factor: float)

Bases: ADRIO[float64, float64]

Scales the result of another ADRIO by multiplying values by the given factor.

Parameters:

  • parent (ADRIO[float64, float64]) –

    The ADRIO whose results will be scaled.

  • factor (float) –

    The factor to multiply all resulting ADRIO values by.
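The wrapping behavior amounts to element-wise multiplication of the parent's output. A minimal sketch with hypothetical stand-in classes and made-up values, using plain lists in place of float64 arrays:

```python
class ToyConstant:
    """Hypothetical parent: returns fixed float values."""

    def evaluate(self) -> list[float]:
        return [1200.0, 450.0, 30.5]


class ToyScale:
    """Stand-in for Scale: multiplies every parent value by a factor."""

    def __init__(self, parent, factor: float):
        self.parent = parent
        self.factor = factor

    def evaluate(self) -> list[float]:
        return [value * self.factor for value in self.parent.evaluate()]


scaled = ToyScale(ToyConstant(), 0.5).evaluate()
```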

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_context

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

inspect

inspect() -> InspectResult[float64, float64]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and validate_result last.

Returns:

PopulationPerKM2

Bases: ADRIO[float64, float64]

Calculates population density by combining the values from data attributes for population and land area.

This ADRIO requires two data attributes:

  • "population": the population of the node
  • "land_area_km2": the land area of the node in square kilometers
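The calculation itself is a per-node division of the two attributes. A minimal sketch with made-up inputs, using plain lists in place of numpy arrays:

```python
# Made-up inputs for two nodes.
population = [1000, 300]  # people
land_area_km2 = [50.0, 12.5]  # square kilometers

# Density in people per square kilometer, one value per node.
density = [people / area for people, area in zip(population, land_area_km2)]
```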

POPULATION class-attribute instance-attribute

POPULATION = AttributeDef('population', int, N)

LAND_AREA_KM2 class-attribute instance-attribute

LAND_AREA_KM2 = AttributeDef('land_area_km2', float, N)

requirements class-attribute instance-attribute

requirements = (POPULATION, LAND_AREA_KM2)

The attribute definitions describing the data requirements for this function.

For advanced use-cases, you may specify requirements as a property if you need it to be dynamically computed.

result_format property

result_format: ResultFormat

Information about the expected format of the ADRIO's resulting data.

validate_context

validate_context(context: Context) -> None

Validates the context before ADRIO evaluation.

Parameters:

  • context (Context) –

    The context to validate.

Raises:

inspect

inspect() -> InspectResult[float64, float64]

Produce an inspection of the ADRIO's data for the current context.

When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.

NOTE: if you are implementing this method, make sure to call validate_context first and validate_result last.

Returns:

adrio_cache

adrio_cache(cls: type[_ADRIOClassT]) -> type[_ADRIOClassT]

ADRIO class decorator to add result-caching behavior.
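To illustrate the general idea of a caching class decorator, here is a toy version. It is not epymorph's implementation: `toy_cache` is hypothetical, it caches per instance on the first call, and it ignores questions a real cache must answer, such as what happens when the context changes.

```python
import functools


def toy_cache(cls):
    """Hypothetical class decorator: remember the first evaluate() result."""
    original = cls.evaluate

    @functools.wraps(original)
    def cached_evaluate(self):
        if not hasattr(self, "_cached_result"):
            self._cached_result = original(self)
        return self._cached_result

    cls.evaluate = cached_evaluate
    return cls


@toy_cache
class SlowData:
    calls = 0

    def evaluate(self) -> list[int]:
        SlowData.calls += 1  # count how often the expensive work runs
        return [1, 2, 3]


data = SlowData()
first = data.evaluate()
second = data.evaluate()  # served from the cache; the work is not repeated
```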

Examples:

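>>> import numpy as np
>>> from epymorph.adrio.adrio import ADRIO, adrio_cache
>>> @adrio_cache
... class MyADRIO(ADRIO[np.int64, np.int64]):
...     # Now this ADRIO will cache its results.
...     ...  # implementation omitted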

adrio_validate_pipe

adrio_validate_pipe(
    adrio: ADRIO,
    context: Context,
    result: NDArray[ResultT],
    *validators: Validator,
) -> None

Applies a sequence of validator functions to the result of an ADRIO, using that ADRIO's context and raising an appropriate error if the result is invalid.

Parameters:

  • adrio (ADRIO) –

    The ADRIO instance.

  • context (Context) –

    The current simulation context.

  • result (NDArray[ResultT]) –

    The ADRIO result array.

  • *validators (Validator, default: () ) –

    The sequence of validation checks to apply.
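The pipe pattern can be sketched as follows. The validator shape used here (a callable returning an error message or None) is an assumption, since the Validator type is not described in this section, and `ValueError` is a placeholder for whatever error the real function raises.

```python
from typing import Callable, Optional, Sequence

# Hypothetical validator shape: return an error message, or None if valid.
ToyValidator = Callable[[Sequence[float]], Optional[str]]


def not_empty(result: Sequence[float]) -> Optional[str]:
    return "result is empty" if len(result) == 0 else None


def no_negatives(result: Sequence[float]) -> Optional[str]:
    return "negative values found" if any(v < 0 for v in result) else None


def validate_pipe(result: Sequence[float], *validators: ToyValidator) -> None:
    """Run validators in order and raise on the first failure."""
    for validator in validators:
        problem = validator(result)
        if problem is not None:
            raise ValueError(problem)  # placeholder error type


validate_pipe([1.0, 2.0], not_empty, no_negatives)  # passes silently

try:
    validate_pipe([1.0, -2.0], not_empty, no_negatives)
    outcome = "valid"
except ValueError as e:
    outcome = str(e)
```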

Raises:

validate_time_frame

validate_time_frame(
    adrio: ADRIO, context: Context, time_range: DateRange
) -> None

Validates that the context time frame is within the specified DateRange.

Parameters:

  • adrio (ADRIO) –

    The ADRIO instance doing the validation.

  • context (Context) –

    The evaluation context.

  • time_range (DateRange) –

    The valid range of dates.
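The containment check at the heart of this validation can be sketched with the standard library. `ToyRange` is a hypothetical stand-in for the time frame and DateRange types, and treating both ends as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ToyRange:
    """Hypothetical range of dates, inclusive on both ends."""

    start: date
    end: date


def is_within(time_frame: ToyRange, valid: ToyRange) -> bool:
    """True if every date in time_frame falls inside valid."""
    return valid.start <= time_frame.start and time_frame.end <= valid.end


valid = ToyRange(date(2020, 1, 1), date(2022, 12, 31))
inside = is_within(ToyRange(date(2021, 1, 1), date(2021, 1, 31)), valid)
outside = is_within(ToyRange(date(2019, 12, 1), date(2020, 1, 31)), valid)
```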

Raises: