epymorph.adrio.adrio
Implements the base class for all ADRIOs, as well as some general-purpose ADRIO implementations.
ResultDType
module-attribute
The result type of an ADRIO.
ProgressCallback
module-attribute
ProgressCallback = Callable[
[float, DownloadActivity | None], None
]
The type of a callback function used by ADRIO implementations to report data fetching progress.
ValueT
module-attribute
The dtype of an ADRIO result's values, which may differ from the result type.
ADRIOError
ADRIOContextError
Bases: ADRIOError
Error if the simulation context is invalid for evaluating the ADRIO.
Parameters:
-
adrio
(ADRIO
) –The ADRIO being evaluated.
-
context
(Context
) –The evaluation context.
-
message
(str | None
, default:None
) –An error description, or else a default message will be used.
ADRIOCommunicationError
Bases: ADRIOError
Error if the ADRIO could not communicate with the external resource.
Parameters:
-
adrio
(ADRIO
) –The ADRIO being evaluated.
-
context
(Context
) –The evaluation context.
-
message
(str | None
, default:None
) –An error description, or else a default message will be used.
ADRIOProcessingError
Bases: ADRIOError
An unexpected error occurred while processing ADRIO data.
Parameters:
-
adrio
(ADRIO
) –The ADRIO being evaluated.
-
context
(Context
) –The evaluation context.
-
message
(str | None
, default:None
) –An error description, or else a default message will be used.
InspectResult
dataclass
InspectResult(
adrio: ADRIO[ResultT, ValueT],
source: DataFrame | NDArray | None,
result: NDArray[ResultT],
dtype: type[ValueT],
shape: DataShape,
issues: Mapping[str, NDArray[bool_]],
)
Bases: Generic[ResultT, ValueT]
Inspection is the process by which an ADRIO fetches data and analyzes its quality.
The result encapsulates the source data, the processed result data, and any outstanding data issues. ADRIOs will provide methods for correcting these issues as is appropriate for the task, but often these will be optional. A result which contains unresolved data issues will be represented as a masked numpy array. Values which are not impacted by any of the data issues will be unmasked. Individual issues are tracked along with masks specific to the issue.
For example: if data is not available for every geo node requested, some values will be represented as missing. Missing values will be masked in the result, and an issue will be included (likely called "missing") with a boolean mask indicating the missing values. The ADRIO will likely provide a fill method option which lets users fill missing values, for instance with zeros. Providing a fill method and inspecting the ADRIO a second time should resolve the "missing" issue and, assuming no other issues remain, produce a non-masked numpy array as a result.
InspectResult
is generic on the result and value type (ResultT
and ValueT
) of
the ADRIO.
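The masking behavior described above can be illustrated with plain numpy, independent of epymorph. This is only a sketch: the data values and the way the masks are combined are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

# Hypothetical values for four geo nodes; the third was not available.
raw = np.array([10, 25, 0, 40], dtype=np.int64)
missing = np.array([False, False, True, False])

# Issues are tracked by name, each with its own boolean mask.
issues = {"missing": missing}

# A value is masked if any issue affects it.
combined = np.logical_or.reduce(list(issues.values()))
result = np.ma.masked_array(raw, mask=combined)
print(result)

# Filling masked values with zeros yields a plain, non-masked array.
filled = result.filled(0)
print(type(filled).__name__, filled)
```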
Parameters:
-
adrio
(ADRIO[ResultT, ValueT]
) –A reference to the ADRIO which produced this result.
-
source
(DataFrame | NDArray | None
) –The data as fetched from the source. This can be useful for debugging data issues.
-
result
(NDArray[ResultT]
) –The final result produced by the ADRIO.
-
dtype
(type[ValueT]
) –The dtype of the data values.
-
shape
(DataShape
) –The shape of the result.
-
issues
(Mapping[str, NDArray[bool_]]
) –The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.
Examples:
The simplest way to use an InspectResult
is to print it!
adrio
instance-attribute
A reference to the ADRIO which produced this result.
source
instance-attribute
The data as fetched from the source. This can be useful for debugging data issues.
May be None
if the source data isn't suitable for inclusion with the result,
for example because it is too large or in an awkward format.
issues
instance-attribute
The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.
values
cached
property
The values in the result. If the result is date/value tuples, the values are first extracted.
quantify
property
Quantifies properties of the data: what percentage of the values are impacted by each data issue (if any), how many are zero, and how many are "unmasked" (that is, not affected by any issues). The value is a sequence of tuples which contain the name of the quality and the percentage of values.
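To make the shape of that output concrete, here is a small sketch of how such a summary could be computed from issue masks. `quantify_sketch` is a hypothetical function, not the property's implementation. It assumes percentages are expressed as fractions of all values and that zeros are counted only among unmasked values; neither detail is specified in this section. The issue names are also made up.

```python
import numpy as np

values = np.array([0, 5, 7, 0, 3, 9, 2, 1])
issues = {
    "missing": np.array([False, True, False, False, False, False, False, False]),
    "redacted": np.array([False, False, False, False, True, True, False, False]),
}


def quantify_sketch(values, issues):
    """Return (name, fraction) pairs for each issue, zeros, and unmasked values."""
    total = values.size
    any_issue = np.zeros(values.shape, dtype=bool)
    rows = []
    for name, mask in issues.items():
        rows.append((name, float(mask.sum() / total)))
        any_issue |= mask
    unmasked = ~any_issue
    # Assumption: zeros are only counted where no issue applies.
    rows.append(("zero", float(((values == 0) & unmasked).sum() / total)))
    rows.append(("unmasked", float(unmasked.sum() / total)))
    return rows


summary = quantify_sketch(values, issues)
print(summary)
```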
ADRIO
Bases: SimulationFunction[NDArray[ResultT]]
, Generic[ResultT, ValueT]
ADRIOs (or Abstract Data Resource Interface Objects) are functions which are intended to load data from external sources for epymorph simulations. This may be from web APIs, local files or databases, or anything imaginable.
ADRIO is an abstract base class. It is generic in both the form of the result
(ResultT
) and the type of the values in the result (ValueT
). Both represent
numpy dtypes.
When the ADRIO's result is simple, like a numpy array of 64-bit
integers, both ResultT
and ValueT
will be the same -- np.int64
. If the result
is a structured type, however, like with numpy arrays containing date/value tuples,
ResultT
will reflect the "outer" structured type and ValueT
will reflect the type
of the "inner" data values. As a common example, a date/value array with 64-bit
integer values will have ResultT
equal to
[("date", np.datetime64), ("value", np.int64)]
and ValueT
equal to np.int64
.
(This complexity is necessary to work around weaknesses in Python's type system.)
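The distinction between the "outer" and "inner" types can be seen with numpy alone. In this sketch the day-level unit (`datetime64[D]`) and the sample values are assumptions chosen for illustration.

```python
import numpy as np

# The "outer" structured type, corresponding to ResultT in the text above.
date_value = np.dtype([("date", "datetime64[D]"), ("value", np.int64)])

result = np.array(
    [
        (np.datetime64("2021-01-01"), 12),
        (np.datetime64("2021-01-08"), 30),
    ],
    dtype=date_value,
)

# The "inner" values have the value type, corresponding to ValueT.
values = result["value"]
print(result.dtype.names)
print(values.dtype, values)
```

Indexing by field name is one way to extract the inner values from a date/value array.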
Implementation Notes
Implement this class by overriding result_format
to describe the expected results,
validate_context
to check the provided context (happens prior to loading data),
and inspect
to implement the data loading logic. Do not override evaluate
unless
you need to change the base behavior. Override estimate_data
if it's possible to
estimate data usage ahead of time.
When evaluating an ADRIO, call evaluate
or inspect
.
See Also
You may prefer to extend epymorph.adrio.adrio.FetchADRIO, which provides more scaffolding for ADRIOs that fetch data from external sources like web APIs.
result_format
abstractmethod
property
result_format: ResultFormat
Information about the expected format of the ADRIO's resulting data.
validate_context
abstractmethod
validate_context(context: Context) -> None
Validates the context before ADRIO evaluation.
Parameters:
-
context
(Context
) –The context to validate.
Raises:
-
ADRIOContextError
–If this ADRIO cannot be evaluated in the given context.
validate_result
Validates that the result of evaluating the ADRIO adheres to the expected result format.
Parameters:
-
context
(Context
) –The context in which the result has been evaluated.
-
result
(NDArray[ResultT]
) –The result produced by the ADRIO.
Raises:
-
ADRIOProcessingError
–If the result is invalid, indicating the processing logic has a bug.
evaluate
inspect
abstractmethod
inspect() -> InspectResult[ResultT, ValueT]
Produce an inspection of the ADRIO's data for the current context.
When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.
NOTE: if you are implementing this method, make sure to call validate_context
first and validate_result
last.
Returns:
-
InspectResult[ResultT, ValueT]
–The data inspection results for the ADRIO's current context.
estimate_data
estimate_data() -> DataEstimate
Estimate the data usage for this ADRIO in the current context.
Returns:
-
DataEstimate
–The estimated data usage for this ADRIO's current context. If a reasonable estimate cannot be made, returns
EmptyDataEstimate
.
FetchADRIO
A specialization of ADRIO
that adds structure for ADRIOs that load data from
an external source, such as a web API.
Implementation Notes
FetchADRIO
provides an implementation of inspect
, and requires that you
implement methods _fetch
and _process
instead.
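The division of labor described here resembles a template-method pattern, sketched below without epymorph. `FetchLikeSketch` and `CountsSketch` are hypothetical stand-ins. The real signatures of `_fetch` and `_process` are not given in this section, so the arguments and return types shown are assumptions.

```python
from abc import ABC, abstractmethod


class FetchLikeSketch(ABC):
    """Hypothetical stand-in for the pattern; not epymorph's FetchADRIO."""

    def inspect(self) -> dict:
        # The base class owns the overall flow: fetch, then process.
        source = self._fetch()
        result = self._process(source)
        return {"source": source, "result": result}

    @abstractmethod
    def _fetch(self) -> list[dict]:
        """Load raw records from the external source."""

    @abstractmethod
    def _process(self, source: list[dict]) -> list[int]:
        """Turn raw records into result values."""


class CountsSketch(FetchLikeSketch):
    def _fetch(self) -> list[dict]:
        # A real implementation would call a web API here.
        return [{"node": "04", "count": "7"}, {"node": "35", "count": "3"}]

    def _process(self, source: list[dict]) -> list[int]:
        return [int(row["count"]) for row in source]


inspection = CountsSketch().inspect()
print(inspection["result"])
```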
inspect
inspect() -> InspectResult[ResultT, ValueT]
Produce an inspection of the ADRIO's data for the current context.
Returns:
-
InspectResult[ResultT, ValueT]
–The data inspection results for the ADRIO's current context.
NodeID
An ADRIO that provides the node IDs as they exist in the geo scope.
result_format
property
result_format: ResultFormat
Information about the expected format of the ADRIO's resulting data.
validate_context
validate_context(context: Context) -> None
Validates the context before ADRIO evaluation.
Parameters:
-
context
(Context
) –The context to validate.
Raises:
-
ADRIOContextError
–If this ADRIO cannot be evaluated in the given context.
inspect
inspect() -> InspectResult[str_, str_]
Produce an inspection of the ADRIO's data for the current context.
When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.
NOTE: if you are implementing this method, make sure to call validate_context
first and validate_result
last.
Returns:
-
InspectResult[ResultT, ValueT]
–The data inspection results for the ADRIO's current context.
Scale
Bases: ADRIO[float64, float64]
Scales the result of another ADRIO by multiplying values by the given factor.
Parameters:
-
parent
(ADRIO[float64, float64]
) –The ADRIO whose results will be scaled.
-
factor
(float
) –The factor to multiply all resulting ADRIO values by.
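In numpy terms, the scaling amounts to an element-wise multiplication. This sketch uses a made-up parent result and factor (converting fractions to percentages) and does not call epymorph.

```python
import numpy as np

# Hypothetical result of a parent ADRIO, expressed as fractions.
parent_result = np.array([0.25, 0.5, 0.125], dtype=np.float64)
factor = 100.0

# Every value is multiplied by the same factor.
scaled = parent_result * factor
print(scaled)
```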
result_format
property
result_format: ResultFormat
Information about the expected format of the ADRIO's resulting data.
validate_context
validate_context(context: Context) -> None
Validates the context before ADRIO evaluation.
Parameters:
-
context
(Context
) –The context to validate.
Raises:
-
ADRIOContextError
–If this ADRIO cannot be evaluated in the given context.
inspect
inspect() -> InspectResult[float64, float64]
Produce an inspection of the ADRIO's data for the current context.
When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.
NOTE: if you are implementing this method, make sure to call validate_context
first and validate_result
last.
Returns:
-
InspectResult[ResultT, ValueT]
–The data inspection results for the ADRIO's current context.
PopulationPerKM2
Bases: ADRIO[float64, float64]
Calculates population density by combining the values from data attributes for population and land area.
This ADRIO requires two data attributes:
- "population": the population of the node
- "land_area_km2": the land area of the node in square kilometers
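The calculation itself is an element-wise division of the two attributes. The sketch below uses made-up values for three nodes and plain numpy. How the ADRIO handles edge cases such as zero land area is not covered in this section.

```python
import numpy as np

# Hypothetical attribute values for three nodes.
population = np.array([1000.0, 250.0, 40.0])
land_area_km2 = np.array([10.0, 50.0, 0.5])

# People per square kilometer, node by node.
density = population / land_area_km2
print(density)
```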
LAND_AREA_KM2
class-attribute
instance-attribute
LAND_AREA_KM2 = AttributeDef('land_area_km2', float, N)
requirements
class-attribute
instance-attribute
requirements = (POPULATION, LAND_AREA_KM2)
The attribute definitions describing the data requirements for this function.
For advanced use-cases, you may specify requirements as a property if you need it to be dynamically computed.
result_format
property
result_format: ResultFormat
Information about the expected format of the ADRIO's resulting data.
validate_context
validate_context(context: Context) -> None
Validates the context before ADRIO evaluation.
Parameters:
-
context
(Context
) –The context to validate.
Raises:
-
ADRIOContextError
–If this ADRIO cannot be evaluated in the given context.
inspect
inspect() -> InspectResult[float64, float64]
Produce an inspection of the ADRIO's data for the current context.
When implementing an ADRIO, override this method to provide data fetching and processing logic. Use self methods and properties to access the simulation context or defer processing to another function.
NOTE: if you are implementing this method, make sure to call validate_context
first and validate_result
last.
Returns:
-
InspectResult[ResultT, ValueT]
–The data inspection results for the ADRIO's current context.
adrio_cache
adrio_validate_pipe
adrio_validate_pipe(
adrio: ADRIO,
context: Context,
result: NDArray[ResultT],
*validators: Validator,
) -> None
Applies a sequence of validator functions to the result of an ADRIO, using that ADRIO's context and raising an appropriate error if the result is invalid.
Parameters:
-
adrio
(ADRIO
) –The ADRIO instance.
-
context
(Context
) –The current simulation context.
-
result
(NDArray[ResultT]
) –The ADRIO result array.
-
*validators
(Validator
, default:()
) –The sequence of validation checks to apply.
Raises:
-
ADRIOProcessingError
–If the result is invalid.
validate_time_frame
Validates that the context time frame is within the specified DateRange
.
Parameters:
-
adrio
(ADRIO
) –The ADRIO instance doing the validation.
-
context
(Context
) –The evaluation context.
-
time_range
(DateRange
) –The valid range of dates.
Raises:
-
ADRIOContextError
–If the context time frame is not valid.
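A containment check of this kind can be sketched with the standard library. The names below are hypothetical, `ContextErrorSketch` stands in for `ADRIOContextError`, and the sketch reduces both the time frame and the `DateRange` to a pair of inclusive start and end dates, which is a simplifying assumption.

```python
from datetime import date


class ContextErrorSketch(Exception):
    """Hypothetical stand-in for ADRIOContextError."""


def check_time_frame(
    start: date, end: date, valid_start: date, valid_end: date
) -> None:
    """Raise if [start, end] is not fully inside [valid_start, valid_end]."""
    if start < valid_start or end > valid_end:
        raise ContextErrorSketch(
            f"time frame {start} to {end} is outside {valid_start} to {valid_end}"
        )


# Fully inside the valid range: no error.
check_time_frame(
    date(2020, 3, 1), date(2020, 3, 31), date(2020, 1, 1), date(2020, 12, 31)
)

# Starts before the valid range: rejected.
try:
    check_time_frame(
        date(2019, 12, 25), date(2020, 1, 5), date(2020, 1, 1), date(2020, 12, 31)
    )
    rejected = False
except ContextErrorSketch:
    rejected = True
print(rejected)
```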