InspectResult

adrio.adrio.InspectResult(self, adrio, source, result, dtype, shape, issues)

Inspection is the process by which an ADRIO fetches data and analyzes its quality. The simplest way to use an InspectResult is to print it!

The result encapsulates the source data, the processed result data, and any outstanding data issues. ADRIOs will provide methods for correcting these issues as is appropriate for the task, but often these will be optional. A result which contains unresolved data issues will be represented as a masked numpy array. Values which are not impacted by any of the data issues will be unmasked. Individual issues are tracked along with masks specific to the issue.

For example: if data is not available for every geo node requested, some values will be represented as missing. Missing values will be masked in the result, and an issue will be included (likely called “missing”) with a boolean mask indicating the missing values. The ADRIO will likely provide a fill method option which lets users fill missing values, for instance with zeros. Providing a fill method and inspecting the ADRIO a second time should resolve the “missing” issue and, assuming no other issues remain, produce a non-masked numpy array as a result.
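
The relationship between issue masks and the masked result can be sketched with plain NumPy. This is an illustration only, not epymorph's implementation: the `-1` sentinel, the array values, and the variable names are all assumptions made for the example.

```python
import numpy as np

# Hypothetical raw values for 2 time points x 3 geo nodes.
# The -1 sentinel standing in for "no data" is an assumption of this sketch.
raw = np.array([[10, -1, 30], [40, 50, -1]], dtype=np.int64)

# One boolean mask per issue, each with the same shape as the result.
issues = {"missing": raw == -1}

# A value is masked in the result if any issue affects it.
combined = np.logical_or.reduce(list(issues.values()))
result = np.ma.masked_array(raw, mask=combined)

# Resolving the issue (here by filling with zeros) leaves nothing to mask,
# so the result can be an ordinary, non-masked array.
filled = np.where(issues["missing"], 0, raw)

print(result)
print(filled)
```

With several issues present, the combined mask is the element-wise OR of the individual masks, which is why a value counts as unmasked only when no issue touches it.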

InspectResult is a frozen dataclass, and is generic on the result and value type (ResultT and ValueT) of the ADRIO.

Parameters

adrio: ADRIO[ResultT, ValueT]

A reference to the ADRIO which produced this result.

source: pd.DataFrame | NDArray

The data as fetched from the source. This can be useful for debugging data issues.

result: NDArray[ResultT]

The final result produced by the ADRIO.

dtype: type[ValueT]

The dtype of the data values.

shape: DataShape

The shape of the result.

issues: Mapping[str, NDArray[np.bool_]]

The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.

Examples

from epymorph.adrio import cdc
from epymorph.kit import *

result = (
    cdc.COVIDFacilityHospitalization()
    .with_context(
        scope=CountyScope.in_states(["AZ"], year=2019),
        time_frame=TimeFrame.rangex("2021-01-01", "2021-02-01"),
    )
    .inspect()
)

print(result)
ADRIO inspection for epymorph.adrio.cdc.COVIDFacilityHospitalization:
  Result shape: AxN (5, 15); dtype: date/value (int64); size: 75
  Date range: 2021-01-03 to 2021-01-31, period: 7 days
  Values:
    histogram: 11 █▅▂▂▂▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁ 3065
    quartiles: 54.8, 203.5, 593.0 (IQR: 538.2)
    std dev: 487.8
    percent zero: 0.0%
    percent adult_redacted: 5.3%
    percent adult_missing: 6.7%
    percent pediatric_redacted: 26.7%
    percent pediatric_missing: 6.7%
    percent unmasked: 64.0%

Attributes

adrio: ADRIO[ResultT, ValueT]

A reference to the ADRIO which produced this result.

dtype: type[ValueT]

The dtype of the data values.

issues: Mapping[str, NDArray[np.bool_]]

The set of issues in the data along with a mask which indicates which values are impacted by the issue. The keys of this mapping are specific to the ADRIO, as ADRIOs tend to deal with unique data challenges.

quantify: Sequence[tuple[str, float]]

Quantifies properties of the data: the percentage of values impacted by each data issue (if any), the percentage that are zero, and the percentage that are “unmasked” (that is, not affected by any issue). Returns a sequence of tuples, each pairing the name of the quality with the percentage of values.
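
A rough sketch of how such a summary could be computed from a values array and its issue masks follows. The helper `quantify_sketch` is hypothetical and is not the library's code; in particular, reporting fractions between 0 and 1 and counting zeros over all values are assumptions of this sketch.

```python
import numpy as np

def quantify_sketch(values, issues):
    """Hypothetical helper: summarize data quality as (name, fraction) pairs."""
    total = values.size
    # Share of values equal to zero (counted over all values: an assumption).
    report = [("zero", float(np.sum(values == 0)) / total)]
    any_issue = np.zeros(values.shape, dtype=bool)
    for name, mask in issues.items():
        # Share of values affected by this particular issue.
        report.append((name, float(mask.sum()) / total))
        any_issue |= mask
    # Share of values not affected by any issue.
    report.append(("unmasked", float((~any_issue).sum()) / total))
    return report

example_values = np.array([0, 5, 7, 9])
example_issues = {"missing": np.array([False, True, False, False])}
summary = quantify_sketch(example_values, example_issues)
print(summary)
```

Because a single value can be affected by more than one issue, the per-issue shares and the unmasked share need not sum to one.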

result: NDArray[ResultT]

The final result produced by the ADRIO.

shape: DataShape

The shape of the result.

source: pd.DataFrame | NDArray

The data as fetched from the source. This can be useful for debugging data issues.

values: NDArray[ValueT]

The values in the result. If the result is date/value tuples, the values are first extracted.
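
To illustrate the extraction step, the sketch below assumes that date/value tuples are stored as a NumPy structured array with fields named `date` and `value`. The field names and sample numbers are assumptions of this example, not confirmed details of the library.

```python
import numpy as np

# Assumed layout for date/value results: a structured dtype with two fields.
date_value = np.dtype([("date", "datetime64[D]"), ("value", np.int64)])

# Hypothetical result: 2 dates x 2 geo nodes.
d1 = np.datetime64("2021-01-03")
d2 = np.datetime64("2021-01-10")
result = np.array(
    [[(d1, 11), (d1, 25)],
     [(d2, 14), (d2, 30)]],
    dtype=date_value,
)

# Selecting the "value" field yields a plain numeric array of the same shape.
values = result["value"]
print(values.shape, values.dtype)
```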