epymorph.data_usage

Data usage estimation and reporting.

DataEstimate module-attribute

A DataEstimate is either empty (EmptyDataEstimate) or non-empty (AvailableDataEstimate).

EmptyDataEstimate dataclass

EmptyDataEstimate(name: str)

When an entity is not capable of making a data usage estimate, it returns EmptyDataEstimate as a placeholder.

Parameters:

  • name (str) –

    The name of the entity that provided the estimate.

name instance-attribute

name: str

The name of the entity that provided the estimate.

AvailableDataEstimate dataclass

AvailableDataEstimate(
    name: str,
    cache_key: str,
    new_network_bytes: int,
    max_bandwidth: int | None,
    new_cache_bytes: int,
    total_cache_bytes: int,
)

An estimate for the data usage of a data fetch operation.

Operations may download data and may use disk caching, so it is useful to estimate ahead of time how much data to expect. Because these are estimates, accuracy is not guaranteed.

For example, an ADRIO which fetches data from a third-party source may be able to estimate ahead of time how much data needs to be downloaded and stored.

Parameters:

  • name (str) –

    The name of the entity that provided the estimate.

  • cache_key (str) –

    A unique identifier for the data this estimate is about.

  • new_network_bytes (int) –

    How much new data (in bytes) will need to be downloaded.

  • max_bandwidth (int | None) –

    A source-specific limit on download bandwidth in bytes per second.

  • new_cache_bytes (int) –

    How much new data (in bytes) will be written to disk cache.

  • total_cache_bytes (int) –

    The total data (in bytes) that will be in the cache after fetch. This includes newly-cached and previously-cached files.

name instance-attribute

name: str

The name of the entity that provided the estimate.

cache_key instance-attribute

cache_key: str

A unique identifier for the data this estimate is about.

Multiple entities may load the same set of data. Each would report the same estimate, but the actual data usage only happens for the first one to load; the rest would find and return the cached data. The cache key identifies this case: if two estimates share the same key, the estimate should only be counted once. Cache keys are only comparable within a single simulation context, so they do not need to perfectly distinguish between different scopes or time frames.
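
As an illustration of counting shared data only once, the following minimal sketch deduplicates estimates by cache key before summing. The `Estimate` class is a trimmed, hypothetical stand-in (not the library's class), and the names, keys, and sizes are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Trimmed, hypothetical stand-in: only the fields this sketch needs."""

    name: str
    cache_key: str
    new_network_bytes: int


estimates = [
    Estimate("population", "census:2020", 5_000_000),
    # Same cache key as above: this entity would load the same data.
    Estimate("population_by_age", "census:2020", 5_000_000),
    Estimate("commuters", "commuting:2020", 2_000_000),
]

# Keep only the first estimate seen for each cache key.
seen_keys = set()
unique = []
for e in estimates:
    if e.cache_key not in seen_keys:
        seen_keys.add(e.cache_key)
        unique.append(e)

total_bytes = sum(e.new_network_bytes for e in unique)
print(total_bytes)  # 7000000, not 12000000
```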

new_network_bytes instance-attribute

new_network_bytes: int

How much new data (in bytes) will need to be downloaded.

max_bandwidth instance-attribute

max_bandwidth: int | None

A source-specific limit on download bandwidth in bytes per second, for data sources that impose known limits on download speed.

new_cache_bytes instance-attribute

new_cache_bytes: int

How much new data (in bytes) will be written to disk cache.

total_cache_bytes instance-attribute

total_cache_bytes: int

The total data (in bytes) that will be in the cache after fetch. This includes newly-cached and previously-cached files.
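
To make the relationship between the byte fields concrete, here is a small sketch that builds one estimate. It uses a local stand-in dataclass that mirrors the documented fields rather than importing the library, and the entity name, cache key, and sizes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Local stand-in mirroring the fields documented above."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


# Hypothetical scenario: 30 MB of files are already cached, and another
# 12 MB still has to be downloaded and written to the cache.
estimate = AvailableDataEstimate(
    name="Example ADRIO",
    cache_key="example-source:2020",
    new_network_bytes=12_000_000,
    max_bandwidth=None,  # no known source-specific limit
    new_cache_bytes=12_000_000,
    total_cache_bytes=42_000_000,
)

# total_cache_bytes covers newly-cached plus previously-cached files,
# so the difference is what was already on disk.
previously_cached = estimate.total_cache_bytes - estimate.new_cache_bytes
print(previously_cached)  # 30000000
```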

CanEstimateData

Bases: Protocol

A checkable protocol which indicates entities that can produce data estimates.

estimate_data abstractmethod

estimate_data() -> DataEstimate

Estimate the data usage for this entity.

If a reasonable estimate cannot be made, return an EmptyDataEstimate.

Returns:

  • DataEstimate

    The estimate, or an EmptyDataEstimate if a reasonable estimate cannot be made.
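
The sketch below shows how a checkable protocol of this shape could be satisfied and tested with `isinstance`. Both `EmptyDataEstimate` and `CanEstimateData` are local stand-ins written for this example (the stand-in return type is simplified), and `LocalFileSource` and `NoEstimates` are hypothetical entities.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class EmptyDataEstimate:
    """Local stand-in for the documented placeholder estimate."""

    name: str


@runtime_checkable
class CanEstimateData(Protocol):
    """Local stand-in for the documented protocol.

    The documented method returns a DataEstimate; this stand-in
    simplifies that to EmptyDataEstimate.
    """

    def estimate_data(self) -> EmptyDataEstimate: ...


class LocalFileSource:
    """Hypothetical entity that cannot predict its data usage."""

    def estimate_data(self) -> EmptyDataEstimate:
        # No reasonable estimate is possible, so return the placeholder.
        return EmptyDataEstimate(name="LocalFileSource")


class NoEstimates:
    """Hypothetical entity with no estimate_data method."""


source = LocalFileSource()
if isinstance(source, CanEstimateData):
    print(source.estimate_data())

print(isinstance(NoEstimates(), CanEstimateData))  # False
```

A runtime `isinstance` check against a protocol only verifies that the method exists, not its signature or return type.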

DataEstimateTotal dataclass

DataEstimateTotal(
    new_network_bytes: int,
    new_cache_bytes: int,
    total_cache_bytes: int,
    download_time: float,
)

The computed total of one or more estimates.

Parameters:

  • new_network_bytes (int) –

    How much new data (in bytes) will need to be downloaded.

  • new_cache_bytes (int) –

    How much new data (in bytes) will be written to disk cache.

  • total_cache_bytes (int) –

    The total data (in bytes) that will be in the cache after fetch.

  • download_time (float) –

    The estimated time (in seconds) to download all new data.

new_network_bytes instance-attribute

new_network_bytes: int

How much new data (in bytes) will need to be downloaded.

new_cache_bytes instance-attribute

new_cache_bytes: int

How much new data (in bytes) will be written to disk cache.

total_cache_bytes instance-attribute

total_cache_bytes: int

The total data (in bytes) that will be in the cache after fetch.

download_time instance-attribute

download_time: float

The estimated time (in seconds) to download all new data.

estimate_total

estimate_total(
    estimates: Sequence[DataEstimate], max_bandwidth: int
) -> DataEstimateTotal

Compute the total of a set of data estimates.

A download time estimate is also provided, taking into account the assumed bandwidth limit (max_bandwidth) as well as any source-specific bandwidth limits.

Parameters:

  • estimates (Sequence[DataEstimate]) –

    The estimates to combine.

  • max_bandwidth (int) –

    The assumed maximum download bandwidth, in bytes per second.

Returns:

  • DataEstimateTotal

    The computed total.
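
One plausible way to combine estimates is sketched below; it is not taken from the library's source. The function `sketch_total` and all three dataclasses are local stand-ins. The sketch assumes that empty estimates are skipped, that estimates sharing a cache key are counted once, that downloads happen one after another, and that each download runs at the lower of the assumed bandwidth and any source-specific limit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyDataEstimate:
    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


@dataclass(frozen=True)
class DataEstimateTotal:
    new_network_bytes: int
    new_cache_bytes: int
    total_cache_bytes: int
    download_time: float


def sketch_total(estimates, max_bandwidth: int) -> DataEstimateTotal:
    """Hypothetical totaling logic under the assumptions stated above."""
    new_net = new_cache = total_cache = 0
    seconds = 0.0
    seen_keys = set()
    for e in estimates:
        if not isinstance(e, AvailableDataEstimate):
            continue  # empty estimates contribute nothing
        if e.cache_key in seen_keys:
            continue  # same data as an earlier estimate: count once
        seen_keys.add(e.cache_key)
        new_net += e.new_network_bytes
        new_cache += e.new_cache_bytes
        total_cache += e.total_cache_bytes
        # Use the tighter of the assumed and source-specific limits.
        if e.max_bandwidth is None:
            bandwidth = max_bandwidth
        else:
            bandwidth = min(max_bandwidth, e.max_bandwidth)
        seconds += e.new_network_bytes / bandwidth
    return DataEstimateTotal(new_net, new_cache, total_cache, seconds)


total = sketch_total(
    [
        AvailableDataEstimate("a1", "a", 10_000_000, None, 10_000_000, 10_000_000),
        AvailableDataEstimate("a2", "a", 10_000_000, None, 10_000_000, 10_000_000),
        AvailableDataEstimate("b", "b", 4_000_000, 1_000_000, 4_000_000, 6_000_000),
        EmptyDataEstimate("unknown"),
    ],
    max_bandwidth=5_000_000,
)
print(total.download_time)  # 6.0 (2 s at 5 MB/s plus 4 s at 1 MB/s)
```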

estimate_report

estimate_report(
    cache_path: Path,
    estimates: Sequence[DataEstimate],
    max_bandwidth: int,
) -> list[str]

Generate a report from the given set of data estimates.

The report itemizes how much data each estimate expects to download and how much new data will be written to cache. It then totals those amounts, reports how long the downloads are expected to take, and states whether there is enough available disk space.

Parameters:

  • cache_path (Path) –

    The path of epymorph's cache folder.

  • estimates (Sequence[DataEstimate]) –

    The data estimates.

  • max_bandwidth (int) –

    The assumed maximum download bandwidth, in bytes per second.

Returns:

  • list[str]

    The report, as a list of lines.
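
For a rough picture of a report with this shape, the sketch below builds itemized lines, a total, and a disk-space line. The function `sketch_report`, its parameters, and the line wording are all hypothetical and do not reproduce the library's output; reading free space with `shutil.disk_usage` is likewise an assumption about how such a check could be done.

```python
import shutil
import tempfile
from pathlib import Path


def sketch_report(items, new_cache_bytes, free_bytes):
    """Hypothetical report builder.

    items: (name, bytes_to_download) pairs, one per estimate.
    new_cache_bytes: total new data to be written to cache.
    free_bytes: free disk space where the cache folder lives.
    """
    lines = [f"- {name}: {size / 1e6:.1f} MB to download" for name, size in items]
    total = sum(size for _, size in items)
    lines.append(f"Total download: {total / 1e6:.1f} MB")
    if new_cache_bytes <= free_bytes:
        lines.append("Enough disk space is available.")
    else:
        lines.append("WARNING: not enough disk space is available.")
    return lines


# Deterministic example with an explicit amount of free space.
report = sketch_report(
    [("population", 5_000_000), ("commuters", 2_500_000)],
    new_cache_bytes=7_500_000,
    free_bytes=1_000_000_000,
)
for line in report:
    print(line)

# In practice the free space could be read from the cache folder's disk;
# the temp directory stands in for the cache path here.
cache_path = Path(tempfile.gettempdir())
free_now = shutil.disk_usage(cache_path).free
```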