epymorph.data_usage

DataEstimate module-attribute

EmptyDataEstimate dataclass

EmptyDataEstimate(name: str)

An empty data estimate, used when the provided data does not support calculating the data usage of a data fetch operation.

name instance-attribute

name: str

The name of the given ADRIO.

AvailableDataEstimate dataclass

AvailableDataEstimate(
    name: str,
    cache_key: str,
    new_network_bytes: int,
    max_bandwidth: int | None,
    new_cache_bytes: int,
    total_cache_bytes: int,
)

An estimate for the data usage of a data fetch operation.

Operations may download data and may utilize disk caching, so we would like to be able to estimate ahead of time how much data to expect. A concrete example of such an operation is an ADRIO fetching data from a third-party source during the preparation of a RUME. NOTE: all values are estimated and their accuracy may vary.

name instance-attribute

name: str

The name of whatever is responsible for loading this data.

cache_key instance-attribute

cache_key: str

A key identifying the set of data to be loaded. Multiple things may in fact load the same set of data; even though each would report the same estimate for missing data, only the first one to load would really incur that cost. The others would then find the cached data waiting. This key makes it possible to detect that case: if two estimates are produced with the same key, the estimate should be counted only once. Cache keys are only comparable within a single simulation context, so they do not need to perfectly distinguish between different scopes or time frames.
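The counting rule for shared cache keys can be illustrated with a minimal sketch. The dataclass below is a local stand-in that mirrors the documented fields rather than the epymorph class itself, and `dedupe_by_cache_key` is a hypothetical helper, not part of the module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Local stand-in mirroring the documented fields (not the epymorph class)."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


def dedupe_by_cache_key(estimates):
    """Keep only the first estimate seen for each cache key (hypothetical helper)."""
    seen = set()
    unique = []
    for estimate in estimates:
        if estimate.cache_key not in seen:
            seen.add(estimate.cache_key)
            unique.append(estimate)
    return unique


# Two loaders share the key "k1", so that download is counted once.
estimates = [
    AvailableDataEstimate("A", "k1", 100, None, 100, 100),
    AvailableDataEstimate("B", "k1", 100, None, 100, 100),
    AvailableDataEstimate("C", "k2", 50, None, 50, 50),
]
unique = dedupe_by_cache_key(estimates)
new_bytes = sum(e.new_network_bytes for e in unique)
```

Keeping the first occurrence matches the description above: the first loader pays the download cost and later loaders find the data already cached.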

new_network_bytes instance-attribute

new_network_bytes: int

How much new data (in bytes) will need to be downloaded.

max_bandwidth instance-attribute

max_bandwidth: int | None

A source-specific limit on download bandwidth (in bytes per second). (Some sources may impose known limits on downloads.)

new_cache_bytes instance-attribute

new_cache_bytes: int

How much new data (in bytes) will be written to disk cache.

total_cache_bytes instance-attribute

total_cache_bytes: int

The total data (in bytes) that will be in the cache after fetch. This includes new cached files and previously cached files.

CanEstimateData

Bases: Protocol

estimate_data abstractmethod

estimate_data() -> DataEstimate

Estimate the data usage for this entity. If a reasonable estimate cannot be made, return an EmptyDataEstimate.
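As an illustration of how a class might satisfy this protocol, the sketch below uses local stand-ins for the documented types. It assumes DataEstimate is a union of the two estimate dataclasses, which this page does not state; `HypotheticalFileSource`, its constructor arguments, and its cache key format are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class EmptyDataEstimate:
    """Local stand-in mirroring the documented fields."""

    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Local stand-in mirroring the documented fields."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


# Assumption: DataEstimate is a union of the two estimate types.
DataEstimate = Union[EmptyDataEstimate, AvailableDataEstimate]


class CanEstimateData(Protocol):
    def estimate_data(self) -> DataEstimate: ...


class HypotheticalFileSource:
    """A made-up loader for a single remote file of possibly unknown size."""

    def __init__(self, name: str, file_size_bytes: int | None, cached: bool):
        self.name = name
        self.file_size_bytes = file_size_bytes
        self.cached = cached

    def estimate_data(self) -> DataEstimate:
        if self.file_size_bytes is None:
            # No reasonable estimate can be made without a known size.
            return EmptyDataEstimate(name=self.name)
        # Already-cached files need no new download and no new cache space.
        new_bytes = 0 if self.cached else self.file_size_bytes
        return AvailableDataEstimate(
            name=self.name,
            cache_key=f"file:{self.name}",
            new_network_bytes=new_bytes,
            max_bandwidth=None,
            new_cache_bytes=new_bytes,
            total_cache_bytes=self.file_size_bytes,
        )


unknown = HypotheticalFileSource("unknown", None, cached=False).estimate_data()
fresh = HypotheticalFileSource("fresh", 2048, cached=False).estimate_data()
cached = HypotheticalFileSource("cached", 2048, cached=True).estimate_data()
```

Because the protocol is structural, `HypotheticalFileSource` does not need to inherit from `CanEstimateData`; providing a matching `estimate_data` method is enough for a type checker.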

DataEstimateTotal dataclass

DataEstimateTotal(
    new_network_bytes: int,
    new_cache_bytes: int,
    total_cache_bytes: int,
    download_time: float,
)

new_network_bytes instance-attribute

new_network_bytes: int

How much new data (in bytes) will need to be downloaded.

new_cache_bytes instance-attribute

new_cache_bytes: int

How much new data (in bytes) will be written to disk cache.

total_cache_bytes instance-attribute

total_cache_bytes: int

The total data (in bytes) that will be in the cache after fetch.

download_time instance-attribute

download_time: float

The estimated time (in seconds) to download all new data.

estimate_total

estimate_total(
    estimates: Sequence[DataEstimate], max_bandwidth: int
) -> DataEstimateTotal

Combines a number of individual data estimates into a total.

The total includes an estimated download time, computed using the assumed bandwidth limit (max_bandwidth) as well as any source-specific bandwidth limits.
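One plausible way to combine estimates is sketched below. This is not the library's implementation: it assumes that empty estimates are skipped, that estimates sharing a cache key are counted once, that downloads run sequentially, and that each download proceeds at the lower of the assumed limit and the source-specific limit. The dataclasses are local stand-ins, and `sketch_total` is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyDataEstimate:
    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


@dataclass(frozen=True)
class DataEstimateTotal:
    new_network_bytes: int
    new_cache_bytes: int
    total_cache_bytes: int
    download_time: float


def sketch_total(estimates, max_bandwidth: int) -> DataEstimateTotal:
    """Sum estimates under the assumptions stated above (hypothetical helper)."""
    seen = set()
    net = cache = total = 0
    seconds = 0.0
    for e in estimates:
        # Skip empty estimates and anything already counted under the same key.
        if not isinstance(e, AvailableDataEstimate) or e.cache_key in seen:
            continue
        seen.add(e.cache_key)
        net += e.new_network_bytes
        cache += e.new_cache_bytes
        total += e.total_cache_bytes
        # The effective rate is the tighter of the two bandwidth limits.
        if e.max_bandwidth is None:
            rate = max_bandwidth
        else:
            rate = min(max_bandwidth, e.max_bandwidth)
        seconds += e.new_network_bytes / rate
    return DataEstimateTotal(net, cache, total, seconds)


estimates = [
    AvailableDataEstimate("a", "k1", 1_000_000, None, 1_000_000, 1_000_000),
    AvailableDataEstimate("b", "k1", 1_000_000, None, 1_000_000, 1_000_000),
    AvailableDataEstimate("c", "k2", 500_000, 250_000, 500_000, 800_000),
    EmptyDataEstimate("d"),
]
result = sketch_total(estimates, max_bandwidth=1_000_000)
```

In this example the second estimate is ignored as a duplicate of "k1", and the third is throttled to its source limit of 250,000 bytes per second, so the sketch reports 1,500,000 new bytes and a download time of 3.0 seconds.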

estimate_report

estimate_report(
    cache_path: Path,
    estimates: Sequence[DataEstimate],
    max_bandwidth: int,
) -> list[str]

Generate a report from the given set of data estimates.

The report itemizes how much data will be downloaded and how much new data will be written to cache, then totals these amounts and states how long the download is expected to take and whether enough disk space is available.
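The general shape of such a report can be sketched as follows. The line format, the `format_bytes` helper, and the `sketch_report` function are all hypothetical and do not reproduce the module's actual output; the only real library call is `shutil.disk_usage`, used here as one way to check free space at the cache path.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def format_bytes(n: int) -> str:
    """Render a byte count with 1024-based units (hypothetical formatting)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n)
    i = 0
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return f"{size:.1f} {units[i]}"


def sketch_report(cache_path: Path, items, free_bytes: int | None = None) -> list[str]:
    """Build report lines from (name, new_network_bytes, new_cache_bytes) tuples.

    If free_bytes is not given, free space is read from the file system.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(cache_path).free
    lines = [
        f"- {name}: download {format_bytes(net)}, new cache {format_bytes(cache)}"
        for name, net, cache in items
    ]
    total_cache = sum(cache for _, _, cache in items)
    lines.append(f"Total new cache: {format_bytes(total_cache)}")
    if total_cache > free_bytes:
        lines.append("WARNING: not enough free disk space")
    else:
        lines.append("Enough free disk space")
    return lines


items = [("population", 2048, 2048), ("commuters", 512, 0)]
tight = sketch_report(Path("."), items, free_bytes=1024)
roomy = sketch_report(Path("."), items, free_bytes=10**9)
```

Passing `free_bytes` explicitly keeps the example deterministic; omitting it falls back to the free space reported for `cache_path`.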