epymorph.data_usage
EmptyDataEstimate
dataclass
EmptyDataEstimate(name: str)
An empty data estimate, returned when the provided data does not support calculating the data usage of a data fetch operation.
AvailableDataEstimate
dataclass
AvailableDataEstimate(
name: str,
cache_key: str,
new_network_bytes: int,
max_bandwidth: int | None,
new_cache_bytes: int,
total_cache_bytes: int,
)
An estimate for the data usage of a data fetch operation.
Operations may download data and may use disk caching, so it is useful to estimate ahead of time how much data to expect. A concrete example of such an operation is an ADRIO fetching data from a third-party source during the preparation of a RUME. NOTE: all values are estimated and their accuracy may vary.
cache_key
instance-attribute
cache_key: str
Multiple operations may load the same set of data; even though each would report the same estimate for missing data, only the first one to load would actually incur that cost. The others would then find the data already cached. This key makes it possible to detect this case: if two estimates are produced with the same key, it can be assumed that the estimate should only be counted once. Cache keys are only comparable within a single simulation context, so they do not need to perfectly distinguish between different scopes or time frames.
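As an illustration of the counting rule described for cache_key, the following is a minimal sketch, not epymorph's own code. The `Estimate` class is a local stand-in that only mirrors the documented field names, and the helper `unique_by_cache_key` and the example key strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimate:
    """Local stand-in mirroring the documented AvailableDataEstimate fields."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: Optional[int]
    new_cache_bytes: int
    total_cache_bytes: int


def unique_by_cache_key(estimates):
    """Keep only the first estimate seen for each cache key (hypothetical helper)."""
    first_seen = {}
    for e in estimates:
        # setdefault leaves the existing entry alone if the key was already seen
        first_seen.setdefault(e.cache_key, e)
    return list(first_seen.values())


# Two estimates share a key (made-up key strings), so their cost counts once.
estimates = [
    Estimate("population", "acs5:2020:county", 1_000_000, None, 1_000_000, 1_000_000),
    Estimate("median_income", "acs5:2020:county", 1_000_000, None, 1_000_000, 1_000_000),
    Estimate("commuters", "lodes:2020", 5_000_000, None, 5_000_000, 5_000_000),
]
unique = unique_by_cache_key(estimates)
total_download = sum(e.new_network_bytes for e in unique)
```

Keeping the first occurrence matches the description that only the first loader incurs the cost while later ones find cached data.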
new_network_bytes
instance-attribute
new_network_bytes: int
How much new data (in bytes) will need to be downloaded.
max_bandwidth
instance-attribute
max_bandwidth: int | None
A source-specific limit on download bandwidth (in bytes per second). (Some sources may impose known limits on downloads.)
new_cache_bytes
instance-attribute
new_cache_bytes: int
How much new data (in bytes) will be written to disk cache.
total_cache_bytes
instance-attribute
total_cache_bytes: int
The total data (in bytes) that will be in the cache after fetch. This includes new cached files and previously cached files.
CanEstimateData
Bases: Protocol
estimate_data
abstractmethod
estimate_data() -> DataEstimate
Estimate the data usage for this entity. If a reasonable estimate cannot be made, return EmptyDataEstimate.
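A sketch of what implementing this protocol might look like. The classes below are local stand-ins that mirror the documented signatures so the example runs on its own; treating `DataEstimate` as the union of the two estimate dataclasses is an assumption, and `FixedSizeSource` with its parameters and cache key format is entirely hypothetical.

```python
from abc import abstractmethod
from dataclasses import dataclass
from typing import Optional, Protocol, Union


@dataclass(frozen=True)
class EmptyDataEstimate:
    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: Optional[int]
    new_cache_bytes: int
    total_cache_bytes: int


# Assumption: DataEstimate is the union of the two estimate types.
DataEstimate = Union[EmptyDataEstimate, AvailableDataEstimate]


class CanEstimateData(Protocol):
    @abstractmethod
    def estimate_data(self) -> DataEstimate: ...


class FixedSizeSource:
    """Hypothetical source that downloads one file of possibly unknown size."""

    def __init__(self, name: str, file_bytes: Optional[int], already_cached: bool):
        self.name = name
        self.file_bytes = file_bytes
        self.already_cached = already_cached

    def estimate_data(self) -> DataEstimate:
        if self.file_bytes is None:
            # No reasonable estimate is possible: return the empty estimate.
            return EmptyDataEstimate(name=self.name)
        # A cached file costs nothing new; otherwise the whole file is new.
        new_bytes = 0 if self.already_cached else self.file_bytes
        return AvailableDataEstimate(
            name=self.name,
            cache_key=f"fixed:{self.name}",
            new_network_bytes=new_bytes,
            max_bandwidth=None,
            new_cache_bytes=new_bytes,
            total_cache_bytes=self.file_bytes,
        )


unknown = FixedSizeSource("mystery", None, already_cached=False).estimate_data()
cached = FixedSizeSource("census", 2_000_000, already_cached=True).estimate_data()
fresh = FixedSizeSource("census", 2_000_000, already_cached=False).estimate_data()
```

Because `CanEstimateData` is a `Protocol`, `FixedSizeSource` satisfies it structurally for a type checker without inheriting from it.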
DataEstimateTotal
dataclass
DataEstimateTotal(
new_network_bytes: int,
new_cache_bytes: int,
total_cache_bytes: int,
download_time: float,
)
new_network_bytes
instance-attribute
new_network_bytes: int
How much new data (in bytes) will need to be downloaded.
new_cache_bytes
instance-attribute
new_cache_bytes: int
How much new data (in bytes) will be written to disk cache.
total_cache_bytes
instance-attribute
total_cache_bytes: int
The total data (in bytes) that will be in the cache after fetch.
download_time
instance-attribute
download_time: float
The estimated time (in seconds) to download all new data.
estimate_total
estimate_total(
estimates: Sequence[DataEstimate], max_bandwidth: int
) -> DataEstimateTotal
Combines a number of individual data estimates into a total.
The total includes an estimated download time that accounts for the assumed bandwidth limit (max_bandwidth) as well as any source-specific bandwidth limits.
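To make the combination concrete, here is one plausible way such a total could be computed. This is a sketch under stated assumptions, not the library's implementation: it assumes empty estimates are skipped, estimates sharing a cache_key are counted once, and each download runs at the lower of the assumed bandwidth and the source's own limit. The function name `sketch_total` and the stand-in dataclasses are defined locally for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmptyDataEstimate:
    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: Optional[int]
    new_cache_bytes: int
    total_cache_bytes: int


@dataclass(frozen=True)
class DataEstimateTotal:
    new_network_bytes: int
    new_cache_bytes: int
    total_cache_bytes: int
    download_time: float


def sketch_total(estimates, max_bandwidth):
    """Hypothetical combination logic; bandwidths are in bytes per second."""
    seen_keys = set()
    net = cache = total = 0
    seconds = 0.0
    for e in estimates:
        # Assumption: skip empty estimates and repeated cache keys.
        if not isinstance(e, AvailableDataEstimate) or e.cache_key in seen_keys:
            continue
        seen_keys.add(e.cache_key)
        net += e.new_network_bytes
        cache += e.new_cache_bytes
        total += e.total_cache_bytes
        # Assumption: the effective rate is the stricter of the two limits.
        if e.max_bandwidth is None:
            rate = max_bandwidth
        else:
            rate = min(max_bandwidth, e.max_bandwidth)
        seconds += e.new_network_bytes / rate
    return DataEstimateTotal(net, cache, total, seconds)


estimates = [
    AvailableDataEstimate("a", "k1", 10_000_000, None, 10_000_000, 10_000_000),
    AvailableDataEstimate("b", "k2", 2_000_000, 500_000, 2_000_000, 3_000_000),
    AvailableDataEstimate("a_again", "k1", 10_000_000, None, 10_000_000, 10_000_000),
    EmptyDataEstimate("unknown"),
]
result = sketch_total(estimates, max_bandwidth=1_000_000)
```

In this example the first source takes 10 seconds at the assumed 1,000,000 bytes per second, and the second takes 4 seconds because its own 500,000 bytes per second limit is stricter.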
estimate_report
estimate_report(
cache_path: Path,
estimates: Sequence[DataEstimate],
max_bandwidth: int,
) -> list[str]
Generate a report from the given set of data estimates.
The report itemizes how much data will be downloaded and how much new data will be written to cache, then totals these amounts and reports how long the download will take and whether there is enough available disk space.
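The shape of such a report can be sketched as follows. This is an illustrative approximation only: the line wording, the simplified `(name, new_network_bytes, new_cache_bytes)` tuples, the `free_bytes` override, and the function name `sketch_report` are all assumptions made for the example, and the actual output of estimate_report may differ. The free-space check uses the standard library's `shutil.disk_usage`.

```python
import shutil
from pathlib import Path


def sketch_report(cache_path, items, max_bandwidth, free_bytes=None):
    """Build report lines from (name, new_network_bytes, new_cache_bytes) tuples.

    Hypothetical helper: free_bytes can be passed explicitly to make the
    disk-space check deterministic; otherwise it is read from the file system.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(cache_path).free

    lines = []
    # Itemized section: one line per data source.
    for name, net, cache in items:
        lines.append(
            f"- {name}: download {net / 1e6:.1f} MB, cache {cache / 1e6:.1f} MB"
        )

    # Totals section.
    total_net = sum(net for _, net, _ in items)
    total_cache = sum(cache for _, _, cache in items)
    lines.append(
        f"Total: download {total_net / 1e6:.1f} MB, cache {total_cache / 1e6:.1f} MB"
    )
    lines.append(f"Estimated download time: {total_net / max_bandwidth:.0f} s")

    # Disk-space check against the new cache bytes only.
    verdict = "sufficient" if total_cache <= free_bytes else "insufficient"
    lines.append(f"Disk space at {cache_path}: {verdict}")
    return lines


items = [("population", 1_500_000, 1_500_000), ("commuters", 500_000, 500_000)]
report = sketch_report(Path("."), items, max_bandwidth=1_000_000, free_bytes=1_000_000)
```

Returning a list of strings, as the documented signature does, leaves it to the caller to decide whether to print the lines, log them, or join them into one message.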