epymorph.data_usage
Data usage estimation and reporting.
DataEstimate
module-attribute
DataEstimate = EmptyDataEstimate | AvailableDataEstimate
DataEstimates can be either empty or non-empty.
EmptyDataEstimate
dataclass
EmptyDataEstimate(name: str)
AvailableDataEstimate
dataclass
AvailableDataEstimate(
name: str,
cache_key: str,
new_network_bytes: int,
max_bandwidth: int | None,
new_cache_bytes: int,
total_cache_bytes: int,
)
An estimate for the data usage of a data fetch operation.
Operations may download data and may utilize disk caching, so we would like to be able to estimate ahead of time how much data to expect. Because these are estimates, accuracy is not guaranteed.
For example, an ADRIO which fetches data from a third-party source may be able to estimate ahead of time how much data needs to be downloaded and stored.
Parameters:
- name (str) – The name of the entity that provided the estimate.
- cache_key (str) – A unique identifier for the data this estimate is about.
- new_network_bytes (int) – How much new data (in bytes) will need to be downloaded.
- max_bandwidth (int | None) – A source-specific limit on download bandwidth in bytes per second.
- new_cache_bytes (int) – How much new data (in bytes) will be written to disk cache.
- total_cache_bytes (int) – The total data (in bytes) that will be in the cache after fetch. This includes newly-cached and previously-cached files.
cache_key
instance-attribute
cache_key: str
A unique identifier for the data this estimate is about.
Multiple entities may load the same set of data; although both would report the same estimate, the actual data usage only happens for the first one to load. The rest would find and return the cached data. This key is used to distinguish this case -- if two estimates share the same key, we can assume the estimate should only be counted once. Cache keys are only comparable within a single simulation context, so we don't need to perfectly distinguish between different scopes or time frames.
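The counting rule described above can be illustrated with a short sketch. The dataclass below is only a stand-in that mirrors the documented fields so the example runs without epymorph installed, and `dedupe_by_cache_key` is a hypothetical helper, not part of the library. The cache key strings are also made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Stand-in mirroring the documented fields; not epymorph's own class."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


def dedupe_by_cache_key(estimates):
    """Keep only the first estimate seen for each cache key (hypothetical helper)."""
    first_seen = {}
    for e in estimates:
        # setdefault leaves an existing entry untouched, so the first one wins.
        first_seen.setdefault(e.cache_key, e)
    return list(first_seen.values())


# Two entities report on the same data set; a third reports on different data.
a = AvailableDataEstimate("ADRIO A", "census:2020", 1_000_000, None, 1_000_000, 1_000_000)
b = AvailableDataEstimate("ADRIO B", "census:2020", 1_000_000, None, 1_000_000, 1_000_000)
c = AvailableDataEstimate("ADRIO C", "weather:2020", 500_000, 250_000, 500_000, 500_000)

unique = dedupe_by_cache_key([a, b, c])
total_download = sum(e.new_network_bytes for e in unique)
print([e.name for e in unique], total_download)
```

Because "ADRIO A" and "ADRIO B" share a key, only one of them contributes to the download total in this sketch.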
new_network_bytes
instance-attribute
new_network_bytes: int
How much new data (in bytes) will need to be downloaded.
max_bandwidth
instance-attribute
max_bandwidth: int | None
A source-specific limit on download bandwidth in bytes per second. (In case data sources impose known limits on download speed.)
new_cache_bytes
instance-attribute
new_cache_bytes: int
How much new data (in bytes) will be written to disk cache.
total_cache_bytes
instance-attribute
total_cache_bytes: int
The total data (in bytes) that will be in the cache after fetch. This includes newly-cached and previously-cached files.
CanEstimateData
Bases: Protocol
A checkable protocol which indicates entities that can produce data estimates.
estimate_data
abstractmethod
estimate_data() -> DataEstimate
Estimate the data usage for this entity.
If a reasonable estimate cannot be made, return an EmptyDataEstimate.
Returns:
- DataEstimate – The data estimate.
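As a rough sketch of how an entity might satisfy this protocol, the example below defines stand-in versions of the documented types so it runs on its own. It assumes that "checkable" means the protocol is decorated with `typing.runtime_checkable`. The two source classes, their constructor arguments, and the cache key string are hypothetical.

```python
from __future__ import annotations

from abc import abstractmethod
from dataclasses import dataclass
from typing import Protocol, Union, runtime_checkable


@dataclass(frozen=True)
class EmptyDataEstimate:
    """Stand-in for the documented class."""

    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Stand-in for the documented class."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


DataEstimate = Union[EmptyDataEstimate, AvailableDataEstimate]


@runtime_checkable  # assumption: this is what makes the protocol "checkable"
class CanEstimateData(Protocol):
    @abstractmethod
    def estimate_data(self) -> DataEstimate: ...


class UnknownSizeSource:
    """Hypothetical entity that cannot predict its download size."""

    def estimate_data(self) -> DataEstimate:
        return EmptyDataEstimate(name="UnknownSizeSource")


class FixedSizeSource:
    """Hypothetical entity that fetches one file of known size."""

    def __init__(self, size_bytes: int, already_cached: bool) -> None:
        self.size_bytes = size_bytes
        self.already_cached = already_cached

    def estimate_data(self) -> DataEstimate:
        # Nothing new is downloaded or written if the file is already cached.
        new_bytes = 0 if self.already_cached else self.size_bytes
        return AvailableDataEstimate(
            name="FixedSizeSource",
            cache_key="fixed-size-file",
            new_network_bytes=new_bytes,
            max_bandwidth=None,
            new_cache_bytes=new_bytes,
            total_cache_bytes=self.size_bytes,
        )


sources = [UnknownSizeSource(), FixedSizeSource(2_000, already_cached=True)]
estimates = [s.estimate_data() for s in sources if isinstance(s, CanEstimateData)]
print(estimates)
```

Structural typing means neither source class needs to inherit from `CanEstimateData`; having a matching `estimate_data` method is enough for the `isinstance` check in this sketch.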
DataEstimateTotal
dataclass
DataEstimateTotal(
new_network_bytes: int,
new_cache_bytes: int,
total_cache_bytes: int,
download_time: float,
)
The computed total of one or more estimates.
Parameters:
- new_network_bytes (int) – How much new data (in bytes) will need to be downloaded.
- new_cache_bytes (int) – How much new data (in bytes) will be written to disk cache.
- total_cache_bytes (int) – The total data (in bytes) that will be in the cache after fetch.
- download_time (float) – The estimated time (in seconds) to download all new data.
new_network_bytes
instance-attribute
new_network_bytes: int
How much new data (in bytes) will need to be downloaded.
new_cache_bytes
instance-attribute
new_cache_bytes: int
How much new data (in bytes) will be written to disk cache.
total_cache_bytes
instance-attribute
total_cache_bytes: int
The total data (in bytes) that will be in the cache after fetch.
download_time
instance-attribute
download_time: float
The estimated time (in seconds) to download all new data.
estimate_total
estimate_total(
estimates: Sequence[DataEstimate], max_bandwidth: int
) -> DataEstimateTotal
Compute the total of a set of data estimates.
A download time estimate is also provided, taking into account the assumed bandwidth limit (max_bandwidth) as well as any source-specific bandwidth limits.
Parameters:
- estimates (Sequence[DataEstimate]) – The estimates to combine.
- max_bandwidth (int) – The assumed maximum download bandwidth, in bytes per second.
Returns:
- DataEstimateTotal – The estimate total.
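One plausible way to combine estimates is sketched below. This is not epymorph's implementation: `sketch_total` is a hypothetical function over stand-in dataclasses, and it assumes that empty estimates are skipped, that estimates sharing a cache key are counted once, and that downloads run one after another at the lower of the assumed bandwidth and the source-specific limit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmptyDataEstimate:
    """Stand-in for the documented class."""

    name: str


@dataclass(frozen=True)
class AvailableDataEstimate:
    """Stand-in for the documented class."""

    name: str
    cache_key: str
    new_network_bytes: int
    max_bandwidth: int | None
    new_cache_bytes: int
    total_cache_bytes: int


@dataclass(frozen=True)
class DataEstimateTotal:
    """Stand-in for the documented class."""

    new_network_bytes: int
    new_cache_bytes: int
    total_cache_bytes: int
    download_time: float


def sketch_total(estimates, max_bandwidth: int) -> DataEstimateTotal:
    """Hypothetical totaling routine; the assumptions are described above."""
    seen_keys = set()
    net = cache = total = 0
    seconds = 0.0
    for e in estimates:
        if isinstance(e, EmptyDataEstimate) or e.cache_key in seen_keys:
            continue  # nothing to add, or this data was already counted
        seen_keys.add(e.cache_key)
        net += e.new_network_bytes
        cache += e.new_cache_bytes
        total += e.total_cache_bytes
        # The effective speed is capped by the slower of the two limits.
        if e.max_bandwidth is None:
            speed = max_bandwidth
        else:
            speed = min(max_bandwidth, e.max_bandwidth)
        seconds += e.new_network_bytes / speed
    return DataEstimateTotal(net, cache, total, seconds)


estimates = [
    AvailableDataEstimate("A", "key-1", 1_000_000, None, 1_000_000, 1_000_000),
    AvailableDataEstimate("B", "key-1", 1_000_000, None, 1_000_000, 1_000_000),
    AvailableDataEstimate("C", "key-2", 500_000, 250_000, 500_000, 500_000),
    EmptyDataEstimate("D"),
]
result = sketch_total(estimates, max_bandwidth=1_000_000)
print(result)
```

In this example, the first data set takes about one second at the assumed 1,000,000 bytes per second, while the second is throttled to 250,000 bytes per second and takes about two seconds.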
estimate_report
estimate_report(
cache_path: Path,
estimates: Sequence[DataEstimate],
max_bandwidth: int,
) -> list[str]
Generate a report from the given set of data estimates.
The report itemizes how much data will be downloaded and how much new data will be written to cache, then totals these amounts and reports the estimated download time and whether enough disk space is available.
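The disk space check could be approximated with the standard library, as in the sketch below. Both helper functions and the wording of the report line are hypothetical; the source does not say how epymorph performs this check or formats its output.

```python
import shutil
import tempfile
from pathlib import Path


def has_enough_disk_space(cache_path: Path, new_cache_bytes: int) -> bool:
    """Compare the bytes to be written against free space on the cache's volume."""
    free_bytes = shutil.disk_usage(cache_path).free
    return new_cache_bytes <= free_bytes


def disk_space_line(cache_path: Path, new_cache_bytes: int) -> str:
    """Build one illustrative report line (format is made up for this sketch)."""
    if has_enough_disk_space(cache_path, new_cache_bytes):
        verdict = "enough"
    else:
        verdict = "NOT enough"
    return f"New cache data: {new_cache_bytes:,} bytes ({verdict} free disk space)"


# Use the system temp directory as a stand-in for the cache folder.
cache_dir = Path(tempfile.gettempdir())
print(disk_space_line(cache_dir, 5_000_000))
```

`shutil.disk_usage` reports usage for the volume containing the given path, so the path must already exist when the check runs.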
Parameters:
- cache_path (Path) – The path of epymorph's cache folder.
- estimates (Sequence[DataEstimate]) – The data estimates.
- max_bandwidth (int) – The assumed maximum download bandwidth, in bytes per second.
Returns: