epymorph.tools.data

General tools for processing epymorph data.

RumeT `module-attribute`

RumeT = TypeVar('RumeT', bound=RUME)

A type of RUME.
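As a rough illustration of why a bounded type variable is useful, a function annotated with one returns the same subclass it was given rather than the base type. The names `Base`, `Child`, `BaseT`, and `passthrough` below are hypothetical stand-ins and are not part of epymorph.

```python
from typing import TypeVar


class Base:
    """Hypothetical stand-in for RUME."""


class Child(Base):
    """Hypothetical subclass standing in for a specific kind of RUME."""

    def extra(self) -> str:
        return "only on Child"


# Same shape as RumeT: any type that is Base or a subclass of it.
BaseT = TypeVar("BaseT", bound=Base)


def passthrough(value: BaseT) -> BaseT:
    # A type checker infers the return type from the argument,
    # so passing a Child yields a Child, not a plain Base.
    return value


child = passthrough(Child())
print(child.extra())
```

This is the property that lets a function such as memoize_rume declare that it returns the same kind of RUME it received.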

Output

Bases: Protocol

A generic simulation result.

rume `instance-attribute`

rume: RUME

The RUME used in the simulation that generated this output.

dataframe `abstractmethod` `property`

dataframe: DataFrame

The simulation results as a DataFrame.
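Because Output is a Protocol, any object with a matching attribute and property should satisfy it structurally, without inheriting from it. The sketch below shows that idea with hypothetical stand-ins only: `ResultLike`, `source`, `table`, and `InMemoryResult` are invented for illustration, and a list of dicts stands in for a DataFrame.

```python
from typing import Protocol


class ResultLike(Protocol):
    """Hypothetical protocol shaped like Output: one attribute, one property."""

    source: str  # stands in for `rume`

    @property
    def table(self) -> list[dict[str, int]]:  # stands in for `dataframe`
        ...


class InMemoryResult:
    """Satisfies ResultLike structurally; it does not inherit from it."""

    def __init__(self, source: str, rows: list[dict[str, int]]) -> None:
        self.source = source
        self._rows = rows

    @property
    def table(self) -> list[dict[str, int]]:
        return list(self._rows)


def row_count(result: ResultLike) -> int:
    # Accepts any object exposing `source` and `table`.
    return len(result.table)


result = InMemoryResult("demo", [{"time": 0, "geo": 0, "S": 100}])
print(row_count(result))
```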

munge

munge(
    output: Output,
    geo: GeoSelection | GeoAggregation,
    time: TimeSelection | TimeAggregation,
    quantity: QuantitySelection | QuantityAggregation,
) -> DataFrame

Apply select/group/aggregate operations to an output dataframe.

This function powers many of epymorph's more specialized output-processing tools. It is exposed as a general utility so that the same logic can be reused in other cases.

Parameters:

output (Output) –

The result data to process.

geo (GeoSelection | GeoAggregation) –

The geo-axis strategy.

time (TimeSelection | TimeAggregation) –

The time-axis strategy.

quantity (QuantitySelection | QuantityAggregation) –

The quantity-axis strategy.

Returns:

DataFrame –

The munged result.

The result is a DataFrame with the columns "time" and "geo", plus one column per selected quantity. The values in "time" and "geo" come from the chosen aggregation for those axes. If no grouping or aggregation is specified, "time" contains the simulation ticks and "geo" contains the node IDs.

Raises:

ValueError –

If the axis strategies don't refer to the same RUME used to produce the Output. Generally it's safest to create the axis strategies using the methods from an output's RUME object.
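To make the select/group/aggregate idea and the shape of the result concrete, here is a toy sketch in plain Python. It is not epymorph's implementation: `toy_munge` and its parameters are hypothetical, the rows and the quantity names "S" and "I" are made up, and plain sets, callables, and lists stand in for the real axis strategy objects.

```python
from collections import defaultdict

# Toy rows in the shape described above: "time", "geo", then one column
# per quantity. The values and the column names "S" and "I" are made up.
rows = [
    {"time": 0, "geo": "A", "S": 90, "I": 10},
    {"time": 0, "geo": "B", "S": 80, "I": 20},
    {"time": 1, "geo": "A", "S": 85, "I": 15},
    {"time": 1, "geo": "B", "S": 70, "I": 30},
]


def toy_munge(rows, geo_select, time_group, quantities):
    """Select nodes, group ticks, and sum the chosen quantities.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(lambda: dict.fromkeys(quantities, 0))
    for row in rows:
        if row["geo"] not in geo_select:  # select on the geo axis
            continue
        key = (time_group(row["time"]), row["geo"])  # group on the time axis
        for q in quantities:  # select on the quantity axis
            totals[key][q] += row[q]  # aggregate by summing
    return [
        {"time": t, "geo": g, **values}
        for (t, g), values in sorted(totals.items())
    ]


# Keep node "A" only, collapse ticks 0 and 1 into one group, keep "I".
result = toy_munge(
    rows,
    geo_select={"A"},
    time_group=lambda tick: tick // 2,
    quantities=["I"],
)
print(result)
```

Here the "time" value of the single output row is the group key rather than a raw tick, which mirrors how the values in "time" and "geo" depend on the strategy chosen for each axis.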

memoize_rume

memoize_rume(
    path: str | Path,
    rume: RumeT,
    *,
    refresh: bool = False,
    rng: Generator | None = None,
) -> RumeT

Cache a RUME's parameter data using a local file.

If the file doesn't exist, the RUME's parameters are evaluated and saved. If the file does exist (and we're not forcing a refresh), values are loaded from the file and the RUME's parameters are overridden to use those values.

This is intended as a utility for working with RUMEs that use data from ADRIOs that may be expensive to fetch repeatedly.

For example, when working interactively in a notebook, you may re-run code cells many times, which could mean spending a long time re-fetching data or incurring API usage costs. This function helps you avoid those costs.

Parameters:

path (str | Path) –

The path to the cache file.

rume (RumeT) –

The RUME to use.

refresh (bool, default: False ) –

If True, any existing cache file is ignored: parameters are evaluated and the cache file is overwritten.

rng (Generator | None, default: None ) –

The random number generator instance to use. If None, NumPy's default RNG is used.

Returns:

RumeT –

A clone of the RUME with all parameters replaced by the fully-evaluated numpy data arrays.

Warning

This is not a full serialization of the RUME. If you change the RUME configuration, do not reuse the cache file. Passing refresh=True is an easy way to make this function ignore any previously saved file.
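The load-or-evaluate behavior described above, including the effect of refresh, can be sketched with the standard library alone. This is a minimal illustration of the caching pattern, not epymorph's code: `cached_params` and `expensive_fetch` are hypothetical, and JSON is only a stand-in for whatever file format the real function uses.

```python
import json
import tempfile
from pathlib import Path


def cached_params(path, evaluate, *, refresh=False):
    """Load values from `path` if present; otherwise evaluate and save.

    Hypothetical helper; JSON stands in for the real cache format.
    """
    path = Path(path)
    if path.exists() and not refresh:
        return json.loads(path.read_text())
    values = evaluate()
    path.write_text(json.dumps(values))
    return values


calls = []


def expensive_fetch():
    # Stands in for an ADRIO request that is slow or costs money.
    calls.append(1)
    return {"population": [100, 200, 300]}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "params.json"
    first = cached_params(cache_file, expensive_fetch)  # evaluates and saves
    second = cached_params(cache_file, expensive_fetch)  # loads from the file
    third = cached_params(cache_file, expensive_fetch, refresh=True)  # re-evaluates

print(len(calls))
```

The fetch runs twice, not three times: the second call is served from the file. Note that the sketch, like the warning above, has no way to detect that the inputs changed. A stale file is returned as-is unless refresh is set.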