The Output Tools

epymorph provides three output tools so you can quickly inspect a simulation output: line plots, choropleth maps, and data tables. To the extent possible, each of these tools uses similar interface concepts, primarily the axis strategies (select/group/aggregate) introduced in the previous chapter.

In this chapter we’ll dive deeper into the features of our three output tools.

First, let’s describe the design principles underlying these tools so you can best decide how they fit into your workflow. In balancing simplicity against flexibility and power, we have prioritized “easy to learn” and “quick to use”, even if that somewhat limits “what you can do”. epymorph provides the customization options we believe are most commonly useful, but avoids esoteric or overly complex ones. For users who require maximal power and customization, the ultimate option is to work with the output data directly using tools like matplotlib, which offer many more features at the expense of requiring more know-how and more lines of code to get results. epymorph’s tools do include some middle-ground features, however, that still leverage epymorph’s logic to do the heavy lifting while giving you more freedom in how to render the results. We’ll introduce those too.

Let’s start with a basic SIRS simulation, which we’ll use for all examples:

from epymorph.kit import *
from epymorph.adrio import acs5, us_tiger

# Our example RUME:
# - an SIRS simulation,
# - with movement based on distance,
# - in Arizona and New Mexico counties,
# - for 6 months in 2020.
rume = SingleStrataRUME.build(
    ipm=ipm.SIRS(),
    mm=mm.Centroids(),
    init=init.SingleLocation(0, 100),
    scope=CountyScope.in_states(["AZ", "NM"], year=2020),
    time_frame=TimeFrame.rangex("2020-01-01", "2020-07-01"),
    params={
        "ipm::beta": 0.4,
        "ipm::gamma": 1 / 5,
        "ipm::xi": 1 / 90,
        "mm::phi": 40.0,
        "centroid": us_tiger.InternalPoint(),
        "population": acs5.Population(),
    },
)

# We'll be inspecting this simulation output:
with sim_messaging(live=False):
    sim = BasicSimulator(rume)
    out = sim.run()

Loading gpm:all::mm::population (epymorph.adrio.acs5.Population):
  |####################| 100%  (1.074s)
Loading gpm:all::mm::centroid (epymorph.adrio.us_tiger.InternalPoint):
  |####################| 100%  (0.780s)
Running simulation (BasicSimulator):
• 2020-01-01 to 2020-06-30 (182 days)
• 48 geo nodes
  |####################| 100% 
Runtime: 3.559s

Line plots

We’ve seen basic examples of line plot usage, but you can do some customization of the plot as well. (See also: the API documentation.)

Basic customization

We can add some cosmetic touches (see code comments):

out.plot.line(
    # Start with the three axis strategies, as usual:
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.compartments("S", "I"),
    
    # Render the legend to the right of the plot:
    legend="outside",

    # Add a title:
    title="Susceptible vs Infected Population in Arizona and New Mexico",

    # Customize line styling:
    line_kwargs=[
        {"color": "red", "linestyle": "solid"},
        {"color": "red", "linestyle": "dashed"},
        {"color": "blue", "linestyle": "solid"},
        {"color": "blue", "linestyle": "dashed"},
    ],
)

legend and title are well-documented, but line_kwargs could use some additional explanation. Since we use matplotlib behind the scenes to render plots, we wanted to surface some of its interface without over-complicating things, and line_kwargs is one such feature. If you were writing the matplotlib code yourself, you might call ax.plot(...) to render each line. Well, basically, so do we! Just about any option you can pass to ax.plot(...) can be provided using our line_kwargs mechanism.

If you know the order in which the lines will be drawn (which is the order they show up in the legend), then for each line you can specify a dictionary containing additional keyword arguments. (Above, I’ve drawn Arizona in red and New Mexico in blue, while drawing the Susceptible populations as solid lines and the Infected populations as dashed lines.) I knew I had four lines total here, so I provided a list of four dicts. But you can also give fewer dicts than you have lines – epymorph will simply cycle back to the start of the list if it has more lines to draw.

Of course you can get very clever with this if you’re familiar with Python list comprehensions. The following syntax would be equivalent to the above with less repetition:

line_kwargs=[
    {"color": c, "linestyle": s}
    for c in ["red", "blue"]
    for s in ["solid", "dashed"]
]
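The two claims above (that the comprehension matches the explicit list, and that a shorter list is reused from the start) can be checked without running a simulation. This is a minimal pure-Python sketch: the use of itertools.cycle only illustrates the cycling behavior described and is not necessarily how epymorph implements it, and the line labels are assumed from the default label format.

```python
from itertools import cycle

# The explicit list of four style dicts from the example above.
explicit = [
    {"color": "red", "linestyle": "solid"},
    {"color": "red", "linestyle": "dashed"},
    {"color": "blue", "linestyle": "solid"},
    {"color": "blue", "linestyle": "dashed"},
]

# The comprehension: the first loop (color) varies slowest,
# the second loop (linestyle) varies fastest.
generated = [
    {"color": c, "linestyle": s}
    for c in ["red", "blue"]
    for s in ["solid", "dashed"]
]
print(generated == explicit)  # True

# Hypothetical illustration of cycling: two dicts shared among four lines.
# These labels are assumed, not taken from epymorph output.
line_labels = ["AZ: S", "AZ: I", "NM: S", "NM: I"]
two_styles = [{"linestyle": "solid"}, {"linestyle": "dashed"}]
assigned = dict(zip(line_labels, cycle(two_styles)))
for label, kwargs in assigned.items():
    print(label, kwargs)
```

With only two dicts, the third and fourth lines reuse the first and second styles, so every S line is solid and every I line is dashed.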

Aside from line styling, you can also change the order in which lines are drawn using the ordering parameter: sort by “location” (the default) or by “quantity”.

out.plot.line(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.compartments(),
    ordering="quantity",
)
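As a rough mental model of the two orderings, the sketch below lists (location, quantity) pairs both ways in plain Python. It assumes that “location” means all quantities for one location are drawn before moving to the next location, and that “quantity” means the reverse; the exact tie-breaking epymorph uses is not shown in this chapter.

```python
from itertools import product

geos = ["AZ", "NM"]
quantities = ["S", "I", "R"]

# Assumed "location" ordering: every quantity for AZ, then every quantity for NM.
by_location = [(n, q) for n, q in product(geos, quantities)]

# Assumed "quantity" ordering: every location for S, then for I, then for R.
by_quantity = [(n, q) for q, n in product(quantities, geos)]

print(by_location)
print(by_quantity)
```

The order matters when you supply line_kwargs, since styles are matched to lines by position.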

You can also change the formatting of the line labels using label_format. Format strings may use {n} for the geo node label and {q} for the quantity label. (The default format string is "{n}: {q}".)

out.plot.line(
    geo=rume.scope.select.by_state("AZ").group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.compartments(),
    label_format="compartment {q}",
)
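To preview what a format string will produce, you can fill in the placeholders yourself. This sketch assumes the placeholders behave like Python's str.format fields, which is how they read in the examples here; the node and quantity labels are illustrative.

```python
default_format = "{n}: {q}"
custom_format = "compartment {q}"

# Assumption: {n} and {q} are filled in the way str.format fills named fields.
quantities = ["S", "I", "R"]
labels_default = [default_format.format(n="AZ", q=q) for q in quantities]
labels_custom = [custom_format.format(n="AZ", q=q) for q in quantities]

print(labels_default)  # ['AZ: S', 'AZ: I', 'AZ: R']
print(labels_custom)   # ['compartment S', 'compartment I', 'compartment R']
```

Dropping {n} from the format, as in the example above, makes sense when only one location is plotted.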

Finally, you can change the formatting of the time axis using time_format: “auto”, “date”, or “day”. “date” refers to the calendar date, while “day” refers to the indexed simulation day: 0 is the first day of the simulation, 1 is the second day, and so on.

out.plot.line(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.compartments(),
    time_format="date",
)
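The relationship between the two formats is simple date arithmetic. The following sketch uses only the standard library and the start date of our example time frame; day_to_date is a hypothetical helper, not part of epymorph.

```python
from datetime import date, timedelta

# Start of the example time frame (182 days, per the simulation log above).
start = date(2020, 1, 1)
num_days = 182


def day_to_date(day_index):
    """Convert a zero-based simulation day to its calendar date."""
    return start + timedelta(days=day_index)


print(day_to_date(0))             # 2020-01-01
print(day_to_date(1))             # 2020-01-02
print(day_to_date(num_days - 1))  # 2020-06-30
```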

Note: the behavior of the default time format (“auto”) can change depending on whether or not you have applied a grouping strategy to the time axis. Further, the value of time_format may be ignored if it isn’t valid for your grouping. For example, if you group your results by epi week¹, the “date” and “day” formats are no longer applicable, because epi week is its own format!

out.plot.line(
    geo=rume.scope.select.all().group("state").sum(),
    time=(
        rume.time_frame.select
        .rangex("2020-02-01", "2020-04-01")
        .group("epiweek").agg()
    ),
    quantity=rume.ipm.select.events("S->I"),
    title="New infections per epi week",
    label_format="{n}",
    time_format="day", # <-- IGNORED!
)
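If you are curious which epi weeks a date range covers, the sketch below computes them using the CDC's MMWR convention as commonly stated: weeks begin on Sunday, and week 1 is the first week with at least four days in the calendar year. Both function names are hypothetical, and epymorph's own grouping logic may be implemented differently.

```python
from datetime import date, timedelta


def first_epi_week_start(year):
    """Sunday that begins epi week 1 of the given year."""
    jan1 = date(year, 1, 1)
    # Days elapsed since the most recent Sunday (Sunday -> 0 ... Saturday -> 6).
    days_since_sunday = (jan1.weekday() + 1) % 7
    sunday = jan1 - timedelta(days=days_since_sunday)
    # Week 1 needs at least four days in the new year,
    # so Jan 1 must fall on Sunday through Wednesday.
    if days_since_sunday <= 3:
        return sunday
    return sunday + timedelta(days=7)


def epi_week(d):
    """Return (epi year, epi week number) for a calendar date."""
    year = d.year
    start = first_epi_week_start(year)
    if d < start:
        # Early January dates can belong to the previous epi year.
        year -= 1
        start = first_epi_week_start(year)
    elif d >= first_epi_week_start(year + 1):
        # Late December dates can belong to the next epi year.
        year += 1
        start = first_epi_week_start(year)
    return year, (d - start).days // 7 + 1


print(epi_week(date(2020, 2, 1)))   # (2020, 5)
print(epi_week(date(2020, 3, 31)))  # (2020, 14)
```

Note that Feb 1, 2020 is the last day of its epi week, so a selection that starts mid-week can produce a partial first or last group.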

Advanced customization

transform

You may want to apply some sort of transformation to the data before you plot it. The transform parameter allows you to provide an arbitrary function that will be run on each line’s data after the axis strategies are applied but before it’s plotted.

For example, to plot the y-axis on the log scale:

from math import log, nan


def log_transform(data_df):
    # we have to be careful to avoid log(0)!
    log_value = data_df["value"].apply(lambda x: log(x) if x > 0 else nan)
    return data_df.assign(value=log_value)


out.plot.line(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.compartments("S", "I"),
    transform=log_transform,
)

The transform function will be passed a pandas.DataFrame containing columns “time” (the x-axis value), “geo” (the node ID), “quantity” (the label of the quantity), and “value” (the data column). The function is expected to return a DataFrame with the same columns and same number of rows, but where the “value” column’s values have been modified in some way. (Transformations outside of these limitations may not be supported.)

Note that the “geo” and “quantity” columns will contain the same value in every row – these are constant for a single line! They are provided only in case their values are useful to your transform function.
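As a second illustration of the contract described above, here is a sketch of a smoothing transform that replaces each value with a trailing 7-day mean. The function name and the sample DataFrame are hypothetical; the sample only mimics the documented columns so the function can be tried without a simulation.

```python
import pandas as pd


def rolling_mean_transform(data_df, window=7):
    """Replace "value" with its trailing rolling mean; rows and columns are unchanged."""
    # min_periods=1 avoids NaN at the start, so every row keeps a value.
    smoothed = data_df["value"].rolling(window, min_periods=1).mean()
    return data_df.assign(value=smoothed)


# Hypothetical data for a single line, shaped like the documented input.
example = pd.DataFrame(
    {
        "time": list(range(10)),
        "geo": ["AZ"] * 10,
        "quantity": ["S → I"] * 10,
        "value": [0, 7, 14, 21, 28, 35, 42, 49, 56, 63],
    }
)

result = rolling_mean_transform(example)
print(result[["time", "value"]])
```

Because window has a default, the function can be passed as transform=rolling_mean_transform in the same way as log_transform.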

line_plt()

If you’re familiar with matplotlib or if you want more control over how the plot is drawn, you may prefer to work with the matplotlib interface directly. Naturally there’s nothing stopping you from doing that; the simulation results data is fully available to you. However, it would be a lot of work to re-implement the logic that our axis strategies already provide. Thankfully you can have it both ways by using the alternative line_plt() method. (The “plt” in the name references the conventional import alias often used for matplotlib.)

line_plt() works a lot like line() – it accepts many of the same parameters, most importantly the three axis strategies – but you also pass in a matplotlib Axes object. epymorph will draw the lines and you get to do the rest.

import matplotlib.pyplot as plt

# Create a figure:
fig, ax = plt.subplots(layout="constrained")

# Call line_plt() and pass in the axes to draw on:
lines = out.plot.line_plt(
    ax,
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all().group("day").agg(),
    quantity=rume.ipm.select.events("S->I"),
    time_format="day",
)

# I can use the returned Line2D objects:
lines[1].set_marker("o")
lines[1].set_markevery(7)

# And do the other customary matplotlib things:
plt.title("My highly customized chart!")
plt.ylabel("new infections")
plt.xlabel("simulation day")
plt.xticks(rotation=90)
plt.legend(prop={"family": "monospace"})
plt.show()

The full power of matplotlib is at your fingertips once again.

Choropleth maps

Since epymorph is all about spatially-explicit modeling, it’s natural to want to display data on a geographic map. The choropleth() method makes this very simple.

out.map.choropleth(
    geo=rume.scope.select.all(),
    time=rume.time_frame.select.all().agg(events="sum"),
    quantity=rume.ipm.select.events("S->I"),
)

I’ve used axis strategies to show a county-granularity map, selecting the “S->I” event, and computing the sum across the entire simulated time frame. The only special constraint we have with choropleth maps is that we have to condense our data down to a single value per geographic node. Each polygon can only have one color! If we specify axis strategies that don’t produce suitable data, we’ll get an exception trying to render the map.
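To make the “one value per node” constraint concrete, the following plain-Python sketch collapses a small set of made-up daily records into a single total per county, which is the shape of data a choropleth needs. The records are hypothetical and the loop only stands in for what the time aggregation does conceptually.

```python
from collections import defaultdict

# Hypothetical (geo, simulation day, new infections) records for two counties.
records = [
    ("04027", 0, 3), ("04027", 1, 5), ("04027", 2, 8),
    ("35011", 0, 0), ("35011", 1, 1), ("35011", 2, 2),
]

# Sum over the time axis, leaving exactly one value per geo node.
totals = defaultdict(int)
for geo, _day, value in records:
    totals[geo] += value

print(dict(totals))  # {'04027': 16, '35011': 3}
```

Without the aggregation each county would have three values, and there would be no single color to assign to its polygon.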

Basic customization

We can still do a bit of customization with this method:

out.map.choropleth(
    geo=rume.scope.select.all(),
    time=rume.time_frame.select.all().agg(events="sum"),
    quantity=rume.ipm.select.events("S->I"),

    # Draw a title:
    title="Total infections per county",

    # Change the color map:
    cmap="Reds",

    # Set the min and max value for the color scale:
    vmin=0,
    vmax=8_000_000,

    # Draw borders around each state:
    borders=rume.scope.select.all().group("state"),

    # Label each polygon:
    text_label="black",

    # Change the map projection:
    proj="5070",
)

title, cmap, vmin, and vmax act just like they do in matplotlib.

borders accepts a geo axis strategy, either a selection or a grouping; generally you’d want to create this from the RUME’s scope as well. Each node selected by the strategy will get a border (as long as we can load geography for the resulting scope).

text_label tries to render a label at the center of each geo node. If you pass a string naming a color, the label text is the data value drawn in that color. If you pass True, we’ll draw it in white. And if you want to customize the label further you can create a NodeLabelRenderer instance and pass that in.

proj lets us draw our map with a different projection. (See the GeoPandas reference on projections to learn about the different kinds of values you can provide for this.) If not specified, epymorph uses a default projection as decided by the geo scope.

Advanced customization

NodeLabelRenderer

Here’s a quick example using NodeLabelRenderer to label each county with its FIPS code (but only if the FIPS code is an odd number).

from epymorph.tools.out_map import NodeLabelRenderer

class MyLabels(NodeLabelRenderer):
    def labels(self, data_gdf):
        for geoid in data_gdf["geo"]:
            draw_label = int(geoid) % 2 != 0
            yield geoid if draw_label else None

out.map.choropleth(
    geo=rume.scope.select.all(),
    time=rume.time_frame.select.all().agg(events="sum"),
    quantity=rume.ipm.select.events("S->I"),
    borders=rume.scope.select.all(),
    cmap="Reds",
    vmin=0,
    vmax=8_000_000,
    text_label=MyLabels(color="black"),
)

transform

Just like with line(), you can provide a transform function to modify the data. In this case, the function gets a DataFrame with just “geo” and “data” columns, but the principle is the same.

choropleth_plt()

As with line_plt(), there is also a choropleth_plt() method which accepts most of the same arguments in addition to an Axes object on which to draw the map polygons. This brings a lot of matplotlib’s flexibility back to you.

Geographic data

Additionally, we provide some functions that expose the underlying data munging that powers our choropleth maps. geography() will produce the geopandas.GeoDataFrame which corresponds to a geo selection:

geo_gdf = out.map.geography(geo=rume.scope.select.all())
geo_gdf.head()

	GEOID	NAME	ALAND	INTPTLAT	INTPTLON	geometry	centroid
2	35011	De Baca	6016818946	+34.3592729	-104.3686961	POLYGON ((-104.56739 33.99757, -104.56772 33.9...	POINT (-104.36870 34.35927)
30	35035	Otero	17126455954	+32.6155988	-105.7513079	POLYGON ((-106.37642 32.91041, -106.37644 32.9...	POINT (-105.75131 32.61560)
31	35003	Catron	17933561654	+33.9016208	-108.3919284	POLYGON ((-109.04688 33.95674, -109.04688 33.9...	POINT (-108.39193 33.90162)
209	35059	Union	9906834380	+36.4880853	-103.4757229	POLYGON ((-104.00869 36.30524, -104.00866 36.3...	POINT (-103.47572 36.48809)
398	04027	Yuma	14280774789	+32.7739424	-113.9109050	POLYGON ((-114.79193 32.56682, -114.79186 32.5...	POINT (-113.91090 32.77394)

And geography_data() will produce that plus the simulation data for your time and quantity selections:

data_gdf = out.map.geography_data(
    geo=rume.scope.select.all(),
    time=rume.time_frame.select.all().agg(events="sum"),
    quantity=rume.ipm.select.events("S->I"),
)
data_gdf.head()

	GEOID	NAME	ALAND	INTPTLAT	INTPTLON	geometry	centroid	geo	data
0	35011	De Baca	6016818946	+34.3592729	-104.3686961	POLYGON ((-104.56739 33.99757, -104.56772 33.9...	POINT (-104.36870 34.35927)	35011	3112
1	35035	Otero	17126455954	+32.6155988	-105.7513079	POLYGON ((-106.37642 32.91041, -106.37644 32.9...	POINT (-105.75131 32.61560)	35035	69824
2	35003	Catron	17933561654	+33.9016208	-108.3919284	POLYGON ((-109.04688 33.95674, -109.04688 33.9...	POINT (-108.39193 33.90162)	35003	5574
3	35059	Union	9906834380	+36.4880853	-103.4757229	POLYGON ((-104.00869 36.30524, -104.00866 36.3...	POINT (-103.47572 36.48809)	35059	4396
4	04027	Yuma	14280774789	+32.7739424	-113.9109050	POLYGON ((-114.79193 32.56682, -114.79186 32.5...	POINT (-113.91090 32.77394)	04027	227738

This is intended to support highly customized use-cases.

Data tables

Of course sometimes it’s best to view data in a table. Methods on out.table were designed for this purpose.

quantiles()

quantiles() calculates time-series quantiles for your simulation results.

out.table.quantiles(
    quantiles=[0, 0.25, 0.5, 0.75, 1.0],
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.events("S->I"),
)

	geo	quantity	0	0.25	0.5	0.75	1.0
0	AZ	S → I	9.0	3348.0	7231.5	19245.0	166665.0
1	NM	S → I	0.0	1062.5	2429.0	6969.5	39769.0
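To read the quantile columns: 0 is the minimum over time, 0.5 is the median, and 1.0 is the maximum. Fractional results such as 7231.5 are what you would expect from linear interpolation between neighboring values, which is the default in NumPy and pandas. The sketch below implements that interpolation in plain Python; it is an assumption that epymorph uses the same method, and the quantile helper is hypothetical.

```python
def quantile(values, q):
    """Quantile of values at fraction q (0 to 1), with linear interpolation."""
    xs = sorted(values)
    # Fractional position of the quantile within the sorted values.
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


daily_values = [0, 10, 20, 30, 40]
print([quantile(daily_values, q) for q in [0, 0.25, 0.5, 0.75, 1.0]])
# [0, 10.0, 20.0, 30.0, 40.0]

# With an even number of values the median falls between two of them.
print(quantile([1, 2, 3, 4], 0.5))  # 2.5
```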

By default, these methods return a pandas.DataFrame. However, you can use the result_format parameter if you prefer to receive a string, or to print the result directly.

out.table.quantiles(
    quantiles=[0, 0.25, 0.5, 0.75, 1.0],
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.by(
        compartments=["S", "I"],
        events=["S->I"],
    ),
    result_format="print",
)

geo quantity         0       0.25       0.5       0.75       1.0
 AZ        S 2142959.0 3037472.25 4147140.5 6972232.50 7173933.0
 AZ        I      97.0   46353.50   70954.0  194320.50 1124652.0
 AZ    S → I       9.0    3348.00    7231.5   19245.00  166665.0
 NM        S  657839.0  902946.25 1207962.5 1997945.75 2106243.0
 NM        I       1.0   14512.75   24320.5   71403.50  277154.0
 NM    S → I       0.0    1062.50    2429.0    6969.50   39769.0

range()

range() quickly shows the min and max values.

out.table.range(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.by(
        compartments=["S", "I"],
        events=["S->I"],
    ),
)

	geo	quantity	min	max
0	AZ	S	2142959.0	7173933.0
1	AZ	I	97.0	1124652.0
2	AZ	S → I	9.0	166665.0
3	NM	S	657839.0	2106243.0
4	NM	I	1.0	277154.0
5	NM	S → I	0.0	39769.0

sum()

sum() computes the sum of values over time.

# Show the sum of infection events in each state in February 2020.
out.table.sum(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.rangex("2020-02-01", "2020-03-01"),
    quantity=rume.ipm.select.events("S->I"),
)

	geo	quantity	sum
0	AZ	S → I	1237209
1	NM	S → I	625382

Because summing compartment values over time is not valid (or is at least misleading), any compartments in your quantity selection will be omitted and you will receive a warning.
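A small made-up example shows why. Compartment values are head counts on each day, so adding them across days counts the same people repeatedly; event values are occurrences within each day, so adding them gives a meaningful total. The numbers below are hypothetical and ignore the R → S flow for simplicity.

```python
# Hypothetical closed population of 100 people.
# Compartment: how many people are susceptible at the start of each day.
susceptible = [100, 97, 92, 85]
# Event: how many new infections occur during each of the first three days.
new_infections = [3, 5, 7]

# Summing events gives the total number of infections over the period,
# which matches the drop in the susceptible count.
total_infections = sum(new_infections)
print(total_infections)                   # 15
print(susceptible[0] - susceptible[-1])   # 15

# Summing the compartment counts the same people on every day:
# the result exceeds the population and is not a count of people at all.
print(sum(susceptible))                   # 374
```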

chart()

Although our line plot functions are the primary way to view time-series dynamics, for quick examination or in a limited environment (like a terminal) it may be handy to see a chart drawn in text. chart() does this.

out.table.chart(
    geo=rume.scope.select.all().group("state").sum(),
    time=rume.time_frame.select.all(),
    quantity=rume.ipm.select.all(),
    result_format="print",
)

geo quantity                 chart
 AZ        S █████▇▆▄▃▃▃▄▄▄▄▅▅▅▅▅▅
 AZ        I ▁▁▁▁▁▂▅█▆▃▂▂▁▁▁▁▁▁▂▂▂
 AZ        R ▁▁▁▁▁▁▃▅▇███▇▇▆▆▆▅▅▅▅
 AZ    S → I ▁▁▁▁▁▂▅█▆▃▂▂▁▁▁▁▁▁▂▂▁
 AZ    I → R ▁▁▁▁▁▂▄██▅▃▂▂▁▁▁▁▁▂▂▁
 AZ    R → S ▁▁▁▁▁▁▂▄▆███▇▇▇▆▆▅▅▅▂
 NM        S █████▇▅▄▃▃▃▄▄▄▅▅▅▅▅▅▅
 NM        I ▁▁▁▁▂▅██▆▄▂▂▂▁▁▁▂▂▂▂▂
 NM        R ▁▁▁▁▁▂▄▆▇███▇▇▆▆▆▅▅▅▅
 NM    S → I ▁▁▁▁▂▄██▆▃▂▂▁▁▁▁▁▂▂▂▁
 NM    I → R ▁▁▁▁▂▃▆█▇▅▃▂▂▁▁▁▁▂▂▂▁
 NM    R → S ▁▁▁▁▁▁▃▅▇███▇▇▇▆▆▆▅▅▂

Note: the box characters for chart() might render strangely in some fonts and some contexts. Using the “print” format can help to avoid issues.
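For a sense of how such text charts work, here is a minimal sketch that maps each value to one of eight block characters by its position between the series minimum and maximum. This is an illustration only, not epymorph's implementation: the sparkline function is hypothetical, and it draws one character per value without condensing long series to a fixed width.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a string of block characters."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to scale; draw the lowest bar throughout.
        return BARS[0] * len(values)
    scale = (len(BARS) - 1) / (hi - lo)
    # Adding 0.5 before truncating rounds to the nearest bar height.
    return "".join(BARS[int((v - lo) * scale + 0.5)] for v in values)


print(sparkline([0, 1, 2, 3, 4, 5, 6, 7]))  # ▁▂▃▄▅▆▇█
print(sparkline([5, 5, 5]))                 # ▁▁▁
```

Each row is scaled to its own minimum and maximum in this sketch, so bar heights are comparable within a row but not between rows.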

Footnotes

¹ An epidemiological week (epi week) is a standardized way to identify weeks of the calendar year, defined by the CDC.