CSV ADRIOs can load data from CSV files in several “shapes” that correspond to simulation dimensions. These are:
N: one value per node (constant over time)
TxN: a time-series of values per node
NxN: one value per pair of nodes
To load successfully, a CSV file must follow the layout described here.
For each “axis” in the shape (N or T) there must be one corresponding column (extra columns are allowed). Together these columns form the key. The key must be “perfectly complete” with respect to the simulation: every key implied by the simulation context must appear exactly once. So for N, we expect a column containing every geo node. For TxN, we expect a column of dates and a column of geo nodes, with every unique date/node pair represented exactly once. For NxN, we expect two geo node columns, with every unique pair of nodes represented exactly once.
Finally, there must be one column containing the corresponding data value.
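The “perfectly complete” rule can be checked by hand before loading a file. The following is a minimal sketch using only the standard library; the helper name check_key_complete and the sample keys are hypothetical, and this is not how epymorph itself performs the check.

```python
from collections import Counter
from itertools import product


def check_key_complete(rows, expected_keys):
    """Return (missing, duplicated) keys; both empty means the key is complete.

    rows: iterable of key tuples read from the CSV key columns.
    expected_keys: iterable of key tuples implied by the simulation context.
    """
    counts = Counter(rows)
    expected = set(expected_keys)
    missing = sorted(expected - set(counts))
    duplicated = sorted(k for k, c in counts.items() if c > 1)
    return missing, duplicated


# Hypothetical context: two nodes and three dates (a TxN key).
nodes = ["04005", "04013"]
dates = ["2021-01-01", "2021-01-02", "2021-01-03"]
expected_txn = list(product(dates, nodes))

# Key columns from an imagined CSV: one pair is missing, one appears twice.
csv_keys = [
    ("2021-01-01", "04005"),
    ("2021-01-01", "04013"),
    ("2021-01-02", "04005"),
    ("2021-01-02", "04005"),
    ("2021-01-03", "04005"),
    ("2021-01-03", "04013"),
]

missing, duplicated = check_key_complete(csv_keys, expected_txn)
print(missing)     # [('2021-01-02', '04013')]
print(duplicated)  # [('2021-01-02', '04005')]
```

For an N-shaped file the expected keys would be one-element tuples of node IDs, and for NxN they would be the product of the node list with itself.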
Dates must be in “YYYY-MM-DD” format.
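If a file's dates come from another system, it can help to validate them before loading. This is a rough sketch with a hypothetical helper, parse_csv_date, that accepts only zero-padded YYYY-MM-DD strings; it illustrates the format and is not epymorph's own parser.

```python
import re
from datetime import date

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_csv_date(text):
    """Parse a strict YYYY-MM-DD string, raising ValueError otherwise."""
    if not _DATE_PATTERN.fullmatch(text):
        raise ValueError(f"not in YYYY-MM-DD format: {text!r}")
    return date.fromisoformat(text)  # also rejects impossible dates


print(parse_csv_date("2021-01-07"))  # 2021-01-07

# Each of these is rejected: wrong separator, missing zero padding, invalid day.
for bad in ["01/07/2021", "2021-1-7", "2021-02-30"]:
    try:
        parse_csv_date(bad)
    except ValueError as err:
        print("rejected:", err)
```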
Geographic identifiers must be in one of the supported formats:
state_abbrev: postal code; e.g., “AZ” for Arizona
county_state: county name and state postal code separated by a comma; e.g., “Maricopa, AZ” for Maricopa County in Arizona
geoid: GEOID or FIPS code; e.g., “04013” for Maricopa County in Arizona
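To decide which key_type fits a file, one option is to test the key column against a rough pattern for each format. The patterns and the guess_key_type helper below are assumptions made for illustration; epymorph's actual parsing rules may be stricter or looser.

```python
import re

# Rough patterns for the three documented formats (assumptions, not epymorph's rules).
_PATTERNS = {
    "state_abbrev": re.compile(r"[A-Z]{2}"),        # e.g. "AZ"
    "county_state": re.compile(r".+, ?[A-Z]{2}"),   # e.g. "Maricopa, AZ"
    "geoid": re.compile(r"\d+"),                    # e.g. "04013"
}


def guess_key_type(values):
    """Return the first format name that every value matches, else None."""
    if not values:
        return None
    for name, pattern in _PATTERNS.items():
        if all(pattern.fullmatch(v) for v in values):
            return name
    return None


print(guess_key_type(["AZ", "NM"]))                      # state_abbrev
print(guess_key_type(["Maricopa, AZ", "Coconino, AZ"]))  # county_state
print(guess_key_type(["04013", "04005"]))                # geoid
print(guess_key_type(["AZ", "04013"]))                   # None
```

Note that GEOIDs such as “04013” carry a leading zero, so the column should be kept as text rather than converted to a number by a spreadsheet or other tool.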
Contents of the CSV file do not need to be sorted; epymorph will take care of sorting values into a canonical ordering.
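As an illustration of what such a reordering might look like, the sketch below sorts shuffled rows by state FIPS code. That the canonical ordering is ascending GEOID is an assumption of this sketch, and the STATE_FIPS lookup is hand-written for these four states only.

```python
# State FIPS codes for the four states used here (hand-written lookup for the sketch).
STATE_FIPS = {"AZ": "04", "CO": "08", "NM": "35", "UT": "49"}

# Rows as they might appear in an unsorted CSV file.
rows = [("UT", 6000), ("AZ", 3000), ("NM", 5000), ("CO", 4000)]

# Sort by GEOID so values line up with the assumed node order.
ordered = sorted(rows, key=lambda row: STATE_FIPS[row[0]])
values = [value for _, value in ordered]
print(values)  # [3000, 4000, 5000, 6000]
```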
Examples
These examples create temporary CSV files using the create_temp_file function; in typical usage you would instead load a suitable CSV file that already exists on disk.
CSV
The ADRIO named CSV is for loading N-shaped data: one value per node in our geo scope.
import numpy as np
from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
state,pop
AZ,3000
CO,4000
NM,5000
UT,6000""")

csv.CSV(
    file_path=csv_path,
    key_col=0,  # first column is our geography (key)
    data_col=1,  # second column is our data (population)
    data_type=np.int64,  # the result will be interpreted as this dtype
    key_type="state_abbrev",  # because our data uses state postal codes
    skiprows=1,  # csv contains one header row
).with_context(
    scope=StateScope.in_states(["AZ", "NM", "CO", "UT"], year=2020),
).evaluate()
array([3000, 4000, 5000, 6000])
CSV Time Series
The CSVTimeSeries ADRIO is for loading TxN-shaped data: values that vary over time and space.
import numpy as np
from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
date,fips,number vaccinated
2021-01-01,04013,10000
2021-01-02,04013,10500
2021-01-03,04013,11000
2021-01-04,04013,11500
2021-01-05,04013,12000
2021-01-06,04013,12500
2021-01-07,04013,13000
2021-01-01,04005,5000
2021-01-02,04005,5050
2021-01-03,04005,5100
2021-01-04,04005,5150
2021-01-05,04005,5200
2021-01-06,04005,5250
2021-01-07,04005,5300""")

csv.CSVTimeSeries(
    file_path=csv_path,
    time_col=0,  # first column is our dates
    time_frame=TimeFrame.range("2021-01-01", "2021-01-07"),
    key_col=1,  # second column is our GEOIDs
    data_col=2,  # third column is our data
    data_type=np.int64,
    key_type="geoid",
    skiprows=1,
).with_context(
    scope=CountyScope.in_counties(["Maricopa, AZ", "Coconino, AZ"], year=2021),
).evaluate()
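To make the TxN shape concrete, long-format rows like those in this example can be pivoted by hand into a grid with one row per date and one column per node. The sketch below uses only Python's standard-library csv module (not epymorph.adrio.csv) on the first two days of data; the ordering of rows by ascending date and columns by ascending GEOID is an assumption, and the result is a plain nested list rather than whatever CSVTimeSeries actually returns.

```python
import csv  # standard-library module, unrelated to epymorph.adrio.csv
import io

csv_text = """date,fips,number vaccinated
2021-01-01,04013,10000
2021-01-02,04013,10500
2021-01-01,04005,5000
2021-01-02,04005,5050
"""

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
cells = {(d, g): int(v) for d, g, v in reader}

# Assumed ordering: dates ascending down the rows, GEOIDs ascending across columns.
dates = sorted({d for d, _ in cells})
nodes = sorted({g for _, g in cells})
grid = [[cells[(d, g)] for g in nodes] for d in dates]
print(grid)  # [[5000, 10000], [5050, 10500]]
```

A missing date/node pair would raise a KeyError when the grid is built, which is one way to see why the key must be complete.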