CSV ADRIOs can load data from CSV files in several shapes, each defined in terms of the simulation's dimensions. These are:
N: one value per node (constant over time)
TxN: a time-series of values per node
NxN: one value per pair of nodes
For a CSV file to be loaded successfully, it must be formatted as expected.
For each axis in the shape (N or T) there must be one corresponding column (extra columns are allowed). Together these columns form the key. The key must be “perfectly complete” with respect to the simulation; that is, every key implied by the simulation context must be represented exactly once. So for N, we expect a column containing every geo node. For TxN, we expect a column of dates and a column of geo nodes, with every unique date/node pair represented exactly once. For NxN, we expect two geo node columns, with every unique pair of nodes represented exactly once.
Finally, there must be one column containing the data value for each key.
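To make the “perfectly complete” requirement concrete, here is a minimal sketch of a pre-flight check for a TxN-style file, written with only the Python standard library. It is not how epymorph validates files; the function name check_key_completeness, the column positions, and the sample data are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from itertools import product


def check_key_completeness(csv_text, expected_dates, expected_nodes):
    """Return (missing, duplicated) date/node keys for a TxN-style file.

    Hypothetical helper: assumes column 0 holds dates, column 1 holds
    node identifiers, and the file has exactly one header row.
    """
    reader = csv.reader(io.StringIO(csv_text.strip()))
    next(reader)  # skip the header row
    counts = Counter((row[0], row[1]) for row in reader)
    expected = set(product(expected_dates, expected_nodes))
    missing = sorted(expected - set(counts))
    duplicated = sorted(key for key, n in counts.items() if n > 1)
    return missing, duplicated


# Made-up sample: one key is absent and one key appears twice.
sample = """
date,fips,value
2021-01-01,04013,10
2021-01-02,04013,11
2021-01-01,04005,5
2021-01-01,04005,6
"""

missing, duplicated = check_key_completeness(
    sample,
    expected_dates=["2021-01-01", "2021-01-02"],
    expected_nodes=["04005", "04013"],
)
print("missing:", missing)
print("duplicated:", duplicated)
```

A file is complete in this sense only when both lists come back empty. The same idea applies to N (a single node column) and NxN (two node columns) by changing which columns form the key.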
Dates must be in “YYYY-MM-DD” format.
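If you want to confirm that a date column follows this format before loading a file, a strict check can be sketched as follows. This is an illustrative helper, not part of epymorph; the names parse_iso_date and is_valid_date are hypothetical.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_iso_date(text):
    """Parse a strict YYYY-MM-DD string, raising ValueError otherwise."""
    if not ISO_DATE.fullmatch(text):
        raise ValueError(f"not in YYYY-MM-DD format: {text!r}")
    # fromisoformat also rejects impossible dates such as 2021-02-30.
    return date.fromisoformat(text)


def is_valid_date(text):
    """Return True if text is a real calendar date in YYYY-MM-DD format."""
    try:
        parse_iso_date(text)
        return True
    except ValueError:
        return False


print(is_valid_date("2021-01-05"))
print(is_valid_date("01/05/2021"))
```

The regular expression is checked first so that values without zero padding, such as "2021-1-5", are rejected rather than silently accepted.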
Geographic identifiers must be in one of the supported formats:
state_abbrev: postal code; e.g., “AZ” for Arizona
county_state: county name and state postal code separated by a comma; e.g., “Maricopa, AZ” for Maricopa County in Arizona
geoid: GEOID or FIPS code; e.g., “04013” for Maricopa County in Arizona
Contents of the CSV file do not need to be sorted; epymorph will take care of sorting values into a canonical ordering.
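When preparing a file, it can help to check which of the three identifier formats a column uses. The sketch below guesses the format from simplified patterns. It is an assumption-laden illustration and not epymorph's own parsing logic: the patterns, the guess_key_type name, and the choice of accepted GEOID lengths (2, 5, 11, or 12 digits for state, county, tract, and block group) are mine.

```python
import re

# Simplified, assumed patterns for each supported identifier format.
PATTERNS = {
    "state_abbrev": re.compile(r"[A-Z]{2}"),
    "county_state": re.compile(r".+, ?[A-Z]{2}"),
    "geoid": re.compile(r"(?:\d{2}|\d{5}|\d{11}|\d{12})"),
}


def guess_key_type(values):
    """Return the first format that every value matches, or None."""
    if not values:
        return None
    for key_type, pattern in PATTERNS.items():
        if all(pattern.fullmatch(v) for v in values):
            return key_type
    return None


print(guess_key_type(["AZ", "NM"]))
print(guess_key_type(["Maricopa, AZ", "Coconino, AZ"]))
print(guess_key_type(["04013", "04005"]))
```

Note that GEOIDs should be handled as text. Tools that interpret the column as numbers will drop the leading zero, turning “04013” into 4013, which no longer matches the expected format.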
Examples
These examples create temporary CSV files using the create_temp_file function; in typical usage you would instead load a suitable CSV file already on disk.
CSV
The ADRIO named CSVFileN is for loading N-shaped data: one value per node in our geo scope.
import numpy as np
from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
state,pop
AZ,3000
CO,4000
NM,5000
UT,6000""")

csv.CSVFileN(
    file_path=csv_path,
    dtype=np.int64,  # the result will be interpreted as this dtype
    key_col=0,  # first column is our geography (key)
    key_type="state_abbrev",  # because our data uses state postal codes
    data_col=1,  # second column is our data (population)
    skiprows=1,  # csv contains one header row
).with_context(
    scope=StateScope.in_states(["AZ", "NM", "CO", "UT"], year=2020),
).evaluate()
array([3000, 4000, 5000, 6000])
CSV Time Series
The CSVFileTxN ADRIO is for loading TxN-shaped data: values that vary over time and space. If you provide a value for the date_range argument, you can select a subset of the dates in the file. Otherwise all dates in the file are loaded.
import numpy as np
from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
date,fips,number vaccinated
2021-01-01,04013,10000
2021-01-02,04013,10500
2021-01-03,04013,11000
2021-01-04,04013,11500
2021-01-05,04013,12000
2021-01-06,04013,12500
2021-01-07,04013,13000
2021-01-01,04005,5000
2021-01-02,04005,5050
2021-01-03,04005,5100
2021-01-04,04005,5150
2021-01-05,04005,5200
2021-01-06,04005,5250
2021-01-07,04005,5300""")

csv.CSVFileTxN(
    file_path=csv_path,
    dtype=np.int64,
    key_col=1,  # second column is our GEOIDs
    key_type="geoid",
    time_col=0,  # first column is our dates
    data_col=2,  # third column is our data
    skiprows=1,
    date_range=TimeFrame.range("2021-01-01", "2021-01-05"),
).with_context(
    scope=CountyScope.in_counties(["Maricopa, AZ", "Coconino, AZ"], year=2021),
).evaluate()
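To illustrate what a TxN shape means for long-format rows like these, the following plain-Python sketch selects a date range and arranges the values as one row per date and one column per node. It does not reproduce epymorph's implementation or its output. The pivot_txn name, the inclusive date bounds, and the ascending sort of dates and nodes are all assumptions made for this example.

```python
def pivot_txn(rows, start, end):
    """Arrange (date, node, value) rows as a T x N list of lists.

    Hypothetical helper: keeps rows with start <= date <= end (ISO date
    strings compare correctly as text) and sorts dates and nodes
    ascending, which is an assumed ordering.
    """
    selected = {(d, n): v for d, n, v in rows if start <= d <= end}
    dates = sorted({d for d, _ in selected})
    nodes = sorted({n for _, n in selected})
    # A missing date/node pair raises KeyError here, which is why the
    # key must be complete.
    grid = [[selected[(d, n)] for n in nodes] for d in dates]
    return dates, nodes, grid


# Unsorted sample rows taken from the first three days of the file above.
rows = [
    ("2021-01-02", "04013", 10500),
    ("2021-01-01", "04013", 10000),
    ("2021-01-03", "04013", 11000),
    ("2021-01-01", "04005", 5000),
    ("2021-01-03", "04005", 5100),
    ("2021-01-02", "04005", 5050),
]

dates, nodes, grid = pivot_txn(rows, "2021-01-01", "2021-01-02")
print(dates)
print(nodes)
print(grid)
```

Each inner list corresponds to one date (the T axis) and each position within it to one node (the N axis), so input row order has no effect on the result.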