CSV

Description

CSV ADRIOs load data from CSV files in one of several “shapes” that correspond to simulation dimensions:

  • N: one value per node (constant over time)
  • TxN: a time-series of values per node
  • NxN: one value per pair of nodes

A CSV file loads successfully only if it follows the format described below.

For each “axis” in the shape (N or T) there must be one corresponding column; extra columns are allowed. Together these columns form the key. The key must be “perfectly complete” with respect to the simulation: every key implied by the simulation context must appear exactly once.

  • N: one column containing every geo node
  • TxN: a column of dates and a column of geo nodes, with every date/node pair appearing exactly once
  • NxN: two geo node columns, with every ordered pair of nodes (including each node paired with itself) appearing exactly once

Finally, there must be one column containing the data value for each key.
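
The completeness requirement can be checked before a file is handed to an ADRIO. The following is a minimal sketch using only the Python standard library. The function check_key_complete and the sample data are hypothetical and not part of epymorph, and epymorph's own validation may behave differently.

```python
import csv
import io
from collections import Counter
from itertools import product


def check_key_complete(keys, expected_keys):
    """Return (missing, duplicated) keys; both empty means the key is complete.

    Hypothetical helper: keys not in expected_keys are ignored here.
    """
    counts = Counter(keys)
    expected = set(expected_keys)
    missing = sorted(expected - set(counts))
    duplicated = sorted(k for k, n in counts.items() if n > 1)
    return missing, duplicated


# Made-up TxN data: two dates x two nodes, with one pair missing
# and one pair repeated.
text = """date,fips,value
2021-01-01,04005,1
2021-01-01,04013,2
2021-01-02,04005,3
2021-01-02,04005,4
"""
reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header row
keys = [(row[0], row[1]) for row in reader]

dates = ["2021-01-01", "2021-01-02"]
nodes = ["04005", "04013"]
missing, duplicated = check_key_complete(keys, product(dates, nodes))
print(missing)     # [('2021-01-02', '04013')]
print(duplicated)  # [('2021-01-02', '04005')]
```

For N-shaped data the expected keys are just the node list, and for NxN-shaped data they are product(nodes, nodes).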

Dates must be in “YYYY-MM-DD” format.

Geographic identifiers must be in one of the supported formats:

  • state_abbrev: postal code; e.g., “AZ” for Arizona
  • county_state: county name and state postal code separated by a comma; e.g., “Maricopa, AZ” for Maricopa County in Arizona
  • geoid: GEOID or FIPS code; e.g., “04013” for Maricopa County in Arizona
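
As a rough illustration of these formats, the sketch below performs purely syntactic checks on date and identifier strings. The regular expressions and the function names matches_key_type and is_iso_date are assumptions made for this example; they do not confirm that an identifier refers to a real place, and they are not how epymorph itself validates input.

```python
import re
from datetime import datetime

# Assumed patterns, one per supported key type (illustrative only).
PATTERNS = {
    "state_abbrev": re.compile(r"[A-Z]{2}"),          # e.g. AZ
    "county_state": re.compile(r".+,\s*[A-Z]{2}"),    # e.g. Maricopa, AZ
    "geoid": re.compile(r"\d+"),                      # e.g. 04013
}


def matches_key_type(value: str, key_type: str) -> bool:
    """True if the whole string matches the assumed pattern for key_type."""
    return PATTERNS[key_type].fullmatch(value) is not None


def is_iso_date(value: str) -> bool:
    """True if value is a valid calendar date written as YYYY-MM-DD."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(matches_key_type("Maricopa, AZ", "county_state"))  # True
print(matches_key_type("AZ", "geoid"))                   # False
print(is_iso_date("2021-02-30"))                         # False
```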

Contents of the CSV file do not need to be sorted; epymorph sorts the values into a canonical ordering.
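
To illustrate what sorting into a canonical ordering means for N-shaped data, the sketch below reorders unsorted rows so the values line up with a fixed node order. It assumes that the canonical order is ascending state FIPS code; that is consistent with the example outputs on this page but is not stated here, so treat it as an assumption. The data is made up.

```python
# Rows as they might appear in an unsorted CSV file: (state, value).
rows = [("NM", 5000), ("AZ", 3000), ("UT", 6000), ("CO", 4000)]

# State FIPS codes, used here as the assumed canonical sort key.
state_fips = {"AZ": "04", "CO": "08", "NM": "35", "UT": "49"}

# Sort rows by FIPS code, then keep only the values.
ordered = sorted(rows, key=lambda row: state_fips[row[0]])
values = [value for _, value in ordered]
print(values)  # [3000, 4000, 5000, 6000]
```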

Examples

These examples create temporary CSV files using the create_temp_file function. In typical usage you would instead load a suitable CSV file that already exists on disk.
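
The definition of create_temp_file is not shown on this page. The following is a hypothetical stand-in with the same name, written on the assumption that the helper removes the common indentation from the text, writes it to a temporary file, and returns the file's path. The real helper may differ.

```python
import tempfile
import textwrap
from pathlib import Path


def create_temp_file(contents: str) -> Path:
    """Hypothetical stand-in: write dedented text to a temp file, return its path."""
    # Remove the shared indentation and the surrounding blank lines that
    # come from writing the CSV inside an indented triple-quoted string.
    text = textwrap.dedent(contents).strip() + "\n"
    with tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, encoding="utf-8"
    ) as f:
        f.write(text)
    # delete=False keeps the file on disk after the with-block closes it.
    return Path(f.name)
```

Because the file is created with delete=False, the caller is responsible for removing it when it is no longer needed.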

CSV

The ADRIO named CSV is for loading N-shaped data: one value per node in our geo scope.

import numpy as np

from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
    state,pop
    AZ,3000
    CO,4000
    NM,5000
    UT,6000
""")

csv.CSV(
    file_path=csv_path,
    key_col=0,  # first column is our geography (key)
    data_col=1,  # second column is our data (population)
    data_type=np.int64,  # the result will be interpreted as this dtype
    key_type="state_abbrev",  # because our data uses state postal codes
    skiprows=1,  # csv contains one header row
).with_context(
    scope=StateScope.in_states(["AZ", "NM", "CO", "UT"], year=2020),
).evaluate()

array([3000, 4000, 5000, 6000])

CSV Time Series

The CSVTimeSeries ADRIO is for loading TxN-shaped data: values that vary over time and space.

import numpy as np

from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
    date,fips,number vaccinated
    2021-01-01,04013,10000
    2021-01-02,04013,10500
    2021-01-03,04013,11000
    2021-01-04,04013,11500
    2021-01-05,04013,12000
    2021-01-06,04013,12500
    2021-01-07,04013,13000
    2021-01-01,04005,5000
    2021-01-02,04005,5050
    2021-01-03,04005,5100
    2021-01-04,04005,5150
    2021-01-05,04005,5200
    2021-01-06,04005,5250
    2021-01-07,04005,5300
""")

csv.CSVTimeSeries(
    file_path=csv_path,
    time_col=0,  # first column is our dates
    time_frame=TimeFrame.range("2021-01-01", "2021-01-07"),
    key_col=1,  # second column is our GEOIDs
    data_col=2,  # third column is our data
    data_type=np.int64,
    key_type="geoid",
    skiprows=1,
).with_context(
    scope=CountyScope.in_counties(["Maricopa, AZ", "Coconino, AZ"], year=2021),
).evaluate()

array([[ 5000, 10000],
       [ 5050, 10500],
       [ 5100, 11000],
       [ 5150, 11500],
       [ 5200, 12000],
       [ 5250, 12500],
       [ 5300, 13000]])
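
The result above has one row per date and one column per node. As a rough illustration of that layout, the sketch below pivots a few long-format rows into a nested list with the same orientation. It uses plain Python rather than epymorph, sorts nodes by GEOID as an assumed canonical order, and uses a shortened copy of the example data.

```python
# Long-format rows in arbitrary order: (date, geoid, value).
rows = [
    ("2021-01-02", "04013", 10500),
    ("2021-01-01", "04005", 5000),
    ("2021-01-01", "04013", 10000),
    ("2021-01-02", "04005", 5050),
]

# ISO-formatted dates sort chronologically when sorted as strings.
dates = sorted({date for date, _, _ in rows})
nodes = sorted({node for _, node, _ in rows})  # assumed order: ascending GEOID

# Index values by key, then lay them out as rows=dates, columns=nodes.
lookup = {(date, node): value for date, node, value in rows}
table = [[lookup[(date, node)] for node in nodes] for date in dates]
print(table)  # [[5000, 10000], [5050, 10500]]
```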

CSV Matrix

The CSVMatrix ADRIO is for loading NxN-shaped data: one value per pair of nodes, such as commuter flows between counties.

import numpy as np

from epymorph.kit import *
from epymorph.adrio import csv

csv_path = create_temp_file("""
    from county,to county,commuters
    "Maricopa, AZ","Maricopa, AZ",15000
    "Maricopa, AZ","Coconino, AZ",12000
    "Maricopa, AZ","Yavapai, AZ",11000
    "Coconino, AZ","Maricopa, AZ",5500
    "Coconino, AZ","Coconino, AZ",7500
    "Coconino, AZ","Yavapai, AZ",3500
    "Yavapai, AZ","Maricopa, AZ",1333
    "Yavapai, AZ","Coconino, AZ",333
    "Yavapai, AZ","Yavapai, AZ",2333
""")

csv.CSVMatrix(
    file_path=csv_path,
    from_key_col=0,  # the column for the "origin" location (matrix row)
    to_key_col=1,  # the column for the "destination" location (matrix column)
    data_col=2,  # the column for the data value
    data_type=np.int64,
    key_type="county_state",
    skiprows=1,
).with_context(
    scope=CountyScope.in_counties(["04013", "04005", "04025"], year=2021),
).evaluate()

array([[ 7500,  5500,  3500],
       [12000, 15000, 11000],
       [  333,  1333,  2333]])
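
The row and column order of this result differs from the order of the rows in the CSV file. The sketch below reproduces the same layout in plain Python, assuming that nodes are ordered by ascending GEOID and that rows are origins and columns are destinations, as the comments in the example indicate. It does not use epymorph, and the name-to-GEOID mapping is written out by hand for illustration.

```python
# County names mapped to their GEOIDs (hand-written for this sketch).
geoid = {"Maricopa, AZ": "04013", "Coconino, AZ": "04005", "Yavapai, AZ": "04025"}

# Flows keyed by (origin, destination), copied from the example CSV.
flows = {
    ("Maricopa, AZ", "Maricopa, AZ"): 15000,
    ("Maricopa, AZ", "Coconino, AZ"): 12000,
    ("Maricopa, AZ", "Yavapai, AZ"): 11000,
    ("Coconino, AZ", "Maricopa, AZ"): 5500,
    ("Coconino, AZ", "Coconino, AZ"): 7500,
    ("Coconino, AZ", "Yavapai, AZ"): 3500,
    ("Yavapai, AZ", "Maricopa, AZ"): 1333,
    ("Yavapai, AZ", "Coconino, AZ"): 333,
    ("Yavapai, AZ", "Yavapai, AZ"): 2333,
}

# Assumed canonical order: ascending GEOID (Coconino, Maricopa, Yavapai).
nodes = sorted(geoid, key=geoid.get)

# Rows are origins, columns are destinations.
matrix = [[flows[(src, dst)] for dst in nodes] for src in nodes]
for row in matrix:
    print(row)
# [7500, 5500, 3500]
# [12000, 15000, 11000]
# [333, 1333, 2333]
```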