Commuting Flows

Description

ACS Commuting Flows is a US Census Bureau data product that provides location-to-location commuter counts. It is not published every year; instead, each release covers a non-overlapping five-year span of survey data, beginning with 2010 (which incorporates 2006-2010 survey data).

The Commuters ADRIO can produce square matrices of commuters, residence-location by work-location, for US county or state scopes. The ADRIO determines which data vintage to load based on the geo scope year in order to properly match the data to the geography. Valid scope-year options are 2010, 2015, and 2022; note that the 2020 Commuting Flows data was compiled using 2022 geography, which is why it is selected by scope year 2022.
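
For instance, the scope year is what selects the vintage; a minimal sketch (the full pattern is shown in the Examples section below):

from epymorph.kit import *
from epymorph.adrio import commuting_flows

# a 2015 county scope loads the 2015 Commuting Flows release;
# a 2022 scope would load the 2020 release (compiled with 2022 geography)
commuters_2015 = (
  commuting_flows.Commuters()
  .with_context(scope=CountyScope.in_states(["AZ"], year=2015))
  .evaluate()
)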

Geographic Coverage

Commuting Flows covers all states in the US and Puerto Rico. It aggregates data to both US county and minor civil division (MCD) granularities, and includes entries for the number of commuters who leave the country. (The ADRIO only uses county-level data.)

Data Collection

The ACS includes questions about respondents’ home and workplace locations. The aggregation of this place-to-place flow data is the basis of the Office of Management and Budget’s (OMB’s) definition of metropolitan and micropolitan areas. Additional data products may be published in the interim years for special projects. All workers included in the tables are 16 years of age or older.

Additional Resources

More information about ACS Commuting Flows is available on the Census Bureau website.

Examples

Commuters

(API) Retrieves an (N,N)-shaped array of integers representing the number of commuters by (residence-location, work-location) pair.

from epymorph.kit import *
from epymorph.adrio import commuting_flows

result = (
  commuting_flows.Commuters()
  .with_context(scope=CountyScope.in_states(["AZ"], year=2022))
  .evaluate()
)

result[0:7, 0:7]
array([[14190,     0,   149,     9,     2,     7,     0],
       [    0, 43820,    32,     6,   511,    62,     0],
       [   99,    17, 59440,    17,     0,    36,    49],
       [   18,     0,     0, 15966,   642,    13,    15],
       [    0,    61,     0,   783, 10824,  1357,     0],
       [    0,     9,     0,     0,    95,  3606,     0],
       [    0,     0,     0,    10,     0,     0,  5484]])
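
Since rows are residence locations and columns are work locations, simple row and column sums summarize the flows. For example (a quick numpy sketch building on the result above):

import numpy as np

# rows are residence locations, columns are work locations
workers_by_residence = result.sum(axis=1)   # total workers living in each county
workers_by_workplace = result.sum(axis=0)   # total workers employed in each county
out_commuters = workers_by_residence - np.diag(result)  # workers leaving their home county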

Advanced Usage

What if I want to simulate during an off-year?

Oof, so there are only three years in which we can use Commuting Flows, huh? Short answer: yes. Because Census boundaries are always being updated, using an incompatible geographic year would leave some portion of the data “inaccessible”, because the set of locations in the data wouldn’t match up with the scope we’re using in epymorph. Some locations may have been added, others removed, and some may keep the same identifier even though their borders have shifted significantly. All of that plays havoc with data integrity.

However, the long answer is that with a bit of legwork we may be able to make it work, at least in an approximate way that’s good enough for our purposes. Let’s see how we can adapt Commuting Flows data to a different geographic year.

First: are the geographic years in question compatible? It’s true that geography shifts all the time if you’re considering the whole US and all granularities, but particular areas may in fact be quite stable. Let’s say we want to simulate in 2019 using the 2015 Commuting Flows data. Have Arizona counties changed much between those two years?

import numpy as np

geo_2015 = CountyScope.in_states(["AZ"], year=2015)
geo_2019 = CountyScope.in_states(["AZ"], year=2019)

np.array_equal(geo_2015.node_ids, geo_2019.node_ids)
True

The set of nodes is the same! Or if you want to see for yourself:

print(geo_2015.node_ids.tolist())
print(geo_2019.node_ids.tolist())
['04001', '04003', '04005', '04007', '04009', '04011', '04012', '04013', '04015', '04017', '04019', '04021', '04023', '04025', '04027']
['04001', '04003', '04005', '04007', '04009', '04011', '04012', '04013', '04015', '04017', '04019', '04021', '04023', '04025', '04027']

Same values, same number of them, and in the exact same order. (Historically we know that Arizona counties haven’t changed since 1983.) So we aren’t at risk of losing data between years. We just have to evaluate the Commuters ADRIO in a separate context; the values will translate just fine.

Tip: geography is weird!

An example of when this doesn’t work:

np.array_equal(
  CountyScope.in_states(["CT"], year=2020).node_ids,
  CountyScope.in_states(["CT"], year=2022).node_ids,
)
False

Connecticut replaced their counties entirely and switched to “planning regions” as a county-equivalent; this change was reflected in the Census delineations starting in the 2022 vintage.
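
If you’re curious exactly what changed, comparing the two sets of node IDs shows which county-equivalents were dropped and which were added (a quick sketch; output omitted):

ct_2020 = set(CountyScope.in_states(["CT"], year=2020).node_ids)
ct_2022 = set(CountyScope.in_states(["CT"], year=2022).node_ids)

print("removed:", sorted(ct_2020 - ct_2022))  # the old Connecticut counties
print("added:  ", sorted(ct_2022 - ct_2020))  # the new planning regions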

commuters_data = (
  commuting_flows.Commuters()
  .with_context(scope=geo_2015)  # we're loading Comm Flows 2015!
  .evaluate()
)

# evaluation produces a numpy array
print(f"{type(commuters_data)=}")

# in this case, it's NxN-shaped (15 by 15 counties)
print(f"{commuters_data.shape=}")

# look at the first 5 rows and columns
print(commuters_data[0:5, 0:5])
type(commuters_data)=<class 'numpy.ndarray'>
commuters_data.shape=(15, 15)
[[14349     4    45    34    93]
 [    2 45049     0     0    69]
 [  114    15 58867    79     0]
 [   16     0   106 14829   152]
 [    0   144     0   645  9401]]

Now we’re going to take advantage of the fact that — while RUMEs can certainly take parameters in the form of an ADRIO — they are just as happy to take in a numpy array, regardless of how it was produced!

from epymorph.adrio import acs5

rume = SingleStrataRUME.build(
  ipm=ipm.SIRS(),
  mm=mm.Pei(),
  init=init.SingleLocation(location=0, seed_size=100),
  scope=geo_2019,  # and we're simulating in 2019!
  time_frame=TimeFrame.rangex("2019-01-01", "2019-06-01"),
  params={
    "beta": 0.4,
    "gamma": 1/5,
    "xi": 1/90,
    "population": acs5.Population(),  # this will pull 2019 population
    "commuters": commuters_data,  # our pre-evaluated commuters
  },
)

# run it just to prove it runs
sim = BasicSimulator(rume)
with sim_messaging(live=False):
  sim.run()
Loading gpm:all::init::population (epymorph.adrio.acs5.Population):
  |####################| 100%  (0.589s)
Running simulation (BasicSimulator):
• 2019-01-01 to 2019-05-31 (151 days)
• 15 geo nodes
  |####################| 100% 
Runtime: 0.803s

Okay so that works, but now we have another issue. Did the population of Arizona change significantly between 2015 and 2019? If so our commuting data might under- or over-represent the true number of commuters in 2019. Of course the ideal would be to have a source for 2019-vintage data. But lacking that, maybe we can adjust the data we have to get something resembling reality?

If we assume that the ratio of commuters to non-commuters is stable, and that the distribution of work locations among commuters hasn’t changed much, we can scale our commuting numbers by the change in population between the two years. Concretely, for residence county i and work county j: scaled[i, j] = commuters[i, j] * pop_2019[i] / pop_2015[i]. Of course these aren’t perfect assumptions! But this is to demonstrate one possible approach. You would have to evaluate for yourself whether this or other approaches are viable for your use-case.

Caveats established. First thing we need is the population ratio: 2019 to 2015. We’ll use ACS5 again.

pop_2015 = acs5.Population().with_context(scope=geo_2015).evaluate()
pop_2019 = acs5.Population().with_context(scope=geo_2019).evaluate()

ratio = pop_2019 / pop_2015
print(ratio)
[0.99150075 0.97084391 1.03345257 1.00716637 1.01574572 1.05530311
 1.02252274 1.07731606 1.02130683 1.0149922  1.02871201 1.11037478
 0.98740254 1.05588529 1.03192815]

Some county populations decreased, others increased; one county by 11%! These ratios describe the change in resident population for each county. Our commuters matrix is organized with residence location along the row axis and work location along the column axis, so we need to apply each county’s scaling factor to every entry in its row. If we make ratio a column vector, numpy will handle the broadcasting for us (we’ll sanity-check one row below). And finally we’ll convert back to integers, truncating values (rounding down).

scaled_commuters_data = (ratio[:, np.newaxis] * commuters_data).astype(np.int64)

scaled_commuters_data[0:5, 0:5]
array([[14227,     3,    44,    33,    92],
       [    1, 43735,     0,     0,    66],
       [  117,    15, 60836,    81,     0],
       [   16,     0,   106, 14935,   153],
       [    0,   146,     0,   655,  9549]])

And we can compare to the original values:

commuters_data[0:5, 0:5]
array([[14349,     4,    45,    34,    93],
       [    2, 45049,     0,     0,    69],
       [  114,    15, 58867,    79,     0],
       [   16,     0,   106, 14829,   152],
       [    0,   144,     0,   645,  9401]])
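
And, as promised, a quick sanity check of the broadcasting: scaling row i of the original matrix by ratio[i] should reproduce row i of the scaled matrix exactly.

row = 2
np.array_equal(
  scaled_commuters_data[row],
  (ratio[row] * commuters_data[row]).astype(np.int64),
)  # expect True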

Now all that’s left is to reconstruct the RUME with the scaled commuters data and run it again.

from epymorph.adrio import acs5

rume = SingleStrataRUME.build(
  ipm=ipm.SIRS(),
  mm=mm.Pei(),
  init=init.SingleLocation(location=0, seed_size=100),
  scope=geo_2019,
  time_frame=TimeFrame.rangex("2019-01-01", "2019-06-01"),
  params={
    "beta": 0.4,
    "gamma": 1/5,
    "xi": 1/90,
    "population": acs5.Population(),
    "commuters": scaled_commuters_data,  # adjusted for population change!
  },
)

sim = BasicSimulator(rume)
with sim_messaging(live=False):
  sim.run()
Loading gpm:all::init::population (epymorph.adrio.acs5.Population):
  |####################| 100%  (0.570s)
Running simulation (BasicSimulator):
• 2019-01-01 to 2019-05-31 (151 days)
• 15 geo nodes
  |####################| 100% 
Runtime: 0.798s