
epymorph.adrio.soda

Methods for querying SODA APIs. See: https://dev.socrata.com/

Examples:

If we want to load data from a hypothetical SODA data source at data.example.com with ID abcd-1234, we could construct and execute a query as follows using the classes and functions of this module:

from datetime import date

import epymorph.adrio.soda as q

resource = q.SocrataResource(domain="data.example.com", id="abcd-1234")

query = q.Query(
    select=(
        q.Select("collection_week", dtype="date", as_name="date"),
        q.Select("fips_code", dtype="str", as_name="fips"),
        q.Select("patients_hospitalized", dtype="int", as_name="value"),
    ),
    where=q.And(
        q.DateBetween(
            "collection_week",
            date(2020, 1, 1),
            date(2020, 12, 31),
        ),
        q.In("fips_code", ["04013", "04005"]),
    ),
    order_by=(
        q.Ascending("collection_week"),
        q.Ascending("fips_code"),
        q.Ascending(":id"), # (1)!
    ),
)

result = q.query_csv(resource=resource, query=query)
  1. It's important that your query returns results in a stable order; that is, if you repeated the query, the results would come back in the same order. Pagination relies on this whenever a result set spans more than one request. Including :id in the order clause is a good way to guarantee it.
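
To illustrate why a stable order matters for pagination, here is a minimal sketch that pages through an in-memory list instead of a real SODA endpoint. The names `fetch_page` and `fetch_all` are hypothetical and are not part of this module; the sort key mirrors the order-by clauses in the example above, with `id` standing in for `:id`.

```python
Row = dict[str, object]


def fetch_page(rows: list[Row], offset: int, limit: int) -> list[Row]:
    """Stand-in for one request: sort deterministically, then slice one page."""
    # The final tie-breaker on "id" plays the role of ordering by :id.
    ordered = sorted(rows, key=lambda r: (r["date"], r["fips"], r["id"]))
    return ordered[offset : offset + limit]


def fetch_all(rows: list[Row], limit: int) -> list[Row]:
    """Request pages until a short (or empty) page signals the end."""
    result: list[Row] = []
    offset = 0
    while True:
        page = fetch_page(rows, offset, limit)
        result.extend(page)
        if len(page) < limit:
            break
        offset += limit
    return result


rows = [
    {"id": 3, "date": "2020-01-05", "fips": "04013"},
    {"id": 1, "date": "2020-01-05", "fips": "04005"},
    {"id": 2, "date": "2020-01-05", "fips": "04013"},
]
all_rows = fetch_all(rows, limit=2)
```

Two of the rows share the same date and FIPS code, so without the `id` tie-breaker their relative order would be unspecified, and a row could be skipped or repeated across page boundaries.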

ColumnType module-attribute

ColumnType = Literal[
    "str", "int", "nullable_int", "float", "date", "bool"
]

Simplified set of types for SOQL result columns.
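
As a rough illustration of what these type names could mean when reading CSV results, the sketch below maps each one to a plain-Python converter. This is an assumption for illustration only: the `CONVERTERS` table and `parse_csv` helper are hypothetical, the module itself returns dataframes, and the timestamp format in the sample data is assumed.

```python
import csv
import io
from datetime import date
from typing import Callable, Literal

ColumnType = Literal["str", "int", "nullable_int", "float", "date", "bool"]

# Hypothetical converters from a CSV cell (always text) to a Python value.
CONVERTERS: dict[str, Callable[[str], object]] = {
    "str": str,
    "int": int,
    # An empty cell becomes None instead of raising ValueError.
    "nullable_int": lambda s: int(s) if s != "" else None,
    "float": float,
    # Keep only the leading YYYY-MM-DD part of a timestamp.
    "date": lambda s: date.fromisoformat(s[:10]),
    "bool": lambda s: s.strip().lower() == "true",
}


def parse_csv(
    text: str, column_types: list[tuple[str, ColumnType]]
) -> list[dict[str, object]]:
    """Parse CSV text, converting each named column by its declared type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: CONVERTERS[ctype](row[name]) for name, ctype in column_types}
        for row in reader
    ]


csv_text = (
    "date,fips,value\n"
    "2020-01-05T00:00:00.000,04013,12\n"
    "2020-01-12T00:00:00.000,04005,\n"
)
parsed = parse_csv(
    csv_text,
    [("date", "date"), ("fips", "str"), ("value", "nullable_int")],
)
```

Declaring `fips` as `"str"` preserves its leading zero, and `"nullable_int"` lets the missing value in the second row come through as `None`.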

SocrataResource dataclass

SocrataResource(domain: str, id: str)

Defines a Socrata API resource.

Parameters:

  • domain (str) –

    The domain where the API is hosted.

  • id (str) –

    The ID of the resource.

domain instance-attribute

domain: str

The domain where the API is hosted.

id instance-attribute

id: str

The ID of the resource.

url property

url: str

The URL for this resource.

metadata_url property

metadata_url: str

The URL for the metadata description of the resource (JSON format).
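
For orientation, the sketch below shows the URL shapes commonly used by SODA endpoints. Whether `SocrataResource` builds exactly these strings is an assumption; `ResourceSketch` is a hypothetical stand-in, not the library's class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSketch:
    """Hypothetical stand-in for SocrataResource."""

    domain: str
    id: str

    @property
    def url(self) -> str:
        # Assumed: the CSV form of the usual SODA resource endpoint.
        return f"https://{self.domain}/resource/{self.id}.csv"

    @property
    def metadata_url(self) -> str:
        # Assumed: the Socrata views endpoint, which serves JSON metadata.
        return f"https://{self.domain}/api/views/{self.id}.json"


res = ResourceSketch(domain="data.example.com", id="abcd-1234")
print(res.url)
print(res.metadata_url)
```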

SelectClause

Bases: Protocol

The common interface for SOQL select clauses.

result_name abstractmethod property

result_name: str

The name to use to refer to the result.

result_dtype abstractmethod property

result_dtype: ColumnType

The data type of the result.

Select dataclass

Select(
    name: str, dtype: ColumnType, as_name: str | None = None
)

Bases: SelectClause

A SOQL select clause for selecting a column as-is.

Parameters:

  • name (str) –

    Column name.

  • dtype (ColumnType) –

    The data type of the column.

  • as_name (str | None, default: None ) –

    Define a new name for the column; the 'AS' statement.

name instance-attribute

name: str

Column name.

dtype instance-attribute

dtype: ColumnType

The data type of the column.

as_name class-attribute instance-attribute

as_name: str | None = field(default=None)

Define a new name for the column; the 'AS' statement.

result_name property

result_name: str

The name to use to refer to the result.

result_dtype property

result_dtype: ColumnType

The data type of the result.

SelectExpression dataclass

SelectExpression(
    expression: str, dtype: ColumnType, as_name: str
)

Bases: SelectClause

A SOQL select clause for selecting the result of an expression.

Parameters:

  • expression (str) –

    Expression; as best practice you should escape column names in the expression by surrounding them in back-ticks.

  • dtype (ColumnType) –

    The data type of the column.

  • as_name (str) –

    Define a name for the result; the 'AS' statement.

expression instance-attribute

expression: str

Expression; as best practice you should escape column names in the expression by surrounding them in back-ticks.

dtype instance-attribute

dtype: ColumnType

The data type of the column.

as_name instance-attribute

as_name: str

Define a name for the result; the 'AS' statement.

result_name property

result_name: str

The name to use to refer to the result.

result_dtype property

result_dtype: ColumnType

The data type of the result.
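
The following sketch shows one way select clauses might be rendered into SOQL text, including the back-tick escaping recommended above. The helper names and the column names `inpatient_beds` and `inpatient_beds_used` are hypothetical; the strings the library actually produces may differ.

```python
from typing import Optional


def render_select(name: str, as_name: Optional[str] = None) -> str:
    """Render a plain column selection, with an optional AS alias."""
    column = f"`{name}`"
    return column if as_name is None else f"{column} AS `{as_name}`"


def render_select_expression(expression: str, as_name: str) -> str:
    """Render an expression selection; the caller escapes column names."""
    return f"({expression}) AS `{as_name}`"


plain = render_select("fips_code", as_name="fips")
computed = render_select_expression(
    "`inpatient_beds` - `inpatient_beds_used`",
    as_name="free_beds",
)
```

Escaping matters most for column names that collide with SOQL keywords or contain unusual characters; back-ticks mark them unambiguously as identifiers.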

WhereClause

Bases: Protocol

The common interface for SOQL where clauses.

NotNull dataclass

NotNull(column: str)

Bases: WhereClause

A where-clause for rows whose value in the given column is not null.

Parameters:

  • column (str) –

    Column name.

column instance-attribute

column: str

Column name.

Equals dataclass

Equals(column: str, value: str)

Bases: WhereClause

A where-clause for rows whose values are equal to the given value.

Parameters:

  • column (str) –

    Column name.

  • value (str) –

    The value to test.

column instance-attribute

column: str

Column name.

value instance-attribute

value: str

The value to test.

In dataclass

In(column: str, values: Iterable[str])

Bases: WhereClause

A where-clause for rows whose values are in the given set of values.

Parameters:

  • column (str) –

    Column name.

  • values (Iterable[str]) –

    The sequence of values to test.

column instance-attribute

column: str

Column name.

values instance-attribute

values: Iterable[str]

The sequence of values to test.

DateBetween dataclass

DateBetween(column: str, start: date, end: date)

Bases: WhereClause

A where-clause for rows whose date values are between two dates. Endpoints are inclusive.

Parameters:

  • column (str) –

    Column name.

  • start (date) –

    Start date.

  • end (date) –

    End date (inclusive).

column instance-attribute

column: str

Column name.

start instance-attribute

start: date

Start date.

end instance-attribute

end: date

End date (inclusive).

And dataclass

And(*clauses: WhereClause)

Bases: WhereClause

A where-clause that joins other where clauses with "and".

Parameters:

  • clauses (WhereClause, default: () ) –

    The clauses to join with an 'AND'.

clauses instance-attribute

The clauses to join with an 'AND'.

Or dataclass

Or(*clauses: WhereClause)

Bases: WhereClause

A where-clause that joins other where clauses with "or".

Parameters:

  • clauses (WhereClause, default: () ) –

    The clauses to join with an 'OR'.

clauses instance-attribute

The clauses to join with an 'OR'.

Not dataclass

Not(clause: WhereClause)

Bases: WhereClause

A where-clause that negates another where clause.

Parameters:

  • clause (WhereClause) –

    The clause to negate.

clause instance-attribute

clause: WhereClause

The clause to negate.
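
To make the composition of where clauses concrete, here is a minimal sketch of how nested clauses could be rendered into a single SOQL condition. The `*Sketch` classes and the `render` method are hypothetical stand-ins; the exact text the library generates, and how it quotes values and handles the inclusive end date, are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Protocol


class WhereSketch(Protocol):
    def render(self) -> str: ...


def quote(value: str) -> str:
    """Quote a string literal, doubling embedded single quotes (assumed rule)."""
    return "'" + value.replace("'", "''") + "'"


@dataclass(frozen=True)
class InSketch:
    column: str
    values: tuple[str, ...]

    def render(self) -> str:
        items = ", ".join(quote(v) for v in self.values)
        return f"`{self.column}` IN ({items})"


@dataclass(frozen=True)
class DateBetweenSketch:
    column: str
    start: date
    end: date

    def render(self) -> str:
        # Cover the whole end date so that both endpoints are included.
        low = quote(f"{self.start.isoformat()}T00:00:00")
        high = quote(f"{self.end.isoformat()}T23:59:59")
        return f"`{self.column}` BETWEEN {low} AND {high}"


@dataclass(frozen=True)
class AndSketch:
    clauses: tuple[WhereSketch, ...]

    def render(self) -> str:
        # Parenthesize each part so nested AND/OR keep their grouping.
        return " AND ".join(f"({c.render()})" for c in self.clauses)


@dataclass(frozen=True)
class NotSketch:
    clause: WhereSketch

    def render(self) -> str:
        return f"NOT ({self.clause.render()})"


where = AndSketch(
    (
        DateBetweenSketch("collection_week", date(2020, 1, 1), date(2020, 12, 31)),
        InSketch("fips_code", ("04013", "04005")),
    )
)
print(where.render())
print(NotSketch(InSketch("fips_code", ("04013",))).render())
```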

OrderClause

Bases: Protocol

The common interface for SOQL order-by clauses.

Ascending dataclass

Ascending(column: str)

Bases: OrderClause

An order-by-clause that sorts the named column in ascending order.

Parameters:

  • column (str) –

    The column name.

column instance-attribute

column: str

The column name.

Descending dataclass

Descending(column: str)

Bases: OrderClause

An order-by-clause that sorts the named column in descending order.

Parameters:

  • column (str) –

    The column name.

column instance-attribute

column: str

The column name.

Query dataclass

Query(
    select: Sequence[SelectClause],
    where: WhereClause,
    order_by: Sequence[OrderClause] | None = None,
)

A complete SOQL query, which includes one or more select clauses, exactly one where clause, and any number of order-by clauses.

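Parameters:

  • select (Sequence[SelectClause]) –

    One or more select clauses.

  • where (WhereClause) –

    A where clause.

  • order_by (Sequence[OrderClause] | None, default: None ) –

    Zero or more order-by clauses.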

select instance-attribute

select: Sequence[SelectClause]

One or more select clauses.

where instance-attribute

where: WhereClause

A where clause.

order_by class-attribute instance-attribute

order_by: Sequence[OrderClause] | None = field(default=None)

Zero or more order-by clauses.
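
As a sketch of how the three parts of a query could travel in a request, the example below assembles them into SODA's `$select`, `$where`, `$order`, `$limit`, and `$offset` URL parameters. `build_url` is a hypothetical helper, and it is an assumption that the library encodes queries this way; it may instead send a single `$query` string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(
    base_url: str,
    select: list[str],
    where: str,
    order_by: list[str],
    limit: int,
    offset: int,
) -> str:
    """Combine pre-rendered SOQL fragments into one request URL."""
    params: dict[str, object] = {
        "$select": ", ".join(select),
        "$where": where,
        "$limit": limit,
        "$offset": offset,
    }
    if order_by:
        params["$order"] = ", ".join(order_by)
    # urlencode percent-encodes the fragments so they are safe in a URL.
    return f"{base_url}?{urlencode(params)}"


url = build_url(
    "https://data.example.com/resource/abcd-1234.csv",
    select=["collection_week AS date", "fips_code AS fips"],
    where="fips_code IN ('04013', '04005')",
    order_by=["collection_week ASC", ":id ASC"],
    limit=10000,
    offset=0,
)
# Decode the query string again to inspect what would be sent.
decoded = parse_qs(urlsplit(url).query)
```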

get_metadata

get_metadata(resource: SocrataResource) -> Any

Retrieve the JSON-formatted metadata for the given resource.

Parameters:

  • resource (SocrataResource) –

    The data resource.

Returns:

  • Any

    The JSON metadata.

query_csv

query_csv(
    resource: SocrataResource,
    query: Query,
    *,
    limit: int = 10000,
    result_filter: Callable[[DataFrame], DataFrame]
    | None = None,
    api_token: str | None = None,
) -> DataFrame

Issue query against the data in resource and return the result as a dataframe. For particularly large result sets, we may need to break up the total query into more than one request. This is handled automatically, as is concatenating the results.

Parameters:

  • resource (SocrataResource) –

    The data resource.

  • query (Query) –

    The query object.

  • limit (int, default: 10000 ) –

    The maximum number of rows fetched per request. Changing this should not change the results, but may change the number of requests (and the clock time) it takes to produce the results.

  • result_filter (Callable[[DataFrame], DataFrame] | None, default: None ) –

    An optional transform to apply to the results, intended to be used to filter out rows from the result in situations where doing this filtering as a where-clause would be impossible or impractical. This function will be run on the results of each request, not the end result, since it is likely to be more efficient to filter as we go when the data requires many requests to complete.

  • api_token (str | None, default: None ) –

    An optional API token to include in requests.

Returns:

  • DataFrame

    The query result as a dataframe.
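
Because result_filter runs on the results of each request rather than on the combined result, it behaves as expected only for filters that judge each row on its own. The sketch below illustrates the difference using plain lists of dicts in place of dataframes; `paged`, `collect`, and both filter functions are hypothetical.

```python
from typing import Callable, Optional

Rows = list[dict[str, int]]


def paged(rows: Rows, limit: int) -> list[Rows]:
    """Split rows into pages of at most `limit` rows, one per request."""
    return [rows[i : i + limit] for i in range(0, len(rows), limit)]


def collect(
    rows: Rows,
    limit: int,
    result_filter: Optional[Callable[[Rows], Rows]] = None,
) -> Rows:
    """Apply the filter to every page, then concatenate the pages."""
    result: Rows = []
    for page in paged(rows, limit):
        result.extend(result_filter(page) if result_filter else page)
    return result


def positive_only(page: Rows) -> Rows:
    # Row-wise filter: the outcome does not depend on page boundaries.
    return [r for r in page if r["value"] > 0]


def top_row_only(page: Rows) -> Rows:
    # Whole-set filter: keeps one row per page, not one row overall.
    return [max(page, key=lambda r: r["value"])] if page else []


rows = [{"value": v} for v in (5, 0, 7, 3)]
row_wise = collect(rows, limit=2, result_filter=positive_only)
per_page_max = collect(rows, limit=2, result_filter=top_row_only)
```

With a page size of 2, `top_row_only` keeps the rows with values 5 and 7 rather than the single overall maximum, so a filter of that kind is better applied to the returned dataframe afterwards.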

query_csv_soql

query_csv_soql(
    resource: SocrataResource,
    soql: str,
    column_types: Sequence[tuple[str, ColumnType]],
    *,
    limit: int = 10000,
    result_filter: Callable[[DataFrame], DataFrame]
    | None = None,
    api_token: str | None = None,
) -> DataFrame

Issue the query (given in string form, soql) against the data in resource and return the result as a dataframe. For particularly large result sets, we may need to break up the total query into more than one request. This is handled automatically, as is concatenating the results.

Parameters:

  • resource (SocrataResource) –

    The data resource.

  • soql (str) –

    The query string.

  • column_types (Sequence[tuple[str, ColumnType]]) –

    The name and data type of each result column.

  • limit (int, default: 10000 ) –

    The maximum number of rows fetched per request. Changing this should not change the results, but may change the number of requests (and the clock time) it takes to produce the results.

  • result_filter (Callable[[DataFrame], DataFrame] | None, default: None ) –

    An optional transform to apply to the results, intended to be used to filter out rows from the result in situations where doing this filtering as a where-clause would be impossible or impractical. This function will be run on the results of each request, not the end result, since it is likely to be more efficient to filter as we go when the data requires many requests to complete.

  • api_token (str | None, default: None ) –

    An optional API token to include in requests.

Returns:

  • DataFrame

    The query result as a dataframe.