epymorph.adrio.soda
Methods for querying SODA APIs. See: https://dev.socrata.com/
Examples:
If we want to load data from a hypothetical SODA data source at data.example.com with ID abcd-1234, we could construct and execute a query as follows using the classes and functions of this module:
- It's important that your query returns results in a stable order; that is, if you repeated the query, the results would come back in the same order. This is needed so that pagination works, if required. Including :id in the order clause is a good way to guarantee this.
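To see why a stable order matters for pagination, here is a minimal, self-contained sketch that simulates offset-based paging in plain Python. It does not call this module or a real SODA endpoint: `fetch_page`, the row data, and the shuffle that stands in for an unspecified server-side tie order are all assumptions made for illustration.

```python
import random

# Hypothetical rows: (id, state) pairs. Many rows share a state, so ordering
# by state alone leaves ties unresolved.
ROWS = [(i, "AZ" if i % 2 else "NM") for i in range(10)]


def fetch_page(order_key, offset, limit, seed):
    """Simulate one paged request against a server that breaks ties arbitrarily."""
    rows = ROWS[:]
    random.Random(seed).shuffle(rows)  # arbitrary tie order, differs per request
    rows.sort(key=order_key)           # stable sort keeps the shuffled order within ties
    return rows[offset : offset + limit]


def fetch_all(order_key, limit=3):
    """Page through all rows; each request may see a different tie order."""
    result, offset, seed = [], 0, 0
    while True:
        page = fetch_page(order_key, offset, limit, seed)
        if not page:
            return result
        result.extend(page)
        offset += limit
        seed += 1


# Ordering by state and then id is fully deterministic: every row appears once.
stable = fetch_all(lambda r: (r[1], r[0]))

# Ordering by state alone may duplicate some rows and drop others across pages.
unstable = fetch_all(lambda r: r[1])
```

Adding a unique column as the final sort key plays the same role here that :id plays in a real order clause: it removes ties, so consecutive pages never overlap.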
ColumnType
module-attribute
ColumnType = Literal[
"str", "int", "nullable_int", "float", "date", "bool"
]
Simplified set of types for SOQL result columns.
SocrataResource
dataclass
Defines a Socrata API resource.
Parameters:
SelectClause
Select
dataclass
Select(
name: str, dtype: ColumnType, as_name: str | None = None
)
Bases: SelectClause
A SOQL select clause for selecting a column as-is.
Parameters:
- name (str) – Column name.
- dtype (ColumnType) – The data type of the column.
- as_name (str | None, default: None) – Define a new name for the column; the 'AS' statement.
as_name
class-attribute
instance-attribute
Define a new name for the column; the 'AS' statement.
SelectExpression
dataclass
SelectExpression(
expression: str, dtype: ColumnType, as_name: str
)
Bases: SelectClause
A SOQL select clause for selecting the result of an expression.
Parameters:
- expression (str) – Expression; as best practice you should escape column names in the expression by surrounding them in back-ticks.
- dtype (ColumnType) – The data type of the column.
- as_name (str) – Define a name for the result; the 'AS' statement.
expression
instance-attribute
expression: str
Expression; as best practice you should escape column names in the expression by surrounding them in back-ticks.
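As an illustration of the back-tick advice, the following sketch builds an expression string with escaped column names. The helper `escape_column` and the column names are hypothetical and not part of this module.

```python
def escape_column(name: str) -> str:
    """Wrap a column name in back-ticks (hypothetical helper, not part of this module)."""
    if "`" in name:
        raise ValueError(f"column name may not contain a back-tick: {name!r}")
    return f"`{name}`"


# Build an expression string; the column names are made up for illustration.
expression = f"{escape_column('cases_new')} + {escape_column('cases_probable')}"
```

A string built this way could then be passed as the expression argument of SelectExpression, together with a dtype and an as_name.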
WhereClause
Bases: Protocol
The common interface for SOQL where clauses.
NotNull
dataclass
NotNull(column: str)
Bases: WhereClause
A where-clause for rows in which the named column is not null.
Parameters:
- column (str) – Column name.
Equals
dataclass
In
dataclass
DateBetween
dataclass
Bases: WhereClause
A where-clause for rows whose date values are between two dates. Endpoints are inclusive.
Parameters:
And
dataclass
And(*clauses: WhereClause)
Bases: WhereClause
A where-clause that joins other where clauses with "and".
Parameters:
- clauses (WhereClause, default: ()) – The clauses to join with an 'AND'.
Or
dataclass
Or(*clauses: WhereClause)
Bases: WhereClause
A where-clause that joins other where clauses with "or".
Parameters:
- clauses (WhereClause, default: ()) – The clauses to join with an 'OR'.
Not
dataclass
Not(clause: WhereClause)
Bases: WhereClause
A where-clause that negates another where clause.
Parameters:
- clause (WhereClause) – The clause to negate.
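The where-clause classes compose as a tree: And and Or take any number of child clauses, and Not wraps exactly one. The sketch below uses small stand-in classes with the same constructor shapes to show how such a tree could be turned into a SOQL condition string. The stand-ins, the `render` method, and the exact output text are assumptions for illustration; the module's real classes may format clauses differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotNull:
    """Stand-in for a not-null clause (illustrative only)."""
    column: str

    def render(self) -> str:
        return f"`{self.column}` IS NOT NULL"


class And:
    """Stand-in that joins child clauses with AND."""
    def __init__(self, *clauses):
        self.clauses = clauses

    def render(self) -> str:
        # Parenthesize each child so nesting never changes precedence.
        return " AND ".join(f"({c.render()})" for c in self.clauses)


class Or:
    """Stand-in that joins child clauses with OR."""
    def __init__(self, *clauses):
        self.clauses = clauses

    def render(self) -> str:
        return " OR ".join(f"({c.render()})" for c in self.clauses)


@dataclass(frozen=True)
class Not:
    """Stand-in that negates a single child clause."""
    clause: object

    def render(self) -> str:
        return f"NOT ({self.clause.render()})"


# "date is present, and it is not the case that a or b is present"
where = And(NotNull("date"), Not(Or(NotNull("a"), NotNull("b"))))
soql_where = where.render()
```

Parenthesizing every child is a conservative choice: it produces more brackets than strictly necessary but keeps the rendered condition unambiguous however deeply the clauses are nested.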
OrderClause
Bases: Protocol
The common interface for SOQL order-by clauses.
Ascending
dataclass
Ascending(column: str)
Bases: OrderClause
An order-by-clause that sorts the named column in ascending order.
Parameters:
- column (str) – The column name.
Descending
dataclass
Descending(column: str)
Bases: OrderClause
An order-by-clause that sorts the named column in descending order.
Parameters:
- column (str) – The column name.
Query
dataclass
Query(
select: Sequence[SelectClause],
where: WhereClause,
order_by: Sequence[OrderClause] | None = None,
)
A complete SOQL query, which includes one or more select clauses, exactly one where clause, and any number of order-by clauses.
Parameters:
- select (Sequence[SelectClause]) – One or more select clauses.
- where (WhereClause) – A where clause.
- order_by (Sequence[OrderClause] | None, default: None) – Zero or more order-by clauses.
order_by
class-attribute
instance-attribute
order_by: Sequence[OrderClause] | None = field(default=None)
Zero or more order-by clauses.
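For orientation, the three parts of a query correspond to the `$select`, `$where`, and `$order` parameters of a SODA request, with `$limit` and `$offset` controlling paging. The sketch below assembles such a request URL with the standard library only. The function `build_url`, its arguments, and the sample column names are hypothetical, and this reference does not show how the module itself builds its requests.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(domain, resource_id, select, where, order_by=None, limit=10000, offset=0):
    """Assemble a SODA-style CSV request URL (illustrative helper, not this module's code)."""
    params = {"$select": ", ".join(select), "$where": where}
    if order_by:
        params["$order"] = ", ".join(order_by)
    params["$limit"] = str(limit)
    params["$offset"] = str(offset)
    # urlencode percent-encodes the parameter names and values.
    return f"https://{domain}/resource/{resource_id}.csv?{urlencode(params)}"


url = build_url(
    "data.example.com",
    "abcd-1234",
    select=["`date`", "`cases` AS `n`"],
    where="`cases` IS NOT NULL",
    order_by=["`date` ASC", ":id"],
    limit=500,
)

# Decode the query string again to inspect what would be sent.
decoded = parse_qs(urlsplit(url).query)
```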
get_metadata
get_metadata(resource: SocrataResource) -> Any
Retrieve the JSON-formatted metadata for the given resource.
Parameters:
- resource (SocrataResource) – The data resource.
Returns:
- Any – The JSON metadata.
query_csv
query_csv(
resource: SocrataResource,
query: Query,
*,
limit: int = 10000,
result_filter: Callable[[DataFrame], DataFrame]
| None = None,
api_token: str | None = None,
) -> DataFrame
Issue query against the data in resource and return the result as a dataframe.
For particularly large result sets, we may need to break up the total query into
more than one request. This is handled automatically, as is concatenating the
results.
Parameters:
- resource (SocrataResource) – The data resource.
- query (Query) – The query object.
- limit (int, default: 10000) – The maximum number of rows fetched per request. Changing this should not change the results, but may change the number of requests (and the clock time) it takes to produce the results.
- result_filter (Callable[[DataFrame], DataFrame] | None, default: None) – An optional transform to apply to the results, intended to be used to filter out rows from the result in situations where doing this filtering as a where-clause would be impossible or impractical. This function will be run on the results of each request, not the end result, since it is likely to be more efficient to filter as we go when the data requires many requests to complete.
- api_token (str | None, default: None) – An optional API token to include in requests.
Returns:
- DataFrame – The query results.
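The interaction between limit and result_filter can be sketched without any network access. In the simplified model below, plain lists of dicts stand in for dataframes, `fetch_page` stands in for a single request, and the loop stops once a page comes back shorter than the limit. All names and the stopping rule are assumptions for illustration, not this module's implementation.

```python
# Hypothetical remote table of 25 rows.
DATA = [{"id": i, "value": i * i} for i in range(25)]


def fetch_page(offset, limit):
    """Stand-in for one request returning at most `limit` rows."""
    return DATA[offset : offset + limit]


def query_all(limit, result_filter=None):
    """Fetch every page, filtering each page as it arrives, and concatenate."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        fetched = len(page)  # count rows before filtering
        if result_filter is not None:
            page = result_filter(page)
        results.extend(page)
        if fetched < limit:  # a short page means there is nothing left
            return results
        offset += limit


def keep_even(rows):
    """Example row filter: keep rows whose value is even."""
    return [r for r in rows if r["value"] % 2 == 0]


small_pages = query_all(limit=4, result_filter=keep_even)   # seven requests
single_page = query_all(limit=1000, result_filter=keep_even)  # one request
```

Because the filter here looks at each row independently, applying it per page gives the same rows as applying it once at the end. A filter that depends on the whole result set, such as removing duplicates across pages, would not have that property.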
query_csv_soql
query_csv_soql(
resource: SocrataResource,
soql: str,
column_types: Sequence[tuple[str, ColumnType]],
*,
limit: int = 10000,
result_filter: Callable[[DataFrame], DataFrame]
| None = None,
api_token: str | None = None,
) -> DataFrame
Issue the query (given in string form, soql) against the data in resource and return the result as a dataframe. For particularly large result sets, we may need to break up the total query into more than one request. This is handled automatically, as is concatenating the results.
Parameters:
- resource (SocrataResource) – The data resource.
- soql (str) – The query string.
- limit (int, default: 10000) – The maximum number of rows fetched per request. Changing this should not change the results, but may change the number of requests (and the clock time) it takes to produce the results.
- result_filter (Callable[[DataFrame], DataFrame] | None, default: None) – An optional transform to apply to the results, intended to be used to filter out rows from the result in situations where doing this filtering as a where-clause would be impossible or impractical. This function will be run on the results of each request, not the end result, since it is likely to be more efficient to filter as we go when the data requires many requests to complete.
- api_token (str | None, default: None) – An optional API token to include in requests.
Returns:
- DataFrame – The query results.
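Because a raw SOQL string carries no type information, the column_types argument pairs each result column with a ColumnType. The sketch below shows one way such pairs could drive the parsing of CSV text, using only the standard library and plain dicts rather than a dataframe. The converter table, the `parse_csv` helper, and the sample data are all assumptions; this reference does not describe how the module actually applies the types.

```python
import csv
import io
from datetime import date

# Hypothetical converters, one per ColumnType name.
CONVERTERS = {
    "str": str,
    "int": int,
    "nullable_int": lambda s: int(s) if s != "" else None,
    "float": float,
    "date": lambda s: date.fromisoformat(s[:10]),  # keep only the YYYY-MM-DD part
    "bool": lambda s: s.strip().lower() == "true",
}


def parse_csv(text, column_types):
    """Parse CSV text into typed rows, one dict per row (illustrative only)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: CONVERTERS[ctype](row[name]) for name, ctype in column_types}
        for row in reader
    ]


# Made-up response body with one missing integer value.
CSV_TEXT = (
    "date,state,cases\n"
    "2021-01-01T00:00:00.000,AZ,10\n"
    "2021-01-02T00:00:00.000,AZ,\n"
)

rows = parse_csv(
    CSV_TEXT,
    [("date", "date"), ("state", "str"), ("cases", "nullable_int")],
)
```

The "nullable_int" entry is the interesting case: an empty field cannot be parsed as an integer, so it is mapped to a missing value instead of raising an error.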