shillelagh.adapters.file package¶
Submodules¶
shillelagh.adapters.file.csvfile module¶
An adapter for CSV files.
This adapter treats a CSV file as a table, allowing rows to be inserted,
deleted, and updated. It’s not very practical since it requires the data
to be written with the QUOTE_NONNUMERIC
format option, with strings
explicitly quoted. It’s also not very efficient, since it implements the
filtering and sorting in Python, instead of relying on the backend.
Remote files (HTTP/HTTPS) are also supported in read-only mode.
- class shillelagh.adapters.file.csvfile.CSVFile(path_or_uri: str)[source]¶
Bases:
Adapter
An adapter for CSV files.
The files must be written with the QUOTE_NONNUMERIC format option, with strings explicitly quoted:

"index","temperature","site"
10.0,15.2,"Diamond_St"
11.0,13.1,"Blacktail_Loop"
12.0,13.3,"Platinum_St"
13.0,12.1,"Kodiak_Trail"
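A file in this format can be produced with the standard library's csv module; a minimal sketch (the file path and column values are arbitrary examples):

```python
import csv
import os
import tempfile

# Hypothetical location for the example file; any path works.
path = os.path.join(tempfile.mkdtemp(), "weather.csv")

# Write with QUOTE_NONNUMERIC so every string is explicitly quoted,
# which is the format the CSV adapter requires.
with open(path, "w", newline="") as fp:
    writer = csv.writer(fp, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(["index", "temperature", "site"])
    writer.writerow([10.0, 15.2, "Diamond_St"])
    writer.writerow([11.0, 13.1, "Blacktail_Loop"])

# Reading back with the same option parses unquoted fields as floats,
# while quoted fields stay strings.
with open(path, newline="") as fp:
    rows = list(csv.reader(fp, quoting=csv.QUOTE_NONNUMERIC))
```

This round trip is why the quoting matters: without explicit quotes there is no way to tell a numeric column from a string column when scanning the file.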
The adapter will first scan the whole file to determine the number of rows, as well as the type and order of each column.
The adapter has no index. When data is SELECTed the adapter will stream over all the rows in the file, filtering them on the fly. If a specific order is requested, the resulting rows will be loaded into memory so they can be sorted.
Inserted rows are appended to the end of the file. Deleted rows simply have their row ID marked as deleted (-1), and are ignored when the data is scanned for results. When the adapter is closed, deleted rows are garbage collected.
Updates are handled as a delete followed by an insert.
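The tombstone-style deletion described above can be sketched in isolation. This is an illustrative model over an in-memory list, not the adapter's actual file-based implementation; the class and constant names are made up:

```python
DELETED = -1  # sentinel row ID marking a deleted row


class TinyTable:
    """Illustrative tombstone storage: deletes only mark rows."""

    def __init__(self, rows):
        # each entry is [row_id, row]; row IDs start at 0
        self.entries = [[i, row] for i, row in enumerate(rows)]

    def scan(self):
        # deleted rows are skipped when data is read
        return [row for row_id, row in self.entries if row_id != DELETED]

    def delete(self, row_id):
        for entry in self.entries:
            if entry[0] == row_id:
                entry[0] = DELETED

    def insert(self, row):
        # appended at the end, like appending to the CSV file
        new_id = max((e[0] for e in self.entries), default=-1) + 1
        self.entries.append([new_id, row])
        return new_id

    def update(self, row_id, row):
        # an update is a delete followed by an insert
        self.delete(row_id)
        return self.insert(row)

    def close(self):
        # garbage-collect tombstones, like rewriting the file on close
        self.entries = [e for e in self.entries if e[0] != DELETED]


table = TinyTable([{"site": "Diamond_St"}, {"site": "Blacktail_Loop"}])
table.delete(0)   # row stays in storage, but is invisible to scans
table.close()     # now the tombstoned row is physically removed
```

Deferring the physical removal to close() keeps deletes cheap: the file is only rewritten once, not on every DELETE statement.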
- close() None [source]¶
Garbage collect the file.
This method will get rid of deleted rows in the file.
- get_columns() Dict[str, Field] [source]¶
Return the columns available in the table.
This method is called for every query, so make sure it’s cheap. For most (all?) tables this won’t change, so you can store it in an instance attribute.
- get_cost(filtered_columns: List[Tuple[str, Operator]], order: List[Tuple[str, Literal[Order.ASCENDING] | Literal[Order.DESCENDING]]]) float [source]¶
Estimate the query cost.
The base adapter returns a fixed cost; custom adapters can implement their own cost estimation.
- get_data(bounds: Dict[str, Filter], order: List[Tuple[str, Literal[Order.ASCENDING] | Literal[Order.DESCENDING]]], limit: int | None = None, offset: int | None = None, **kwargs: Any) Iterator[Dict[str, Any]] [source]¶
Yield rows as adapter-specific types.
This method expects rows to be in the storage format. E.g., for the CSV adapter datetime columns would be stored (and yielded) as strings. The get_rows method will use the adapter fields to convert these values into native Python types (in this case, a proper datetime.datetime).
Missing values (NULLs) may be omitted from the dictionary; they will be replaced by None by the backend.
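The division of labor can be sketched as follows. The get_rows function here is a simplified stand-in for the backend's conversion step, not shillelagh's actual API; the column names and values are invented for illustration:

```python
from datetime import datetime

# What get_data yields: rows in storage format. Datetimes are still
# strings, and missing values (NULLs) may be absent from the dict.
storage_rows = [
    {"ts": "2023-01-01T12:00:00", "site": "Diamond_St"},
    {"site": "Blacktail_Loop"},  # "ts" omitted -> NULL
]


def get_rows(rows):
    # What the backend layers on top: convert storage values to
    # native Python types and backfill missing columns with None.
    for row in rows:
        yield {
            "ts": datetime.fromisoformat(row["ts"]) if "ts" in row else None,
            "site": row.get("site"),
        }


native = list(get_rows(storage_rows))
```

Keeping get_data in storage format means adapters never need to know about Python-native types; all conversion is centralized in the field classes.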
- insert_data(row: Dict[str, Any]) int [source]¶
Insert a single row with adapter-specific types.
The row will be formatted according to the adapter fields. E.g., if an adapter represents timestamps as ISO strings, then timestamp values will be passed as ISO strings.
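The serialization step that happens before insert_data is called can be sketched as below; to_storage is a hypothetical helper, not part of shillelagh:

```python
from datetime import datetime


def to_storage(row):
    # Serialize native values to the adapter's storage format;
    # for timestamps that means ISO strings (illustrative rule).
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in row.items()
    }


row = {"ts": datetime(2023, 1, 1, 12, 0), "site": "Diamond_St"}
storage_row = to_storage(row)
```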
- static parse_uri(uri: str) Tuple[str] [source]¶
Parse the table name, and return the arguments needed to instantiate the adapter.
- safe = False¶
- static supports(uri: str, fast: bool = True, **kwargs: Any) bool | None [source]¶
Return whether a given table is supported by the adapter.
The discovery is done in 2 passes. First, all adapters have their methods called with fast=True. On the first pass adapters should implement a cheap check, without any network calls.
If no adapter returns True, a second pass is made with fast=False, using only adapters that returned None on the first pass. In this second pass adapters can perform network requests to get more information about the URI.
The method receives the table URI, as well as the adapter connection arguments, e.g.:
>>> from shillelagh.backends.apsw.db import connect
>>> connection = connect(
...     ':memory:',
...     adapter_kwargs={"gsheetsapi": {"catalog":
...         {"table": "https://docs.google.com/spreadsheets/d/1/"}}},
... )
This would call all adapters in order to find which one should handle the table table. The GSheets adapter would be called with:

>>> from shillelagh.adapters.api.gsheets.adapter import GSheetsAPI
>>> GSheetsAPI.supports("table", fast=True,  # first pass
...     catalog={"table": "https://docs.google.com/spreadsheets/d/1"})
True
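The two-pass discovery loop can be sketched generically. The adapter classes and find_adapter function below are hypothetical illustrations, not shillelagh's registry code:

```python
class LocalAdapter:
    """Hypothetical adapter: can decide support cheaply."""

    @staticmethod
    def supports(uri, fast=True, **kwargs):
        return uri.endswith(".csv")  # no network needed


class RemoteAdapter:
    """Hypothetical adapter: needs a network call to be sure."""

    @staticmethod
    def supports(uri, fast=True, **kwargs):
        if fast:
            return None  # "maybe": defer to the second pass
        return uri.startswith("https://")  # pretend this hits the network


def find_adapter(uri, adapters, **kwargs):
    # First pass: cheap checks only (fast=True).
    undecided = []
    for adapter in adapters:
        result = adapter.supports(uri, fast=True, **kwargs)
        if result:
            return adapter
        if result is None:
            undecided.append(adapter)
    # Second pass: only adapters that answered "maybe" (None),
    # now allowed to do expensive work (fast=False).
    for adapter in undecided:
        if adapter.supports(uri, fast=False, **kwargs):
            return adapter
    return None


adapters = [LocalAdapter, RemoteAdapter]
```

Returning None rather than False on the first pass is what opts an adapter into the slower second pass.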
- supports_limit = True¶
- supports_offset = True¶