shillelagh.adapters.file package¶
Submodules¶
shillelagh.adapters.file.csvfile module¶
An adapter for CSV files.
This adapter treats a CSV file as a table, allowing rows to be inserted,
deleted, and updated. It’s not very practical since it requires the data
to be written with the QUOTE_NONNUMERIC
format option, with strings
explicitly quoted. It’s also not very efficient, since it implements the
filtering and sorting in Python, instead of relying on the backend.
Remote files (HTTP/HTTPS) are also supported in read-only mode.
- class shillelagh.adapters.file.csvfile.CSVFile(path_or_uri: str)[source]¶
Bases:
Adapter
An adapter for CSV files.
The files must be written with the QUOTE_NONNUMERIC format option, with strings explicitly quoted:

"index","temperature","site"
10.0,15.2,"Diamond_St"
11.0,13.1,"Blacktail_Loop"
12.0,13.3,"Platinum_St"
13.0,12.1,"Kodiak_Trail"
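A file in this format can be produced with the standard library's csv module; a minimal sketch (the file path and column values are arbitrary examples):

```python
import csv
import os
import tempfile

# Hypothetical location for the example file; any path works.
path = os.path.join(tempfile.mkdtemp(), "weather.csv")

# Write with QUOTE_NONNUMERIC so every string is explicitly quoted,
# which is the format the CSV adapter requires.
with open(path, "w", newline="") as fp:
    writer = csv.writer(fp, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(["index", "temperature", "site"])
    writer.writerow([10.0, 15.2, "Diamond_St"])
    writer.writerow([11.0, 13.1, "Blacktail_Loop"])

# Reading back with the same option parses unquoted fields as floats,
# while quoted fields stay strings.
with open(path, newline="") as fp:
    rows = list(csv.reader(fp, quoting=csv.QUOTE_NONNUMERIC))
```

This round trip is why the quoting matters: without explicit quotes there is no way to tell a numeric column from a string column when scanning the file.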
The adapter will first scan the whole file to determine the number of rows, as well as the type and order of each column.
The adapter has no index. When data is SELECTed the adapter will stream over all the rows in the file, filtering them on the fly. If a specific order is requested, the resulting rows will be loaded into memory so they can be sorted.
Inserted rows are appended to the end of the file. Deleted rows simply have their row ID marked as deleted (-1), and are ignored when the data is scanned for results. When the adapter is closed, deleted rows are garbage collected.
Updates are handled as a delete followed by an insert.
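The tombstone-style deletion described above can be sketched in isolation. This is an illustrative model over an in-memory list, not the adapter's actual file-based implementation; the class and constant names are made up:

```python
DELETED = -1  # sentinel row ID marking a deleted row


class TinyTable:
    """Illustrative tombstone storage: deletes only mark rows."""

    def __init__(self, rows):
        # each entry is [row_id, row]; row IDs start at 0
        self.entries = [[i, row] for i, row in enumerate(rows)]

    def scan(self):
        # deleted rows are skipped when data is read
        return [row for row_id, row in self.entries if row_id != DELETED]

    def delete(self, row_id):
        for entry in self.entries:
            if entry[0] == row_id:
                entry[0] = DELETED

    def insert(self, row):
        # appended at the end, like appending to the CSV file
        new_id = max((e[0] for e in self.entries), default=-1) + 1
        self.entries.append([new_id, row])
        return new_id

    def update(self, row_id, row):
        # an update is a delete followed by an insert
        self.delete(row_id)
        return self.insert(row)

    def close(self):
        # garbage-collect tombstones, like rewriting the file on close
        self.entries = [e for e in self.entries if e[0] != DELETED]


table = TinyTable([{"site": "Diamond_St"}, {"site": "Blacktail_Loop"}])
table.delete(0)   # row stays in storage, but is invisible to scans
table.close()     # now the tombstoned row is physically removed
```

Deferring the physical removal to close() keeps deletes cheap: the file is only rewritten once, not on every DELETE statement.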
- close() None [source]¶
Garbage collect the file.
This method will get rid of deleted rows in the file.
- get_columns() Dict[str, Field] [source]¶
Return the columns available in the table.
This method is called for every query, so make sure it’s cheap. For most (all?) tables this won’t change, so you can store it in an instance attribute.
- get_cost(filtered_columns: List[Tuple[str, Operator]], order: List[Tuple[str, Literal[Order.ASCENDING] | Literal[Order.DESCENDING]]]) float [source]¶
Estimate the query cost.
The base adapter returns a fixed cost; custom adapters can implement their own cost estimation.
- get_data(bounds: Dict[str, Filter], order: List[Tuple[str, Literal[Order.ASCENDING] | Literal[Order.DESCENDING]]], limit: int | None = None, offset: int | None = None, **kwargs: Any) Iterator[Dict[str, Any]] [source]¶
Yield rows as adapter-specific types.
This method expects rows to be in the storage format. E.g., for the CSV adapter datetime columns would be stored (and yielded) as strings. The get_rows method will use the adapter fields to convert these values into native Python types (in this case, a proper datetime.datetime).
Missing values (NULLs) may be omitted from the dictionary; they will be replaced by None by the backend.
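The division of labor can be sketched as follows. The get_rows function here is a simplified stand-in for the backend's conversion step, not shillelagh's actual API; the column names and values are invented for illustration:

```python
from datetime import datetime

# What get_data yields: rows in storage format. Datetimes are still
# strings, and missing values (NULLs) may be absent from the dict.
storage_rows = [
    {"ts": "2023-01-01T12:00:00", "site": "Diamond_St"},
    {"site": "Blacktail_Loop"},  # "ts" omitted -> NULL
]


def get_rows(rows):
    # What the backend layers on top: convert storage values to
    # native Python types and backfill missing columns with None.
    for row in rows:
        yield {
            "ts": datetime.fromisoformat(row["ts"]) if "ts" in row else None,
            "site": row.get("site"),
        }


native = list(get_rows(storage_rows))
```

Keeping get_data in storage format means adapters never need to know about Python-native types; all conversion is centralized in the field classes.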
- insert_data(row: Dict[str, Any]) int [source]¶
Insert a single row with adapter-specific types.
The row will be formatted according to the adapter fields. E.g., if an adapter represents timestamps as ISO strings, then timestamp values will be passed as ISO strings.
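The serialization step that happens before insert_data is called can be sketched as below; to_storage is a hypothetical helper, not part of shillelagh:

```python
from datetime import datetime


def to_storage(row):
    # Serialize native values to the adapter's storage format;
    # for timestamps that means ISO strings (illustrative rule).
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in row.items()
    }


row = {"ts": datetime(2023, 1, 1, 12, 0), "site": "Diamond_St"}
storage_row = to_storage(row)
```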
- static parse_uri(uri: str) Tuple[str] [source]¶
Parse the table name, and return the arguments needed to instantiate the adapter.
- safe = False¶
- static supports(uri: str, fast: bool = True, **kwargs: Any) bool | None [source]¶
Return whether a given table is supported by the adapter.
The discovery is done in 2 passes. First, all adapters have their methods called with fast=True. On the first pass adapters should implement a cheap check, without any network calls.
If no adapter returns True, a second pass is made with fast=False, using only adapters that returned None on the first pass. In this second pass adapters can perform network requests to get more information about the URI.
The method receives the table URI, as well as the adapter connection arguments, e.g.:
>>> from shillelagh.backends.apsw.db import connect
>>> connection = connect(
...     ':memory:',
...     adapter_kwargs={"gsheetsapi": {"catalog":
...         {"table": "https://docs.google.com/spreadsheets/d/1/"}}},
... )
This would call all adapters in order to find which one should handle the table table. The GSheets adapter would be called with:

>>> from shillelagh.adapters.api.gsheets.adapter import GSheetsAPI
>>> GSheetsAPI.supports("table", fast=True,  # first pass
...     catalog={"table": "https://docs.google.com/spreadsheets/d/1"})
True
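The two-pass discovery loop can be sketched generically. The adapter classes and find_adapter function below are hypothetical illustrations, not shillelagh's registry code:

```python
class LocalAdapter:
    """Hypothetical adapter: can decide support cheaply."""

    @staticmethod
    def supports(uri, fast=True, **kwargs):
        return uri.endswith(".csv")  # no network needed


class RemoteAdapter:
    """Hypothetical adapter: needs a network call to be sure."""

    @staticmethod
    def supports(uri, fast=True, **kwargs):
        if fast:
            return None  # "maybe": defer to the second pass
        return uri.startswith("https://")  # pretend this hits the network


def find_adapter(uri, adapters, **kwargs):
    # First pass: cheap checks only (fast=True).
    undecided = []
    for adapter in adapters:
        result = adapter.supports(uri, fast=True, **kwargs)
        if result:
            return adapter
        if result is None:
            undecided.append(adapter)
    # Second pass: only adapters that answered "maybe" (None),
    # now allowed to do expensive work (fast=False).
    for adapter in undecided:
        if adapter.supports(uri, fast=False, **kwargs):
            return adapter
    return None


adapters = [LocalAdapter, RemoteAdapter]
```

Returning None rather than False on the first pass is what opts an adapter into the slower second pass.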
- supports_limit = True¶
- supports_offset = True¶