polars-avro project¶
polars_avro module¶
Polars IO plugin for reading and writing Apache Avro files.
Provides scan_avro, read_avro, write_avro, and AvroWriter. Some polars
types (Int8, Int16, UInt8, UInt16, UInt32, UInt64, Time, Categorical, Enum)
must be cast before writing. When reading, the utf8_view option controls
how UUIDs and nullable strings are decoded; see scan_avro for details.
- exception polars_avro.AvroError¶
Bases:
Exception
- exception polars_avro.AvroSpecError¶
Bases:
ValueError
- class polars_avro.AvroWriter(dest: str | Path | BinaryIO, *, schema: Schema | None = None, codec: Codec = Codec.Null, storage_options: Mapping[str, str] | None = None, credential_provider: CredentialProviderInput = 'auto')¶
Bases:
object

Incrementally write DataFrames to an Avro file.
The writer is a context manager; using it as one is required when writing to cloud files.
Some polars types (Int8, Int16, UInt8, UInt16, UInt32, UInt64, Time, Categorical, Enum) must be cast before writing; see the README for workarounds.
- close() None¶
- write(batch: DataFrame) None¶
- class polars_avro.Codec¶
Bases:
object

- Bzip2 = Codec.Bzip2¶
- Deflate = Codec.Deflate¶
- Null = Codec.Null¶
- Snappy = Codec.Snappy¶
- Xz = Codec.Xz¶
- Zstandard = Codec.Zstandard¶
- exception polars_avro.EmptySources¶
Bases:
ValueError
- polars_avro.read_avro(sources: Sequence[str | Path] | Sequence[BinaryIO] | str | Path | BinaryIO, *, columns: Sequence[int | str] | None = None, n_rows: int | None = None, row_index_name: str | None = None, row_index_offset: int = 0, rechunk: bool = False, batch_size: int = 32768, glob: bool = True, strict: bool = False, utf8_view: bool = False, storage_options: Mapping[str, str] | None = None, credential_provider: Callable[[], tuple[dict[str, str], int | None]] | CredentialProvider | Literal['auto'] | None = 'auto') DataFrame¶
Read an Avro file into a DataFrame.
- Parameters:
sources (The source(s) to scan.)
columns (The columns to select.)
n_rows (The number of rows to read.)
row_index_name (The name of the row index column, or None to not add one.)
row_index_offset (The offset to start the row index at.)
rechunk (Whether to rechunk the DataFrame after reading.)
batch_size (How many rows to attempt to read at a time.)
glob (Whether to use globbing to find files.)
strict (Whether to use strict mode when parsing avro. Incurs a performance hit.)
utf8_view (Whether to read strings as views. When False (default), UUIDs are read as binary and nullable strings preserve nulls. When True, UUIDs are read as formatted strings and nulls in nullable strings are replaced with "" (lossy). Since polars tends to work with string views internally, True is likely faster.)
storage_options (Extra configuration passed to the cloud storage backend; same keys accepted by Polars, e.g. aws_region.)
credential_provider (Credential provider for cloud storage. Set to "auto" (default) to use automatic credential detection, or None to disable.)
- polars_avro.scan_avro(sources: Sequence[str | Path] | Sequence[BinaryIO] | str | Path | BinaryIO, *, batch_size: int = 1024, glob: bool = True, strict: bool = False, utf8_view: bool = False, storage_options: Mapping[str, str] | None = None, credential_provider: Callable[[], tuple[dict[str, str], int | None]] | CredentialProvider | Literal['auto'] | None = 'auto') LazyFrame¶
Scan Avro files.
- Parameters:
sources (The source(s) to scan.)
batch_size (How many rows to attempt to read at a time.)
glob (Whether to use globbing to find files.)
strict (Whether to use strict mode when parsing avro. Incurs a performance hit.)
utf8_view (Whether to read strings as views. When False (default), UUIDs are read as binary and nullable strings preserve nulls. When True, UUIDs are read as formatted strings and nulls in nullable strings are replaced with "" (lossy). Since polars tends to work with string views internally, True is likely faster.)
storage_options (Extra configuration passed to the cloud storage backend; same keys accepted by Polars, e.g. aws_region.)
credential_provider (Credential provider for cloud storage. Set to "auto" (default) to use automatic credential detection, or None to disable.)
- polars_avro.write_avro(batches: DataFrame | Iterable[DataFrame], dest: str | Path | BinaryIO, *, schema: Schema | None = None, codec: Codec = Codec.Null, storage_options: Mapping[str, str] | None = None, credential_provider: CredentialProviderInput = 'auto') None¶
Write a DataFrame or iterable of DataFrames to an Avro file.
Some polars types (Int8, Int16, UInt8, UInt16, UInt32, UInt64, Time, Categorical, Enum) must be cast before writing; see the README for workarounds.
- Parameters:
batches (A DataFrame or iterable of DataFrames to write.)
dest (The file path, cloud URL, or writable binary buffer to write to.)
schema (The schema to use. If None, inferred from the first batch.)
codec (The compression codec to use.)
storage_options (Extra configuration passed to the cloud storage backend; same keys accepted by Polars, e.g. aws_region.)
credential_provider (Credential provider for cloud storage. Set to "auto" (default) to use automatic credential detection, or None to disable.)