polars-avro project

polars_avro module

Scan and sink Avro files using Polars.

This module provides three functions, written as an io-plugin: scan_avro, read_avro, and write_avro. Sink is currently not supported for io-plugins.

Not all types can be converted between Polars and Avro, and some conversions are only possible with precision loss. In general this library will attempt to read and write everything as long as there is no data loss, but that can mean promoting types to a more general representation for serialization / deserialization.
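
A minimal round trip looks like the following sketch (the file name and column contents are illustrative):

    import polars as pl
    from polars_avro import read_avro, scan_avro, write_avro

    df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

    # write the frame to an Avro file, then read it back eagerly or lazily
    write_avro(df, "example.avro")
    eager = read_avro("example.avro")
    lazy = scan_avro("example.avro").collect()
    assert eager.equals(df) and lazy.equals(df)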

exception polars_avro.AvroError

Bases: Exception

exception polars_avro.AvroSpecError

Bases: ValueError

class polars_avro.Codec

Bases: object

Bzip2 = Codec.Bzip2
Deflate = Codec.Deflate
Null = Codec.Null
Snappy = Codec.Snappy
Xz = Codec.Xz
Zstandard = Codec.Zstandard
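
A codec variant is passed to write_avro (documented below) to select block compression. For instance, a sketch with an illustrative file name, where compression_level applies when the chosen codec supports it:

    import polars as pl
    from polars_avro import Codec, write_avro

    df = pl.DataFrame({"x": [1, 2, 3]})
    write_avro(df, "compressed.avro", codec=Codec.Zstandard, compression_level=3)
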
exception polars_avro.EmptySources

Bases: ValueError

polars_avro.read_avro(sources: Sequence[str | Path] | Sequence[BinaryIO] | str | Path | BinaryIO, *, columns: Sequence[int | str] | None = None, n_rows: int | None = None, row_index_name: str | None = None, row_index_offset: int = 0, rechunk: bool = False, batch_size: int = 32768, glob: bool = True, single_col_name: str | None = None) DataFrame

Read an Avro file into a DataFrame.

Parameters:
  • sources – The source(s) to scan.

  • columns – The columns to select.

  • n_rows – The number of rows to read.

  • row_index_name – The name of the row index column, or None to not add one.

  • row_index_offset – The offset to start the row index at.

  • rechunk – Whether to rechunk the DataFrame after reading.

  • batch_size – How many rows to attempt to read at a time.

  • glob – Whether to use globbing to find files.

  • storage_options – Additional options for cloud operations.

  • credential_provider – The credential provider to use for cloud operations. Defaults to “auto”, which uses the default credential provider.

  • retries – The number of times to retry cloud operations.

  • file_cache_ttl – The time to live for cached cloud files.
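
A sketch of the keyword arguments (the path and column names are illustrative):

    from polars_avro import read_avro

    # read at most 1_000 rows of two columns and prepend a row index starting at 1
    df = read_avro(
        "example.avro",
        columns=["a", "b"],
        n_rows=1_000,
        row_index_name="idx",
        row_index_offset=1,
        rechunk=True,
    )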

polars_avro.scan_avro(sources: Sequence[str | Path] | Sequence[BinaryIO] | str | Path | BinaryIO, *, batch_size: int = 32768, glob: bool = True, single_col_name: str | None = None) LazyFrame

Scan Avro files.

Parameters:
  • sources – The source(s) to scan.

  • batch_size – How many rows to attempt to read at a time.

  • glob – Whether to use globbing to find files.

  • storage_options – Additional options for cloud operations.
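
For example, scanning several files lazily (a sketch; the glob pattern and column names are illustrative):

    import polars as pl
    from polars_avro import scan_avro

    # matching files are combined into one LazyFrame; nothing is read until collect()
    lf = scan_avro("data/*.avro", glob=True, batch_size=65_536)
    result = lf.filter(pl.col("a") > 0).select("a", "b").collect()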

polars_avro.write_avro(frame: DataFrame, dest: str | Path | BinaryIO, *, batch_size: int | None = None, codec: Codec = Codec.Null, promote_ints: bool = True, promote_array: bool = True, truncate_time: bool = False, compression_level: int | None = None, storage_options: Mapping[str, str] | None = None) None

Write a DataFrame to an Avro file.

Parameters:
  • frame – The DataFrame to write.

  • dest – Where to write the DataFrame.

  • codec – The codec to use for compression, or Null for uncompressed.

  • promote_ints – Whether to promote integer columns to a larger type in order to support writing them.

  • promote_array – Whether to promote array columns to lists in order to support writing them to Avro.

  • truncate_time – Whether to truncate time columns so they can be written to Avro, as Avro doesn’t support time with nanosecond precision.

  • retries – The number of times to retry cloud operations.

  • credential_provider – The credential provider to use for cloud operations. Defaults to “auto”, which uses the default credential provider.

  • storage_options – Additional options for cloud operations.
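
For example, writing to an in-memory buffer with compression and integer promotion (a sketch; it assumes Int8 is one of the types that needs promotion, since Avro’s smallest integer type is a 32-bit int):

    import io
    import polars as pl
    from polars_avro import Codec, read_avro, write_avro

    df = pl.DataFrame({"small": [1, 2, 3]}, schema={"small": pl.Int8})

    buf = io.BytesIO()
    # promote_ints=True lets narrow integers be widened so they can be serialized
    write_avro(df, buf, codec=Codec.Deflate, promote_ints=True)

    buf.seek(0)
    round_tripped = read_avro(buf)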
