Parquet Format

Last Updated: 5/8/2026


Feldera can ingest and output data in the Parquet format:

  • via ingress and egress REST endpoints by specifying ?format=parquet in the URL
  • as a payload received from or sent to a connector
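For the REST path, an upload can be sketched with curl. The pipeline name (`supply-chain`), port, and the `/v0/pipelines/.../ingress/...` endpoint path below are assumptions about Feldera's HTTP API, not taken from this page; adjust them for your deployment:

```shell
# Push a local Parquet file into table PARTS of a running pipeline.
# Host, port, and pipeline name are placeholders.
curl -X POST \
  "http://localhost:8080/v0/pipelines/supply-chain/ingress/PARTS?format=parquet" \
  --data-binary @parts.parquet
```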

This page documents the Parquet format and how it interacts with the different SQL types.

Types

Input must be a valid Parquet file with a schema. The schema (column names and types) must match the table definition in the Feldera pipeline program. We use Arrow to specify the data types in Parquet. The following table shows the mapping between Feldera SQL types and Arrow types.

| Feldera SQL Type | Apache Arrow Type |
|---|---|
| BOOLEAN | Boolean |
| TINYINT, SMALLINT, INTEGER, BIGINT | Int8, Int16, Int32, Int64 (respectively) |
| FLOAT, DOUBLE, DECIMAL | Float32, Float64, Decimal |
| VARCHAR, CHAR, STRING | LargeUtf8 |
| BINARY, VARBINARY | Binary |
| TIME | UInt64 (time in nanoseconds) |
| TIMESTAMP | Timestamp(TimeUnit::Millisecond, None) (milliseconds since Unix epoch) |
| DATE | Int32 (days since Unix epoch) |
| ARRAY | LargeList |
| STRUCT | Struct |
| MAP | Dictionary |
| VARIANT | LargeUtf8 (JSON-encoded string; see VARIANT documentation) |

Example

In this example, we configure a table to load data from a Parquet file.

```sql
create table PARTS (
    part bigint not null,
    vendor bigint not null,
    price bigint not null
) with (
    'connectors' = '[{
        "transport": {
            "name": "url_input",
            "config": {
                "path": "https://feldera-basics-tutorial.s3.amazonaws.com/parts.parquet"
            }
        },
        "format": { "name": "parquet", "config": {} }
    }]'
);
```

For reference, the following Python script was used to generate the parts.parquet file:

```python
import pyarrow as pa
import pyarrow.parquet as pq

data = {
    'PART': [1, 2, 3],
    'VENDOR': [2, 1, 3],
    'PRICE': [10000, 15000, 9000],
}
table = pa.Table.from_pydict(data)
pq.write_table(table, 'parts.parquet')
```