deliberate.codes

Weblog of Marco N. - a software guy in data. I build with AI.

Benchmarking exarrow-rs: Rust vs Python for Exasol

I built exarrow-rs using spec-driven development.

Now, the question is: is it any good?

Time to find out. I benchmarked exarrow-rs against PyExasol, the official Python driver.

The results

They say the proof is in the pudding:

| Operation | exarrow-rs | PyExasol | Difference |
|---|---|---|---|
| Parquet Import (1GB) | 7.5s | 13.0s | +42% faster |
| Parquet Import (100MB) | 1.08s | 1.39s | +30% faster |
| Polars Streaming (100MB) | 1.2s | 1.7s | +37% faster |
| CSV Import (100MB) | 0.78s | 0.87s | +10% faster |

The Parquet import stands out: at 1GB scale, exarrow-rs finishes in 7.5s versus 13.0s for PyExasol, which is 42% less time.
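The table does not say how the "Difference" column is derived. Here is a minimal sketch, assuming it means time saved relative to PyExasol. Applied to the rounded timings above, it reproduces the 42% (Parquet, 1GB) and 10% (CSV) figures. The function names are hypothetical, and the speedup factor is shown only as an alternative way to express the same gap.

```python
def time_saved_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of the baseline's wall-clock time that the candidate saves."""
    return (baseline_s - candidate_s) / baseline_s * 100


def speedup_factor(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is (baseline time / candidate time)."""
    return baseline_s / candidate_s


# Rounded timings from the results table: (PyExasol seconds, exarrow-rs seconds)
parquet_1gb = (13.0, 7.5)
csv_100mb = (0.87, 0.78)

print(f"Parquet 1GB: {time_saved_pct(*parquet_1gb):.0f}% less time, "
      f"{speedup_factor(*parquet_1gb):.2f}x speedup")
print(f"CSV 100MB:   {time_saved_pct(*csv_100mb):.0f}% less time, "
      f"{speedup_factor(*csv_100mb):.2f}x speedup")
```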

How I ran the benchmarks

Setup

  • Machine: MacBook Pro (M5)
  • Database: Exasol Docker (2025.2)
  • Data sizes: 100MB and 1GB
  • Iterations: 5 per benchmark (plus 1 warmup)
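The "5 iterations plus 1 warmup" scheme can be sketched as a small timing harness. This is an illustration of the general pattern, not the benchmark code from the repository; `run_benchmark` and `placeholder_import` are hypothetical names, and the placeholder stands in for a real import call.

```python
import statistics
import time
from typing import Callable, List


def run_benchmark(operation: Callable[[], None],
                  iterations: int = 5,
                  warmups: int = 1) -> List[float]:
    """Run `operation` with untimed warmups, then return timed durations in seconds."""
    for _ in range(warmups):
        operation()  # warm caches and connections; the duration is discarded

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return timings


def placeholder_import() -> None:
    """Hypothetical stand-in for a real import call against the database."""
    sum(range(100_000))


timings = run_benchmark(placeholder_import)
print(f"runs={len(timings)} mean={statistics.mean(timings):.4f}s "
      f"median={statistics.median(timings):.4f}s")
```

Reporting the median alongside the mean helps when a single run is disturbed by background activity on the machine.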

Test data

Both drivers imported identical files into the same table schema:

CREATE TABLE benchmark.benchmark_data (
    id BIGINT,
    name VARCHAR(100),
    email VARCHAR(200),
    age INTEGER,
    salary DECIMAL(12,2),
    created_at TIMESTAMP,
    is_active BOOLEAN,
    description VARCHAR(1000)
);

The 100MB dataset contains 419K rows. The 1GB dataset contains 4.3M rows.
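Combining these row counts with the rounded Parquet timings from the results table gives a rough throughput figure. This is a back-of-the-envelope sketch: the helper name is hypothetical, and the row counts are the approximate values quoted above.

```python
def rows_per_second(rows: int, seconds: float) -> float:
    """Average import throughput in rows per second."""
    return rows / seconds


ROWS_1GB = 4_300_000   # approximate row count of the 1GB dataset
ROWS_100MB = 419_000   # approximate row count of the 100MB dataset

# Parquet import timings in seconds, taken from the results table
for label, rows, exarrow_s, pyexasol_s in [
    ("Parquet 1GB", ROWS_1GB, 7.5, 13.0),
    ("Parquet 100MB", ROWS_100MB, 1.08, 1.39),
]:
    print(f"{label}: exarrow-rs {rows_per_second(rows, exarrow_s):,.0f} rows/s, "
          f"PyExasol {rows_per_second(rows, pyexasol_s):,.0f} rows/s")
```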

Operations tested

  1. CSV Import: Read CSV file, insert into Exasol
  2. Parquet Import: Read Parquet file, insert into Exasol
  3. SELECT to Polars: Query data from Exasol, stream into a Polars DataFrame

Why the difference?

Three factors explain exarrow-rs’s performance advantage:

Native Arrow format. exarrow-rs implements ADBC (Arrow Database Connectivity). Data stays in Arrow’s columnar format throughout. No row-to-column conversions, no Python object overhead.
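To make "row-to-column conversion" and "Python object overhead" concrete, here is a standard-library sketch. It illustrates the general idea only and does not reflect how exarrow-rs or PyExasol are implemented; the sample rows are made up.

```python
import sys
from array import array

# Row-oriented result set: one tuple of Python objects per row
rows = [(1, 30), (2, 41), (3, 27)]  # (id, age), made-up sample values

# Row-to-column conversion: every value is visited and regrouped by column
ids, ages = (list(column) for column in zip(*rows))

# Columnar storage: one contiguous buffer of fixed-width 64-bit integers
ids_columnar = array("q", ids)

print("ids:", ids, "ages:", ages)
print("bytes per value in the columnar buffer:", ids_columnar.itemsize)
print("bytes for a single Python int object:", sys.getsizeof(ids[0]))
```

A columnar pipeline skips the regrouping step entirely and stores each value at its fixed width instead of as a full Python object.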

Rust’s memory model. No garbage collection pauses. Predictable allocations. When processing millions of rows, these details matter.

Direct Polars integration. Since both exarrow-rs and Polars use Arrow internally, data transfers are zero-copy. PyExasol can’t achieve this because it must bridge Python’s object model.
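The difference between sharing a buffer and copying it can be shown with Python's built-in `memoryview`. This is only a conceptual illustration of zero-copy access, not Arrow or Polars code; the buffer contents are made up.

```python
# A buffer standing in for column data produced by one component
buffer = bytearray(b"\x01\x02\x03\x04")

view = memoryview(buffer)   # zero-copy: shares the same memory
copy = bytes(buffer)        # copy: allocates and duplicates the data

buffer[0] = 0xFF            # the producer mutates the underlying memory

print("view sees the change:", view[0] == 0xFF)
print("copy sees the change:", copy[0] == 0xFF)
```

The view reflects the change because no data was duplicated, while the copy keeps its own stale bytes and paid for the duplication up front.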

What these benchmarks don’t show

Network latency. My tests used a local Docker container. Over a real network, transfer time is added to both drivers’ totals, so the driver’s efficiency accounts for a smaller share of the overall runtime.
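A quick calculation shows how a fixed network cost dilutes the relative gain. The 20 seconds of transfer time below is a purely hypothetical number chosen for illustration; the driver timings are the 1GB Parquet figures from the results table.

```python
def time_saved_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage of the baseline's wall-clock time that the candidate saves."""
    return (baseline_s - candidate_s) / baseline_s * 100


PYEXASOL_S = 13.0   # 1GB Parquet import, from the results table
EXARROW_S = 7.5     # 1GB Parquet import, from the results table
NETWORK_S = 20.0    # hypothetical transfer time added to both drivers

local = time_saved_pct(PYEXASOL_S, EXARROW_S)
remote = time_saved_pct(PYEXASOL_S + NETWORK_S, EXARROW_S + NETWORK_S)

print(f"local Docker:       {local:.1f}% less time")
print(f"with network added: {remote:.1f}% less time")
```

The absolute saving stays at 5.5 seconds in both cases; only its share of the total shrinks.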

Concurrency. I tested single-threaded imports. Production workloads often parallelize across multiple connections.

Running the benchmarks yourself

The benchmark code is available in the repository:

# Clone the repo
git clone https://github.com/marconae/exarrow-rs
cd exarrow-rs

# Start Exasol Docker
./scripts/setup_docker.sh

# Generate test data
cargo run --release --features benchmark --bin generate_data -- --size 1gb

# Run benchmarks
./scripts/run_benchmarks.sh

You can adjust iterations, data sizes, and which operations to run:

# Run only Parquet benchmarks with 10 iterations
FORMATS="parquet" ITERATIONS=10 ./scripts/run_benchmarks.sh

Summary

exarrow-rs delivers measurable performance gains over PyExasol:

  • Parquet imports: 30-42% faster
  • Polars streaming: 37% faster
  • CSV imports: 6-10% faster

The benchmark shows that exarrow-rs is a viable alternative to PyExasol for data engineering workloads in Rust.