I built exarrow-rs using spec-driven development.
Now, the question is: is it any good?
Time to find out. I benchmarked exarrow-rs against PyExasol, the official Python driver.
The results#
As the saying goes, the proof of the pudding is in the eating:
| Operation | exarrow-rs | PyExasol | Speedup |
|---|---|---|---|
| Parquet Import (1GB) | 7.5s | 13.0s | 42% faster |
| Parquet Import (100MB) | 1.08s | 1.39s | 30% faster |
| Polars Streaming (100MB) | 1.2s | 1.7s | 37% faster |
| CSV Import (100MB) | 0.78s | 0.87s | 10% faster |
The Parquet import stands out: at 1GB scale, exarrow-rs finishes 42% faster than PyExasol.
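To make the 42% figure concrete: it expresses the time saved relative to the slower PyExasol baseline. A quick Rust sketch of that arithmetic (the function name is mine, purely for illustration):

```rust
// Relative speedup: time saved, as a percentage of the PyExasol baseline.
fn speedup_pct(exarrow_secs: f64, pyexasol_secs: f64) -> f64 {
    (pyexasol_secs - exarrow_secs) / pyexasol_secs * 100.0
}

fn main() {
    // 1GB Parquet import times from the table above.
    let pct = speedup_pct(7.5, 13.0);
    println!("{pct:.1}% faster"); // prints "42.3% faster"
}
```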
How I ran the benchmarks#
Setup#
- Machine: MacBook Pro (M5)
- Database: Exasol Docker (2025.2)
- Data sizes: 100MB and 1GB
- Iterations: 5 per benchmark (plus 1 warmup)
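The iteration scheme above (one untimed warmup, then timed runs, reporting the mean) can be sketched as a small harness. This is an illustrative sketch, not the repository's actual benchmark code, and `run_import` is a placeholder for the real workload:

```rust
use std::time::{Duration, Instant};

// One untimed warmup run, then `iterations` timed runs;
// returns the mean wall-clock time per run.
fn bench<F: FnMut()>(iterations: u32, mut run_import: F) -> Duration {
    run_import(); // warmup, not counted
    let start = Instant::now();
    for _ in 0..iterations {
        run_import();
    }
    start.elapsed() / iterations
}

fn main() {
    let mean = bench(5, || {
        // placeholder workload; the real benchmark imports a file here
        std::thread::sleep(Duration::from_millis(10));
    });
    println!("mean: {mean:?}");
}
```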
Test data#
Both drivers imported identical files into the same table schema:
```sql
CREATE TABLE benchmark.benchmark_data (
    id BIGINT,
    name VARCHAR(100),
    email VARCHAR(200),
    age INTEGER,
    salary DECIMAL(12,2),
    created_at TIMESTAMP,
    is_active BOOLEAN,
    description VARCHAR(1000)
)
```
The 100MB dataset contains 419K rows. The 1GB dataset contains 4.3M rows.
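For a sense of what a row looks like, here is a hypothetical sketch of generating one CSV row matching this schema. The function name and value choices are mine for illustration; the repository's `generate_data` binary may produce different values:

```rust
use std::fmt::Write;

// Hypothetical: build one CSV row matching the benchmark_data schema.
fn csv_row(id: u64) -> String {
    let mut row = String::new();
    write!(
        row,
        "{id},user_{id},user_{id}@example.com,{},{:.2},2025-01-01 00:00:00,{},row number {id}",
        20 + (id % 50),                 // age
        30_000.0 + (id % 1000) as f64,  // salary
        id % 2 == 0,                    // is_active
    )
    .unwrap();
    row
}

fn main() {
    println!("{}", csv_row(1));
}
```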
Operations tested#
- CSV Import: Read CSV file, insert into Exasol
- Parquet Import: Read Parquet file, insert into Exasol
- SELECT to Polars: Query data from Exasol, stream into a Polars DataFrame
Why the difference?#
Three factors explain exarrow-rs’s performance:
Native Arrow format. exarrow-rs implements ADBC (Arrow Database Connectivity). Data stays in Arrow’s columnar format throughout. No row-to-column conversions, no Python object overhead.
Rust’s memory model. No garbage collection pauses. Predictable allocations. When processing millions of rows, these details matter.
Direct Polars integration. Since both exarrow-rs and Polars use Arrow internally, data transfers are zero-copy. PyExasol can’t achieve this because it must bridge Python’s object model.
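To illustrate the layout difference behind these points, here is the columnar (struct-of-arrays) versus row-oriented (array-of-structs) distinction in plain Rust. This is a generic sketch, not exarrow-rs internals; Arrow's contiguous column buffers are what allow a driver to hand whole columns to Polars without per-row conversion:

```rust
// Row-oriented layout: each record is stored together.
struct Row {
    id: i64,
    salary: f64,
}

// Columnar layout: each field is a contiguous buffer.
struct Columns {
    ids: Vec<i64>,
    salaries: Vec<f64>,
}

fn main() {
    let rows = vec![
        Row { id: 1, salary: 100.0 },
        Row { id: 2, salary: 200.0 },
    ];
    // Row layout: touching one field still walks whole rows.
    let total_rows: f64 = rows.iter().map(|r| r.salary).sum();

    let cols = Columns { ids: vec![1, 2], salaries: vec![100.0, 200.0] };
    // Columnar layout: the salary buffer is contiguous and can be
    // summed (or handed off wholesale) without touching other fields.
    let total_cols: f64 = cols.salaries.iter().sum();

    assert_eq!(total_rows, total_cols);
    let _ = (rows[0].id, &cols.ids);
}
```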
What these benchmarks don’t show#
Network latency. My tests used a local Docker container. Over a network, the driver’s efficiency matters less.
Concurrency. I tested single-threaded imports. Production workloads often parallelize across multiple connections.
Running the benchmarks yourself#
The benchmark code is available in the repository:
```bash
# Clone the repo
git clone https://github.com/marconae/exarrow-rs
cd exarrow-rs

# Start Exasol Docker
./scripts/setup_docker.sh

# Generate test data
cargo run --release --features benchmark --bin generate_data -- --size 1gb

# Run benchmarks
./scripts/run_benchmarks.sh
```
You can adjust iterations, data sizes, and which operations to run:
```bash
# Run only Parquet benchmarks with 10 iterations
FORMATS="parquet" ITERATIONS=10 ./scripts/run_benchmarks.sh
```
Summary#
exarrow-rs delivers measurable performance gains over PyExasol:
- Parquet imports: 30-42% faster
- Polars streaming: 37% faster
- CSV imports: 6-10% faster
These benchmarks suggest that exarrow-rs is a viable alternative to PyExasol for data engineering workloads in Rust.