Consider this scenario: a multi-terabyte Databricks lakehouse. Dozens of dashboards. High consumption costs. The team wants to evaluate alternative query engines for their BI workloads. Maybe a specialized OLAP engine. Anything to bring costs down.
Digging into the reports reveals the culprit: native queries everywhere. Custom calculations and business logic written directly in the current engine's SQL dialect, hard-coded into the BI layer.
Every one of those queries is written in the dialect of a single engine. And that’s the trap. The moment you embed engine-specific SQL in your front-end layer, you’ve welded your BI tool to that engine. Want to route queries to a faster, cheaper alternative? You can’t. Not without rewriting every report.
These teams adopted a lakehouse for flexibility. Instead, they locked themselves into a single engine through their front-end layer.
I call it the “Native Query Anti-Pattern.”
## The lakehouse promise

Data lakehouses deliver many advantages, but one stands above the rest: an open data architecture that lets multiple query engines operate directly on the same shared data layer.
This openness brings flexibility and competition. You choose different engines for different jobs. One handles transformations, moving data from bronze to silver to gold. Another powers your consumption layer. You’re not locked in.
On Databricks, you might use Spark for transformation pipelines. Spark excels at heavy-duty data processing at scale. But Spark is not ideal for read-only analytical workloads: dozens or hundreds of concurrent users hitting dashboards simultaneously.
Why? Spark is a multi-purpose engine. It uses a hybrid execution model (data-centric code generation, optional vectorization, and Volcano-mode fallback) to handle everything from OLAP queries to semi-structured data processing. This flexibility comes at a cost. For analytical queries, engines built specifically for OLAP, like Databricks Photon, Exasol’s Lakehouse Turbo, Snowflake, or Trino, consistently outperform vanilla Spark.
Analytical queries in the consumption layer drive most of the cost in a lakehouse deployment. This is where engine choice matters most. Under consumption-based pricing, the math is simple:
Slower queries = longer runtime = higher costs.
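To make that equation concrete, here is a toy cost model. Every number in it is a hypothetical assumption for illustration, not a real Databricks or cloud price:

```python
# Toy cost model under consumption-based pricing.
# All numbers are hypothetical assumptions for illustration.
QUERIES_PER_DAY = 5_000          # dashboard queries hitting the consumption layer
PRICE_PER_COMPUTE_HOUR = 4.0     # assumed $/compute-hour

def daily_cost(avg_query_seconds: float) -> float:
    """Daily spend if each query bills for its full runtime."""
    compute_hours = QUERIES_PER_DAY * avg_query_seconds / 3600
    return compute_hours * PRICE_PER_COMPUTE_HOUR

# An engine that halves average query runtime halves the bill:
print(round(daily_cost(12.0), 2))  # prints 66.67
print(round(daily_cost(6.0), 2))   # prints 33.33
```

The model is deliberately crude (it ignores cluster idle time and auto-scaling), but the proportionality holds: under consumption pricing, runtime is the bill.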
## The Native Query Anti-Pattern
Here’s where teams give away their freedom.
When you implement custom calculations or business logic in your BI tool using native queries, you hard-code against a specific engine. These queries use engine-specific SQL dialects, functions, and syntax. They won’t run anywhere else.
The result: you can’t switch query engines without rewriting every native query in your reports. You’ve traded the lakehouse’s core advantage, engine independence, for short-term convenience.
You are locked into both the front-end tool and the query engine.
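As a small illustration, here is the same business rule ("due 30 days after the order date") written as a native query for two different engines. The table and column names are made up; the `date_add` signatures are real:

```python
# The same rule as a native query for two different engines.
# Table and column names are hypothetical.

DATABRICKS_SQL = "SELECT order_id, date_add(order_date, 30) AS due_date FROM sales.orders"

TRINO_SQL = "SELECT order_id, date_add('day', 30, order_date) AS due_date FROM sales.orders"

# Same function name, different signature: a report hard-coded against
# one form is a syntax error on the other engine.
print(DATABRICKS_SQL == TRINO_SQL)  # prints False
```

Multiply this by every function, cast, and interval expression in hundreds of reports, and the scale of a migration becomes clear.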
## The fix: headless BI
Push business logic and data integration down into your lakehouse. Build a clean silver layer, then a gold layer or data mart where the data is already modeled for consumption.
When your data mart is clean, native queries become unnecessary. Your BI tool can push down analytical queries automatically, adapting to whatever SQL dialect the underlying engine speaks. No custom SQL, no engine-specific syntax.
This unlocks true flexibility:
- Switch query engines: Route consumption queries to the fastest, cheapest engine for the job. The BI tool adapts automatically.
- Switch BI tools: Your reports aren’t tied to a specific front-end. Move from one front-end tool to another without rewriting queries.
- Cost optimization: Freedom to benchmark engines and optimize without migration headaches.
This is headless BI: decouple your data logic from your visualization layer. Keep the BI layer thin. Let the data platform do the heavy lifting.
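The idea can be sketched in a few lines. This is not a real BI tool's API, just a hypothetical illustration: the business logic is defined once, and dialect-specific SQL is generated per engine on demand:

```python
# Minimal headless-BI sketch: one definition of the business logic,
# dialect-specific SQL generated per engine. The dialect rules and
# table/column names here are hypothetical simplifications.

MEASURE = "sum(amount)"
TABLE = "gold.daily_sales"

def days_ago(n: int, dialect: str) -> str:
    # Engines spell date arithmetic differently.
    if dialect == "spark":
        return f"date_sub(current_date(), {n})"
    if dialect == "trino":
        return f"date_add('day', -{n}, current_date)"
    raise ValueError(f"unknown dialect: {dialect}")

def revenue_last_n_days(n: int, dialect: str) -> str:
    return (f"SELECT {MEASURE} AS revenue FROM {TABLE} "
            f"WHERE sale_date >= {days_ago(n, dialect)}")

print(revenue_last_n_days(30, "spark"))
print(revenue_last_n_days(30, "trino"))
```

Swapping engines now means changing one argument, not rewriting every report. Real semantic layers do exactly this at scale: the report references a measure, and the engine-specific SQL is generated at query time.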