
Delta Lake vs Apache Iceberg in 2026: Which Open Table Format

Both bring ACID transactions and time travel to Parquet on object storage. Picking between Delta Lake and Iceberg in 2026 is less a war and more a fit-for-purpose choice.

TL;DR. Delta Lake and Apache Iceberg are both open table formats that bring ACID transactions, time travel, and schema evolution to Parquet on object storage. They started in different orbits (Databricks and Netflix respectively), and in 2026 they’re close enough on capability that the choice is mostly about fit. Iceberg has a richer metadata story for very large warehouses; Delta has a simpler design that fits smaller and single-machine setups. Engine support is broad on both sides. Pick the one your stack already leans toward and stop reading vs-vs articles.

Why these formats exist

Parquet alone is a great file format and a terrible database. You can write a beautiful columnar file to S3, but a table is many files: a reader can catch a multi-file write half-finished, and two writers can silently overwrite or orphan each other’s files. There’s no atomic multi-file commit, no consistent schema-evolution story, no “show me what this looked like last Tuesday.”

Both Delta Lake and Apache Iceberg solve the same problem: put a transaction layer on top of Parquet so a folder of files behaves like a versioned table. ACID writes, point-in-time reads, schema and partition evolution, and a way for any engine to ask “what’s the current state?” without scanning the directory.

The two projects converged on similar capabilities from different starting points. Delta came out of Databricks’ work on the Lakehouse architecture; Iceberg came out of Netflix’s pain managing very large Hive tables. The histories are different. The end results are surprisingly similar.

What each format actually is

Delta Lake

A Delta table is a folder. Inside it: Parquet data files, and a _delta_log/ directory containing JSON commit files and periodic Parquet checkpoints. Each write adds a new, sequentially numbered JSON file to the log. Readers list the log, start from the latest checkpoint, replay the commits after it, and know the current state of the table.

sales_orders/
  part-00000-abc.parquet
  part-00001-def.parquet
  _delta_log/
    00000000000000000000.json
    00000000000000000001.json
    00000000000000000010.checkpoint.parquet

That’s it. The format is an append-only log over a folder of Parquet. The simplicity is the design. Delta’s storage layer is about three concepts you can hold in your head: data files, log files, and checkpoints.
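The replay step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the reference reader: it assumes commit files hold one JSON action per line, looks only at the path field of add and remove actions, and skips checkpoints entirely. The helper names (write_commit, live_files, build_demo_log) and the third data file are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_commit(log_dir: Path, version: int, actions: list) -> None:
    """Write one commit file: 20-digit zero-padded name, one JSON action per line."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("".join(json.dumps(a) + "\n" for a in actions))


def live_files(log_dir: Path, as_of_version: Optional[int] = None) -> set:
    """Replay add/remove actions in version order; return the live data files."""
    files = set()
    for commit in sorted(log_dir.glob("*.json")):
        if as_of_version is not None and int(commit.stem) > as_of_version:
            break  # stop early: this is what reading an older version amounts to
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other action types (metaData, protocol, commitInfo) are ignored here.
    return files


def build_demo_log(root: Path) -> Path:
    """Create a tiny hypothetical log with three commits."""
    log_dir = root / "sales_orders" / "_delta_log"
    log_dir.mkdir(parents=True)
    write_commit(log_dir, 0, [{"add": {"path": "part-00000-abc.parquet"}}])
    write_commit(log_dir, 1, [{"add": {"path": "part-00001-def.parquet"}}])
    write_commit(log_dir, 2, [
        {"remove": {"path": "part-00000-abc.parquet"}},
        {"add": {"path": "part-00002-ghi.parquet"}},
    ])
    return log_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_log = build_demo_log(Path(tmp))
        print(sorted(live_files(demo_log)))                   # current state
        print(sorted(live_files(demo_log, as_of_version=1)))  # older version
```

Stopping the replay at an earlier version is also the whole idea behind reading a table as of a given version: the old commits are still in the log, so the old state can be rebuilt as long as the data files it points to have not been cleaned up.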

Apache Iceberg

An Iceberg table is also a folder, but the metadata layer has more depth. Each table has a metadata pointer file (metadata.json), which references a snapshot, which references a manifest list, which references manifest files, which reference the actual Parquet files. The hierarchy looks like overhead until you have hundreds of millions of files — at that point the manifest layout lets readers prune which files they need without listing the whole directory.

sales_orders/
  data/
    00000-1-abc.parquet
    00001-2-def.parquet
  metadata/
    v1.metadata.json
    v2.metadata.json
    snap-12345.avro       (manifest list)
    abc.avro              (manifest)
    def.avro              (manifest)

The extra layers buy you finer-grained metadata pruning, hidden partitioning, partition evolution (the partition scheme can change without rewriting existing data), and a tighter story for very large object counts. The cost is more concepts, more files, and more places where the “what is this table doing” question takes longer to answer.
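To make the pruning idea concrete, here is a small in-memory sketch. It is a simplified model, not Iceberg’s actual file format or API: real manifest lists and manifests are Avro files with richer statistics, and every name below (DataFile, ManifestEntry, plan_files, the extra data files) is hypothetical. The point is that the manifest list carries a partition range per manifest, so a reader can skip whole manifests without opening them.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    partition: str  # partition value, e.g. "2026-04"


@dataclass
class ManifestEntry:
    """One row of the manifest list: where a manifest lives and what it covers."""
    manifest_path: str
    lower: str  # smallest partition value tracked by that manifest
    upper: str  # largest partition value tracked by that manifest


# Hypothetical table contents: manifest path -> data files it tracks.
MANIFESTS = {
    "metadata/abc.avro": [
        DataFile("data/00000-1-abc.parquet", "2026-01"),
        DataFile("data/00001-2-def.parquet", "2026-02"),
    ],
    "metadata/def.avro": [
        DataFile("data/00002-3-ghi.parquet", "2026-03"),
        DataFile("data/00003-4-jkl.parquet", "2026-04"),
    ],
}

# The manifest list for the current snapshot, with per-manifest summaries.
MANIFEST_LIST = [
    ManifestEntry("metadata/abc.avro", "2026-01", "2026-02"),
    ManifestEntry("metadata/def.avro", "2026-03", "2026-04"),
]


def plan_files(partition: str):
    """Return (matching data file paths, manifests actually opened)."""
    opened = []
    matches = []
    for entry in MANIFEST_LIST:
        if not (entry.lower <= partition <= entry.upper):
            continue  # pruned using only the summary in the manifest list
        opened.append(entry.manifest_path)
        matches.extend(
            f.path for f in MANIFESTS[entry.manifest_path] if f.partition == partition
        )
    return matches, opened


if __name__ == "__main__":
    files, opened = plan_files("2026-04")
    print(files)   # data files to scan
    print(opened)  # only one of the two manifests was read
```

With two manifests the saving is trivial; with tens of thousands of manifests, reading only the few whose ranges overlap the query is where the layered layout pays off.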

Where they’re functionally identical

Worth being explicit, because the comparison gets framed as a war and isn’t:

  • ACID transactions. Both formats give you atomic multi-file writes, isolated reads, and consistent commits. Concurrent writers go through optimistic concurrency control on both sides.
  • Time travel. Both let you read a specific version of the table, either by version number or by timestamp.
  • Schema evolution. Add columns, drop columns, rename columns: both formats support it without rewriting data (in Delta, drop and rename require column mapping to be enabled on the table).
  • Partition pruning. Both skip partitions that a query’s filters rule out. How much the query author has to know about the partition columns differs, which the hidden partitioning section below covers.
  • Engine support in 2026. Spark, Flink, Trino, Presto, Snowflake, BigQuery, Athena, Polars, DuckDB, ClickHouse — all support both formats now. The gap that existed in 2022 has mostly closed.
  • Open source. Delta is Linux Foundation; Iceberg is Apache. Both have multiple corporate backers; neither is single-vendor in 2026.
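The optimistic concurrency control mentioned in the first bullet can be sketched for a Delta-style log as follows. This is an illustration under stated assumptions, not either project’s implementation: it uses exclusive file creation on a local filesystem as a stand-in for an atomic “create only if absent” operation, it omits the logical conflict check a real writer performs before retrying, and the function names are hypothetical. Iceberg reaches the same effect through an atomic swap of the metadata pointer in its catalog.

```python
import json
import tempfile
from pathlib import Path


def latest_version(log_dir: Path) -> int:
    """Highest committed version in the log, or -1 for an empty table."""
    versions = [int(p.stem) for p in log_dir.glob("*.json")]
    return max(versions, default=-1)


def try_commit(log_dir: Path, version: int, actions: list) -> bool:
    """Claim a version number atomically; return False if another writer won."""
    path = log_dir / f"{version:020d}.json"
    try:
        # Mode "x" creates the file and fails if it already exists.
        with open(path, "x") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return True
    except FileExistsError:
        return False


def commit_with_retry(log_dir: Path, actions: list, max_attempts: int = 5) -> int:
    """Optimistic loop: read the latest version, try the next one, retry on conflict."""
    for _ in range(max_attempts):
        version = latest_version(log_dir) + 1
        # A real writer would also check here whether commits that landed
        # since it started conflict with its own changes.
        if try_commit(log_dir, version, actions):
            return version
    raise RuntimeError("gave up after repeated commit conflicts")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        print(try_commit(log_dir, 0, [{"add": {"path": "a.parquet"}}]))   # first writer wins
        print(try_commit(log_dir, 0, [{"add": {"path": "b.parquet"}}]))   # second writer loses
        print(commit_with_retry(log_dir, [{"add": {"path": "b.parquet"}}]))  # retries as the next version
```

Nothing is locked while the writer prepares its data files; the only contended step is claiming the next version number, which is why this style of concurrency works well when conflicts are rare.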

If you’re choosing between them in 2026 and the dimension that decides it is “which one supports schema evolution” — both do, evenly. The dimensions that actually decide it are operational and ecosystem-shaped.

Where the design choices matter

Metadata at scale

For tables with billions of rows across hundreds of millions of small files, Iceberg’s manifest layout is genuinely better. A reader decides what to scan from the manifest list plus the few manifests whose partition ranges match the query, instead of loading a log checkpoint that enumerates every live file. Delta’s design is simpler and was originally optimised around a smaller object-count regime; the gap has narrowed with checkpoint optimisations, but at very large scale, Iceberg still pulls ahead.

For tables with millions of rows across thousands of files — which is most analytics workloads — neither metadata strategy is a bottleneck. The choice is unobservable.

Catalog story

Iceberg has a stronger catalog story by design. Iceberg tables are typically registered in a catalog (REST catalog, Hive metastore, Nessie, Polaris, AWS Glue), and the catalog is the source of truth for “where the current metadata pointer lives.” This makes multi-table operations and cross-engine consistency cleaner.
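The catalog’s role as the source of truth for the current metadata pointer comes down to a compare-and-swap. The sketch below is a toy in-memory stand-in, not the REST catalog protocol or any real catalog’s API; InMemoryCatalog, swap, and CommitConflict are hypothetical names, and the v3 metadata path is made up for the demo.

```python
import threading
from typing import Optional


class CommitConflict(Exception):
    """Raised when the pointer moved since the writer last read it."""


class InMemoryCatalog:
    """Toy catalog: maps a table name to its current metadata file location."""

    def __init__(self) -> None:
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table: str) -> Optional[str]:
        return self._pointers.get(table)

    def swap(self, table: str, expected: Optional[str], new: str) -> None:
        """Move the pointer to `new` only if it still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                raise CommitConflict(f"{table}: pointer is no longer {expected!r}")
            self._pointers[table] = new


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    catalog.swap("sales_orders", None, "metadata/v1.metadata.json")
    catalog.swap("sales_orders", "metadata/v1.metadata.json", "metadata/v2.metadata.json")
    try:
        # A writer that still believes v1 is current must not clobber v2.
        catalog.swap("sales_orders", "metadata/v1.metadata.json", "metadata/v3.metadata.json")
    except CommitConflict as exc:
        print("rejected:", exc)
    print(catalog.current("sales_orders"))
```

Because every engine asks the same catalog where the table currently points, they all agree on the table’s state without needing to list or compare files themselves.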

Delta tables don’t strictly require a catalog — the table is the folder, and the folder knows its current state. Unity Catalog (Databricks) and other catalogs sit on top, but the format works without one. That’s a feature for laptop-scale work and a friction point for warehouse-scale work.

Hidden partitioning

Iceberg supports hidden partitioning: the table is partitioned by a transform of an ordinary column, say month(event_date). You write WHERE event_date = '2026-04-01', and Iceberg maps the predicate to the matching partition without a separate partition column appearing in the query. The partition scheme is table metadata, not something readers infer from the file path.
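A rough sketch of what the engine does with that predicate, assuming a hypothetical table partitioned by a month transform on event_date. The partition-to-files mapping and the function names are made up for illustration; only the transform itself (months counted from 1970-01) follows Iceberg’s definition of its month transform.

```python
from datetime import date


def month_transform(d: date) -> int:
    """Months since 1970-01, the value a month partition transform produces."""
    return (d.year - 1970) * 12 + (d.month - 1)


# Hypothetical partition metadata: partition value -> data files.
PARTITIONS = {
    month_transform(date(2026, 3, 1)): ["data/00000-1-abc.parquet"],
    month_transform(date(2026, 4, 1)): ["data/00001-2-def.parquet"],
}


def files_for_event_date(event_date: date) -> list:
    """Turn a filter on the source column into a partition lookup."""
    return PARTITIONS.get(month_transform(event_date), [])


if __name__ == "__main__":
    # The query only mentions event_date; the partition value is derived.
    print(files_for_event_date(date(2026, 4, 1)))
    print(files_for_event_date(date(2026, 4, 17)))  # same month, same partition
```

The query author never sees the derived month value, which is why the partition scheme can later change (say, from monthly to daily) without rewriting the queries.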

Delta’s partition handling is closer to Hive-style: partition columns are explicit columns of the table, data files are conventionally laid out in one directory per partition value, and queries that want partition pruning filter on those columns directly. The Liquid Clustering feature added in recent Delta releases provides similar benefits without explicit partition columns, but it’s a different mechanism.

For exploratory queries written by humans, hidden partitioning is genuinely nicer. For machine-generated queries that are aware of the partitioning, the difference shrinks.

Simplicity vs sophistication

Delta is simpler. You can debug a Delta table by reading three JSON files. You can build a Delta writer in a weekend on top of Polars and Parquet. The mental model fits in a paragraph.

Iceberg is more sophisticated. The metadata hierarchy was designed for multi-petabyte tables managed by multiple compute engines simultaneously. The complexity is load-bearing for those use cases and overhead for smaller ones.

This is the deepest design difference. Pick Delta when you’d rather your table format stay out of the way. Pick Iceberg when you’d rather your table format do more of the heavy lifting at scale.

Side-by-side at a glance

| Dimension | Delta Lake | Apache Iceberg |
| --- | --- | --- |
| Origin | Databricks (2019) | Netflix → Apache (2018) |
| Governance | Linux Foundation | Apache Software Foundation |
| Metadata layout | JSON commit log + checkpoints | metadata.json → manifest list → manifests |
| Catalog requirement | Optional | Recommended (REST, Glue, Nessie, Polaris) |
| Hidden partitioning | Similar benefits via Liquid Clustering | Native |
| Ease of mental model | Simple | More layered |
| Best at | Single-machine, small/medium lakehouses, simplicity | Very large warehouses, multi-engine, fine-grained metadata |
| Cross-format read | Iceberg-readable via UniForm | Delta-readable via some engines |
| Polars support | First-class via deltalake | Via PyIceberg |
| License | Apache 2.0 | Apache 2.0 |

Where Flowfile sits in this picture

Flowfile uses Delta Lake as its catalog format. The reasoning is the simplicity argument turned into a product decision: a single-machine tool with a SQLite-tracked catalog and folders of Parquet on disk doesn’t need Iceberg’s metadata sophistication, and would pay the complexity cost without using it.

Delta gives every Flowfile catalog table a transaction log — a small JSON folder next to the Parquet — which is what makes “the table updated” a well-defined event. The reactive scheduler in v0.9 (table_trigger, table_set_trigger) reads that log to know when to fire. Iceberg would have given the same capability through its snapshot machinery, but with more files, more layers, and more code to integrate.
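To show why a transaction log makes “the table updated” a well-defined event, here is a minimal polling sketch. It is not Flowfile’s scheduler code, and TableWatcher and latest_version are hypothetical names; it only assumes the _delta_log/ layout described earlier, where each commit is a numbered JSON file.

```python
import tempfile
from pathlib import Path


def latest_version(table_dir: Path) -> int:
    """Highest commit number in the table's _delta_log, or -1 if there is none."""
    log_dir = table_dir / "_delta_log"
    versions = [int(p.stem) for p in log_dir.glob("*.json")]
    return max(versions, default=-1)


class TableWatcher:
    """Reports whether a table has gained a commit since the last poll."""

    def __init__(self, table_dir: Path) -> None:
        self.table_dir = table_dir
        self.last_seen = latest_version(table_dir)

    def poll(self) -> bool:
        current = latest_version(self.table_dir)
        changed = current > self.last_seen
        self.last_seen = current
        return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "sales_orders"
        (table / "_delta_log").mkdir(parents=True)
        watcher = TableWatcher(table)
        print(watcher.poll())  # nothing committed yet
        (table / "_delta_log" / f"{0:020d}.json").write_text("{}\n")
        print(watcher.poll())  # a new commit appeared
        print(watcher.poll())  # no further change
```

A trigger built this way fires once per commit rather than once per file written, because partially written data files never appear in the log.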

If you’re working in a multi-engine warehouse where Iceberg is already the standard, the right answer is to keep your warehouse on Iceberg, write Parquet from Flowfile, and load it into Iceberg through your existing pipeline. Flowfile is the workshop, not the warehouse — the format choice at the workshop scale is allowed to be different.

[Figure: Flowfile catalog overview showing Delta-backed tables with versioning and lineage]

What I’d tell a friend choosing today

If your team is heavily on Databricks, stay on Delta. If your team is heavily on Snowflake, BigQuery, or AWS Glue, the path of least resistance leans toward Iceberg — though both work.

If you’re starting from scratch, with no prior commitment, and your data fits comfortably on a few machines, pick Delta. The simpler operational model will save you time. If you’re building something that will scale to a multi-petabyte multi-engine warehouse, pick Iceberg. The metadata layer earns its keep at that scale.

The format you pick today isn’t a one-way door. Both have read-support paths into the other format in 2026, and the engineering effort to migrate a table from one to the other is small enough that “choose now, change in three years if needed” is a fine plan.

The vs-vs framing is mostly noise. The work each format is doing — making Parquet behave like a database — is the same. Pick one and go build something.


Related reads: Demystifying Delta Lake for a deeper walk through Delta’s design, Polars vs DuckDB in 2026 for the engines that read these formats, and Catalogs Make Data Easy. Open Formats Keep It Yours. for how a catalog sits on top of either format.

Frequently asked questions

Are Delta Lake and Iceberg interoperable?
More than they used to be. Delta UniForm (introduced in 2023, expanded since) lets a Delta table also be readable as an Iceberg table by writing the Iceberg metadata alongside the Delta log. The reverse — reading Iceberg as Delta — is supported by some engines too. They aren't a single format yet, but in 2026 the gap between 'pick one and live with it' and 'pick one and the other can read it' is small.
Which format do major engines actually support?
Both, in 2026. Spark, Flink, Trino, Presto, Snowflake, BigQuery, Databricks, Athena, and Polars all support Iceberg. Delta has near-equivalent coverage now, though historically the engine support was narrower outside the Databricks orbit. Bench-test the integration that matters to your stack — both formats work; the friction lives at the edges.
Is one format objectively better?
No. The honest engineering trade is: Iceberg's metadata layer is more sophisticated and scales better at very high object counts; Delta's design is simpler and easier to reason about on a single machine or in small lakehouses. Most teams pick the one that matches their existing tooling and don't regret it.
Can I run Delta Lake or Iceberg on a laptop?
Yes, both. Delta Lake has Python bindings in the deltalake package (built on the Rust delta-rs project) that work on a local filesystem with no external services. Iceberg has PyIceberg, which can run locally against a SQLite-backed catalog. You don't need S3 or a Hive metastore for either; they were designed around object storage but the formats themselves are filesystem-agnostic.
Why does Flowfile use Delta Lake?
Delta is the simpler match for a single-machine tool. The transaction log is one folder of JSON files next to the Parquet data; reading and writing it through Polars or the `deltalake` package is a small dependency surface; and the operational model, where every write is a log entry, is straightforward to reason about. Iceberg is a fine choice for multi-engine warehouses; for the workshop-scale problem Flowfile is solving, Delta is enough and Iceberg's extra machinery would have been overhead without payoff.