Data Catalog, Beginners, Open Formats, Parquet, Delta Lake, Local-first

Catalogs Make Data Easy. Open Formats Keep It Yours.

A catalog turns 'where did I save that file?' into 'just give it a name.' Open formats underneath mean the data is yours — readable by anything, portable anywhere, no vendor in the middle.

By Edward van Eechoud, April 20, 2026. 4 min read.

TL;DR. A catalog is just a place that gives your tables names, so you stop managing files and start managing data. That’s the whole user-facing idea. Flowfile’s catalog does it with one click. The data underneath is stored in open formats (Parquet, Delta Lake), so even if you stopped using Flowfile tomorrow, every table is still readable by every tool that speaks those formats. Easy to start. Can’t get stuck.

The “is this the latest one?” problem

Here’s a moment most people will recognise.

You exported some data. Cleaned it up. Did some analysis. Saved the result. A week later you need it again. You open the folder and find orders.csv, orders_clean.csv, orders_clean_v2.csv, orders_FINAL.csv, and orders_FINAL_use_this.csv. You can’t remember which one matters or where the cleaned-up logic lives.

This is the problem catalogs solve. Not “schema evolution” or “metadata management” or any of the other words tools use to describe themselves. The actual problem is: you don’t want to think about files.

What a catalog gives you

A catalog gives every table a name and remembers where it lives. You give it the data; it stores it; later you ask for it back by name. That’s it.

In Flowfile, it looks like this. You finish a flow that produces a table you care about — drop in a Catalog Writer node, call the result sales.monthly_orders, and run the flow. The table is now in the catalog. Tomorrow, in a different flow, you drop in a Catalog Reader node, type sales.monthly_orders, and the data is back — latest version, correct schema, same data everyone else is reading.

What’s gone from your day:

No folder to organise — the catalog handles where the file lives.
No file naming gymnastics — the name is just the name.
No “is this the latest?” question — there’s one entry per name, and you always get the current version.
No file paths in your code — sales.monthly_orders doesn’t change if the underlying storage moves.
No reading instructions to share — a colleague opens the catalog, sees the table, clicks for a preview.
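To make the "name instead of path" idea concrete, here is a toy sketch of a catalog as a name-to-location index. It is not how Flowfile implements its catalog; the TinyCatalog class, its methods, and the catalog.json file are all hypothetical names made up for illustration.

```python
import json
import tempfile
from pathlib import Path


class TinyCatalog:
    """Toy name -> location registry. Hypothetical; not Flowfile's implementation."""

    def __init__(self, index_file: Path) -> None:
        self.index_file = index_file
        # Load the existing index if there is one, otherwise start empty.
        self.entries = json.loads(index_file.read_text()) if index_file.exists() else {}

    def register(self, name: str, location: Path) -> None:
        """Point a table name at a storage location, replacing any previous one."""
        self.entries[name] = str(location)
        self.index_file.write_text(json.dumps(self.entries, indent=2))

    def resolve(self, name: str) -> Path:
        """Return where the named table currently lives."""
        if name not in self.entries:
            raise KeyError(f"No table named {name!r} in the catalog")
        return Path(self.entries[name])


# Small demo in a throwaway directory: the storage location changes,
# but the name that callers use stays the same.
with tempfile.TemporaryDirectory() as tmp:
    catalog = TinyCatalog(Path(tmp) / "catalog.json")
    catalog.register("sales.monthly_orders", Path(tmp) / "old_disk" / "orders")
    catalog.register("sales.monthly_orders", Path(tmp) / "new_disk" / "orders")
    print(catalog.resolve("sales.monthly_orders"))
```

The point of the sketch is the last three lines: code that asks for sales.monthly_orders never has to know that the folder moved.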

For experienced engineers, this is convenience. For beginners, it’s the difference between being able to keep track of your own work and giving up after three months. Experts have spent years building habits to compensate for not having a catalog — naming conventions, folder structures, READMEs that nobody reads. Beginners haven’t, and they don’t need to. A catalog skips the whole class of problem.

This is also why every cloud warehouse — Snowflake, BigQuery, Databricks — ships with a catalog by default. It isn’t a technical requirement. It’s that nobody can use a database with no names. Flowfile gives you the same shape — catalog.schema.table — without needing a server.

The other half: open formats

Catalogs have a reputation problem, though. People who’ve been burned before associate “catalog” with “vendor lock-in,” because a lot of historical catalogs bundled proprietary storage formats. You put your data in, and now your data is in their shape. Leave the tool, lose access.

Flowfile doesn’t do this. Every catalog table is stored as a Delta Lake table, which is a folder of Parquet files plus a small JSON transaction log. Both are open specifications: Parquet is an Apache Software Foundation project, and Delta Lake is hosted by the Linux Foundation. They’re readable by DuckDB, Polars, Pandas, Spark, Trino, Athena, Snowflake, BigQuery, and essentially every other modern data tool.

If you open ~/.flowfile/catalog_tables/ in your file browser, you’ll see directories like monthly_orders_a3f1b2c4/. Inside each one is a handful of .parquet files and a _delta_log/ folder. Nothing magic. Nothing that needs a Flowfile binary to read.
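If you want to see for yourself that there is nothing magic in the folder, the transaction log can be read with nothing but the Python standard library. The sketch below replays the JSON commits in _delta_log/ and lists the Parquet files that belong to the current version of the table. The function name live_parquet_files is made up for this example, and it makes one simplifying assumption: it ignores Delta checkpoint files, so it only gives a complete answer for tables whose JSON commits still start at version 0.

```python
import json
from pathlib import Path


def live_parquet_files(table_dir: Path) -> list:
    """Replay the JSON commits in _delta_log and return the data files
    that make up the current version of the table.

    Simplification: checkpoint files are ignored (see the note above).
    """
    live = set()
    # Commit files are zero-padded version numbers, so name order is version order.
    for commit in sorted((table_dir / "_delta_log").glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip anything that is not a plain numbered commit
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)
```

Calling live_parquet_files(Path.home() / ".flowfile" / "catalog_tables" / "monthly_orders_a3f1b2c4") would list the files a Delta reader treats as the table. The folder can also hold Parquet files that a later commit removed, which is why a Delta-aware reader is the safer choice over globbing *.parquet.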

Easy to start, hard to lock in

The combination is what makes this worth caring about.

Easy to start because you don’t have to plan. Register a table, query it, build flows around it. No schema to design up front, no folder structure to commit to. The catalog handles organisation; Delta handles versioning. If you change your mind about the schema later, evolve it in place — old versions are kept until you explicitly delete them.
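The "old versions are kept" part is visible in the same transaction log. As a rough sketch, the function below walks the JSON commits and records the column names each time the table's schema changed. The name schema_history is hypothetical, checkpoints are ignored for simplicity, and this is a way to peek at the log by hand, not how Flowfile or a Delta library does it.

```python
import json
from pathlib import Path


def schema_history(table_dir: Path) -> dict:
    """Map each version that set or changed the schema to its column names.

    Reads the metaData actions in the JSON commits; checkpoint files are
    ignored, which is a simplification.
    """
    history = {}
    for commit in sorted((table_dir / "_delta_log").glob("*.json")):
        if not commit.stem.isdigit():
            continue  # only plain numbered commit files
        version = int(commit.stem)
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                # schemaString is itself a JSON document describing the columns.
                schema = json.loads(action["metaData"]["schemaString"])
                history[version] = [field["name"] for field in schema["fields"]]
    return history
```

For a table that gained a column in its second write, the result would have two entries: one for version 0 with the original columns and one for version 1 with the new column added.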

Hard to lock in because your data isn’t trapped in a Flowfile-specific format. There isn’t one. The catalog database is small: just names, schemas, and run history. The data itself is open Parquet on your own disk, with the Delta log sitting next to it. If you decide tomorrow to switch to a warehouse, you copy the table folders up, Parquet files and _delta_log/ together. If you want to query a Flowfile table from another tool while still using Flowfile, pl.scan_delta(...) works exactly as it would on any other Delta table.
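Here is roughly what reading a catalog table from outside Flowfile could look like with Polars. Treat it as a sketch under assumptions: find_table_dirs and scan_table are made-up helper names, and the "<table name>_<suffix>" folder pattern is only inferred from the monthly_orders_a3f1b2c4/ example, so check your own catalog_tables folder before relying on it. pl.scan_delta itself is a real Polars function and needs the polars and deltalake packages installed.

```python
from pathlib import Path
from typing import Optional


def find_table_dirs(catalog_root: Path, table_name: str) -> list:
    """Find Delta table folders for `table_name` under the catalog root.

    Assumes folders are named '<table_name>_<suffix>' (inferred, not documented),
    and keeps only folders that actually contain a _delta_log directory.
    """
    return sorted(
        path
        for path in catalog_root.glob(f"{table_name}_*")
        if (path / "_delta_log").is_dir()
    )


def scan_table(table_dir: Path, version: Optional[int] = None):
    """Lazily open a Delta table with Polars, optionally at an older version."""
    # Imported here so the path helper above works without Polars installed.
    import polars as pl

    return pl.scan_delta(str(table_dir), version=version)
```

With those two helpers, something like scan_table(find_table_dirs(Path.home() / ".flowfile" / "catalog_tables", "monthly_orders")[0]).collect() would load the current table into a Polars DataFrame, and passing version=0 would load the first version instead.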

Most “easy” tools achieve easy by hiding things from you. They’re easy until you outgrow them, and then they’re a wall. Open formats are the opposite — they give you somewhere to go. The catalog makes the everyday easy; the open format makes the long term safe.

That combination is what beginners deserve, and it’s what Flowfile’s catalog is built for.

Related reads: Why Your Data Should Stay on Your Laptop for the local-first case, Demystifying Delta Lake for the open format under the catalog, and Virtual Flow Tables for what catalog entries can be beyond a stored file.

Frequently asked questions

Do I have to learn SQL to use the catalog?

No. You can register tables, browse them, preview their data, and read them into a flow without writing a line of SQL. SQL is there as an option in the catalog's editor for ad-hoc queries, but it's never the only path.

What happens to my data if I stop using Flowfile?

Nothing. Your data is stored in open Delta Lake format on your own disk. Delta is just Parquet files plus a small JSON transaction log; both are public specifications. You can read every catalog table from DuckDB, Polars, Pandas, Spark, or any other tool that reads Delta, with no migration step.

Do I need to understand Delta Lake to use the catalog?

Not at all. Delta is the storage layer that powers versioning and time travel under the hood, but you only need to interact with table names. If you ever want the deeper version-history features, they're there. If you don't, the defaults are fine.

Can I use other tools alongside the Flowfile catalog?

Yes. The catalog tables live in plain folders on your machine (under `~/.flowfile/catalog_tables/`), and any tool that reads Delta can open them directly. You don't have to choose between Flowfile and the rest of your stack.

What if I outgrow my laptop and need to move to a warehouse?

Copy the Delta tables up, keeping each table folder whole so the `_delta_log/` travels with its Parquet files. Snowflake, Databricks, BigQuery, and DuckDB can all read Delta tables. The catalog metadata (table names, schemas, run history) is portable too, but the important point is that the data isn't trapped in a Flowfile-specific format, because there isn't one.