Flowfile Blog

Notes on local data platforms, Polars and visual ETL

Practical writing on building data pipelines that run on your laptop — from Excel automation to Delta Lake catalogs to the Polars vs Pandas debate. Updated as Flowfile ships.

Jun 1, 2026 8 min read

If you're building an AI agent, scope it before you scale it

The first move when adding an AI agent isn't picking a bigger model; it's scoping the action space and the feedback, which is exactly the work of designing a good UI.

AI AI Agents Python

Jun 1, 2026 4 min read

Your UI is already your agent's API

Design a UI component well and you've already written an agent tool — the same typed object, no translation layer in between. The pattern, in forty lines of Flowfile.

AI Agents Python Polars

May 11, 2026 4 min read

Flowfile Goes AI

How Flowfile v0.10 adds AI to visual ETL: an agent that edits the canvas, chat, ghost-node suggestions, and Fix-with-AI, with bring-your-own-key (BYOK) support across six providers.

AI Visual ETL AI Agents

May 1, 2026 7 min read

Logic Is a Table, Observed from the Other Side

Logic produces a table, but the logic is the table — observed from the consumer side. What that flip changes about catalog design, and what I had to delete after I noticed it.

Architecture Virtual Tables Lazy Evaluation

Apr 30, 2026 8 min read

Why Flowfile Is the Way It Is

The honest answer to why a 'visual ETL tool' has scheduling, a catalog, a SQL editor, dashboards, and ML nodes: none of it planned, none of it accidental.

Architecture Catalog Product

Apr 29, 2026 6 min read

Schema as a Contract

Every Flowfile node has a schema validation toggle. Turn it on, declare the columns you expect, pick a behaviour for what happens when the source drifts.

Schema Validation Data Quality FlowFrame

Apr 28, 2026 9 min read

Delta Lake with Polars: A Hands-On Walkthrough

Read, write, merge, and time-travel Delta Lake tables from Polars without Spark. Includes a MinIO setup so you can run a local S3-backed lakehouse on your laptop.

Delta Lake Polars Python

Apr 27, 2026 7 min read

Abstraction Is a Zoom Level on a DAG You Already Have

Most dataflow tools sit on a DAG. What separates code that feels easy from code buried in YAML isn't structure — it's how many nodes you can see.

Architecture DAG Abstraction

Apr 27, 2026 8 min read

Delta Lake vs Apache Iceberg in 2026: Which Open Table Format

Both bring ACID transactions and time travel to Parquet on object storage. Picking between Delta Lake and Iceberg in 2026 is less a war and more a fit-for-purpose choice.

Delta Lake Apache Iceberg Open Table Format

Apr 27, 2026 8 min read

Flowfile vs KNIME in 2026: A Practical Comparison

KNIME has the largest node library in open-source visual ETL. Flowfile is younger, leaner, and Polars-native. An honest side-by-side for analysts choosing between them.

Flowfile KNIME Comparison

Apr 27, 2026 9 min read

Polars vs DuckDB in 2026: Which to Pick for Local Analytics

Both are fast, columnar, and built on Arrow. Picking between Polars and DuckDB is less about speed and more about the shape of your work. An honest comparison.

Polars DuckDB Comparison

Apr 27, 2026 7 min read

Tools That Teach Get More Important in an AI World, Not Less

AI-generated code is disposable. Understanding is durable. The tools that matter most over the next few years are the ones that leave you smarter than they found you.

AI Learning Visual ETL

Apr 26, 2026 8 min read

Direction Stopped Mattering: Code and Graph in One Loop

Most ETL tools force a starting point: the canvas or the IDE. Flowfile lets both produce the same graph — and round-trip back to a script you'd actually write.

FlowFrame Code Export Polars

Apr 25, 2026 7 min read

Three Releases In, Flowfile Stopped Being a Pipeline Tool

v0.7 added a catalog. v0.8 moved storage to Delta. v0.9 closed the loop with virtual tables and a SQL editor. Looking back, the catalog quietly became the thing the rest of Flowfile hangs off of.

Release Notes Data Catalog Delta Lake

Apr 20, 2026 4 min read

Catalogs Make Data Easy. Open Formats Keep It Yours.

A catalog turns 'where did I save that file?' into 'just give it a name.' Open formats underneath mean the data is yours — readable by anything, portable anywhere, no vendor in the middle.

Data Catalog Beginners Open Formats

Apr 20, 2026 7 min read

Your Lineage Graph Should Run Your Pipelines

In most stacks, lineage and orchestration are two products. Flowfile collapses them: the same graph that records which flows read which data also fires the schedules that run them.

Data Catalog Lineage Scheduling

Apr 16, 2026 5 min read

Automate Your Excel Workflows Without Writing Code

If you rebuild the same Excel report every week, you don't need Python. Here's how visual data pipelines turn that work into a repeatable, one-click process.

Excel Beginner Automation

Apr 16, 2026 8 min read

Big Data Is Dead — Why Your Laptop Is Probably Big Enough

The 'big data' era was real, but it ended quietly. Hardware caught up, working sets shrank, and single-node engines like Polars and DuckDB beat the cluster on most workloads.

Big Data Local-first Polars

Apr 16, 2026 7 min read

Bubble + Stripe + Mailchimp: A Non-Technical Founder's Playbook

You built a product on Bubble, you charge with Stripe, you email with Mailchimp. Here's how to connect all three into one view of who signs up, who pays, and who sticks around.

Small Business Bubble No-code

Apr 16, 2026 6 min read

CRM, ERP, ETL: Which Three-Letter Acronyms a Small Business Actually Needs

Software vendors love three-letter acronyms. Here's a plain-English guide to which ones matter for a small business, which ones you can ignore, and what to buy when.

Small Business Beginner CRM

Apr 16, 2026 8 min read

From Alteryx to Flowfile: A Practical Migration Walkthrough

How to rebuild a real Alteryx workflow in Flowfile — a weekly sales report with lookups, a pivot, and an Excel output — node by node, in under an hour.

Alteryx Migration Visual ETL

Apr 16, 2026 8 min read

Connections, Secrets, and the Catalog in Flowfile's Python API

How Flowfile registers database and cloud-storage connections once — in Python or the UI — and references them everywhere by name, with encryption handled for you.

Python DevEx Secrets

Apr 16, 2026 8 min read

Demystifying Delta Lake: What It Is and Why It Matters

Delta Lake is not a database. It's a transaction log over Parquet that gives you ACID, time travel, and schema evolution — without a server. Here's what it does, in plain English.

Delta Lake Data Engineering Lakehouse

Apr 16, 2026 12 min read

Flowfile's Kafka Source: How Micro-Batching Actually Works

A code-level walkthrough of Flowfile's Kafka source: the 500-message poll, the 100k-row spill to Arrow, Polars LazyFrames, and consumer-group offsets.

Kafka Flowfile Polars

Apr 16, 2026 8 min read

76× Faster Fuzzy Joins: How pl-fuzzy-frame-match Works

Brute-force fuzzy matching is O(N×M) — at 1.2 billion comparisons it falls over. Here's how a two-stage hybrid (ANN + exact scoring) reduces that to seconds while preserving accuracy.

Fuzzy Match Polars Performance

Apr 16, 2026 8 min read

Fuzzy Match in Polars: Joining on Dirty Data with Flowfile

Every real dataset has 'Acme Corp' vs 'ACME Corporation' somewhere. Here's how Flowfile's fuzzy_join — built on Polars and Levenshtein — handles it without a regex in sight.

Polars Fuzzy Match Data Cleaning

Apr 16, 2026 12 min read

Kafka for Analysts: A Practical Guide to Streaming as Micro-Batches

Most analytics 'streaming' is really a sequence of micro-batches. How to think about cleaning, combining, and enriching Kafka data without a streaming engine.

Kafka Streaming Analytics

Apr 16, 2026 6 min read

The Best Way to Learn Python Is to Build Something You'd Actually Use

Tutorials teach syntax. Building real things teaches everything else. Reflections on learning Python the hard way — through a year-long project called Flowfile.

Essay Learning Python

Apr 16, 2026 7 min read

Why Your Data Should Stay on Your Laptop

Local compute plus a built-in data catalog gives you the speed of a desktop tool and the structure of a warehouse — without sending a single row to the cloud.

Local-first Data Catalog Privacy

Apr 16, 2026 8 min read

Open-Source Alternatives to Alteryx in 2026

Alteryx is powerful, but the licensing has gotten brutal. An honest comparison of the open-source visual ETL tools worth evaluating in 2026.

ETL Open Source Comparison

Apr 16, 2026 7 min read

Meta vs. Google Ads: How to Actually Tell Which One Is Selling

Meta says 40 sales. Google says 32. Shopify says 58. Here's why all three are 'right' and how to build a single number you can trust.

Small Business Marketing Google Ads

Apr 16, 2026 7 min read

Polars vs Pandas in 2026: A Practical Guide

Polars is faster, lazier, and stricter than Pandas. Pandas has 15 years of ecosystem. A practical, honest take on when to use which in 2026.

Polars Pandas Performance

Apr 16, 2026 8 min read

RFM: The 50-Year-Old Customer Segmentation Every Small Business Should Steal

Your VIPs, your churn risks, and your dead weight are hiding in the same customer list. RFM is the simple scoring model that separates them in under an hour.

Small Business Customer Segmentation RFM

Apr 16, 2026 6 min read

Stop Copy-Pasting Between Spreadsheets

If your weekly routine involves downloading three exports and stitching them into one master sheet, you're doing a robot's job. Here's the non-technical way to hand it off.

Small Business Spreadsheets Excel

Apr 16, 2026 7 min read

The 10 Numbers Every Small Business Should Track Each Week

A flagship weekly scorecard for small business owners: what to track, where to find each number, and how to stitch them into one report in under an hour.

Small Business Metrics KPIs

Apr 16, 2026 7 min read

Virtual Flow Tables: When a Catalog Entry Is a Pipeline

Most data catalogs know about materialized tables and SQL views. Flowfile adds a third option: a catalog entry that points at a pipeline and resolves lazily. Here's how and why.

Data Catalog Delta Lake Architecture

Apr 16, 2026 6 min read

What Is a Data Pipeline? A Small Business Owner's Guide

If you've heard the term 'data pipeline' and assumed it wasn't for you, here's the plain-English version for small business owners — no engineering background required.

Small Business Beginner Data Pipeline

Apr 16, 2026 6 min read

What Is a Data Pipeline? A Plain-English Guide for Analysts

A data pipeline is a saved recipe that turns raw data into something useful. Here's what one is, what the parts are called, and how to build your first one without a data engineering degree.

Beginner Data Pipelines ETL
