If you're building an AI agent, scope it before you scale it
The first move when adding an AI agent isn't picking a bigger model — it's scoping the action space and the feedback. Which is the exact work of designing a good UI.
Practical writing on building data pipelines that run on your laptop — from Excel automation to Delta Lake catalogs to the Polars vs Pandas debate. Updated as Flowfile ships.
Design a UI component well and you've already written an agent tool — the same typed object, no translation layer in between. The pattern, in forty lines of Flowfile.
How Flowfile v0.10 adds AI to visual ETL — an agent that edits the canvas, chat, ghost-node suggestions, Fix-with-AI — with BYOK across six providers.
Logic produces a table, but the logic is the table — observed from the consumer side. What that flip changes about catalog design, and what I had to delete after I noticed it.
The honest answer for why a 'visual ETL tool' has scheduling, a catalog, a SQL editor, dashboards, and ML nodes — none of it planned, none of it accidental.
Every Flowfile node has a schema validation toggle. Turn it on, declare the columns you expect, pick a behaviour for what happens when the source drifts.
Read, write, merge, and time-travel Delta Lake tables from Polars without Spark. Includes a MinIO setup so you can run a local S3-backed lakehouse on your laptop.
Most dataflow tools sit on a DAG. What separates code that feels easy from code buried in YAML isn't structure — it's how many nodes you can see.
Both bring ACID transactions and time travel to Parquet on object storage. Picking between Delta Lake and Iceberg in 2026 is less a war and more a fit-for-purpose choice.
KNIME has the largest node library in open-source visual ETL. Flowfile is younger, leaner, and Polars-native. An honest side-by-side for analysts choosing between them.
Both are fast, columnar, and built on Arrow. Picking between Polars and DuckDB is less about speed and more about the shape of your work. An honest comparison.
AI-generated code is disposable. Understanding is durable. The tools that matter most over the next few years are the ones that leave you smarter than they found you.
Most ETL tools force a starting point: the canvas or the IDE. Flowfile lets both produce the same graph — and round-trip back to a script you'd actually write.
v0.7 added a catalog. v0.8 moved storage to Delta. v0.9 closed the loop with virtual tables and a SQL editor. Looking back, the catalog quietly became the thing the rest of Flowfile hangs off of.
A catalog turns 'where did I save that file?' into 'just give it a name.' Open formats underneath mean the data is yours — readable by anything, portable anywhere, no vendor in the middle.
In most stacks, lineage and orchestration are two products. Flowfile collapses them: the same graph that records what flows read what also fires the schedules that run them.
If you rebuild the same Excel report every week, you don't need Python. Here's how visual data pipelines turn that work into a repeatable, one-click process.
The 'big data' era was real, but it ended quietly. Hardware caught up, working sets shrank, and single-node engines like Polars and DuckDB beat the cluster on most workloads.
You built a product on Bubble, you charge with Stripe, you email with Mailchimp. Here's how to connect all three into one view of who signs up, who pays, and who sticks around.
Software vendors love three-letter acronyms. Here's a plain-English guide to which ones matter for a small business, which ones you can ignore, and what to buy when.
How to rebuild a real Alteryx workflow in Flowfile — a weekly sales report with lookups, a pivot, and an Excel output — node by node, in under an hour.
How Flowfile registers database and cloud-storage connections once — in Python or the UI — and references them everywhere by name, with encryption handled for you.
Delta Lake is not a database. It's a transaction log over Parquet that gives you ACID, time travel, and schema evolution — without a server. Here's what it does, in plain English.
A code-level walkthrough of Flowfile's Kafka source: the 500-message poll, the 100k-row spill to Arrow, Polars LazyFrames, and consumer-group offsets.
Brute-force fuzzy matching is O(N×M) — at 1.2 billion comparisons it falls over. Here's how a two-stage hybrid (ANN + exact scoring) reduces that to seconds while preserving accuracy.
Every real dataset has 'Acme Corp' vs 'ACME Corporation' somewhere. Here's how Flowfile's fuzzy_join — built on Polars and Levenshtein — handles it without a regex in sight.
Most analytics 'streaming' is really a sequence of micro-batches. How to think about cleaning, combining, and enriching Kafka data without a streaming engine.
Tutorials teach syntax. Building real things teaches everything else. Reflections on learning Python the hard way — through a year-long project called Flowfile.
Local compute plus a built-in data catalog gives you the speed of a desktop tool and the structure of a warehouse — without sending a single row to the cloud.
Alteryx is powerful, but the licensing has gotten brutal. An honest comparison of the open-source visual ETL tools worth evaluating in 2026.
Meta says 40 sales. Google says 32. Shopify says 58. Here's why all three are 'right' and how to build a single number you can trust.
Polars is faster, lazier, and stricter than Pandas. Pandas has 15 years of ecosystem. A practical, honest take on when to use which in 2026.
Your VIPs, your churn risks, and your dead weight are hiding in the same customer list. RFM is the simple scoring model that separates them in under an hour.
If your weekly routine involves downloading three exports and stitching them into one master sheet, you're doing a robot's job. Here's the non-technical way to hand it off.
A flagship weekly scorecard for small business owners: what to track, where to find each number, and how to stitch them into one report in under an hour.
Most data catalogs know about materialized tables and SQL views. Flowfile adds a third option: a catalog entry that points at a pipeline and resolves lazily. Here's how and why.
If you've heard the term 'data pipeline' and assumed it wasn't for you, here's the plain-English version for small business owners — no engineering background required.
A data pipeline is a saved recipe that turns raw data into something useful. Here's what one is, what the parts are called, and how to build your first one without a data engineering degree.