Big Data Is Dead — Why Your Laptop Is Probably Big Enough
The 'big data' era was real, but it ended quietly. Hardware caught up, working sets shrank, and single-node engines like Polars and DuckDB beat the cluster on most workloads.
Practical writing on building data pipelines that run on your laptop, from Excel automation to Delta Lake catalogs to the Polars vs Pandas debate. Updated as Flowfile ships.
If you rebuild the same Excel report every week, you don't need Python. Here's how visual data pipelines turn that work into a repeatable, one-click process.
Tutorials teach syntax. Building real things teaches everything else. Reflections on learning Python the hard way, through a year-long project called Flowfile.
How to rebuild a real Alteryx workflow in Flowfile (a weekly sales report with lookups, a pivot, and an Excel output), node by node, in under an hour.
Delta Lake is not a database. It's a transaction log over Parquet that gives you ACID, time travel, and schema evolution without a server. Here's what it does, in plain English.
Brute-force fuzzy matching is O(N×M), and at 1.2 billion comparisons it falls over. Here's how a two-stage hybrid (ANN + exact scoring) reduces that to seconds while preserving accuracy.
A code-level walkthrough of Flowfile's Kafka source: the 500-message poll, the 100k-row spill to Arrow, Polars LazyFrames, and consumer-group offsets.
Every real dataset has 'Acme Corp' vs 'ACME Corporation' somewhere. Here's how Flowfile's fuzzy_join, built on Polars and Levenshtein, handles it without a regex in sight.
How Flowfile registers database and cloud-storage connections once, in Python or the UI, and references them everywhere by name, with encryption handled for you.
Most analytics 'streaming' is really a sequence of micro-batches. How to think about cleaning, combining, and enriching Kafka data without a streaming engine.
Local compute plus a built-in data catalog gives you the speed of a desktop tool and the structure of a warehouse, without sending a single row to the cloud.
Alteryx is powerful, but the licensing has gotten brutal. An honest comparison of the open-source visual ETL tools worth evaluating in 2026.
Polars is faster, lazier, and stricter than Pandas. Pandas has 15 years of ecosystem. A practical, honest take on when to use which in 2026.
Most data catalogs know about materialized tables and SQL views. Flowfile adds a third option: a catalog entry that points at a pipeline and resolves lazily. Here's how and why.
A data pipeline is a saved recipe that turns raw data into something useful. Here's what one is, what the parts are called, and how to build your first one without a data engineering degree.
You built a product on Bubble, you charge with Stripe, you email with Mailchimp. Here's how to connect all three into one view of who signs up, who pays, and who sticks around.
Software vendors love three-letter acronyms. Here's a plain-English guide to which ones matter for a small business, which ones you can ignore, and what to buy when.
Meta says 40 sales. Google says 32. Shopify says 58. Here's why all three are 'right' and how to build a single number you can trust.
Your VIPs, your churn risks, and your dead weight are hiding in the same customer list. RFM is the simple scoring model that separates them in under an hour.
If your weekly routine involves downloading three exports and stitching them into one master sheet, you're doing a robot's job. Here's the non-technical way to hand it off.
A flagship weekly scorecard for small business owners: what to track, where to find each number, and how to stitch them into one report in under an hour.
If you've heard the term 'data pipeline' and assumed it wasn't for you, here's the plain-English version for small business owners, no engineering background required.