From Alteryx to Flowfile: A Practical Migration Walkthrough

How to rebuild a real Alteryx workflow in Flowfile — a weekly sales report with lookups, a pivot, and an Excel output — node by node, in under an hour.

TL;DR. Migrating from Alteryx to Flowfile looks intimidating and isn’t. The visual vocabulary translates almost one-to-one: Input Data becomes Read Data, Summarize becomes Group By, Cross Tab becomes Pivot, and so on. You rebuild rather than convert (no automated tool gets you there cleanly), which is faster than it sounds — usually under an hour per workflow — and doubles as a free audit of logic nobody has re-read in two years. This post walks through a realistic weekly sales report end-to-end, side-by-side.

The workflow we’re migrating

Here is an Alteryx workflow that half the business-intelligence teams on earth have a version of:

Input Data (orders.csv)
    → Select (drop internal_id, legacy_flag)
    → Join with Input Data (customers.xlsx) on customer_id
    → Filter to current quarter
    → Summarize by region & product: sum(spend), count(order_id)
    → Sort by spend descending
    → Cross Tab: rows = region, columns = product, values = spend
    → Output Data (weekly_report.xlsx)

A weekly report. Two file inputs. One join. One filter. One aggregation. One pivot. One export. This is the shape of most Alteryx workflows in the wild — and it rebuilds in Flowfile in about 20 minutes.

The node-to-node map

Before the walkthrough, here is the cheat sheet. Alteryx users can translate most workflows with this table alone:

| Alteryx node | Flowfile node | Notes |
|---|---|---|
| Input Data | Read Data | CSV, Excel, Parquet, JSON |
| Text Input | Manual Input | Type data directly |
| Output Data | Write Data | Same formats as input |
| Select | Select Data | Pick, rename, reorder columns |
| Formula | Formula | Create or modify columns |
| Filter | Filter Data | Row conditions |
| Sort | Sort Data | One or many columns |
| Unique | Drop Duplicates | On any subset of columns |
| Sample | Sample | First N, random N, etc. |
| Join | Join | Inner, left, right, outer |
| Append Fields | Cross Join | Cartesian product |
| Union | Union Data | Stack rows |
| Fuzzy Match | Fuzzy Match | Approximate-string joins |
| Summarize | Group By | All standard aggregations |
| Cross Tab | Pivot Data | Long → wide |
| Transpose | Unpivot Data | Wide → long |
| Text To Columns (split) | Text to Rows / Formula | Depending on the shape |
| Record ID | Add Record ID | Sequential row identifier |
| Run Command / Python | Python Script | Sandboxed Docker execution |
| Input Database | Read from Database | Postgres, MySQL, SQL Server, Oracle, DuckDB |
| Output Database | Write to Database | Same set |
| Amazon S3 Download / Azure | Read from Cloud Provider | S3, ADLS, GCS |

Nodes Alteryx users often ask about:

  • Browse (Alteryx) — every Flowfile node has a built-in data preview; you don’t need a separate node.
  • Comment / Container — Flowfile uses descriptions on each node plus the visual canvas organisation; there’s no dedicated container node (yet).
  • Spatial / predictive — Flowfile doesn’t ship dedicated spatial or statistical nodes the way Alteryx does. For those, the Python Script or Polars Code node lets you drop in shapely, geopandas, scikit-learn, statsmodels, etc.

Flowfile's per-node data preview sits at the bottom of the canvas — no dedicated Browse node needed

Step-by-step rebuild

Assume you’ve already installed Flowfile (pip install flowfile, or the desktop app) and opened the visual canvas on a blank flow.

Step 1: The two inputs

Drag two Read Data nodes onto the canvas. Point the first at orders.csv, the second at customers.xlsx. Flowfile auto-detects column types and shows a preview of the first 100 rows in the right-hand panel.

Rename the nodes (every node has a description field). Call them “Load orders” and “Load customers”. Those descriptions will appear on the canvas and, later, in any code you generate.

Step 2: Trim the orders

Drop a Select Data node after “Load orders”. Uncheck internal_id and legacy_flag. Rename it “Drop internal columns”.

This is the direct equivalent of the Alteryx Select node. The uncheck-to-remove behaviour is identical.

Step 3: Join orders to customers

Add a Join node. Drag the output of “Drop internal columns” into its left input and the output of “Load customers” into its right. Configure:

  • Left column: customer_id
  • Right column: customer_id
  • Join type: left (keep all orders, attach customer where it matches)

Preview the output. You should see every order row now enriched with customer columns.
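
A left join keeps every order, but it also repeats an order once per matching customer row, so duplicate customer_id values in the customer file would inflate spend downstream. Here is a minimal sketch, in plain Python, of two key checks worth running before trusting the join. It uses only the standard library and is not Flowfile's API; the helper name check_join_keys and the sample rows are made up for illustration.

```python
from collections import Counter


def check_join_keys(left_rows, right_rows, key):
    """Report right-side duplicate keys and left-side keys with no match."""
    right_counts = Counter(row[key] for row in right_rows)
    duplicates = sorted(k for k, n in right_counts.items() if n > 1)
    unmatched = sorted({row[key] for row in left_rows} - set(right_counts))
    return {"duplicate_right_keys": duplicates, "unmatched_left_keys": unmatched}


# Hypothetical sample data standing in for orders.csv and customers.xlsx.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C2"},
    {"order_id": 3, "customer_id": "C9"},  # no such customer
]
customers = [
    {"customer_id": "C1", "region": "North"},
    {"customer_id": "C2", "region": "South"},
    {"customer_id": "C2", "region": "South"},  # duplicated row
]

report = check_join_keys(orders, customers, "customer_id")
print(report)
```

An empty list under both headings means the left join will return exactly one output row per order, each with customer columns filled in.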

If your keys don’t quite match — you have Acme Corp in one file and ACME Corporation in the other — swap the Join for a Fuzzy Match node. We’ll cover that separately; for now assume the keys are clean.

Step 4: Filter to the current quarter

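Add a Filter Data node. Condition: order_date >= '2026-01-01' AND order_date < '2026-04-01'. Flowfile’s filter syntax accepts Polars expressions, so pl.col("order_date").dt.quarter() == 1 also works, with one caveat: on its own it matches the first quarter of every year in the data, so combine it with pl.col("order_date").dt.year() == 2026 if the file spans more than one year.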
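
Hard-coded dates need editing every quarter. If you would rather compute them, the half-open range in the condition above (start inclusive, next quarter's start exclusive) can be derived from any date. This is a minimal sketch in plain Python; quarter_bounds is a hypothetical helper, not part of Flowfile or Polars.

```python
from datetime import date


def quarter_bounds(today):
    """Return (start, end) of the quarter containing `today`; end is exclusive."""
    first_month = 3 * ((today.month - 1) // 3) + 1
    start = date(today.year, first_month, 1)
    if first_month == 10:
        end = date(today.year + 1, 1, 1)  # Q4 rolls over into the next year
    else:
        end = date(today.year, first_month + 3, 1)
    return start, end


start, end = quarter_bounds(date(2026, 2, 14))
print(start, end)  # 2026-01-01 2026-04-01
```

Keeping the upper bound exclusive means a timestamp late on the last day of the quarter still falls inside the range.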

Rename to “Current quarter only”.

Step 5: Aggregate

Add a Group By node. Group on region and product. Add two aggregations:

  • spend: sum → alias total_spend
  • order_id: count → alias num_orders

This is the Alteryx Summarize node with a different name.
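
For readers who want to see exactly what those two aggregations compute, here is the same logic in plain Python. It is a standard-library sketch over made-up rows, not Flowfile's implementation, and it assumes count means "count of non-null order_id values".

```python
from collections import defaultdict

# Hypothetical joined-and-filtered rows.
rows = [
    {"region": "North", "product": "Widget", "spend": 120.0, "order_id": 1},
    {"region": "North", "product": "Widget", "spend": 80.0, "order_id": 2},
    {"region": "North", "product": "Gadget", "spend": 50.0, "order_id": 3},
    {"region": "South", "product": "Widget", "spend": 200.0, "order_id": 4},
]

totals = defaultdict(lambda: {"total_spend": 0.0, "num_orders": 0})
for row in rows:
    group = totals[(row["region"], row["product"])]
    group["total_spend"] += row["spend"]  # sum(spend)
    if row["order_id"] is not None:
        group["num_orders"] += 1  # count(order_id), nulls skipped

# One output row per (region, product) pair.
summary = [
    {"region": region, "product": product, **aggs}
    for (region, product), aggs in sorted(totals.items())
]
for line in summary:
    print(line)
```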

Step 6: Sort

Add a Sort Data node. Sort on total_spend, descending.

Step 7: Pivot

Add a Pivot Data node (Alteryx’s Cross Tab). Configure:

  • Rows: region
  • Columns: product
  • Values: total_spend (sum)

Preview: a wide table with one row per region and one column per product.
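
The long-to-wide reshape is easy to misread on a first pass, so here is a sketch of it in plain Python. It uses only the standard library and made-up summary rows, and it fills region/product combinations that never occurred with None. Whether the Pivot Data node shows an empty cell or a zero there is something to confirm in the preview.

```python
# Hypothetical output of the Group By step (long format).
summary = [
    {"region": "North", "product": "Gadget", "total_spend": 50.0},
    {"region": "North", "product": "Widget", "total_spend": 200.0},
    {"region": "South", "product": "Widget", "total_spend": 200.0},
]

# Every distinct product becomes a column.
products = sorted({row["product"] for row in summary})

wide = {}
for row in summary:
    # One output row per region; cells start as None until a value arrives.
    cells = wide.setdefault(row["region"], dict.fromkeys(products))
    cells[row["product"]] = (cells[row["product"]] or 0.0) + row["total_spend"]

pivoted = [{"region": region, **cells} for region, cells in sorted(wide.items())]
for line in pivoted:
    print(line)
```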

Step 8: Write the Excel file

Add a Write Data node. Choose .xlsx, pick a path (weekly_report.xlsx), click save.

Hit Run on the canvas. The whole pipeline executes in seconds on anything up to a few million rows — Polars under the hood. The file lands where you pointed it.

That’s the full migration. Nine nodes across eight steps. No code. Visually identical to the Alteryx equivalent in structure.

What you get that you didn’t have

You rebuilt the workflow. You also got several things for free in the process:

  • It’s version-controllable. Flowfile flows save as a single file you can drop in Git. A .yxmd file is XML as well, but few teams version-control their Alteryx workflows in practice.
  • It can export as Python. Click Generate Code and a standalone Polars script appears — one that runs anywhere Python runs, with no Flowfile dependency at execution time. The Alteryx equivalent is Alteryx Server plus a Designer licence.
  • It can schedule itself. Right-click the flow, set an interval, or trigger on an upstream catalog table update. No separate scheduler to provision.
  • Your output can go into the catalog. Replace the Write Data (.xlsx) with a Write to Catalog node and the result becomes a Delta-backed, versioned, time-travellable table other flows can read — see Why Your Data Should Stay on Your Laptop.
  • It’s free and open-source. MIT-licensed; your flow isn’t hostage to a licence renewal.

The team that cares about open formats, laptop performance, and exporting to Python as a safety net picks this up fast. The team that cares about Alteryx’s statistical and spatial node libraries may want to keep some workflows on Alteryx and only migrate the ETL-shaped ones.

The rebuild playbook (at team scale)

If you are coordinating a team migration:

  1. Pick one workflow. Ideally a weekly batch job that is painful but not business-critical.
  2. Rebuild from scratch, not from conversion. Sit with the original author if possible. The conversation is useful; the rebuild is fast.
  3. Run both in parallel for one cycle. Compare outputs row-by-row. A DuckDB EXCEPT query across the two outputs, run in both directions, surfaces the differing rows in seconds.
  4. Retire the Alteryx version. Disable the schedule, archive the .yxmd, move on.
  5. Repeat. Order workflows by pain-per-cycle, not by ease of migration. The ugliest Alteryx workflow is usually the fastest Flowfile win.
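
For the parallel-run comparison in item 3 of the list above, note that a plain SQL EXCEPT works on distinct rows, so a row that is duplicated in only one output can slip through. The sketch below is one alternative: a multiset comparison in plain Python. It assumes both tools exported CSV with the same column names; compare_csv and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read a CSV into a multiset of rows, ignoring column order."""
    return Counter(tuple(sorted(row.items())) for row in csv.DictReader(handle))


def compare_csv(old_handle, new_handle):
    """Return rows present only in the old output and only in the new one."""
    old_rows, new_rows = load_rows(old_handle), load_rows(new_handle)
    return old_rows - new_rows, new_rows - old_rows


# Hypothetical exports; in practice pass open("alteryx.csv", newline="") etc.
alteryx_out = io.StringIO("region,Widget\nNorth,200.0\nSouth,200.0\n")
flowfile_out = io.StringIO("region,Widget\nNorth,200.0\nSouth,210.0\n")

only_old, only_new = compare_csv(alteryx_out, flowfile_out)
print("only in Alteryx output:", dict(only_old))
print("only in Flowfile output:", dict(only_new))
```

Values are compared as text here, so 200 and 200.0 count as different. Normalise numeric formatting first if the two tools write numbers differently.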

A team of three analysts typically retires a backlog of 20–30 workflows in a few weeks at this pace. Nobody ever reports missing the old tool — they report missing specific statistical nodes or a familiar keyboard shortcut, both of which are addressable.

When to stay on Alteryx

Be fair. There are reasons to stay:

  • You rely heavily on Alteryx Spatial or the Predictive tools, and rebuilding them in Python isn’t worth the effort yet.
  • You have Alteryx Server infrastructure and deeply-embedded scheduling/permissioning that would take longer to re-platform than it saves.
  • Your team is non-technical enough that even Flowfile’s Python escape hatch feels scary, and you truly never need it.

Most teams don’t fall into these buckets. If you’re reading this, you probably don’t either.

Try the migration

The lowest-risk way to feel this is to pick your simplest real Alteryx workflow, install Flowfile on your laptop, and rebuild it from scratch while the Alteryx Designer is still open on your other monitor. You’ll know within 30 minutes whether the pattern fits.


Related reads: Open-Source Alternatives to Alteryx for the broader landscape comparison, Polars vs Pandas in 2026 for why Flowfile’s engine choice matters at scale, and Connections, Secrets, and the Catalog in Flowfile’s Python API for the developer-experience side once you’re ready to graduate from purely visual.

Frequently asked questions

How long does a typical Alteryx-to-Flowfile migration take?
One workflow of modest complexity (load, join, filter, group by, export) takes about 30–60 minutes to rebuild from scratch, including time to set up your database and file connections. Rebuilding is almost always faster than attempting to convert .yxmd files automatically, and the rebuild acts as a useful audit of the original logic.
Can I open my .yxmd files directly in Flowfile?
No — the formats are incompatible and no automated converter exists today. What does translate cleanly is the visual concept: every major Alteryx node has a direct Flowfile equivalent, and the canvas metaphor is identical. You drag, you connect, you preview.
Which Alteryx nodes map to which Flowfile nodes?
Input Data → Read Data; Select → Select Data; Filter → Filter Data; Formula → Formula; Join → Join; Fuzzy Match → Fuzzy Match; Summarize → Group By; Cross Tab → Pivot Data; Transpose → Unpivot Data; Sort → Sort Data; Unique → Drop Duplicates; Append Fields → Cross Join; Union → Union Data; Output Data → Write Data. The vocabulary is different; the concepts are the same.
Do I need to abandon the visual workflow to get the Flowfile advantages?
No. Flowfile is visual-first with a code escape hatch, not the other way around. You can rebuild your entire Alteryx workflow visually in Flowfile and never touch code. If you later want to schedule it, version it in Git, or export to production as a Python script, that option is there.
What does Flowfile have that Alteryx doesn't?
A built-in Delta Lake-backed catalog with time travel, native Polars performance, a Python API that works alongside the canvas, a code generator that exports flows as standalone Polars scripts (no Flowfile dependency at runtime), a browser demo, and an open-source licence. What Alteryx still has: a larger node library for statistical and spatial analysis, and 15 years of Stack Overflow answers.
Can I run old Alteryx workflows and new Flowfile pipelines side-by-side during migration?
Yes, and that's the recommended pattern. Rebuild one workflow in Flowfile, run both for at least one full cycle (a month gives a weekly job four comparisons), compare outputs row-by-row, then retire the Alteryx version. Migrating a team one workflow at a time avoids the all-or-nothing risk of a full cutover.