Manage your data your way
The local, open-source data platform. Build every pipeline visually or in code; each one plugs into a data catalog, Delta Lake storage, streaming ingestion, scheduling, and a Polars-compatible Python API. That combination is what makes Flowfile a platform rather than just an ETL tool, and it runs on your machine, your infrastructure, or your cloud.
One local platform, every workflow
Whether you build visually or in code, every pipeline plugs into the same data catalog, Delta Lake storage, streaming ingestion, scheduler, and Polars-compatible Python API. A full platform, running locally on your infrastructure.
Visual Editor
Drag-and-drop nodes to build complex data pipelines without writing a single line of code. Perfect for data analysts and anyone who prefers a visual approach.
- Intuitive drag-and-drop interface
- Real-time data preview at each step
- 30+ transformation nodes
- Write to Delta Lake via the catalog
Python API
Write pipelines in Python with a familiar, Polars-like syntax. Full programmatic control with the same powerful engine under the hood.
```python
import flowfile_frame as ff

df = ff.read_csv("sales.csv")
result = (
    df.filter(ff.col("sales") > 1000)
    .group_by("category")
    .agg(ff.sum("sales"))
)
```
Data Catalog & Delta Lake
Every table is stored as Delta Lake with version history, time travel, and merge/upsert support. Track lineage, runs, and artifacts in one place.
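To make version history and time travel concrete, here is a minimal pure-Python sketch of the idea behind Delta Lake's versioned tables. This is an illustration only, not Flowfile's or Delta Lake's actual API: each write produces a new table version, and older versions stay readable.

```python
# Conceptual sketch of versioned storage (illustrative only, not the real API):
# every write creates a new version; "time travel" reads an older one.

class VersionedTable:
    def __init__(self):
        self._versions = []  # each entry is a full snapshot of the rows

    def write(self, rows):
        self._versions.append(list(rows))
        return len(self._versions) - 1  # version number of this write

    def read(self, version=None):
        # version=None reads the latest snapshot; an integer time-travels
        idx = -1 if version is None else version
        return self._versions[idx]

table = VersionedTable()
v0 = table.write([{"product": "Widget A", "sales": 1200}])
v1 = table.write([{"product": "Widget A", "sales": 1200},
                  {"product": "Gadget X", "sales": 2100}])

assert len(table.read()) == 2    # latest version has both rows
assert len(table.read(v0)) == 1  # time travel back to version 0
```

Real Delta Lake additionally stores versions as transaction logs plus Parquet files rather than full snapshots, which is what makes merge/upsert and lineage tracking cheap.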
Kafka & Streaming
Ingest from Kafka or Redpanda as a canvas node or with the Python API. Bridge batch and streaming workloads in one pipeline.
Multi-Cloud & Databases
Read and write to S3, Azure Data Lake, and GCS. Connect to PostgreSQL, MySQL, and files — CSV, Excel, Parquet. Your data, wherever it lives.
Sandboxed Python Kernels
Run arbitrary Python in isolated Docker containers. Use matplotlib, scikit-learn, or your own libraries — results flow back into the pipeline.
Scheduling & Triggers
Run flows on intervals or trigger them when catalog tables update. Built-in orchestration — no external scheduler needed.
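The interval-trigger idea can be sketched in plain Python with the standard-library `sched` module. This is not Flowfile's scheduler API (which is configured in the app), just an illustration of running a flow on a repeating schedule.

```python
# Plain-Python sketch of interval scheduling (illustrative only; Flowfile's
# built-in scheduler is configured in the app, not via this code).
import sched
import time

runs = []

def run_flow():
    runs.append(time.monotonic())  # stand-in for executing a pipeline

scheduler = sched.scheduler(time.monotonic, time.sleep)
# Queue three runs, 10 ms apart, like a very fast "every N seconds" trigger.
for i in range(3):
    scheduler.enter(0.01 * i, 1, run_flow)
scheduler.run()

assert len(runs) == 3  # the flow fired once per scheduled interval
```

Event-based triggers work the same way conceptually, except the "enter" call fires when a catalog table is updated instead of on a timer.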
Polars Performance
Built on Polars, not Pandas. Lazy evaluation and query optimization deliver 10-100x faster execution on typical workloads. Export flows as clean Python code, with no vendor lock-in.
Visual pipeline building
Connect nodes to build your data pipeline. Each node transforms the data as it flows through — from raw input to final output.
Click a node to see its data
| product | category | sales |
|---|---|---|
| Widget A | Electronics | 1,200 |
| Widget B | Electronics | 890 |
| Gadget X | Home | 2,100 |
| Gadget Y | Home | 760 |
| Tool Pro | Tools | 3,200 |
Raw Data
Sales data loaded from CSV file
Try it yourself
This is a lightweight browser version. Install the full version for database connections, larger datasets, and more.
Same pipeline, in code
Prefer coding? Build the exact same pipeline using the Flowfile Python API. Export visual flows as code, or write pipelines programmatically.
```python
import flowfile_frame as ff

# Read and filter data
df = ff.read_csv("sales_data.csv")
filtered = df.filter(ff.col("sales") > 1000)

# Group by category and aggregate
result = (
    filtered
    .group_by("category")
    .agg(
        ff.sum("sales").alias("total_sales"),
        ff.sum("quantity").alias("total_quantity"),
        ff.count().alias("count"),
    )
)

result.write_parquet("output.parquet")
```
Up and running in seconds
Install Flowfile with pip and launch the visual editor with a single command.
```shell
pip install flowfile
flowfile run ui
```
Drag & drop nodes to create your data pipeline
What makes it unique
A full local data platform — visual and code, catalog and connections — running on your infrastructure, not a vendor's SaaS.
Visual meets code
Build pipelines visually, then export as clean Python code. Switch between both anytime — no vendor lock-in.
Local-first, deploy anywhere
Runs on your machine with a single pip install, your own Docker, or as a desktop app. Your data never leaves your infrastructure.
Catalog as the source of truth
Delta Lake storage, lineage, run history, and event-based triggers. Your data assets live in one place — not scattered across notebooks.
Built on Polars
Under the hood, Flowfile uses Polars for fast, memory-efficient data processing. Same performance you'd get in code.
Let's connect
Have questions, feedback, or want to contribute? Reach out or support the project.
Report an Issue
Found a bug or have a feature request? Open an issue on GitHub.
Open GitHub Issues
Ask a Question
Need help or want to discuss ideas? Join the GitHub Discussions.
Start a Discussion
Buy me a coffee
Enjoying Flowfile? Support development with a small donation.
Support the Project
Connect on LinkedIn
Let's connect! Follow for updates and data engineering insights.
View Profile
Ready to manage your data your way?
Join the community building the local, open-source data platform. Free, self-hosted, and ready for production — on your infrastructure.