Manage your data your way
The local, open-source data platform. Build every pipeline visually or in code; each one plugs into a data catalog, Delta Lake storage, streaming ingestion, scheduling, and a Polars-compatible Python API. That combination is what makes Flowfile a platform rather than just an ETL tool, and it runs on your machine, your infrastructure, or your cloud.
One local platform, every workflow
Whether you build visually or in code, every pipeline plugs into the same data catalog, Delta Lake storage, streaming ingestion, scheduler, and Polars-compatible Python API. A full platform, running locally on your infrastructure.
Visual Editor
Drag-and-drop nodes to build complex data pipelines without writing a single line of code. Perfect for data analysts and anyone who prefers a visual approach.
- Intuitive drag-and-drop interface
- Real-time data preview at each step
- 30+ transformation nodes
- Write to Delta Lake via the catalog
Python API
Write pipelines in Python with a familiar, Polars-like syntax. Full programmatic control with the same powerful engine under the hood.
```python
import flowfile_frame as ff

df = ff.read_csv("sales.csv")
result = (
    df.filter(ff.col("sales") > 1000)
    .group_by("category")
    .agg(ff.sum("sales"))
)
```
Data Catalog & Delta Lake
Every table is stored as Delta Lake with version history, time travel, and merge/upsert support. Track lineage, runs, and artifacts in one place.
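To make version history and time travel concrete, here is a minimal pure-Python sketch of the idea behind Delta Lake's versioned tables. This is an illustration only, not Flowfile's or Delta Lake's actual API: each write produces a new table version, and older versions stay readable.

```python
# Conceptual sketch of versioned storage (illustrative only, not the real API):
# every write creates a new version; "time travel" reads an older one.

class VersionedTable:
    def __init__(self):
        self._versions = []  # each entry is a full snapshot of the rows

    def write(self, rows):
        self._versions.append(list(rows))
        return len(self._versions) - 1  # version number of this write

    def read(self, version=None):
        # version=None reads the latest snapshot; an integer time-travels
        idx = -1 if version is None else version
        return self._versions[idx]

table = VersionedTable()
v0 = table.write([{"product": "Widget A", "sales": 1200}])
v1 = table.write([{"product": "Widget A", "sales": 1200},
                  {"product": "Gadget X", "sales": 2100}])

assert len(table.read()) == 2    # latest version has both rows
assert len(table.read(v0)) == 1  # time travel back to version 0
```

Real Delta Lake additionally stores versions as transaction logs plus Parquet files rather than full snapshots, which is what makes merge/upsert and lineage tracking cheap.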
Kafka & Streaming
Ingest from Kafka or Redpanda as a canvas node or with the Python API. Bridge batch and streaming workloads in one pipeline.
Multi-Cloud & Databases
Read and write to S3, Azure Data Lake, and GCS. Connect to PostgreSQL, MySQL, and files — CSV, Excel, Parquet. Your data, wherever it lives.
Sandboxed Python Kernels
Run arbitrary Python in isolated Docker containers. Use matplotlib, scikit-learn, or your own libraries — results flow back into the pipeline.
Scheduling & Triggers
Run flows on intervals or trigger them when catalog tables update. Built-in orchestration — no external scheduler needed.
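The interval-trigger idea can be sketched in plain Python with the standard-library `sched` module. This is not Flowfile's scheduler API (which is configured in the app), just an illustration of running a flow on a repeating schedule.

```python
# Plain-Python sketch of interval scheduling (illustrative only; Flowfile's
# built-in scheduler is configured in the app, not via this code).
import sched
import time

runs = []

def run_flow():
    runs.append(time.monotonic())  # stand-in for executing a pipeline

scheduler = sched.scheduler(time.monotonic, time.sleep)
# Queue three runs, 10 ms apart, like a very fast "every N seconds" trigger.
for i in range(3):
    scheduler.enter(0.01 * i, 1, run_flow)
scheduler.run()

assert len(runs) == 3  # the flow fired once per scheduled interval
```

Event-based triggers work the same way conceptually, except the "enter" call fires when a catalog table is updated instead of on a timer.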
Polars Performance
Built on Polars, not Pandas. Lazy evaluation and query optimization deliver 10-100x faster execution on typical workloads. Export flows as clean Python code, with no vendor lock-in.
Visual pipeline building
Connect nodes to build your data pipeline. Each node transforms the data as it flows through — from raw input to final output.
Click a node to see its data
| product | category | sales |
|---|---|---|
| Widget A | Electronics | 1,200 |
| Widget B | Electronics | 890 |
| Gadget X | Home | 2,100 |
| Gadget Y | Home | 760 |
| Tool Pro | Tools | 3,200 |
Raw Data
Sales data loaded from CSV file
Try it yourself
This is a lightweight browser version. Install the full version for database connections, larger datasets, and more.
Same pipeline, in code
Prefer coding? Build the exact same pipeline using the Flowfile Python API. Export visual flows as code, or write pipelines programmatically.
```python
import flowfile_frame as ff

# Read and filter data
df = ff.read_csv("sales_data.csv")
filtered = df.filter(ff.col("sales") > 1000)

# Group by category and aggregate
result = (
    filtered
    .group_by("category")
    .agg(
        ff.sum("sales").alias("total_sales"),
        ff.sum("quantity").alias("total_quantity"),
        ff.count().alias("count"),
    )
)

result.write_parquet("output.parquet")
```
Up and running in seconds
Install Flowfile with pip and launch the visual editor with a single command.
```shell
pip install flowfile
flowfile run ui
```
Drag & drop nodes to create your data pipeline
What makes it unique
A full local data platform — visual and code, catalog and connections — running on your infrastructure, not a vendor's SaaS.
Visual meets code
Build pipelines visually, then export as clean Python code. Switch between both anytime — no vendor lock-in.
Local-first, deploy anywhere
Runs on your machine with a single pip install, your own Docker, or as a desktop app. Your data never leaves your infrastructure.
Catalog as the source of truth
Delta Lake storage, lineage, run history, and event-based triggers. Your data assets live in one place — not scattered across notebooks.
Built on Polars
Under the hood, Flowfile uses Polars for fast, memory-efficient data processing. Same performance you'd get in code.
Let's connect
Have questions, feedback, or want to contribute? Reach out or support the project.
Report an Issue
Found a bug or have a feature request? Open an issue on GitHub.
Open GitHub Issues
Ask a Question
Need help or want to discuss ideas? Join the GitHub Discussions.
Start a Discussion
Buy me a coffee
Enjoying Flowfile? Support development with a small donation.
Support the Project
Connect on LinkedIn
Let's connect! Follow for updates and data engineering insights.
View Profile
Ready to manage your data your way?
Join the community building the local, open-source data platform. Free, self-hosted, and ready for production — on your infrastructure.