Think of NiFi as the delivery truck dispatcher at FreshMart — it decides what data goes where, in what format, on what schedule.
AWS SQS is like a waiting line at the deli counter. SNS is like the store's PA announcement system.
Build your own custom NiFi processor in Go — add special business logic that no built-in processor has.
Learn NiFi through FreshMart — a grocery store chain. Each workflow below is a real-world data task you can build step by step. No prior NiFi experience needed.
Every time a cashier scans an item, a sale record is created. NiFi collects all register data across 50 stores, normalizes it, and loads it into the data warehouse every 15 minutes.
/pos/sales/*.json folder
store_id, sku, qty, price, cashier_id, timestamp
price → unit_price, timestamp → sale_ts
environment=prod, ingest_time=${now()}
DW_SALES_FACT table AND publish to pos-sales Kafka topic
{"store_id":"FM-042", "sku":"BAN-CAVENDISH-1LB", "qty":3, "price":0.99, "cashier_id":"EMP-1147", "timestamp":"2024-11-01T14:23:00Z"}
NiFi watches inventory levels across all warehouses. When bananas drop below 200 cases, it automatically triggers a reorder to the vendor and notifies the store manager.
INVENTORY table every 5 min, capture rows where updated_at > last_runqty < reorder_threshold → LOW-STOCK path, else → OK pathhttps://vendor.dole.com/api/ordersfreshmart-reorder-alerts SQS queue for manager appREORDER_LOG table for auditEvery morning, 12 vendors drop CSV files on an SFTP server. NiFi picks them up, validates each row, maps vendor SKUs to FreshMart SKUs, and loads them into the receiving system.
sftp://vendors.freshmart.com/incoming/*.csv every 30 minvendor_id from filename: ${filename:substringBefore('_')}item_code in product catalog DBfailure/ folder + email alert to vendorvendor-deliveries topic AND archive original CSV to S3DOLE-BAN-001, Cavendish Bananas 1lb, 240, 0.28, 2024-11-01
→ After LookupRecord + JoltTransform:
{"sku":"BAN-CAVENDISH-1LB", "name":"Bananas 1lb", "cases":240, "unit_cost":0.28}
When corporate updates the store planogram (which product goes on which shelf), NiFi distributes the changes to all 50 store systems, the mobile app, and the digital shelf labels.
aisle, section, shelf_level as FlowFile attributesPRODUCT_LOCATIONS table (used by mobile app)Every Monday morning, NiFi pulls the weekly staff schedule from the HR system, generates shift reminders for each employee, and syncs attendance records to payroll.
https://hr.freshmart.internal/api/schedules/week (triggers Monday 6 AM)department: Produce, Deli, Checkout, Stocking"Hi ${emp_name}, your shift is ${day} at ${start_time}"PAYROLL_ATTENDANCE tableCustomers submit feedback via the app, kiosks, or email. NiFi processes it in real time — routing complaints about specific products to the safety team and triggering automatic recalls if the same SKU gets 5+ complaints in an hour.
customer-feedback Kafka topic"spoiled|sick|recall|rotten|smell" → SAFETY pathCUSTOMER_FEEDBACK table for analysisNiFi is a visual data pipeline builder — drag, drop, connect processors on a canvas, and data flows automatically. No heavy coding needed!
Drag-and-drop canvas. Connect boxes with arrows. Each box = a Processor. Each arrow = a Connection carrying data.
Data moves as FlowFiles — packets with content + attributes (metadata). Like envelopes: the letter inside + address + stamps on the outside.
300+ built-in processors: read files, call APIs, transform JSON, talk to Kafka, SQS, databases. Each has config properties and output ports.
NiFi automatically slows data when queues get too full — like a store blocking the entrance when checkout is backed up.
Every FlowFile's journey is recorded — where it came from, every transformation, where it went. Full audit trail, like a receipt for each data item.
Run multiple NiFi nodes as a cluster — handled by ZooKeeper! If one node fails, others keep the data flowing.
| Feature | Apache Kafka | Apache NiFi | Together |
|---|---|---|---|
| Primary Job | Message streaming 🚀 | Data routing & ETL 🗺️ | Ingest → Stream → Route |
| Speed | Millions/sec ⚡ | Thousands/sec 🏃 | Best of both |
| Setup | Code required 💻 | Visual canvas 🖱️ | Visual + powerful |
| Best for | Real-time events | Data movement | Enterprise ETL |
| FreshMart use | POS → Inventory | Vendor CSV → Kafka | Full pipeline |
http://localhost:8443/nifi/sftp/vendor_deliveries/ folder.pos-sales to forward daily summaries to the data warehouse.{item_code} to FreshMart's format {sku}.store.id, processed.timestamp, environment.vendor-deliveries topic.DW_SALES_FACT table in IBM Db2.freshmart-urgent-alerts SQS queue for the manager app.s3://freshmart-data-lake/.When Dole Foods drops a CSV of banana deliveries in the SFTP folder, NiFi immediately picks it up, validates each row, translates vendor codes to FreshMart SKUs, publishes to Kafka (so inventory updates instantly), and archives the original file to S3 for compliance.
When 3+ HIGH severity complaints about the same product arrive within 30 minutes, NiFi routes them to a recall flow: broadcasts an SNS alert (triggers SMS to manager, email to vendor), sends an urgent SQS message to the manager app, and publishes a Kafka event to update all digital shelf signs to "REMOVED".
Every 10 minutes, NiFi batches the latest POS sale events from Kafka, enriches each record with full product names and categories via a database lookup, transforms the structure to match the data warehouse star schema, and bulk-inserts into IBM Db2 — all while archiving raw JSON to the S3 data lake.
SQS (Simple Queue Service) is like the numbered ticket system at the deli counter. You take a ticket, your order waits in line, and the deli worker processes one at a time — no message is ever lost.
High throughput, messages may arrive out of order or more than once. Great for FreshMart inventory updates where processing speed matters more than perfect order.
First-In-First-Out — messages delivered exactly once, in order. Used for FreshMart vendor payment processing where order matters.
If a message fails processing 5 times, SQS moves it to a Dead-Letter Queue (DLQ) for investigation. Like setting aside the deli order nobody is picking up.
SNS (Simple Notification Service) is like the store PA system. One announcement reaches everyone who's listening — SMS, email, SQS queues, Lambda functions, all at once.
NiFi's built-in processors don't cover every business need. When you need FreshMart-specific logic — like validating a proprietary barcode format — you can build a custom processor. Go is great because it's fast, compiles to a single binary, and has excellent AWS SDK support.
github.com/aws/aws-sdk-go-v2github.com/segmentio/kafka-gogithub.com/stretchr/testifyJava is NiFi's native language — a Java custom processor lives inside the NiFi JVM, gets full access to the FlowFile API, Controller Services, State Management, and the UI's Properties panel. It becomes a first-class citizen indistinguishable from the 300+ built-in processors.
Packaged as a NiFi Archive (.nar) — NiFi's plugin format. Drop it in the lib/ folder, restart, and it appears in the Add Processor dialog like any built-in. Full UI, custom properties, all relationships.
Shared connection pools, credential providers, schema registries — a Controller Service is a reusable Java object that processors reference. Build a FreshMart vendor lookup service once, use it in 10 processors.
A Reporting Task runs on a schedule and has access to NiFi cluster-level metrics — queue depths, throughput, bulletins. Use it to push FreshMart pipeline health metrics to Prometheus or CloudWatch.
| Capability | Java NAR | Go stdin/stdout |
|---|---|---|
| Native NiFi UI properties | ✅ Full property descriptors, validators, EL | ❌ No UI properties |
| Access Controller Services | ✅ Direct Java API | ❌ Not directly |
| Multiple relationships | ✅ success, failure, retry, etc. | ⚠️ success/failure only |
| State management | ✅ Built-in local/cluster state | ❌ External state needed |
| Deployment | ⚠️ Maven build → .nar → restart | ✅ Copy binary, no restart |
| Dev speed | ⚠️ Slower (Maven, Java verbosity) | ✅ Fast iteration |
| Performance | ✅ No IPC overhead, JVM optimized | ⚠️ stdin/stdout IPC cost |
| Best for | Complex routing, shared services, production enterprise use | Quick business logic, microservice calls, AWS SDK operations |
Everything the NiFi Administrator is responsible for — installation, cluster setup, security, user management, monitoring, backup, and performance tuning — with real FreshMart commands and configs.
| Role | NiFi Permissions | FreshMart Users |
|---|---|---|
| NiFi Admin | All permissions. Create users, manage policies, install JARs | alice@freshmart.com (Platform Eng) |
| Flow Developer | View + Modify + Operate all Process Groups. Cannot manage users | bob@freshmart.com, carol@freshmart.com |
| Flow Operator | Start/Stop processors. View all. Cannot edit configs | ops-team@freshmart.com (DevOps on-call) |
| Store Manager Viewer | View only their store's Process Group. No configs, no data download | store-managers AD group (50 users) |
| Data Analyst | View canvas + provenance. Cannot download FlowFile content | analytics-team AD group |
| Auditor | Provenance only. No canvas, no configs, no content | compliance@freshmart.com |
| Metric | Alert Threshold |
|---|---|
| Queue depth | >80% of back-pressure limit |
| ERROR bulletins | Any ERROR = page on-call immediately |
| JVM heap | >80% for 5+ min |
| DLQ depth | freshmart-dlq > 0 messages |
| Content repo disk | 70% full on /data/nifi/content |
| Active threads | Maxed out for 10+ min |
Back up after every deploy. Configs are small (<1MB) but losing them is catastrophic.
| What | Path |
|---|---|
| Flow config | /opt/nifi/conf/flow.json.gz |
| Properties | /opt/nifi/conf/nifi.properties |
| Users & Policies | users.xml, authorizations.xml |
| Keystores | keystore.jks, truststore.jks |
Tune for your workload. FreshMart uses medium sizing (1k–10k FlowFiles/sec).
Five real FreshMart personas. Each user has different permissions and a completely different day-to-day experience with NiFi.
Bob builds FreshMart's data pipelines. He works in the dev NiFi environment and promotes to production via NiFi Registry. His Monday: build the Cold Chain Monitoring flow from scratch.
http://sensors.freshmart.internal/api/temps. Schedule: Timer Driven, every 2 min.too_warm = \${temperature:gt(40)}. Routes fridge failures immediately.freshmart-urgent-alerts.fifo. Message Group ID: \${sensor_id}COLD_CHAIN_LOG. Verify records appear in PostgreSQL.Diana keeps production flows running. She responds to PagerDuty alerts and restarts failed processors. Her 2 AM: "NiFi ERROR Bulletins Detected" fires — the PostgreSQL database crashed due to disk full.
sudo systemctl status postgresql → confirms disk full crash.aws sqs get-queue-attributes --queue-name freshmart-dlq. Replay any failed FlowFiles via Menu → Provenance → Replay.Emily traces data quality issues back to their NiFi source. Her task: find out why Saturday sales numbers look wrong.
Marcus sees only the "Store FM-042 Status" Process Group. He can see green/red pipeline status but cannot click configs, view data, or see other stores.
/flow READ (to log in)During a food safety audit, Janet must prove that recalled product BAN-CAVENDISH-1LB was tracked through every system on 2024-11-01. NiFi provenance is the forensic trail.
| Event | Processor | What Happened | Time |
|---|---|---|---|
| RECEIVE | GetSFTP | Received DOLE_20241101.csv from vendors SFTP | 06:14:23 |
| FORK | SplitRecord | Batch split into 240 individual product FlowFiles | 06:14:25 |
| ATTR MOD | LookupRecord | Mapped DOLE-BAN-001 → sku=BAN-CAVENDISH-1LB | 06:14:31 |
| SEND | PutDatabaseRecord | Inserted into DW_SALES_FACT — 240 cases logged | 06:14:51 |
| SEND | PublishKafka | Published to vendor-deliveries Kafka topic | 06:14:51 |
| Symptom | Diagnosis & Fix |
|---|---|
| Processor shows INVALID (yellow) | Right-click → Configure. Red fields = missing required properties. Check Controller Services are enabled and Parameter Context is assigned to this Process Group. |
| Queue growing, back-pressure | Downstream too slow. Fix: (1) increase concurrent tasks on slow processor, (2) scale DB connection pool size, (3) reduce MergeRecord batch size to clear faster. |
| FlowFiles vanishing silently | Check auto-terminate relationships. Right-click processor → Configure → Relationships tab. Uncheck any relationship that should not be auto-terminated. Connect it to an error path. |
| ExecuteGroovyScript fails | Read bulletin stack trace. Common: (1) missing import, (2) null FlowFile — add if (!ff) return, (3) groovy-all JAR not in /opt/nifi/lib/. |
| InvokeHTTP → 401 Unauthorized | API token expired. Update Parameter Context with new token. Use: Add Header: Authorization → Bearer #{api.token} |
| NiFi node shows Disconnected | Check: (1) ZooKeeper running on all 3 nodes, (2) network ports 2181/11443/6342 open between nodes, (3) tail nifi-app.log on disconnected node for root cause. |
| JoltTransformJSON → empty output | JOLT spec error. Test at jolt.demo.io. Common: wrong shift path (field names are case-sensitive). Add LogAttribute downstream to print actual output. |
| PutDatabaseRecord → "column not found" | Schema mismatch. Record Writer schema does not match table column names. Add an UpdateRecord step to rename fields, or update the Avro schema in the Record Writer. |