CLI Setup
Install and verify the Databricks CLI locally with WinGet, including the direct Windows alias path.
Databricks · Data engineering tutorial
A beginner portfolio project that documents local Databricks CLI setup, workspace authentication, CSV upload, Asset Bundle deployment, PySpark transformations, Delta table outputs, and SQL validation.
A practical first Databricks workflow, with documented commands and screenshot capture points.
Deploy a Databricks job from databricks.yml and run a notebook as a repeatable workflow.
Use PySpark to clean retail orders and publish analytics-ready Delta tables with validation queries.
| File | Purpose | Skill shown |
|---|---|---|
| data/retail_orders.csv | Small synthetic ecommerce order export. | CSV source modeling |
| databricks.yml | Defines the Databricks Asset Bundle job. | Bundle configuration |
| notebooks/retail_orders_pipeline.py | Reads CSV data, cleans orders, derives metrics, and writes Delta tables. | PySpark transformations |
| sql/quality_checks.sql | Checks duplicates, required fields, invalid amounts, revenue summaries, and top customers. | SQL validation |
| docs/runbook-with-screenshots.md | Documents each step with screenshot filenames and expected results. | Implementation documentation |
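The checks listed for sql/quality_checks.sql can be previewed locally before anything is uploaded. The sketch below is plain Python, not the project's SQL or PySpark code, and it rests on assumptions: the column names `order_id`, `customer_id`, and `amount`, the sample rows, and the rule that an amount must be positive are all hypothetical stand-ins for whatever data/retail_orders.csv actually contains.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; the real data/retail_orders.csv may use other columns.
SAMPLE_CSV = """order_id,customer_id,amount
1001,C1,25.00
1002,C2,40.50
1002,C2,40.50
1003,,15.00
1004,C3,-5.00
1005,C1,60.00
"""

REQUIRED_FIELDS = ("order_id", "customer_id", "amount")  # assumed column names


def parse_amount(text):
    """Return the amount as a float, or None if it is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def is_complete(row):
    """True when every required field is present and non-blank."""
    return all((row.get(field) or "").strip() for field in REQUIRED_FIELDS)


def run_checks(csv_text, top_n=3):
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Duplicates: order_id values that appear more than once.
    id_counts = Counter(row["order_id"] for row in rows)
    duplicates = sorted(oid for oid, count in id_counts.items() if count > 1)

    # Required fields: rows with any blank required column.
    missing = [row["order_id"] for row in rows if not is_complete(row)]

    # Invalid amounts: numeric values that are zero or negative (assumed rule).
    invalid = []
    for row in rows:
        amount = parse_amount(row["amount"])
        if amount is not None and amount <= 0:
            invalid.append(row["order_id"])

    # Revenue summary: first occurrence of each complete, valid order only.
    seen = set()
    revenue = defaultdict(float)
    for row in rows:
        amount = parse_amount(row["amount"])
        if not is_complete(row) or amount is None or amount <= 0:
            continue
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        revenue[row["customer_id"]] += amount

    # Top customers: highest revenue first.
    top = sorted(revenue.items(), key=lambda item: item[1], reverse=True)[:top_n]

    return {
        "duplicate_order_ids": duplicates,
        "missing_required": missing,
        "invalid_amounts": invalid,
        "top_customers": top,
    }


if __name__ == "__main__":
    for name, value in run_checks(SAMPLE_CSV).items():
        print(f"{name}: {value}")
```

Running the same rules locally gives a baseline to compare against the SQL results once the Delta tables exist. If the two disagree, the difference usually points at a column name or cleaning rule that was assumed differently in one of the two places.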
# 1. Verify CLI
databricks -v
# 2. Authenticate
databricks auth login --host https://YOUR-WORKSPACE-URL
# 3. Upload sample CSV
databricks fs cp data/retail_orders.csv dbfs:/Volumes/main/default/demo/retail_orders.csv --overwrite
# 4. Deploy and run
databricks bundle validate
databricks bundle deploy
databricks bundle run retail_orders_pipeline
# 5. Validate in Databricks SQL
# Open sql/quality_checks.sql in the Databricks SQL editor and run the queries
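Steps 1 through 4 above can also be scripted so that the sequence stops at the first failing command. This is a minimal sketch rather than part of the project: the `run_steps` helper and its `dry_run` flag are hypothetical, the commands are copied from the list above, and `https://YOUR-WORKSPACE-URL` remains a placeholder. The `auth login` step is interactive, so the real run needs a terminal.

```python
import shlex
import subprocess

WORKSPACE_HOST = "https://YOUR-WORKSPACE-URL"  # placeholder, replace before a real run

# Commands copied from the numbered steps; labels are for log output only.
STEPS = [
    ("Verify CLI", "databricks -v"),
    ("Authenticate", f"databricks auth login --host {WORKSPACE_HOST}"),
    (
        "Upload sample CSV",
        "databricks fs cp data/retail_orders.csv "
        "dbfs:/Volumes/main/default/demo/retail_orders.csv --overwrite",
    ),
    ("Validate bundle", "databricks bundle validate"),
    ("Deploy bundle", "databricks bundle deploy"),
    ("Run job", "databricks bundle run retail_orders_pipeline"),
]


def run_steps(steps=STEPS, dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order.

    check=True makes subprocess raise CalledProcessError on a non-zero exit
    code, so later steps never run after a failure.
    """
    executed = []
    for label, command in steps:
        args = shlex.split(command)
        print(f"[{label}] {command}")
        if not dry_run:
            subprocess.run(args, check=True)
        executed.append(args)
    return executed


if __name__ == "__main__":
    # Dry run by default: shows the plan without touching the workspace.
    run_steps(dry_run=True)
```

Calling `run_steps(dry_run=False)` executes the commands for real. The SQL validation in step 5 stays manual, since it runs in the Databricks SQL editor.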
Built a Databricks retail data pipeline tutorial that uses the Databricks CLI, Asset Bundles, PySpark, and Delta tables to load raw CSV orders, clean and enrich records, publish analytics-ready metrics, and validate output with screenshot-documented SQL checks.