Snowflake 101 tutorial project

Retail Analytics Pipeline with Snowflake SQL

A beginner-friendly data warehouse project that stages a CSV order extract, loads it into a raw Snowflake table, transforms it into a clean analytics order mart, and validates the result with SQL quality checks.

Project type
Snowflake 101 warehouse tutorial
Source data
Synthetic ecommerce orders CSV
Core tools
Snowflake SQL, internal stage, COPY INTO
Final output
Analytics-ready order mart and quality checks

Project goals

What I wanted to achieve

I built this project to show the first principles of cloud data warehousing in Snowflake. The aim was not to create a huge production system; the aim was to make the core workflow easy to inspect: raw source data enters through a stage, lands in a raw table, gets transformed with SQL, and becomes a clean analytics table that a dashboard or analyst could use.

The project is written as a 101 tutorial because it is meant to be useful for resume reviewers and beginners at the same time. A recruiter can quickly see Snowflake, SQL, staging, loading, transformation, and quality checks. A learner can follow the files step by step and understand why each warehouse object exists.

10source order rows
4Snowflake SQL files
2warehouse table layers
3quality check groups

Before starting

Tech stack, tools, and setup needed

This project can be followed in two ways. The easiest path is using Snowflake's browser UI, Snowsight, where no local installation is needed. The more developer-style path uses SnowSQL to upload the CSV from your machine with a command.

For this portfolio tutorial, the required technology is deliberately small: Snowflake for the warehouse, SQL for transformations and checks, and one CSV file as the source data. I avoided Python, dbt, Airflow, Docker, and external cloud storage here because the goal is to teach Snowflake loading and transformation basics before adding orchestration or production tooling.

Required

  • A Snowflake account or trial account.
  • Snowsight access in the browser.
  • Permission to create a database, schema, warehouse, stage, and tables.
  • The included data/orders.csv file.
  • The SQL files in snowflake-101-tutorial/sql/.

Optional

  • SnowSQL, only if you want to upload the CSV from your terminal using PUT. The PUT command is not a normal Snowsight worksheet command.
  • Git, only if you want to clone or version the project locally.
  • A code editor such as VS Code for reading the SQL files.
  • A spreadsheet viewer if you want to inspect orders.csv manually.

What does not need to be installed

  • No Python environment is required to run the Snowflake tutorial.
  • No local database is required.
  • No Docker container is required.
  • No AWS S3, Azure Blob Storage, or Google Cloud Storage bucket is required for this beginner version.

Expected beginner workflow

  • Open Snowflake in the browser.
  • Create the warehouse objects with 01_setup.sql.
  • Upload the CSV to the internal stage.
  • Run the load, transform, and quality-check scripts in order.

Business scenario

Why this pipeline exists

Imagine a small ecommerce business that exports orders from an operational system. The business wants to understand completed revenue, returned orders, cancelled orders, customer behavior, regional performance, and channel performance. The raw CSV is useful, but it is not yet analytics-ready.

The Snowflake tutorial simulates that common handoff between operational data and analytics data. The raw order extract is loaded exactly as source data, then a separate analytics table applies business rules: completed orders count toward revenue, returned and cancelled orders remain visible but contribute zero net revenue, and useful flags are created for reporting.

Raw data mindset

Keep the source table close to the original file so loading problems are easy to debug.

Analytics mindset

Create a clean table with normalized values, revenue rules, status flags, and a timestamp.

Quality mindset

Check duplicates, missing required IDs, and invalid financial values before trusting reports.

Data source

Where the data came from and why it was designed this way

The dataset is synthetic sample ecommerce data created specifically for this tutorial. It is not copied from a real company, customer system, Kaggle dataset, or private source. I used synthetic data because the project is public on my portfolio and should not expose personal, commercial, or confidential information.

The rows were designed to mimic a realistic order export: each order has a customer, date, region, sales channel, product category, quantity, unit price, discount amount, and status. The file includes completed, returned, and cancelled orders so the SQL transformation has to apply a real business rule instead of simply summing every row.

Design choiceReasonWhat it allows the project to show
Small synthetic CSVSafe to publish and easy to inspect.Beginner-friendly loading without privacy risk.
Multiple order statusesCompleted, returned, and cancelled orders behave differently.Business rules for revenue calculation.
Multiple regions and channelsAnalytics tables usually support grouped reporting.Revenue by region and channel performance queries.
Discount fieldReal revenue often differs from gross order value.Gross revenue, net revenue, and discount-rate calculations.
Repeat customersCustomers can place more than one order.Top-customer summaries and future customer-level features.

Architecture

How the Snowflake flow works

The project uses a simple two-layer warehouse pattern. RAW_ORDERS receives data loaded from the staged CSV file. ANALYTICS_ORDERS is rebuilt from the raw table and contains the cleaned fields used for analysis.

1. Local CSV source: data/orders.csv
2. Snowflake internal stage: ORDER_STAGE
3. Raw landing table: RAW_ORDERS
4. Analytics table: ANALYTICS_ORDERS
5. Quality checks and business summaries

Why this flow

Why I used stage -> raw table -> analytics table

I chose this flow because it mirrors a common warehouse pattern while staying simple enough for a 101 tutorial. Each layer has a different responsibility, so the project is easier to debug, explain, and extend.

StepWhy it existsProblem it solves
CSV source fileRepresents an operational extract from an ecommerce system.Gives the project a realistic starting point.
Internal stageSnowflake loads files from stages, not directly from a local folder in a worksheet.Separates file storage from table loading.
File formatTells Snowflake how to read headers, quotes, blanks, and CSV structure.Prevents parsing mistakes during load.
Raw tableKeeps a landing copy close to the source file.Makes load issues easier to inspect before applying business logic.
Analytics tableApplies cleaning, normalization, revenue rules, and flags.Creates a clean table suitable for dashboards or analysis.
Quality checksTests whether the final table is reasonable before trusting the numbers.Catches duplicates, missing IDs, and invalid financial values.

Data model

Source columns and purpose

The dataset is synthetic and safe to publish. It contains enough variation to show meaningful transformation logic: multiple channels, several regions, completed orders, returned orders, cancelled orders, discounts, and repeat customers.

ColumnPurposeHow it is used
order_idUnique order identifierChecked for duplicates.
customer_idCustomer identifierRequired for customer-level summaries.
order_dateOrder dateUsed for ordering and future time-based reporting.
regionSales regionNormalized and grouped for revenue reporting.
channelWeb, mobile, or storeNormalized and used for channel performance.
quantity, unit_priceOrder value inputsUsed to calculate gross revenue.
discount_amountDiscount appliedUsed to calculate net revenue and discount rate.
statusCompleted, returned, or cancelledControls revenue logic and status flags.

Detailed tutorial

Step-by-step implementation

The project is split into four SQL files so each stage has one clear responsibility. This makes it easier to explain during interviews and easier for a reviewer to inspect.

Prepare the Snowflake workspace

Before running project SQL, open Snowsight and choose a role that can create objects. In a personal trial account this may be ACCOUNTADMIN. In a shared or work account, a limited learning role with database, schema, warehouse, stage, and table privileges is better.

Why it matters: Snowflake permissions control who can create warehouses, databases, schemas, stages, and tables. If the role is too limited, the setup script will fail before the tutorial begins.

Create Snowflake objects

Run sql/01_setup.sql. This creates RESUME_DEMO_DB, RETAIL_ANALYTICS, COMPUTE_WH, CSV_WITH_HEADER, ORDER_STAGE, RAW_ORDERS, and ANALYTICS_ORDERS.

Why it matters: Snowflake work starts with clear object organization. Separating database, schema, stage, and tables makes the project easier to operate and explain.

Inspect the source CSV

Open data/orders.csv before uploading it. Confirm that it has a header row, 10 order rows, numeric quantity and price fields, and the statuses completed, returned, and cancelled.

Why it matters: good data work starts before loading. Looking at the source file helps explain why the file format skips one header row and why the transformation needs status-based revenue logic.

Upload the CSV into the internal stage

Upload data/orders.csv to @ORDER_STAGE through Snowsight's data loading flow. If using SnowSQL, run PUT file://data/orders.csv @ORDER_STAGE AUTO_COMPRESS=TRUE; from your terminal in the project folder, not inside a normal worksheet.

Why it matters: stages are the bridge between files and warehouse tables. This project uses an internal stage so a beginner does not need AWS S3, Azure Blob Storage, or Google Cloud Storage. After upload, run LIST @ORDER_STAGE; if you want to confirm Snowflake can see the file.

Load the raw table with COPY INTO

Run sql/02_load_raw_orders.sql. The script truncates RAW_ORDERS, loads the staged file with COPY INTO, aborts on load errors, and returns a row count.

Expected result: RAW_ORDERS contains 10 rows from the sample CSV.

Transform raw orders into analytics orders

Run sql/03_transform_orders.sql. The transformation creates ANALYTICS_ORDERS with normalized text fields, gross revenue, net revenue, completed and returned flags, discount rate, and load timestamp.

Business rule: only completed orders contribute to net_revenue. Returned and cancelled orders stay in the table for visibility but contribute zero revenue.

Run quality checks and summaries

Run sql/04_quality_checks.sql. It checks duplicate order IDs, missing customer IDs, invalid financial values, revenue by region, channel performance, and top customers.

Why it matters: a small data pipeline is still only useful if the output can be trusted. These checks show how I think about reliability even in a beginner tutorial.

Transformation logic

Key SQL idea

The central transformation is the revenue rule. The raw file has order status, quantity, price, and discount. The analytics table converts those fields into reporting metrics.

CASE
    WHEN LOWER(status) = 'completed'
        THEN (quantity * unit_price) - COALESCE(discount_amount, 0)
    ELSE 0
END AS net_revenue
CompletedCounts toward net revenue after discount.
ReturnedRemains visible, but net revenue is zero.
CancelledRemains visible, but net revenue is zero.

Beginner checks

What to check if something looks wrong

The most common mistakes in a first Snowflake loading project are not complex SQL problems. They are usually setup details: the wrong role is selected, the file was not uploaded to the expected stage, the warehouse is not selected, or the staged file name does not match the load pattern.

SymptomLikely causeWhat to run or check
Setup script failsThe selected role cannot create objects.Use a role with create privileges, or ask for database, schema, warehouse, stage, and table permissions.
COPY INTO loads zero rowsThe CSV was not uploaded to @ORDER_STAGE, or the staged file name does not match the pattern.Run LIST @ORDER_STAGE; and confirm the file is visible.
Raw row count is not 10The stage may contain an extra matching file from a previous attempt.Remove old staged files or create a fresh stage, then upload only orders.csv.
Date or numeric values fail to loadThe file format or CSV content does not match the table columns.Confirm the CSV has one header row and values like 2026-01-03, 329.99, and completed.
Revenue looks too highReturned or cancelled orders may have been counted as revenue.Check the CASE expression in 03_transform_orders.sql; only completed orders should contribute to net_revenue.

Real project screenshots

What the project looks like locally

These screenshots are captured from the actual local website and project files in this repository. They show the project page, folder structure, SQL setup script, and a preview of the generated analytics logic/output.

Screenshot of the Snowflake 101 project page
Portfolio project page for the Snowflake 101 retail analytics tutorial.
Screenshot of the Snowflake project folder structure
Actual local project folder with data and SQL tutorial files.
Screenshot of the Snowflake setup SQL file
Actual setup SQL creating the database, schema, warehouse, stage, and tables.
Screenshot of Snowflake tutorial output preview
Preview of the analytics output and quality checks based on the included CSV data.

Expected findings

What the output shows

Revenue rule is clear

Completed orders contribute to revenue. Returned and cancelled orders remain visible for reporting but do not inflate revenue.

North leads revenue

The North region has strong completed revenue because it includes a high-value electronics order and another completed home order.

Web is an important channel

Web has multiple completed orders, making it a useful channel to inspect in a dashboard or follow-up analysis.

Project evidence

What this demonstrates

This project demonstrates the core warehouse workflow behind many analytics engineering tasks: receiving a source extract, staging it safely, loading a raw table, applying SQL business rules, and validating the final table before using it for reporting.

The most important skill shown is not just writing SQL. It is separating responsibilities: source data remains inspectable in RAW_ORDERS, business logic lives in the transformation step, and quality checks make the output easier to trust.

Run order

Files to execute

-- 0. Choose a Snowflake role with object-creation privileges

-- 1. Inspect the source file locally
data/orders.csv

-- 2. Create Snowflake objects
sql/01_setup.sql

-- 3. Upload the source file to @ORDER_STAGE
data/orders.csv

-- Optional upload verification
LIST @ORDER_STAGE;

-- 4. Load raw rows
sql/02_load_raw_orders.sql

-- 5. Build analytics table
sql/03_transform_orders.sql

-- 6. Validate and summarize
sql/04_quality_checks.sql