Snowflake 101 Retail Analytics Tutorial

Project goals

What I wanted to achieve

I built this project to show the first principles of cloud data warehousing in Snowflake. The aim was not to create a huge production system; the aim was to make the core workflow easy to inspect: raw source data enters through a stage, lands in a raw table, gets transformed with SQL, and becomes a clean analytics table that a dashboard or analyst could use.

The project is written as a 101 tutorial because it is meant to be useful for resume reviewers and beginners at the same time. A recruiter can quickly see Snowflake, SQL, staging, loading, transformation, and quality checks. A learner can follow the files step by step and understand why each warehouse object exists.

10source order rows

4Snowflake SQL files

2warehouse table layers

3quality check groups

Before starting

Tech stack, tools, and setup needed

This project can be followed in two ways. The easiest path is using Snowflake's browser UI, Snowsight, where no local installation is needed. The more developer-style path uses SnowSQL to upload the CSV from your machine with a command.

For this portfolio tutorial, the required technology is deliberately small: Snowflake for the warehouse, SQL for transformations and checks, and one CSV file as the source data. I avoided Python, dbt, Airflow, Docker, and external cloud storage here because the goal is to teach Snowflake loading and transformation basics before adding orchestration or production tooling.

Required

A Snowflake account or trial account.
Snowsight access in the browser.
Permission to create a database, schema, warehouse, stage, and tables.
The included data/orders.csv file.
The SQL files in snowflake-101-tutorial/sql/.

Optional

SnowSQL, only if you want to upload the CSV from your terminal using PUT. The PUT command is not a normal Snowsight worksheet command.
Git, only if you want to clone or version the project locally.
A code editor such as VS Code for reading the SQL files.
A spreadsheet viewer if you want to inspect orders.csv manually.

What does not need to be installed

No Python environment is required to run the Snowflake tutorial.
No local database is required.
No Docker container is required.
No AWS S3, Azure Blob Storage, or Google Cloud Storage bucket is required for this beginner version.

Expected beginner workflow

Open Snowflake in the browser.
Create the database, schema, warehouse, file format, stage, and raw table with 01_setup.sql.
Upload the CSV to the internal stage.
Run the load, transform, and quality-check scripts in order.

Business scenario

Why this pipeline exists

Imagine a small ecommerce business that exports orders from an operational system. The business wants to understand completed revenue, returned orders, cancelled orders, customer behavior, regional performance, and channel performance. The raw CSV is useful, but it is not yet analytics-ready.

The Snowflake tutorial simulates that common handoff between operational data and analytics data. The raw order extract is loaded exactly as source data, then a separate analytics table applies business rules: completed orders count toward revenue, returned and cancelled orders remain visible but contribute zero net revenue, and useful flags are created for reporting.

Raw data mindset

Keep the source table close to the original file so loading problems are easy to debug.

Analytics mindset

Create a clean table with normalized values, revenue rules, status flags, and a timestamp.

Quality mindset

Check duplicates, missing required IDs, and invalid financial values before trusting reports.

Data source

Where the data came from and why it was designed this way

The dataset is synthetic sample ecommerce data created specifically for this tutorial. It is not copied from a real company, customer system, Kaggle dataset, or private source. I used synthetic data because the project is public on my portfolio and should not expose personal, commercial, or confidential information.

The rows were designed to mimic a realistic order export: each order has a customer, date, region, sales channel, product category, quantity, unit price, discount amount, and status. The file includes completed, returned, and cancelled orders so the SQL transformation has to apply a real business rule instead of simply summing every row.

Design choice	Reason	What it allows the project to show
Small synthetic CSV	Safe to publish and easy to inspect.	Beginner-friendly loading without privacy risk.
Multiple order statuses	Completed, returned, and cancelled orders behave differently.	Business rules for revenue calculation.
Multiple regions and channels	Analytics tables usually support grouped reporting.	Revenue by region and channel performance queries.
Discount field	Real revenue often differs from gross order value.	Gross revenue, net revenue, and discount-rate calculations.
Repeat customers	Customers can place more than one order.	Top-customer summaries and future customer-level features.

Architecture

How the Snowflake flow works

The project uses a simple two-layer warehouse pattern. RAW_ORDERS receives data loaded from the staged CSV file. ANALYTICS_ORDERS is created by the transformation script from the raw table and contains the cleaned fields used for analysis.

1. Local CSV source: data/orders.csv

2. Snowflake internal stage: ORDER_STAGE

3. Raw landing table: RAW_ORDERS

4. Analytics table: ANALYTICS_ORDERS

5. Quality checks and business summaries

Why this flow

Why I used stage -> raw table -> analytics table

I chose this flow because it mirrors a common warehouse pattern while staying simple enough for a 101 tutorial. Each layer has a different responsibility, so the project is easier to debug, explain, and extend.

Step	Why it exists	Problem it solves
CSV source file	Represents an operational extract from an ecommerce system.	Gives the project a realistic starting point.
Internal stage	Snowflake loads files from stages, not directly from a local folder in a worksheet.	Separates file storage from table loading.
File format	Tells Snowflake how to read headers, quotes, blanks, and CSV structure.	Prevents parsing mistakes during load.
Raw table	Keeps a landing copy close to the source file.	Makes load issues easier to inspect before applying business logic.
Analytics table	Applies cleaning, normalization, revenue rules, and flags.	Creates a clean table suitable for dashboards or analysis.
Quality checks	Tests whether the final table is reasonable before trusting the numbers.	Catches duplicates, missing IDs, and invalid financial values.

Data model

Source columns and purpose

The dataset is synthetic and safe to publish. It contains enough variation to show meaningful transformation logic: multiple channels, several regions, completed orders, returned orders, cancelled orders, discounts, and repeat customers.

Column	Purpose	How it is used
`order_id`	Unique order identifier	Checked for duplicates.
`customer_id`	Customer identifier	Required for customer-level summaries.
`order_date`	Order date	Used for ordering and future time-based reporting.
`region`	Sales region	Normalized and grouped for revenue reporting.
`channel`	Web, mobile, or store	Normalized and used for channel performance.
`quantity`, `unit_price`	Order value inputs	Used to calculate gross revenue.
`discount_amount`	Discount applied	Used to calculate net revenue and discount rate.
`status`	Completed, returned, or cancelled	Controls revenue logic and status flags.

Detailed tutorial

Step-by-step implementation

The project is split into four SQL files so each stage has one clear responsibility. This makes it easier to explain during interviews and easier for a reviewer to inspect.

Prepare the Snowflake workspace

Before running project SQL, open Snowsight and choose a role that can create objects. In a personal trial account this may be ACCOUNTADMIN. In a shared or work account, a limited learning role with database, schema, warehouse, stage, and table privileges is better.

Why it matters: Snowflake permissions control who can create warehouses, databases, schemas, stages, and tables. If the role is too limited, the setup script will fail before the tutorial begins.

Create Snowflake objects

Run sql/01_setup.sql. This creates RESUME_DEMO_DB, RETAIL_ANALYTICS, COMPUTE_WH, CSV_WITH_HEADER, ORDER_STAGE, and RAW_ORDERS. The analytics table is created later by 03_transform_orders.sql.

Why it matters: Snowflake work starts with clear object organization. Separating database, schema, stage, and tables makes the project easier to operate and explain. Run setup before uploading the CSV; if you rerun setup later, upload the CSV again because replacing an internal stage removes files stored in that stage.

Inspect the source CSV

Open data/orders.csv before uploading it. Confirm that it has a header row, 10 order rows, numeric quantity and price fields, and the statuses completed, returned, and cancelled.

Why it matters: good data work starts before loading. Looking at the source file helps explain why the file format skips one header row and why the transformation needs status-based revenue logic.

Upload the CSV into the internal stage

Upload data/orders.csv to @ORDER_STAGE through Snowsight's data loading flow. If using SnowSQL, run PUT file://data/orders.csv @ORDER_STAGE AUTO_COMPRESS=TRUE; from your terminal in the project folder, not inside a normal worksheet.

Why it matters: stages are the bridge between files and warehouse tables. This project uses an internal stage so a beginner does not need AWS S3, Azure Blob Storage, or Google Cloud Storage. After upload, run LIST @ORDER_STAGE; if you want to confirm Snowflake can see the file.

Load the raw table with COPY INTO

Run sql/02_load_raw_orders.sql. The script truncates RAW_ORDERS, loads the staged file with COPY INTO, aborts on load errors, forces a reload of the staged file for repeatable tutorial runs, and returns a row count.

Expected result: RAW_ORDERS contains 10 rows from the sample CSV.

Transform raw orders into analytics orders

Run sql/03_transform_orders.sql. The transformation creates ANALYTICS_ORDERS with normalized text fields, gross revenue, net revenue, completed and returned flags, discount rate, and load timestamp.

Business rule: only completed orders contribute to net_revenue. Returned and cancelled orders stay in the table for visibility but contribute zero revenue.

Run quality checks and summaries

Run sql/04_quality_checks.sql. It checks duplicate order IDs, missing customer IDs, invalid financial values, revenue by region, channel performance, and top customers.

Why it matters: a small data pipeline is still only useful if the output can be trusted. These checks show how I think about reliability even in a beginner tutorial.

Transformation logic

Key SQL idea

The central transformation is the revenue rule. The raw file has order status, quantity, price, and discount. The analytics table converts those fields into reporting metrics.

CASE
    WHEN LOWER(status) = 'completed'
        THEN (quantity * unit_price) - COALESCE(discount_amount, 0)
    ELSE 0
END AS net_revenue

CompletedCounts toward net revenue after discount.

ReturnedRemains visible, but net revenue is zero.

CancelledRemains visible, but net revenue is zero.

Beginner checks

What to check if something looks wrong

The most common mistakes in a first Snowflake loading project are not complex SQL problems. They are usually setup details: the wrong role is selected, the file was not uploaded to the expected stage, the warehouse is not selected, or the staged file name does not match the load pattern.

Symptom	Likely cause	What to run or check
Setup script fails	The selected role cannot create objects.	Use a role with create privileges, or ask for database, schema, warehouse, stage, and table permissions.
`COPY INTO` loads zero rows	The CSV was not uploaded to `@ORDER_STAGE`, the staged file name does not match the pattern, or `FORCE = TRUE` was removed after an earlier load.	Run `LIST @ORDER_STAGE;` and confirm the file is visible. Keep `FORCE = TRUE` for repeatable tutorial runs.
Raw row count is more than 10	The stage may contain extra matching files from previous attempts.	Remove old staged files or create a fresh stage, then upload only `orders.csv`.
Date or numeric values fail to load	The file format or CSV content does not match the table columns.	Confirm the CSV has one header row and values like `2026-01-03`, `329.99`, and `completed`.
Revenue looks too high	Returned or cancelled orders may have been counted as revenue.	Check the `CASE` expression in `03_transform_orders.sql`; only completed orders should contribute to `net_revenue`.

Real project screenshots

What the project looks like locally

These screenshots are captured from the actual local website and project files in this repository. They show the project page, folder structure, SQL setup script, and a preview of the generated analytics logic/output.

Screenshot of the Snowflake 101 project page — Portfolio project page for the Snowflake 101 retail analytics tutorial.

Screenshot of the Snowflake project folder structure — Actual local project folder with data and SQL tutorial files.

Screenshot of the Snowflake setup SQL file — Actual setup SQL creating the database, schema, warehouse, stage, and tables.

Screenshot of Snowflake tutorial output preview — Preview of the analytics output and quality checks based on the included CSV data.

Expected findings

What the output shows

Revenue rule is clear

Completed orders contribute to revenue. Returned and cancelled orders remain visible for reporting but do not inflate revenue.

North leads revenue

The North region has strong completed revenue because it includes a high-value electronics order and another completed home order.

Web is an important channel

Web has multiple completed orders, making it a useful channel to inspect in a dashboard or follow-up analysis.

Project evidence

What this demonstrates

This project demonstrates the core warehouse workflow behind many analytics engineering tasks: receiving a source extract, staging it safely, loading a raw table, applying SQL business rules, and validating the final table before using it for reporting.

The most important skill shown is not just writing SQL. It is separating responsibilities: source data remains inspectable in RAW_ORDERS, business logic lives in the transformation step, and quality checks make the output easier to trust.

Run order

Files to execute

-- 0. Choose a Snowflake role with object-creation privileges

-- 1. Inspect the source file locally
data/orders.csv

-- 2. Create Snowflake objects
sql/01_setup.sql

-- 3. Upload the source file to @ORDER_STAGE
data/orders.csv

-- Optional upload verification
LIST @ORDER_STAGE;

-- 4. Load raw rows
sql/02_load_raw_orders.sql

-- 5. Build analytics table
sql/03_transform_orders.sql

-- 6. Validate and summarize
sql/04_quality_checks.sql