Models
Models are the core building blocks of LeapSQL. Each model is a SQL file that defines a transformation, producing a table or view in your database.
What is a Model?
A model is a .sql file that contains:
- Optional frontmatter - YAML configuration specifying how the model should be built
- SQL query - The transformation logic that produces your output
/*---
name: customer_orders
materialized: table
---*/
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) as total_orders,
SUM(o.amount) as total_spent
FROM stg_customers c
LEFT JOIN stg_orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_nameModel Naming
File Names
Model files should use lowercase with underscores:
customer_summary.sqlstg_orders.sqlfct_daily_sales.sql
Model Names
The model name is determined by (in order of precedence):
- The
namefield in frontmatter - The filename without the
.sqlextension
/*---
name: my_custom_name -- This takes precedence
---*/
SELECT * FROM sourceIf no frontmatter name is specified, a file named customer_orders.sql creates a model called customer_orders.
Automatic Dependency Detection
LeapSQL's most powerful feature is automatic dependency detection. You don't need to use special functions like ref() - just write normal SQL:
-- LeapSQL automatically detects dependencies on:
-- - stg_customers
-- - stg_orders
SELECT
c.customer_id,
c.customer_name,
o.order_id,
o.amount
FROM stg_customers c
JOIN stg_orders o ON c.customer_id = o.customer_idHow It Works
LeapSQL parses your SQL using a full SQL parser to identify:
- Tables in
FROMclauses - Tables in
JOINclauses - Subqueries and CTEs
UNIONsources
It then matches these against:
- Other models in your project
- Tables created from seeds
- External tables in your database
Dependency Resolution
When you run leapsql run, models are executed in topological order:
stg_customers ─┐
├──> customer_orders ──> customer_summary
stg_orders ────┘This ensures dependencies are always built before the models that need them.
Model Types
Staging Models
First layer of transformation - clean and standardize raw data:
/*---
name: stg_customers
materialized: view
---*/
SELECT
id as customer_id,
TRIM(name) as customer_name,
LOWER(email) as email,
created_at::TIMESTAMP as created_at
FROM raw_customers
WHERE id IS NOT NULLIntermediate Models
Complex business logic and aggregations:
/*---
name: int_customer_metrics
materialized: view
---*/
SELECT
customer_id,
COUNT(DISTINCT order_id) as order_count,
SUM(amount) as total_revenue,
AVG(amount) as avg_order_value,
MIN(order_date) as first_order_date,
MAX(order_date) as last_order_date
FROM stg_orders
GROUP BY customer_idMart Models
Final models ready for consumption:
/*---
name: dim_customers
materialized: table
---*/
SELECT
c.customer_id,
c.customer_name,
c.email,
c.created_at,
COALESCE(m.order_count, 0) as lifetime_orders,
COALESCE(m.total_revenue, 0) as lifetime_revenue,
m.first_order_date,
m.last_order_date,
CASE
WHEN m.total_revenue > 1000 THEN 'high_value'
WHEN m.total_revenue > 100 THEN 'medium_value'
ELSE 'low_value'
END as customer_tier
FROM stg_customers c
LEFT JOIN int_customer_metrics m ON c.customer_id = m.customer_idCommon Table Expressions (CTEs)
CTEs are fully supported and don't create separate dependencies:
/*---
name: customer_analysis
materialized: table
---*/
WITH customer_orders AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(amount) as total_spent
FROM stg_orders
GROUP BY customer_id
),
customer_segments AS (
SELECT
customer_id,
CASE
WHEN total_spent > 1000 THEN 'premium'
WHEN total_spent > 100 THEN 'standard'
ELSE 'basic'
END as segment
FROM customer_orders
)
SELECT
c.customer_id,
c.customer_name,
co.order_count,
co.total_spent,
cs.segment
FROM stg_customers c
LEFT JOIN customer_orders co ON c.customer_id = co.customer_id
LEFT JOIN customer_segments cs ON c.customer_id = cs.customer_idTemplating in Models
Models can use Starlark templating for dynamic SQL:
Expressions
SELECT
id,
name,
created_at
FROM customers
WHERE created_at >= '{{ env.START_DATE }}'Control Flow
SELECT
id,
{* for col in ['name', 'email', 'phone'] *}
{{ col }},
{* endfor *}
created_at
FROM customersMacros
SELECT
customer_id,
{{ utils.safe_divide('total_revenue', 'order_count') }} as avg_order_value
FROM customer_metricsSee the Templating section for complete documentation.
Model Selection
Run specific models using the --select flag:
# Run a single model
leapsql run --select customer_orders
# Run multiple models
leapsql run --select customer_orders --select order_metrics
# Run a model and all its dependencies
leapsql run --select +customer_orders
# Run a model and all models that depend on it
leapsql run --select customer_orders+Best Practices
1. One Model, One Purpose
Each model should have a single, clear purpose:
-- Good: Single purpose
/*--- name: customer_order_count ---*/
SELECT customer_id, COUNT(*) as orders FROM orders GROUP BY 1
-- Bad: Multiple unrelated outputs
/*--- name: customer_and_product_metrics ---*/
SELECT ... -- customer metrics
UNION ALL
SELECT ... -- product metrics (unrelated)2. Descriptive Names
Use names that describe what the model contains:
fct_daily_orders- Fact table of daily order datadim_customers- Customer dimension tableint_customer_rfm- Intermediate RFM (Recency, Frequency, Monetary) calculation
3. Document with Frontmatter
Use frontmatter to document ownership and purpose:
/*---
name: customer_lifetime_value
materialized: table
owner: analytics-team
meta:
purpose: Calculate customer LTV for marketing segmentation
refresh: daily
---*/4. Keep Dependencies Shallow
Avoid deep dependency chains that are hard to debug:
Good: raw -> staging -> mart (2 hops)
Bad: raw -> stg -> int1 -> int2 -> int3 -> mart (5 hops)Next Steps
- Frontmatter - All configuration options
- Materializations - Table, view, and incremental
- Dependencies - Deep dive into dependency detection