# Materializations
Materializations define how LeapSQL creates and updates your model outputs in the database. Choose the right materialization strategy based on your data volume, query patterns, and freshness requirements.
## Available Materializations
| Type | Description | Use Case |
|---|---|---|
| `table` | Creates a physical table | Default; most analytics models |
| `view` | Creates a SQL view | Simple transformations; staging |
| `incremental` | Merges new data into existing table | Large fact tables; append-only data |
## Table Materialization
The default materialization. Creates a physical table by:
- Dropping the existing table (if it exists)
- Creating a new table with the query results
```sql
/*---
name: customer_summary
materialized: table
---*/
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
```

Generated SQL:
```sql
DROP TABLE IF EXISTS customer_summary;
CREATE TABLE customer_summary AS
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
GROUP BY customer_id;
```

### When to Use Tables
- Models that are queried frequently
- Aggregations and complex transformations
- Final "mart" models consumed by BI tools
- When query performance is more important than storage
### Considerations
- Full table rebuild on each run
- Requires enough storage for complete dataset
- Downtime during rebuild (table briefly doesn't exist)
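The drop-and-recreate cycle can be reproduced end to end with Python's stdlib `sqlite3` standing in for the warehouse; the `orders` data here is illustrative, not from the docs:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# Full rebuild: drop the old output, then recreate it from the query results.
# Every run recomputes the complete dataset, which is the cost noted above.
con.executescript("""
    DROP TABLE IF EXISTS customer_summary;
    CREATE TABLE customer_summary AS
    SELECT customer_id,
           COUNT(*) AS order_count,
           SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id;
""")

rows = con.execute(
    "SELECT customer_id, order_count, total_spent "
    "FROM customer_summary ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 2, 25.0), (2, 1, 7.5)]
```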
## View Materialization
Creates a SQL view instead of a physical table. The query runs each time the view is accessed.
```sql
/*---
name: stg_customers
materialized: view
---*/
SELECT
    id as customer_id,
    TRIM(name) as customer_name,
    LOWER(email) as email
FROM raw_customers
```

Generated SQL:
```sql
CREATE OR REPLACE VIEW stg_customers AS
SELECT
    id as customer_id,
    TRIM(name) as customer_name,
    LOWER(email) as email
FROM raw_customers;
```

### When to Use Views
- Staging models (light transformations)
- When you want changes to propagate immediately
- Simple column renaming or type casting
- When storage is a concern
### Considerations
- Query runs every time the view is accessed
- No performance benefit from pre-computation
- Can be slower for complex transformations
- Excellent for development iteration
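The "changes propagate immediately" property is easy to see with stdlib `sqlite3` as a stand-in database (sample rows are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT);
    INSERT INTO raw_customers VALUES (1, '  Ada  ', 'ADA@EXAMPLE.COM');
    CREATE VIEW stg_customers AS
    SELECT id AS customer_id,
           TRIM(name) AS customer_name,
           LOWER(email) AS email
    FROM raw_customers;
""")

before = con.execute("SELECT COUNT(*) FROM stg_customers").fetchone()[0]

# New source rows show up through the view immediately -- no rebuild step,
# because the view's query runs fresh on every access.
con.execute("INSERT INTO raw_customers VALUES (2, 'Grace', 'GRACE@EXAMPLE.COM')")
after = con.execute("SELECT COUNT(*) FROM stg_customers").fetchone()[0]
print(before, after)  # 1 2
```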
## Incremental Materialization
Updates only new or changed records instead of rebuilding the entire table. This is essential for large datasets where full rebuilds are impractical.
```sql
/*---
name: fct_orders
materialized: incremental
unique_key: order_id
---*/
SELECT
    order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at
FROM raw_orders
WHERE updated_at > '{{ var("last_run_at", "1900-01-01") }}'
```

### How Incremental Works
- **First run:** creates the table with all data (like the `table` materialization)
- **Subsequent runs:**
  - Executes your query (which should filter to new/changed rows)
  - Merges results into the existing table using the `unique_key`
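SQLite happens to support the same `INSERT OR REPLACE` syntax as DuckDB, so the merge step can be sketched with stdlib `sqlite3`. One assumption worth noting: the `unique_key` must correspond to an actual uniqueness constraint for the conflict (and therefore the replace) to be detected:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The unique_key (order_id) is modeled as a PRIMARY KEY so conflicts are caught.
con.execute("CREATE TABLE fct_orders (order_id INTEGER PRIMARY KEY, status TEXT)")
con.execute("INSERT INTO fct_orders VALUES (1, 'pending'), (2, 'pending')")

# Incremental run: order 2 was updated upstream, order 3 is brand new.
# Matching keys are replaced in place; new keys are inserted.
con.executemany(
    "INSERT OR REPLACE INTO fct_orders VALUES (?, ?)",
    [(2, "shipped"), (3, "pending")],
)

merged = con.execute("SELECT * FROM fct_orders ORDER BY order_id").fetchall()
print(merged)  # [(1, 'pending'), (2, 'shipped'), (3, 'pending')]
```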
Generated SQL (subsequent runs):
```sql
-- Using DuckDB's INSERT OR REPLACE
INSERT OR REPLACE INTO fct_orders
SELECT
    order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at
FROM raw_orders
WHERE updated_at > '2024-01-15 10:30:00';
```

### unique_key Configuration
The `unique_key` identifies which rows to update versus insert:
**Single column:**

```sql
/*---
unique_key: order_id
---*/
```

**Composite key:**
```sql
/*---
unique_key:
  - customer_id
  - date
---*/
```

### Incremental Strategies
#### Append-Only Data
For data that's only inserted, never updated:
```sql
/*---
name: fct_events
materialized: incremental
unique_key: event_id
---*/
SELECT *
FROM raw_events
WHERE event_timestamp > '{{ var("last_run_at", "1900-01-01") }}'
```

#### Data with Updates
For data that can be modified after initial insert:
```sql
/*---
name: fct_orders
materialized: incremental
unique_key: order_id
---*/
SELECT *
FROM raw_orders
WHERE updated_at > '{{ var("last_run_at", "1900-01-01") }}'
```

### When to Use Incremental
- Large fact tables (millions+ rows)
- Data with clear incremental patterns
- When full rebuilds take too long
- Append-only event data
### Considerations

- Requires a reliable `unique_key`
- Needs a column that identifies new/changed rows (timestamp, ID)
- First run still processes all data
- More complex to debug than full rebuilds
- Late-arriving data needs special handling
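One common way to handle late-arriving data (a general incremental-modeling pattern, not a documented LeapSQL feature) is to filter from a lookback window behind the last run timestamp, accepting some reprocessing in exchange for not missing rows. A sketch with stdlib `sqlite3` and invented sample data:

```python
import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (order_id INTEGER, updated_at TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)", [
    (1, "2024-01-15 09:00:00"),  # timestamped before the last run
    (2, "2024-01-15 10:45:00"),  # timestamped after the last run
])

last_run_at = datetime(2024, 1, 15, 10, 30)

# A strict `> last_run_at` filter would skip order 1 if it arrived late.
# Scanning a lookback window re-selects it; the unique_key merge then makes
# the reprocessing idempotent.
lookback = last_run_at - timedelta(hours=2)
rows = con.execute(
    "SELECT order_id FROM raw_orders WHERE updated_at > ? ORDER BY order_id",
    (lookback.strftime("%Y-%m-%d %H:%M:%S"),),
).fetchall()
print(rows)  # [(1,), (2,)]
```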
## Choosing the Right Materialization
```
          ┌─────────────────┐
          │  Is the model   │
          │    complex?     │
          └────────┬────────┘
                   │
       ┌───────────┴───────────┐
       │                       │
       ▼                       ▼
 No (simple              Yes (joins,
 transforms)             aggregations)
       │                       │
       ▼                       │
  ┌────────┐                   ▼
  │  view  │          ┌─────────────────┐
  └────────┘          │  Is the table   │
                      │   very large?   │
                      └────────┬────────┘
                               │
                   ┌───────────┴───────────┐
                   │                       │
                   ▼                       ▼
             No (< 1M rows)         Yes (> 1M rows)
                   │                       │
                   ▼                       ▼
              ┌────────┐            ┌─────────────┐
              │ table  │            │ incremental │
              └────────┘            └─────────────┘
```

## Performance Comparison
| Aspect | Table | View | Incremental |
|---|---|---|---|
| Build time | O(n) | O(1) | O(delta) |
| Query time | Fast | Varies | Fast |
| Storage | Full | None | Full |
| Complexity | Low | Low | Medium |
## Best Practices

### 1. Start with Tables
Use table materialization by default. Only optimize to incremental when you have a proven performance need.
### 2. Use Views for Staging
Staging models rarely need to be materialized as tables:
```sql
/*---
name: stg_customers
materialized: view  # Light transformation, view is sufficient
---*/
```

### 3. Design for Incremental
When building incremental models, ensure your source data supports it:
- Has a reliable timestamp column
- Has a stable unique identifier
- Doesn't have backdated updates (or handle them explicitly)
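The first two prerequisites can be sanity-checked with a couple of queries before switching a model to incremental. A minimal sketch using stdlib `sqlite3`; the `raw_orders` table and its columns are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (order_id INTEGER, updated_at TEXT)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)", [
    (1, "2024-01-01 00:00:00"),
    (2, "2024-01-02 00:00:00"),
])

# Stable unique identifier: no candidate key value should appear twice.
dupes = con.execute(
    "SELECT order_id FROM raw_orders GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()

# Reliable timestamp: a NULL updated_at would never match the incremental
# filter, so that row could silently never be picked up.
missing = con.execute(
    "SELECT COUNT(*) FROM raw_orders WHERE updated_at IS NULL"
).fetchone()[0]

print(dupes, missing)  # [] 0  -> source looks safe for incremental
```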
### 4. Full Refresh Option
Sometimes you need to rebuild an incremental model completely:
```shell
# Full refresh a specific model
leapsql run --select fct_orders --full-refresh
```

## Next Steps
- Dependencies - How LeapSQL detects model dependencies
- Models - Model configuration basics
- CLI Reference - Run command options