Materializations

Materializations define how LeapSQL creates and updates your model outputs in the database. Choose the right materialization strategy based on your data volume, query patterns, and freshness requirements.

Available Materializations

| Type | Description | Use Case |
| --- | --- | --- |
| table | Creates a physical table | Default; most analytics models |
| view | Creates a SQL view | Simple transformations; staging |
| incremental | Merges new data into an existing table | Large fact tables; append-only data |

Table Materialization

The default materialization. Creates a physical table by:

  1. Dropping the existing table (if it exists)
  2. Creating a new table with the query results
sql
/*---
name: customer_summary
materialized: table
---*/

SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
GROUP BY customer_id

Generated SQL:

sql
DROP TABLE IF EXISTS customer_summary;
CREATE TABLE customer_summary AS
SELECT
    customer_id,
    COUNT(*) as order_count,
    SUM(amount) as total_spent
FROM orders
GROUP BY customer_id;
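The drop-and-recreate sequence above can be exercised against any SQL engine. A minimal sketch using Python's built-in sqlite3 as a stand-in for DuckDB (table and column names are illustrative, not part of LeapSQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 10.0), (1, 5.0), (2, 7.5)])

# Rebuild the model: drop the old table, then materialize the query results.
con.execute("DROP TABLE IF EXISTS customer_summary")
con.execute("""
    CREATE TABLE customer_summary AS
    SELECT customer_id,
           COUNT(*) AS order_count,
           SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
""")

print(con.execute(
    "SELECT * FROM customer_summary ORDER BY customer_id").fetchall())
# → [(1, 2, 15.0), (2, 1, 7.5)]
```

Note the gap between the DROP and the CREATE: any query arriving in that window sees no table, which is the "downtime during rebuild" caveat below.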

When to Use Tables

  • Models that are queried frequently
  • Aggregations and complex transformations
  • Final "mart" models consumed by BI tools
  • When query performance is more important than storage

Considerations

  • Full table rebuild on each run
  • Requires enough storage for complete dataset
  • Downtime during rebuild (table briefly doesn't exist)

View Materialization

Creates a SQL view instead of a physical table. The query runs each time the view is accessed.

sql
/*---
name: stg_customers
materialized: view
---*/

SELECT
    id as customer_id,
    TRIM(name) as customer_name,
    LOWER(email) as email
FROM raw_customers

Generated SQL:

sql
CREATE OR REPLACE VIEW stg_customers AS
SELECT
    id as customer_id,
    TRIM(name) as customer_name,
    LOWER(email) as email
FROM raw_customers;
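Because a view stores only the query, new source rows appear in it with no rebuild step. A small sketch, again using sqlite3 (which lacks CREATE OR REPLACE VIEW, so a DROP VIEW IF EXISTS stands in for it; names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_customers (id INTEGER, name TEXT, email TEXT)")
con.execute("INSERT INTO raw_customers VALUES (1, '  Ada  ', 'ADA@EXAMPLE.COM')")

# sqlite3 has no CREATE OR REPLACE VIEW, so emulate it with a drop first.
con.execute("DROP VIEW IF EXISTS stg_customers")
con.execute("""
    CREATE VIEW stg_customers AS
    SELECT id AS customer_id,
           TRIM(name) AS customer_name,
           LOWER(email) AS email
    FROM raw_customers
""")

print(con.execute("SELECT * FROM stg_customers").fetchall())
# → [(1, 'Ada', 'ada@example.com')]

# The view re-runs its query on every access, so a new source row
# is visible immediately.
con.execute("INSERT INTO raw_customers VALUES (2, 'Bob', 'Bob@Example.com')")
print(con.execute("SELECT COUNT(*) FROM stg_customers").fetchone()[0])
# → 2
```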

When to Use Views

  • Staging models (light transformations)
  • When you want changes to propagate immediately
  • Simple column renaming or type casting
  • When storage is a concern

Considerations

  • Query runs every time the view is accessed
  • No performance benefit from pre-computation
  • Can be slower for complex transformations
  • Excellent for development iteration

Incremental Materialization

Updates only new or changed records instead of rebuilding the entire table. This is essential for large datasets where full rebuilds are impractical.

sql
/*---
name: fct_orders
materialized: incremental
unique_key: order_id
---*/

SELECT
    order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at
FROM raw_orders
WHERE updated_at > '{{ var("last_run_at", "1900-01-01") }}'

How Incremental Works

  1. First run: Creates the table with all data (like table materialization)
  2. Subsequent runs:
    • Executes your query (which should filter to new/changed rows)
    • Merges results into the existing table using the unique_key

Generated SQL (subsequent runs):

sql
-- Using DuckDB's INSERT OR REPLACE
INSERT OR REPLACE INTO fct_orders
SELECT
    order_id,
    customer_id,
    amount,
    status,
    created_at,
    updated_at
FROM raw_orders
WHERE updated_at > '2024-01-15 10:30:00';
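The merge semantics of INSERT OR REPLACE can be seen in isolation with sqlite3, which shares the syntax DuckDB borrowed (a sketch with illustrative data; note that REPLACE only deduplicates when the unique_key is backed by a uniqueness constraint):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The unique_key maps to a uniqueness constraint; without one,
# INSERT OR REPLACE degenerates into a plain append.
con.execute("""
    CREATE TABLE fct_orders (
        order_id INTEGER PRIMARY KEY,
        status TEXT,
        updated_at TEXT
    )
""")
# First run: full load.
con.executemany("INSERT INTO fct_orders VALUES (?, ?, ?)", [
    (1, 'pending', '2024-01-10'),
    (2, 'shipped', '2024-01-12'),
])

# Subsequent run: merge one changed row and one new row.
con.executemany("INSERT OR REPLACE INTO fct_orders VALUES (?, ?, ?)", [
    (1, 'shipped', '2024-01-16'),   # existing order_id: row is replaced
    (3, 'pending', '2024-01-16'),   # new order_id: row is inserted
])

print(con.execute(
    "SELECT order_id, status FROM fct_orders ORDER BY order_id").fetchall())
# → [(1, 'shipped'), (2, 'shipped'), (3, 'pending')]
```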

unique_key Configuration

The unique_key identifies which rows to update vs insert:

Single column:

sql
/*---
unique_key: order_id
---*/

Composite key:

sql
/*---
unique_key:
  - customer_id
  - date
---*/
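A composite unique_key corresponds to a multi-column uniqueness constraint: a row is replaced only when every key column matches. A brief sqlite3 sketch (illustrative table name):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Composite unique_key → composite primary key.
con.execute("""
    CREATE TABLE daily_spend (
        customer_id INTEGER,
        date TEXT,
        amount REAL,
        PRIMARY KEY (customer_id, date)
    )
""")
con.execute("INSERT INTO daily_spend VALUES (1, '2024-01-15', 20.0)")

# A re-run for the same (customer_id, date) pair replaces the row in place.
con.execute("INSERT OR REPLACE INTO daily_spend VALUES (1, '2024-01-15', 35.0)")

print(con.execute("SELECT amount FROM daily_spend").fetchall())
# → [(35.0,)]
```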

Incremental Strategies

Append-Only Data

For data that's only inserted, never updated:

sql
/*---
name: fct_events
materialized: incremental
unique_key: event_id
---*/

SELECT *
FROM raw_events
WHERE event_timestamp > '{{ var("last_run_at", "1900-01-01") }}'

Data with Updates

For data that can be modified after initial insert:

sql
/*---
name: fct_orders
materialized: incremental
unique_key: order_id
---*/

SELECT *
FROM raw_orders
WHERE updated_at > '{{ var("last_run_at", "1900-01-01") }}'

When to Use Incremental

  • Large fact tables (millions+ rows)
  • Data with clear incremental patterns
  • When full rebuilds take too long
  • Append-only event data

Considerations

  • Requires a reliable unique_key
  • Needs a column that identifies new/changed rows (timestamp, ID)
  • First run still processes all data
  • More complex to debug than full rebuilds
  • Late-arriving data needs special handling
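One common way to tolerate modestly late-arriving rows (an illustrative pattern, not a built-in LeapSQL feature) is to re-scan a short lookback window behind the last run timestamp; the unique_key merge makes re-processing those rows idempotent. Using DuckDB interval arithmetic:

```sql
/*---
name: fct_orders
materialized: incremental
unique_key: order_id
---*/

SELECT *
FROM raw_orders
-- Re-read a 3-day window before the last run; the merge on order_id
-- deduplicates rows that were already loaded.
WHERE updated_at > TIMESTAMP '{{ var("last_run_at", "1900-01-01") }}' - INTERVAL 3 DAY
```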

Choosing the Right Materialization

                    ┌─────────────────┐
                    │  Is the model   │
                    │    complex?     │
                    └────────┬────────┘

                 ┌───────────┴───────────┐
                 │                       │
                 ▼                       ▼
            No (simple             Yes (joins,
            transforms)            aggregations)
                 │                       │
                 ▼                       │
            ┌────────┐                   │
            │  view  │                   │
            └────────┘                   │

                              ┌─────────────────┐
                              │  Is the table   │
                              │  very large?    │
                              └────────┬────────┘

                           ┌───────────┴───────────┐
                           │                       │
                           ▼                       ▼
                     No (< 1M rows)          Yes (> 1M rows)
                           │                       │
                           ▼                       ▼
                      ┌────────┐            ┌─────────────┐
                      │ table  │            │ incremental │
                      └────────┘            └─────────────┘

Performance Comparison

| Aspect | Table | View | Incremental |
| --- | --- | --- | --- |
| Build time | O(n) | O(1) | O(delta) |
| Query time | Fast | Varies | Fast |
| Storage | Full | None | Full |
| Complexity | Low | Low | Medium |

Best Practices

1. Start with Tables

Use table materialization by default. Only optimize to incremental when you have a proven performance need.

2. Use Views for Staging

Staging models rarely need to be materialized as tables:

sql
/*---
name: stg_customers
materialized: view  # light transformation; a view is sufficient
---*/

3. Design for Incremental

When building incremental models, ensure your source data supports it:

  • Has a reliable timestamp column
  • Has a stable unique identifier
  • Doesn't have backdated updates (or handle them explicitly)

4. Full Refresh Option

Sometimes you need to rebuild an incremental model completely:

bash
# Full refresh a specific model
leapsql run --select fct_orders --full-refresh
