# DuckDB Adapter
DuckDB is the default and recommended database adapter for LeapSQL. It provides a high-performance analytical database that runs embedded within your application.
## Why DuckDB?
- **Zero configuration**: No server to install or manage
- **High performance**: Optimized for analytical queries
- **SQL standard**: Rich SQL support including window functions, CTEs, and more
- **File-based**: Persist data to disk or run entirely in-memory
- **CSV native**: Built-in support for reading CSV files directly
## Configuration

### CLI Usage
```shell
# In-memory database (default)
leapsql run

# Persistent database file
leapsql run -database ./warehouse.duckdb
```

### Programmatic Usage
```go
import "github.com/leapstack-labs/leapsql/internal/adapter"

db := adapter.NewDuckDBAdapter()

// In-memory
cfg := adapter.Config{Path: ":memory:"}
// Or persistent (plain assignment; cfg is already declared above)
cfg = adapter.Config{Path: "./warehouse.duckdb"}

err := db.Connect(ctx, cfg)
```

## Features
### In-Memory Mode
When no database path is specified (or when using `:memory:`), DuckDB creates a temporary in-memory database:
- **Fast**: No disk I/O overhead
- **Temporary**: Data is lost when the process exits
- **Ideal for**: Development, testing, one-off analyses
```shell
leapsql run  # Uses in-memory by default
```

### Persistent Mode
Specify a file path to persist data across sessions:
- **Durable**: Data survives process restarts
- **Shareable**: Database file can be copied or backed up
- **Ideal for**: Production use, data that needs to persist
```shell
leapsql run -database ./data/warehouse.duckdb
```

### CSV Loading
The DuckDB adapter uses `read_csv_auto` for automatic CSV loading:

```sql
CREATE OR REPLACE TABLE tablename
AS SELECT * FROM read_csv_auto('path/to/file.csv', header=true)
```

Features:
- Automatic type inference
- Header detection
- Handles various CSV dialects
- Fast parallel reading
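When auto-detection guesses wrong, DuckDB's CSV reader also accepts explicit options; a sketch using options from upstream DuckDB (the table and column names are illustrative, and exact option availability depends on the DuckDB version bundled with LeapSQL):

```sql
-- Override inferred types and the delimiter instead of relying on auto-detection
CREATE OR REPLACE TABLE events AS
SELECT * FROM read_csv_auto(
    'path/to/file.csv',
    header = true,
    delim = ',',
    types = {'id': 'INTEGER', 'created_at': 'TIMESTAMP'}
);
```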
### Schema Support
DuckDB uses `main` as the default schema. You can reference tables with full qualification:

```sql
SELECT * FROM main.customers
```

## Supported SQL Features
DuckDB supports a rich SQL dialect including:
### Window Functions
```sql
SELECT
    customer_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) as running_total
FROM orders
```

### Common Table Expressions (CTEs)
```sql
WITH customer_totals AS (
    SELECT customer_id, SUM(amount) as total
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM customer_totals WHERE total > 1000
```

### Advanced Aggregations
```sql
SELECT
    customer_id,
    ARRAY_AGG(product_id) as products,
    MAP(LIST(product_id), LIST(quantity)) as product_quantities
FROM order_items
GROUP BY customer_id
```

### JSON Functions
```sql
SELECT
    data->>'name' as name,
    data->'address'->>'city' as city
FROM json_table
```

## Metadata Queries
The adapter retrieves metadata using DuckDB's information schema:
```sql
SELECT
    column_name,
    data_type,
    is_nullable,
    ordinal_position
FROM information_schema.columns
WHERE table_schema = 'main' AND table_name = 'users'
ORDER BY ordinal_position
```

## Performance Tips
### Use Appropriate Types
DuckDB infers types from CSV files, but explicit casting can improve performance:
```sql
SELECT
    CAST(id AS INTEGER) as id,
    CAST(created_at AS TIMESTAMP) as created_at
FROM {{ ref('raw_events') }}
```

### Leverage Parallelism
DuckDB automatically parallelizes queries. For best results:
- Use columnar operations over row-by-row processing
- Prefer set-based operations
- Use window functions instead of self-joins where possible
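As an illustration of the last point, a per-customer comparison that might otherwise be written as a self-join can be computed in a single pass with a window function (the table and column names follow the earlier examples and are illustrative):

```sql
-- Difference from the customer's previous order, without joining orders to itself
SELECT
    customer_id,
    order_date,
    amount,
    amount - LAG(amount) OVER (
        PARTITION BY customer_id ORDER BY order_date
    ) AS change_from_previous
FROM orders;
```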
### Persistent Database for Large Data
For datasets that take time to load, use a persistent database to avoid reloading on every run:
```shell
# First run loads seeds and builds models
leapsql run -database ./warehouse.duckdb

# Subsequent runs are faster (seeds already loaded)
leapsql run -database ./warehouse.duckdb
```

## Limitations
- **Single process**: DuckDB doesn't support concurrent writes from multiple processes
- **File locking**: The database file is locked during use
- **Memory**: Large queries may require significant RAM in in-memory mode
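If multiple processes only need to read, upstream DuckDB can open a database file in read-only mode, which permits concurrent readers; a sketch in plain DuckDB SQL (whether LeapSQL exposes a read-only mode is an assumption, so this applies to direct DuckDB access to the file):

```sql
-- Attach an existing database file without taking a write lock
ATTACH './warehouse.duckdb' AS wh (READ_ONLY);
SELECT * FROM wh.main.customers;
```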
## Error Handling
Common errors and solutions:
### "database is locked"
Another process has the database open. Ensure only one LeapSQL process accesses the database at a time.
### "out of memory"
The query requires more RAM than available. Solutions:
- Use a persistent database (enables disk spilling)
- Optimize your query to process less data
- Increase available memory
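DuckDB's standard configuration settings can also help here; a sketch using settings from upstream DuckDB (availability and defaults may vary with the DuckDB version bundled with LeapSQL):

```sql
-- Cap the memory DuckDB may use before it starts spilling to disk
SET memory_limit = '4GB';
-- Direct temporary spill files to a location with free space
SET temp_directory = '/tmp/duckdb_spill';
```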
### "file not found" for CSV
Ensure the CSV path is correct. LeapSQL converts paths to absolute paths before loading.