Getting Started
settings.yml
Understanding the settings.yml configuration file
Overview
Every Framework project has a settings.yml file that controls project-specific settings. You can edit it directly or through the GUI.
Settings use the config package's environment format with a default: key (and optional environment overrides like production:).
Split Files
For larger projects, split complex sections into separate files. Replace inline content with a file path:
default:
connections: settings/connections.yml
data: settings/data.yml
Then create the referenced files:
settings/connections.yml:
db:
driver: postgresql
host: env("DB_HOST")
database: mydb
settings/data.yml:
raw.survey:
path: inputs/raw/survey.csv
type: csv
Note: Root settings.yml values win if they conflict with a split file (Framework warns when this happens; options are merged).
Sections
Basics
Project type and author information.
project_type: project # project, project_sensitive, course, or presentation
author:
name: "Your Name"
email: "[email protected]"
affiliation: "Your Institution"
Project Structure
Maps logical directory names to paths. All paths are relative to the project root.
directories:
# Source code and notebooks
notebooks: notebooks
scripts: scripts
functions: functions
# Input data (read-only)
inputs_raw: inputs/raw
inputs_intermediate: inputs/intermediate
inputs_final: inputs/final
inputs_reference: inputs/reference
# Derived outputs (write-only)
outputs_private: outputs/private
outputs_public: outputs/public
# Cache and scratch space
cache: outputs/private/cache
scratch: outputs/private/scratch
Access with settings("notebooks") or settings("directories.notebooks").
Packages
R packages to install and optionally load with scaffold():
packages:
- name: dplyr
auto_attach: true # Load automatically
- name: tidyr
auto_attach: true
- name: ggplot2
auto_attach: true
- name: readr
auto_attach: false # Install only, namespace explicitly
See Packages for details on version pinning and GitHub packages.
Data Catalog
The data catalog maps logical names to file paths. You can use three notation styles:
Hierarchical (nested structure):
data:
raw:
survey:
path: inputs/raw/survey.csv
type: csv
intermediate:
cleaned:
path: inputs/intermediate/cleaned.rds
type: rds
final:
analysis:
path: inputs/final/analysis.rds
type: rds
Mixed (dots in keys):
data:
raw.survey:
path: inputs/raw/survey.csv
type: csv
final.analysis:
path: inputs/final/analysis.rds
type: rds
Flat (full dot-notation keys):
data.raw.survey:
path: inputs/raw/survey.csv
type: csv
data.final.analysis:
path: inputs/final/analysis.rds
type: rds
All three styles are accessed the same way:
data_read("raw.survey")
data_read("final.analysis")
See Data Management for full documentation.
Git & Hooks
Git hook configuration:
git:
hooks:
ai_sync: false # Sync AI assistant files before commit
data_security: false # Check for secrets before commit
See Git Integration for details.
AI Assistants
AI assistant configuration:
ai:
canonical_file: "CLAUDE.md" # CLAUDE.md, AGENTS.md, or .github/copilot-instructions.md
The canonical file is the source of truth for AI instructions. Other AI files are synced from it when using git hooks.
See AI Assistants for details.
Connections
Database connections use env() to read credentials from environment variables:
connections:
db:
driver: postgresql
host: env("DB_HOST")
database: env("DB_NAME")
user: env("DB_USER")
password: env("DB_PASS")
See Database Connections for full documentation.
Random Seeds
Control random seed behavior for reproducibility:
seed: 123 # Default seed value
seed_on_scaffold: false # Set seed automatically when scaffold() runs
When seed_on_scaffold: true, Framework calls set.seed() during scaffold(). This is off by default.
Environment Variables
Use env() to read environment variables in your settings:
connections:
db:
host: env("DB_HOST")
password: env("DB_PASS", "default_value") # With default
Store actual values in .env (which is gitignored):
DB_HOST=localhost
DB_PASS=secret123
Environment-Specific Settings
For different settings per environment, add named environment blocks:
default:
connections:
db:
database: dev_database
production:
connections:
db:
database: prod_database
Activate with:
Sys.setenv(R_CONFIG_ACTIVE = "production")
Accessing Settings
settings("author.name")
settings("directories")
settings("notebooks") # Shorthand for directories.notebooks
settings("missing.key", default = "fallback")
Complete Structure
Here's the full structure of a project's settings.yml:
default:
project_type: project
author:
name: "Your Name"
email: "[email protected]"
affiliation: "Your Institution"
packages:
- name: dplyr
auto_attach: true
- name: tidyr
auto_attach: true
- name: ggplot2
auto_attach: true
- name: readr
auto_attach: false
directories:
notebooks: notebooks
scripts: scripts
functions: functions
inputs_raw: inputs/raw
inputs_intermediate: inputs/intermediate
inputs_final: inputs/final
inputs_reference: inputs/reference
outputs_private: outputs/private
outputs_public: outputs/public
cache: outputs/private/cache
scratch: outputs/private/scratch
seed: 123
seed_on_scaffold: false
ai:
canonical_file: "CLAUDE.md"
git:
hooks:
ai_sync: false
data_security: false
data:
# Data catalog entries
connections:
# Database connections