settings.yml

Understanding the settings.yml configuration file

Overview

Every Framework project has a settings.yml file that controls project-specific settings. You can edit it directly or through the GUI.

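Settings use the config package's environment format: a top-level default: key holds the base settings, and optional environment blocks such as production: override them. The snippets in the sections below omit the default: wrapper for brevity; the Complete Structure section shows the keys nested under it.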

Split Files

For larger projects, move complex sections into separate files and replace each section's inline content with the path to its file:

default:
  connections: settings/connections.yml
  data: settings/data.yml

Then create the referenced files:

settings/connections.yml:

db:
  driver: postgresql
  host: env("DB_HOST")
  database: mydb

settings/data.yml:

raw.survey:
  path: inputs/raw/survey.csv
  type: csv

Note: If a key is defined both in the root settings.yml and in a split file, the root value wins. Framework warns when this happens and merges the remaining options.
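The precedence rule in the note can be pictured with a small base R sketch. The helper name merge_section and the sample values are hypothetical, and the exact merge rules Framework applies are not spelled out here, so treat this as an illustration of "root wins, everything else is merged" rather than the actual implementation.

```r
# Hypothetical helper: merge a split-file section into the root section.
# Root values win on conflict; a warning names each conflicting key.
merge_section <- function(root, split) {
  conflicts <- intersect(names(root), names(split))
  if (length(conflicts) > 0) {
    warning("Root settings override split file for: ",
            paste(conflicts, collapse = ", "))
  }
  # modifyList() overlays its second argument on the first, so passing
  # the root last lets root values replace split-file values.
  utils::modifyList(split, root)
}

# Illustrative values only (not taken from a real project).
split_db <- list(driver = "postgresql", host = "localhost", database = "mydb")
root_db  <- list(database = "override_db")

merged <- suppressWarnings(merge_section(root_db, split_db))
print(merged$database)  # value from the root file
print(merged$driver)    # value carried over from the split file
```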

Sections

Basics

Project type and author information.

project_type: project  # project, project_sensitive, course, or presentation

author:
  name: "Your Name"
  email: "you@example.com"
  affiliation: "Your Institution"

Project Structure

Maps logical directory names to paths. All paths are relative to the project root.

directories:
  # Source code and notebooks
  notebooks: notebooks
  scripts: scripts
  functions: functions

  # Input data (read-only)
  inputs_raw: inputs/raw
  inputs_intermediate: inputs/intermediate
  inputs_final: inputs/final
  inputs_reference: inputs/reference

  # Derived outputs (write-only)
  outputs_private: outputs/private
  outputs_public: outputs/public

  # Cache and scratch space
  cache: outputs/private/cache
  scratch: outputs/private/scratch

Access a directory with the shorthand settings("notebooks") or the full key settings("directories.notebooks").

Packages

R packages to install and, when auto_attach is true, load with scaffold():

packages:
  - name: dplyr
    auto_attach: true   # Load automatically
  - name: tidyr
    auto_attach: true
  - name: ggplot2
    auto_attach: true
  - name: readr
    auto_attach: false  # Install only; call as readr::read_csv()

See Packages for details on version pinning and GitHub packages.
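As a rough illustration of what the auto_attach flag distinguishes, the base R sketch below splits a parsed package list into names that would be attached and names that stay namespace-only. The list mirrors the YAML above; how scaffold() actually processes it is not shown in this page, so this is an assumption-laden sketch.

```r
# Package entries as they might look after settings.yml is parsed.
packages <- list(
  list(name = "dplyr",   auto_attach = TRUE),
  list(name = "tidyr",   auto_attach = TRUE),
  list(name = "ggplot2", auto_attach = TRUE),
  list(name = "readr",   auto_attach = FALSE)
)

pkg_names    <- vapply(packages, function(p) p[["name"]], character(1))
attach_flags <- vapply(packages, function(p) isTRUE(p[["auto_attach"]]), logical(1))

to_attach      <- pkg_names[attach_flags]   # candidates for library()
namespace_only <- pkg_names[!attach_flags]  # used as pkg::fun()

print(to_attach)
print(namespace_only)
```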

Data Catalog

The data catalog maps logical names to file paths. You can use three notation styles:

Hierarchical (nested structure):

data:
  raw:
    survey:
      path: inputs/raw/survey.csv
      type: csv
  intermediate:
    cleaned:
      path: inputs/intermediate/cleaned.rds
      type: rds
  final:
    analysis:
      path: inputs/final/analysis.rds
      type: rds

Mixed (dots in keys):

data:
  raw.survey:
    path: inputs/raw/survey.csv
    type: csv
  final.analysis:
    path: inputs/final/analysis.rds
    type: rds

Flat (full dot-notation keys):

data.raw.survey:
  path: inputs/raw/survey.csv
  type: csv
data.final.analysis:
  path: inputs/final/analysis.rds
  type: rds

All three styles are accessed the same way:

data_read("raw.survey")
data_read("final.analysis")

See Data Management for full documentation.
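To see why the three notation styles are interchangeable, the sketch below flattens each one into the same dotted name. flatten_catalog is a hypothetical helper written for this illustration, and it assumes a node is a catalog entry whenever it has a path field; Framework's own resolution logic may differ.

```r
# Hypothetical helper: flatten a nested catalog into dotted names.
# A node counts as a catalog entry when it has a `path` field.
flatten_catalog <- function(node, prefix = NULL) {
  if (!is.null(node[["path"]])) {
    return(stats::setNames(list(node), prefix))
  }
  out <- list()
  for (key in names(node)) {
    name <- if (is.null(prefix)) key else paste(prefix, key, sep = ".")
    out <- c(out, flatten_catalog(node[[key]], name))
  }
  out
}

entry <- list(path = "inputs/raw/survey.csv", type = "csv")

# The same entry written in each of the three styles.
hierarchical <- list(data = list(raw = list(survey = entry)))
mixed        <- list(data = list(raw.survey = entry))
flat         <- list(data.raw.survey = entry)

print(names(flatten_catalog(hierarchical)))
print(names(flatten_catalog(mixed)))
print(names(flatten_catalog(flat)))
```

All three calls print the same dotted name, which is why a single data_read("raw.survey") call can serve every style.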

Git & Hooks

Git hook configuration:

git:
  hooks:
    ai_sync: false        # Sync AI assistant files before commit
    data_security: false  # Check for secrets before commit

See Git Integration for details.

AI Assistants

AI assistant configuration:

ai:
  canonical_file: "CLAUDE.md"  # CLAUDE.md, AGENTS.md, or .github/copilot-instructions.md

The canonical file is the source of truth for AI instructions. Other AI files are synced from it when the ai_sync git hook is enabled.

See AI Assistants for details.

Connections

Database connections use env() to read credentials from environment variables:

connections:
  db:
    driver: postgresql
    host: env("DB_HOST")
    database: env("DB_NAME")
    user: env("DB_USER")
    password: env("DB_PASS")

See Database Connections for full documentation.

Random Seeds

Control random seed behavior for reproducibility:

seed: 123                 # Default seed value
seed_on_scaffold: false   # Set seed automatically when scaffold() runs

When seed_on_scaffold is set to true, Framework calls set.seed() during scaffold(). The option is off by default.
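The reproducibility this setting aims for comes from base R's set.seed(): resetting the same seed before a random draw repeats the draw exactly. A minimal sketch, using the seed value 123 from the example above:

```r
# Same seed, same draws.
set.seed(123)
first <- runif(3)

set.seed(123)
second <- runif(3)

print(identical(first, second))
```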

Environment Variables

Use env() to read environment variables in your settings:

connections:
  db:
    host: env("DB_HOST")
    password: env("DB_PASS", "default_value")  # With default

Store actual values in .env (which is gitignored):

DB_HOST=localhost
DB_PASS=secret123
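The lookup-with-fallback behavior of env() resembles what base R offers through Sys.getenv() and its unset argument. The sketch below assumes that resemblance; it does not show how Framework loads .env or implements env(). It writes a temporary file instead of touching a real .env, and the variable name FRAMEWORK_DOCS_UNSET_VAR is made up for the example.

```r
# Write a throwaway .env-style file so the sketch is self-contained.
env_file <- tempfile(fileext = ".env")
writeLines(c("DB_HOST=localhost", "DB_PASS=secret123"), env_file)

# readRenviron() loads KEY=value lines into the process environment.
readRenviron(env_file)

# `unset` supplies the fallback when a variable is not defined,
# similar in spirit to env("NAME", "default_value").
host     <- Sys.getenv("DB_HOST", unset = "fallback_host")
fallback <- Sys.getenv("FRAMEWORK_DOCS_UNSET_VAR", unset = "default_value")

print(host)
print(fallback)
```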

Environment-Specific Settings

For different settings per environment, add named environment blocks:

default:
  connections:
    db:
      database: dev_database

production:
  connections:
    db:
      database: prod_database

Activate with:

Sys.setenv(R_CONFIG_ACTIVE = "production")
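In the config package's format, a named environment inherits from default and overrides only the keys it redefines. The base R sketch below mimics that overlay with modifyList(). resolve_env is a hypothetical helper, and the driver key is added to the example purely to show an inherited value.

```r
# Parsed settings with a default block and a production override.
settings_file <- list(
  default = list(
    connections = list(db = list(driver = "postgresql",
                                 database = "dev_database"))
  ),
  production = list(
    connections = list(db = list(database = "prod_database"))
  )
)

# Hypothetical helper: overlay the active environment on `default`.
resolve_env <- function(cfg, active = Sys.getenv("R_CONFIG_ACTIVE", "default")) {
  resolved <- cfg[["default"]]
  if (!identical(active, "default") && !is.null(cfg[[active]])) {
    # modifyList() merges nested lists recursively.
    resolved <- utils::modifyList(resolved, cfg[[active]])
  }
  resolved
}

prod <- resolve_env(settings_file, active = "production")
print(prod$connections$db$database)  # overridden by production
print(prod$connections$db$driver)    # inherited from default
```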

Accessing Settings

settings("author.name")         # Single value via dot notation
settings("directories")         # Entire section
settings("notebooks")           # Shorthand for directories.notebooks
settings("missing.key", default = "fallback")  # Fallback when the key is absent
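For intuition about dotted keys, the sketch below walks a nested list one key segment at a time and falls back to a default when a segment is missing. lookup and get_setting are hypothetical stand-ins written in base R, and the directories fallback is only a guess at how the shorthand might work; neither is Framework's settings() implementation.

```r
# Hypothetical stand-in: walk a nested list by dotted key.
lookup <- function(cfg, key) {
  parts <- strsplit(key, ".", fixed = TRUE)[[1]]
  node <- cfg
  for (part in parts) {
    if (!is.list(node) || is.null(node[[part]])) return(NULL)
    node <- node[[part]]
  }
  node
}

# Assumed shorthand rule: retry under `directories` before giving up.
get_setting <- function(cfg, key, default = NULL) {
  value <- lookup(cfg, key)
  if (is.null(value)) value <- lookup(cfg, paste0("directories.", key))
  if (is.null(value)) default else value
}

cfg <- list(
  author = list(name = "Your Name"),
  directories = list(notebooks = "notebooks", scripts = "scripts")
)

print(get_setting(cfg, "author.name"))
print(get_setting(cfg, "notebooks"))
print(get_setting(cfg, "missing.key", default = "fallback"))
```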

Complete Structure

Here's the full structure of a project's settings.yml:

default:
  project_type: project

  author:
    name: "Your Name"
    email: "you@example.com"
    affiliation: "Your Institution"

  packages:
    - name: dplyr
      auto_attach: true
    - name: tidyr
      auto_attach: true
    - name: ggplot2
      auto_attach: true
    - name: readr
      auto_attach: false

  directories:
    notebooks: notebooks
    scripts: scripts
    functions: functions
    inputs_raw: inputs/raw
    inputs_intermediate: inputs/intermediate
    inputs_final: inputs/final
    inputs_reference: inputs/reference
    outputs_private: outputs/private
    outputs_public: outputs/public
    cache: outputs/private/cache
    scratch: outputs/private/scratch

  seed: 123
  seed_on_scaffold: false

  ai:
    canonical_file: "CLAUDE.md"

  git:
    hooks:
      ai_sync: false
      data_security: false

  data:
    # Data catalog entries

  connections:
    # Database connections