
Conventions

Framework conventions and project organization

Philosophy

Framework favors convention over configuration: it ships sensible defaults that work for most projects. Everything is customizable, though, so you never have to fight the conventions when your needs or preferences differ.

Package Loading

Attach the packages you use throughout your project with library(), and fully namespace functions (package::function()) from packages you use only occasionally. This lets you trust that scaffold() provides consistent behavior.

With an example settings.yml:

packages:
  - name: dplyr
    auto_attach: true
  - name: readr
    auto_attach: false

Your code would look like:

library(framework)
scaffold()

df <- readr::read_csv("data.csv") |>
  filter(x > 0) |>
  mutate(y = x * 2)

Here the dplyr functions (filter, mutate) work without a prefix because dplyr is set to auto_attach: true. readr is namespaced explicitly (readr::read_csv) because it is installed but not auto-attached.
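The distinction between attached and namespaced packages can be sketched in plain R. This is a conceptual illustration only, not Framework's actual code: the packages list below is a hypothetical stand-in for the settings.yml entries, and it uses stats and tools (both ship with R) so it runs without extra installs.

```r
# Hypothetical stand-in for the packages section of settings.yml.
packages <- list(
  list(name = "stats", auto_attach = TRUE),
  list(name = "tools", auto_attach = FALSE)
)

for (pkg in packages) {
  if (isTRUE(pkg$auto_attach)) {
    # Attached: functions become callable without a prefix.
    library(pkg$name, character.only = TRUE)
  }
  # Packages without auto_attach are left alone; call them as pkg::fun().
}

median(c(1, 3, 5))           # stats is attached, so no prefix is needed
tools::file_ext("data.csv")  # tools was not attached above, so it is namespaced
```

Namespacing the rarely used package keeps it obvious at the call site where each function comes from.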

Functions and Auto-Loading

Framework automatically sources all .R files in your functions/ directory when scaffold() runs.

For example, consider this functions/ directory:

functions/
├── text_processing.R
├── model_prep.R
└── plotting_helpers.R

Every file is sourced when scaffold() runs, giving you access to all your functions without manual loading.

Important: Only include function definitions in these files. Any code outside a function() block will execute every time scaffold() runs.
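To see why stray top-level code matters, here is a minimal sketch of what sourcing a directory of .R files implies. It is not Framework's implementation: the temporary directory, the clean_text() helper, and the side_effect_ran variable are all hypothetical, made up for illustration.

```r
# A temporary directory stands in for functions/.
fn_dir <- file.path(tempdir(), "functions_demo")
dir.create(fn_dir, showWarnings = FALSE)

# One helper file with a function definition and a stray top-level statement.
writeLines(c(
  "clean_text <- function(x) tolower(trimws(x))",
  "side_effect_ran <- TRUE"
), file.path(fn_dir, "text_processing.R"))

# Source every .R file into a dedicated environment.
env <- new.env()
r_files <- list.files(fn_dir, pattern = "\\.R$", full.names = TRUE)
for (f in r_files) {
  source(f, local = env)
}

env$clean_text("  Hello World ")  # the function is now available
env$side_effect_ran               # the top-level statement was executed too
```

The function definition is harmless to re-run, but the top-level assignment executes on every sourcing pass, which is why slow or side-effecting code does not belong in these files.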

This approach eliminates the need to manually source files:

source("helpers/cleaning.R")
source("helpers/plotting.R")
source("utils/text_processing.R")

With auto-loading, you can reorganize your functions directory — rename files, move functions between files, add new files — without updating any notebooks or scripts that use them.

This behavior can be turned off in global settings if you prefer manual control.

See Functions for full documentation.

Inputs and Outputs

Framework separates inputs (read-only data) from outputs (generated files).

Inputs

Data you consume lives in inputs/, organized by stage:

inputs/
├── raw/           # External data, untouched
├── intermediate/  # Mid-pipeline transformations
└── final/         # Analysis-ready datasets

  • raw/: Data from outside sources, never modified
  • intermediate/: Cleaned or transformed data mid-pipeline
  • final/: Analysis-ready datasets

Data Catalog

Framework tracks your data files in a catalog stored in settings.yml. As you work, register files with data_add():

data_add("inputs/raw/survey_2024.csv", name = "raw.survey")
data_add("inputs/final/analysis.rds", name = "final.analysis")

Or save and register in one step:

data_save(clean_df, "intermediate.survey_clean")

The catalog becomes a manifest that tells collaborators what data files exist and where they live. Load data by name:

survey <- data_read("raw.survey")

See Data Management for full documentation.

Outputs

Files you produce live in outputs/, organized by type:

outputs/
├── tables/    # Data tables, CSVs
├── figures/   # Plots, charts
├── models/    # Saved model objects
└── notebooks/ # Rendered notebooks

  • tables/: save_table() writes here
  • figures/: save_figure() writes here
  • models/: save_model() writes here
  • notebooks/: save_notebook() renders and saves here

Use save_notebook("notebooks/analysis.qmd") to render and save in one step. This keeps generated HTML/PDF files separate from source .qmd files, making it easy to .gitignore outputs and keep the project organized.

Work Directories

  • notebooks/: Quarto/R Markdown analysis files
  • scripts/: Standalone R scripts
  • functions/: Reusable R functions (auto-loaded by scaffold())
  • scratch/: One-off experiments, throwaway code

Sensitive Data Projects

For projects with privacy requirements, inputs/ is split into:

inputs/
├── private/    # Sensitive data (gitignored)
└── public/     # Shareable data

This makes the privacy boundary explicit in your directory structure.

Data Integrity

Framework automatically tracks data integrity. When you use data_read(), file hashes are stored in framework.db. If a file changes unexpectedly, you'll get a warning:

Warning: File hash changed for 'inputs/raw/survey.csv'
Previous hash: a1b2c3...
Current hash:  d4e5f6...

This helps catch accidental data modifications and ensures reproducibility.
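The idea behind hash-based change detection can be sketched with base R. This is an illustration under assumptions, not how Framework itself is implemented: it uses MD5 via tools::md5sum(), keeps the stored hash in a variable rather than in framework.db, and works on a throwaway temporary file.

```r
# Create a small file and record its hash, as if on first read.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10"), path)
stored_hash <- unname(tools::md5sum(path))

# Simulate an accidental modification of the file.
writeLines(c("id,score", "1,99"), path)
current_hash <- unname(tools::md5sum(path))

# Compare the stored hash with the current one on the next read.
hash_changed <- !identical(stored_hash, current_hash)
if (hash_changed) {
  message("File hash changed for '", path, "'")
}
```

Because the hash depends on every byte of the file, even a one-character edit changes it, which is what makes this a cheap check for unintended modifications.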

Customization

All project types are fully customizable:

  • Standard projects: Add or remove any directories
  • Courses: Structured for lectures, assignments, modules
  • Presentations: Minimal structure for RevealJS slides

Courses and presentations are common use cases for data scientists, so each comes with its own optimized defaults. You can still adjust any project type's structure to fit your workflow.

Configure defaults globally in setup() or per-project in settings.yml.