Welcome to CLIF 101 🏥
A gentle introduction to coding with established CLIF datasets. Learn best practices for analysis, validation, and collaboration.
What is CLIF?
The Common Longitudinal ICU data Format (CLIF) is a federated data standard designed specifically for critical care research. Unlike generic common data models, CLIF preserves the temporal granularity essential for ICU research while enabling privacy-preserving multi-site collaboration.
Why CLIF matters
| Challenge | CLIF Solution |
|---|---|
| Data silos across hospitals | Federated model - data stays local |
| Lost temporal granularity | Hourly resolution preserved |
| Inconsistent vocabularies | mCIDE controlled vocabularies |
| Slow multi-site studies | Days instead of years |
Getting Started
This tutorial assumes you have access to a CLIF dataset at your institution. If you’re new to CLIF and need to build your dataset first, check out:
- CLIF Data Dictionary - Schema definitions
- EHR-TO-CLIF - ETL examples from various sites
Prerequisites
- Python 3.9+
- Access to CLIF-formatted data (Parquet or CSV files)
- Basic familiarity with pandas/polars
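Before loading anything, it can help to confirm which table files your data directory actually contains. The snippet below is a minimal sketch using only the standard library. The helper name `find_clif_tables` is hypothetical and is not part of clifpy, and the `path/to/clif/data` placeholder stands in for your own directory.

```python
from pathlib import Path


def find_clif_tables(data_dir, extensions=(".parquet", ".csv")):
    """Map file stems to paths for the supported file types in data_dir."""
    found = {}
    for path in sorted(Path(data_dir).iterdir()):
        # Skip subdirectories and anything that is not Parquet or CSV
        if path.is_file() and path.suffix.lower() in extensions:
            found[path.stem] = path
    return found


# Placeholder path: replace with your own CLIF data directory
data_dir = Path("path/to/clif/data")
if data_dir.is_dir():
    for stem, path in find_clif_tables(data_dir).items():
        print(f"{stem}: {path.name}")
```

If the listing comes back empty, check the path and file extensions before assuming a loading problem.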
Install clifpy
pip install clifpy
clifpy uses DuckDB and Polars under the hood for memory-efficient processing of large datasets.
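To confirm the install worked and that your interpreter meets the Python 3.9+ prerequisite, a quick check along these lines may be useful. This is a sketch built on the standard library only. `check_environment` is a hypothetical helper, not something clifpy provides.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), packages=("clifpy",)):
    """Report whether the interpreter and the named top-level packages are usable."""
    report = {"python_ok": sys.version_info >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

A `missing` entry for `clifpy` usually means the package was installed into a different environment than the one running the script.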
Quick Example
Here’s a taste of what working with CLIF looks like:
from clifpy import ClifOrchestrator
# Initialize with your CLIF data directory
clif = ClifOrchestrator(data_dir="path/to/clif/data")
# Load core tables
clif.load_tables(["patient", "hospitalization", "vitals", "labs"])
# Create an hourly wide dataset for analysis
wide_df = clif.create_wide_dataset(
    tables=["vitals", "labs"],
    time_resolution="1h",
)
# Calculate SOFA scores
sofa_df = clif.calculate_sofa_scores()
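If "hourly wide dataset" is a new idea, the following sketch shows the general shape of the transformation on a few synthetic rows: long-format measurements are floored to the hour and spread into one record per hospitalization-hour. It illustrates the concept only and is not how clifpy implements `create_wide_dataset`. The function name `to_hourly_wide`, the column names, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly_wide(rows):
    """Pivot long (id, timestamp, category, value) rows into one record per id-hour.

    When several values of one category fall in the same hour, the last one
    in input order wins; a real pipeline would likely aggregate instead.
    """
    wide = defaultdict(dict)
    for hosp_id, ts, category, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # floor to the hour
        wide[(hosp_id, hour)][category] = value
    return [
        {"hospitalization_id": hosp_id, "hour": hour, **values}
        for (hosp_id, hour), values in sorted(wide.items())
    ]


# Synthetic example rows, not real patient data
rows = [
    ("H1", datetime(2024, 1, 1, 8, 5), "heart_rate", 88),
    ("H1", datetime(2024, 1, 1, 8, 40), "sbp", 117),
    ("H1", datetime(2024, 1, 1, 9, 15), "heart_rate", 92),
]
hourly = to_hourly_wide(rows)
```

Here the two 08:xx measurements land in the same record, while the 09:15 heart rate starts a new one with no `sbp` value, which is the kind of gap a wide dataset makes visible.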
What You’ll Learn
Ready to dive in?
Start with Loading CLIF Data to learn the fundamentals.
Pro tip: Read the schema files before writing code. clifpy’s schema definitions list the exact column names, so check them first rather than guessing.
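One way to build that habit into a script is to compare a table's columns against the names you expect before running any analysis. The sketch below assumes you have copied the required column names out of the schema yourself. The `missing_columns` helper and both column lists are hypothetical and shown for illustration only.

```python
def missing_columns(actual_columns, required_columns):
    """Return required column names absent from the data, in schema order."""
    present = set(actual_columns)
    return [name for name in required_columns if name not in present]


# Hypothetical required columns; take the real list from the schema files
required = ["hospitalization_id", "recorded_dttm", "vital_category", "vital_value"]
# Column names as they might appear in a locally loaded table
actual = ["hospitalization_id", "recorded_dttm", "vital_category", "vital_val"]

problems = missing_columns(actual, required)
if problems:
    print(f"Missing columns: {problems}")
```

Failing early on a misspelled column is usually easier to debug than a confusing error several steps into an analysis.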