User Guide
Welcome to the CLIFpy User Guide. This guide provides comprehensive documentation for working with CLIF data using CLIFpy.
Overview
CLIFpy is designed to make working with CLIF (Common Longitudinal ICU data Format) data straightforward and efficient. Whether you're a researcher analyzing ICU outcomes, a data scientist building predictive models, or a clinician exploring patient data, this guide will help you make the most of CLIFpy.
Guide Organization
CLIF Orchestrator
Learn how to manage multiple CLIF tables simultaneously with consistent configuration and validation.
Wide Dataset Creation
Create comprehensive time-series datasets by joining multiple CLIF tables with automatic pivoting and high-performance processing.
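Conceptually, building a wide dataset means pivoting long-format measurements (one row per observation) into one column per category. The sketch below shows that step in plain pandas on a toy vitals table. The column names follow the CLIF vitals table, the values are invented, and CLIFpy's own high-performance implementation may work quite differently.

```python
import pandas as pd

# Long-format vitals with CLIF column names; values are invented for illustration.
vitals_long = pd.DataFrame({
    "hospitalization_id": ["H1", "H1", "H1", "H2"],
    "recorded_dttm": pd.to_datetime([
        "2024-01-15 08:00", "2024-01-15 08:00", "2024-01-15 09:00", "2024-01-15 08:00",
    ]),
    "vital_category": ["heart_rate", "spo2", "heart_rate", "heart_rate"],
    "vital_value": [88.0, 97.0, 92.0, 110.0],
})

# One row per (hospitalization, timestamp), one column per vital category.
# aggfunc="mean" resolves duplicate measurements at the same timestamp.
vitals_wide = (
    vitals_long
    .pivot_table(
        index=["hospitalization_id", "recorded_dttm"],
        columns="vital_category",
        values="vital_value",
        aggfunc="mean",
    )
    .reset_index()
)
vitals_wide.columns.name = None  # drop the leftover "vital_category" axis label
print(vitals_wide)
```

Timestamps where a category was not measured (spo2 for H1 at 09:00, for example) come out as NaN, which is the usual shape of a sparse wide dataset.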
Outlier Handling
Detect and remove physiologically implausible values using configurable ranges and category-specific validation.
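Range-based outlier handling can be sketched as a lookup from category to an allowed interval, with anything outside the interval blanked out. In this minimal pandas sketch, `remove_outliers` and the `OUTLIER_RANGES` values are hypothetical and chosen only for illustration; they are not CLIFpy's configured ranges.

```python
import numpy as np
import pandas as pd

# Hypothetical plausibility ranges (inclusive), for illustration only.
OUTLIER_RANGES = {
    "heart_rate": (0, 300),
    "spo2": (50, 100),
    "temp_c": (25, 44),
}


def remove_outliers(df, ranges, category_col="vital_category", value_col="vital_value"):
    """Return a copy of df with values outside their category's range set to NaN."""
    out = df.copy()
    for category, (low, high) in ranges.items():
        in_category = out[category_col] == category
        # between() is inclusive on both ends by default.
        implausible = in_category & ~out[value_col].between(low, high)
        out.loc[implausible, value_col] = np.nan
    return out


vitals = pd.DataFrame({
    "vital_category": ["heart_rate", "heart_rate", "spo2", "spo2", "temp_c", "respiratory_rate"],
    "vital_value": [88.0, 999.0, 97.0, 140.0, 37.0, 500.0],
})
cleaned = remove_outliers(vitals, OUTLIER_RANGES)
# respiratory_rate has no configured range here, so 500.0 passes through unchanged.
print(cleaned["vital_value"].tolist())  # [88.0, nan, 97.0, nan, 37.0, 500.0]
```

Returning a copy rather than editing in place keeps the raw values available for reviewing what was removed.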
Tables
Detailed guides for each CLIF table type:
- Patient demographics
- ADT (Admission, Discharge, Transfer) events
- Hospitalization information
- Laboratory results
- Vital signs
- Respiratory support
- Medication administration
- Clinical assessments
- Patient positioning
Data Validation
Understand how CLIFpy validates your data against CLIF schemas and how to interpret validation results.
Working with Timezones
Learn best practices for handling timezone-aware datetime data across different hospital systems.
Key Concepts
Table-Based Architecture
CLIFpy organizes ICU data into standardized tables, each representing a specific aspect of patient care:
from clifpy.tables import Patient, Labs, Vitals
# Each table is a self-contained unit
patient = Patient.from_file('/data', 'parquet')
labs = Labs.from_file('/data', 'parquet')
vitals = Vitals.from_file('/data', 'parquet')
Consistent Interface
All tables share common methods inherited from BaseTable:
- from_file(): Load data from files
- validate(): Run comprehensive validation
- isvalid(): Check validation status
- get_summary(): Get table statistics
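To show how a shared base class gives every table the same interface, here is a deliberately tiny, hypothetical version of the pattern. `MiniBaseTable`, `MiniVitals`, and the missing-column check are assumptions for illustration only: the real `BaseTable` validates far more, and this `from_file` simply reads CSV from a path or buffer instead of taking CLIFpy's arguments.

```python
import io

import pandas as pd


class MiniBaseTable:
    """Hypothetical base class: subclasses inherit loading, validation, and summary."""

    required_columns: list[str] = []

    def __init__(self, df: pd.DataFrame):
        self.df = df
        self.errors: list[str] = []
        self._validated = False

    @classmethod
    def from_file(cls, path_or_buffer):
        # Simplified loader: reads CSV only (an assumption for this sketch).
        return cls(pd.read_csv(path_or_buffer))

    def validate(self) -> list[str]:
        # Only checks for missing required columns.
        self.errors = [
            f"missing column: {col}"
            for col in self.required_columns
            if col not in self.df.columns
        ]
        self._validated = True
        return self.errors

    def isvalid(self) -> bool:
        # False until validate() has run and found no errors.
        return self._validated and not self.errors

    def get_summary(self) -> dict:
        return {"rows": len(self.df), "columns": list(self.df.columns)}


class MiniVitals(MiniBaseTable):
    # A subclass only declares what differs; the methods come from the base class.
    required_columns = ["hospitalization_id", "recorded_dttm", "vital_category", "vital_value"]


csv = io.StringIO("hospitalization_id,vital_category,vital_value\nH1,heart_rate,88\n")
vitals = MiniVitals.from_file(csv)
vitals.validate()
print(vitals.isvalid())      # False: recorded_dttm is missing
print(vitals.get_summary())  # {'rows': 1, 'columns': ['hospitalization_id', 'vital_category', 'vital_value']}
```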
Standardized Categories
CLIF defines standardized categories for consistent data representation:
- Lab categories: chemistry, hematology, coagulation, etc.
- Location categories: icu, ward, ed, etc.
- Medication groups: vasopressor, sedative, antibiotic, etc.
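A standardized category column can be checked with a simple membership test. This sketch assumes a hypothetical `ALLOWED_LOCATION_CATEGORIES` set built only from the example values listed above; the authoritative permitted values come from the CLIF schema, and CLIFpy's validator may apply additional rules.

```python
import pandas as pd

# Hypothetical subset of permitted values, taken from the examples above only.
ALLOWED_LOCATION_CATEGORIES = {"icu", "ward", "ed"}

adt = pd.DataFrame({
    "hospitalization_id": ["H1", "H1", "H2"],
    "location_category": ["ed", "ICU", "ward"],
})

# Rows whose value is outside the permitted set. isin() is an exact match,
# so "ICU" does not match "icu".
invalid = adt[~adt["location_category"].isin(ALLOWED_LOCATION_CATEGORIES)]
print(invalid)
```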
Timezone Awareness
All datetime columns are timezone-aware, so data recorded in different time zones can be compared correctly:
from clifpy.tables import Vitals
# Specify the timezone when loading; every table class accepts the same argument
vitals = Vitals.from_file(
    data_directory='/data',
    filetype='parquet',
    timezone='US/Central'
)
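Timezone awareness matters because an aware timestamp identifies an absolute instant, while a naive one is only a wall-clock reading. This standard-library sketch compares readings from two hypothetical hospitals. It uses Python 3.9+ `zoneinfo` (which needs a system timezone database or the `tzdata` package) and does not use CLIFpy.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# The same instant recorded by two hypothetical hospitals in different zones.
chicago = datetime(2024, 1, 15, 8, 0, tzinfo=ZoneInfo("US/Central"))   # CST, UTC-6
new_york = datetime(2024, 1, 15, 9, 0, tzinfo=ZoneInfo("US/Eastern"))  # EST, UTC-5

# Aware datetimes compare by absolute instant, so these are equal.
print(chicago == new_york)               # True
print(chicago.astimezone(timezone.utc))  # 2024-01-15 14:00:00+00:00

# Stripping the zone leaves only wall-clock values, which no longer match.
print(chicago.replace(tzinfo=None) == new_york.replace(tzinfo=None))  # False
```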
Common Workflows
Loading and Validating Data
from clifpy.clif_orchestrator import ClifOrchestrator
# Load multiple tables
orchestrator = ClifOrchestrator('/data', 'parquet', 'US/Central')
orchestrator.initialize(tables=['patient', 'labs', 'vitals'])
# Validate all tables
orchestrator.validate_all()
# Check validation status
for table_name in orchestrator.get_loaded_tables():
    table = getattr(orchestrator, table_name)
    print(f"{table_name}: {'Valid' if table.isvalid() else 'Invalid'}")
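The orchestrator's value is that one configuration (here, just a timezone) is applied uniformly to every table it loads, and validation runs across all of them in one call. The sketch below is a hypothetical, in-memory miniature that borrows the method names from the example above to make the pattern concrete; `MiniOrchestrator` and `REQUIRED_COLUMNS` are illustrative assumptions, not `ClifOrchestrator`'s actual implementation.

```python
import pandas as pd

# Hypothetical schema registry: table name -> required columns (illustrative subset).
REQUIRED_COLUMNS = {
    "patient": ["patient_id"],
    "vitals": ["hospitalization_id", "recorded_dttm", "vital_category", "vital_value"],
}


class MiniOrchestrator:
    """Hypothetical sketch: one shared config applied to every table it loads."""

    def __init__(self, timezone):
        self.timezone = timezone
        self.tables = {}
        self.errors = {}

    def initialize(self, frames):
        # frames maps table name -> DataFrame (stands in for reading files).
        for name, df in frames.items():
            df = df.copy()
            for col in list(df.columns):
                if col.endswith("_dttm"):
                    # Interpret naive *_dttm values as local time in the shared zone.
                    df[col] = pd.to_datetime(df[col]).dt.tz_localize(self.timezone)
            self.tables[name] = df

    def validate_all(self):
        for name, df in self.tables.items():
            self.errors[name] = [c for c in REQUIRED_COLUMNS[name] if c not in df.columns]

    def get_loaded_tables(self):
        return list(self.tables)


orch = MiniOrchestrator(timezone="US/Central")
orch.initialize({
    "patient": pd.DataFrame({"patient_id": ["P1"]}),
    "vitals": pd.DataFrame({
        "hospitalization_id": ["H1"],
        "recorded_dttm": ["2024-01-15 08:00"],
        "vital_value": [88.0],
    }),
})
orch.validate_all()
for name in orch.get_loaded_tables():
    print(f"{name}: {'Valid' if not orch.errors[name] else 'Invalid'}")
# patient: Valid
# vitals: Invalid
```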
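Filtering and Analysis
from clifpy.tables import Adt, Vitals
# Load the tables used below
adt = Adt.from_file('/data', 'parquet')
vitals = Vitals.from_file('/data', 'parquet')
# Category-based filtering: ADT records in ICU locations
icu_stays = adt.filter_by_location_category('icu')
# Cohort analysis: vitals are keyed by hospitalization_id, so filter on hospitalization IDs
cohort_ids = ['H001', 'H002', 'H003']
cohort_vitals = vitals.df[vitals.df['hospitalization_id'].isin(cohort_ids)]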
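The filtering pattern above (select hospitalizations by location category, then restrict another table to that cohort) can be reproduced on toy data in plain pandas. The frames and values below are invented; the column names follow the CLIF ADT and vitals tables, and the ICU lookup is an analogue of category-based filtering rather than CLIFpy's `filter_by_location_category` itself.

```python
import pandas as pd

# Toy ADT and vitals frames using CLIF column names; values are invented.
adt_df = pd.DataFrame({
    "hospitalization_id": ["H1", "H1", "H2", "H3"],
    "location_category": ["ed", "icu", "ward", "icu"],
})
vitals_df = pd.DataFrame({
    "hospitalization_id": ["H1", "H2", "H3", "H3"],
    "vital_category": ["heart_rate"] * 4,
    "vital_value": [88.0, 75.0, 120.0, 115.0],
})

# Step 1: hospitalizations with at least one ICU location.
icu_hosp_ids = adt_df.loc[adt_df["location_category"] == "icu", "hospitalization_id"].unique()

# Step 2: restrict vitals to that cohort before any further (more expensive) analysis.
icu_vitals = vitals_df[vitals_df["hospitalization_id"].isin(icu_hosp_ids)]
print(icu_vitals)
```

Narrowing to the cohort first means later joins and aggregations only touch the rows that matter, which is the point of filtering early on large datasets.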
Best Practices
- Always validate data after loading to ensure compliance with CLIF standards
- Use appropriate timezones for your data source
- Filter early to reduce memory usage with large datasets
- Review validation errors to understand data quality issues
- Use the orchestrator when working with multiple related tables
Next Steps
- Explore specific table guides
- Learn about data validation
- See practical examples
- Review the API reference