ClifOrchestrator
clifpy.clif_orchestrator.ClifOrchestrator
Orchestrator class for managing multiple CLIF table objects.
This class provides a centralized interface for loading, managing, and validating multiple CLIF tables with consistent configuration.
Attributes:
Name | Type | Description |
---|---|---|
data_directory | str | Path to the directory containing data files |
filetype | str | Type of data file (csv, parquet, etc.) |
timezone | str | Timezone for datetime columns |
output_directory | str | Directory for saving output files and logs |
patient | Patient | Patient table object |
hospitalization | Hospitalization | Hospitalization table object |
adt | Adt | ADT table object |
labs | Labs | Labs table object |
vitals | Vitals | Vitals table object |
medication_admin_continuous | MedicationAdminContinuous | Medication administration table object |
patient_assessments | PatientAssessments | Patient assessments table object |
respiratory_support | RespiratorySupport | Respiratory support table object |
position | Position | Position table object |
Initialize the ClifOrchestrator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_directory | str | Path to the directory containing data files | required |
filetype | str | Type of data file (csv, parquet, etc.) | 'csv' |
timezone | str | Timezone for datetime columns | 'UTC' |
output_directory | str | Directory for saving output files and logs. If not provided, an 'output' directory is created in the current working directory. | None |
Source code in clifpy/clif_orchestrator.py
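The constructor parameters above can be collected as in the following minimal sketch. The helper names `orchestrator_kwargs` and `make_orchestrator` and the example path are hypothetical, introduced only for illustration; the import path is taken from the class path shown at the top of this page, and the import is deferred so the helper runs even where clifpy is not installed.

```python
from typing import Any, Dict, Optional


def orchestrator_kwargs(
    data_directory: str,
    filetype: str = "csv",
    timezone: str = "UTC",
    output_directory: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect constructor arguments using the documented defaults (hypothetical helper)."""
    return {
        "data_directory": data_directory,
        "filetype": filetype,
        "timezone": timezone,
        "output_directory": output_directory,
    }


def make_orchestrator(**kwargs: Any):
    """Build a ClifOrchestrator; requires clifpy and the data files to be present."""
    # Imported lazily so the helper above can be used without clifpy installed.
    from clifpy.clif_orchestrator import ClifOrchestrator

    return ClifOrchestrator(**kwargs)


# Placeholder path; replace it with a real CLIF data directory.
kwargs = orchestrator_kwargs(
    "/path/to/clif/data", filetype="parquet", timezone="US/Central"
)
# orchestrator = make_orchestrator(**kwargs)
```

Leaving `output_directory` as None relies on the documented fallback to an 'output' directory in the current working directory.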
convert_wide_to_hourly
convert_wide_to_hourly(
wide_df,
aggregation_config,
memory_limit="4GB",
temp_directory=None,
batch_size=None,
)
Convert wide dataset to hourly aggregation using DuckDB.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wide_df | DataFrame | Wide dataset from create_wide_dataset() | required |
aggregation_config | Dict[str, List[str]] | Dict mapping aggregation methods to columns. Example: { 'mean': ['heart_rate', 'sbp'], 'max': ['spo2'], 'min': ['map'], 'median': ['glucose'], 'first': ['gcs_total'], 'last': ['assessment_value'], 'boolean': ['norepinephrine'], 'one_hot_encode': ['device_category'] } | required |
memory_limit | str | DuckDB memory limit (e.g., '4GB', '8GB') | '4GB' |
temp_directory | Optional[str] | Directory for DuckDB temp files | None |
batch_size | Optional[int] | Process in batches if specified | None |
Returns:
Type | Description |
---|---|
DataFrame | Hourly aggregated DataFrame with nth_hour column |
Source code in clifpy/clif_orchestrator.py
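As a rough illustration of the aggregation_config shape, the sketch below checks a config against the method names that appear in the documented example. Whether the library accepts other method names is not stated here, so `DOCUMENTED_METHODS` is an assumption drawn from that example only, and `check_aggregation_config` and `to_hourly` are hypothetical helpers.

```python
from typing import Dict, List

# Method names taken from the documented example config (assumed, not exhaustive).
DOCUMENTED_METHODS = {
    "mean", "max", "min", "median", "first", "last", "boolean", "one_hot_encode",
}


def check_aggregation_config(config: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for method, columns in config.items():
        if method not in DOCUMENTED_METHODS:
            problems.append(f"unknown method: {method}")
        if not isinstance(columns, list) or not all(
            isinstance(column, str) for column in columns
        ):
            problems.append(f"columns for {method} must be a list of strings")
    return problems


aggregation_config = {
    "mean": ["heart_rate", "sbp"],
    "max": ["spo2"],
    "boolean": ["norepinephrine"],
    "one_hot_encode": ["device_category"],
}


def to_hourly(orchestrator, wide_df, config=aggregation_config):
    """Call convert_wide_to_hourly after a shape check; needs a real orchestrator."""
    problems = check_aggregation_config(config)
    if problems:
        raise ValueError("; ".join(problems))
    return orchestrator.convert_wide_to_hourly(
        wide_df, aggregation_config=config, memory_limit="8GB"
    )
```

Checking the config up front keeps a misspelled method name from surfacing only after the DuckDB work has started.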
create_wide_dataset
create_wide_dataset(
tables_to_load=None,
category_filters=None,
sample=False,
hospitalization_ids=None,
cohort_df=None,
output_format="dataframe",
save_to_data_location=False,
output_filename=None,
return_dataframe=True,
batch_size=1000,
memory_limit=None,
threads=None,
show_progress=True,
)
Create wide time-series dataset using DuckDB for high performance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tables_to_load | Optional[List[str]] | List of tables to include (e.g., ['vitals', 'labs']) | None |
category_filters | Optional[Dict[str, List[str]]] | Dict of categories to pivot for each table. Example: { 'vitals': ['heart_rate', 'sbp', 'spo2'], 'labs': ['hemoglobin', 'sodium'], 'respiratory_support': ['device_category'] } | None |
sample | bool | If True, use 20 random hospitalizations | False |
hospitalization_ids | Optional[List[str]] | Specific hospitalization IDs to include | None |
cohort_df | Optional[DataFrame] | DataFrame with time windows for filtering | None |
output_format | str | 'dataframe', 'csv', or 'parquet' | 'dataframe' |
save_to_data_location | bool | Save output to data directory | False |
output_filename | Optional[str] | Custom filename for output | None |
return_dataframe | bool | Return DataFrame even when saving | True |
batch_size | int | Number of hospitalizations per batch | 1000 |
memory_limit | Optional[str] | DuckDB memory limit (e.g., '8GB') | None |
threads | Optional[int] | Number of threads for DuckDB | None |
show_progress | bool | Show progress bars | True |
Returns:
Type | Description |
---|---|
Optional[DataFrame] | Wide dataset as DataFrame or None |
Source code in clifpy/clif_orchestrator.py
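A call to create_wide_dataset might be prepared as in the sketch below. Deriving tables_to_load from the keys of category_filters is a convenience assumed for this example, not documented behavior, and `build_wide` is a hypothetical wrapper that needs a real orchestrator.

```python
from typing import Any, Dict, List

# Categories to pivot for each table, following the documented example.
category_filters: Dict[str, List[str]] = {
    "vitals": ["heart_rate", "sbp", "spo2"],
    "labs": ["hemoglobin", "sodium"],
    "respiratory_support": ["device_category"],
}

wide_kwargs: Dict[str, Any] = {
    # Assumption: load exactly the tables that have category filters.
    "tables_to_load": sorted(category_filters),
    "category_filters": category_filters,
    "sample": True,          # documented: use 20 random hospitalizations
    "batch_size": 1000,      # documented default
    "memory_limit": "8GB",
    "show_progress": False,
}


def build_wide(orchestrator, **kwargs: Any):
    """Forward the prepared keyword arguments to create_wide_dataset."""
    return orchestrator.create_wide_dataset(**kwargs)


# wide_df = build_wide(orchestrator, **wide_kwargs)
```

Setting sample=True is a cheap way to try a configuration on a small subset before running the full cohort.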
get_loaded_tables
Return list of currently loaded table names.
Returns:
Type | Description |
---|---|
List[str] | List of loaded table names |
Source code in clifpy/clif_orchestrator.py
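One possible use of the returned names is to check that the tables a later step needs are already loaded. The helper `missing_tables` below is hypothetical, and the `loaded` list stands in for the value that get_loaded_tables() would return.

```python
from typing import Iterable, List


def missing_tables(loaded: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required table names that are not loaded, in the order requested."""
    loaded_set = set(loaded)
    return [name for name in required if name not in loaded_set]


# Stand-in for orchestrator.get_loaded_tables().
loaded = ["patient", "vitals"]
still_needed = missing_tables(loaded, ["vitals", "labs", "respiratory_support"])
print(still_needed)
```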
get_sys_resource_info
Get system resource information including CPU, memory, and practical thread limits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
print_summary | bool | Whether to print a formatted summary | True |
Returns:
Type | Description |
---|---|
Dict[str, Any] | Dict containing system resource information (CPU, memory, and practical thread limits) |
Source code in clifpy/clif_orchestrator.py
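Because the keys of the returned dict are not listed here, the sketch below handles the result generically rather than assuming key names. `format_resource_info` is a hypothetical helper, and `example_info` is a placeholder dict, not real output.

```python
from typing import Any, Dict, List


def format_resource_info(info: Dict[str, Any]) -> List[str]:
    """Render each entry as 'key: value', sorted by key for stable output."""
    return [f"{key}: {info[key]}" for key in sorted(info)]


# Placeholder values; a real dict would come from
# orchestrator.get_sys_resource_info(print_summary=False).
example_info = {"example_key_b": 2, "example_key_a": 1}
lines = format_resource_info(example_info)
```

Passing print_summary=False suppresses the printed summary, so the dict can be logged or inspected in whatever form suits the caller.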
get_tables_obj_list
Return list of loaded table objects.
Returns:
Type | Description |
---|---|
List | List of loaded table objects |
Source code in clifpy/clif_orchestrator.py
initialize
Initialize specified tables with optional filtering and column selection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tables | List[str] | List of table names to load. Defaults to ['patient']. | None |
sample_size | int | Number of rows to load for each table. | None |
columns | Dict[str, List[str]] | Dictionary mapping table names to lists of columns to load. | None |
filters | Dict[str, Dict] | Dictionary mapping table names to filter dictionaries. | None |
Source code in clifpy/clif_orchestrator.py
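The per-table arguments to initialize could be assembled as in this sketch. The column names and the inner filter format (column name mapped to a list of allowed values) are assumptions made for illustration, since the filter dictionary layout is not described here. `keys_outside_tables` and `initialize_tables` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def keys_outside_tables(
    tables: Iterable[str], *per_table_options: Dict[str, object]
) -> List[str]:
    """Return option keys that name a table missing from the tables list."""
    known = set(tables)
    extra = set()
    for options in per_table_options:
        extra.update(key for key in options if key not in known)
    return sorted(extra)


tables = ["patient", "hospitalization", "vitals"]
# Assumed column names, for illustration only.
columns = {
    "vitals": ["hospitalization_id", "recorded_dttm", "vital_category", "vital_value"]
}
# Assumed filter layout: column name -> allowed values.
filters = {"vitals": {"vital_category": ["heart_rate", "sbp"]}}

stray = keys_outside_tables(tables, columns, filters)


def initialize_tables(orchestrator):
    """Load the tables above with a row cap; needs a real orchestrator."""
    orchestrator.initialize(
        tables=tables, sample_size=1000, columns=columns, filters=filters
    )
```

The check catches a columns or filters entry keyed by a table that was never requested, which is an easy typo to make.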
load_table
Load table data and create table object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
table_name | str | Name of the table to load | required |
sample_size | int | Number of rows to load | None |
columns | List[str] | Specific columns to load | None |
filters | Dict | Filters to apply when loading | None |
Returns:
The loaded table object.
Source code in clifpy/clif_orchestrator.py
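Loading several tables one at a time might look like the following sketch. The exception types raised by load_table are not documented here, so the broad `except` is an assumption. `_StandInOrchestrator` is a hypothetical stand-in that only mimics the documented signature so the example runs without clifpy or data files.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def load_many(
    orchestrator: Any, table_names: Iterable[str], sample_size: Optional[int] = None
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Load each table, collecting table objects and error messages separately."""
    loaded: Dict[str, Any] = {}
    failed: Dict[str, str] = {}
    for name in table_names:
        try:
            loaded[name] = orchestrator.load_table(name, sample_size=sample_size)
        except Exception as exc:  # exact exception types are an unknown here
            failed[name] = str(exc)
    return loaded, failed


class _StandInOrchestrator:
    """Hypothetical stand-in; not the real ClifOrchestrator."""

    def load_table(self, table_name, sample_size=None, columns=None, filters=None):
        if table_name == "not_a_table":
            raise ValueError(f"unknown table: {table_name}")
        return {"table": table_name, "sample_size": sample_size}


loaded, failed = load_many(
    _StandInOrchestrator(), ["patient", "not_a_table"], sample_size=100
)
```

Collecting failures instead of stopping at the first one gives a complete picture of which tables are missing or malformed in a data directory.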
validate_all
Run validation on all loaded tables.
This method runs the validate() method on each loaded table and reports the results.
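A typical sequence is to initialize the tables, validate them, and then confirm what was loaded. The sketch below uses a hypothetical recording stand-in in place of a real orchestrator so it can run without clifpy. It does not assume any return value from validate_all, since none is documented here.

```python
from typing import Any, List


def load_and_validate(orchestrator: Any, tables: List[str]) -> List[str]:
    """Initialize the given tables, validate them all, and return the loaded names."""
    orchestrator.initialize(tables=tables)
    orchestrator.validate_all()
    return orchestrator.get_loaded_tables()


class _RecordingStandIn:
    """Hypothetical stand-in that records calls; not the real ClifOrchestrator."""

    def __init__(self):
        self.calls = []
        self._loaded = []

    def initialize(self, tables=None, sample_size=None, columns=None, filters=None):
        self.calls.append("initialize")
        self._loaded = list(tables or ["patient"])

    def validate_all(self):
        self.calls.append("validate_all")

    def get_loaded_tables(self):
        return list(self._loaded)


stand_in = _RecordingStandIn()
validated = load_and_validate(stand_in, ["patient", "labs"])
```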