Encounter Stitching
The encounter stitching functionality identifies and groups hospitalizations that occur within a specified time window of each other, treating them as a single continuous encounter. This is particularly useful for handling cases where patients are discharged and quickly readmitted, such as transfers between the emergency department and inpatient units.
Overview
In clinical data, what appears as separate hospitalizations may actually represent a single continuous episode of care. Common scenarios include:
- ED to inpatient transfers - Patient admitted through ED, then formally admitted to hospital
- Inter-facility transfers - Patient moved between hospitals within a health system
- Brief discharges - Patient discharged and readmitted within hours (e.g., for procedures)
- Administrative separations - Billing or administrative reasons create multiple records
The encounter stitching algorithm links these related hospitalizations using a configurable time window (default: 6 hours) between discharge and subsequent admission.
How It Works
The stitching algorithm:
- Sorts hospitalizations by patient and admission time
- Calculates gaps between discharge and next admission for each patient
- Links encounters when the gap is less than the specified time window
- Assigns encounter blocks: a unique identifier grouping linked hospitalizations
- Updates tables in place: adds an `encounter_block` column to both the hospitalization and ADT tables
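The linking rule in the steps above can be written as a formula. The symbols are introduced here for illustration only. For one patient, sort stays by admission time. Let $a_i$ and $d_i$ be the admission and discharge times of the $i$-th stay, and let $T$ be the window in hours (`stitch_time_interval`):

$$
g_i = a_{i+1} - d_i \ \text{(hours)}, \qquad \text{stays } i \text{ and } i+1 \text{ share an encounter block} \iff g_i < T
$$

For example, a discharge at 10:00 followed by a readmission at 13:00 the same day gives $g_i = 3$ hours. That gap is linked under the default $T = 6$ but not under $T = 2$. Linked stays chain together: three stays separated by gaps of 3 and 4 hours all fall into one block when $T = 6$.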
Basic Usage
Quick Start with Automatic Stitching
```python
from clifpy.clif_orchestrator import ClifOrchestrator

# Initialize orchestrator with stitching enabled
clif = ClifOrchestrator(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='UTC',
    stitch_encounter=True,     # Enable automatic stitching
    stitch_time_interval=6     # 6-hour window (default)
)

# Load tables - stitching happens automatically
clif.initialize(['hospitalization', 'adt'])

# Access the encounter mapping
mapping = clif.get_encounter_mapping()
print(f"Created {mapping['encounter_block'].nunique()} encounter blocks")
```
Custom Time Windows
```python
# Use a 12-hour window for linking encounters
clif = ClifOrchestrator(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='UTC',
    stitch_encounter=True,
    stitch_time_interval=12    # 12-hour window
)

# Use a 2-hour window for stricter linking
clif_strict = ClifOrchestrator(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='UTC',
    stitch_encounter=True,
    stitch_time_interval=2     # 2-hour window
)
```
Direct Function Usage
You can also use the stitching function directly without the orchestrator:
```python
import pandas as pd
from clifpy.utils.stitching_encounters import stitch_encounters

# Load your dataframes
hospitalization_df = pd.read_parquet('hospitalization.parquet')
adt_df = pd.read_parquet('adt.parquet')

# Perform stitching
hosp_stitched, adt_stitched, encounter_mapping = stitch_encounters(
    hospitalization=hospitalization_df,
    adt=adt_df,
    time_interval=12    # 12-hour window
)
```
Parameters
ClifOrchestrator Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `stitch_encounter` | bool | False | Enable automatic encounter stitching during initialization |
| `stitch_time_interval` | int | 6 | Window in hours: a hospitalization is linked to the next one when the gap between its discharge and the next admission is less than this value |

Direct Function Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hospitalization` | pd.DataFrame | Required | Hospitalization table containing the required columns listed below |
| `adt` | pd.DataFrame | Required | ADT table containing the required columns listed below |
| `time_interval` | int | 6 | Window in hours: a hospitalization is linked to the next one when the gap between its discharge and the next admission is less than this value |
Required Data Columns
Hospitalization Table
- `patient_id`
- `hospitalization_id`
- `admission_dttm`
- `discharge_dttm`
- `age_at_admission`
- `admission_type_category`
- `discharge_category`

ADT Table
- `hospitalization_id`
- `in_dttm`
- `out_dttm`
- `location_category`
- `hospital_id`
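Before stitching, it can help to confirm that both tables have the columns listed above. The sketch below is a hypothetical pre-flight check. `check_required_columns`, `HOSP_REQUIRED` and `ADT_REQUIRED` are illustrative names and are not part of clifpy. The check raises a `ValueError` that lists the missing columns, similar in spirit to the behaviour described under Error Handling.

```python
import pandas as pd

# Column lists copied from the requirements above (names are illustrative)
HOSP_REQUIRED = [
    "patient_id", "hospitalization_id", "admission_dttm", "discharge_dttm",
    "age_at_admission", "admission_type_category", "discharge_category",
]
ADT_REQUIRED = ["hospitalization_id", "in_dttm", "out_dttm", "location_category", "hospital_id"]


def check_required_columns(df: pd.DataFrame, required: list[str], name: str) -> None:
    """Hypothetical helper: raise ValueError naming any required column absent from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns in {name} DataFrame: {missing}")
```

For example, call `check_required_columns(hospitalization_df, HOSP_REQUIRED, "hospitalization")` before passing the frame to the stitching function.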
Output
When stitching is enabled, the process:
- Updates the hospitalization table: adds an `encounter_block` column
- Updates the ADT table: adds an `encounter_block` column
- Creates an encounter mapping, available via `clif.get_encounter_mapping()`, with columns:
  - `hospitalization_id`: original hospitalization identifier
  - `encounter_block`: assigned encounter block number
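The mapping has one row per hospitalization. It can therefore be joined onto other tables keyed by `hospitalization_id` to roll their rows up to encounter blocks. The sketch below is minimal and uses toy frames. `labs_df` and its `lab_value_numeric` column are illustrative assumptions. In practice, the mapping would come from `clif.get_encounter_mapping()`.

```python
import pandas as pd

# Toy mapping in the shape described above (hospitalization_id -> encounter_block)
mapping = pd.DataFrame({
    "hospitalization_id": ["h1", "h2", "h3"],
    "encounter_block": [1, 1, 2],
})

# Hypothetical table keyed by hospitalization_id (column names are illustrative)
labs_df = pd.DataFrame({
    "hospitalization_id": ["h1", "h2", "h2", "h3"],
    "lab_value_numeric": [1.2, 1.8, 2.1, 0.9],
})

# validate="many_to_one" raises if a hospitalization_id appears twice in the mapping
labs_blocked = labs_df.merge(mapping, on="hospitalization_id", how="left", validate="many_to_one")

# Encounter-level summary: peak value per stitched encounter
peak_by_block = labs_blocked.groupby("encounter_block")["lab_value_numeric"].max()
print(peak_by_block)
```

The left join keeps every lab row. Rows whose `hospitalization_id` is missing from the mapping get a null `encounter_block` rather than being dropped.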
Understanding Encounter Blocks
Each encounter block represents a continuous episode of care:
```python
# Access the mapping after initialization
mapping = clif.get_encounter_mapping()

# Find multi-hospitalization encounters
multi_hosp = mapping.groupby('encounter_block').size()
multi_hosp_encounters = multi_hosp[multi_hosp > 1]
print(f"Encounters with multiple hospitalizations: {len(multi_hosp_encounters)}")

# Get details for a specific encounter block
block_1_hosps = mapping[mapping['encounter_block'] == 1]
print(f"Hospitalizations in encounter block 1: {block_1_hosps['hospitalization_id'].tolist()}")
```
Practical Examples
Calculate True Length of Stay
When encounters are stitched, you can calculate the true length of stay across linked hospitalizations:
```python
# Access stitched hospitalization data
stitched_df = clif.hospitalization.df

# Calculate encounter-level statistics
encounter_stats = stitched_df.groupby('encounter_block').agg({
    'admission_dttm': 'min',          # First admission
    'discharge_dttm': 'max',          # Last discharge
    'hospitalization_id': 'count',    # Number of linked hospitalizations
    'patient_id': 'first'
}).rename(columns={'hospitalization_id': 'n_hospitalizations'})

# Calculate total length of stay in days
encounter_stats['total_los_days'] = (
    (encounter_stats['discharge_dttm'] - encounter_stats['admission_dttm'])
    .dt.total_seconds() / 86400
)

print(encounter_stats[['patient_id', 'n_hospitalizations', 'total_los_days']].head())
```
Analyze ICU Stays Across Encounters
```python
# Access stitched ADT data
adt_stitched_df = clif.adt.df

# Find ICU stays by encounter: first ICU entry, last ICU exit, and the
# number of distinct hospitalizations with an ICU record. This gives the
# span of ICU care, not the summed ICU time.
icu_by_encounter = adt_stitched_df[
    adt_stitched_df['location_category'] == 'icu'
].groupby('encounter_block').agg({
    'in_dttm': 'min',
    'out_dttm': 'max',
    'hospitalization_id': 'nunique'
})

print("ICU stays by encounter block:")
print(icu_by_encounter.head())
```
Filter Data by Encounter Properties
```python
# Find encounter blocks with both an ED record and an inpatient (ICU or ward) record
ed_admits = clif.adt.df[
    clif.adt.df['location_category'] == 'ed'
]['encounter_block'].unique()

inpatient_admits = clif.adt.df[
    clif.adt.df['location_category'].isin(['icu', 'ward'])
]['encounter_block'].unique()

ed_to_inpatient = set(ed_admits) & set(inpatient_admits)
print(f"Encounters with ED to inpatient transfer: {len(ed_to_inpatient)}")
```
Compare Different Time Windows
```python
import pandas as pd
from clifpy.clif_orchestrator import ClifOrchestrator

# Test effect of different time windows
windows = [3, 6, 12, 24]
results = []

for window in windows:
    clif_test = ClifOrchestrator(
        data_directory='/path/to/data',
        filetype='parquet',
        timezone='UTC',
        stitch_encounter=True,
        stitch_time_interval=window
    )
    clif_test.initialize(['hospitalization', 'adt'])
    mapping = clif_test.get_encounter_mapping()
    # Size of each block; blocks larger than 1 merged several hospitalizations
    block_sizes = mapping.groupby('encounter_block').size()
    results.append({
        'window_hours': window,
        'total_encounters': mapping['encounter_block'].nunique(),
        'multi_hosp_encounters': int((block_sizes > 1).sum())
    })

results_df = pd.DataFrame(results)
print(results_df)
```
Integration with Other Features
Wide Dataset Creation
Stitched encounters are automatically used when creating wide datasets:
```python
# Initialize with stitching
clif = ClifOrchestrator(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='UTC',
    stitch_encounter=True
)

# Load tables (stitching happens automatically)
clif.initialize(['hospitalization', 'adt', 'labs', 'vitals'])

# Create wide dataset using stitched encounters
wide_df = clif.create_wide_dataset(
    start_time='admission_dttm',
    end_time='discharge_dttm',
    time_col='charttime'
)
```
Validation
The stitched tables remain compatible with the validation methods:
```python
# Validate all loaded tables (including stitched ones)
validation_results = clif.validate_all()

# Check specific tables
clif.hospitalization.validate()
clif.adt.validate()
```
Best Practices
- Choose appropriate time windows:
  - 2-4 hours: strict linking, for direct transfers only
  - 6 hours (default): balances capturing related encounters against over-grouping
  - 12-24 hours: liberal definition that also captures readmissions after day surgery
- Validate stitching results.
- Consider your analysis goals:
  - Outcome studies: use stitched encounters to avoid counting transfers as readmissions
  - Resource utilization: you may want to keep encounters separate for accurate billing
  - Quality metrics: check whether the measure specification requires episode-based analysis
- Document your choices:

```python
import pandas as pd

# Save stitching parameters for reproducibility
if clif.encounter_mapping is not None:
    stitching_info = {
        'time_interval_hours': clif.stitch_time_interval,
        'timestamp': pd.Timestamp.now(),
        'num_encounters_created': clif.encounter_mapping['encounter_block'].nunique(),
        'num_multi_hosp_encounters': (
            clif.encounter_mapping.groupby('encounter_block').size().gt(1).sum()
        ),
    }
    # Save to file or include in analysis metadata
```
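One possible way to act on the "validate stitching results" practice is to check a few invariants on the stitched hospitalization table:
- Every block should belong to a single patient.
- Consecutive stays inside a block should be separated by less than the window.

The function below is a hypothetical sketch and is not part of clifpy. It assumes the `encounter_block`, `patient_id`, `admission_dttm` and `discharge_dttm` columns described on this page. It also assumes that stitching only links consecutive stays, as in the algorithm description under Technical Details.

```python
import pandas as pd


def check_stitching(hosp: pd.DataFrame, time_interval: float = 6) -> dict:
    """Hypothetical sanity checks on a stitched hospitalization table."""
    # Each encounter block is expected to contain exactly one patient
    patients_per_block = hosp.groupby("encounter_block")["patient_id"].nunique()

    # Within each block, hours from a discharge to the next admission in that block
    df = hosp.sort_values(["encounter_block", "admission_dttm"])
    next_adm = df.groupby("encounter_block")["admission_dttm"].shift(-1)
    gap_hours = (next_adm - df["discharge_dttm"]).dt.total_seconds() / 3600

    return {
        "multi_patient_blocks": int((patients_per_block > 1).sum()),
        # Gaps at or above the window should not occur inside a block (NaN counts as False)
        "oversized_gaps": int((gap_hours >= time_interval).sum()),
        "max_gap_hours": float(gap_hours.max()) if gap_hours.notna().any() else None,
    }
```

Usage might look like `check_stitching(clif.hospitalization.df, clif.stitch_time_interval)`. Nonzero counts suggest that the stitched output does not match the expected rule and is worth inspecting.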
Technical Details
Algorithm Implementation
The stitching algorithm:
- Filters the required columns from the hospitalization and ADT tables
- Joins the hospitalization and ADT data
- Sorts by `patient_id` and `admission_dttm`
- Calculates the hours between each discharge and the patient's next admission
- Flags a hospitalization as linked to the next one when that gap is less than `time_interval`
- Iteratively propagates `encounter_block` IDs through linked chains
- Updates the original dataframes with the `encounter_block` column
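The sort, gap and link steps can be approximated in a few lines of pandas. This is a minimal sketch, not clifpy's implementation. It skips the ADT join. Instead of iteratively propagating block IDs through linked chains, it takes a cumulative sum over "starts a new block" flags. That swapped-in technique should yield the same grouping when links only join consecutive stays, although the block numbering may differ. The function name `assign_encounter_blocks` is hypothetical.

```python
import pandas as pd


def assign_encounter_blocks(hosp: pd.DataFrame, time_interval: float = 6) -> pd.DataFrame:
    """Hypothetical sketch: map each hospitalization_id to an encounter_block."""
    df = (
        hosp[["patient_id", "hospitalization_id", "admission_dttm", "discharge_dttm"]]
        .drop_duplicates()
        .sort_values(["patient_id", "admission_dttm"])
        .reset_index(drop=True)
    )
    # Discharge time of the same patient's previous stay (NaT for the first stay)
    prev_discharge = df.groupby("patient_id")["discharge_dttm"].shift(1)
    gap_hours = (df["admission_dttm"] - prev_discharge).dt.total_seconds() / 3600

    # A stay continues the previous block only if the gap is under the window;
    # NaN gaps (first stay per patient) compare as False, so they start a new block
    starts_new_block = ~(gap_hours < time_interval)

    # Each new-block flag increments the counter, so linked stays share an ID
    df["encounter_block"] = starts_new_block.cumsum()
    return df[["hospitalization_id", "encounter_block"]]
```

The cumulative-sum approach avoids an explicit loop and runs as a single vectorized pass after the sort.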
Performance Considerations
- Stitching is performed in-memory using pandas operations
- Performance scales linearly with number of hospitalizations
- For datasets with >1M hospitalizations, ensure adequate RAM (8GB+ recommended)
- Processing time is typically seconds to minutes depending on data size
Error Handling
The orchestrator handles common issues:
- Missing tables: Warns if hospitalization or ADT tables are not loaded
- Missing columns: Raises ValueError with specific missing columns listed
- Processing errors: Catches exceptions and reports them without failing initialization
Limitations
- Currently only links hospitalizations for the same patient
- Does not consider clinical criteria (purely time-based)
- Requires both hospitalization and ADT tables to be present
- Does not link across different hospital systems (requires same patient_id)
Troubleshooting
Common Issues
Issue: "Encounter stitching requires both hospitalization and ADT tables to be loaded"
- Solution: Include both 'hospitalization' and 'adt' in your initialize() call

Issue: "Missing required columns in hospitalization DataFrame"
- Solution: Ensure your data contains all required columns listed above
- Check: Use clif.hospitalization.df.columns to see available columns

Issue: No encounters are being stitched despite close admissions
- Check: Verify that datetime columns are properly parsed and in the same timezone
- Check: Ensure discharge_dttm is not null for hospitalizations you expect to link
- Try: Increase the time window to see if encounters get linked
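For the first two checks, a quick pandas pass over the hospitalization table can reveal unparsed datetimes and missing discharge times. This is a minimal sketch: `normalize_and_report` is a hypothetical helper. Converting everything to UTC with `pd.to_datetime(..., utc=True)` is one reasonable normalization and is not necessarily what clifpy does internally. Note that naive (timezone-less) timestamps are assumed to already be in UTC.

```python
import pandas as pd


def normalize_and_report(hosp: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse datetime columns to UTC and report missing discharges."""
    df = hosp.copy()
    for col in ["admission_dttm", "discharge_dttm"]:
        # utc=True converts tz-aware values to UTC and treats naive values as UTC;
        # errors="coerce" turns unparseable values into NaT instead of raising
        df[col] = pd.to_datetime(df[col], utc=True, errors="coerce")
    n_missing = int(df["discharge_dttm"].isna().sum())
    print(f"Hospitalizations with missing or unparseable discharge_dttm: {n_missing}")
    return df
```

A stay whose `discharge_dttm` becomes NaT cannot produce a gap, so it will not link to the patient's next admission.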
Debugging
```python
# Initialize with stitching enabled
clif = ClifOrchestrator(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='UTC',
    stitch_encounter=True,
    stitch_time_interval=6
)

# Load tables; stitching is attempted during initialization
clif.initialize(['hospitalization', 'adt'])

# Verify encounter_block was added
print("Hospitalization columns:", clif.hospitalization.df.columns.tolist())
print("Has encounter_block:", 'encounter_block' in clif.hospitalization.df.columns)

# Check mapping
if clif.encounter_mapping is not None:
    print(f"Mapping shape: {clif.encounter_mapping.shape}")
else:
    print("No encounter mapping created")
```
See Also
- ClifOrchestrator - Main interface for CLIF data operations
- Hospitalization Table - Structure of hospitalization data
- ADT Table - Structure of ADT data
- Wide Dataset Creation - Creating analysis-ready datasets
- Examples Notebook - Interactive examples