Patient Table¶
The Patient table contains core demographic information and serves as the primary reference for all other CLIF tables through the patient_id field.
Overview¶
The Patient table includes:
- Unique patient identifiers
- Birth and death dates
- Demographics (sex, race, ethnicity)
- Primary language
Required Columns¶
Column | Type | Description |
---|---|---|
patient_id | VARCHAR | Unique patient identifier |
birth_date | DATETIME | Date of birth |
death_dttm | DATETIME | Date/time of death (null if alive) |
race_name | VARCHAR | Free-text race description |
race_category | VARCHAR | Standardized race category |
ethnicity_name | VARCHAR | Free-text ethnicity description |
ethnicity_category | VARCHAR | Standardized ethnicity category |
sex_name | VARCHAR | Free-text sex description |
sex_category | VARCHAR | Standardized sex category |
language_name | VARCHAR | Primary language |
language_category | VARCHAR | Standardized language category |
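A quick way to confirm that a loaded table carries every column listed above is to compare its column names against the required set. The following is a minimal sketch: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helper names, not part of clifpy, and `patient.validate()` may already perform an equivalent check.

```python
# Column names copied from the Required Columns table above.
REQUIRED_COLUMNS = [
    "patient_id", "birth_date", "death_dttm",
    "race_name", "race_category",
    "ethnicity_name", "ethnicity_category",
    "sex_name", "sex_category",
    "language_name", "language_category",
]


def missing_columns(columns):
    """Return the required columns absent from `columns`, in table order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With a loaded table, `missing_columns(patient.df.columns)` would return an empty list when every required column is present.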
Standardized Categories¶
Race Categories¶
- Black or African American
- White
- American Indian or Alaska Native
- Asian
- Native Hawaiian or Other Pacific Islander
- Unknown
- Other
Ethnicity Categories¶
- Hispanic
- Non-Hispanic
- Unknown
Sex Categories¶
- Male
- Female
- Unknown
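The category lists above can be turned into a simple lookup for spotting values that fall outside the standard. This is a sketch under stated assumptions: `STANDARD_CATEGORIES` and `nonstandard_values` are hypothetical names introduced here for illustration, and language categories are omitted because this page does not list them.

```python
# Allowed values copied from the category lists above.
STANDARD_CATEGORIES = {
    "race_category": {
        "Black or African American",
        "White",
        "American Indian or Alaska Native",
        "Asian",
        "Native Hawaiian or Other Pacific Islander",
        "Unknown",
        "Other",
    },
    "ethnicity_category": {"Hispanic", "Non-Hispanic", "Unknown"},
    "sex_category": {"Male", "Female", "Unknown"},
}


def nonstandard_values(values, allowed):
    """Return the sorted distinct values not in `allowed`, ignoring None."""
    return sorted({v for v in values if v is not None and v not in allowed})
```

For a DataFrame column, drop missing values first, for example `nonstandard_values(patient.df['sex_category'].dropna(), STANDARD_CATEGORIES['sex_category'])`.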
Usage Examples¶
Loading Patient Data¶
from clifpy.tables import Patient
# Load from file
patient = Patient.from_file(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='US/Central'
)
# Validate the data
patient.validate()
Basic Analysis¶
# Get summary statistics
summary = patient.get_summary()
print(f"Total patients: {summary['num_rows']}")
# Demographics distribution
demographics = patient.df.groupby(['sex_category', 'race_category']).size()
print(demographics)
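# Age calculation (if needed): approximate age in years as of today
import pandas as pd
# pd.Timestamp.now() is timezone-naive; if birth_date is timezone-aware,
# use pd.Timestamp.now(tz=...) with the matching timezone instead
patient.df['age'] = (
    pd.Timestamp.now() - patient.df['birth_date']
).dt.days / 365.25
# Find elderly patients (aged 65 or older)
elderly = patient.df[patient.df['age'] >= 65]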
Cohort Building¶
# Female patients aged 65 or older (uses the 'age' column computed above)
cohort = patient.df[
    (patient.df['sex_category'] == 'Female') &
    (patient.df['age'] >= 65)
]
# Living patients
alive = patient.df[patient.df['death_dttm'].isna()]
# Specific ethnicity
hispanic = patient.df[patient.df['ethnicity_category'] == 'Hispanic']
Joining with Other Tables¶
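# Get patient demographics for lab results
# (assumes `labs` is an already-loaded labs table whose DataFrame has a
# patient_id column, and that 'age' was added to patient.df as shown earlier)
labs_with_demographics = labs.df.merge(
    patient.df[['patient_id', 'age', 'sex_category']],
    on='patient_id',
    how='left'
)
# Analyze by demographic groups (lab_value must be numeric for mean())
lab_by_sex = labs_with_demographics.groupby('sex_category')['lab_value'].mean()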
Data Quality Checks¶
# Check for missing demographics
missing_sex = patient.df[patient.df['sex_category'].isna()]
missing_race = patient.df[patient.df['race_category'].isna()]
# Validate age ranges
patient.df['age'] = (pd.Timestamp.now() - patient.df['birth_date']).dt.days / 365.25
invalid_age = patient.df[(patient.df['age'] < 0) | (patient.df['age'] > 120)]
# Check death date consistency
invalid_death = patient.df[
    patient.df['death_dttm'] < patient.df['birth_date']
]
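Because patient_id is described as a unique identifier, a uniqueness check is a natural companion to the checks above. The following is a minimal sketch; `duplicate_ids` is a hypothetical helper, not a clifpy function, and it accepts any iterable of identifiers.

```python
from collections import Counter


def duplicate_ids(patient_ids):
    """Return the identifiers that occur more than once, sorted."""
    counts = Counter(patient_ids)
    return sorted(pid for pid, n in counts.items() if n > 1)
```

Applied to a loaded table, `duplicate_ids(patient.df['patient_id'])` would return an empty list when every identifier appears exactly once.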
Best Practices¶
- Always validate demographic categories against standardized values
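- Handle missing demographic values explicitly, for example by reporting how many rows are missing, rather than silently dropping them
- Calculate age at time of admission, not current date; the pd.Timestamp.now() examples above give age as of today and are illustrative only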
- Protect PHI by using only de-identified patient_ids
- Document any demographic data transformations
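To illustrate the recommendation to compute age at admission rather than at the current date, here is a minimal sketch of an age calculation against an explicit reference date. `age_at` is a hypothetical helper, and the reference date is an assumption: the Patient table has no admission timestamp, so it would need to come from another table.

```python
from datetime import date


def age_at(birth_date: date, reference_date: date) -> int:
    """Return completed years of age on `reference_date`."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

Unlike dividing a day count by 365.25, this counts whole calendar years, so a patient turns 65 exactly on their birthday.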
API Reference¶
For detailed API documentation, see Patient API.