Patient Table¶

The Patient table contains core demographic information and serves as the primary reference for all other CLIF tables through the patient_id field.

Overview¶

The Patient table includes:
- Unique patient identifiers
- Birth and death dates
- Demographics (sex, race, ethnicity)
- Primary language

Required Columns¶

Column	Type	Description
patient_id	VARCHAR	Unique patient identifier
birth_date	DATETIME	Date of birth
death_dttm	DATETIME	Date/time of death (null if alive)
race_name	VARCHAR	Free-text race description
race_category	VARCHAR	Standardized race category
ethnicity_name	VARCHAR	Free-text ethnicity description
ethnicity_category	VARCHAR	Standardized ethnicity category
sex_name	VARCHAR	Free-text sex description
sex_category	VARCHAR	Standardized sex category
language_name	VARCHAR	Primary language
language_category	VARCHAR	Standardized language category
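
A quick way to confirm that a loaded frame carries every column in the table above is a simple membership check. The following is a minimal sketch on a toy DataFrame; `REQUIRED_COLUMNS` and `missing_columns` are illustrative names and are not part of clifpy.

```python
import pandas as pd

# Column names copied from the Required Columns table
REQUIRED_COLUMNS = [
    "patient_id", "birth_date", "death_dttm",
    "race_name", "race_category",
    "ethnicity_name", "ethnicity_category",
    "sex_name", "sex_category",
    "language_name", "language_category",
]

def missing_columns(df: pd.DataFrame) -> list[str]:
    """Return the required columns absent from df, in table order."""
    return [col for col in REQUIRED_COLUMNS if col not in df.columns]

# Toy frame that lacks the two language columns
toy = pd.DataFrame(
    columns=[c for c in REQUIRED_COLUMNS if not c.startswith("language")]
)
print(missing_columns(toy))
```

With a real table, the same function could be applied to `patient.df` before running any analysis.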

Standardized Categories¶

Race Categories¶

Black or African American
White
American Indian or Alaska Native
Asian
Native Hawaiian or Other Pacific Islander
Unknown
Other

Ethnicity Categories¶

Hispanic
Non-Hispanic
Unknown

Sex Categories¶

Male
Female
Unknown
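
The category lists above can be turned into a lookup that flags any value outside the permitted set. This is a sketch under the assumption that the lists on this page are complete; `PERMISSIBLE` and `unexpected_values` are hypothetical helpers, not clifpy functions, and the toy data is invented for illustration.

```python
import pandas as pd

# Permissible values copied from the category lists on this page
PERMISSIBLE = {
    "race_category": {
        "Black or African American", "White",
        "American Indian or Alaska Native", "Asian",
        "Native Hawaiian or Other Pacific Islander",
        "Unknown", "Other",
    },
    "ethnicity_category": {"Hispanic", "Non-Hispanic", "Unknown"},
    "sex_category": {"Male", "Female", "Unknown"},
}

def unexpected_values(df: pd.DataFrame) -> dict[str, set]:
    """Map each category column to its non-null values outside the permitted set."""
    result = {}
    for col, allowed in PERMISSIBLE.items():
        observed = set(df[col].dropna().unique())
        bad = observed - allowed
        if bad:
            result[col] = bad
    return result

# Toy data with two non-standard entries and one missing value
toy = pd.DataFrame({
    "race_category": ["White", "Asian", "Caucasian"],
    "ethnicity_category": ["Hispanic", "Non-Hispanic", None],
    "sex_category": ["Male", "F", "Female"],
})
print(unexpected_values(toy))
```

Missing values are deliberately ignored here so that they can be reported separately from non-standard entries.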

Usage Examples¶

Loading Patient Data¶

import pandas as pd  # used by the age calculations in later examples
from clifpy.tables import Patient

# Load from file
patient = Patient.from_file(
    data_directory='/path/to/data',
    filetype='parquet',
    timezone='US/Central'
)

# Validate the data
patient.validate()

Basic Analysis¶

# Get summary statistics
summary = patient.get_summary()
print(f"Total patients: {summary['num_rows']}")

# Demographics distribution
demographics = patient.df.groupby(['sex_category', 'race_category']).size()
print(demographics)

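# Approximate age in years as of today (for analysis, prefer age at admission)
# If birth_date is timezone-aware, pass a matching tz to pd.Timestamp.now()
patient.df['age'] = (
    pd.Timestamp.now() - patient.df['birth_date']
).dt.days / 365.25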

# Find elderly patients
elderly = patient.df[patient.df['age'] >= 65]

Cohort Building¶

# Female patients aged 65 or older (uses the 'age' column computed under Basic Analysis)
cohort = patient.df[
    (patient.df['sex_category'] == 'Female') &
    (patient.df['age'] >= 65)
]

# Living patients
alive = patient.df[patient.df['death_dttm'].isna()]

# Specific ethnicity
hispanic = patient.df[patient.df['ethnicity_category'] == 'Hispanic']

Joining with Other Tables¶

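# Get patient demographics for lab results
# Assumes `labs` is an already-loaded labs table whose DataFrame includes patient_id;
# if it is keyed by hospitalization instead, map it to patient_id first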
labs_with_demographics = labs.df.merge(
    patient.df[['patient_id', 'age', 'sex_category']],
    on='patient_id',
    how='left'
)

# Analyze by demographic groups (assumes lab_value holds numeric values)
lab_by_sex = labs_with_demographics.groupby('sex_category')['lab_value'].mean()
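
Because a left join multiplies rows when the right-hand key is duplicated, it can help to check the join before trusting grouped results. The sketch below uses toy DataFrames rather than clifpy objects; it relies on the `validate` and `indicator` parameters of pandas `DataFrame.merge`, and the sample values are invented.

```python
import pandas as pd

patients = pd.DataFrame({
    "patient_id": ["p1", "p2", "p2"],  # p2 appears twice
    "sex_category": ["Female", "Male", "Male"],
})
labs = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3"],  # p3 has no patient record
    "lab_value": [1.0, 2.0, 3.0],
})

# Count duplicate keys on the patient side, then keep one row per patient
dupes = patients["patient_id"].duplicated().sum()
deduped = patients.drop_duplicates("patient_id")

# validate= raises MergeError if the right side is not unique on the key;
# indicator= adds a _merge column that flags rows without a match
joined = labs.merge(
    deduped, on="patient_id", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = joined.loc[joined["_merge"] == "left_only", "patient_id"].tolist()
print(dupes, unmatched)
```

Rows listed in `unmatched` would carry null demographics after the join, so they are worth reviewing before computing group means.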

Data Quality Checks¶

# Check for missing demographics
missing_sex = patient.df[patient.df['sex_category'].isna()]
missing_race = patient.df[patient.df['race_category'].isna()]

# Validate age ranges (age in years as of today; outside 0-120 is treated as implausible)
patient.df['age'] = (pd.Timestamp.now() - patient.df['birth_date']).dt.days / 365.25
invalid_age = patient.df[(patient.df['age'] < 0) | (patient.df['age'] > 120)]

# Check death date consistency
invalid_death = patient.df[
    patient.df['death_dttm'] < patient.df['birth_date']
]

Best Practices¶

Always validate demographic categories against standardized values
Handle missing data appropriately for demographic fields
Calculate age at time of admission, not current date
Protect PHI by using only de-identified patient_ids
Document any demographic data transformations
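
The recommendation to calculate age at time of admission can be sketched as follows. The admissions frame and its `admission_dttm` column are assumptions made for illustration, since this page does not define an admissions table; the data is invented.

```python
import pandas as pd

patients = pd.DataFrame({
    "patient_id": ["p1", "p2"],
    "birth_date": pd.to_datetime(["1950-06-15", "1990-01-01"]),
})
# Hypothetical admissions frame; the column name admission_dttm is an assumption
admissions = pd.DataFrame({
    "patient_id": ["p1", "p2"],
    "admission_dttm": pd.to_datetime(["2020-06-14", "2020-01-01"]),
})

merged = admissions.merge(
    patients, on="patient_id", how="left", validate="many_to_one"
)

# Completed years: year difference, minus one if the birthday
# had not yet occurred in the admission year
adm, birth = merged["admission_dttm"], merged["birth_date"]
before_birthday = (adm.dt.month < birth.dt.month) | (
    (adm.dt.month == birth.dt.month) & (adm.dt.day < birth.dt.day)
)
merged["age_at_admission"] = (
    adm.dt.year - birth.dt.year - before_birthday.astype(int)
)
print(merged[["patient_id", "age_at_admission"]])
```

Counting completed years avoids the small drift that dividing a day count by 365.25 introduces near birthdays.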

API Reference¶

For detailed API documentation, see Patient API.