Discharge Abstract Database (DAD) is a Canada-wide database of hospital admission and discharge data excluding the province of Quebec, maintained by the Canadian Institute for Health Information (CIHI). The data points in DAD include patient demographics, comorbidities coded in the International Statistical Classification of Diseases and Related Health Problems (ICD), interventions encoded in the Canadian Classification of Health Interventions (CCI) and the length of stay. DAD is the de-identified 10% sample available under the Data Liberation Initiative (DLI) for academic researchers. DAD is arguably the most comprehensive country-wide discharge dataset in the world.
Discharge Abstract Database is used for creating public reports for hospitals, researchers, and the general public. DAD data has also been used for disease-specific research and analysis, including public health, disease surveillance, and health services research. CIHI provides DAD in the SPSS (.sav) format with each record having horizontal fields for 20 comorbidities and 25 interventions. The format is not ideal for slicing and dicing the data for visualization for clinicians to obtain clinical insights.
DADpy provides a set of functions for using the DAD dataset for machine learning and visualization. The package does not include the dataset. Academic researchers can request the DAD dataset from CIHI. This is an unofficial repo, and I’m not affiliated with CIHI. Please retain the disclaimer below in forks.
Installation: (Will add to pypi soon)
pip install https://github.com/E-Health/dadpy/releases/download/1.0.0/dadpy-1.0.0-py3-none-any.whl
from dadpy import DadLoad
from dadpy import DadRead
# with the trailing slash
dl = DadLoad('/path/to/dad/sample/spss/sav/file/') # clin_sample_spss.sav
dr = dad_read(dl.sample)
# records with obesity as pandas df
# Partial gastrectomy for repair of gastric diverticulum
# comorbidities as dict for visualization
print(dr.comorbidity('E66')) # Obesity
# co-occurance of treatments as dict
print(dr.interventions('1NF80')) # Partial gastrectomy for repair of gastric diverticulum
# Get the one-hot-encoded vector for machine learning
dr.vector(dr.has_diagnosis('E66'), significant_chars=3, include_treatments=True)
We use poetry for development. PR are welcome. Please see CONTRIBUTING.md in the repo. Start by renaming .env.example to .env and add path for tests to run. Add jupiter notebooks to the notebook folder. Include the disclaimer below.
Disclaimer: Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2016-17). However the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.
Let us know if you use DADpy for creating interesting jupyter notebooks.