
DADpy: The Swiss Army knife for the Discharge Abstract Database

The Discharge Abstract Database (DAD) is a Canada-wide database of hospital admission and discharge data, excluding the province of Quebec, maintained by the Canadian Institute for Health Information (CIHI). Its data points include patient demographics, comorbidities coded in the International Statistical Classification of Diseases and Related Health Problems (ICD), interventions coded in the Canadian Classification of Health Interventions (CCI), and the length of stay. A de-identified 10% sample of DAD is available to academic researchers under the Data Liberation Initiative (DLI). DAD is arguably the most comprehensive country-wide discharge dataset in the world.


DAD is used to create public reports for hospitals, researchers, and the general public. It has also been used for disease-specific research and analysis, including public health, disease surveillance, and health services research. CIHI provides DAD in the SPSS (.sav) format, with each record laid out horizontally: 20 fields for comorbidities and 25 fields for interventions. This wide format is not ideal for slicing and dicing the data to build visualizations that give clinicians clinical insights.
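To illustrate why the horizontal layout is awkward, here is a minimal sketch of reshaping such records into one row per code, which is usually easier to count and plot. It uses only the Python standard library and is not DADpy's actual API. The field names (`record_id`, `dx_1`, `int_1`) and the toy records are assumptions made up for this example; the real DAD field names may differ.

```python
from collections import Counter

# Field counts taken from the description above; field names are hypothetical.
N_DIAGNOSES = 20      # comorbidity (ICD) fields per record
N_INTERVENTIONS = 25  # intervention (CCI) fields per record


def wide_to_long(records, prefix, n_fields):
    """Turn one row per discharge into one row per (discharge, code)."""
    long_rows = []
    for record in records:
        for position in range(1, n_fields + 1):
            code = record.get(f"{prefix}{position}")
            if code:  # skip empty or missing slots
                long_rows.append(
                    {
                        "record_id": record["record_id"],
                        "position": position,
                        "code": code,
                    }
                )
    return long_rows


# Toy records invented for illustration; this is not real DAD data.
records = [
    {"record_id": 1, "dx_1": "I10", "dx_2": "E11", "int_1": "PLACEHOLDER_A"},
    {"record_id": 2, "dx_1": "I10", "dx_2": ""},
]

diagnoses = wide_to_long(records, "dx_", N_DIAGNOSES)
interventions = wide_to_long(records, "int_", N_INTERVENTIONS)

# Frequency of each diagnosis code across all discharges.
code_counts = Counter(row["code"] for row in diagnoses)
```

Once the data is in this long form, questions such as "which codes appear most often" become a single counting step instead of a scan across 20 or 25 separate fields.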

DADpy provides a set of functions for using the DAD dataset for machine learning and visualization. The package does not include the dataset. Academic researchers can request the DAD dataset from CIHI. This is an unofficial repo, and I’m not affiliated with CIHI. Please retain the disclaimer below in forks.

Installation: (will be added to PyPI soon)

We use Poetry for development. PRs are welcome; please see CONTRIBUTING.md in the repo. Start by renaming .env.example to .env and adding the path that the tests need in order to run. Add Jupyter notebooks to the notebook folder, and include the disclaimer below.
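As a rough sketch of how a path stored in a .env file can be read from Python without extra dependencies, the snippet below parses simple KEY=VALUE lines. The key name `DAD_PATH` and the example value are assumptions for illustration only; check .env.example in the repo for the actual entries the tests expect.

```python
import tempfile
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a throwaway file; DAD_PATH is a hypothetical key name.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text('# local settings\nDAD_PATH="/data/dad"\n', encoding="utf-8")
    settings = read_env_file(env_file)
```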

Disclaimer: Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2016-17). However the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

Let us know if you use DADpy to create interesting Jupyter notebooks.


Grounded Theory – QRMine: Qualitative Research support tools in Python.

Grounded theory (GT) emerged as a research methodology from medical sociology following the seminal work of Barney Glaser and Anselm Strauss. The two later developed different views on their original contribution, and their respective supporters established a classical Glaserian GT and a pragmatic Straussian GT. Constant comparison is central to classical GT. It involves incident-to-incident comparison to identify categories, incident-to-category comparison to refine the categories, and category-to-category comparison to let the theory emerge.

Glaser's classical GT (1) provides guidelines for evaluating the GT methodology. The evaluation should be based on whether the theory fits the data, whether it is understandable to non-professionals, whether it is generalizable to other situations, and whether it offers control over the structure and processes.

Strauss and Corbin (2) recommended a strict coding structure, elaborating on how to code and structure data. Their seminal article describes three stages of coding: open coding, axial coding, and selective coding. Classical GT offers more flexibility than Straussian GT, while the latter may be easier to conduct, especially for new researchers.

Open coding is the first step, in which data is broken down analytically and conceptually similar chunks are grouped together under categories and subcategories. Once the differences between the categories are established, the properties and dimensions of each are dissected. Coding in GT can be overwhelming, and scaling up the categories from open coding can be difficult, which tends to produce low-level theories. With natural language processing (NLP), information systems can help young researchers make sense of the data they have collected during the open coding stage. QRMine is a software suite that supports qualitative researchers using NLP. Gtdict is a module that identifies categories, properties, and dimensions in the interview transcript.
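To give a feel for how NLP can support open coding, here is a deliberately simple sketch that surfaces frequent content words in a transcript as candidate categories for the researcher to review. It is not QRMine's or Gtdict's actual method or API; the function name, the tiny stopword list, and the sample transcript are all assumptions invented for this illustration.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "when", "for", "of", "to", "in"}


def candidate_categories(transcript, top_n=2):
    """Return the most frequent content words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


# Invented transcript fragment; not real interview data.
transcript = (
    "The workload was heavy. Workload affected morale. "
    "Morale dropped when workload increased for nurses."
)

top_terms = candidate_categories(transcript)
```

Word frequency is only a starting point. The researcher still decides whether a frequent term is a meaningful category, and the properties and dimensions of each category need closer reading of the surrounding text.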

QRMine is open source and is available here. Ideas, comments, and pull requests are welcome.


References:

1. Glaser BG. The Constant Comparative Method of Qualitative Analysis. Social Problems [Internet]. 1965 Apr;12(4):436–45. Available from: http://dx.doi.org/10.1525/sp.1965.12.4.03a00070
2. Corbin JM, Strauss A. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology [Internet]. 1990;13(1):3–21. Available from: http://dx.doi.org/10.1007/BF00988593