Categories
Health Research Methodology Healthcare Analytics OpenSource

DADpy: The swiss army knife for discharge abstract database

Discharge Abstract Database (DAD) is a Canada-wide database of hospital admission and discharge data excluding the province of Quebec, maintained by the Canadian Institute for Health Information (CIHI). The data points in DAD include patient demographics, comorbidities coded in the International Statistical Classification of Diseases and Related Health Problems (ICD), interventions encoded in the Canadian Classification of Health Interventions (CCI) and the length of stay. DAD is the de-identified 10% sample available under the Data Liberation Initiative (DLI) for academic researchers. DAD is arguably the most comprehensive country-wide discharge dataset in the world.

The Swiss army knife for Discharge Abstract Database

Discharge Abstract Database is used for creating public reports for hospitals, researchers, and the general public. DAD data has also been used for disease-specific research and analysis, including public health, disease surveillance, and health services research. CIHI provides DAD in the SPSS (.sav) format with each record having horizontal fields for 20 comorbidities and 25 interventions. The format is not ideal for slicing and dicing the data for visualization for clinicians to obtain clinical insights.

DADpy provides a set of functions for using the DAD dataset for machine learning and visualization. The package does not include the dataset. Academic researchers can request the DAD dataset from CIHI. This is an unofficial repo, and I’m not affiliated with CIHI. Please retain the disclaimer below in forks.

Installation: (Will add to pypi soon)

We use poetry for development. PR are welcome. Please see CONTRIBUTING.md in the repo. Start by renaming .env.example to .env and add path for tests to run. Add jupiter notebooks to the notebook folder. Include the disclaimer below.

Disclaimer: Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2016-17). However the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

Let us know if you use DADpy for creating interesting jupyter notebooks. 

Categories
Healthcare Analytics OpenSource Resources

OSCAR EMR EForm Export (CSV) to FHIR

This is a simple application to convert a CSV file to a FHIR bundle and post it to a FHIR server in Golang. The OSCAR EMR has an EForm export tool that exports EForms to a CSV file that can be downloaded. This tool can load that CSV file to a FHIR server for consolidated analysis. This tool can be used with any CSV, if columns specified below (CSV format section) are present.

Use Cases

This is useful for family practice groups with multiple OSCAR EMR instances. Analysts at each site can use this to send data to a central FHIR server for centralized data analysis and reporting. Public health agencies using OSCAR or similar health information systems can use this to consolidate data collection.

How to build

First go get all dependencies This package includes three tools (Go build them separately from the cmd folder):

Fhirpost: The application for posting the csv fie to the FHIR server

Serverfhir: A simple FHIR server for testing (requires mongodb). We recommend using PHIS-DW for production.

Report: A simple application for descriptive statistics on the csv file

Format of the CSV file


Using vocabulary such as SNOMED for field names in the E-Form is very useful for consolidated analysis.

Each record should have:

demographicNo → The patient ID
dateCreated
efmfid → The ID of the eform
fdid → The ID of the each form field.
(The Eform export csv of OSCAR typically has all these fields and requires no further processing)

Mapping

  • Bundle with unique patients. All columns mapped to observations.
  • Submitter mapped to Practitioner.
  • Document type bundle with composition as the first entry
  • Unique fullUrls are generated.
  • PatientID is location + demographicNo
  • Budle of 1 composition, 1 practitioner, 1 or more patients, and many observations
  • Validates with R4 schema

How to use:

  • Change the settings in .env
  • You can compile this for Windows, Mac or Linux. Check the fhirmap.go file and make any desired changes. You should be able to figure out the mapping rules from this file.
  • It reads data.csv file from the same folder by default. (can be specified by the -file commandline argument: fhirpost -file=data.csv)
  • Start mongodb and run server and fhirpost in separate windows for testing.
  • On windows, you can just double-click executables to run. (Closes automatically after run)

Privacy and security:
This application does not encrypt the data. Use it only in a secure network.

Disclaimer:
This is an experimental application. Use it at your own risk.

Categories
Information Systems OpenSource

OSCAR EMR and FHIR

OSCAR (Open Source Clinical Application and Resource) EMR is a web-based electronic medical record (EMR) system initially developed for primary care clinics in Canada. Oscar is a Java spring based web application with a relatively old codebase. OSCAR is widely used in the provinces of Ontario and British Columbia and is supported by many Oscar service providers.

Fast Healthcare Interoperability Resources (FHIR) is an HL7 standard describing data schema and a RESTful API for health information exchange. FHIR is fast emerging as the de-facto standard for interoperability between health information systems because of its simplicity and the use of existing web standards such as REST.

OSCAR being primarily designed for primary care clinics does not support interoperability with other systems out of the box. FHIR in its entirety is not supported by OSCAR. A partial implementation of FHIR to support the immunization dataflow as FHIR bundles is available. One of the requests that constantly pops up in the OSCAR community is the need for a full FHIR API implementation for OSCAR.

We had some initial discussions on how to go about implementing a FHIR API for OSCAR EMR. FHIR is a REST API exposing FHIR Resources such as Patients, Observations and CarePlan as JSON resources. The HAPI-FHIR java library defines all the FHIR resources and the associated functions. The first step in building the API is to map the relatively messy OSCAR data model to FHIR resources. The Patient resource has been mapped and is available in the OSCAR repository. This (/src/main/java/org/oscarehr/integration/fhir/model/Patient.java) can be used as the template to map other required resources.

The next step is to extend the REST API that is currently available to expose FHIR APIs after authentication. If you have some ideas/expertise/interest in this, please comment below.

Categories
OpenSource Resources

UMLS APIs for clinical vocabularies

Originally published by Bell Eapen at nuchange.ca on August 20, 2019. If you have some feedback, reach out to the author on Twitter,  LinkedIn or  Github.

UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.

Natural Language Processing (NLP) on the vast amount of data captured by electronic medical records (EMR) is gaining popularity. The recent advances in machine learning (ML) algorithms and the democratization of high-performance computing (HPC) have reduced the technical challenges in NLP. However, the real challenge is not the technology or the infrastructure, but the lack of interoperability — in this case, the inconsistent use of terminology systems.


natural language processing
UMLS for NLP

NLP tasks start with recognizing medical terms in the corpus of text and converting it into a standard terminology space such as SNOMED and ICD. This requires a terminology mapping service that can do this mapping in an easy and consistent manner. The Unified Medical Language System (UMLS) terminology server is the most popular for integrating and distributing key terminology, classification and coding standards. The consistent use of  UMLS resources leads to effective and interoperable biomedical information systems and services, including EMRs.


To make things easier, UMLS provides both REST-based and SOAP-based services that can be integrated into software applications. A high-level library that encapsulated these services, making the REST calls easy to the user is required for the efficient use of these resources.  Umlsjs is one such high-level library for the UMLS REST web services for javascript. It is free, open-source and available on NPM, making it easy to integrate into any javascript (for browsers) or any nodejs applications.


The umlsjs package is available on GitHub and the NPM. It is still work in progress and any coding/documentation contributions are welcome. Please read the CONTRIBUTING.md file on the repository for instructions. If you use it and find any issues, please report it on GitHub.


Categories
Healthcare Analytics Research

Elasticsearch for analyzing CORD-19 dataset

COVID-19 Open Research Dataset (CORD-19) is a dataset of approximately 47,000 scholarly articles, about COVID-19, SARS-CoV-2, and related coronaviruses made free to the research community by a coalition of research groups. The articles are provided as JSON files for the global research community to apply natural language processing.

Siouxsie Wiles and Toby Morris / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

Elasticsearch (ES) is a Lucene based text search engine using schema-free JSON documents. Elasticsearch is fast and has clients libraries available for most programming languages including python. Loading the COVID-19 data on to an ES instance will be helpful for easy search and analysis, all within the comfort of the jupyter notebook. The availability of the Apache spark (spark) connector makes the exchange of data between ES and spark easy. I have listed below, the simple steps to load the files to an ES instance.

First, download and install ES and the ES-spark connector from here, and start ES. Next, Download and install Apache spark from here: https://spark.apache.org/ CORD-19 dataset is available here.

STEP 1: Create a spark session: (add the path to the connector jar)

STEP 2: Load the JSON files:

+——–+——————–+——————–+——————–+——–+——————–+——————–+
|abstract| back_matter| bib_entries| body_text|metadata| paper_id| ref_entries|
+——–+——————–+——————–+——————–+——–+——————–+——————–+
| []|[[[[456,, 453, 8 …|[[[[R, Zhang, [],…|[[[], [], , i c ,…| [[], ]|28b10724357672324…|[[, Fumagalli M, …|
+——–+——————–+——————–+——————–+——–+——————–+——————–+
only showing top 1 row

STEP 3: Select the required fields:

+——————–+——————–+——————–+——————–+——————–+
| paper_id| metadata.title| metadata.authors| abstract.text| body_text.text|
+——————–+——————–+——————–+——————–+——————–+
|28b10724357672324…| | []| []|[i c , a n t i f …|
|1aa3e788fc6b03c14…|Dark Proteome of …|[[[Himachal Prade…|[Recently emerged…|[World health org…|
|558d318e1655da9f5…|Connectivity anal…|[[[University of …|[We utilized a ce…|[Schizophrenia is…|
+——————–+——————–+——————–+——————–+——————–+
only showing top 3 rows

STEP 4: Create an index and write the spark df into ES:

STEP 5: Do the search!

That’s it! You can now use it for search and do analytics on the returned records. Next, I will show you how to use QRMine on CORD-19!

Categories
Research Resources

McMaster develops tool for COVID-19 battle

This article was first published on Brighter World. Read the original article.

McMaster University researchers have developed a tool to share with the international health sciences community which can help determine how the coronavirus that causes COVID-19 is spreading and whether it is evolving.

Simply put, the tool is a set of molecular ‘fishing hooks’ to isolate the virus, SARS-CoV-2, from biological samples. This allows laboratory researchers to gain insight into the properties of the isolated virus COVID-19 by then using a technology called next-generation sequencing.

The details were published on Preprints.org.

“You wouldn’t use this technology to diagnose the patient, but you could use it to track how the virus evolves over time, how it transmits between people, how well it survives outside the body, and to find answers to other questions,” said principal investigator Andrew McArthur, associate professor of biochemistry and biomedical sciences, and a member of the Michael G. DeGroote Institute for Infectious Disease Research (IIDR) at McMaster.

“Our tool, partnered with next-generation sequencing, can help scientists understand, for example, if the virus has evolved between patient A and patient B.”

McArthur points out that the standard technique to isolate the virus involves culturing it in cells in contained labs by trained specialists. The McMaster tool gives a faster, safer, easier and less-expensive alternative, he said.

“Not every municipality or country will have specialized labs and researchers, not to mention that culturing a virus is dangerous,” he said.

“This tool removes some of these barriers and allows for more widespread testing and analyses.”

First author Jalees Nasir, a PhD candidate in biochemistry and biomedical sciences at McMaster, has been working with McMaster and Sunnybrook Health Sciences Centre researchers to develop a bait capture tool that can specifically isolate respiratory viruses. When news recently broke of COVID-19, Nasir knew he could develop a “sequence recipe” to help researchers to isolate the novel virus more easily.

“When you have samples from a patient, for example, it can consist of a combination of virus, bacteria and human material, but you’re really only interested in the virus,” Nasir said. “It’s almost like a fishing expedition. We are designing baits that we can throw into the sample as hooks and pull out the virus from that mixture.”

The decision was made to release the sequences publicly without the normal practice of peer-review or clinical evaluation to ensure this tool was available to all quickly, recognizing the urgency of the situation, said McArthur.

The research team plans to collaborate with Sunnybrook for further testing but also hopes other scientists can quickly perform their own validation.

McArthur added that a postdoctoral fellow in his lab, David Speicher, is currently communicating details of the technology to the international clinical epidemiology community.

“Since we’re dealing with an outbreak, there was no value in us doing a traditional academic study and the experiments,” said McArthur. “We designed this tool and are releasing it for use by others.

“In part, we’re relying on our track record of knowing what we are doing, but we’re also relying on people who have the virus samples in hand being able to do the validation experiment so that it’s reliable.”

The research was funded by the Comprehensive Antibiotic Resistance Database at McMaster.

This article was first published on Brighter World. Read the original article.

Categories
HIS

Public Health Data Warehouse on FHIR

The Ontario government is building a connected health care system centred around patients, families and caregivers through the newly established Ontario Health Teams (OHT). As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off the shelf solutions are pricey, while open-source solutions such as DHIS2 is not popular in Canada. Some of the public health units have existing systems, and it will be too resource-intensive to switch to another system. The interoperability challenge needs an innovative solution, beyond finding the single, provincial EMR.

artificial intelligence

We have written about the theoretical aspects, especially the need to envision public health information systems separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams. 

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital for the community for efficient planning, surveillance and effective interventions. Public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage with the integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance such as machine learning methods, hidden Markov models, and Bayesian to multivariate analytics. PHIS-DW is work in progress and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications. 

Public Health Data Warehouse Framework on FHIR

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 is under consideration. If any of the OHTs find our approach interesting, give us a shout. 

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention? 

Categories
Research

Researchers discover new toxin that impedes bacterial growth

This article was first published on Brighter World. Read the original article.

An international research collaboration has discovered a new bacteria-killing toxin that shows promise of impacting superbug infectious diseases.

The discovery of this growth-inhibiting toxin, which bacteria inject into rival bacteria to gain a competitive advantage, was published today in the journal Nature.

The discovery is the result of teamwork by co-senior authors John Whitney, assistant professor of the Department of Biochemistry and Biomedical Sciences at McMaster University, and Mike Laub, professor of biology at the Massachusetts Institute of Technology (MIT).

Whitney and his PhD student Shehryar Ahmad at McMaster’s Michael G. DeGroote Institute for Infectious Disease Research were studying how bacteria secrete antibacterial molecules when they came across a new toxin. This toxin was an antibacterial enzyme, one the researchers had never seen before.

After determining the molecular structure of this toxin, Whitney and Ahmad realized that it resembles enzymes that synthesize a well-known bacterial signalling molecule called (p)ppGpp. This molecule normally helps bacteria survive under stressful conditions, such as exposure to antibiotics.

“The 3D structure of this toxin was at first puzzling because no known toxins look like enzymes that make (p)ppGpp, and (p)ppGpp itself is not a toxin,” said Ahmad.

Suspecting the toxin might kill bacteria by overproducing harmful quantities of (p)ppGpp, the McMaster team shared their findings with Laub, an investigator of the U.S. Howard Hughes Medical Institute.

Boyuan Wang, a postdoctoral researcher in the Laub lab who specializes in (p)ppGpp signaling, examined the activity of the newly discovered enzyme. He soon realized that rather than making (p)ppGpp, this enzyme instead produced a poorly understood but related molecule called (p)ppApp. Somehow, the production of (p)ppApp was harmful to bacteria.

The researchers determined that the rapid production of (p)ppApp by this enzyme toxin depletes cells of a molecule called ATP. ATP is often referred to as the ‘energy currency of the cell’ so when the supply of ATP is exhausted, essential cellular processes are compromised and the bacteria die.

“I find it absolutely fascinating that evolution has essentially “repurposed” an enzyme that normally helps bacteria survive antibiotic treatment and, instead, has deployed it for use as an antibacterial weapon,” said Whitney.

The research conducted at McMaster University was funded by the Canadian Institutes for Health Research and is affiliated with the CIHR Institute for Infection and Immunity (CIHR-III) hosted at McMaster University with additional funding from the David Braley Centre for Antibiotic Discovery. The research at MIT was supported by the Howard Hughes Medical Institute and the U.S. National Institutes of Health.

“This is an important discovery with potential implications for developing alternatives to antibiotics, a global priority in the fight against antimicrobial resistance. It is heartening to see that young Canadian researchers like Dr. Whitney are thriving and emerging as leaders in this area,” said Charu Kaushic, scientific director of the CIHR-III and a professor of pathology and molecular medicine at McMaster.

Categories
Machine Learning

Machine Learning in population health: Creating conditions that ensure good health.

Machine Learning (ML) in healthcare has an affinity for patient-centred care and individual-level predictions. Population health deals with health outcomes in a group of individuals and the outcome distribution in the group. Both individual health and population health are not divergent, but at the same time, both are not the same and may require different approaches. ML in public health applications receives far less attention.

The skills available to public health organizations to transition towards an integrated data analytics is limited. Hence the latest advances in ML and artificial intelligence (AI) have made very little impact on public health analytics and decision making. The biggest barrier is the lack of expertise in conceiving and implementing data warehouse systems for public health that can integrate health information systems currently in use. 

The data in public health organizations are generally scattered in disparate information systems within the region or even within the same organization. Efficient and effective health data warehousing requires a common data model for integrated data analytics. The OHDSI – OMOP Common Data Model allows for the systematic analysis of disparate observational databases and EMRs. However, the emphasis is on patient-level prediction. Research on how patient-centred data models to observation-centred population health data models are the need of the hour.

We are making a difficult yet important transition towards integrated health by providing new ways of delivering services in local communities by local health teams. The emphasis is clearly on digital health. We need efficient and effective digital tools and techniques. Motivated by the Ontario Health Teams’ digital strategy, I have been working on tools to support this transition.

Hephestus is a software tool for ETL (Extract-Transform-Load) for open-source EMR systems such as OSCAR EMR and national datasets such as Discharge Abstract Database (DAD). It is organized into modules to allow code reuse. Hephestus uses SqlAlchemy for database connection and auto-mapping tables to classes and bonobo for managing ETL. Hephaestus aims to support common machine learning workflows such as model building with Apache spark and model deployment using serverless architecture. I am also working on FHIR based standards for ML model deployments.

Hephaestus is a work in progress and any help will be highly appreciated. Hephaestus is an open-source project on GitHub. If you are looking for an open-source project to contribute to Hacktoberfest, consider Hephaestus! 

Categories
Uncategorized

OSCAR in a BOX – Virtualized and fault-tolerant OSCAR EMR

Originally published by Bell Eapen at nuchange.ca on August 20, 2019. If you have some feedback, reach out to the author on Twitter,  LinkedIn or  Github.

TL;DROSCAR in a BOX is a fault-tolerant OSCAR instance that you can use out of the box and is virtually maintenance-free!

Image credit DarkoStojanovic @ Pixabay

OSCAR EMR is an open-source Electronic Medical Record (EMR) for the Canadian family physicians. The official OSCAR repository is available here: https://bitbucket.org/oscaremr/

OSCAR is a spring java application deployed in a tomcat container with MySQL database backend. OSCAR project being relatively old, with few users outside Canada, has struggled to keep pace with the developments in the electronic health records domain. However, OSCAR is still useful and popular among family physicians and some public health organizations as it is free and well supported.

Oscar is known for its support for the billing workflow, data collection forms (eForms) and comprehensive patient charts (eCharts). Some of the limitations of OSCAR include lack of scalability beyond a handful of users and limited support for data analytics. Oscar by design is hard to be virtualized as a docker container. Availability of a docker container is crucial for sustainable and fault-tolerant deployment on the cloud and distributed systems such as Kubernetes.

Docker is the world’s leading software container platform, used mostly for DevOps. Docker is also useful for developers to set up a development environment in a few easy steps. I was one of the first few who worked on virtualizing OSCAR. Thanks for all those who forked (and hopefully used) this repository.

I have continued my work on OSCAR docker container and has been successful in creating a (reasonably stable) container. It is now available on docker hub. I am now working on a fault-tolerant deployment of OSCAR in customized hardware. I (and some of my friends who know about and encouraged this project) call it OSCAR in a BOX! It has multiple instances of OSCAR with each instance capable of self-healing when a JAVA process hangs (fairly common for OSCAR). The database is replicated, and both the database and documents incrementally back up to an additional disk.

OSCAR in a BOX is ideal for family physicians who wish to adopt OSCAR but does not have the technical support for maintaining the system. OSCAR in a BOX is plug and play and is virtually maintenance-free. The virtualization workflow will also be useful for existing bigger user groups reeling under the sluggish pace of OSCAR. Please let me know if anybody is interested in collaborating.

BTW, did you check out Drishti?

Originally published by Bell Eapen at nuchange.ca on August 20, 2019. If you have some feedback, reach out to the author on Twitter,  LinkedIn or  Github.