Categories
Healthcare Analytics Information Systems OpenSource

OHDSI OMOP to FHIR mapper

TL;DR Below is an open-source command-line tool for converting an OHDSI OMOP cohort (defined in ATLAS) to a FHIR bundle and vice versa.

Originally published by Bell Eapen at nuchange.ca on July 22, 2020. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

The OHDSI OMOP CDM is one of the most popular clinical data models for health data warehouses. Its simple but clinically motivated structure is intuitively appealing to clinicians, which has helped its adoption. In this respect it has overtaken HL7 V3, which is more robust but has a steeper learning curve, especially for clinicians. The OHDSI OMOP CDM is widely used in the pharmaceutical industry for drug monitoring.

FHIR is emerging as the de facto standard for health system interoperability, owing largely to its simplicity and its use of existing and popular standards such as REST. As NoSQL databases become more and more popular in healthcare, FHIR can also be a good persistence schema. It aligns well with search technologies such as Elasticsearch.

As both standards are popular, conversion from one to the other may be commonly required. Researchers at Georgia Tech have an open-source tool, GT-FHIR2, for mapping an existing OHDSI OMOP CDM database as a FHIR endpoint. However, conversion between existing systems may not be easy with a full-stack solution.

I have a simpler solution that I believe will be useful in the following scenarios:

  • To export a cohort to a FHIR-based analytics tool.
  • To load new resources to OMOP CDM databases for incremental ETL.

Omopfhirmap is a command-line tool for mapping an OHDSI cohort, defined in ATLAS, to a FHIR bundle that can optionally be submitted to a FHIR server for processing. Conversely, it can process a FHIR bundle and add its resources to an existing CDM database, ignoring duplicates. Unlike GT-FHIR2, the OMOP on FHIR project at Georgia Tech, omopfhirmap does not expose the OMOP database as a FHIR endpoint.

I have used Spring Boot and JPA for easy wiring of services and database abstraction, and HAPI FHIR as the obvious choice for any Java-based FHIR application. It is still a work in progress and any help will be appreciated (refer to CONTRIBUTING.md).
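
The FHIR side of such a round trip uses the standard REST API: a transaction bundle is POSTed to the server's base URL. The Python sketch below is an illustration only, not part of omopfhirmap; the server URL and file name are placeholders.

# Illustration only: submit an exported transaction bundle to a FHIR server.
# The base URL and file name are placeholders.
import json
import requests

FHIR_BASE = "http://localhost:8080/fhir"      # e.g. a local HAPI FHIR server

with open("cohort_bundle.json") as f:         # a bundle exported by the mapper
    bundle = json.load(f)

response = requests.post(
    FHIR_BASE,                                # transaction bundles are POSTed to the base URL
    json=bundle,
    headers={"Content-Type": "application/fhir+json"},
)
response.raise_for_status()
print(response.json().get("resourceType"))    # expect a transaction-response Bundle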

Categories
Health Research Methodology Healthcare Analytics

OHDSI OMOP CDM ETL Tools in Python, .Net and Go

TL;DR Here are a few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Originally published by Bell Eapen at nuchange.ca on June 11, 2020. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. The lack of uniform data models limits the analytical capabilities of public health organizations, and many of them have to re-invent the wheel even for basic analysis. While many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with basic data wrangling in Excel.

The OHDSI OMOP CDM (Common Data Model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though the OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is relatively simple and more intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with the OHDSI OMOP CDM are archaic, the data model is clinically motivated, and there is an ecosystem of analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now in version 6.0, has simple but powerful vocabulary management. The OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHRs and hospital information systems. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. During the additional “home time” of the COVID pandemic, I created three software libraries for ETL tool developers. These libraries, in Python, .NET and Golang, encapsulate the v6.0 CDM and help in writing and reading data from a variety of databases with the v6.0 tables. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm
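
As a rough illustration of what these libraries do, the sketch below uses pyomop to create the CDM tables in a local database and to point at the vocabulary loader. The class names (CdmEngineFactory, CdmVocabulary, metadata) follow the pyomop README and may differ in newer releases; the connection details and paths are placeholders.

# Rough sketch only: class names follow the pyomop README and may change between releases.
from pyomop import CdmEngineFactory, CdmVocabulary, metadata

cdm = CdmEngineFactory()          # defaults to a local SQLite database
# For PostgreSQL (placeholder credentials):
# cdm = CdmEngineFactory(db='pgsql', host='localhost', port=5432,
#                        user='user', pw='pass', name='cdm', schema='cdm6')

engine = cdm.engine
metadata.create_all(engine)       # create the v6.0 CDM tables if they do not exist

vocab = CdmVocabulary(cdm)
# vocab.create_vocab('/path/to/vocabulary/csv/files')  # load downloaded vocabulary files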


These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

Categories
Health Research Methodology Healthcare Analytics OpenSource

DADpy: The Swiss army knife for the Discharge Abstract Database

The Discharge Abstract Database (DAD) is a Canada-wide database of hospital admission and discharge data, excluding the province of Quebec, maintained by the Canadian Institute for Health Information (CIHI). The data points in DAD include patient demographics, comorbidities coded in the International Statistical Classification of Diseases and Related Health Problems (ICD), interventions coded in the Canadian Classification of Health Interventions (CCI), and the length of stay. A de-identified 10% sample of DAD is available to academic researchers under the Data Liberation Initiative (DLI). DAD is arguably the most comprehensive country-wide discharge dataset in the world.


The Discharge Abstract Database is used for creating public reports for hospitals, researchers, and the general public. DAD data has also been used for disease-specific research and analysis, including public health, disease surveillance, and health services research. CIHI provides DAD in the SPSS (.sav) format, with each record having horizontal fields for 20 comorbidities and 25 interventions. This wide format is not ideal for slicing and dicing the data, or for visualizations that help clinicians obtain clinical insights.
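
DADpy wraps this kind of reshaping for you; purely as an illustration of the problem, flattening the 20 diagnosis columns by hand with pandas would look roughly like this (the column and identifier names are placeholders, not actual DAD field names):

# Illustration only: reshaping the wide DAD export by hand with pandas.
# The column names below are placeholders; the real DAD field names differ.
import pandas as pd

dad = pd.read_spss("clin_sample_spss.sav")            # needs the pyreadstat package

diagnosis_cols = [f"DIAG_{i}" for i in range(1, 21)]  # the 20 horizontal comorbidity fields
long_dx = dad.melt(
    id_vars=["RECORD_ID"],                            # placeholder record identifier
    value_vars=diagnosis_cols,
    var_name="position",
    value_name="icd10_code",
).dropna(subset=["icd10_code"])

print(long_dx.head())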

DADpy provides a set of functions for using the DAD dataset for machine learning and visualization. The package does not include the dataset. Academic researchers can request the DAD dataset from CIHI. This is an unofficial repo, and I’m not affiliated with CIHI. Please retain the disclaimer below in forks.

Installation: (Will add to pypi soon)

pip install https://github.com/E-Health/dadpy/releases/download/1.0.0/dadpy-1.0.0-py3-none-any.whl
from dadpy import DadLoad
from dadpy import DadRead

# with the trailing slash
dl = DadLoad('/path/to/dad/sample/spss/sav/file/') # clin_sample_spss.sav
dr = DadRead(dl.sample)

# records with obesity as pandas df
print(dr.has_diagnosis('E66'))
# Partial gastrectomy for repair of gastric diverticulum
print(dr.has_treatment('1NF80')) 

# comorbidities as dict for visualization
print(dr.comorbidity('E66')) # Obesity
# co-occurrence of treatments as dict
print(dr.interventions('1NF80')) # Partial gastrectomy for repair of gastric diverticulum

# Get the one-hot-encoded vector for machine learning
dr.vector(dr.has_diagnosis('E66'), significant_chars=3, include_treatments=True)

We use poetry for development. PRs are welcome. Please see CONTRIBUTING.md in the repo. Start by renaming .env.example to .env and add the path for the tests to run. Add Jupyter notebooks to the notebook folder. Include the disclaimer below.

Disclaimer: Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2016-17). However the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

Let us know if you use DADpy for creating interesting Jupyter notebooks.

Categories
Healthcare Analytics OpenSource Resources

OSCAR EMR EForm Export (CSV) to FHIR

This is a simple Golang application that converts a CSV file to a FHIR bundle and posts it to a FHIR server. The OSCAR EMR has an EForm export tool that exports EForms to a downloadable CSV file. This tool can load that CSV file to a FHIR server for consolidated analysis. It can be used with any CSV file, provided the columns specified below (see the CSV format section) are present.

Use Cases

This is useful for family practice groups with multiple OSCAR EMR instances. Analysts at each site can use this to send data to a central FHIR server for centralized data analysis and reporting. Public health agencies using OSCAR or similar health information systems can use this to consolidate data collection.

How to build

First, go get all dependencies. This package includes three tools (build them separately with go build from the cmd folder):

Fhirpost: The application for posting the CSV file to the FHIR server

Serverfhir: A simple FHIR server for testing (requires MongoDB). We recommend using PHIS-DW for production.

Report: A simple application for descriptive statistics on the CSV file

Format of the CSV file


Using a vocabulary such as SNOMED for field names in the EForm is very useful for consolidated analysis.

Each record should have:

demographicNo → The patient ID
dateCreated
efmfid → The ID of the eform
fdid → The ID of each form field.
(The EForm export CSV from OSCAR typically has all these fields and requires no further processing.)

Mapping

  • Bundle with unique patients. All columns mapped to observations.
  • Submitter mapped to Practitioner.
  • Document type bundle with composition as the first entry
  • Unique fullUrls are generated.
  • PatientID is location + demographicNo
  • Bundle of 1 composition, 1 practitioner, 1 or more patients, and many observations (sketched below)
  • Validates against the FHIR R4 schema
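
For illustration only, the generated document bundle has roughly the skeleton below; the resource contents are placeholders and not the exact output of fhirpost.

# Illustrative skeleton of the generated FHIR document bundle (placeholder values only).
import uuid

def full_url() -> str:
    # each entry gets a unique fullUrl, as the mapping rules require
    return "urn:uuid:" + str(uuid.uuid4())

patient_url = full_url()

bundle = {
    "resourceType": "Bundle",
    "type": "document",
    "entry": [
        # the Composition is the first entry of a document bundle
        {"fullUrl": full_url(),
         "resource": {"resourceType": "Composition", "status": "final", "title": "EForm export"}},
        # the submitter is mapped to a Practitioner
        {"fullUrl": full_url(), "resource": {"resourceType": "Practitioner"}},
        # patient id = location + demographicNo
        {"fullUrl": patient_url,
         "resource": {"resourceType": "Patient", "id": "clinicA-12345"}},
        # one Observation per form field (fdid), linked to its patient
        {"fullUrl": full_url(),
         "resource": {"resourceType": "Observation", "status": "final",
                      "code": {"text": "SNOMED-or-field-name"},
                      "valueString": "field value",
                      "subject": {"reference": patient_url}}},
    ],
}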

How to use:

  • Change the settings in .env
  • You can compile this for Windows, Mac or Linux. Check the fhirmap.go file and make any desired changes. You should be able to figure out the mapping rules from this file.
  • It reads the data.csv file from the same folder by default (the file can be specified with the -file command-line argument: fhirpost -file=data.csv).
  • Start MongoDB and run serverfhir and fhirpost in separate windows for testing.
  • On Windows, you can just double-click the executables to run them (the window closes automatically after the run).

Privacy and security:
This application does not encrypt the data. Use it only in a secure network.

Disclaimer:
This is an experimental application. Use it at your own risk.

Categories
Healthcare Analytics Research

Elasticsearch for analyzing CORD-19 dataset

The COVID-19 Open Research Dataset (CORD-19) is a dataset of approximately 47,000 scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses, made freely available to the research community by a coalition of research groups. The articles are provided as JSON files so that the global research community can apply natural language processing.

Elasticsearch (ES) is a Lucene-based text search engine that uses schema-free JSON documents. Elasticsearch is fast and has client libraries available for most programming languages, including Python. Loading the CORD-19 data onto an ES instance makes search and analysis easy, all within the comfort of a Jupyter notebook. The availability of the Apache Spark connector makes the exchange of data between ES and Spark easy. I have listed below the simple steps to load the files into an ES instance.

First, download and install ES and the ES-Spark connector (elasticsearch-hadoop), and start ES. Next, download and install Apache Spark from https://spark.apache.org/. The CORD-19 dataset is available here.

STEP 1: Create a Spark session (add the path to the connector jar):

from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("ElasticSpark-1") \
    .config("spark.driver.extraClassPath", "/path/elasticsearch-hadoop-7.6.2/dist/elasticsearch-spark-20_2.11-7.6.2.jar") \
    .config("spark.es.port","9200") \
    .config("spark.driver.memory", "8G") \
    .config("spark.executor.memory", "12G") \
    .getOrCreate()

STEP 2: Load the JSON files:

path = "/path/data/biorxiv_medrxiv/biorxiv_medrxiv/"
df = spark.read.json(path, multiLine=True)
df.show(1)

+--------+--------------------+--------------------+--------------------+--------+--------------------+--------------------+
|abstract|         back_matter|         bib_entries|           body_text|metadata|            paper_id|         ref_entries|
+--------+--------------------+--------------------+--------------------+--------+--------------------+--------------------+
|      []|[[[[456,, 453, 8 ...|[[[[R, Zhang, [],...|[[[], [], , i c ,...|  [[], ]|28b10724357672324...|[[, Fumagalli M, ...|
+--------+--------------------+--------------------+--------------------+--------+--------------------+--------------------+
only showing top 1 row

STEP 3: Select the required fields:

df3 = df.select(df.paper_id, df.metadata.title, df.metadata.authors, df.abstract.text, df.body_text.text)
df3.show(3)

+--------------------+--------------------+--------------------+--------------------+--------------------+
|            paper_id|      metadata.title|    metadata.authors|       abstract.text|      body_text.text|
+--------------------+--------------------+--------------------+--------------------+--------------------+
|28b10724357672324...|                    |                  []|                  []|[i c , a n t i f ...|
|1aa3e788fc6b03c14...|Dark Proteome of ...|[[[Himachal Prade...|[Recently emerged...|[World health org...|
|558d318e1655da9f5...|Connectivity anal...|[[[University of ...|[We utilized a ce...|[Schizophrenia is...|
+--------------------+--------------------+--------------------+--------------------+--------------------+
only showing top 3 rows

STEP 4: Create an index and write the Spark df into ES:

from elasticsearch import Elasticsearch
es = Elasticsearch()
es.indices.create(index="covid")
df3.write.format("es").mode('overwrite').save("covid")

STEP 5: Do the search!

es.search(index="covid", q="metadata.title:(CD14 OR CD8)")
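
The response is a plain Python dict; as an example, the matching documents can be pulled into a pandas DataFrame for further analysis:

import pandas as pd

# es.search returns a dict; the matching documents are under hits.hits[]._source
results = es.search(index="covid", q="metadata.title:(CD14 OR CD8)", size=100)
records = [hit["_source"] for hit in results["hits"]["hits"]]
hits_df = pd.DataFrame(records)
print(hits_df.shape)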

That’s it! You can now use it for search and do analytics on the returned records. Next, I will show you how to use QRMine on CORD-19!