Health data warehousing is becoming an important requirement for deriving knowledge from the vast amount of health data that healthcare organizations collect. A data warehouse is vital for collaborative and predictive analytics. The first step in designing a data warehouse is to decide on a suitable data model. This is followed by the extract-transform-load (ETL) process, which converts source data to the new data model so that it is amenable to analytics.
The OHDSI – OMOP Common Data Model (CDM) is one such data model; it allows for the systematic analysis of disparate observational databases and EMRs. Data from diverse systems needs to be extracted, transformed and loaded into a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools that are already available.
Each data source requires customized ETL tools for this conversion from the source data to the CDM. The OHDSI ecosystem has made some tools available to help with the ETL process, such as White Rabbit and Rabbit in a Hat. However, the health data warehousing process is still challenging because source databases vary in structure and implementation.
Hephestus is an open-source Python tool for this ETL process, organized into modules to allow code reuse between various ETL tools for open-source EMR systems and data sources. Hephestus uses SQLAlchemy for database connections and for automapping tables to classes, and Bonobo for managing ETL. The ultimate aim is to develop a tool that can translate the report from the OHDSI tools into an ETL script with minimal intervention. This is a good Python starter project for eHealth geeks.
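To make the extract, transform and load steps concrete, here is a minimal sketch that uses only the Python standard library instead of SQLAlchemy or Bonobo. It is not Hephestus code. The source table `patients` and its columns are hypothetical; the target columns are a small subset of the OMOP CDM `person` table, and 8507 and 8532 are the standard OMOP concept IDs for male and female (0 means no matching concept).

```python
import sqlite3

# Standard OMOP gender concepts; anything else falls back to 0 (no matching concept).
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def extract(conn):
    """Yield rows from the (hypothetical) source EMR table."""
    yield from conn.execute("SELECT id, sex, dob FROM patients ORDER BY id")


def transform(rows):
    """Map each source row to a subset of the OMOP CDM person columns."""
    for patient_id, sex, dob in rows:
        year, month, day = (int(part) for part in dob.split("-"))
        yield (patient_id, GENDER_CONCEPTS.get(sex, 0), year, month, day)


def load(conn, persons):
    """Insert the transformed rows into the CDM person table."""
    conn.executemany(
        "INSERT INTO person (person_id, gender_concept_id, year_of_birth,"
        " month_of_birth, day_of_birth) VALUES (?, ?, ?, ?, ?)",
        persons,
    )
    conn.commit()


def run_etl():
    """Build two in-memory databases with made-up data and run the three steps."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE patients (id INTEGER, sex TEXT, dob TEXT)")
    source.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, "F", "1980-05-17"), (2, "M", "1975-11-02"), (3, "U", "1990-01-30")],
    )
    cdm = sqlite3.connect(":memory:")
    cdm.execute(
        "CREATE TABLE person (person_id INTEGER, gender_concept_id INTEGER,"
        " year_of_birth INTEGER, month_of_birth INTEGER, day_of_birth INTEGER)"
    )
    load(cdm, transform(extract(source)))
    return cdm.execute("SELECT * FROM person ORDER BY person_id").fetchall()
```

Chaining generators this way keeps each step small and reusable, which is the same idea behind organizing an ETL tool into modules. Only `extract` and the mapping in `transform` would need to change for a different source database.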
Anyone anywhere in the world can build their own environment that can store patient-level observational health data, convert their data to OHDSI’s open community data standards (including the OMOP Common Data Model), run open-source analytics using the OHDSI toolkit, and collaborate in OHDSI research studies that advance our shared mission toward reliable evidence generation. Join the journey!
Disclaimer: Hephestus is just my experiment and is not a part of the official OHDSI toolset.
Serverless is the new kid on the block, with services such as AWS Lambda, Google Cloud Functions and Microsoft Azure Functions. Essentially, it lets users deploy a function (Function as a Service, or FaaS) on the cloud with very little effort. Requirements such as security, privacy, scaling, and availability are taken care of by the framework itself. As healthcare slowly yet steadily progresses towards machine learning and AI, serverless is sure to make a significant impact on Health IT. Here I will explain serverless (and some related technologies) for semi-technical clinicians and put forward some architectural best practices for using serverless in healthcare with FHIR as the data interchange format.
Let us say, your analyst creates a neural network model based on a few million patient records that can predict the risk for MI from BP, blood sugar, and exercise. Let us call this model r = f(bp, bs, e). The model is so good that you want to use it on a regular basis on your patients and better still, you want to share it with your colleagues. So you contact your IT team to make this happen.
This is what your IT guys currently do: First, they create a web application that can take bp, bs and e as inputs using a standard interface such as REST and return r. Next, they rent a virtual machine (VM) from a cloud provider (such as DigitalOcean). Then they convert this application into a container (Docker) and deploy it in the VM. You can now use this as an application from your browser (Chrome), or your EMR (such as OpenMRS or OSCAR) can directly access this function. You can share it with your colleagues and they can access it in their browsers and you are happy. The VM can support up to 3 users at a time.
In a couple of months, your algorithm becomes so popular that at any one time hundreds of users try to access it and your poor VM crashes most of the time or your users have to wait forever. So you call your IT guys again for a solution. They make 100 copies of your container, but your hospital is reluctant to give you the additional funding required.
Your smart resident notices that your application is being used only in the morning hours and that at night all 100 containers are virtually sleeping. This is not a good use of the funding dollars. You contact your IT guys again, and they set up Kubernetes for orchestrating the containers according to usage. So, what is Serverless? Serverless is a framework that makes all of this so easy that you may not even need your IT guys to do it. (Well, maybe that is an exaggeration.)
My personal favourite serverless toolset (if you care) is Kubernetes + Knative + riff. I won’t try to explain what the last two are or how to use them; they are so new that they keep changing every day. In essence, your IT team can complete all the above tasks with a few commands typed on the command line, on the cloud provider of your choice. The application (or rather, the function) can even scale to zero! (You pay nothing when nobody uses it, more containers are added as users increase, and it scales down at night, as in your case.)
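The scaling arithmetic in this story fits in a few lines of Python. This is only an illustration of the idea, not how Kubernetes or Knative actually decide on the number of containers. The function name and the hourly usage numbers are made up; the capacity of 3 users per container comes from the example above.

```python
import math

USERS_PER_CONTAINER = 3  # capacity of one container, from the example above


def containers_needed(concurrent_users, capacity=USERS_PER_CONTAINER):
    """Return the fewest containers that can serve the current users.

    With no users the answer is zero, which is the "scale to zero" case.
    """
    if concurrent_users <= 0:
        return 0
    return math.ceil(concurrent_users / capacity)


# Hypothetical load over a day: idle nights, busy mornings, quieter afternoons.
usage_by_hour = {"03:00": 0, "09:00": 300, "15:00": 40}
for hour, users in usage_by_hour.items():
    print(hour, containers_needed(users))
```

With a fixed fleet you pay for the morning peak all day long; with scaling you pay for what each hour actually needs.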
What are the best practices when you design such useful cloud-based ‘functions’ for healthcare that can be shared by multiple users and organizations? Well, here are my two cents!
First, you need a standard for data exchange. As JSON is the data format for most APIs, FHIR wins hands down here.
Next, APIs need a mechanism to expose their capabilities and properties to the world. For example, r = f(bp, bs, e) needs to tell everyone what it accepts (bp, bs, e) and what it returns (at the bare minimum). FHIR has a resource specifically for this that has been (not so creatively) named Endpoint. So, a function endpoint should return a FHIR Endpoint resource with information about itself if there is no payload.
What should the payload be? The payload should be a FHIR Bundle that has all the FHIR resources that the function needs (bp, bs and e as FHIR Observations in your case). The Bundle should also include a FHIR Subscription resource that points to the receiving system (maybe your EMR) for the response (r).
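Here is a minimal sketch of what such a function could look like, using plain Python dictionaries for the FHIR resources. Everything specific in it is an assumption made for illustration: the URLs, the made-up weights in `risk` (a stand-in for the trained model), the convention of labelling each Observation as bp, bs or e in `code.text`, and the shape of the returned dictionary. The Subscription follows the FHIR R4 layout, where the receiving address sits in `channel.endpoint`.

```python
import json
import math


def describe():
    """Return a FHIR Endpoint resource describing this function."""
    return {
        "resourceType": "Endpoint",
        "status": "active",
        "name": "MI risk function r = f(bp, bs, e)",
        "connectionType": {
            "system": "http://terminology.hl7.org/CodeSystem/endpoint-connection-type",
            "code": "hl7-fhir-rest",
        },
        "payloadType": [{"text": "Bundle of bp, bs and e Observations"}],
        "address": "https://example.org/functions/mi-risk",  # hypothetical URL
    }


def risk(bp, bs, e):
    """Placeholder for the trained model; the weights are made up."""
    score = 0.02 * bp + 0.01 * bs - 0.05 * e - 4.0
    return 1.0 / (1.0 + math.exp(-score))


def handle(payload=None):
    """Answer with an Endpoint when there is no payload, else compute r."""
    if not payload:
        return describe()
    bundle = json.loads(payload)
    values, reply_to = {}, None
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        kind = resource["resourceType"]
        if kind == "Observation":
            # Assumption: each Observation is labelled bp, bs or e in code.text.
            values[resource["code"]["text"]] = resource["valueQuantity"]["value"]
        elif kind == "Subscription":
            reply_to = resource["channel"]["endpoint"]
    return {"send_to": reply_to, "r": risk(values["bp"], values["bs"], values["e"])}


def observation(label, value):
    """Build a bare-bones Observation for the example Bundle."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": label},
        "valueQuantity": {"value": value},
    }


example_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": observation("bp", 140)},
        {"resource": observation("bs", 110)},
        {"resource": observation("e", 30)},
        {
            "resource": {
                "resourceType": "Subscription",
                "status": "requested",
                "reason": "Return the risk score",
                "criteria": "Observation?code=r",
                "channel": {
                    "type": "rest-hook",
                    "endpoint": "https://emr.example.org/fhir",  # hypothetical EMR
                },
            }
        },
    ],
}
```

Calling `handle()` with no payload returns the Endpoint description, while `handle(json.dumps(example_bundle))` returns the risk together with the address it should be sent to. A real function would identify the Observations by standard codes rather than free text.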
So, what next?
Take the phone and call your IT team. Tell them to take Kubernetes + Knative + riff for a spin! I might do the same and if I do, I will share it here.
A forward-looking McMaster donor is investing $7 million in a new research centre dedicated specifically to tackling the growing global threat of antimicrobial resistance.
David Braley, whose gifts to the university include a $50-million investment in McMaster teaching, learning and health-care research and delivery, has allocated $7 million from that 2007 gift towards the new David Braley Centre for Antibiotic Discovery.
The centre will operate from the Michael G. DeGroote Institute for Infectious Disease Research, whose labs and offices are located on campus in the Michael G. DeGroote Centre for Learning and Discovery.
“This is a very timely investment,” says Paul O’Byrne, dean and vice-president, Faculty of Health Sciences. “This provides fresh resources to a team of researchers who are among the world’s leaders in their field. Creating this centre gives them the chance to do their best work at a time in history when it’s needed most.”
The funding comes from a portion of Braley’s 2007 gift that had been designated for emerging health-care research priorities.
The David Braley Centre for Antibiotic Discovery will be home to McMaster’s leading researchers in the field of antimicrobial resistance, or AMR. The new resources will allow the team to concentrate more specific effort on that problem.
“Antimicrobial resistance is a slow-moving catastrophe, but make no mistake: within the next 30 years, it will kill millions, strangle our health-care systems and significantly alter life as we know it unless we develop new ways to attack the problem,” says Gerry Wright, who heads both the David Braley Centre for Antibiotic Discovery and the Institute for Infectious Disease Research. “The opportunity to open this centre is a hopeful sign, and we are grateful for Mr. Braley’s vision and his vote of confidence. This problem must be solved, and it can be solved.”
The waning effectiveness of traditional antibiotics gives urgency to the search for new forms of antibiotics and other ways to boost the effectiveness of existing drugs.
Widespread use of antibiotics in agriculture and medicine has accelerated resistance to penicillin and its related medicines, as bacteria evolve to meet the threat.
Infection control and treatment without antibiotics could cast the world back to the early 1900s, when infectious diseases routinely killed people, Wright says.
Today, at least 700,000 people around the world – including 2,000 in Canada – die each year as a result of drug-resistant diseases. The global total is expected to rise to 10 million deaths per year by 2050 if no new solutions are found.
The medical costs associated with AMR are predicted to reach $100 trillion within that same time frame.
This year, the United Nations published a report projecting that without immediate global action, AMR could force up to 24 million people into extreme poverty by the year 2030.
The Government of Canada, through FedDev Ontario, is providing McMaster with $1.2 million to expand The Forge, a collaborative makerspace where entrepreneurs can access advanced equipment to design and build innovative new products.
The Honourable Filomena Tassi, Minister of Seniors and Member of Parliament for Hamilton West-Ancaster-Dundas, made the announcement today on behalf of the Honourable Navdeep Bains, Minister of Innovation, Science and Economic Development and minister responsible for FedDev Ontario.
“FedDev Ontario’s funding is providing invaluable support to the innovation community in Hamilton,” said Tassi. “The government of Canada is proud to support McMaster — one of Canada’s premier research-intensive universities — to expand The Forge’s makerspace and allow more companies to develop and bring new products to market.”
The funding will allow The Forge to expand its makerspace as it moves into a 10,000 square-foot facility shared with partner Innovation Factory. It will also purchase additional 3D printers and other fabricating equipment, and increase support to entrepreneurs through mentoring. As a result, the number of companies supported will almost double, from 24 to up to 40 annually, with up to 75 new jobs created.
“This strategic investment from the Government of Canada will strengthen the entrepreneurial capacity of our region by providing McMaster’s students and the wider Hamilton community access to the centralized expertise and infrastructure so essential for creating start-ups and business growth opportunities,” said Karen Mossman, Acting Vice-President of Research at McMaster and chair of the McMaster Innovation Park board of directors.
More than 105 tech companies have graduated from The Forge since its founding in 2014, with more than 300 employees hired and $20 million of private and public investment raised.
The Forge’s expansion further enhances McMaster’s entrepreneurial ecosystem and reputation as a leader in developing innovative manufacturing assets, in particular within the McMaster Innovation Park, which is also home to the McMaster Automotive Research Centre (MARC) and the Centre for Biomedical Engineering and Advanced Manufacturing (BEAM).
This is a series on serializing and deploying machine learning pipelines developed using PySpark. Part 1 is here. This is specifically for Apache Spark and is basically notes to myself.
We will be using MLeap for serializing the model. I have added below a brief introduction to MLeap copied from their website. For more information, please visit the MLeap website.
MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.
This series is about serializing and deploying. If you are interested in model building, Susan’s article here is an excellent resource.
In part one we imported the dependencies. The next step is to initialize Spark and import the data.
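As a rough sketch of that step (not necessarily the code used in this series), a local Spark session can be started and a CSV file read as below. The application name and the `local[*]` master are assumptions for a single-machine setup, and the functions import PySpark lazily so the file still loads where PySpark is not installed.

```python
def init_spark(app_name="dad-pipeline"):
    """Start (or reuse) a Spark session on the local machine."""
    # Imported here so this file loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master("local[*]").appName(app_name).getOrCreate()


def load_data(spark, path):
    """Read a CSV file with a header row into a Spark DataFrame."""
    return spark.read.csv(path, header=True, inferSchema=True)
```

Typical use would be `spark = init_spark()` followed by `df = load_data(spark, "dad.csv")`, where `dad.csv` is a hypothetical file name, and then `df.printSchema()` to check that the column types were inferred sensibly.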
We will create and serialize the pipeline next week. I promised to deploy using Java 11 and Spring Boot 2.1. Java 11 was released on Sept 25 and I feel it can have a huge impact on Java-based EMRs like OSCAR and OpenMRS. More about that story soon on NuChange Blog!
Machine Learning (ML) and Artificial Intelligence (AI) are the buzzwords lately and it is heartening to find local HSPs scrambling to get on the bandwagon. The emphasis is mostly on creating models which require technical as well as clinical expertise. The quintessential ‘blackbox’ model is a good healthcare analytics exercise, but deploying the model to be useful at the bedside belongs to the IT domain.
This article is about creating a simple model using the discharge abstract database (DAD) as the data source and Apache Spark as the framework, serializing it into a format that can be used externally, and building a simple website that deploys the model for users to make predictions. To make this interesting, we will create the website using Java 11 and Spring Boot 2.1, which are yet to be released at the time of writing. Both will be released by the time we get there. But please note that this is about deploying a model/pipeline created with Spark (which may be overkill for most projects). Here are some good resources if you have small data or a simple model.
This post is actually a note to myself as I explore the process. As always the focus is on understanding the process and not on the utility of the model. Feel free to comment below and add your own notes/ideas.
TL;DR the code will be available on our GitHub repository as we progress.
First, let us start with a brief description of Apache Spark. Apache Spark is an open-source big-data framework with built-in cluster computing ability. Spark is highly accessible and offers simple APIs in Python, Java, Scala, and R. I have picked Python as I can use the Python interpreter at CC right from the PyCharm IDE. PySpark is the Python library for interacting with Spark, which can be linked to sys.path at runtime using the findspark library. Most machine learning pipelines are available in PySpark. We will be building a simple logistic regression model. The necessary libraries can be imported as below.
import pyspark.sql.functions as F
from pyspark import SparkContext
from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.mllib.util import MLUtils
I will be back again with more next week. In the meantime have a look at DAD and the data dictionary. As always the customary disclaimer below:
Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.
Neural networks and deep learning are the buzzwords lately. Machine learning has been in vogue for some time, but the easy availability of storage and processing power has made it popular. The interest is palpable in business schools as well. ML-related techniques have not percolated much from the IT departments to business, but everybody seems to be interested. So, let us build a neural network model in 10 minutes.
This is the scenario:
You have a collection of independent variables (IV) that predict a dependent variable (DV). You have a theoretical model and want to know if it is good enough. Remember, we are not testing the model. We are just checking how good the IVs are in predicting DV. If they are not good predictors to start with, why waste time conjuring a fancy model! Sounds familiar? Let’s get started.
The three model.add statements represent the three layers in the neural network. The number after Dense is the number of neurons in each layer. You can play with these values a bit. These settings should work in most business cases. Read this for more information.
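For reference, a three-layer model of this kind might look like the sketch below. It assumes a recent version of Keras and a binary DV; the layer sizes (12, 8 and 1), the activations and the optimizer are illustrative choices, not necessarily the settings discussed above. Keras is imported inside the function so the file still loads where it is not installed.

```python
def build_model(n_inputs):
    """Build a small three-layer network for a binary dependent variable."""
    # Imported here so this file loads even where Keras is not installed.
    from keras.layers import Dense, Input
    from keras.models import Sequential

    # The Input entry only declares how many IVs go in; it is not a layer of neurons.
    model = Sequential([Input(shape=(n_inputs,))])
    model.add(Dense(12, activation="relu"))    # first hidden layer, 12 neurons
    model.add(Dense(8, activation="relu"))     # second hidden layer, 8 neurons
    model.add(Dense(1, activation="sigmoid"))  # output layer: probability that DV = 1
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model
```

With the IVs in an array `X` and the DV in `y`, `build_model(X.shape[1]).fit(X, y, epochs=50)` would train it, and the reported accuracy gives a first impression of how well the IVs predict the DV.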
Open Source health information systems provide cost-effective tools for healthcare. Even if you are not a coder, you may be able to contribute to open source projects. As a matter of fact, some open-source projects find it difficult to get volunteers to document and test the code. E-Health enthusiasts from the clinical and management fields often want to contribute to popular open source projects, but do not know how.
Open source projects involve a collaboration of people with various skills, often with no way of physically meeting each other. In a complex software product, even a misplaced comma can break the system. How do open source projects collaborate effectively while avoiding such code-breaking mistakes? Well, they use some specialized tools and workflows to manage code, many of which are not familiar to non-programmers. In the next few posts, I will introduce you to the most important tool that coders use: the versioning system. We shall discuss Git (the most popular versioning system) from a non-programmer’s perspective.
This is not for those who are familiar with Git and we will not be discussing advanced Git usage. Hence, let me state the assumptions that I am making about you as the reader. You have not heard of Git before. You are as scared of code as you are scared of python. When you hear Java, the first thing that comes to your mind is the island in Indonesia. You don’t know what ‘typing on the command line’ means. But you own a computer, know how to download and install programs, know how to navigate the web, want to learn more about contributing to open-source projects and, above all, want to help save lives, especially in resource-deprived areas. Watch the video below for inspiration.
Grounded theory (GT) emerged as a research methodology from medical sociology following the seminal work by Barney Glaser and Anselm Strauss. However, they later developed different views on their original contribution, and their supporters established a classical Glaserian GT and a pragmatic Straussian GT. Constant comparison is central in Classical Grounded Theory, and it involves incident-to-incident comparison for identifying categories, incident-to-category comparison for refining the categories and category-to-category comparison for the emergence of the theory.
Glaser’s Classical GT (1) provides guidelines for evaluation of the GT methodology. The evaluation should be based on whether the theory fits the data, whether the theory is understandable to non-professionals, whether the theory is generalizable to other situations, and whether the theory offers control over the structure and processes.
Strauss and Corbin (2) recommended a strict coding structure elaborating on how to code and structure data. The seminal article by Strauss and Corbin describes three stages of coding: open coding, axial coding, and selective coding. Classical Grounded Theory offers more flexibility than Straussian GT while the latter may be easier to conduct especially for new researchers.
Open coding is the first step, where data is broken down analytically, and conceptually similar chunks are grouped together under categories and subcategories. Once the differences between the categories are established, the properties and dimensions of each are dissected. Coding in GT may be overwhelming, and scaling up of categories from open coding may be difficult. This leads to the generation of low-level theories. With natural language processing, information systems can help young researchers make sense of the data that they have collected during the stage of open coding. QRMine is a software suite for supporting qualitative researchers using NLP. Gtdict is a module that identifies Categories, Properties, and Dimensions in the interview transcript.
Electronic capture of patient data is vital in any health information system. It’s hard to bundle every form that a clinician will ever need along with an EMR. EMRs adopt various strategies to solve this problem, but a general standard is lacking.
eForms are OSCAR’s solution to this problem. They are arguably one of the most useful features of OSCAR and are being used in many settings beyond those for which they were initially designed. Community eForms can be downloaded from the OSCAR Canada Users Society.
The eForm is not an elegant solution, and creating complex eForms requires programming expertise. Reporting of data collected through eForms is difficult because of the way in which the data is abstracted as key-value pairs in the database.
OSCAR provides basic built-in eForm generator functionality, using a form image in the background with controls superimposed on top. However, it is not user-friendly and lacks the ability to save the work and continue later.
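To see why key-value storage complicates reporting, consider this minimal sketch using the standard-library sqlite3 module. The table `eform_values_demo`, its columns and the sample values are hypothetical stand-ins, not OSCAR's actual schema. Each form field lands in its own row, so getting one row per submitted form requires a pivot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eform_values_demo (form_id INTEGER, var_name TEXT, var_value TEXT)"
)
# Two submitted forms, each stored as one row per field.
conn.executemany(
    "INSERT INTO eform_values_demo VALUES (?, ?, ?)",
    [
        (1, "systolic", "120"),
        (1, "diastolic", "80"),
        (2, "systolic", "145"),
        (2, "diastolic", "95"),
    ],
)

# Pivot: one conditional aggregate per field turns rows into columns.
report = conn.execute(
    """
    SELECT form_id,
           MAX(CASE WHEN var_name = 'systolic' THEN var_value END) AS systolic,
           MAX(CASE WHEN var_name = 'diastolic' THEN var_value END) AS diastolic
    FROM eform_values_demo
    GROUP BY form_id
    ORDER BY form_id
    """
).fetchall()
print(report)
```

Every additional field needs another CASE expression in the query, and all values come back as text, so even a simple report has to know each form's field names in advance.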
Watch the video below to see how it is done. This is still being tested and is not ready for production. Contact us for more details. Please report bugs, feature requests, and feedback.