Categories
HIS

Public Health Data Warehouse on FHIR

The Ontario government is building a connected health care system centred around patients, families and caregivers through the newly established Ontario Health Teams (OHTs). As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off-the-shelf solutions are expensive, while open-source solutions such as DHIS2 are not widely used in Canada. Some public health units already have systems in place, and switching to another system would be too resource-intensive. The interoperability challenge needs an innovative solution that goes beyond searching for a single provincial EMR.

We have written about the theoretical aspects, especially the need to envision public health information systems (PHIS) as separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams. 

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution: a scalable data warehouse for integrating health information systems. Public health databases are vital to the community for efficient planning, surveillance and effective interventions, and public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage, with an integrated Elasticsearch stack, and Kibana provides the visualization engine. PHIS-DW can support complex disease surveillance algorithms, ranging from machine learning methods and hidden Markov models to Bayesian and multivariate analytics. PHIS-DW is a work in progress, and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications. 
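As a rough illustration of how FHIR resources could be loaded into the Elasticsearch stack, the sketch below builds a request body in Elasticsearch's newline-delimited `_bulk` format from a list of FHIR resources. The index naming scheme (`fhir-<resourcetype>`) and the sample Observation are assumptions for illustration, not PHIS-DW's actual layout. The resulting body would be POSTed to the cluster's `_bulk` endpoint with `Content-Type: application/x-ndjson`.

```python
import json
from typing import Iterable


def fhir_to_bulk_ndjson(resources: Iterable[dict]) -> str:
    """Build an Elasticsearch _bulk body: one action line, then one source line, per resource."""
    lines = []
    for resource in resources:
        # Assumed naming: one index per FHIR resource type, e.g. "fhir-observation".
        index_name = "fhir-" + resource["resourceType"].lower()
        action = {"index": {"_index": index_name, "_id": resource["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(resource))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    observation = {
        "resourceType": "Observation",
        "id": "obs-1",
        "status": "final",
        "code": {"text": "Influenza-like illness reported"},
        "effectiveDateTime": "2020-03-01",
    }
    print(fhir_to_bulk_ndjson([observation]), end="")
```

Keeping the FHIR JSON unchanged as the document source means Kibana can query the same fields (for example `code.text` or `effectiveDateTime`) that FHIR clients already understand.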

Public Health Data Warehouse Framework on FHIR

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, FHIR bundles make it possible to send observations to FHIR servers without the associated Patient resource, which helps preserve privacy. This is especially useful in the surveillance of pandemics such as COVID-19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 are under consideration. If any of the OHTs find our approach interesting, give us a shout. 
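A minimal sketch of such a bundle, assuming FHIR R4, where `Observation.subject` is optional (cardinality 0..1). Each Observation records an event without referencing a Patient, and the transaction Bundle asks the server to create each one. The code text and values are placeholders, not a real surveillance code set. Omitting the subject does not by itself guarantee anonymity, because other fields can still be identifying.

```python
import json
import uuid


def make_observation(code_text: str, value: str, date: str) -> dict:
    """An R4 Observation with no 'subject' element: it describes an event, not a person."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": code_text},
        "valueString": value,
        "effectiveDateTime": date,
    }


def make_transaction_bundle(observations: list) -> dict:
    """Wrap Observations in a transaction Bundle that POSTs each one to the server."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                # Temporary urn:uuid ids; the server assigns real ids on creation.
                "fullUrl": f"urn:uuid:{uuid.uuid4()}",
                "resource": obs,
                "request": {"method": "POST", "url": "Observation"},
            }
            for obs in observations
        ],
    }


if __name__ == "__main__":
    bundle = make_transaction_bundle([
        make_observation("COVID-19 test result (placeholder code)", "positive", "2020-04-01"),
        make_observation("COVID-19 test result (placeholder code)", "negative", "2020-04-01"),
    ])
    print(json.dumps(bundle, indent=2))
```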

BTW, have you seen Drishti, our framework for FHIR-based behavioural intervention? 

Categories
Research

Researchers discover new toxin that impedes bacterial growth

This article was first published on Brighter World. Read the original article.

An international research collaboration has discovered a new bacteria-killing toxin that shows promise against infectious diseases caused by superbugs.

The discovery of this growth-inhibiting toxin, which bacteria inject into rival bacteria to gain a competitive advantage, was published today in the journal Nature.

The discovery is the result of teamwork by co-senior authors John Whitney, assistant professor in the Department of Biochemistry and Biomedical Sciences at McMaster University, and Mike Laub, professor of biology at the Massachusetts Institute of Technology (MIT).

Whitney and his PhD student Shehryar Ahmad at McMaster’s Michael G. DeGroote Institute for Infectious Disease Research were studying how bacteria secrete antibacterial molecules when they came across a new toxin. This toxin was an antibacterial enzyme, one the researchers had never seen before.

After determining the molecular structure of this toxin, Whitney and Ahmad realized that it resembles enzymes that synthesize a well-known bacterial signalling molecule called (p)ppGpp. This molecule normally helps bacteria survive under stressful conditions, such as exposure to antibiotics.

“The 3D structure of this toxin was at first puzzling because no known toxins look like enzymes that make (p)ppGpp, and (p)ppGpp itself is not a toxin,” said Ahmad.

Suspecting the toxin might kill bacteria by overproducing harmful quantities of (p)ppGpp, the McMaster team shared their findings with Laub, an investigator of the U.S. Howard Hughes Medical Institute.

Boyuan Wang, a postdoctoral researcher in the Laub lab who specializes in (p)ppGpp signaling, examined the activity of the newly discovered enzyme. He soon realized that rather than making (p)ppGpp, this enzyme instead produced a poorly understood but related molecule called (p)ppApp. Somehow, the production of (p)ppApp was harmful to bacteria.

The researchers determined that the rapid production of (p)ppApp by this enzyme toxin depletes cells of a molecule called ATP. ATP is often referred to as the ‘energy currency of the cell’, so when the supply of ATP is exhausted, essential cellular processes are compromised and the bacteria die.

“I find it absolutely fascinating that evolution has essentially ‘repurposed’ an enzyme that normally helps bacteria survive antibiotic treatment and, instead, has deployed it for use as an antibacterial weapon,” said Whitney.

The research conducted at McMaster University was funded by the Canadian Institutes of Health Research and is affiliated with the CIHR Institute for Infection and Immunity (CIHR-III), hosted at McMaster University, with additional funding from the David Braley Centre for Antibiotic Discovery. The research at MIT was supported by the Howard Hughes Medical Institute and the U.S. National Institutes of Health.

“This is an important discovery with potential implications for developing alternatives to antibiotics, a global priority in the fight against antimicrobial resistance. It is heartening to see that young Canadian researchers like Dr. Whitney are thriving and emerging as leaders in this area,” said Charu Kaushic, scientific director of the CIHR-III and a professor of pathology and molecular medicine at McMaster.

Categories
Machine Learning

Machine Learning in population health: Creating conditions that ensure good health.

Machine Learning (ML) in healthcare has an affinity for patient-centred care and individual-level predictions. Population health, by contrast, deals with health outcomes in a group of individuals and the distribution of those outcomes within the group. Individual health and population health are not divergent, but they are not the same either, and they may require different approaches. ML in public health applications receives far less attention.

Public health organizations have limited skills for transitioning towards integrated data analytics. Hence, the latest advances in ML and artificial intelligence (AI) have had very little impact on public health analytics and decision making. The biggest barrier is the lack of expertise in conceiving and implementing public health data warehouse systems that can integrate the health information systems currently in use. 

Data in public health organizations are generally scattered across disparate information systems within a region or even within the same organization. Efficient and effective health data warehousing requires a common data model for integrated data analytics. The OHDSI – OMOP Common Data Model allows for the systematic analysis of disparate observational databases and EMRs. However, its emphasis is on patient-level prediction. Research on moving from patient-centred data models to observation-centred population health data models is the need of the hour.
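One way to picture this shift, under assumptions of our own: patient-centred records nest observations under a patient, while an observation-centred model keeps each observation with only coarse population attributes (region, age band) and can be aggregated directly. All field names below are hypothetical. They are not taken from OMOP or any specific system.

```python
from collections import Counter
from typing import Dict, List, Tuple


def age_band(age: int, width: int = 10) -> str:
    """Collapse an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def to_observation_rows(patients: List[dict]) -> List[dict]:
    """Flatten patient-centred records into observation rows without patient identifiers."""
    rows = []
    for patient in patients:
        for obs in patient["observations"]:
            rows.append({
                "region": patient["region"],
                "age_band": age_band(patient["age"]),
                "code": obs["code"],
            })
    return rows


def count_by_region_and_code(rows: List[dict]) -> Dict[Tuple[str, str], int]:
    """Population-level view: number of observations per (region, code)."""
    return dict(Counter((r["region"], r["code"]) for r in rows))


if __name__ == "__main__":
    patients = [
        {"id": "p1", "age": 42, "region": "Hamilton",
         "observations": [{"code": "diabetes"}, {"code": "hypertension"}]},
        {"id": "p2", "age": 67, "region": "Hamilton",
         "observations": [{"code": "hypertension"}]},
    ]
    print(count_by_region_and_code(to_observation_rows(patients)))
```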

We are making a difficult yet important transition towards integrated health by providing new ways of delivering services in local communities by local health teams. The emphasis is clearly on digital health. We need efficient and effective digital tools and techniques. Motivated by the Ontario Health Teams’ digital strategy, I have been working on tools to support this transition.

Hephaestus is a software tool for ETL (Extract-Transform-Load) for open-source EMR systems such as OSCAR EMR and national datasets such as the Discharge Abstract Database (DAD). It is organized into modules to allow code reuse. Hephaestus uses SQLAlchemy for database connections and for automapping tables to classes, and Bonobo for managing ETL. Hephaestus aims to support common machine learning workflows such as model building with Apache Spark and model deployment using serverless architecture. I am also working on FHIR-based standards for ML model deployments.

Hephaestus is a work in progress, and any help will be highly appreciated. Hephaestus is an open-source project on GitHub. If you are looking for an open-source project to contribute to during Hacktoberfest, consider Hephaestus! 

Categories
Uncategorized

OSCAR in a BOX – Virtualized and fault-tolerant OSCAR EMR

Originally published by Bell Eapen at nuchange.ca on August 20, 2019. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

TL;DR: OSCAR in a BOX is a fault-tolerant OSCAR instance that you can use out of the box and that is virtually maintenance-free!

Image credit DarkoStojanovic @ Pixabay

OSCAR EMR is an open-source Electronic Medical Record (EMR) for Canadian family physicians. The official OSCAR repository is available here: https://bitbucket.org/oscaremr/

OSCAR is a Spring Java application deployed in a Tomcat container with a MySQL database backend. Because the OSCAR project is relatively old and has few users outside Canada, it has struggled to keep pace with developments in the electronic health records domain. However, OSCAR is still useful and popular among family physicians and some public health organizations, as it is free and well supported.

OSCAR is known for its support for the billing workflow, data collection forms (eForms) and comprehensive patient charts (eCharts). Some of OSCAR's limitations include a lack of scalability beyond a handful of users and limited support for data analytics. By design, OSCAR is hard to virtualize as a Docker container, yet the availability of a Docker container is crucial for sustainable and fault-tolerant deployment on the cloud and on distributed systems such as Kubernetes.

Docker is the world’s leading software container platform, used mostly for DevOps. Docker also lets developers set up a development environment in a few easy steps. I was one of the first few people to work on virtualizing OSCAR. Thanks to all those who forked (and hopefully used) this repository.

I have continued my work on the OSCAR Docker container and have succeeded in creating a (reasonably stable) container. It is now available on Docker Hub. I am now working on a fault-tolerant deployment of OSCAR on customized hardware. I (and some of my friends who know about and encouraged this project) call it OSCAR in a BOX! It runs multiple instances of OSCAR, and each instance can heal itself when a Java process hangs (fairly common for OSCAR). The database is replicated, and both the database and the documents are incrementally backed up to an additional disk.
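The post does not describe how the self-healing is implemented. The following is a hypothetical sketch of the general idea: a watchdog probes an OSCAR instance and restarts it after several consecutive failed health checks. The probe and restart actions are passed in as functions. In practice they might be an HTTP request with a timeout against the OSCAR login page and a `docker restart <container>` call, but both are assumptions.

```python
from typing import Callable


class Watchdog:
    """Restart an instance after `max_failures` consecutive failed health probes."""

    def __init__(self, probe: Callable[[], bool], restart: Callable[[], None],
                 max_failures: int = 3):
        self.probe = probe
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def check_once(self) -> bool:
        """Run one probe; return True if a restart was triggered."""
        if self.probe():
            self.failures = 0  # healthy again: reset the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.restart()
            self.failures = 0  # give the restarted instance a fresh budget
            return True
        return False


if __name__ == "__main__":
    # Simulated probe results: healthy, then three hangs in a row, then healthy.
    results = iter([True, False, False, False, True])
    restarts = []
    wd = Watchdog(probe=lambda: next(results), restart=lambda: restarts.append("restarted"))
    for _ in range(5):
        wd.check_once()
    print(restarts)  # ['restarted']
```

A real deployment would call `check_once()` in a loop with a sleep between probes, with one watchdog per OSCAR instance. Requiring several consecutive failures avoids restarting an instance that is merely slow for a moment.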

OSCAR in a BOX is ideal for family physicians who wish to adopt OSCAR but do not have the technical support to maintain the system. OSCAR in a BOX is plug and play and virtually maintenance-free. The virtualization workflow will also be useful for existing, bigger user groups reeling under the sluggish pace of OSCAR. Please let me know if anybody is interested in collaborating.

BTW, did you check out Drishti?

Categories
HIS

Drishti | An Open mHealth sense-plan-act framework based on FHIR!

Originally published by Bell Eapen at nuchange.ca on August 13, 2019. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

TL;DR: Here is an open-source mHealth framework based on FHIR, and here are the paper and my presentation at ICSE!

Pervasive health monitoring is becoming less and less intrusive with better sensors, and more and more useful with machine learning and predictive analytics.

mHealth (mobile health) could play an important part in pervasive health monitoring. However, it is difficult for clinicians to use data efficiently from disparate apps that do not communicate with each other. For example, if a clinician has to monitor a patient’s blood sugar, blood pressure and physical activity, the clinician may have to check data from multiple apps. Another challenge is communicating clinical requirements to app developers, and it is difficult to test and approve the clinical validity of these apps. Besides, there are always privacy and security concerns with personal health information.

Open mHealth is an open-source framework introduced to manage the problem of interoperability between apps. The Open mHealth project provides interfaces for cloud services such as Google Fit and Fitbit and converts their data into a common data format. The BIT (Behavioral Intervention Technology) model deals with the communication problem between clinicians and developers during app development. Drishti incorporates the Open mHealth framework into the BIT model, using FHIR as the common data model.

The BIT model is based on the Sense-Plan-Act paradigm from robotics. The BIT model encourages conceptualizing mHealth apps as three distinct components: profilers that sense data on various physiological parameters such as blood pressure, planners that create a clinical intervention plan, and actors that deliver the plan to users as alerts or messages on their mobile devices. Drishti adopts the BIT model as a design model, with all components sharing a central data repository. Drishti makes it easy to share information with doctors by integrating with an EMR. The central data repository also makes big data applications possible.
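A toy sketch of the sense-plan-act split around a shared repository, to make the three roles concrete. Everything here is hypothetical and far simpler than Drishti: the function names, the in-memory lists standing in for the central FHIR repository, and the blood pressure threshold rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Repository:
    """Stand-in for the central data repository shared by all components."""
    observations: List[dict] = field(default_factory=list)
    plans: List[str] = field(default_factory=list)


def profiler(repo: Repository, reading: dict) -> None:
    """Sense: store a physiological reading."""
    repo.observations.append(reading)


def planner(repo: Repository, systolic_limit: int = 140) -> None:
    """Plan: add a plan item when the latest systolic reading exceeds a limit."""
    bp = [o for o in repo.observations if o["type"] == "systolic_bp"]
    if bp and bp[-1]["value"] > systolic_limit:
        repo.plans.append("Recheck blood pressure tomorrow morning")


def actor(repo: Repository) -> List[str]:
    """Act: turn plan items into messages for the patient's device."""
    return [f"Reminder: {plan}" for plan in repo.plans]


if __name__ == "__main__":
    repo = Repository()
    profiler(repo, {"type": "systolic_bp", "value": 152})
    planner(repo)
    print(actor(repo))  # ['Reminder: Recheck blood pressure tomorrow morning']
```

The components never call each other directly. They communicate only through the repository, which is what allows each one to be replaced or extended independently.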

The central data repository in Drishti uses the FHIR schema for storage. FHIR is a standard for health data created by HL7. It defines ‘Resources’ that can be exchanged as JSON or XML through RESTful interfaces. Core resources are designed to cover about 80% of common use cases, and the rest can be supported using extensions. For example, birth date and gender are defined for the Patient resource, while skin type, which is not commonly needed, can be defined through an extension if required. Drishti uses the ‘Observation’ resource for storing data from profilers and the ‘CarePlan’ resource for the planner and actor components.
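For illustration, a blood pressure reading from a profiler could be stored as an R4 Observation like the one built below. The LOINC codes are the standard blood pressure panel (85354-9), systolic (8480-6) and diastolic (8462-4) codes. The patient reference and values are placeholders, and this is not necessarily how Drishti encodes its data.

```python
import json

LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def quantity(value: float) -> dict:
    """A UCUM-coded pressure value in millimetres of mercury."""
    return {"value": value, "unit": "mmHg", "system": UCUM, "code": "mm[Hg]"}


def bp_observation(patient_ref: str, systolic: float, diastolic: float, when: str) -> dict:
    """Build an R4 blood pressure Observation with systolic and diastolic components."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{"system": LOINC, "code": "85354-9",
                             "display": "Blood pressure panel"}]},
        "subject": {"reference": patient_ref},
        "effectiveDateTime": when,
        "component": [
            {"code": {"coding": [{"system": LOINC, "code": "8480-6"}]},
             "valueQuantity": quantity(systolic)},
            {"code": {"coding": [{"system": LOINC, "code": "8462-4"}]},
             "valueQuantity": quantity(diastolic)},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bp_observation("Patient/example", 128, 82, "2019-08-13T09:30:00Z"), indent=2))
```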

Open mHealth acts as the profiler in Drishti. All data from the various cloud services are converted to FHIR Observations and stored in the Drishti-Cog. The Drishti-Planner can use data stored in the cog to create a CarePlan, and the actor can deliver it to the patient. Drishti uses OpenMRS EMR for managing access, both for clinicians and patients. We have developed an OpenMRS module for integration with Drishti. The JavaScript visualization library hGraph gives clinicians a consolidated view of the data pulled from sensors.
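A sketch of the kind of conversion this implies, for a single step-count data point. The input dictionary is modelled loosely on an Open mHealth step-count body: a `step_count` value plus an `effective_time_frame` with a `time_interval`. Treat the exact field layout as an assumption rather than the schema itself. LOINC 55423-8 is used for number of steps, and the subject reference is a placeholder.

```python
import json


def omh_step_count_to_fhir(datapoint: dict, patient_ref: str) -> dict:
    """Convert an Open mHealth-style step-count body into an R4 Observation."""
    count = datapoint["step_count"]
    # Tolerate both a bare number and a {"value": ..., "unit": ...} object.
    value = count["value"] if isinstance(count, dict) else count
    interval = datapoint["effective_time_frame"]["time_interval"]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "55423-8",
                             "display": "Number of steps"}]},
        "subject": {"reference": patient_ref},
        "effectivePeriod": {"start": interval["start_date_time"],
                            "end": interval["end_date_time"]},
        "valueQuantity": {"value": value, "unit": "steps",
                          "system": "http://unitsofmeasure.org", "code": "{steps}"},
    }


if __name__ == "__main__":
    datapoint = {
        "step_count": {"value": 7939, "unit": "steps"},
        "effective_time_frame": {"time_interval": {
            "start_date_time": "2019-08-12T00:00:00Z",
            "end_date_time": "2019-08-13T00:00:00Z",
        }},
    }
    print(json.dumps(omh_step_count_to_fhir(datapoint, "Patient/example"), indent=2))
```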

In the current implementation, the cog is a FHIR server based on the HAPI Java library. The planner and actor components are just stubs that can be extended for several use cases. The planner is a Python Flask app, and the viewer is a Vue app that can be used as a native mobile app. Both are templates that can be extended. The entire stack is available on GitHub, along with pre-built Docker containers for quick prototyping.

Here is a typical use case. Depression is a common mental health problem, characterized by a loss of interest in activities that you normally enjoy. Patients with depression are typically treated with antidepressant drugs. Clinicians need to track the patient’s activity, along with medication compliance, to assess progress. The patient can use an activity tracker app and a medication tracker app, both sending data to the cog as FHIR Observations. Clinicians can see a consolidated view in their EMR and create alerts or messages (the plan) that are delivered to the patient’s mobile device. The interventions can also be created by AI systems.
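A hypothetical planner rule for this use case might work as follows. If the average step count across recent step-count Observations falls below a threshold, the planner drafts an R4 CarePlan with a single activity that the actor could deliver as a message. The threshold, the message text and the use of `activity.detail.description` are illustrative assumptions. They are not clinical guidance or Drishti's actual planner logic.

```python
from statistics import mean
from typing import List, Optional


def plan_for_low_activity(step_observations: List[dict], patient_ref: str,
                          min_daily_steps: int = 5000) -> Optional[dict]:
    """Return a draft R4 CarePlan if average steps fall below the threshold, else None."""
    counts = [o["valueQuantity"]["value"] for o in step_observations]
    if not counts or mean(counts) >= min_daily_steps:
        return None
    return {
        "resourceType": "CarePlan",
        "status": "draft",      # a clinician reviews it before it becomes active
        "intent": "proposal",
        "subject": {"reference": patient_ref},
        "activity": [{
            "detail": {
                "status": "not-started",
                "description": "Take a 15-minute walk today. Remember your medication.",
            }
        }],
    }


if __name__ == "__main__":
    recent = [{"valueQuantity": {"value": v}} for v in (2100, 3400, 2800)]
    print(plan_for_low_activity(recent, "Patient/example"))
```

Drafting the CarePlan with `status: draft` keeps the clinician in the loop. An AI-generated intervention could follow the same path and be activated only after review.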

Drishti was presented at the Software Engineering for Healthcare (SEH) workshop in Montreal and selected for FHIR DevDays. Please cite Drishti as below:

Bell Raj Eapen, Norm Archer, Kamran Sartipi, and Yufei Yuan. 2019. Drishti: a sense-plan-act extension to open mHealth framework using FHIR. In Proceedings of the 1st International Workshop on Software Engineering for Healthcare (SEH ’19). IEEE Press, Piscataway, NJ, USA, 49-52. DOI: https://doi.org/10.1109/SEH.2019.00016

Categories
OpenSource Resources

Hephestus: Health data warehousing tool for public health and clinical research

Originally published by Bell Eapen at nuchange.ca on November 3, 2018. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

Health data warehousing is becoming an important requirement for deriving knowledge from the vast amount of health data that healthcare organizations collect. A data warehouse is vital for collaborative and predictive analytics. The first step in designing a data warehouse is to decide on a suitable data model. This is followed by the extract-transform-load (ETL) process that converts source data to the new data model amenable for analytics.

The OHDSI – OMOP Common Data Model is one such data model that allows for the systematic analysis of disparate observational databases and EMRs. The data from diverse systems need to be extracted, transformed and loaded into a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools that are already available.

Each data source requires customized ETL tools for the conversion from the source data to the CDM. The OHDSI ecosystem has made some tools available to help the ETL process, such as White Rabbit and Rabbit in a Hat. However, the health data warehousing process is still challenging because source databases vary in structure and implementation.

Hephestus is an open-source Python tool for this ETL process. It is organized into modules to allow code reuse between ETL tools for various open-source EMR systems and data sources. Hephestus uses SQLAlchemy for database connections and for automapping tables to classes, and Bonobo for managing ETL. The ultimate aim is to develop a tool that can translate the report from the OHDSI tools into an ETL script with minimal intervention. This is a good Python starter project for eHealth geeks.
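The following plain-Python sketch shows the shape of one extract-transform-load step without depending on SQLAlchemy or Bonobo. It uses the standard-library `sqlite3` module to extract rows from a source table, transform them into the OMOP CDM `person` layout, and load them into the target. The source table and its columns (`demographic`, `demographic_no`, `sex`, `year_of_birth`) are assumed to resemble OSCAR's patient table. Only a subset of `person` columns is filled. The gender concept IDs 8507 (male) and 8532 (female) are the standard OMOP ones, with 0 for unmapped values. Hephestus itself uses SQLAlchemy automapping and Bonobo graphs for this.

```python
import sqlite3
from typing import Iterator, Tuple

# Standard OMOP gender concepts; anything unmapped becomes 0 ("no matching concept").
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def extract(src: sqlite3.Connection) -> Iterator[Tuple]:
    """Extract: read patient rows from the (assumed) source table."""
    yield from src.execute("SELECT demographic_no, sex, year_of_birth FROM demographic")


def transform(rows: Iterator[Tuple]) -> Iterator[Tuple]:
    """Transform: map to (person_id, gender_concept_id, year_of_birth, race, ethnicity)."""
    for demographic_no, sex, year_of_birth in rows:
        gender = GENDER_CONCEPTS.get((sex or "").upper(), 0)
        yield (int(demographic_no), gender, int(year_of_birth), 0, 0)


def load(dst: sqlite3.Connection, rows: Iterator[Tuple]) -> int:
    """Load: insert transformed rows into an OMOP-style person table."""
    dst.execute(
        "CREATE TABLE IF NOT EXISTS person (person_id INTEGER PRIMARY KEY,"
        " gender_concept_id INTEGER, year_of_birth INTEGER,"
        " race_concept_id INTEGER, ethnicity_concept_id INTEGER)"
    )
    cur = dst.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", rows)
    dst.commit()
    return cur.rowcount  # total rows inserted


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE demographic (demographic_no INTEGER, sex TEXT, year_of_birth TEXT)")
    src.executemany("INSERT INTO demographic VALUES (?, ?, ?)",
                    [(1, "F", "1970"), (2, "M", "1985"), (3, "U", "1990")])
    dst = sqlite3.connect(":memory:")
    print(load(dst, transform(extract(src))))
    print(dst.execute("SELECT * FROM person ORDER BY person_id").fetchall())
```

Because each stage is a generator, rows stream from source to target without being held in memory. This is the same extract, transform and load chaining that a Bonobo graph expresses.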

Anyone anywhere in the world can build their own environment that can store patient-level observational health data, convert their data to OHDSI’s open community data standards (including the OMOP Common Data Model), run open-source analytics using the OHDSI toolkit, and collaborate in OHDSI research studies that advance our shared mission toward reliable evidence generation. Join the journey here!

Disclaimer: Hephestus is just my experiment and is not a part of the official OHDSI toolset.

Categories
HIS

Serverless on FHIR: Management guidelines for the semi-technical clinician!

Originally published at nuchange.ca on February 12, 2018. If you have some feedback, reach out to the author on Twitter, LinkedIn or GitHub.

Serverless is the new kid on the block, with services such as AWS Lambda, Google Cloud Functions and Microsoft Azure Functions. Essentially, it lets users deploy a function (Function as a Service, or FaaS) on the cloud with very little effort. Operational requirements such as scaling and availability, along with much of the infrastructure security, are handled by the platform itself. As healthcare slowly yet steadily progresses towards machine learning and AI, serverless is likely to make a significant impact on Health IT. Here I will explain serverless (and some related technologies) for semi-technical clinicians and put forward some architectural best practices for using serverless in healthcare with FHIR as the data interchange format.

Let us say your analyst creates a neural network model, based on a few million patient records, that can predict the risk of myocardial infarction (MI) from BP, blood sugar and exercise. Let us call this model r = f(bp, bs, e). The model is so good that you want to use it regularly on your patients and, better still, share it with your colleagues. So you contact your IT team to make this happen.

This is what your IT guys currently do: First, they create a web application that takes bp, bs and e as inputs through a standard interface such as REST and returns r. Next, they rent a virtual machine (VM) from a cloud provider (such as DigitalOcean). Then they package the application as a container (Docker) and deploy it on the VM. You can now use it as an application from your browser (Chrome), or your EMR (such as OpenMRS or OSCAR) can access this function directly. You can share it with your colleagues, they can access it in their browsers, and you are happy. The VM can support up to 3 users at a time.

In a couple of months, your algorithm becomes so popular that at any one time hundreds of users try to access it and your poor VM crashes most of the time or your users have to wait forever. So you call your IT guys again for a solution. They make 100 copies of your container, but your hospital is reluctant to give you the additional funding required.

Your smart resident notices that your application is used only in the morning hours, and at night all 100 containers are virtually sleeping. This is not a good use of the funding dollars. You contact your IT guys again, and they set up Kubernetes to orchestrate the containers according to usage. So, what is serverless? Serverless is a framework that makes all of this so easy that you may not even need your IT guys to do it. (Well, maybe that is an exaggeration.)

My personal favourite serverless toolset (if you care) is Kubernetes + Knative + riff. I won't try to explain what the last two are or how to use them; they are so new that they keep changing every day. In essence, your IT team can complete all the above tasks with a few commands typed on the command line, on the cloud provider of your choice. The application (or rather, the function) can even scale to zero! You don't pay for the function when nobody uses it, more containers are added as users increase, and they are removed at night, as in your case.
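To make "scale to zero" concrete, here is a simplified, hypothetical version of the arithmetic a concurrency-based autoscaler performs. It runs enough replicas to keep in-flight requests per container at or below a target, caps the count at a maximum, and runs zero replicas when there is no traffic. Knative's real autoscaler adds averaging windows, a panic mode and other details. The target of 10 concurrent requests per container is an assumption.

```python
import math


def desired_replicas(in_flight_requests: int, target_per_replica: int = 10,
                     max_replicas: int = 100) -> int:
    """Replicas needed so each handles at most `target_per_replica` concurrent requests."""
    if in_flight_requests <= 0:
        return 0  # scale to zero: nothing running, nothing billed
    return min(max_replicas, math.ceil(in_flight_requests / target_per_replica))


if __name__ == "__main__":
    # Morning peak, quiet afternoon, night.
    for load in (950, 42, 0):
        print(load, "->", desired_replicas(load))
```

With these assumed numbers, the morning peak of 950 concurrent requests needs 95 containers, the afternoon needs 5, and the night needs none. This is the usage pattern the smart resident noticed.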

Best Practices

What are the best practices when you design such useful cloud-based ‘functions’ for healthcare that can be shared by multiple users and organizations? Well, here are my two cents!

First, you need a standard for data exchange. JSON is the data format for most APIs, and FHIR resources have a standard JSON representation, so FHIR wins hands down here.

Next, APIs need a mechanism to expose their capabilities and properties to the world. For example, r = f(bp, bs, e) needs to tell everyone what it accepts (bp, bs, e) and what it returns (at the bare minimum). FHIR has a resource specifically for this, (not so creatively) named Endpoint. So, a function endpoint should return a FHIR Endpoint resource with information about itself when it is called without a payload.

What should the payload be? The payload should be a FHIR Bundle that contains all the FHIR Resources the function needs (bp, bs and e as FHIR Observations in your case). The bundle should also include a FHIR Subscription resource that points to the receiving system (maybe your EMR) for the response (r).
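Below is a minimal, framework-agnostic sketch of these two conventions. It assumes FHIR R4 and a hypothetical `handle(payload)` signature. With no payload, the function describes itself with an Endpoint resource. With a Bundle, it pulls bp, bs and e from Observations (matched here by a hypothetical `code.text`), computes r, and reads the Subscription's `channel.endpoint` to learn where the answer should go. The model is a placeholder, not a real MI risk score. The function address and the reply URL are made up, and actually posting the result to the reply URL is left out.

```python
import math
from typing import Optional

FUNCTION_URL = "https://functions.example.org/mi-risk"  # hypothetical address


def endpoint_description() -> dict:
    """Self-description returned when the function is called without a payload."""
    return {
        "resourceType": "Endpoint",
        "status": "active",
        "connectionType": {
            "system": "http://terminology.hl7.org/CodeSystem/endpoint-connection-type",
            "code": "hl7-fhir-rest",
        },
        "name": "r = f(bp, bs, e)",
        "payloadType": [{"text": "Bundle with Observations bp, bs, e and a Subscription; returns r"}],
        "address": FUNCTION_URL,
    }


def placeholder_model(bp: float, bs: float, e: float) -> float:
    """Stand-in for the trained network: a logistic curve with made-up weights."""
    z = 0.03 * (bp - 120) + 0.2 * (bs - 5.5) - 0.01 * e
    return 1 / (1 + math.exp(-z))


def handle(payload: Optional[dict]) -> dict:
    """Describe the function if there is no payload; otherwise score the Bundle."""
    if not payload:
        return endpoint_description()
    values, reply_to = {}, None
    for entry in payload.get("entry", []):
        res = entry["resource"]
        if res["resourceType"] == "Observation":
            values[res["code"]["text"]] = res["valueQuantity"]["value"]
        elif res["resourceType"] == "Subscription":
            reply_to = res["channel"]["endpoint"]  # where r should be delivered
    r = placeholder_model(values["bp"], values["bs"], values["e"])
    return {"r": r, "reply_to": reply_to}
```

Because the same function answers both discovery (no payload) and scoring (Bundle), a client needs only one URL to find out what the function expects and then to use it.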

So, what next?

Pick up the phone and call your IT team. Tell them to take Kubernetes + Knative + riff for a spin! I might do the same, and if I do, I will share it here.

Categories
Research

eHealth against antimicrobial resistance

This article was first published on Brighter World. Read the original article.

A forward-looking McMaster donor is investing $7 million in a new research centre dedicated specifically to tackling the growing global threat of antimicrobial resistance.

David Braley, whose gifts to the university include a $50-million investment in McMaster teaching, learning and health-care research and delivery, has allocated $7 million from that 2007 gift towards the new David Braley Centre for Antibiotic Discovery.

The centre will operate from the Michael G. DeGroote Institute for Infectious Disease Research, whose labs and offices are located on campus in the Michael G. DeGroote Centre for Learning and Discovery.

Researchers associated with the new David Braley Centre for Antibiotic Discovery. Photo by Georgia Kirkos.

“This is a very timely investment,” says Paul O’Byrne, dean and vice-president, Faculty of Health Sciences. “This provides fresh resources to a team of researchers who are among the world’s leaders in their field. Creating this centre gives them the chance to do their best work at a time in history when it’s needed most.”

The funding comes from a portion of Braley’s 2007 gift that had been designated for emerging health-care research priorities.

The David Braley Centre for Antibiotic Discovery will be home to McMaster’s leading researchers in the field of antimicrobial resistance, or AMR. The new resources will allow the team to concentrate more specific effort on that problem.

“Antimicrobial resistance is a slow-moving catastrophe, but make no mistake: within the next 30 years, it will kill millions, strangle our health-care systems and significantly alter life as we know it unless we develop new ways to attack the problem,” says Gerry Wright, who heads both the David Braley Centre for Antibiotic Discovery and the Institute for Infectious Disease Research. “The opportunity to open this centre is a hopeful sign, and we are grateful for Mr. Braley’s vision and his vote of confidence. This problem must be solved, and it can be solved.”

Gerry Wright, director of the Institute for Infectious Disease Research and the new David Braley Centre for Antibiotic Discovery, addresses the crowd at the opening of the new research centre. Photo by Georgia Kirkos.

The waning effectiveness of traditional antibiotics gives urgency to the search for new forms of antibiotics and other ways to boost the effectiveness of existing drugs.

Widespread use of antibiotics in agriculture and medicine has accelerated resistance to penicillin and its related medicines, as bacteria evolve to meet the threat.

Infection control and treatment without antibiotics could cast the world back to the early 1900s, when infectious diseases routinely killed people, Wright says.

Today, at least 700,000 people around the world – including 2,000 in Canada – die each year as a result of drug-resistant diseases. The global total is expected to rise to 10 million deaths per year by 2050 if no new solutions are found.

The medical costs associated with AMR are predicted to reach $100 trillion within that same time frame.

This year, the United Nations published a report projecting that without immediate global action, AMR could force up to 24 million people into extreme poverty by the year 2030.

Categories
Research Resources

McMaster’s start-up incubator to receive $1.2 million from FedDev Ontario

This article was first published on Daily News. Read the original article.

The Government of Canada, through FedDev Ontario, is providing McMaster with $1.2 million to expand The Forge, a collaborative makerspace where entrepreneurs can access advanced equipment to design and build innovative new products.

The Honourable Filomena Tassi, Minister of Seniors and Member of Parliament for Hamilton West-Ancaster-Dundas, made the announcement today on behalf of the Honourable Navdeep Bains, Minister of Innovation, Science and Economic Development and minister responsible for FedDev Ontario.

“FedDev Ontario’s funding is providing invaluable support to the innovation community in Hamilton,” said Tassi. “The government of Canada is proud to support McMaster — one of Canada’s premier research-intensive universities — to expand The Forge’s makerspace and allow more companies to develop and bring new products to market.”

The funding will allow The Forge to expand its makerspace as it moves into a 10,000 square-foot facility shared with partner Innovation Factory. It will also purchase additional 3D printers and other fabricating equipment, and increase support to entrepreneurs through mentoring. As a result, the number of companies supported will almost double, from 24 to up to 40 annually, with up to 75 new jobs created.

“This strategic investment from the Government of Canada will strengthen the entrepreneurial capacity of our region by providing McMaster’s students and the wider Hamilton community access to the centralized expertise and infrastructure so essential for creating start-ups and business growth opportunities,” said Karen Mossman, Acting Vice-President of Research at McMaster and chair of the McMaster Innovation Park board of directors.

More than 105 tech companies have graduated from The Forge since its founding in 2014, with more than 300 employees hired and $20 million of private and public investment raised.

The Forge’s expansion further enhances McMaster’s entrepreneurial ecosystem and reputation as a leader in developing innovative manufacturing assets, in particular within the McMaster Innovation Park, which is also home to the McMaster Automotive Research Centre (MARC) and the Centre for Biomedical Engineering and Advanced Manufacturing (BEAM).

Categories
Machine Learning

Creating, serializing and deploying a machine learning model for healthcare: Part 2

This is a series on serializing and deploying machine learning pipelines developed using PySpark. Part 1 is here. This is specifically for Apache Spark and is basically notes to myself.

We will be using MLeap to serialize the model. Below is a brief introduction to MLeap, copied from its website. For more information, please visit the MLeap website.

MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.

This series is about serializing and deploying. If you are interested in model building, Susan’s article here is an excellent resource.

In part one, we imported the dependencies. The next step is to initialize Spark and import the data.

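```python
import logging

import findspark

# ConfigParams is the project's own settings class (not shown here);
# it provides the Spark home directory and the path to the DAD csv file.
findspark.init(ConfigParams.__SPARK_HOME__)

# Import pyspark only after findspark has put it on sys.path
from pyspark import SparkConf
from pyspark.sql import SparkSession

_logger = logging.getLogger(__name__)

# Configuration
conf = SparkConf().setAppName('BellSpark')

# SparkSession replaces SparkContext as the entry point.
# The MLeap packages are downloaded and added to the session's classpath.
spark = (
    SparkSession.builder
    .appName("BellSparkTest1")
    .config('spark.jars.packages',
            'ml.combust.mleap:mleap-spark-base_2.11:0.9.3,'
            'ml.combust.mleap:mleap-spark_2.11:0.9.3')
    .config(conf=conf)
    .getOrCreate()
)

# Read csv
df = spark.read.csv(ConfigParams.__DAD_PATH__, header=True, inferSchema=True)
```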

In the above code, you have to set the Spark home and the path to the DAD csv file. Obviously, you can name your app whatever you like. The MLeap packages are loaded into the Spark session.

To keep it simple, we are going to create a logistic regression model. The required variables are selected:

```python
# Select the columns that we need
df = df.select('TLOS_CAT', 'ACT_LCAT', 'ALC_LCAT',
               'ICDCOUNT', 'CCICOUNT')
```

TLOS_CAT (total length of stay) is the dependent variable (DV), and the rest are independent variables (IVs). Please note that the choice of variables may not be ideal, but that is not our focus.

Now, recode TLOS_CAT to binary as we are going to build a logistic regression model.

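```python
from pyspark.sql import functions as F

# Change all NA to 0
df = df.na.fill(0)

# Recode TLOS_CAT to binary: values up to 5 become 0, higher values become 1
df = (
    df
    .withColumn('TLOS_CAT_NEW', F.when(df.TLOS_CAT <= 5, 0).otherwise(1))
    .drop(df.TLOS_CAT)
)

df.printSchema()
```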
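Note the order of the two steps above. `na.fill(0)` runs before the recode, so a missing TLOS_CAT becomes 0 and lands in the short-stay class (0). The plain-Python function below mirrors the `F.when(TLOS_CAT <= 5, 0).otherwise(1)` rule so that this behaviour is easy to check without a Spark cluster. The helper name `recode_tlos` is hypothetical.

```python
from typing import Optional


def recode_tlos(tlos_cat: Optional[int], fill_value: int = 0) -> int:
    """Mirror of na.fill(0) followed by F.when(TLOS_CAT <= 5, 0).otherwise(1)."""
    value = fill_value if tlos_cat is None else tlos_cat
    return 0 if value <= 5 else 1


if __name__ == "__main__":
    print([recode_tlos(v) for v in (None, 1, 5, 6, 10)])  # [0, 0, 0, 1, 1]
```

If missing lengths of stay should not count as short stays, the rows could be dropped or filled with a different value before recoding.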

We will create and serialize the pipeline next week. I promised to deploy using Java 11 and Spring Boot 2.1. Java 11 was released on Sept 25, and I feel it can have a huge impact on Java-based EMRs like OSCAR and OpenMRS. More about that story soon on the NuChange Blog!