Machine Learning

Creating, serializing and deploying a machine learning model for healthcare: Part 2

please share

This is a series on serializing and deploying machine learning pipelines developed using pyspark. Part 1 is here. This is specifically for apache spark and is basically notes to myself.

We will be using the Mleap for serializing the model. I have added below a brief introduction about Mleap copied from their website. For more information, please visit the Mleap website.

MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.

This series is about serializing and deploying. If you are interested in model building, Susan’s article here is an excellent resource.

In part one we imported the dependencies. The next step is to initialize spark and import the data.

In the above code, you have to set the spark home and path to DAD csv file. Obviously, you can name your app whatever you need. Mleap packages are loaded in the spark session.

To keep it simple, we are going to create a logistic regression model. The required variables are selected:

TLOS_CAT (Total length of stay) is the dependent variable (DV) and the rest are IVs. Please note that the choice of variables may not be ideal, but that is not our focus.

Now, recode TLOS_CAT to binary as we are going to build a logistic regression model.

We will create and serialize the pipeline next week. I promised to deploy using Java 11 and spring boot 2.1. Java 11 was released on Sept 25 and I feel it can have a huge impact on java based EMRs like OSCAR and OpenMRS. More about that story soon on NuChange Blog!

Bell Eapen

A dermatologist and an eHealth expert with expertise in Healthcare analytics, mHealth, health information exchange, benefits evaluation research, change management, and population informatics.[Resume]
please share

One thought on “Creating, serializing and deploying a machine learning model for healthcare: Part 2

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.