Machine Learning Viewpoint

AI will never replace the doctor. Or will it?

Let me start with the usual and widely accepted narrative: AI is emerging as a major technological disrupter in medicine, crunching large volumes of data and providing accurate diagnoses and treatment. But, the narrative goes, it will never (and can never) replace a doctor, even in specialties amenable to machine-driven automation such as radiology or dermatology [1]. However, these assumptions are based on the current paradigms of medicine, which are constrained by the limits of current human cognitive abilities. Are we oblivious to a paradigm shift happening in medicine?

The Human Genome Project [2] and the subsequent democratization of the ‘omics’ fields promised a new paradigm of ‘personalized medicine’, which has never really materialized (at least not yet) [3]. AI (used here as an umbrella term that includes big data analytics and machine learning) could potentially take personalized medicine into the realm of holistic medicine. Time will tell whether this paradigm shift materializes. But it is important to understand how some of the concepts that we take for granted may be redefined and reconceptualized in the new paradigm (if it happens), just as modern medicine emerged from natural and alternative medical traditions.

The major tenets of modern medicine are diagnosis, prognosis and therapeutics (treatment). Diagnosis is the process of bucketing a given case into a previously characterized pattern of observations, often represented by a recognizable name. Diabetes, hypertension and typhoid fever are examples. The prognosis and the treatment depend on the diagnostic label assigned. From time to time, patterns emerge that do not fit into the existing list. A pneumonia-like pattern that emerged recently in Wuhan, China, caused by a novel coronavirus (SARS-CoV-2), was labelled COVID-19. A common use case of AI in medicine is to assign a given set of observations to one of these named entities (diagnostic decision support systems). The clinical community argues that AI can help a clinician in this process, but cannot replace him or her. One of the main reasons for clinicians’ belief in their own irreplaceability is that AI learns from existing labels (the training data set) that clinicians themselves prepare.

The process of making a diagnosis reduces the stochastic observations of the human body to a set of named patterns (diagnoses) that humans can comprehend, identify and utilize. In an AI-dominated world, ‘diagnoses’ may lose their relevance, as machines can recognize, identify and utilize a potentially infinite number of patterns and entities. Even if ‘diagnoses’ persist, their number is likely to be huge, far beyond the cognitive capabilities of humans.

Currently, the prognosis of any disease state is based on limited observations and limited data points. Big data may extend these limits, making prognostic predictions more accurate. The machine learning models that drive such predictions are likely to be partially explainable at best and complete black boxes at worst. Explainable or not, however, such prognostic predictors are likely to help optimize health systems. The role of clinicians may then be to identify which variables to optimize.

In the therapeutics realm, AI may push us closer to the promised personalized medicine. Traditional clinical research, which relies mostly on ‘rigorous’ randomized controlled trials (RCTs), may lose its relevance in the new paradigm. Some argue that RCTs have already become unsustainable because of long turnaround times and mounting costs. Since no two humans have the same omics profile, the level of abstraction introduced by a statistically significant difference between ‘randomized’ treatment and control groups is useful for humans, but not for AI. Emerging methods such as nanotechnology, nanorobotics and 3D printing, combined with advanced predictive analytics, molecular modelling and drug design, could lead to tailored interventions created ‘just-in-time’ for every individual according to his or her needs. This process is likely to be beyond human comprehension, but human intervention may still be needed to maintain the flow of data through the system.

‘Health’ is another concept that is taken for granted, as something everybody instinctively understands. Health is widely understood as the absence of disease. As disease/diagnosis states become infinite, ‘health’ may need to be reconceptualized too. Let us call it Health 3.0 for now. Medicine would then cease to be the art of restoring health and become the art of optimizing Health 3.0. I do not attempt to provide a framework to define Health 3.0 here, but posit that it will include abstract concepts such as happiness and quality of life, which are, paradoxically, beyond the cognitive capabilities of AI.

Clinicians may still be irreplaceable, but as the ones who help AI define health!
Some of the changes that AI and allied technologies can bring are already visible. The omics fields have introduced several subcategories of existing diagnostic entities [4]. In most cases, clinicians ignore these subtypes and view things at a higher, more manageable level. Reinforcement Learning (RL) algorithms can potentially learn from big data that have not been labelled by clinicians [5]. RL is closer to cognitive computing (computerized models that simulate human thought), optimizing a ‘reward’, a concept closer to Health 3.0. Computer-aided drug design is becoming increasingly popular, supplemented by the enormous amount of data derived from electronic medical records [6].

I am neither trying to predict the future impact of AI in medicine nor arguing for or against the role of ‘human’ clinicians. The media and the scientific literature are replete with stories of AI approaching, and in some cases surpassing, clinicians in certain tasks. AI may be more than an incremental disrupter that changes the way we practice. As paradigms change, some of the questions we ask today, such as “Can AI make the correct diagnosis?” or “Can AI choose the correct treatment?”, may lose their relevance. AI may never replace doctors, but it may change what doctors do and may take us a step closer to holistic medicine!


1. Karches KE. Against the iDoctor: why artificial intelligence should not replace physician judgment. Theor Med Bioeth. Published online April 2018:91-110. doi:10.1007/s11017-018-9442-3
2. Collins FS. The Human Genome Project: Lessons from Large-Scale Biology. Science. Published online April 11, 2003:286-290. doi:10.1126/science.1084564
3. Chen R, Snyder M. Promise of personalized omics to precision medicine. WIREs Syst Biol Med. Published online November 26, 2012:73-82. doi:10.1002/wsbm.1198
4. Boyd S, Galli S, Schrijver I, Zehnder J, Ashley E, Merker J. A Balanced Look at the Implications of Genomic (and Other “Omics”) Testing for Disease Diagnosis and Clinical Care. Genes. Published online September 1, 2014:748-766. doi:10.3390/genes5030748
5. Chen M, Herrera F, Hwang K. Cognitive Computing: Architecture, Technologies and Intelligent Applications. IEEE Access. Published online 2018:19774-19783. doi:10.1109/access.2018.2791469
6. Qian T, Zhu S, Hoshida Y. Use of big data in drug development for precision medicine: an update. Expert Review of Precision Medicine and Drug Development. Published online May 4, 2019:189-200. doi:10.1080/23808993.2019.1617632
Cite this article as: Eapen BR. (July 7, 2021). AI will never replace the doctor. Or will it? Retrieved July 29, 2021, from
Information Systems OpenSource


OSCAR (Open Source Clinical Application and Resource) EMR is a web-based electronic medical record (EMR) system initially developed for primary care clinics in Canada. OSCAR is a Java Spring-based web application with a relatively old codebase. OSCAR is widely used in Ontario and British Columbia and is supported by many OSCAR service providers.

Fast Healthcare Interoperability Resources (FHIR) is an HL7 standard describing data schemas and a RESTful API for health information exchange. FHIR is fast emerging as the de facto standard for interoperability between health information systems because of its simplicity and its use of existing web standards such as REST.
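To make the RESTful part concrete: in FHIR, reading a single resource is an HTTP GET on [base]/[type]/[id], answered with the resource as JSON. The Python sketch below only builds such a request with the standard library; the server base URL https://fhir.example.org/baseR4 and the helper names are hypothetical, and actually sending the request needs a reachable FHIR server.

```python
import json
import urllib.request


def build_fhir_read(base_url: str, resource_type: str, resource_id: str) -> urllib.request.Request:
    """Build a FHIR 'read' interaction: GET [base]/[type]/[id]."""
    url = f"{base_url.rstrip('/')}/{resource_type}/{resource_id}"
    # application/fhir+json is the FHIR JSON media type.
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/fhir+json"})


def read_resource(request: urllib.request.Request) -> dict:
    """Send the request and parse the returned resource (requires a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, read_resource(build_fhir_read("https://fhir.example.org/baseR4", "Patient", "123")) would return that Patient as a Python dict, assuming such a server and resource exist.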

Because OSCAR was designed primarily for primary care clinics, it does not support interoperability with other systems out of the box, and it does not implement FHIR in its entirety. A partial FHIR implementation that supports the immunization data flow as FHIR bundles is available. One request that constantly pops up in the OSCAR community is a full FHIR API implementation for OSCAR.

We had some initial discussions on how to go about implementing a FHIR API for OSCAR EMR. A FHIR API is a REST API that exposes FHIR resources such as Patient, Observation and CarePlan as JSON. The HAPI-FHIR Java library defines all the FHIR resources and the associated functions. The first step in building the API is to map the relatively messy OSCAR data model to FHIR resources. The Patient resource has been mapped and is available in the OSCAR repository. This mapping (/src/main/java/org/oscarehr/integration/fhir/model/) can be used as a template for mapping other required resources.
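As a rough illustration of what such a mapping involves, the sketch below turns one OSCAR-style demographic record into a FHIR R4 Patient resource in plain Python. The real mapping in the OSCAR repository is written in Java with HAPI-FHIR; the column names used here (demographic_no, first_name, last_name, sex, year_of_birth, month_of_birth, date_of_birth) are assumptions about the OSCAR demographic table and should be checked against the actual schema.

```python
# Hypothetical mapping from single-letter sex codes to FHIR administrative gender.
GENDER_MAP = {"M": "male", "F": "female", "O": "other"}


def demographic_to_patient(row: dict) -> dict:
    """Map one OSCAR-style demographic record (assumed column names) to a FHIR R4 Patient."""
    patient = {
        "resourceType": "Patient",
        "id": str(row["demographic_no"]),
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        # FHIR gender codes are male | female | other | unknown.
        "gender": GENDER_MAP.get((row.get("sex") or "").upper(), "unknown"),
    }
    year, month, day = row.get("year_of_birth"), row.get("month_of_birth"), row.get("date_of_birth")
    if year and month and day:
        # FHIR dates use the ISO form YYYY-MM-DD.
        patient["birthDate"] = f"{int(year):04d}-{int(month):02d}-{int(day):02d}"
    return patient
```

Missing or unrecognized values fall back to "unknown" gender and an omitted birthDate, which FHIR permits since both elements are optional.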

The next step is to extend the currently available REST API to expose FHIR APIs after authentication. If you have ideas, expertise or interest in this, please comment below.
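The idea of serving FHIR only to authenticated callers can be sketched with Python's WSGI interface. This is not OSCAR code (OSCAR is a Java application), and the in-memory patient store and bearer-token check are hypothetical placeholders; the sketch only shows the request flow: reject unauthenticated calls with an OperationOutcome, then route GET /fhir/Patient/<id> to a stored resource.

```python
import json

# Hypothetical stand-ins for OSCAR's database and authentication layer.
PATIENTS = {"42": {"resourceType": "Patient", "id": "42"}}
VALID_TOKENS = {"demo-token"}


def fhir_headers():
    """Fresh header list per response (WSGI servers may append to it)."""
    return [("Content-Type", "application/fhir+json")]


def outcome(code):
    """Minimal FHIR OperationOutcome body for error responses."""
    body = {"resourceType": "OperationOutcome", "issue": [{"severity": "error", "code": code}]}
    return [json.dumps(body).encode("utf-8")]


def fhir_app(environ, start_response):
    """WSGI app serving GET /fhir/Patient/<id> to callers with a valid bearer token."""
    auth = environ.get("HTTP_AUTHORIZATION", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else None
    if token not in VALID_TOKENS:
        start_response("401 Unauthorized", fhir_headers())
        return outcome("login")
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if (environ.get("REQUEST_METHOD") == "GET" and len(parts) == 3
            and parts[:2] == ["fhir", "Patient"] and parts[2] in PATIENTS):
        start_response("200 OK", fhir_headers())
        return [json.dumps(PATIENTS[parts[2]]).encode("utf-8")]
    start_response("404 Not Found", fhir_headers())
    return outcome("not-found")
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, fhir_app).serve_forever() from the standard library is enough. A real implementation would rely on OSCAR's own authentication rather than a fixed token set.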


Public Health Data Warehouse on FHIR

The Ontario government is building a connected health care system centred around patients, families and caregivers through the newly established Ontario Health Teams (OHTs). As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off-the-shelf solutions are pricey, while open-source solutions such as DHIS2 are not popular in Canada. Some public health units have existing systems, and switching to another system would be too resource-intensive. The interoperability challenge needs an innovative solution that goes beyond finding a single provincial EMR.


We have written about the theoretical aspects, especially the need to envision public health information systems (PHIS) as separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams.

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution: a scalable data warehouse for integrating health information systems. Public health databases are vital to the community for efficient planning, surveillance and effective interventions, and public health data need to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage, with an integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance, ranging from machine learning methods and hidden Markov models to Bayesian and multivariate analytics. PHIS-DW is a work in progress, and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications.
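One way FHIR resources can land in Elasticsearch is through its _bulk API, which takes newline-delimited JSON: an action line followed by the document itself. The sketch below builds such a body; the one-index-per-resource-type naming (fhir-observation, fhir-patient and so on) is an assumption for illustration, not necessarily how PHIS-DW organizes its indices.

```python
import json


def to_bulk_ndjson(resources, index_prefix="fhir"):
    """Serialize FHIR resources into an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for res in resources:
        # Elasticsearch index names must be lower case.
        index = f"{index_prefix}-{res['resourceType'].lower()}"
        lines.append(json.dumps({"index": {"_index": index, "_id": res["id"]}}))
        lines.append(json.dumps(res))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The returned string would be POSTed to the cluster's /_bulk endpoint with the Content-Type application/x-ndjson; using the FHIR id as _id means re-sending a resource overwrites rather than duplicates it.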

Public Health Data Warehouse Framework on FHIR

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, FHIR bundles make it possible to send observations to FHIR servers without the associated Patient resource, which ensures reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID-19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 are under consideration. If any of the OHTs find our approach interesting, give us a shout.
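A minimal sketch of the privacy point, assuming FHIR R4: the Observation below carries no subject element, so no Patient reference travels with it, and a batch Bundle asks the server to create each entry with a POST. The code and value are free-text placeholders; a real surveillance feed would more likely use coded terminologies (for example LOINC), and the helper names are hypothetical.

```python
def observation(code_text, value_text, effective):
    """FHIR R4 Observation with no 'subject', so it does not point to any Patient."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": code_text},
        "valueCodeableConcept": {"text": value_text},
        "effectiveDateTime": effective,
    }


def batch_bundle(resources):
    """Wrap resources in a 'batch' Bundle; each entry asks the server to POST (create) it."""
    return {
        "resourceType": "Bundle",
        "type": "batch",
        "entry": [
            {"resource": res, "request": {"method": "POST", "url": res["resourceType"]}}
            for res in resources
        ],
    }
```

Serialized with json.dumps, the bundle would be POSTed to the FHIR server's base URL.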

BTW, have you seen Drishti, our framework for FHIR-based behavioural interventions?

Machine Learning

Machine Learning in population health: Creating conditions that ensure good health.

Machine Learning (ML) in healthcare has an affinity for patient-centred care and individual-level predictions. Population health deals with health outcomes in a group of individuals and the distribution of those outcomes within the group. Individual health and population health are not divergent, but neither are they the same, and they may require different approaches. ML in public health applications receives far less attention.

Public health organizations have limited skills available to transition towards integrated data analytics. Hence, the latest advances in ML and artificial intelligence (AI) have made very little impact on public health analytics and decision making. The biggest barrier is the lack of expertise in conceiving and implementing data warehouse systems for public health that can integrate the health information systems currently in use.

The data in public health organizations are generally scattered across disparate information systems within the region, or even within the same organization. Efficient and effective health data warehousing requires a common data model for integrated data analytics. The OHDSI – OMOP Common Data Model allows for the systematic analysis of disparate observational databases and EMRs. However, its emphasis is on patient-level prediction. Research on how to move from patient-centred data models to observation-centred population health data models is the need of the hour.
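The shift from patient-centred to observation-centred data can be pictured with a toy example: patient records holding nested diagnoses are flattened into one observation row each and then aggregated by region. The record layout and field names are hypothetical, and this is a deliberate simplification, not the OMOP Common Data Model.

```python
from collections import Counter


def flatten_to_observations(patients):
    """Turn patient-centred records into one row per (region, condition) observation."""
    for patient in patients:
        for condition in patient["conditions"]:
            yield {"region": patient["region"], "condition": condition}


def population_counts(observations):
    """Aggregate observation rows into counts per (region, condition)."""
    return Counter((obs["region"], obs["condition"]) for obs in observations)
```

Once data are in observation form, population-level questions (how many cases per region) become simple aggregations, and the patient identity is no longer needed for them.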

We are making a difficult yet important transition towards integrated health by providing new ways for local health teams to deliver services in local communities. The emphasis is clearly on digital health, and we need efficient and effective digital tools and techniques. Motivated by the Ontario Health Teams’ digital strategy, I have been working on tools to support this transition.

Hephaestus is a software tool for ETL (Extract-Transform-Load) from open-source EMR systems such as OSCAR EMR and national datasets such as the Discharge Abstract Database (DAD). It is organized into modules to allow code reuse. Hephaestus uses SQLAlchemy for database connections and for automapping tables to classes, and Bonobo for managing ETL. Hephaestus aims to support common machine learning workflows such as model building with Apache Spark and model deployment using a serverless architecture. I am also working on FHIR-based standards for ML model deployments.
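Hephaestus itself relies on SQLAlchemy and Bonobo; to keep the example dependency-free, the sketch below uses only Python's sqlite3 module to show the extract, transform and load steps chained as generators. The table names, columns and the 7-day long-stay cut-off are hypothetical and are not part of Hephaestus.

```python
import sqlite3


def extract(conn):
    """Extract: stream rows from a hypothetical source table."""
    yield from conn.execute("SELECT id, sex, los_days FROM source_visits")


def transform(rows):
    """Transform: tidy the sex code and derive a long-stay flag (7-day cut-off is arbitrary)."""
    for visit_id, sex, los_days in rows:
        yield visit_id, sex.strip().upper(), int(los_days >= 7)


def load(conn, rows):
    """Load: materialise the rows, then write them to the target table."""
    conn.executemany("INSERT INTO visits_clean VALUES (?, ?, ?)", list(rows))
    conn.commit()


def run_pipeline(conn):
    """Chain the three steps, in the spirit of a Bonobo graph."""
    load(conn, transform(extract(conn)))
```

Keeping each step as a small function is what makes the modules reusable: a different EMR only needs a new extract step, while transform and load stay the same.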

Hephaestus is a work in progress and any help will be highly appreciated. Hephaestus is an open-source project on GitHub. If you are looking for an open-source project to contribute to during Hacktoberfest, consider Hephaestus!

Machine Learning

Creating, serializing and deploying a machine learning model for healthcare: Part 2

This is a series on serializing and deploying machine learning pipelines developed using PySpark. Part 1 is here. This is specific to Apache Spark and is basically a set of notes to myself.

We will be using MLeap to serialize the model. Below is a brief introduction to MLeap, copied from its website. For more information, please visit the MLeap website.

MLeap is a common serialization format and execution engine for machine learning pipelines. It supports Spark, Scikit-learn and Tensorflow for training pipelines and exporting them to an MLeap Bundle. Serialized pipelines (bundles) can be deserialized back into Spark for batch-mode scoring or the MLeap runtime to power realtime API services.

This series is about serializing and deploying. If you are interested in model building, Susan’s article here is an excellent resource.

In Part 1, we imported the dependencies. The next step is to initialize Spark and import the data.

In the above code, you have to set the Spark home and the path to the DAD CSV file. You can, of course, name your app whatever you like. The MLeap packages are loaded in the Spark session.
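A minimal PySpark sketch of this step, under stated assumptions: the app name is arbitrary, the MLeap package coordinates are passed in by the caller (pick the version that matches your Spark and Scala build; none is given here), and pyspark is imported inside the hypothetical helper so the module can be loaded without Spark installed.

```python
def start_spark_and_load_dad(dad_csv_path, app_name="dad-los-demo", spark_packages=None):
    """Create (or reuse) a SparkSession and read the DAD CSV file into a DataFrame."""
    # Imported here so this file can be loaded on machines without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if spark_packages:
        # Comma-separated Maven coordinates, e.g. the MLeap Spark package that
        # matches your Spark/Scala version (deliberately not hard-coded here).
        builder = builder.config("spark.jars.packages", spark_packages)
    spark = builder.getOrCreate()
    # header=True takes column names from the first row; inferSchema=True guesses column types.
    dad = spark.read.csv(dad_csv_path, header=True, inferSchema=True)
    return spark, dad
```

Note that spark.jars.packages only takes effect when the session's JVM is first started, so it has no effect if a session already exists in the same process.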

To keep it simple, we are going to create a logistic regression model. The required variables are selected:

TLOS_CAT (total length of stay) is the dependent variable (DV), and the rest are independent variables (IVs). Please note that the choice of variables may not be ideal, but that is not our focus.

Now, recode TLOS_CAT to binary as we are going to build a logistic regression model.
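The variable selection and recoding can be sketched in plain Python. The cut-off of 3 and the IV names AGE_GROUP and SEX are hypothetical; check the DAD data dictionary for how TLOS_CAT is actually coded. In PySpark the same rule would typically be a column expression such as F.when(F.col("TLOS_CAT") > 3, 1).otherwise(0), with F being pyspark.sql.functions.

```python
TLOS_CUTOFF = 3  # hypothetical cut-off; check TLOS_CAT coding in the DAD data dictionary


def recode_tlos(tlos_cat, cutoff=TLOS_CUTOFF):
    """Return 1 for a long stay (category above the cut-off), otherwise 0."""
    return 1 if int(tlos_cat) > cutoff else 0


def select_and_recode(rows, ivs, dv="TLOS_CAT"):
    """Keep only the chosen IV columns and add a binary version of the DV."""
    for row in rows:
        selected = {name: row[name] for name in ivs}
        selected[f"{dv}_BIN"] = recode_tlos(row[dv])
        yield selected
```

The binary column is what a logistic regression expects as its label; the original TLOS_CAT is dropped so it cannot leak into the features.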

We will create and serialize the pipeline next week. I promised to deploy using Java 11 and Spring Boot 2.1. Java 11 was released on Sept 25, and I feel it can have a huge impact on Java-based EMRs like OSCAR and OpenMRS. More about that story soon on the NuChange Blog!

Machine Learning

Creating, serializing and deploying a machine learning model for healthcare: Part 1

Machine Learning (ML) and Artificial Intelligence (AI) have been the buzzwords lately, and it is heartening to see local health service providers (HSPs) scrambling to get on the bandwagon. The emphasis is mostly on creating models, which requires technical as well as clinical expertise. The quintessential ‘blackbox’ model is a good healthcare analytics exercise, but deploying the model so that it is useful at the bedside belongs to the IT domain.

This article is about creating a simple model using the Discharge Abstract Database (DAD) as the data source and Apache Spark as the framework, serializing it into a format that can be used externally, and building a simple website that deploys the model so that users can make predictions. To make this interesting, we will create the website using Java 11 and Spring Boot 2.1, which are yet to be released at the time of writing. Both will be released by the time we get there. But please note that this is about deploying a model/pipeline created with Spark (which may be overkill for most projects). Here are some good resources if you have small data or a simple model.

This post is actually a note to myself as I explore the process. As always the focus is on understanding the process and not on the utility of the model. Feel free to comment below and add your own notes/ideas.

TL;DR the code will be available on our GitHub repository as we progress.


First, let us start with a brief description of Apache Spark. Apache Spark is an open-source big-data framework with built-in cluster computing ability. Spark is highly accessible and offers simple APIs in Python, Java, Scala and R. I have picked Python, as I can use the Python interpreter at CC right from the PyCharm IDE. PySpark is the Python library for interacting with Spark, and it can be added to sys.path at runtime using the findspark library. Most machine learning pipelines are available in PySpark. We will be building a simple logistic regression model. The necessary libraries can be imported as below.
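A sketch of the import step, assuming findspark and a local Spark installation: findspark.init adds PySpark to sys.path (with no argument it tries to locate Spark itself, for example via the SPARK_HOME environment variable), after which the logistic regression building blocks can be imported from pyspark.ml. The imports sit inside a hypothetical helper so they only run when it is called.

```python
def import_spark_ml(spark_home=None):
    """Put PySpark on sys.path with findspark, then import the logistic regression pipeline pieces."""
    import findspark  # third-party helper (pip install findspark)

    # With spark_home=None, findspark looks for Spark on its own (e.g. via SPARK_HOME).
    findspark.init(spark_home)

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    return SparkSession, Pipeline, VectorAssembler, LogisticRegression
```

VectorAssembler combines the IV columns into the single feature vector that LogisticRegression expects, and Pipeline chains the two so they can later be serialized together.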

I will be back again with more next week. In the meantime have a look at DAD and the data dictionary. As always the customary disclaimer below:

Read Part 2.

Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.


Are you ready to ‘Git’ into Open Source?

Open Source health information systems provide cost-effective tools for healthcare. Even if you are not a coder, you may be able to contribute to open source projects. As a matter of fact, some open-source projects find it difficult to get volunteers to document and test the code. E-Health enthusiasts from the clinical and management fields often want to contribute to popular open source projects, but do not know how. 

Open source projects involve a collaboration of people with various skills, often with no way of physically meeting each other. In a complex software product, even a misplaced comma can break the system. How do open source projects collaborate effectively while avoiding such code-breaking mistakes? Well, they use some specialized tools and workflows to manage code, many of which are not familiar to non-programmers. In the next few posts, I will introduce you to the most important tool that coders use: the version control system. We shall discuss Git (the most popular version control system) from a non-programmer’s perspective.

This is not for those who are familiar with Git, and we will not be discussing advanced Git usage. So let me state the assumptions I am making about you as the reader. You have not heard of Git before. You are as scared of code as you are of pythons. When you hear Java, the first thing that comes to your mind is the island in Indonesia. You don’t know what ‘typing on the command line’ means. But you own a computer, know how to download and install programs, know how to navigate the web, want to learn more about contributing to open-source projects and, above all, want to help save lives, especially in resource-deprived areas. Watch the video below for inspiration.

At the end of this journey, you will know how to follow open-source projects and make minor code contributions. This might initiate you into learning computer programming, but that is not my intention. You might even win a free T-shirt from DigitalOcean. If you are ready to jump right in, follow the steps here.


Health Research Methodology Information Systems

Grounded Theory – QRMine: Qualitative Research support tools in Python.

Grounded theory (GT) emerged as a research methodology from medical sociology following the seminal work of Barney Glaser and Anselm Strauss. However, the two later developed different views on their original contribution, and their supporters went on to establish a classical Glaserian GT and a pragmatic Straussian GT. Constant comparison is central to Classical Grounded Theory: it involves incident-to-incident comparison for identifying categories, incident-to-category comparison for refining the categories, and category-to-category comparison for the emergence of the theory.

Glaser’s Classical GT (1) provides guidelines for evaluating GT research. The evaluation should be based on whether the theory fits the data, whether the theory is understandable to non-professionals, whether the theory is generalizable to other situations, and whether the theory offers control over the structure and processes.

Strauss and Corbin (2) recommended a strict coding structure, elaborating on how to code and structure data. Their seminal article describes three stages of coding: open coding, axial coding and selective coding. Classical Grounded Theory offers more flexibility than Straussian GT, while the latter may be easier to conduct, especially for new researchers.

Open coding is the first step, in which data are broken down analytically and conceptually similar chunks are grouped together under categories and subcategories. Once the differences between the categories are established, the properties and dimensions of each are dissected. Coding in GT may be overwhelming, and scaling up categories from open coding may be difficult. This can lead to the generation of low-level theories. With natural language processing (NLP), information systems can help young researchers make sense of the data they have collected during open coding. QRMine is a software suite that supports qualitative researchers using NLP. Gtdict is a module that identifies categories, properties and dimensions in an interview transcript.
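To give a feel for how software can assist with grouping chunks during open coding, here is a toy version of constant comparison in plain Python: each incident is compared with the existing categories by word overlap (Jaccard similarity) and either joins the most similar one or starts a new category. This is an illustration only; it is not how QRMine or Gtdict work, and the 0.2 threshold and word-length filter are arbitrary.

```python
def tokens(text):
    """Lower-cased words longer than three letters; a crude stand-in for real NLP preprocessing."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_incidents(incidents, threshold=0.2):
    """Compare each incident with existing categories; join the most similar one
    or open a new category when nothing is similar enough."""
    categories = []  # each category: {"terms": set of words, "incidents": list of texts}
    for incident in incidents:
        terms = tokens(incident)
        best, best_score = None, 0.0
        for category in categories:
            score = jaccard(terms, category["terms"])
            if score > best_score:
                best, best_score = category, score
        if best is not None and best_score >= threshold:
            best["incidents"].append(incident)
            best["terms"] |= terms  # the category is refined by each new incident
        else:
            categories.append({"terms": set(terms), "incidents": [incident]})
    return categories
```

A researcher would still name and refine the resulting groups; the code only proposes candidate groupings.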

QRMine is open source and is available here. Ideas, comments and pull requests are welcome.



1. Glaser BG. The Constant Comparative Method of Qualitative Analysis. Social Problems. 1965 Apr;12(4):436–45.
2. Corbin JM, Strauss A. Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology. 1990;13(1):3–21.


Oscar eForm Generator


Electronic capture of patient data is vital in any health information system. It’s hard to bundle every form that a clinician will ever need with an EMR. EMRs adopt various strategies to solve this problem, but a general standard is lacking.

eForms are OSCAR’s solution to this problem. OSCAR eForms are arguably one of OSCAR’s most useful features and are used in many settings beyond those they were initially designed for. Community eForms can be downloaded from the OSCAR Canada Users Society.

eForms are not an elegant solution, and creating complex eForms requires programming expertise. Reporting on data collected through eForms is difficult because of the way the data are abstracted as key-value pairs in the database.
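Reporting usually starts by pivoting those key-value pairs back into one row per submitted form. The sketch below assumes rows shaped like (fdid, var_name, var_value), roughly as in OSCAR's eform_values table; verify the actual table and column names against your OSCAR version.

```python
from collections import defaultdict


def pivot_eform_values(rows):
    """Turn (fdid, var_name, var_value) rows into one dict of fields per submitted form."""
    forms = defaultdict(dict)
    for fdid, var_name, var_value in rows:
        forms[fdid][var_name] = var_value
    return dict(forms)


def to_table(forms):
    """Flatten pivoted forms into a header plus rows, with '' for fields a form lacks."""
    columns = sorted({name for values in forms.values() for name in values})
    header = ["fdid"] + columns
    rows = [[fdid] + [values.get(col, "") for col in columns] for fdid, values in sorted(forms.items())]
    return header, rows
```

The header and rows can then be written out with the csv module or loaded into any reporting tool.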

OSCAR provides a basic built-in eForm generator that uses a form image as the background with controls transposed on top. However, it is not user-friendly and lacks the ability to save the work and continue later.


I have created an online OSCAR eForm generator that solves most of the above-mentioned problems. It is an advanced OSCAR eForm generator with drag-and-drop controls. You can save the form as a text file and continue editing later after loading the content. It also supports radio buttons by internally mapping them to OSCAR-supported code. You can pull OSCAR demographic fields and define complex show/hide rules. You can cut and paste the generated code into the OSCAR eForm editor. The form is generated in your browser using JavaScript and is not sent to or saved on our server.

Watch the video below to see how it is done. This is still being tested and is not ready for production. Contact us for more details. Please report bugs, feature requests and feedback.

The application is available here.


Up and running with activiti in 20 minutes


Activiti is a BPMN automation tool that makes communication between business analysts and developers easy. Activiti has a web-based graphical interface in which business analysts can prepare workflows that developers can then enhance by adding Java code using an Eclipse plugin. It has a lightweight engine that can be embedded in Java applications to deploy workflows, and an explorer for deploying process definitions online. Activiti also has a REST interface.

Activiti uses an in-memory database by default. Installation requires Java and a servlet container such as Tomcat. I have created a Puppet script to automate Activiti installation in a virtual machine. The script installs Activiti Explorer and the REST interface with a MySQL database. If you want to connect to an external database, you can make the necessary changes in the properties file within the code folder.

Installation instructions for Activiti

There are ways of creating a virtual Linux machine on your laptop (Mac and Windows). Virtualization leaves your operating system untouched, and the virtual machine can be removed without a trace after use. Without further ado, you can install Activiti in 5 easy steps using my Puppet script.

1. Install VirtualBox.
2. Install Vagrant.
3. Download and extract the zip file below to any folder.


4. Windows users double-click run.bat. Mac users run the following command from the download folder.

(The script takes approximately 10 minutes to set up the machine. No input is needed from you, but an internet connection is required.)
5. Access in your browser:

  • Access Activiti at http://localhost:8001/activiti-explorer
  • Access REST interface at http://localhost:8001/activiti-rest/services
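Once the machine is up, the REST interface can be called from any HTTP client. The sketch below only builds an authenticated request with Python's standard library; Activiti's REST API uses HTTP basic authentication, but the credentials are placeholders and the endpoint path repository/process-definitions is an assumption to check against the REST documentation for your Activiti version.

```python
import base64
import urllib.request


def activiti_get(base_url: str, path: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request to the Activiti REST interface using HTTP basic authentication."""
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{path.lstrip('/')}",
        headers={"Authorization": f"Basic {credentials}", "Accept": "application/json"},
    )
```

Passing the request to urllib.request.urlopen would return the JSON list of deployed process definitions, provided the base URL and path match your installation.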

To stop the machine on Windows, use stop.bat; on Mac, run:

You can restart the machine as in step 4 above. Restarting the machine does not require an internet connection.

If you want to destroy (uninstall) the virtual machine, use the following command in the script folder.

Feel free to fork and improve this script on GitHub. Pull requests are welcome. Join E-Health on GitHub if you want direct write access to the repository.
