Trapped in a Black Box: AI Interpretability and Ethical Obligations to Patients

Is it ethical to use “black box” AI for medical purposes?

This essay was prepared as an entry for the 8th Annual Oxford Uehiro Prize in Practical Ethics.



“We’ve analyzed your results, and unfortunately, it’s malignant.” Emmanuella’s doctor mapped out the “best treatment plan,” but when she expressed personal concerns about it, she was disappointed by his brief response: her plan was based on the “most promising artificial intelligence (AI) technology.” Is that a sufficient explanation? While much of the previous literature discusses the scientific superiority, professional approval, and legal benefits of providing human-level interpretations from biomedical AI models, few discussions focus on patient welfare and rights. I therefore dedicate this space to investigating the following question: are uninterpretable (black-box) biomedical AI models ethical with respect to patients and their basic rights? After describing interpretable AI in the context of healthcare, I explore how it aligns with the deontological principles that predominate in medical ethics, then examine how other branches of practical ethics might weigh in.[1, 2]


Background: Interpretable v. Black-box AI in Medicine

AI is the ability of a computer to complete tasks that would require intelligence from a human, and machine learning (ML) is a subfield of AI in which a computer learns patterns from data to make predictions or decisions without explicit programming.[1] An ML model is “interpretable” when a human, such as the patient or physician, can understand why and how a given prediction or decision was made; perhaps certain genetic variants or comorbidities, for example, largely influenced a disease prediction. Recently, some ML methods have become vastly more complex in order to improve predictive ability on large datasets, but this often trades away interpretability. The lack of interpretability of these “black-box” models raises concerns for modern ML, including whether a model is doing exactly what it should. For example, suppose an image classifier learns to identify images of dogs not by the animal itself but by the dogs’ tendency to wear collars. The classifier might accurately identify the dog images, but it would not be executing the task correctly; it would make mistakes on images of other animals wearing collars or of collarless dogs. In high-stakes settings such as healthcare, such mistakes can be dangerous.

AI and ML are becoming an integral part of many healthcare processes. Some AI methods make diagnostic predictions, such as diagnosing skin cancer from images of skin lesions, as accurately as a healthcare provider,[2] and some, such as the IDx-DR method for identifying diabetic retinopathy, operate without human expertise.[3] Some treatment planning routines, such as radiotherapy planning, are now largely assisted by AI.[4] As AI-assisted healthcare becomes more of a reality, considering its ethics with regard to patient welfare is crucial.
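To make the distinction concrete, here is a minimal sketch of an interpretable model: a logistic regression trained by plain gradient descent on hypothetical data (the features and numbers are invented for illustration). Its fitted weights can be read off directly, so a human can see which feature drives a prediction:

```python
# Minimal sketch of an interpretable model: logistic regression whose
# fitted weights directly reveal which feature drives each prediction.
# All data below is hypothetical, purely for illustration.
import math

def predict(weights, bias, x):
    """Predicted probability of disease for feature vector x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, lr=0.5, epochs=2000):
    """Fit the model with simple per-sample gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            err = predict(weights, bias, x) - y
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

# Hypothetical records: feature 0 (say, a genetic variant) predicts the
# disease perfectly; feature 1 is unrelated noise.
data = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]
weights, bias = train(data, labels)
print(weights)  # weight 0 dominates: feature 0 drives the prediction
```

A black-box model, by contrast, might encode the same relationship across millions of parameters, with no analogous summary for a physician or patient to inspect.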

A Deontological Framework for Patient Rights

A common way to establish patient rights in Western medicine is through a deontological framework.[5] Deontology is the belief that there are moral duties one must fulfill.[6] Below, I address three of these major concepts, patient autonomy, beneficence, and justice, with respect to interpretable AI:

Patient Autonomy

Patient autonomy, a pillar of bioethics, is a patient’s right to make personal healthcare decisions based on their own reasons.[5] In the introductory scenario, if the doctor wished to honor Emmanuella’s autonomy, he would respond to her concerns by providing the information necessary for her to understand and accept the plan herself; in other words, he would recognize Emmanuella’s right to informed consent, a cornerstone of respect for patient autonomy.[7] When healthcare providers are required to use technology that provides insufficient explanation, they are prevented from understanding the AI-based recommendations themselves and cannot, subsequently, inform patients. Black-box models therefore inhibit the practice of informed consent, violating patient autonomy.

Admittedly, in many cultures, the patient-provider fiduciary relationship, which describes the distribution of decision-making control between patient and provider, is accepted to be unequal. The Illinois Supreme Court once stated, for instance, that the patient should have confidence and faith in the physician.[5] The foundation on which this paternalistic style of relationship stands, however, is the patient’s trust. One might wonder, then, how the physician can maintain this foundation of trust if (1) it is difficult to tell whether the technology responsible for their treatments or diagnoses is doing exactly what it should, and (2) decisions are based not solely on the physician’s knowledge but on a model the physician was told to trust. Unlike medical studies grounded in the natural sciences, which display evidence and guide the reader through the deduction process, black-box studies show only a general framework for the operations performed to generate predictions. Black-box medicine therefore requires a higher degree of trust than traditional medicine.


Beneficence

Lack of interpretability also compromises the patient’s entitlement to beneficence, the provider’s responsibility to act in the patient’s best interest.[5] When the reasons for a model’s decision or prediction are unclear, it is difficult to determine whether the outcome serves the patient’s best interest. Naturally, the physician can supplement the automated decision with their own knowledge, using their own expertise to make a final decision. Indeed, one may argue that most AI technologies in healthcare are supportive tools meant to guide the physician and check human error. While this is true, how should the situation be resolved when an automatically derived decision differs from what the physician would have decided alone? The provider is then put at a crossroads with no basis for making an informed decision, precisely because the model lacks interpretability.

Notably, beneficence and autonomy frequently conflict when the patient requests a treatment course different from the provider’s recommendation. In such cases, beneficence is further clarified as the provider’s obligation to act in the patient’s best interest as the patient understands those interests, defaulting back to informed consent.[7] Referring back to the Patient Autonomy section, it becomes evident that black-box models threaten beneficence in cases of patient-provider conflict.


Justice

Finally, justice, defined in the context of healthcare, means providing all patients with fair and equal treatment.[5] Racial and gender bias is a legitimate concern for ML-based healthcare, but interpretability can help expose it. Because available training data often consists primarily of white or cisgender male participants, some models will make predictions that prioritize cisgender men or white people over gender and racial minorities.[8] Additionally, conditions may manifest differently across races and gender identities. For example, the aforementioned skin cancer diagnosis model may frequently misdiagnose darker-skinned patients if it was trained solely on images of lighter-skinned patients.[9] Implicit model biases such as these challenge the principle of justice. When a model is interpretable, such bias is more easily identified and rectified; when a model lacks interpretability, racial or gender bias may go unnoticed. A patient from a minority group thus does not necessarily receive the justice they deserve, and the provider risks making an unjust call, especially when deciding which patients most urgently require scarce resources.
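The skewed-training-data problem can be sketched in a few lines. Here a toy nearest-neighbour classifier (all values hypothetical) is trained only on Group A, in which the condition presents near one feature value; in Group B the same condition presents near a different value, and every Group B case is missed:

```python
# Minimal sketch (hypothetical numbers): a 1-nearest-neighbour classifier
# trained on data from one demographic group only. Because the condition
# presents at a different feature value in the unrepresented group, every
# positive case from that group is misclassified.
def nearest_label(train_set, x):
    """Return the diagnosis of the closest training example."""
    return min(train_set, key=lambda item: abs(item[0] - x))[1]

# Training set: (feature value, diagnosis) pairs, all from Group A,
# where the condition appears around feature value 8.
train_set = [(2, "healthy"), (3, "healthy"), (8, "disease"), (9, "disease")]

# In Group B (absent from training), the condition appears near value 5.
group_b_cases = [5, 5.5, 4.8]
predictions = [nearest_label(train_set, x) for x in group_b_cases]
print(predictions)  # every true positive is labelled "healthy"
```

With an interpretable model, the reliance on a Group A-specific feature range could be spotted and corrected; inside a black box, the same failure can persist silently.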

The Consequentialist

From a consequentialist perspective, which holds that one should make choices resulting in the best net outcome for the most people,[10] the use of such models might still be justified despite the risk of bias. If a black-box model improves the overall outcome for a majority of patients, then using it achieves the best net outcome. Unfortunately, there is no way to measure “outcome” objectively in this situation; perhaps the minority of patients who were not helped, or were even hurt, by the model were cumulatively more affected than the majority. Furthermore, a deontologist would argue that there are situations in which it is morally wrong, and therefore unethical, to sacrifice some, especially racial and gender minorities, to improve the outcome for most.[6]

One might add that many healthcare providers harbor biases of their own. It is important to avoid the “two wrongs make a right” fallacy: existing biases do not license new ones. From a consequentialist perspective, however, using high-performance black-box models may actually improve the state of healthcare by automating important decisions on which the provider may hold some bias, thereby minimizing harm to patients and maximizing beneficence and justice. Then again, interpretable models could also help combat provider biases, since their inherent explanations can challenge the provider’s mindset. Finally, if black-box models significantly outperform other models, a consequentialist may argue that requiring interpretability hinders progress in medical research, thus limiting a provider’s ability to provide beneficence for patients. We must recall, however, that beneficence often relies on informed consent in cases of conflict with autonomy, and lack of interpretability, as previously mentioned, does not allow a provider to respect the patient’s right to informed consent.

The Egoist

Among the various forms of egoism, rational and ethical egoism are two normative positions which state, respectively, that one rationally ought to, or morally ought to, take the action that leads to the greatest self-benefit.[11] The immediate viewpoint both might take is that since black-box models often show the best performance, researchers ought to continue pursuing them to maximize professional advancement. The egoist perspective generally does not weigh patient welfare unless the perspective comes from patients themselves. Nearly everyone, however, including healthcare providers and researchers, becomes a patient at some point in life. Given that researchers, healthcare providers, and patients are not disjoint sets, both the ethical and the rational egoist could also argue that professionals ought to work toward interpretable AI models for medicine, simultaneously protecting their own patient rights and advancing their scientific careers.

Other Scientific Alternatives

Regarding confidence in a model’s ability to improve healthcare outcomes, black-box biomedical models are often externally validated on independent datasets, increasing confidence that their predictions are useful. External validation is important, but it is not a replacement for interpretability. When the training and validation sets are similar, the model is likely to perform well, but this does not show that it is doing what it is supposed to; it may only show that the two datasets are sufficiently similar to one another.

Alternatively, one might employ explainability methods on black-box models. In contrast to interpretability, which is typically understood as a built-in characteristic, explainability methods are post-hoc techniques that produce explanations of how various features or parameters of the model affected the outcome. Some may argue that applying explainability methods to a high-performance black-box model gives more reliable explanations than using an interpretable model with lower performance. Explainability critics, however, point out that many black-box explanations do not accurately reflect the computations of the original model; if they did, the model would be interpretable.[12] Whether explainability methods can be as reliable as inherent interpretability therefore remains controversial.
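One widely used post-hoc technique is permutation importance: treat the model as a black box, shuffle one feature at a time, and measure how much predictive accuracy drops. A minimal sketch, with a hypothetical stand-in playing the role of the opaque model:

```python
# Minimal sketch of a post-hoc explainability method: permutation
# importance. The model is queried as a black box; we measure how much
# accuracy falls when each feature column is shuffled. The "model" and
# data are hypothetical stand-ins for illustration.
import random

def black_box(x):
    # Stand-in for an opaque model: it secretly uses only feature 0.
    return 1 if x[0] > 0.5 else 0

def accuracy(model, data, labels):
    return sum(model(x) == y for x, y in zip(data, labels)) / len(labels)

random.seed(0)
data = [[random.random(), random.random()] for _ in range(200)]
labels = [black_box(x) for x in data]
baseline = accuracy(black_box, data, labels)  # 1.0 by construction

importances = []
for j in range(2):
    shuffled_col = [x[j] for x in data]
    random.shuffle(shuffled_col)  # break the feature-label association
    perturbed = [x[:j] + [v] + x[j + 1:] for x, v in zip(data, shuffled_col)]
    importances.append(baseline - accuracy(black_box, perturbed, labels))

print(importances)  # large drop for feature 0, no drop for feature 1
```

Note the critics’ point in action: the method reveals which feature the model relies on, but says nothing about how the model uses it, which is exactly the gap between explainability and inherent interpretability.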


Ultimately, healthcare systems exist to serve patients, so consideration of patient rights and welfare is fundamental to any ethical discussion of healthcare. With AI becoming more involved in medical processes, it is imperative to evaluate whether we can use such technologies ethically. Interpretability gives (1) patients sufficient explanations to protect their autonomy, (2) providers enough clarity to maximize beneficence for patients, and (3) researchers the ability to identify and target model bias to enforce justice. Black-box models do not afford enough human-level understanding to preserve the basic deontological principles underlying patient rights in a multitude of healthcare systems, and they should therefore be avoided for medical purposes. In reality, there will likely be situations in which no interpretable alternative performs as well as a black-box model. In these scenarios, the provider may consider the black-box model the best option for the patient, reintroducing the beneficence-autonomy conflict. As before, the provider should explain the options and reasoning to allow the patient informed consent: do they wish to trust the black box?

  1. Stonier, T. (1992). The Evolution of Machine Intelligence. In Beyond Information: The Natural History of Intelligence (pp. 107–133). London: Springer.
  2. Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., & Thrun, S. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639), 115–118.
  3. Savoy, M. (2020). IDx-DR for Diabetic Retinopathy Screening. American Family Physician, 101(5), 307–308.
  4. Wang, C., Zhu, X., Hong, J. C., & Zheng, D. (2019). Artificial intelligence in radiotherapy treatment planning: Present and future. Technology in Cancer Research & Treatment, 18, 1533033819873922.
  5. Olejarczyk, Jacob P., & Young, Michael. (2021). Patient Rights and Ethics. In StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing.
  6. Alexander, L., & Moore, M. (2007). Deontological Ethics. Stanford Encyclopedia of Philosophy.
  7. Eyal, N. (2011). Informed Consent. Stanford Encyclopedia of Philosophy.
  8. Tat, E., Bhatt, D. L., & Rabbat, M. G. (2020). Addressing bias: Artificial intelligence in cardiovascular medicine. The Lancet Digital Health, 2(12), e635–e636.
  9. Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D., & Tzovara, A. (2021). Addressing bias in big data and AI for health care: A call for open science. Patterns, 2(10), 100347.
  10. Sinnott-Armstrong, W. (2003). Consequentialism. Stanford Encyclopedia of Philosophy.
  11. Shaver, R. (2002). Egoism. Stanford Encyclopedia of Philosophy.
  12. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215.


