Baxter Eaves

Components of safe Artificial Intelligence

What artificial intelligence needs — in addition to people — to be safe

Modern AI, on its own, is dangerous. It is brittle and difficult to understand, which makes it an unpredictable element. So then how can we use AI in safety-critical applications? Today, advanced AI systems like those in self-driving cars have even more advanced systems mitigating the damage they can do. Those systems are called people. The person behind the wheel of a self-driving car is responsible for harm done by that car; the physician is responsible for the outcome of any treatment given; the flag officer is responsible for the geopolitical ramifications of a military movement. This is why the AI industry has been OK with using black-box technology in these applications: the responsibility has been placed upon the user, who is likely the person who least understands the technology. Ethical discussion aside, this post will discuss what AI needs, other than reliance on people, to be safe.

Interpretability and explainability are insufficient

In recent years, there has been a great deal of focus on so-called "explainable" artificial intelligence (XAI). The goal of XAI is to develop approaches to explain black-box models to human decision makers. Opponents of XAI argue that, rather than expending resources developing wrappers around uninterpretable models, we should focus on developing more general, scalable interpretable models, because explaining an interpretable model is trivial. Certainly, understanding your AI is good, but both explanation and interpretation require time. Time for a person to observe an explanation, to formulate an interpretation, and to understand. What is happening in that time?

AI/ML exists because it is impossible for people to process and learn from massive data sets, and neither Interpretable nor Explainable AI solves this problem. They simply replace the problem of processing impossible amounts of data with the problem of scrutinizing impossible amounts of knowledge.

Machine knowledge must be interpretable, and the machine must be aware of, and curate, its knowledge because a person cannot.

Epistemic awareness

Safe AI must be aware of its knowledge, or epistemically aware. This means that it knows what it knows and can tell you when it does not. It can tell you under what conditions it is likely to succeed, and under what conditions it is likely to fail. Furthermore, it can tell you when things look weird (anomalies or data entry errors), and identify and characterize overarching world-level changes that affect its performance and understanding of the world.

In addition to increasing the robustness of an AI, a number of other safety features arise from epistemic awareness.

Learning from streams

Retraining means waiting. Waiting means things are happening that you do not know about and cannot react to. If you learn from data as they come in, you have instant information and can react to situations with the most up-to-date knowledge possible. Waiting a week for an updated model is unacceptable in health and defense applications, where things can change dramatically without warning.
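To make "learning from data as they come in" concrete, here is a minimal sketch of the simplest possible streaming learner: a running mean and variance updated one observation at a time with Welford's method. The class name `RunningStats` and the readings are made up for illustration; a real system would maintain a far richer model, but the principle is the same: nothing is retrained from scratch.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mean and variance updated one observation at a time (Welford's method)."""

    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; returned as NaN until two observations have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # illustrative stream
    stats.update(reading)  # the estimates are usable after every update
```

Each update costs constant time and memory, so the estimate is never staler than the latest datum.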

Uncertainty quantification

There are two types of uncertainty: uncertainty associated with the data and uncertainty associated with the model. The model may be very certain that a particular prediction has high variability simply because the data are highly variable; or it may be highly uncertain about a low variance prediction because it just cannot figure out what is going on. We need to know both of these things. If the natural variance of the data is high, we may need to search for a new supporting variable to explain that variance. For example, the height of any human is more variable than the height of any two-year-old human. On the other hand, if the model cannot figure out how to model a prediction, we need to hedge our bets or to ask the system what extra data it needs to do a better job, or to take some other intervention.
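One common way to separate the two kinds of uncertainty, sketched here under the assumption that the model is an ensemble whose members each return a predicted mean and variance for the same input, is to average the predicted variances (data uncertainty) and measure how much the predicted means disagree (model uncertainty). The function name and the numbers below are invented for illustration.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_predictions):
    """Split predictive uncertainty for one input into data and model parts.

    member_predictions: list of (mean, variance) pairs, one per ensemble member.
    """
    means = [mean for mean, _ in member_predictions]
    variances = [variance for _, variance in member_predictions]
    data_uncertainty = fmean(variances)   # noise the members expect in the data
    model_uncertainty = pvariance(means)  # disagreement between the members
    return data_uncertainty, model_uncertainty


# Members agree on the answer, but all of them expect noisy data.
certain_but_noisy = [(170.0, 100.0), (171.0, 95.0), (169.0, 105.0)]
# Members expect little noise, but cannot agree on the answer.
unsure_but_tight = [(80.0, 4.0), (95.0, 5.0), (110.0, 3.0)]

noisy_data, noisy_model = decompose_uncertainty(certain_but_noisy)
tight_data, tight_model = decompose_uncertainty(unsure_but_tight)
```

In the first case the data term dominates, which suggests looking for a new explanatory variable; in the second the model term dominates, which suggests hedging or collecting more data.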

Anomaly detection

If an AI has an intuitive model of the world it can know when a datum does not adhere to that model. This is anomaly detection. If the underlying model is nonsense, it cannot sensibly detect anomalies. Often 'outlier' and 'anomaly' are confused. An outlier is a datum that lies too far away from the average. The user must decide how far is too far. It also requires that the concept of 'farness' can be applied to the data. This becomes difficult with categorical data (like eye color), or with mixed variable types. Anomalousness is in the eye of the beholder, and if the AI beholds too differently from the user, there will be incompatibilities which lead to failures, distrust, and disuse.
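Distance has no meaning for eye color, but probability does. As a rough sketch, assuming nothing more than a smoothed frequency model (the class `CategoricalModel` and the counts are hypothetical), a datum can be scored by its surprisal, the negative log probability the model assigns to it. The same score applies to any variable type for which the model can produce a probability.

```python
import math


class CategoricalModel:
    """Frequency model over a fixed set of categories with additive smoothing."""

    def __init__(self, categories, pseudocount=1.0):
        self.counts = {category: 0 for category in categories}
        self.pseudocount = pseudocount

    def observe(self, value):
        # Raises KeyError for a category the model was never told about.
        self.counts[value] += 1

    def probability(self, value):
        total = sum(self.counts.values()) + self.pseudocount * len(self.counts)
        return (self.counts[value] + self.pseudocount) / total

    def surprisal(self, value):
        """Bits of surprise: higher means less expected under the model."""
        return -math.log2(self.probability(value))


model = CategoricalModel(["brown", "blue", "green"])
for color, count in [("brown", 70), ("blue", 25), ("green", 5)]:  # made-up data
    for _ in range(count):
        model.observe(color)

scores = {color: model.surprisal(color) for color in model.counts}
```

How much surprise counts as anomalous is still a judgment call, but the score now comes from the model's beliefs rather than from a notion of distance that the data may not support.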

Novelty detection

Novelty detection is a broader concept than anomaly detection. We think of an anomaly as a one-off observation, while novelty can be an event that changes many observations. For example, a new class arises: our model, which is trained to classify images of fruits, starts receiving images of vegetables. Or maybe the interaction between variables changes: the floor of the building in which we run an assay becomes important to prediction because of a mold issue. Or maybe the entire world goes crazy because of a global viral pandemic (a far-fetched scenario to be sure). How will your AI respond? It will break. Tragically.

A safe AI must recognize these situations. It must tell the user what changed, when the change occurred, and what effect the change has on its performance.
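As a toy illustration of reporting what changed and when, the sketch below looks back over a single numeric series and finds the one split point that best separates it into two levels. This is a deliberately simple stand-in (a least-squares change-point search); the function name `find_level_shift`, the `min_segment` parameter, and the data are all assumptions made for the example.

```python
from statistics import fmean


def sum_squared_error(xs):
    mean = fmean(xs)
    return sum((x - mean) ** 2 for x in xs)


def find_level_shift(series, min_segment=3):
    """Find the single split that best divides the series into two levels.

    Returns None if the series is too short or has no variation at all.
    """
    candidates = range(min_segment, len(series) - min_segment + 1)
    if not candidates:
        return None
    total = sum_squared_error(series)
    if total == 0:
        return None

    def split_cost(k):
        return sum_squared_error(series[:k]) + sum_squared_error(series[k:])

    k = min(candidates, key=split_cost)
    return {
        "index": k,                           # when: first observation after the change
        "mean_before": fmean(series[:k]),     # what changed: the level before...
        "mean_after": fmean(series[k:]),      # ...and the level after
        "error_reduction": 1 - split_cost(k) / total,  # how much the split explains
    }


readings = [10.1, 9.9, 10.0, 10.2, 9.8, 15.1, 14.9, 15.0, 15.2, 14.8]  # made up
shift = find_level_shift(readings)
```

The search always returns its best split, so `error_reduction` is what tells the user whether the split reflects a real change; a value near zero means the series is described about as well without it.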


Detecting when things go wrong is great, but detecting and reacting is better. Reaction can take many forms. Maybe the AI re-composes itself to handle world-breaking events, or perhaps it asks for help. For example, systems meant to track power-use behavior pre- and post-global-pandemic can probably be broken into two distinct parts. But it would be good for policy makers to know which model, pre- or post-pandemic, better describes behavior at a given time to help determine the mood of the public at large.
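The question of which model better describes behavior at a given time can be sketched as a comparison of likelihoods. The example below assumes each regime is summarized by a Gaussian with a known mean and standard deviation and that the regimes are equally likely a priori; the regime parameters and readings are invented for illustration.

```python
import math


def gaussian_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def regime_posterior(window, regimes):
    """Probability of each regime given recent observations (equal priors).

    regimes: dict mapping a regime name to its (mean, standard deviation).
    """
    log_likelihood = {
        name: sum(gaussian_logpdf(x, mean, sd) for x in window)
        for name, (mean, sd) in regimes.items()
    }
    # Subtract the largest value before exponentiating to avoid underflow.
    top = max(log_likelihood.values())
    weights = {name: math.exp(value - top) for name, value in log_likelihood.items()}
    normalizer = sum(weights.values())
    return {name: weight / normalizer for name, weight in weights.items()}


# Hypothetical daily household power use (kWh) under each regime.
regimes = {"pre_pandemic": (30.0, 3.0), "post_pandemic": (38.0, 4.0)}
recent_days = [37.0, 39.5, 36.0]
belief = regime_posterior(recent_days, regimes)
```

Tracking this belief over a sliding window gives a running answer to "which world are we in right now?" instead of a single before-and-after label.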

At a smaller scale, a system should be able to ask for information when it is unsure, so it must know which information it needs to improve itself. In this way the AI could direct the data collection process to optimize learning and knowledge. As a patient is admitted to the hospital, health care professionals (HCPs) run tests to ultimately diagnose and treat the patient. At each intervention performed by HCPs, the AI could show its belief about the diagnosis, its certainty, and recommend which interventions will help it make the most informed decision.
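One way to decide which information to ask for, sketched here under the assumption that the AI holds a probability for each candidate diagnosis and knows how likely each test is to come back positive under each diagnosis, is to pick the test with the largest expected reduction in entropy. All names and probabilities below are invented for illustration and carry no clinical meaning.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(prior, likelihood):
    """Bayes update; likelihood[d] = P(observed result | diagnosis d)."""
    unnormalized = {d: prior[d] * likelihood[d] for d in prior}
    normalizer = sum(unnormalized.values())
    return {d: value / normalizer for d, value in unnormalized.items()}


def expected_information_gain(prior, p_positive):
    """Expected drop in entropy (bits) from running one binary test.

    p_positive[d] = P(test comes back positive | diagnosis d).
    """
    p_negative = {d: 1 - p for d, p in p_positive.items()}
    gain = entropy(prior)
    for result_likelihood in (p_positive, p_negative):
        p_result = sum(prior[d] * result_likelihood[d] for d in prior)
        if p_result > 0:
            gain -= p_result * entropy(posterior(prior, result_likelihood))
    return gain


belief = {"flu": 0.5, "strep": 0.3, "mono": 0.2}  # current belief, made up
tests = {
    "rapid_strep": {"flu": 0.05, "strep": 0.90, "mono": 0.05},
    "temperature": {"flu": 0.80, "strep": 0.75, "mono": 0.70},
}
gains = {name: expected_information_gain(belief, tests[name]) for name in tests}
recommended = max(gains, key=gains.get)
```

Here the strep test is recommended because its result differs sharply across diagnoses, while a fever is nearly as likely under all three and so tells the system little.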

Conclusion: Ethically transferring responsibility to people

AI decision support safety is a two-part problem. AI with the above features will be safe in a production sense: it will provide vital knowledge quickly, it will be robust to odd data and events, and it will provide a means of self-defense and self-improvement. But this machine is part of a team. And it is not the key player; it is a humble advisor. So, the machine must not only learn safely, it must communicate safely. People have a host of social learning biases that help them to learn incredibly quickly, but that have caused a lot of problems when interacting with machines. In a future post, I'll discuss these issues.

Key Points

To be safe, an artificial intelligence must

  1. Store its knowledge in a way that is naturally human interpretable
  2. Learn from streams of data
  3. Be epistemically aware, that is, be aware of its own knowledge, which allows it to
    • Detect anomalous data/events
    • Identify knowledge gaps
    • Characterize its performance
    • Detect world-level changes to the data process
  4. Characterize and react to novelty