PG Madhavan
10 min read · Oct 7, 2021


Causality & Counterfactuals — Role in IoT Digital Twin

Prologue:

For the past six months, I was singularly focused on Causality and IoT; this note is the outcome. It includes propositions about (1) causality, (2) its role in IoT, (3) algorithms for causal structure and causal factor estimation, (4) how to use causal graph simulation for counterfactual experiments, (5) what is a digital twin and (6) why causal graph is an ideal data-driven digital twin.

Preliminaries

My recent foray into Causality and IoT has convinced me that an explanatory article would help crystallize the value of Causality methods for the IoT industry.

There are two threads here — (1) Causality & Counterfactuals which are less known and (2) IoT Digital Twin where a lot is known but there is confusion. I will lay out the state-of-the-art in both areas and then bring them together. I will state them in terms of a series of (logical) propositions that explain each idea briefly.

To reduce to propositions, I will need to confine myself to a fairly narrow usage scenario — both topics are vast and Causality is shrouded in confusing philosophical underpinnings (for us lay people). Hence, I will confine my words to a specific IoT use case in this note. It should be obvious that this canonical use case applies to Manufacturing, Oil & Gas, Utilities, Building Management, Smart City, Consumer IoT and so on.

Context — electric vehicles

Here is our canonical use case. In a typical Electric Vehicle (EV), multiple end points are monitored via sensors attached to them. This multichannel data is the starting point of IoT deployments in many other contexts.

EVs are set to become ubiquitous and battery life is a challenge that is not going to be solved anytime soon (if we go by history). The aspiration of EV manufacturers, and indeed of every driver, is to maximize the driving range after a full battery charge.

Let us say that IoT sensors at strategic locations collect important information simultaneously in real time. A current draw sensor at the battery (A), a torque sensor at the motor (B), an RPM sensor at the wheels (C) and an ambient temperature sensor (D) are part of a typical IoT data collection regime implemented on an EV.

If our information extraction algorithm in the EV digital twin can estimate causal effects (which channels have cause-effect relationships and what the magnitudes of those effects are), one can hope to adjust the electro-mechanical system on the fly so that current draw is minimized without sacrificing driver performance expectations. What is more, this can be done on an individual EV basis since driving style, road and weather conditions are different for different drivers at different times and places. One can hope to have an IoT system that maximizes driving range on a car-by-car basis. For all I know, Tesla is already doing this . . . 😊

Keep this EV example in mind as we discuss Causality & Counterfactuals and the roles they will play in IoT Digital Twins.

Causality & counterfactuals

While causality can be a confusing term to define, we will adopt a simple operational/engineering characterization.

Causality: “X causes Y” means that changing X alone changes Y

There may be another “A” that also changes “Y”, but if you hold everything constant except “X” and a change in X changes Y, we say that “X causes Y”. Should Y change immediately? Can the effect come later? How much later will we allow? We will allow both instantaneous changes and delayed changes (with the delay guided by time series analysis). Much of the traditional Causality literature considers instantaneous or “STRUCTURAL” causality. IoT data are (almost always) multichannel time series and hence we will allow “LAGGED” causality also. Another type of Causality is “Granger” causality, where X is said to cause Y if past values of X help predict (portions of) Y beyond what the past of Y alone can predict.
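To make the Granger idea concrete, here is a minimal Python sketch using only the standard library. It is not the estimation algorithm behind this note: the synthetic data, the single lag of 1 and the helper names (rss_one, rss_two, granger_gain) are all assumptions made for illustration. It asks how much the one-step prediction error of one channel drops when the past of another channel is added as a predictor.

```python
import random


def rss_one(target, r1):
    """Residual sum of squares of the fit target ~ b * r1 (no intercept)."""
    b = sum(t * a for t, a in zip(target, r1)) / sum(a * a for a in r1)
    return sum((t - b * a) ** 2 for t, a in zip(target, r1))


def rss_two(target, r1, r2):
    """Residual sum of squares of target ~ b1 * r1 + b2 * r2 (2x2 least squares)."""
    s11 = sum(a * a for a in r1)
    s22 = sum(b * b for b in r2)
    s12 = sum(a * b for a, b in zip(r1, r2))
    s1t = sum(a * t for a, t in zip(r1, target))
    s2t = sum(b * t for b, t in zip(r2, target))
    det = s11 * s22 - s12 * s12
    b1 = (s1t * s22 - s2t * s12) / det
    b2 = (s2t * s11 - s1t * s12) / det
    return sum((t - b1 * a - b2 * b) ** 2 for t, a, b in zip(target, r1, r2))


def granger_gain(cause, effect):
    """Fractional drop in prediction error of `effect` when lag-1 `cause` is added."""
    now, own_past, other_past = effect[1:], effect[:-1], cause[:-1]
    restricted = rss_one(now, own_past)          # effect predicted from its own past
    full = rss_two(now, own_past, other_past)    # ... plus the past of the candidate cause
    return 1.0 - full / restricted


# Synthetic, made-up data: x drives y with a one-sample delay, not the reverse.
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0]
for t in range(1, len(x)):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0.0, 0.1))

gain_x_to_y = granger_gain(x, y)
gain_y_to_x = granger_gain(y, x)
print(f"x -> y gain: {gain_x_to_y:.3f}, y -> x gain: {gain_y_to_x:.3f}")
```

In this toy data the past of x sharply improves the prediction of y, while the past of y adds next to nothing to the prediction of x. Keep in mind that such a check only measures predictive value; it does not by itself hold "everything else constant".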

“Correlation is not causation”. Even though we often say X is correlated to Y, that is an incorrect characterization; “X and Y are correlated” is the proper statement, since correlation does not have a direction. Causation, on the other hand, is directional: (1) X causes Y, (2) Y causes X, or (3) no cause-effect relationship. This multiplicity is one reason why Causal structure determination and estimation is hard.
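The lack of direction in correlation is easy to check numerically. The short Python sketch below uses made-up torque and battery draw readings (illustrative values, not measured data) and a hand-written Pearson correlation; swapping the two arguments gives the same number, so the coefficient cannot tell us which channel drives which.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


# Made-up readings for illustration only.
torque = [10.0, 12.0, 15.0, 11.0, 18.0]
draw = [21.0, 25.0, 31.0, 22.0, 37.0]

r_torque_draw = pearson(torque, draw)
r_draw_torque = pearson(draw, torque)
print(r_torque_draw == r_draw_torque)  # the two orderings give the same value
```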

Causality can be abstracted as a Directed (Acyclic) Graph, or “DAG”. A Causal Graph will have directional arrows (links) or no arrows between its nodes. Link weights are the Causal Factors: X causes “how much” of Y. Self-cycles are verboten, understandably, since X cannot cause itself. The acyclic constraint applies to Structural Causality; it has emerged mainly due to the history of Causality in areas such as Social Science and Health Science, where X causing Y and then, instantaneously, Y causing X in turn seems implausible in most social or medical interventions.

We conclude that causal graphs in IoT applications are Directed Graphs or “DGs” with no self-cycles. Here is an example.

Figure 1. Typical causal structure and link weights (for illustration purposes only!)

The causal graph above is a made-up one (NOT estimated from data) to explain its uses. We see the four EV IoT data channel labels on the right-hand side. They are replicated (but not shown) on each of the four additional vertical lines from right to left.

Figure 1 may not look very familiar to IoT practitioners. All it shows are cause-effect connections (if any) among RPM, Torque, Battery Draw and Ambient Temperature. In a typical IoT dashboard, we may see these four time series plotted (or displayed as a heatmap, etc.). The Causal Graph above is a NEW visualization for IoT.

Figure 1 shows both the instantaneous (structural) cause-effect relationships, between the two right-most vertical lines, and the relationships among their lagged components. The (T-d) label on top of each vertical line shows the lag, d.

There are many “within” and “across” channel cause-effect relationships which, taken together, are displayed as a Causal Graph; we call it a “Fence” graph for obvious reasons.
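One way to picture a Fence Graph in software is as a set of weighted, directed links between channels at given lags. The Python sketch below is only an assumed representation for illustration (the FenceGraph class, its method names, the channel identifiers and the weights are all hypothetical and not taken from the technical reports). It allows “within” channel links across lags and “across” channel links, and rejects instantaneous self-cycles.

```python
from dataclasses import dataclass, field

# Channel names follow the EV example; the identifiers themselves are hypothetical.
CHANNELS = ("battery_draw", "torque", "rpm", "ambient_temp")


@dataclass
class FenceGraph:
    """Directed causal graph over channels and lags (illustrative only)."""

    # Maps (cause, effect, lag) -> causal factor (link weight).
    links: dict = field(default_factory=dict)

    def add_link(self, cause, effect, lag, weight):
        if cause not in CHANNELS or effect not in CHANNELS:
            raise ValueError("unknown channel")
        if lag < 0:
            raise ValueError("a cause cannot come after its effect")
        if lag == 0 and cause == effect:
            raise ValueError("self-cycles are not allowed")
        self.links[(cause, effect, lag)] = weight

    def causes_of(self, effect):
        """Return (cause, lag, weight) for every link pointing at `effect`."""
        return sorted(
            (c, lag, w) for (c, e, lag), w in self.links.items() if e == effect
        )


# Made-up links and weights, NOT estimated from data.
graph = FenceGraph()
graph.add_link("torque", "battery_draw", lag=0, weight=0.9)        # across, instantaneous
graph.add_link("ambient_temp", "battery_draw", lag=1, weight=0.2)  # across, lagged
graph.add_link("battery_draw", "battery_draw", lag=1, weight=0.5)  # within, lagged
drivers = graph.causes_of("battery_draw")
print(drivers)
```

Keying each link by (cause, effect, lag) keeps a lagged “within” channel link, such as battery draw at T-1 influencing battery draw at T, distinct from a forbidden instantaneous self-cycle.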

While the causal relationships and their strengths themselves are informative (for example, their use in Condition Monitoring may lead to more sensitive fault detection and prediction), the unprecedented value of the Fence Graph is in its use in simulation. Since the links are causal and not correlational, Fence Graph simulations will be physically meaningful.

NOTE that if we had a neural network or deep-learning model of the EV, using it to simulate EV behavior would be misleading; in these and other current ML methodologies, all parameters are estimated using correlations. (Every ML solution today arises from the Normal Equation or the Wiener-Hopf equation for stochastic linear systems; the solution involves only auto- and cross-correlations.) It is self-evident that correlations can be misleading (an increase in the number of Pastors in Australia in the early 1900s correlated with increasing alcohol consumption, etc.) and hence simulations using them will also be misleading.
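For readers who have not met it, the Normal Equation can be written down in the standard linear least-squares setting. The symbols are chosen here for illustration only: the inputs are stacked as rows of a matrix X, the targets form a vector y, and w holds the weights that minimize the squared error.

```latex
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \lVert \mathbf{y} - X\mathbf{w} \rVert^{2}
\quad\Longrightarrow\quad
\left( X^{\top} X \right) \hat{\mathbf{w}} = X^{\top} \mathbf{y}
```

Up to a factor of the number of samples, X^T X collects the sample auto-correlations of the inputs and X^T y the cross-correlations between inputs and target, which is the sense in which such a solution involves only correlations.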

Counterfactual experiments are “what-if” analyses that we can perform on a Fence Graph by simulating various conditions.

Counterfactual: Counterfactual statements refer to what is possible or impossible . . . as opposed to what happens. In other words, what would have been true under different circumstances. Example: If kangaroos had no tail, they would topple over! From Marletto, “The Science of Can and Can’t” (2021).

We perform Counterfactual experiments AFTER the data are collected and we have a Fence Graph. NOTE that for the what-if experiments we want to perform, we do not want to go back and collect more data under different experimental conditions! This is wasteful and in many IoT use cases, impractical.

As an example, we can ask, “What if the ambient temperature were lower; would it reduce the battery discharge rate for the same torque?”. Instead of performing a lab experiment, in a Fence Graph simulation we can reduce the mean value of the Ambient Temperature data stream and, with the existing causal links and weights, simulate and measure the effect on the other three data streams. Of course, Ambient Temperature did not drop in the collected data, so it is NOT a fact. But we are able to perform this experiment using the causal graph; hence this is a COUNTER-fact(ual) experiment.
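The ambient temperature experiment can be sketched in a few lines of Python. This is a toy under loud assumptions, not the Fence Graph simulation described in the technical report: the model is linear, it has only two causal links into battery draw, and the weights, the 5-degree drop and the synthetic “collected” data are all made up for illustration. The same torque and the same noise are reused in both runs, so only the intervention on temperature differs.

```python
import random

# Hypothetical causal factors (made up, NOT estimated from data).
W_TORQUE_TO_DRAW = 0.9      # instantaneous link: torque(t) -> battery draw(t)
W_TEMP_TO_DRAW_LAG1 = 0.2   # lagged link: ambient temperature(t-1) -> battery draw(t)


def simulate_draw(torque, temp, noise):
    """Propagate torque and temperature through the assumed causal links."""
    return [
        W_TORQUE_TO_DRAW * torque[t] + W_TEMP_TO_DRAW_LAG1 * temp[t - 1] + noise[t]
        for t in range(1, len(torque))
    ]


def mean(values):
    return sum(values) / len(values)


# Synthetic stand-ins for the collected data streams.
random.seed(1)
n = 500
torque = [50.0 + random.gauss(0.0, 5.0) for _ in range(n)]
temp = [30.0 + random.gauss(0.0, 2.0) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# Factual run: the data as collected.
factual_draw = simulate_draw(torque, temp, noise)

# Counterfactual run: same torque and noise, ambient temperature mean lowered by 5.
cooler_temp = [value - 5.0 for value in temp]
counterfactual_draw = simulate_draw(torque, cooler_temp, noise)

effect = mean(counterfactual_draw) - mean(factual_draw)
print(f"Change in mean battery draw: {effect:.3f}")
```

Because this toy model is linear, the change in mean battery draw is simply the assumed temperature weight times the temperature shift; with a full Fence Graph the shift would propagate through every within and across channel link.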

If the improvement is desirable and is repeatable across a fleet of vehicles, a cooling system may be added to the battery enclosure in the next EV production run. This is how Counterfactual Experiments can directly lead to performance optimization. Similar scenarios can be imagined in Manufacturing, Oil & Gas, Utilities, Building Management, Smart City and so on.

Digital twins & causal graphs

There is a lot of excitement AND confusion around Digital Twins these days! Here are a few propositions that should help clarify.

There is a consensus evolving that any Digital Twin should be animated by field data in real time; the time constant depends on the use case, and the refresh rate could be anywhere from milliseconds to days. This precludes CAD/CAM Physics-based simulations; when they are used in the design of a widget, there is no operating physical counterpart of the widget to supply field data . . .

Digital Twin: A software counterpart of a physical entity that DYNAMICALLY evolves with field data.

At a minimum, such a digital twin will be useful for show-and-tell; this has some value in condition monitoring — changes in the digital twin displayed in some fashion can alert the human in the loop to impending failure . . .

To fully appreciate various aspects of a Digital Twin, consider figure 2.

Figure 2. Digital twin ecosystem

A digital twin does not exist in isolation; there are essential enablers and interactions with the physical world. The middle row is what is considered the “Digital Twin”. Its main job is to extract useful information from the incoming IoT data and close the loop, for now through visualizations for the human-in-the-loop, who interacts with the physical world as a result.

In the future, the hope is that the human will be “above-the-loop” most of the time and the Digital Twin will interact directly with the physical world (through PLCs and SCADA systems in the case of Manufacturing).

Some concepts swirling around Digital Twin deserve more exploration:

· Simulation Digital Twin: typically, Physics-based simulations. It is a “model-based” top-down approach to modeling the STRUCTURE of a physical asset in most cases. Even when updated with field data, we are imposing our model on the physical system via differential equations and their simplified versions (ROMs, i.e., reduced order models, or surrogate models). It is self-evident that such a model is constrained by the assumptions in the equations (however precise the equations themselves may be!) that never exactly match reality, by approximations in the solvers and by physical aspects that simply cannot be modeled as equations.

· AR/VR/MR: They are part of the Digital Twin inasmuch as they help close the loop via visualizations. Some argument may be made that such “rich” visualizations will enhance information extraction to a level higher than a simple dashboard.

Another question that keeps arising is whether “Causal Graph is ML”?

While ML is primarily for Classification and Regression (or Prediction) and Causal Graph can indeed be used for classification (causal factors as features) and prediction (as in counterfactual simulation), the MAIN purpose of Causal Graph is System Analysis, i.e., understanding the underlying system in terms of its internal cause-effect mechanisms.

Hence, Causal Graph goes beyond machine learning and is part of the larger Data Science family.

Causal graph as digital twin

· IoT deployments collect multichannel data (time series) since the end points monitored and their interactions are of significance to the IoT Practitioner.

· Proper processing of multichannel data requires estimating within and across channel influences.

· Multichannel IoT Causal (“MIC”) digital twin estimates the within and across Causal Factors which are super-important to understand root-causes (and hence possible fixes).

· Beyond product design in PLM which focuses on structure, the DYNAMICS of operating devices and interconnected systems are central to the value of IoT.

· MIC digital twin is a purely data-driven, bottom-up, non-parametric method to model the DYNAMICS of the system under consideration.

· Fence Graph, which is a type of MIC digital twin, allows counterfactual simulation, which alone can reveal operational performance improvements that are possible without additional experimentation in the field.

Epilogue

The series of propositions put forward in this note is intended to clarify concepts related to Causality, Counterfactuals and Digital Twins. While full consensus is not expected, I hope we will have a shared basis for future discussions and development.

I have not discussed any of the technical matters in this note. I offer the following articles for in-depth reading and better understanding of my propositions.

For a package of documents comprising the introduction, theory, algorithms & simulation methods for the CAUSAL Digital Twin (CDT), go to http://www.syansol.com/ and click on MIC Repository.

These are the documents included in the zip file, “Causal Digital Twin_repository”:

1. A gentle general introduction to multichannel signal processing — Usher Syndrome in IoT

2. Relevance to root-cause analysis — Root Cause Analysis

3. Theory and algorithms for MIC Fence Graph digital twin — Causal Digital Twin theory

4. Bringing together Fence Graph & Counterfactuals — Counterfactual experiments

5. COMPLETE technical details including Fence Graph Simulation -

· TECHNICAL Report: Simulation of Fence Graph

I welcome further direct communication (pgmad@live.com) for clarifications and discussions to establish these propositions.

Dr. PG Madhavan

https://www.linkedin.com/in/pgmad/

#IoT #Simulation #Multichannel #Digitaltwin #Causaldigitaltwin #Fencegraph #Causality
