Are truly individualised predictions of infinitely complex systems inherently impossible?
I’d say: yes.
What structure can perfectly capture a system’s complexity? Look-up tables, databases, exhaustive if-else rules, etc. These give us 100% information. We can answer any query. Important implication: a unit we ask about is already in the table. Now, take humans as an example. Is it possible to get two exactly identical individuals? From an infinite-data-regime point of view the answer is (theoretically) yes. Practically? Hard no - mostly due to the existence of time. So even if we collected all information about all humans (extreme case: past, present and future, atom-level info, monitored all the time), the answer is still no. The universe is non-repeatable. Even a clone of Earth would differ (different space-time). Anything connected to ‘true’ time (excluding synthetic simulations) is non-repeatable.
Result: can’t build perfect info structures of the world -> can’t have individualised predictions.
Things become easier if we deal with finite-complexity systems - this is what we get when we build our datasets. They typically record only a small number of properties, making them vast simplifications of the world. The positive: a higher chance of covering all combinations of properties (100% info). The negative: this is not the true world anymore. Example: a record in a dataset describing a person in fact corresponds to multiple people in the world (only the full arrangement of atoms plus space-time info would be unique enough).
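To make this concrete, here is a minimal sketch (entirely made-up people and features, purely for illustration) of how a low-dimensional record ends up pointing at several distinct individuals at once:

```python
# Sketch (hypothetical data): a dataset that records only a few properties
# collapses many distinct real-world individuals into a single "record".
from collections import defaultdict

# Four people, each unique in the full-complexity world.
people = [
    {"name": "Ann",  "age": 34, "sex": "F", "smoker": False},
    {"name": "Bea",  "age": 34, "sex": "F", "smoker": False},
    {"name": "Cara", "age": 34, "sex": "F", "smoker": False},
    {"name": "Dan",  "age": 51, "sex": "M", "smoker": True},
]

# Our dataset keeps only (age, sex, smoker) - a vast simplification.
features = lambda p: (p["age"], p["sex"], p["smoker"])

lookup = defaultdict(list)
for p in people:
    lookup[features(p)].append(p["name"])

for record, names in lookup.items():
    print(record, "->", names)
# (34, 'F', False) -> ['Ann', 'Bea', 'Cara']   <- one record, three individuals
# (51, 'M', True)  -> ['Dan']
```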
So the individualised predictions are only individualised within the realm of the dataset; in the grand scheme of things they are really conditional averages over heterogeneous groups. Thus, there is no such thing as a truly individualised prediction of our world. This is fine if we consciously accept that our datasets are simplified realities. The problem I have with it is that, based on analyses of those simplified systems, we make decisions about actions we then apply in the real world - a world infinitely more complex than the system/dataset we analysed.
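A tiny illustration of the same point, with made-up outcome numbers: even a perfect model of the dataset can only hand back the group mean to everyone who shares a record:

```python
# Sketch (made-up numbers): the "individualised" prediction for a record is
# really the conditional average over everyone who shares that record.
import statistics

# True (unobservable) outcomes for the three people who collapse into
# the same record (34, 'F', non-smoker) in the previous sketch.
true_outcomes = {"Ann": 2.0, "Bea": 5.0, "Cara": 11.0}

# A model trained on the dataset can at best predict the group mean.
prediction = statistics.mean(true_outcomes.values())   # 6.0

for name, y in true_outcomes.items():
    print(f"{name}: predicted {prediction:.1f}, actual {y:.1f}, error {prediction - y:+.1f}")
# The prediction is "individualised" within the dataset, yet every
# individual in the group is mispredicted - it is a conditional average.
```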
This brings us to RCTs which, by this line of thinking, are inherently broken. It is simply impossible to control for all the variables, so it would be more accurate to call them randomised trials. The ‘control’ bit happens only within the simplified reality of the dataset; plenty of other properties are not taken into account (hidden confounders), and randomisation balances them only in expectation, not in the single trial we actually run. The hope, I guess, is that we capture enough variance through the features we selected.
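Here is a quick simulated sketch (synthetic data, no real trial) of what randomisation actually buys: a hidden, never-measured confounder is balanced on average across hypothetical repeated trials, but not in any one realised trial:

```python
# Sketch (simulated data): randomisation balances a hidden confounder
# on average, but any single finite trial shows residual imbalance -
# the "control" lives in expectation, not in the realised dataset.
import random

random.seed(0)

def one_trial(n=200):
    # The hidden confounder is never measured or explicitly controlled for.
    hidden = [random.gauss(0, 1) for _ in range(n)]
    arm = [random.random() < 0.5 for _ in range(n)]          # randomised assignment
    treated = [h for h, a in zip(hidden, arm) if a]
    control = [h for h, a in zip(hidden, arm) if not a]
    # Difference in the confounder's mean between arms (0 would be perfect balance).
    return sum(treated) / len(treated) - sum(control) / len(control)

imbalances = [one_trial() for _ in range(1000)]
print("mean imbalance over many trials:", sum(imbalances) / len(imbalances))  # ~0
print("imbalance in a single trial:    ", imbalances[0])                      # not 0
```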
Another way of looking at this is that error-free predictions of the real world are inherently impossible (we can’t build complete look-up tables). We can only hope to reduce errors. Example: it is possible to build look-up tables for selected features, but there will be errors when the resulting decisions are applied in the real world.
Increasing modelling complexity will reduce errors and bring us closer to reality (though we can never reach it fully). This is especially true if we are interested in conditional averages; for plain averages it is easier to get usefully low errors. Simpler systems are of course easier to model but are more disconnected from reality (it takes more faith to make real-life decisions based on them).
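A small synthetic example of that error floor: as the model of the recorded feature grows more flexible, the error shrinks towards, but never below, the variance contributed by everything we did not record:

```python
# Sketch (synthetic data): adding model complexity shrinks prediction error,
# but it flattens out at an irreducible floor caused by everything the
# features do not capture - we get closer to reality without reaching it.
import numpy as np

rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(-1, 1, n)
unmeasured = rng.normal(0, 0.5, n)          # stands in for all the properties we never recorded
y = np.sin(3 * x) + unmeasured              # the true outcome depends on both

for degree in [0, 1, 3, 5, 9]:
    coefs = np.polyfit(x, y, degree)        # increasingly complex model of x alone
    pred = np.polyval(coefs, x)
    mse = np.mean((y - pred) ** 2)
    print(f"degree {degree}: MSE = {mse:.3f}")
# MSE falls with complexity but cannot drop below ~0.25 (the variance of the
# unmeasured part), no matter how flexible the model of x becomes.
```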
Although we can never reach zero error, perhaps there is a low-enough threshold at which predictions work well in practice (tolerable mistakes, more useful than harmful).
While simple models (e.g. linear models) were enough to obtain useful average predictions, I feel heterogeneous targets will require not just more complexity but unimaginably more complexity, likely beyond human comprehension. Example: neural networks are hugely complex and we are still nowhere near where we want to be in terms of automated intelligence.
I would argue that designing and handling complex-enough systems will eventually surpass our cognitive capabilities. Our insistence on simple, intuitive and transparent solutions will have to be set aside at some point (how much longer can we keep exploiting simple things?).
Given our inherent limitations as human beings (unless we enhance ourselves via brain chips), this route of increased complexity will likely require a system capable of collecting its own data and regulating its own model training. We will have to build machines that can build even more intelligent machines. This is nothing new and perhaps sounds like a cheap sci-fi movie, but I personally don’t see any other sensible direction if we want to make significant progress and finally get those vacuum cleaners to do the job on their own!
D.