1. The paradigm of statistical learning is the following: one declares
a. data D
b. a learner with degrees of freedom
c. a regularized fit via cross-validation on D
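The three steps above can be sketched concretely. This is a minimal illustration with assumptions of mine (synthetic data, ridge regression as the learner, the penalty lambda as the degree of freedom), not a reference implementation:

```python
# Sketch of the paradigm: data D, a learner with a degree of freedom
# (the ridge penalty lambda), and a regularized fit via k-fold CV on D.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data D: 100 points, 5 features, linear signal plus noise.
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=100)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Mean squared error of ridge(lam) over k held-out folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errs = []
    for fold in folds:
        mask = np.ones(len(y), bool)
        mask[fold] = False
        w = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return np.mean(errs)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lambdas, key=lambda lam: cv_error(X, y, lam))
print("lambda selected by CV:", best)
```

Note that every step happens inside D: the procedure can tell us which lambda is best on D, never whether D itself was worth fitting.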
2. This assumes that D is representative, but how do we know that this is the case? Cross-validation does not change the argument.
3. From Taleb: "Statistical regress argument (or the problem of the circularity of statistics): We need data to discover a probability distribution. How do we know if we have enough? From the probability distribution. If it is a Gaussian, then a few points of data will suffice. How do we know it is a Gaussian? From the data. So we need the data to tell us what probability distribution to assume, and we need a probability distribution to tell us how much data we need. This causes a severe regress argument, which is somewhat shamelessly circumvented by resorting to the Gaussian and its kin."
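Taleb's circularity can be made tangible with a toy simulation (my assumptions, not Taleb's): the same sample size n that is ample under a Gaussian assumption is far from ample under a fat-tailed one, so "enough data" depends on the distribution we were trying to learn in the first place.

```python
# How much does the sample mean move across resamples of size n?
# Under a Gaussian, n = 1000 pins it down; under a fat-tailed law it does not.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 200

gauss_means = [rng.normal(size=n).mean() for _ in range(trials)]
# Pareto (Lomax) with tail index 1.2: the mean exists, the variance does not.
pareto_means = [rng.pareto(1.2, size=n).mean() for _ in range(trials)]

print("spread of the sample mean, Gaussian:", np.std(gauss_means))
print("spread of the sample mean, Pareto  :", np.std(pareto_means))
```

The Pareto spread comes out far larger, even though both experiments used the same n.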
4. Distinguishing between risk and uncertainty (Knight's distinction).
See also Handbook of Game Theory, Peyton Young & Zamir, pp. 10-12 (Savage's small world).
5. Learning to distinguish a cat from a dog from 10^10 pictures has a better chance of succeeding, even with 10^4 parameters, than predicting a trend on 10^5 points with 10^2 parameters.
6. Approach 1 essentially overfits: learning focuses on one type of data, and even on a limited sample of that type. At best we get perfect specialization when the data is stable (e.g. pattern recognition: deep learning on images). Learning can take long (DL: > 20 years ...).
In addition, 1 potentially suffers from self-justification bias (time-to-market matters), ecological maladjustment (if the law of diminishing returns holds in the world), and opportunity cost.
7. The learning process itself is not the fit, but:
a. the choice of the heuristic (= {learner, choice of metaparameters})
b. beyond that, the iterative learning of this exploration
8. 7.a and 7.b are today at best part of the AI program, certainly not of machine learning in its most common sense, which is pattern recognition (latest avatar: deep learning).
9. The power of generalization is the key concept. The bias/variance trade-off of statistical learning theory is only an (anecdotal) mode of this concept as soon as D is not guaranteed to be representative (often the most realistic hypothesis).
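A small experiment (setup and numbers are illustrative assumptions of mine) shows why the in-sample bias/variance story is anecdotal once D is not representative: both a simple and a flexible model are fit on an observed domain, then evaluated on the shifted domain we actually care about.

```python
# Fit polynomials on D = [0, 3], evaluate on the unseen regime [4, 6].
# In-sample, the flexible fit looks better; out of domain it collapses.
import numpy as np

rng = np.random.default_rng(2)
true = np.sin                              # hypothetical ground truth

x_train = rng.uniform(0, 3, 40)            # D: what we observed
y_train = true(x_train) + rng.normal(scale=0.1, size=40)
x_test = np.linspace(4, 6, 50)             # the regime we care about
y_test = true(x_test)

def poly_mse(deg):
    coefs = np.polyfit(x_train, y_train, deg)
    return np.mean((np.polyval(coefs, x_test) - y_test) ** 2)

print("out-of-domain MSE, degree 1:", poly_mse(1))
print("out-of-domain MSE, degree 9:", poly_mse(9))
```

No amount of cross-validation inside [0, 3] would have warned us about the degree-9 behavior on [4, 6]; that is the sense in which generalization power, not the bias/variance balance on D, is the key quantity.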
10. This power of generalization can be seen as an embedding D ⊂ D_ ⊂ D':
a. D_ the area of interest (financial time series ...)
b. D' the domain of generalization of the heuristic (RFIM? ...)
11. A heuristic is a rule of thumb acquired over the long term, in relation to a (noisy) environment; cf. Gerd Gigerenzer and Henry Brighton, "Homo Heuristicus: Why Biased Minds Make Better Inferences",
http://library.mpib-berlin.mpg.de/ft/gg/GG_Homo_2009.pdf
12. A heuristic tested on D comes from meta-learning in a rather large (and not accessible) D', which is what guarantees its robustness.
13. The structure of the environment, its invariants, is a key element of the problem.
We can think (following Gigerenzer) that the natural human environment has favored the emergence of heuristics such as Take-the-best or Tallying. "Cognitive science is increasingly stressing the senses in which the cognitive system performs remarkably well when generalizing from few observations, so much so that human performance is often [seen] as optimal" (Gigerenzer).
The spontaneous use of these heuristics hides an essential prior: we already know that they work in the contexts where they are used.
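The tallying heuristic (unit weights on all cues, no fitting at all) can be pitted against an OLS fit in a small-sample setting. The setup below is an illustrative assumption of mine, not Gigerenzer's own experiment; it only shows the mechanism by which a biased, parameter-free rule can out-generalize an estimated one when data is scarce:

```python
# Tallying (unit weights) vs OLS when the training sample is tiny and noisy.
import numpy as np

rng = np.random.default_rng(3)
d, n_train, n_test = 5, 8, 1000
w_true = np.array([1.0, 0.8, 0.6, 0.4, 0.2])   # all cues point the same way

def mse(w, X, y):
    return np.mean((X @ w - y) ** 2)

scores = {"ols": [], "tallying": []}
for _ in range(200):
    X = rng.normal(size=(n_train, d))
    y = X @ w_true + rng.normal(scale=1.0, size=n_train)
    Xt = rng.normal(size=(n_test, d))
    yt = Xt @ w_true
    w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    w_tally = np.ones(d)                        # just tally the cues
    scores["ols"].append(mse(w_ols, Xt, yt))
    scores["tallying"].append(mse(w_tally, Xt, yt))

print("mean test MSE, OLS     :", np.mean(scores["ols"]))
print("mean test MSE, tallying:", np.mean(scores["tallying"]))
```

Tallying is biased (its weights are wrong) but has zero variance; with 8 observations the OLS variance dominates. The prior doing the work here is exactly the one the text names: we assumed in advance that all cues point the same way.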
14. In this sense, one can think of looking primarily at the 'natural' priors of the financial world
15. We can use a heuristic from D' "as if", hoping that its power of generalization covers D_.
16. Anecdote (?): Harry Markowitz received the Nobel Prize in economics for finding the optimal solution, the mean-variance portfolio. When he made his own retirement investments, however, he did not use his optimizing strategy; instead he relied on a simple heuristic, 1/N: allocate your money equally to each of N alternatives.
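The anecdote can be reproduced in miniature. The simulation below uses my own toy assumptions (i.i.d. normal returns, a short estimation window, a naively normalized mean-variance weighting), not Markowitz's data; it shows the mechanism: when means and covariances must be estimated from little data, the "optimal" weights inherit the estimation noise while 1/N does not.

```python
# Out-of-sample Sharpe ratio: estimated mean-variance weights vs 1/N.
import numpy as np

rng = np.random.default_rng(4)
N, n_in, n_out, trials = 10, 20, 500, 200

def sharpe(w, R):
    p = R @ w
    return p.mean() / p.std()

s_mv, s_eq = [], []
for _ in range(trials):
    mu = rng.normal(0.05, 0.02, N)              # 'true' expected returns
    R_in = mu + rng.normal(scale=0.2, size=(n_in, N))   # estimation window
    R_out = mu + rng.normal(scale=0.2, size=(n_out, N)) # evaluation window
    mu_hat = R_in.mean(axis=0)
    cov_hat = np.cov(R_in, rowvar=False)
    w_mv = np.linalg.solve(cov_hat, mu_hat)     # estimated 'optimal' weights
    w_mv /= np.abs(w_mv).sum()                  # normalize gross exposure
    w_eq = np.ones(N) / N                       # the 1/N heuristic
    s_mv.append(sharpe(w_mv, R_out))
    s_eq.append(sharpe(w_eq, R_out))

print("mean out-of-sample Sharpe, mean-variance:", np.mean(s_mv))
print("mean out-of-sample Sharpe, 1/N          :", np.mean(s_eq))
```

In the terms of point 10: 1/N is a heuristic whose D' (any environment where assets are roughly exchangeable) happens to cover the D_ of a retirement portfolio better than a rule fit tightly to a small D.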