A paradigm of decision under uncertainty is the hierarchical/compositional approach, 'an idea that pervades almost all attempts to manage complexity' (Russell, p. 426).
This paradigm is a priori distinct from (and encompasses) DL (deep learning) understood as a compression paradigm; cf. 'compositional learning' in 4.c, to which we shall return.
Formally, we view a hierarchy as a tower of symmetries, generally 'orthogonal' to one another.
We can therefore say that the standard category of learning is Grph, the category of graphs (cf. Spivak).
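To fix conventions, recall one common presentation of Grph: an object is a directed graph \( G = (V_G, E_G) \) with \( E_G \subseteq V_G \times V_G \), and a morphism is a vertex map that preserves edges:
\[
f : G \to H \quad \text{such that} \quad (u, v) \in E_G \implies (f(u), f(v)) \in E_H .
\]
(Spivak's functorial definition also allows parallel edges; this simpler presentation suffices for what follows.)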
Statistical learning (Stat) often takes features as given. Even in DL, the initial processing of the data (natural language, ...) or the structure of the network (CNN, ...) encodes the designer's priors.
We can distinguish three levels of learning, from the most idealized to the most realistic:
a. Features are given
b. Features are to be created
c. A (tower of) symmetries is to be discovered
Real learning ranges from c to a.
We clearly distinguish here the notion of decision from the problem of fit in statistical learning: the compositional paradigm does not require a notion of fit.
Consider, for example, a family of simple models \( M(\lambda^{\nu}, \alpha^{\mu}) \), where the \( \lambda^{\nu},\ \nu < \nu_0 \), are scalars and the \( \alpha^{\mu},\ \mu < \mu_0 \), are 'features'.
Suppose that a person 𝔭 has such a model \( M \) at their disposal and is led, in different contexts, to use it. We can distinguish three heuristics (a sketch follows the list):
𝔭 chooses a \( \lambda \) for each \( \alpha^{\mu} \), namely \( \lambda^{f_{\mu}} \), then chooses among the \( \lambda^{f_{\mu}} * \alpha^{\mu} \);
𝔭 chooses an \( \alpha^{\mu_0} \) first, then chooses a \( \lambda^{\nu} * \alpha^{\mu_0} \);
𝔭 chooses a \( \lambda^{\nu} * \alpha^{\mu} \) directly.
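As an illustration only, here is a minimal sketch of the three heuristics, assuming finite index sets and a toy scoring function; `score` and the grid values below are hypothetical placeholders, not part of the text.

```python
import itertools

# Hypothetical finite index sets for the model family M(lambda, alpha).
lambdas = [0.1, 1.0, 10.0]        # scalars lambda^nu, nu < nu_0
alphas = ["alpha_0", "alpha_1"]   # 'features' alpha^mu, mu < mu_0

def score(lam, alpha):
    """Toy stand-in for evaluating the model M(lam, alpha) in context."""
    return lam * (1 if alpha == "alpha_0" else 2)  # placeholder

# Heuristic a: choose a lambda^{f_mu} for each feature alpha^mu,
# then choose among the pairs lambda^{f_mu} * alpha^mu.
best_lambda = {alpha: max(lambdas, key=lambda l: score(l, alpha))
               for alpha in alphas}
choice_a = max(((best_lambda[alpha], alpha) for alpha in alphas),
               key=lambda pair: score(*pair))

# Heuristic b: choose a feature alpha^{mu_0} first,
# then choose a lambda^nu for that fixed feature.
alpha_0 = max(alphas, key=lambda a: max(score(l, a) for l in lambdas))
choice_b = (max(lambdas, key=lambda l: score(l, alpha_0)), alpha_0)

# Heuristic c: choose a pair lambda^nu * alpha^mu jointly, in one step.
choice_c = max(itertools.product(lambdas, alphas), key=lambda p: score(*p))

print(choice_a, choice_b, choice_c)
```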
These three heuristics correspond respectively to the following three graphs \( \{a, b, c\} \):
\[
\begin{array}{lrcl}
a: & \lambda & \rightarrow & \alpha \\
b: & \alpha & \rightarrow & \lambda \\
c: & \alpha & \simeq & \lambda
\end{array}
\]
We can therefore consider that the three corresponding models live in Grph, the category of graphs. In other words, we have three objects \( \{a, b, c\} \) in Grph.
In Grph, \( a \rightarrow b \) is a possible morphism (swap the two vertices): it is a kind of 'duality'.
There is, on the other hand, no morphism \( a \rightarrow c \) or \( b \rightarrow c \) in Grph: once \( \lambda \) and \( \alpha \) are identified in \( c \), the edge of \( a \) or \( b \) would need a loop as its image, and \( c \) has none.
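A brute-force check can make these claims concrete. This is a sketch under one reading of \( c \) (an interpretation, not the text's own construction): \( \lambda \) and \( \alpha \) identified into a single vertex with no loop; `hom_exists` simply searches all vertex maps for an edge-preserving one.

```python
import itertools

# Graphs as (vertices, edges); edges are ordered pairs of vertices.
a = ({"l", "a"}, {("l", "a")})   # lambda -> alpha
b = ({"l", "a"}, {("a", "l")})   # alpha -> lambda
c = ({"la"}, set())              # lambda and alpha identified, no loop (assumption)

def hom_exists(G, H):
    """Brute-force: is there an edge-preserving vertex map G -> H?"""
    VG, EG = G
    VH, EH = H
    VG = sorted(VG)
    for images in itertools.product(sorted(VH), repeat=len(VG)):
        f = dict(zip(VG, images))
        if all((f[u], f[v]) in EH for (u, v) in EG):
            return True
    return False

print(hom_exists(a, b))  # True: swap the two vertices
print(hom_exists(a, c))  # False: the edge of a would need a loop in c
print(hom_exists(b, c))  # False, for the same reason
print(hom_exists(c, a))  # True: the single vertex maps anywhere
```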
Let us denote by \( C^2 \) this category with three objects.
Note that if we worked in PrO, the category of preorders, we would have this 'duality' only as a functor between \( C^2 \) and \( (C^2)^{\mathrm{op}} \): \( a \rightarrow b \) is not a morphism in PrO.
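One way to write this out, under the (added) assumption that the duality exchanges \( a \) and \( b \) and fixes \( c \), is as a functor
\[
D : C^2 \longrightarrow (C^2)^{\mathrm{op}}, \qquad D(a) = b, \quad D(b) = a, \quad D(c) = c .
\]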
But above all, 𝔭 must order the triplet \( (\lambda, \alpha, \partial) \), where \( \partial \) denotes complexity (cf. the dominances below). We can have:
\[
\begin{array}{ll}
\partial \leftarrow \lambda \leftarrow \alpha, & \text{equivalent to } \partial\lambda \leftarrow \alpha \\
\lambda \leftarrow \partial \leftarrow \alpha, & \text{equivalent to } \lambda \leftarrow \partial\alpha \\
\partial \leftarrow \lambda \leftarrow \partial \leftarrow \alpha, & \text{equivalent to } \partial\lambda \leftarrow \partial\alpha \\
\partial \leftarrow \alpha \simeq \lambda, & \text{equivalent to } \partial \leftarrow \lambda\alpha
\end{array}
\]
We have thus defined a category \( C^3 \), still in Grph.
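As a sketch only (one possible reading, not the text's own construction), each ordering can be encoded by its grouped 'equivalent' form, a chain of blocks from dominant to dominated, and expanded into a directed graph, giving the objects of \( C^3 \) concretely:

```python
# Each ordering as a chain of blocks, dominant block first, following the
# grouped 'equivalent to' forms above (d = partial, l = lambda, a = alpha).
orderings = {
    "d <- l <- a": [["d", "l"], ["a"]],       # partial·lambda <- alpha
    "l <- d <- a": [["l"], ["d", "a"]],       # lambda <- partial·alpha
    "dl <- da":    [["d", "l"], ["d", "a"]],  # partial·lambda <- partial·alpha
    "d <- l ~ a":  [["d"], ["l", "a"]],       # partial <- lambda·alpha
}

def chain_to_graph(blocks):
    """Expand a block chain into (vertices, edges): every symbol of a
    dominated block points to every symbol of the block just above it.
    Caveat: in 'dl <- da' the symbol d occurs in both blocks, which this
    naive encoding turns into a loop at d."""
    vertices, edges = set(), set()
    for upper, lower in zip(blocks, blocks[1:]):
        vertices.update(upper)
        vertices.update(lower)
        for u in lower:
            for v in upper:
                edges.add((u, v))
    return vertices, edges

for name, blocks in orderings.items():
    print(name, chain_to_graph(blocks))
```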
Each order corresponds to an 'environment symmetry'; cf. 'learning fallacy' and SGII:
\( \lambda \) dominance: scaling dominance
\( \alpha \) dominance: feature dominance
\( \partial \) dominance: complexity dominance
The general economy of decision theory is therefore not the bias-variance tradeoff or penalization, but the hierarchization of the different symmetries of the domain under study.
We can represent the passage from one domain to another via a morphism within Grph.
In the spirit of 'μεταφορά' (metaphor), putting these models in relation can help us discover the right symmetries of our domain.