Friday, 30 December 2016

learning as categorification IV


A paradigm of decision under uncertainty is the hierarchical / compositional approach, 'an idea that pervades almost all attempts to manage complexity', as Russell puts it, p. 426.
This paradigm is a priori distinct from (and encompasses) DL (deep learning) understood as a compression paradigm, CF "compositional learning" 4.c, to which we shall return.

We formally see a hierarchy as a tower of symmetries, generally 'orthogonal'.

We can therefore say that the standard category of learning is Grph, the category of graphs, CF Spivak.

Statistical learning (Stat) often considers features as given. Even in DL, the initial processing of the data (natural language ...) or the structure of the network (CNN ...) encodes the designer's priors.
We can distinguish three levels of learning, from the most idealized to the most realistic
a. Features are given
b. Features to be created
c. (Tower of) symmetries to discover
Real learning ranges from c to a.

We clearly distinguish here the notion of decision from the fitting problem of statistical learning: the compositional paradigm does not require the notion of fit.

Consider for example a family of simple models \( M( \lambda ^{\nu} ,\alpha ^ {\mu} ) \), where \( \lambda ^{\nu}, \nu < \nu_0 \) are scalars, and \( \alpha^{\mu} ,\mu < \mu_0 \) are ‘features’.
Suppose that a person 𝔭 has a model \( M \) at their disposal and is led, in different contexts, to use this model. We can distinguish 3 heuristics:
𝔭 chooses a \( \lambda \) for each \( \alpha^\mu \) : \( \lambda^{f_\mu } \), then chooses among the \( \lambda^{f_{\mu} } * \alpha^{\mu} \)
𝔭 chooses an \( \alpha^{\mu_0} \), then chooses a \( \lambda^{\nu} * \alpha^{\mu_0} \)
𝔭 chooses a \( \lambda^{\nu}*\alpha^{\mu} \)
These correspond respectively to the following three graphs \( \{ a, b, c \} \) :
\begin{array}{r c l}
\lambda & \rightarrow & \alpha \\
\alpha & \rightarrow & \lambda \\
\alpha & \simeq & \lambda \\
\end{array}
We can therefore consider that the three corresponding models live in Grph, the category of graphs. In other words, we have 3 objects \( \{ a, b, c \} \) in Grph.
In Grph,  \( a \rightarrow b \) is a possible morphism: it is a kind of 'duality'.
There is, on the other hand, no morphism \( a \rightarrow c \) or \( b \rightarrow c \) in Grph.
Let us denote by \( C^2 \) this category with three objects.
Note that if we worked in PrO, the category of preorders, we would have this 'duality' only as a functor between \( C^2 \) and \( C^{2\,op} \) : \( a \rightarrow b \) is not a morphism in PrO.
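To make this concrete, here is a minimal Python sketch (the encoding and all names are mine, purely illustrative): it represents \( a \) and \( b \) as vertex / edge sets and brute-force searches for graph homomorphisms; the 'duality' \( a \rightarrow b \) then appears as the vertex swap \( \lambda \mapsto \alpha, \alpha \mapsto \lambda \).

```python
from itertools import product

# A graph is (vertices, edges); an edge is an ordered pair of vertices.
a = ({"lam", "alp"}, {("lam", "alp")})   # lambda -> alpha
b = ({"lam", "alp"}, {("alp", "lam")})   # alpha  -> lambda

def homomorphisms(g, h):
    """Brute force: vertex maps from g to h sending every edge of g to an edge of h."""
    gv, ge = sorted(g[0]), g[1]
    hv, he = h[0], h[1]
    found = []
    for img in product(sorted(hv), repeat=len(gv)):
        m = dict(zip(gv, img))
        if all((m[s], m[t]) in he for s, t in ge):
            found.append(m)
    return found

print(homomorphisms(a, b))   # [{'alp': 'lam', 'lam': 'alp'}] : the swap, i.e. the 'duality'
```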

Suppose now that 𝔭 allows itself more latitude in the choice of complexity, in the precise sense of the choice of \( (\nu_0, \mu_0) \): greater or smaller. That is, if we symbolize this choice by an element \( \partial_{xy} \) of the group \( \partial \) of translations of \( Z^2 \): \( (x, y) * (\nu_0, \mu_0) \rightarrow (\nu_0 + x, \mu_0 + y) \), 𝔭 gives itself the possibility, for example, of increasing the number of \( \lambda \) : \( \partial_{20} ( \{\lambda_1, \lambda_2, \lambda_3\}, \alpha^{\mu} ) = ( \{\lambda_1, \lambda_2, \lambda_3, \lambda_4, \lambda_5\}, \alpha^{\mu}) \).
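A minimal sketch of the group \( \partial \) acting by translation on the complexity \( (\nu_0, \mu_0) \) in \( Z^2 \) (names and numbers are mine, purely illustrative):

```python
def translate(d, complexity):
    """Action of d = (x, y), an element of the translation group of Z^2, on (nu0, mu0)."""
    x, y = d
    nu0, mu0 = complexity
    return (nu0 + x, mu0 + y)

c = (3, 1)                                   # three lambdas, one feature
print(translate((2, 0), c))                  # (5, 1): two more lambdas, as with d_20
# Group structure: composition is addition, identity is (0, 0), inverse is (-x, -y).
print(translate((2, 0), translate((1, 1), c)) == translate((3, 1), c))   # True
```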
But 𝔭 must above all order its triplet \( (\lambda, \alpha, \partial) \).
We can have:
\begin{array}{rclcl}
\partial & \leftarrow & \lambda & \leftarrow & \alpha, \quad \text{equivalent to } \partial \lambda \leftarrow \alpha \\
\lambda & \leftarrow & \partial & \leftarrow & \alpha, \quad \text{equivalent to } \lambda \leftarrow \partial \alpha \\
\partial & \leftarrow & \lambda & \leftarrow & \partial \leftarrow \alpha, \quad \text{equivalent to } \partial \lambda \leftarrow \partial \alpha \\
\partial & \leftarrow & \alpha & \simeq & \lambda, \quad \text{equivalent to } \partial \leftarrow \lambda \alpha \\
\end{array}
We have thus defined a category \( C^3 \), still in Grph.

Each order corresponds to an 'environment symmetry', CF 'learning fallacy' and SGII
\( \lambda \) dominance : scaling dominance
\( \alpha \) dominance : feature dominance
\( \partial \) dominance : complexity dominance
The general economics of decision theory is therefore obviously not the bias-variance tradeoff or penalization, but the hierarchization of the different symmetries of the field studied.
We can represent the passage from one domain to another via a morphism within Grph.
In the spirit of "μεταφορά", relating these models to one another can help discover the right symmetries of our domain.

Thursday, 29 December 2016

CNN : deep symmetries



We annotate "understanding deep convolutional networks", Mallat (Ma16)
Ma16 represents an essential generalization of the Mallat approach to pattern recognition over the past 10 years, based on the learning of a shallow invariance (2 levels) obtained by composition of elementary groups: translation, rotation, deformations.
Ma16 proposes a link between CNN and semi-direct product of groups of symmetries.

‘The paper studies architectures as opposed to computational learning of network weights, which is an outstanding optimization issue’

def 1 (§5) : layer \( j \) of a CNN represents the signal \( x \) as \( x_j (u,k_j) \), where \( u \) is the translation variable and \( k_j \) is the channel index.
The linear operator \( W_j \) and the pointwise non-linearity \( \rho \) are linked by the defining relation :
$$ x_j=\rho W_j x_{j-1} $$
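Read literally, def 1 is the usual layer map. A minimal numpy sketch (shapes, names and the choice of \( \rho \) are mine, purely illustrative):

```python
import numpy as np

def layer(x_prev, W, rho=lambda t: np.maximum(t, 0.0)):
    """x_j = rho(W_j x_{j-1}): a linear operator followed by a pointwise non-linearity
    (ReLU is used here only as an example of rho)."""
    return rho(W @ x_prev)

rng = np.random.default_rng(0)
x0 = rng.normal(size=16)          # x_0(u, k_0), flattened into a single vector
W1 = rng.normal(size=(32, 16))    # the linear operator W_1
x1 = layer(x0, W1)                # x_1 = rho(W_1 x_0)
print(x1.shape)                   # (32,)
```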

def 2 (§7) : \( f(x) \) is the class of \( x \). We suppose there exists \( f_j \) such that \( f_j (x_j ) = f(x) \), and
\( \forall(x,x' ),|| x_{j-1}-x'_{j-1} || \geq \epsilon  \quad if \quad f(x) \neq f(x') \)
to be compared with (in §2)
\( \forall(x,x' ),|| \Phi(x)-\Phi(x')|| \geq \epsilon \quad  if \quad f(x) \neq f(x') \)
i.e.,  \( x_j \) are features, playing the same role as \( \Phi(x) \)

def 3 (§3) symmetries : We look for invertible operators which preserve the value of \( f \). A global symmetry is an invertible and often non-linear operator \( g \) from \( \Omega \) to \( \Omega \) , such that \( f(g.x) = f(x) \) for all \( x \in \Omega \). If \(g_1\) and \(g_2\) are global symmetries then \(g_1 g_2 \) is also a global symmetry, so products define groups of symmetries. Global symmetries are usually hard to find. We shall first concentrate on local symmetries. We suppose that there is a metric \( |g|_G \) which measures the distance between \(g\in G\) and the identity. A function \(f\) is locally invariant to the action of \(G \) if
\( \forall x \in  \Omega , \exists C_x  > 0 ,\quad \forall g \in G \quad with \quad |g|_G  < C_x  , \quad f(g.x) = f(x) \)
ex : translation + diffeomorphism : \( g.x(u) = x(u - g(u)) \quad with \quad g \in C^1  (R^n  ) \).
other examples p14
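A hedged numerical sketch of the example above (signal, displacement field and names invented for illustration): the action \( g.x(u) = x(u - g(u)) \) is a resampling of \( x \), and the deviation from \( x \) shrinks as \( |g|_G \rightarrow 0 \), which is the local-invariance regime.

```python
import numpy as np

def act(x, u, g):
    """(g.x)(u) = x(u - g(u)): warp the signal x by the displacement field g (linear interpolation)."""
    return np.interp(u - g(u), u, x)

u = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * u)                          # toy signal
g = lambda t: 0.02 + 0.01 * np.sin(2 * np.pi * t)  # small translation + small deformation

x_warped = act(x, u, g)
print(np.max(np.abs(x_warped - x)))   # bounded by |g| times the Lipschitz constant of x
```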

def 1+2+3 : symmetry \( \bar g \in G_{j-1} \) :
$$ f_{j-1} ( \bar g . x_{j-1} ) = f_{j-1} (x_{j-1} ) $$
\( \{ \bar g . x_{j-1} \}_{ \bar g \in G_j } \) is the orbit of \( x_{j-1} \).
parallel transport : Mallat has the \( G_j \) act via the coordinates \( P_j \) :
$$ g. x_j (v) = x_j (g.v) $$
Suppose \( g \in G_j \) is defined so that the following diagram commutes:
\begin{array}{ccc} x_{j-1} & \rightarrow & \bar g . x_{j-1} \\ \downarrow & & \downarrow \\ \rho W_j x_{j-1} & \rightarrow &  g.[ \rho W_j x_{j-1} ] = \rho W_j [\bar g . x_{j-1}] \end{array}

Then \( g.x_j = g.[\rho W_j x_{j-1}] = \rho W_j [\bar g . x_{j-1} ] \)
but \( ||\rho W_j x_{j-1} -\rho W_j \bar g. x_{j-1} || < \epsilon \) since \( f_{j-1} (x_{j-1} ) = f_{j-1}  ( \bar g .x_{j-1} ) \)
then \( || x_j-g.x_j || < \epsilon \)
so \( f_j (x_j )= f_j (g .x_j ) \)
We will see this forced commutation applied in Prop 1.
The same logic is used in « Learning stable group invariant representations with convolutional networks », Bruna, §3.3 :
 
\begin{array}{rcl}
 z^{n+1} (u,\lambda_1,\lambda_2 ) & = & (z^n (u,\cdot) \star \psi_{\lambda_2 })(\lambda_1)  \\
 &=& \int z^n (u,\lambda_1 - \lambda'_1)\, \psi_{\lambda_2} (\lambda'_1)\, d \lambda'_1 \\
 g.z^{n+1} (u,\lambda_1,\lambda_2 ) &=& \int g.z^n (u,\lambda_1 - \lambda'_1)\, \psi_{\lambda_2} (\lambda'_1)\, d \lambda'_1 \\
 &=& \int z^n (f(g,u),\lambda_1 - \lambda'_1 + \eta (g) )\, \psi_{\lambda_2} (\lambda'_1)\, d \lambda'_1 \\
 & = & z^{n+1} (f(g,u),\lambda_1+ \eta(g),\lambda_2 ) \\
\end{array}

The new coordinates \( \lambda_2 \) are thus unaffected by the action of \( G \). As a consequence, this property enables a systematic procedure to generate invariance to groups of the form \( G = G_1 \rtimes G_2 \rtimes ...\rtimes G_s \), where \( H_1  \rtimes H_2 \) is the semidirect product of groups. In this decomposition, each factor \(G_i\) is associated with a range of convolutional layers, along the coordinates where the action of \( G_i \) is perceived.

to be compared with (in "convolution", Wikipedia) : Suppose that S is a linear operator acting on functions which commutes with translations : \( S( \tau_x f) = \tau_x (Sf) \quad \forall x\). Then S is given as convolution with a function (or distribution) \( g_S \); that is \(Sf = g_S \star f\). Thus any translation invariant operation can be represented as a convolution.
In our case, we want the operator \( \rho W_j \) to commute with \( G_j \), so that we can write it as a convolution on \( G_j \) (with \( u - v \) replaced by \( g^{-1} v \)).
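A small numerical illustration of this statement (toy sizes, circular translations; the construction is mine): a circulant operator \( S \) commutes with translations, \( Sf \) equals the circular convolution \( g_S \star f \), and \( g_S \) is recovered as \( S \) applied to a discrete delta.

```python
import numpy as np

n = 8
rng = np.random.default_rng(1)

g_S = rng.normal(size=n)
S = np.stack([np.roll(g_S, k) for k in range(n)], axis=1)   # circulant matrix, column k = roll(g_S, k)

f = rng.normal(size=n)
tau = lambda v: np.roll(v, 3)                               # a circular translation

print(np.allclose(S @ tau(f), tau(S @ f)))                  # True: S commutes with translations
conv = np.real(np.fft.ifft(np.fft.fft(g_S) * np.fft.fft(f)))
print(np.allclose(S @ f, conv))                             # True: S f = g_S * f (circular convolution)
print(np.allclose(S @ np.eye(n)[0], g_S))                   # True: g_S = S(delta)
```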

Sifre (thesis) : ‘A major difference between the translation scattering and convolutional neural network as defined in (2.98) is that in (2.98), every output depth \( p_m \) is connected to every input depth \( p_{m-1} \). On the contrary, a scattering path \( p_{m}=(\theta_1,j_1,\dots,\theta_{m},j_{m}) \) is connected to only one previous path, its ancestor \( p_{m-1}=(\theta_1,j_1,\dots,\theta_{m-1},j_{m-1}) \). This implies that the translation invariance is built independently for different paths, which can lead to information loss, as we shall explain in Section 4.2.’

The essential result is Proposition 1, which shows that ‘hierarchical embedding implies that each \( W_j \) is a convolution on \( G'_{j-1} \)’.
With (5) we get \( x_j = \rho W_j x_{j-1}\), that is, if \( v \in P_j \)  :
$$ x_j (v) = \rho ( \sum_{v' \in P_{j-1} } x_{j-1} (v') w_{j,v} (v') ) $$
The idea is to parametrize \( v = \bar g. b,\ b\in P_j / G_j  , \quad  \bar g \in G_j \)  : we get a ‘paving’, or rather a ‘fibration’, of \( P_j \) along the orbits of \( G_j \), CF Figure 4.
If we postulate that \( G_j = G_{j-1} \rtimes H_j,\quad \bar g=(g,h) \), the commutation (9) becomes, with \( h=e_{H_j } \) :
\begin{array}{r c l}
g.x_j (v) &=& \rho ( \sum_{v' \in P_{j-1} } g.x_{j-1} (v') w_{j,b} (v') )  \\
  &=& \rho ( \sum_{v' \in P_{j-1} } x_{j-1} (g.v') w_{j,b} (v') )  \\
  &=& \rho ( \sum_{v' \in P_{j-1} } x_{j-1} (v') w_{j,b} (g^{-1} v') )  \\
  \end{array}
from \( g.x_{j-1} (v)=x_{j-1} (g.v) \), and after a change of variable.
Then Mallat writes : \( w_{j,\bar g.b} (v') = w_{j,(g,h).b} (v' )=h.w_{j,h.b} (g^{-1} v' ) \)
which seems erroneous (Mallat in fact has \( w_{j,(g,l).b} (v' ) \)) : should we read \( h.w_{j,b} (g^{-1} v' ) \) ? For me, \( w_{j,h.b} (g^{-1} v' ) \) is just a hypothesis.
The essential idea is, as in (9), to decouple levels \( j-1 \) and \( j \).
In the earlier notation, \( \lambda_1 \leftrightarrow g,\ \lambda_2 \leftrightarrow h.b \).
"The filters  \( w_{j,h.b} \)can be optimized so that variations of \( x_j (g,h,b) \) along \( h \) captures a large variance of \( x_{j-1} \) within each class. Indeed, this variance is then reduced by the next \( \rho W_{j+1} \). The generators of \( H_j \) can be interpreted as principal symmetry generators, by analogy with the principal directions of a PCA"

Wednesday, 28 December 2016

learning as categorification III


1. Lin & Tegmark (LT): "why does deep and cheap learning work so well? "
a. A categorical approach: p7 Fig 3, p12 Table I
b. Proposes a duality cheap / deep
i. Cheap: 'simple polynomials which are sparse, symmetric and / or low-order play a special role in physics' + II.D: low polynomial order, locality, symmetry
ii. Deep: 'One of the most striking features of the physical world is its hierarchical structure.'
c. Is interested in some papers centered on: RG + deep linear nn

2. Symptomatically, the examples given do not correspond to self-learning-type DL, but to concatenation / composition of 'symmetries', in the sense of sparsity, CF Remark 3

3. We have a category of sparse graphs SpGr, categories of physics Phys and image classification ImCl, and functors

Phys → SpGr
ImCl → SpGr

4. Remark 1: the old Kant question: is this 'special role' subjective or objective?
Do we really discover low dimensional symmetries or do we discover what we can discover?
a. Fundamental Ockham / generalization bias: CF "against Vapnik"
b. Computational stress (CF "μεταφορά", 2): the 'vicious' circle of learning:
New symmetry → more data → new symmetry → ...

5. Remark 2: How is symmetry learned?
a. Laborans: over time [not on a particular dataset]
b. Various fields bring to light large classes of symmetry: fundamental physics, algebraic geometry, biology (cf PPI), AI, information systems, cognition, engineering, ... CF" reading Building Machines that learn and think like people" (RB)

6. Remark 3: the heuristics-symmetries point of view:
a. To learn is to build a catalog à la Polya (CF RB) of good heuristics, that is to say good symmetries.
b. Distributivity [Bengio] ↔ sparsity [Bach] ↔ heuristic / symmetries
c. Deep is not a mysterious second / 'dual' dimension of learning: just another symmetry: recursivity / sequentiality
d. There is an equivalence between sequential learning and hierarchical learning, via a 'rotation' time ↔ space (depth)
e. Ultimately, the question is to see the notion of symmetry as much more general than its classical variants (groups, CF "SGII") or its 'reductive' ones (distributivity / sparsity): category theory seems an interesting attempt in this direction. See also the towers of heuristics in RB

μεταφορά III

1. Bias towards difference... whereas innovation rather means learning through comparison / differentiation (online learning of classification, cf Brown)

2. Recall μεταφορά :

Analogy ↔ functor

Now, on this point Cat does not help : analogy is your guide, but this operation is anything but automatic ...
[Reminder: initially the first stake of Cat consists in natural transformations ...]
CF Spivak's remarks in (ProMat) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023911
"It is important to note that ologs can be constructed on modeling and simulation, experimental studies, or theoretical considerations that essentially result in the understanding necessary to formulate the olog. This has been done for the proteins considered here on the basis of the results from earlier work which provided sufficient information to arrive at the formulation of the problem as shown in Figure 3"

3. In ProMat Spivak emphasizes hierarchical and functional aspects.
(Functors) such as Cat ~ Sch (CF Spivak 5.4), or of the type Phys → SpGr, CF "learning as categorification III", or of the type of those in "learning as categorification", or word2vec (word → linear spaces).

4. I nevertheless suspect that the most useful / deep functors are symmetries, in the sense that:
Symmetry ↔ structure

Structure taken in its mathematical meaning. These are rather few ...: linear spaces, groups ... the difficulty is to see one of these structures in the domain studied. Most of the time this is not obvious.


5. Enforcing comparison thus remains the objective:
a. ProMat
b. http://web.mit.edu/mbuehler/www/papers/BioNanoScience_2011_3.pdf
c. "The term 'log' (like a scientist's log book) alludes to the fact that such a study is never really complete, and that a study is only as valuable as it is connected in the network of human understanding. In this paper, we present the results of this study." Spivak
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0023911

But how to do it precisely is anything but obvious ...

6. A paradigm that emerges greatly pushes the traditional boundaries of AI:
DeepMind: "I would like to see a science where an AI would be a research assistant doing all the tedious work of finding interesting articles, identifying a structure in a vast amount of data to get back to the human experts and scientists who could move faster "
This paradigm fits precisely into Spivak's vision in 5.c. Note the presence of the word 'structure'. My best guess would be to take the term in its strong sense, the mathematical one (and not the sense in which DeepMind may understand it, its 'statistical' meaning: pattern)


7. In an inescapable race for abstraction, CF 4 in "learning as categorification III", one can see that AI, after having long been engaged in recognizing material forms, might seek to (learn to) recognize abstract forms: structures / categories. It is thus necessary to understand the evolution which leads from linear regression (linear spaces), to trees, then to NNs, then to DNNs, then to combinations of DNNs (CF "Speech Recognition with Deep Recurrent Neural Networks", Graves)

Natural language / neuro economics

1. Universal economic constraint
a. Learning: a fable
i. Suppose an intelligence confronted with two vital tasks
1. learn
2. Decide in uncertainty
ii. It is subjected to a constraint of 'finitude': producing models has a complexity cost
iii. i + ii brings a fundamental compromise: simplicity / generalization
b. The solution to 1.a.iii seems to have been (in our world) universally algorithmic
i. algorithm
ii. Mathematical: (logic of) categories, CF μεταφορά
iii. Natural Language (NL)
c. Here 'universal' undoubtedly has an economic logic: tradeoff tractability-expressiveness, remix of simplicity-generalization: CF SGI, II


2. NL between philosophy and AI
One finds surprisingly little trace of the economic problem that natural languages (NL) have to solve, whereas one would expect to find this concept as a preamble to any philosophy of language.
The exit from the essentialist conception of language took time ... we had to wait for Frege and Wittgenstein to leave it, CF Bouveresse.
Wittgenstein talks about language games both for NL and for mathematics. The notion of rule (convention) exhausts their 'philosophy'; an 'end point', W. seems to say.
First-order logic (FOL) is 'combining the best of formal and NL', according to Russell & Norvig's AI textbook: a gratuitous and fortunately false claim, CF 'paradox of learning' or μεταφορά, but the applications of FOL are numerous, starting with our 'FOL for mining news'.

3. NL in cognition and evolution theory
a. Language represents a remarkable solution to 1.c: a machine for creating models, flexible and constrained
b. The rules of use are limits to linguistic computation
c. They allow compromise in the saying 'anything': between everything and nonsense
d. CF : 'Linguistic structure is an evolutionary trade-off between simplicity and expressivity', Kirby, and 'The origins of syntax in visually grounded robotic agents', Steels
e. Note that expressivity ≠ creativity (= inference calculus): Kirby's paper only deals with the first, not the second. Now the discovery of rules, more precisely structures, is what matters to us in L2L: CF SGII, μεταφορά

4. NL ~ Cat?
The equivalence NL ~ FOL is obviously false: language makes it easy to speak of sets of sets, and so on. So we would rather try the guess NL ~ Cat, especially with regard to 'creativity'
(RDF in Cat, in "Category Theory for the Sciences", Spivak, 6.2.2)

5. Neuro ~ Cat?
Beyond language, one can even wonder if Cat could not constitute a paradigm for the cognition in neurosciences

μεταφορά and ἀναλογία

1. Is maths magic?
Is it right to draw inspiration from it, as Jules Vuillemin did in wishing to transpose it to philosophy? Or do mathematics / physics mimic a universal form of learning, of which language already offers the model, CF NL economics?
The modern mathematical point of view, that of structures (including functors and categories, CF 'on categories'), marks the triumph of algebra (CF algebraic geometry), i.e. rules / calculation algorithms. The constraints represented by these algorithms, the symmetries they encode, seem to represent a good compromise between tractability and demonstration power.
As already suggested in Symmetry Generalization I, II, in good science tractability dominates the question of expressivity: the tool conditions the explorable

2. the right point of view
As already discussed, CF 'against Vapnik' 13, learning is (the art of) finding the right point of view.
In a way, the right point of view trivializes the field studied: groups trivialize the resolution of algebraic equations, game theory trivializes most 'economic' problems, CF 'No equilibrium theorem'.
As if, by economic necessity, calculation could not move us far from where we start, or as if the point of arrival could hardly be more 'distant' than a 'rotation' of the point of departure. To be compared with the local / global exploration dilemma in optimization.
We find this trait throughout "Récoltes et semailles" (RS); it is a well-known trademark of Grothendieck (RS p 669, Illusie https://lejournal.cnrs.fr/billets/grothendieck-and-dynamics-impressive, http://www.cnrs.fr/insmi/IMG/pdf/Alexandre-Grothendieck.pdf)
In the Category paradigm, the good point of view is that of comparison and morphisms

3. Poincaré and the analogy
« Mathematical facts worthy of being studied are those which, by their analogy with other facts, are capable of leading us to the knowledge of a mathematical law, in the same way that experimental facts lead us to the knowledge of a physical law. They are those which reveal to us unsuspected kinships between other facts, long known, but wrongly believed to be foreign to one another. »
CF : « L’analogie algébrique au fondement de l’analysis situs », Herreman, in ‘L'analogie dans la démarche scientifique : Perspective historique’

4. The learning engine (learning to learn, L2L) is therefore comparison
Μεταφορά: trans-port
Ἀναλογία: (according to) ratio (ratio): proportion

5. 'Comparison' and 'Analogy' are fundamental aspects of knowledge acquisition, in
'Category: An abstract setting for analogy and comparison', Brown & Porter

6. ex 1: the concept of 'symmetry', CF SGII, must be understood first in the sense of group symmetry, i.e. of group morphism: a symmetry transports (rotates) a solid, or connects 2 positions of this solid, i.e. compares them

7. ex 2: homotopy / homology: from space to group
Here it is the comparison initiated by Poincaré / Betti between topological spaces and groups

8. ex 3: Galois theory: from (algebraic equations') fields to groups
Galois theory compares (in its Dedekindian version) field extensions (of algebraic equations, more specifically) and groups

9. The main point is perhaps less to be surprised to discover groups in topological spaces or algebraic extensions than to reaffirm Klein's point of view (that of his Erlangen program): one learns about an object when one connects it to other objects, when one studies its symmetries, or more generally its morphisms

10. enforcing comparison: this is the spirit of category theory

Friday, 23 December 2016

symmetry-generalization II

1. In 'Learning deep architectures for AI', Bengio revolves around the notion of symmetry without ever uttering the word!
Geometry appears 7 times, manifold 14 times
Generalization 80 times
Bengio insists heavily on the limitations of 'local' approaches (100 occurrences) and opposes to them 'distributed representation' (51 occurrences).
2. The Bach team at Cachan spent time, on Vision, looking for good priors. In hindsight, it passed the deep CNN by
3. The ecological rationality of Gigerenzer masks the environment's symmetries
4. Mallat et al. have sought to join sequential learning and 'groups': translations, rotations, weak deformations (diffeomorphisms), CF CNN : deep symmetries
5. Many groups are Lie groups: manifolds
6. https://en.wikipedia.org/wiki/Symmetry_(physics)
7. Let S be a system endowed with certain articulations or degrees of freedom.
Any transformation T whose result on S is known:
T * S = S
provides strong constraints on (a model of) S.
When T is a group, the map T * S is called a group action
8. For example, since the Hamiltonian H is spherically symmetric for the system of the electron around the hydrogen nucleus, H commutes with the 3 components of the angular momentum J on the eigenspaces E of H, so that SO(3) [the group of 3D space rotations] acts on the ket 𝜓, solution of the Schrödinger equation:
 R(𝜃) H 𝜓(x) = H R(𝜃) 𝜓(x)
We obtain (more easily) the solution 𝜓 = Ylm(𝜃, 𝜙) fn(r), with l = 0, 1, ..., n-1 and m = -l, ..., l
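As a hedged illustration of how the symmetry organizes the solutions (a sympy sketch, not a derivation; it only enumerates the standard hydrogen labelling): for a given n, the angular part is spanned by the Ylm with l = 0, ..., n-1 and m = -l, ..., l, giving the n² degeneracy forced by the symmetry.

```python
from sympy import Ynm, symbols, simplify

theta, phi = symbols("theta phi", real=True)

n = 3
labels = [(l, m) for l in range(n) for m in range(-l, l + 1)]
print(len(labels))                                          # n**2 = 9 angular solutions for n = 3
print(simplify(Ynm(1, 0, theta, phi).expand(func=True)))    # explicit Y_{1,0}(theta, phi)
```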

9. An example of non-spatial / temporal symmetry in physics:
 a. Isospin (associated group: SU(2))
 b. Gauge symmetry: the local invariance constraint of the Dirac action implies the existence of the EM field and its interaction with the charged particles
10. On the other hand, the statistical instability of a relation y ~ x can be seen as an unknown transformation law:
given a set D on which it seems that y = 𝛃x
y and x being the returns at t and t-1 of two instruments
When a new set D' is presented, we find y = 𝛃' x
Our law is therefore not invariant
In reality, we lack a dimension, or variable z, which would allow us to see that
 y = e (z) x
The line y ~ x rotates along z
In other words, the transformation of z corresponds to a transformation of the ratio y / x:
 z → z' 
 y/x → y'/x
Where does z come from ?
In the example of Stoikov, there is no coupling between activities at bid and ask, simply because the world is reduced to one market maker, not to a market maker + insider system as in Kyle85
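A small simulation of this point (all numbers invented): the fitted slope of y ~ x 'rotates' between the datasets D and D' because a hidden variable z modulates the true relation y = e(z) x.

```python
import numpy as np

rng = np.random.default_rng(3)

def beta_hat(x, y):
    """OLS slope of y ~ x (no intercept)."""
    return float(x @ y / (x @ x))

def simulate(z, n=500):
    x = rng.normal(size=n)
    y = (0.5 + 1.0 * z) * x + 0.1 * rng.normal(size=n)   # hidden law: y = e(z) * x + noise
    return x, y

x1, y1 = simulate(z=0.0)                   # dataset D  (regime z)
x2, y2 = simulate(z=1.0)                   # dataset D' (regime z')
print(beta_hat(x1, y1), beta_hat(x2, y2))  # ~0.5 vs ~1.5: the law is not invariant without z
```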
11. The symmetries impose strong constraints and considerably reduce the field of the possible
CF for physics Zee 'fearful symmetry' (eg p209)