Friday, 1 September 2017
Conceptual growth
1. Most human problems are far less formalized than Pattern Matching [PaMa], one of the prototypes of AI.
In PaMa, many examples are available for which the 0/1 classification is known, an "exuberance" that is unrealistic in actual human everyday life.
One can even argue that having the classification already amounts to having solved the "problem".
2. Take the example of a child who forms a "concept" encoded via the opposition dog ⊦ doll: this pre-conceptualization aims, say, at the concept animate ⊦ inanimate, but the child gropes along, and will need to acquire many examples - and even new knowledge - to make this conceptual draft evolve.
Another example: moving from the pre-concept insurer ⊦ insured to the concept of information asymmetry requires a step that most people will not take spontaneously. Stiglitz received the Nobel Prize in part for his work on the subject.
In mathematics, Galois generalized his 'manipulations' of the roots of a polynomial and of the elements of a finite set into the notion of group.
The category of topoi generalizes the notion of subset (http://math.ucr.edu/home/baez/topos.html).
The notion of 'symmetry' / 'structure' generalizes into category theory.
3. Precisely, to make progress on his "problem", i.e. his pre-conceptualization, a human must collect examples or look for "teachers".
Learning to learn means learning to collect examples on one's own, or learning to collect 'tutors'. In most cases, of course, humans resort to both heuristics.
In Machine Learning, this is exactly the notion of supervised learning, which lies at the heart of PaMa.
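As a minimal sketch of this supervised setting (the toy data and the nearest-centroid rule are illustrative assumptions, not a claim about any particular system):

```python
# The PaMa / supervised-learning setting in miniature: every example
# conveniently arrives with its 0/1 label, and classification reduces
# to geometry. Toy data and nearest-centroid rule are assumptions.
import numpy as np

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])  # examples
y = np.array([0, 0, 1, 1])                                      # labels given "for free"

centroids = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def classify(point):
    # assign the point to the class of the nearest centroid
    return min(centroids, key=lambda c: np.linalg.norm(point - centroids[c]))

print(classify(np.array([0.15, 0.15])))  # -> 0
print(classify(np.array([0.85, 0.85])))  # -> 1
```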
4.1 Yet this process is costly and error-prone: an example will in general not come as a clean 0/1 case, for the good reason that the learner does not yet possess the terminal concept that would let him classify any example unambiguously.
"Noise" is perhaps the main difficulty of the exercise.
Consider the case of the bad "teacher" or tutor:
a. he may be focused on points of detail, the "tree hiding the forest": he lacks a spirit of abstraction;
b. he may lack rigor, be off-topic, or not master his subject: lack of relevance;
c. he may be an epigone, echoing rather than reasoning: lack of originality or creativity.
According to Wikipedia's "concept learning" entry, "the classical views of concepts and concept learning in philosophy speak of a process of abstraction, data compression, simplification, and summarization". In reality the four items closely resemble one another. Surprisingly, relevance and creativity are not mentioned.
4.2 As for collecting examples, consider the practice of solving a math exercise or problem. Polya is the author of a well-known compilation of heuristics, from which Terence Tao drew, while still very young, to shine at the International Mathematical Olympiads. Polya insists essentially on the notions of analogy and of progressive transformation of the problem's data.
One may also think of the analogs / antilogs heuristic of Mullins and Komisar ('Getting to Plan B'). Analogs and antilogs are so many examples through which the apprentice entrepreneur gropingly sketches the concept of his future company. Bringing together and differentiating (/opposing) (RD) a base of examples allows one to progress in the 'formulation' (a proxy for resolution) of the problem.
One can argue that the capacity to automatically generate high-quality examples lies at the source of AI's recent successes in game learning: Backgammon, Go.
5. In practice, the learner's collection of good "teachers" owes almost everything to the various social or institutional graphs that humans build "spontaneously". Scientific publications are the paragon: an article has content of "minimal" quality (peer review, higher education), and provides references pointing to other authors, who are so many potential "teachers" for the learner. Naturally, these references owe nothing to chance and everything to the work of the article's author, who has made a careful selection, both in terms of content ('relevance') and in terms of quality.
When the learner gets hold of a "good" author, he has every chance of finding new gems among that author's "connections".
Of course, quality is partly subjective: a "good" author is also a good "translator" or "ferryman" for the learner: he knows how to make himself understandable to the learner, which depends on the latter's level of knowledge. Vikidia will suit children better than Wikipedia.
Returning to the case of solving math problems, and beyond that to mathematical research, giving oneself "good" examples is in fact the hallmark of the great discoverers. A good example is precisely what allows one to climb back up to the right point of view, often the right conceptual generalization.
From these remarks it follows that the true algorithm for solving real human problems is a .. human-made graph. This is exactly what we observe on the internet, where new specialized graphs emerge every day, for instance in the medical domain, in software development, etc.
Quite often, however, these graphs are very "noisy", in the sense specified above. Their authors are insufficiently identified, so that the newcomer learner will have great trouble separating the wheat from the chaff.
We naturally fall back on the recommendation problems that have been in vogue in recent years, where two approaches are easily distinguished: collaborative and content-based. But once again, the algorithm is the graph itself: its quality bounds that of any search algorithm running on it.
6. The author-based approach in the sense of 5., or the 'collaborative' one in the case of social networks, is infinitely simpler than the content-based approach: an author's name encodes contents far more simply than a description of those contents. We have a fair idea of what Heidegger stands for; it is much harder to describe what Heidegger talks about. Rather than searching whether an author deals with certain 'Heideggerian' ideas (which are also partly Aristotelian, Parmenidean, Kantian, ...), it is much simpler to check that he cites him.
In philosophy as in most domains of knowledge, an efficient minimal encoding is the contrast x ⊦ y. In our notation, we will write x ⊦ y ~ x' ⊦ y' to describe a class equivalence (or morphism), and x ⊦ y ⇒ z for a conceptual functor, in the sense that z is a conceptual translation of x ⊦ y.
Example 1: cat ⊦ lion ~ dog ⊦ wolf ⇒ domesticated ⊦ wild.
Example 2: given two homeomorphic topological spaces x → y, their fundamental groups are isomorphic; here z is homotopy. The 'pre-conception' x → y (which of course is rigorous in the present case) is conceptualized more simply via the translation in terms of groups.
The learner who encodes his pre-concept by the morphism cat ⊦ lion ~ dog ⊦ wolf hopes, via a digital query, to come across a 'tutor' who will help him climb to the next conceptual level. This higher level, which by definition he does not know at the time of his search, and which he can only represent via a morphism, is domesticated ⊦ wild. In Galois's case:
operations on roots ~ operations on lists ⇒ Group.
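As a toy illustration of this notation, here is a minimal sketch of storing contrasts and their conceptual translations; the tiny knowledge base and the function name are illustrative assumptions, not part of the original argument:

```python
# The contrast encoding x ⊦ y and the analogy x ⊦ y ~ x' ⊦ y' ⇒ z,
# as a lookup table. The knowledge base is an illustrative assumption.
ANALOGIES = {
    # frozenset of two contrasts -> the conceptual translation z
    frozenset([("cat", "lion"), ("dog", "wolf")]): ("domesticated", "wild"),
}

def conceptual_functor(contrast_a, contrast_b):
    """Given two contrasts judged equivalent, return z such that
    contrast_a ~ contrast_b ⇒ z, if the base knows it."""
    return ANALOGIES.get(frozenset([contrast_a, contrast_b]))

print(conceptual_functor(("cat", "lion"), ("dog", "wolf")))
# -> ('domesticated', 'wild')
```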
Generally speaking, the learner encodes his learning in the form of a graph, in the spirit of our 'Conceptual Representation' or of conceptual graphs (cf. John Sowa). This is a 'high-level' encoding. A translation of this encoding is needed to make digital querying possible. Such a transformation must accommodate the essential versatility of human language, and essentially consists of PaMa.
MyGrowingCG ⇝s ⇒ PaMa (⇝s : search engine)
CG : conceptual Graph
In summary, the learner simultaneously grows his conceptual graph and his 'tutor' graph, or reference graph.
MyGrowingCG ⇌ MyGrowingRG
RG : reference graph
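A minimal sketch of this twin growth, assuming a simple edge-set encoding (the class and method names are hypothetical):

```python
# Growing a conceptual graph (CG) and a reference graph (RG) in tandem,
# as in MyGrowingCG ⇌ MyGrowingRG. Names and update rule are assumptions.
class Learner:
    def __init__(self):
        self.cg = set()  # conceptual edges: (x, relation, y)
        self.rg = set()  # reference edges: (author, 'cites', author)

    def read(self, author, concepts_found, authors_cited):
        # each reading step grows both graphs at once
        for x, rel, y in concepts_found:
            self.cg.add((x, rel, y))
        for cited in authors_cited:
            self.rg.add((author, "cites", cited))

me = Learner()
me.read("Baez", [("topos", "generalizes", "subset")], ["Lawvere"])
print(me.cg, me.rg)
```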
7. PaMa fixes the data (e.g. N documents) and the representation (word distribution, sentiment, Structure Mapping Engine (SME), ...).
By contrast, conceptual growth (CC) fixes neither the data, which may for instance cover the whole internet, nor the representation, nor the concept, which is discovered iteratively. All of this remains largely at the learner's discretion.
At each iteration t, a document D(t) yields (the learner chooses) an encoded content x(t) or an author X(t); x(t) allows a new search to be launched, leading to a new author X(t+1).
Let us write * next to x or X to denote an interest, a valuation by the learner.
Several cases can be distinguished: either x(t)* is associated with an author, who thereby becomes X(t+1); or X(t)* cites X(t+1); or x(t)* has no author attached, in which case a (digital) search is needed:
t : D(t) : X(t), x(t)
X(t)* ⇝ X(t+1)
x(t)*, X ⇝ X(t+1)=X
x(t)* ⇝s X(t+1)
Once X(t+1) is obtained, the learner chooses an x(t+1) in a document D(t+1) of which X(t+1) is the author.
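A minimal sketch of one iteration of this protocol (the function and field names are hypothetical):

```python
def search(content):
    """Stand-in for a digital search engine: x(t)* ⇝s X(t+1)."""
    return f"author-found-for-{content}"

def next_author(doc):
    """One iteration of section 7: from the valued item in D(t) to X(t+1)."""
    if "cited_author" in doc:        # X(t)* ⇝ X(t+1): the starred author cites someone
        return doc["cited_author"]
    if "content_author" in doc:      # x(t)*, X ⇝ X(t+1) = X: the content has an author
        return doc["content_author"]
    return search(doc["content"])    # x(t)* ⇝s X(t+1): no author attached, so search

doc_t = {"content": "asymmetry-of-information"}
print(next_author(doc_t))  # -> 'author-found-for-asymmetry-of-information'
```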
The learner is not bound to a pure refinement of his initial pre-concept. Along the way he may discover another (pre-)concept y* whose value induces him to follow that conceptual thread, which may well lead him to an X* of great interest, helping him in return to accelerate his 'clearing' (aletheia) of x: the paths of discovery are not linear, and the learner's 'coup d'oeil' plays a decisive role.
In other words, what matters is as much the value of z(t) ∈ {x(t), X(t)} as its determination x or X, insofar as quality calls for quality: z(t)* ⇝ z(t+1)*
This is not purely optimism in the face of uncertainty as with multi-armed bandits: quality greediness is not a simple curiosity greediness.
Saturday, 25 March 2017
A conceptual representation
We introduce here a multi-level conceptual representation, based on two morphisms: 'is linked to' and 'opposes':
a. \( \rightarrow \) ,\( \Rightarrow \), \( \Rrightarrow \), \( \rightarrow_4 \)... : 'is linked to', level (n+)1, (n+)2, (n+)3,...
b. \( \vdash \),\( \Vdash \),\( \Vvdash \),\( \vdash_4 \)... : 'opposes', level (n+)1, (n+)2, (n+)3,...
[optional :
c. \(\circ \),\(\circ_2\),...: 'equivalence' or 'opposition', level (n+)1, (n+)2,...
d. \(. \) : under category
e. \( \leadsto\), \( \leadsto_2 \), : to, level (n+)1, (n+)2,...]
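A minimal sketch of this representation as a data structure (the tuple encoding and names are illustrative assumptions):

```python
# The multi-level representation above: each edge carries a relation
# type ('linked' or 'opposes') and a level n. Encoding is an assumption.
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    rel: str    # 'linked' (→, ⇒, ⇛, ...) or 'opposes' (⊦, ⊩, ...)
    dst: str
    level: int  # 1 for →/⊦, 2 for ⇒/⊩, etc.

graph = {
    Edge("science", "linked", "reproducible", 1),   # science → reproducible
    Edge("science", "opposes", "philosophy", 1),    # science ⊦ philosophy
    Edge("truth→reproducible", "opposes", "truth⊦reproducible", 2),
}
print(len(graph))
```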
Let us give an example of an implementation of this representation, in a philosophical context (we are not trying to 'demonstrate' anything). Suppose Mr. X has these initial representations:
1. Science is interested only in the reproducible: \( science \rightarrow reproducible \)
2. Moreover, science often appears as a model of truth: \( science \rightarrow truth \) (for example to people like Jacques Bouveresse)
3. The reproducible thus ends up appearing as an essential criterion of truth: \( reproducible \rightarrow truth \)
In reality the search for truth is not even a goal of science, which swears only by the reproducible. And bringing science and truth closer together is very reductive (for the concept of truth). One can say that the metaphysical trace of science lies in the de-cision of the symmetry (i.e. its breaking):
$$ truth \rightarrow reproducible \Vdash truth \vdash reproducible $$
Suppose now that X reads Heidegger, in particular this passage from 'Introduction to Metaphysics' [translation Fried and Polt]:
"For it cannot be decided so readily whether logic and its fundamental rules can provide any measure for the question about beings as such. It could be the other way around, that the whole logic that we know and that we treat like a gift from heaven is grounded in a very definite answer to the question about beings, and that consequently any thinking that simply follows the laws of thought of established logic is intrinsically incapable of even beginning to understand the question about beings, much less of actually unfolding it and leading it toward an answer. In truth, it is only an illusion of rigor and scientificity when one appeals to the principle of contradiction, and to logic in general, in order to prove that all thinking and talk about Nothing is contradictory and therefore senseless."
If X has the following representations on Heidegger:
\(truth \rightarrow question \, about \, beings \)
\(Logik \rightarrow science \)
\(science \vdash philosophy / poetry \)
and links his concept of reproducibility to Heidegger's Gestell (essence of technique / science):
\( Heidegger.science/Gestell \rightarrow X.reproducible \)
and if X retains his view that \( science \rightarrow reproducible \), he may enrich his representation with:
\begin{array}{r c l}
truth \rightarrow reproducible & \Vdash & truth \vdash reproducible \\
& \Rrightarrow & \\
science & \Vdash & philosophy / poetry
\end{array}
The relative link between Heidegger's concept of philosophy / poetry and X's concept of reproducibility is not necessarily a view of Heidegger's, so this link is a creative - and subjective - one (which may ultimately be true or not [or 'interesting' or not]). (We may note that if poetry talks about Nothing (last sentence of the text above), it might appear plausible that poetry is not X.reproducible...)
Let us suppose finally that X reads Stuart Kauffman, in particular 'Investigations' and 'Humanity in a Creative Universe'.
If X admits that his concept of \( reproducible \vdash noreproducible \) (as part of the subjective 'ecosystem' of X's thought) is close to Kauffman's \( ergodic \vdash noergodic \): $$reproducible \rightarrow ergodic$$ and if X represents Kauffman's core idea as \( noergodic \vdash ergodic \Rightarrow physics \vdash biology / complex \),
then X can further enrich his concept of reproducibility:
\begin{array}{r c l}
noergodic \vdash ergodic & \Rightarrow & physics \vdash biology / complex \\
& \Rrightarrow & \\
noreproducible \vdash reproducible & \Rightarrow & physics \vdash biology / complex \\
\end{array}
So that X's concept of reproducibility allows X to create a link between his 'Heidegger's (thought) ecosystem' and his 'Kauffman's (thought) ecosystem' :
\begin{array}{r c l}
Heidegger & \Rrightarrow & Kauffman \\
& \rightarrow_4 & \\
norepro \vdash repro \Rightarrow philo \vdash science & \Rrightarrow & noergodic \vdash ergodic \Rightarrow bio/ complex \vdash physics
\end{array}
Obviously all of X's morphisms presented above are debatable. But the purpose of this example is to sketch a plausible (necessarily subjective) learning experience. The question is less to ask about the objective reality of \( P = [Heidegger.science/Gestell \rightarrow X.reproducible] \) or \( P' = [reproducible \rightarrow ergodic] \) than to help X be creative. Specifically, could it be possible to (automatically) suggest \( P \) to X? \( P \) contributes to making X build a bridge between (his) Heidegger and (his) Kauffman through his repro concept: could this drive an Assistant algorithm to suggest \( P \) (and \( P' \), and so on) to X?
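A hedged sketch of what such an Assistant might do, under strong simplifying assumptions (the edge set, one-hop reachability, and the function name are all illustrative):

```python
# Given X's graph, propose bridge concepts between two 'ecosystems' that
# both point (in one hop) to a shared node. Edge set is an assumption.
def suggest_bridges(edges, eco_a, eco_b):
    reach = lambda eco: {dst for src, dst in edges if src in eco} | eco
    return reach(eco_a) & reach(eco_b)   # shared nodes = candidate bridges

edges = {("Heidegger.Gestell", "reproducible"), ("ergodic", "reproducible")}
print(suggest_bridges(edges, {"Heidegger.Gestell"}, {"ergodic"}))
# -> {'reproducible'}: X's repro concept bridges the two ecosystems
```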
It is not too difficult to trace the path
\( Bergson \leadsto Whitehead \leadsto Wittgenstein \leadsto Kauffman \)
as Kauffman gives a (very light) mention of Whitehead, and then if you google \( Whitehead \circ Kauffman \) you get
\( Whitehead \circ Kauffman \leadsto Shaviro \)
and Prof. Shaviro's site typically gives a more plausible '\( \leadsto \)' genealogy, as above (Bergson...).
But maybe X would have more interest in a (hidden) \( Heidegger \leadsto Kauffman \) story ?
Wednesday, 22 March 2017
Learning Fallacy II
In 'Learning Fallacy' [LF] we began the reflection on the model / data duality:
a. what kind of data is relevant for my problem - i.e. am I not typically biased towards over-reducing / localising the problem?
b. how specific is my model - that is, am I not typically biased towards overfitting, i.e. under-symmetrising my model?
We call the corresponding heuristic :
a. Data Expansion
b. Large Symmetry
The data expansion heuristic is not a more-data-is-better tale. We are talking here about perspective on 'reality' [recall that your reality is a work-in-progress...]: what kind of implicit (over-simplifying) hypothesis am I making by leaving out seemingly irrelevant data?
Ex a: We already mentioned the RFIM hypothesis in Finance, which is precisely considered relevant more broadly for the social domain.
The transdisciplinary (or not) paradigm is one manifestation of the Data Expansion problem : a typical example is given in 'Natural language / neuro economics II'.
Ex b: recall the paradoxical behavior of Markowitz in LF: the Portfolio theoretician adopts a fully symmetrised approach for his skin-in-the-game private financial strategy...
'Learning as categorification IV' proposes simple examples where symmetries are explicitly declared to be the essential part of the problem.
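As a toy illustration of heuristic b., here is a minimal sketch (the toy data and the linear ground truth are assumptions) of how an over-specific model under-symmetrises and fails out of sample:

```python
# Heuristic b. in miniature: a degree-9 fit 'under-symmetrises' (overfits)
# and typically errs far more than the simple linear model out of sample.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = 2 * x + rng.normal(0, 0.1, 12)   # assumed ground truth: linear + noise

x_new, y_new = 1.2, 2 * 1.2          # a point outside the sample
for degree in (1, 9):                # large-symmetry model vs over-specific one
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x_new)
    print(degree, round(abs(pred - y_new), 3))
```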
Saturday, 11 March 2017
Natural language / neuro economics II
We had a first round over NL in the post 'NL / neuro economics' [NLNE]. Curiously, at that time our (rapid) web foraging missed the important 'faculty of language' paper by Hauser et al., 2002 [FL]. We will go further into the subject with the 2016 Hauser-Watumull 'UGF' and the 2014 'On recursion' papers.
FL is interesting for us for two reasons :
a. it links to Learning Fallacy II [LF]
b. it links with NLNE and A Conceptual Representation CR
FL has a clear comparative approach, and its semantic field builds on the Data Expansion (DE) / Large Symmetry (LS) tradeoff of LF; word counts: 'compar': 39, 'analog': 10, 'homolog': 12, 'specif': 18, 'uniq': 30.
$$FL \rightarrow Data \,Expansion / Large \, Symmetry$$
In detail :
a. "We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language".
$$FL.animals \rightarrow DE$$
"Although scholars interested in language evolution have often ignored comparative data altogether or focused narrowly on data from nonhuman primates, current thinking in neuroscience, molecular biology, and developmental biology indicates that many aspects of neural and developmental function are highly conserved, encouraging the extension of the comparative method to all vertebrates (and perhaps beyond)."
"Although this line of reasoning may appear obvious, it is surprisingly common for a trait to be held up as uniquely human before any appropriate comparative data are available."
b. "We further argue that FLN may have evolved for reasons other than language, hence comparative studies might look for evidence of such computations outside of the domain of communication (for example, number, navigation, and social relations)."
"We consider the possibility that certain specific aspects of the faculty of language are “spandrels”—by-products of preexisting constraints rather than end products of a history of natural selection (39). This possibility, which opens the door to other empirical lines of inquiry, is perfectly compatible with our firm support of the adaptationist program. Indeed, it follows directly from the foundational notion that adaptation is an “onerous concept” to be invoked only when alternative explanations fail."$$FL.(bio)functions \rightarrow LS$$ in more detail:
$$communication, number, navigation, social \, relations \rightarrow FLN \\ \Rightarrow universal \, constraint \,(computational \, and \, biological) \\ \Rrightarrow LS $$ (it should be read recursively : \( [[x \rightarrow y] \Rightarrow z] \Rrightarrow u \)
This line of reasoning dwells on the computational perspective of our NLNE: the problem of NL is a computational one, then a simplicity / creativity one; so essentially - as far as we have learnt from Nature's eons of trial-and-error 'foraging' - recursion is 'nearly optimal'. FL makes clear reference to the Minimalist program: "Recent work on FLN (4, 41–43) suggests the possibility that at least the narrow-syntactic component satisfies conditions of highly efficient computation to an extent previously unsuspected."
$$FL \rightarrow recursion$$
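To make the recursion claim tangible, here is a minimal sketch of a Merge-like operation (one common reading of the Minimalist program's recursion; the nested-tuple encoding is an illustrative assumption):

```python
# The recursion claim FL → recursion in miniature: a 'Merge'-like
# operation builds unboundedly nested structures from a finite base.
def merge(a, b):
    """Combine two syntactic objects into a new one (recursively nestable)."""
    return (a, b)

phrase = merge("the", merge("cat", merge("on", merge("the", "mat"))))
print(phrase)  # ('the', ('cat', ('on', ('the', 'mat'))))
```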
As already mentioned in NLNE, First Order Logic's expressiveness is not enough; you need more 'depth'. Nor is the 2-Cat concept of category theory adequate for a conceptual representation, and CR is a tentative step towards a general Creative Assistant.
Friday, 17 February 2017
learning as categorification V
1. Much of what we have in mind – inside I mean, mind as a [private] room… – is what could be called unstructured data. Some people say: ‘mind is full of scorpions’… but more generally, memory is a fantastic recipient, a grimoire full of… mysteries. Γνῶθι σεαυτόν might be read: clean your mansionry. And getting older, it can only go from bad to worse: mind is a great hoarder. How to tidy it up?
2. Here we should be bold and resolute, like any (young) theoretician: build our own theory, that is, a classification
$$ (natsci / philo ... )theorician\quad \rightarrow \quad (private \quad real \quad life) \quad learner $$
to classify means: you have an \( x \), a \( y \), and (boldly) draw
$$ x \rightarrow y $$
3. Easy ? This is a negative, most of the time…
\( y \) is not of the same nature as \( x \): \( y \) is a ‘symmetry’, \( x \) is some real life ‘object’ : anything from minimal experience to macro ontos.
\( x \) is well-known-from-me, something-I-m-not-indifferent-to : a personal experience, something-I-often-play-with. Even conceptual, \( x \) is concrete-to-me.
\( y \) is, in the end, some abstract concept, more symmetrical… somehow simpler – balancing this with abstraction – than any \( x \), so that the compression \( x \rightarrow y \) can be significant: our purpose is to make… room.
4. As pointed out by Brown and Porter, the whole point is to move from a ‘concrete’ (to the learner at least) collection of \( x_{\omega} \) to some more abstract \( y \).
One path to do that is to find explicit relations between these \( x_{\omega} \).
The classic example of ‘cardinality’, our \( y \), as in https://golem.ph.utexas.edu/category/2008/10/what_is_categorification.html, makes explicit the isomorphisms between finite sets. Somehow, every child did that at some point…
Abstracting the detailed structure of these \( x_{\omega} \) unveils the symmetry \( y \).
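A minimal sketch of this classic example, with toy sets as assumptions: the explicit bijection is the morphism, and the cardinal is the \( y \) that survives the abstraction:

```python
# Two 'concrete' finite sets x_ω are identified by exhibiting an explicit
# bijection; the abstract y that remains is their common cardinality.
apples = {"a1", "a2", "a3"}
chairs = {"c1", "c2", "c3"}

bijection = dict(zip(sorted(apples), sorted(chairs)))  # explicit isomorphism
print(bijection)                    # {'a1': 'c1', 'a2': 'c2', 'a3': 'c3'}
print(len(apples) == len(chairs))   # True: the symmetry y = cardinality 3
```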
5. Paradoxically, (student) learning is a long trip towards forgetting the (art of explicitly building) isomorphisms and keeping only \( y \) in mind. But real (life) learning tends rather to be a never-ending explicit building of morphisms as new \( x_{\omega} \) flow into your mind…
That is: you should strengthen your learning-to-learn ability, that is, the art of designing new \( y \) and new → between (old and new) \( x_{\omega} \) and (old and new) \( y \).
Hopefully, the more you do it, the better you get at it… Note that being good at doing something, having learned it, is not the same as being good at learning anything!...
Wednesday, 25 January 2017
Lean management categorification
A. \( LM \rightarrow \quad learning \)
It is surprising and very comforting to note that the very popular 'The Lean Startup' by Eric Ries and 'Getting to Plan B' by Mullins & Komisar (MK) demonstrate an obvious relationship between learning to learn (L2L) and lean management (LM)
$$ LM \rightarrow \quad learning $$
Maybe because entrepreneurs essentially have skin in the game, as should every learner / modeler.
At the heart of lean management lies the idea that the company is essentially a place of learning; let's count words:
Learn: 292
Know: 123
Problem: 177
Solv: 38
And precisely, a scientific process:
Hypotheses: 67
Assumption: 73
Theor: 31
Test: 206
Feedback: 86
Valid: 74
Experiment: 171
System: 139
Scien: 47
Fail: 132 (yes, failure is included in the scientific learning package!)
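These counts can be reproduced in a few lines; here is a minimal sketch (the plain-text file name is a hypothetical placeholder):

```python
# Counting stem occurrences in the book's text, as in the tallies above.
import re

stems = ["learn", "know", "problem", "solv", "hypothes", "assumption",
         "theor", "test", "feedback", "valid", "experiment", "scien", "fail"]

with open("lean_startup.txt") as f:   # hypothetical plain-text source
    text = f.read().lower()

for stem in stems:
    print(stem, len(re.findall(stem, text)))
```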
LM is the place of a surprising symmetry:
\begin{array}{r c l}
\ product & \quad & startup \\
\uparrow \quad \downarrow \quad & \rightarrow & \uparrow \quad \downarrow \quad \\
learning & \quad & customer \\
\end{array}
The traditional causality is: I learn to produce; LM puts it the other way around: I produce to learn!
The Fordist approach is in (product) push mode: Ford produces, the customer buys.
LM is in (informational) pull mode: the startup learns from its customers.
"The learning about how to build a sustainable business is the outcome of those experiments. For startups, that information is much more important than dollars, awards, or mentions in the press, because it can influence and reshape the next set of ideas. "
"For startups, the role of strategy is to help figure out the right questions to ask"
In detail, the functor LM → Learning is:
\begin{array}{r c l}
\ product & \quad & model \\
\uparrow \quad \downarrow \quad & \rightarrow & \uparrow \quad \downarrow \quad \\
customer & \quad & data \\
\end{array}
More precisely, everything is dynamic: at all times \( t \) LM seeks a minimum viable product, hence the correspondence
\begin{array}{r c l}
\ MVP_t & \rightarrow & model_t \\
\ customer_t & \rightarrow & data_t \\
\end{array}
In fact a set \( data_t \) of data is attached to \( MVP_t \) ( \( model_t \) ): this is the information available at \(t \)
B. Overfit, Regularization
The key to LM's endless process is what corresponds in ML to Active Learning: the sequential acquisition of new data:
$$ model_t \rightarrow data_t \rightarrow model_{t+1} \rightarrow data_{t+1} \dots $$
or in the form of a cycle :
\begin{array}{r c l}
\ model & & \\
\ \downarrow \quad \uparrow & & \\
\ data & & \\
\end{array}
So what we have here is really a learning path
The model is at once:
a. A representation of the domain and,
b. A decision function that allows the exploration of the domain in order to acquire new data.
This learning diagram is similar to the Multi-Armed Bandit (MAB): at each \( t \), one decides which arm to operate, and one observes the reward.
a. Ex1: A / B testing protocol
b. Ex2: H&M in MK, where different styles are tested almost simultaneously, and the most successful one is favored.
Obviously exploration is expensive, and the whole question is to stay alive until the (relative) completion of the learning process... this is where the concept of lean / waste (82 occurrences in the text) looms.
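A minimal sketch of this MAB reading, in the spirit of Ex2 (the two 'styles' and their conversion rates are illustrative assumptions; epsilon-greedy is one standard policy, not a claim about Ries's or MK's method):

```python
# At each t, pick an arm (a product variant), observe a reward (customer
# signal), update. Epsilon-greedy balances exploration and exploitation.
import random

true_p = {"style_A": 0.05, "style_B": 0.12}      # unknown conversion rates
counts = {a: 0 for a in true_p}
values = {a: 0.0 for a in true_p}

for t in range(2000):
    if random.random() < 0.1:                     # explore
        arm = random.choice(list(true_p))
    else:                                         # exploit current best estimate
        arm = max(values, key=values.get)
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # running mean

print(max(values, key=values.get))                # most likely 'style_B'
```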
Ries is particularly good at recounting his own experience at IMVU. The whole paragraph 'Talking to customers' is an anthology piece, absolutely hilarious: the confrontation between the engineer and the 'seventeen-year-old girl' announces a tragicomedy, and was for Ries the revelation that he had basically wasted six months! 'There's obviously something wrong', 'deal breaker', 'utterly / fundamentally flawed'...
Ries: 'Here's the question that bothered me most of all: if the goal of these months was to learn these important insights about customers, why did it take so long? How much of our effort contributed to the essential lessons we needed to learn? Could we have learned those lessons earlier if I had not been focused on making the product "better" by adding features and fixing bugs?'
Here Ries has a seemingly surprising paragraph if one reads it from the coign of vantage of Statistical Learning: 'optimization versus learning'. In the context of statistical learning, optimization is almost synonymous with learning.
In line with 'learning fallacy', I would say we have here a case of the 'tree hiding the forest': the six months lost developing unnecessary features at IMVU are an example of over-optimization: this 'model' is demolished when confronted with new data.
Carrying the metaphor / functor \( Lean \rightarrow Learn \) further, we get
$$ waste \rightarrow overfit$$
We may be tempted to talk about dynamic regularization: we are looking for the simplest and least expensive model (product) that 'fits' the data.
More precisely: if we compute the difference between:
a. the ex-post reward \( r_t \): not only a measure of the product's fit to the customer, but more generally a learning rate,
b. and the ex-ante R&D cost \( c_t \),
then the regularization at \( t + 1 \) is done according to \( r_t - c_t \).
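A minimal sketch of this signal (the numbers are illustrative assumptions; the decision rule is only a cartoon of 'regularize according to \( r_t - c_t \)'):

```python
# At each cycle t, compare ex-post learning reward with ex-ante R&D cost.
rewards = [0.8, 0.5, 0.9]   # r_t: learning rate / product-fit measure
costs   = [0.6, 0.7, 0.4]   # c_t: R&D cost of the cycle

for t, (r, c) in enumerate(zip(rewards, costs)):
    signal = r - c
    action = "keep building" if signal > 0 else "simplify / pivot"
    print(f"t={t}: r-c={signal:+.1f} -> {action}")
```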
It is obviously advantageous for this measurement to be as continuous as possible: this is the key message of LM: the time increment, or cycle duration, must be as short as possible. 'The biggest advantage of working in small batches is that quality problems can be identified much sooner.'
The evaluation of \( r_t \) is anything but obvious. As Ries explains at length, growth or other 'vanity metrics' do not prove that \( r_t-c_t > 0 \). The 'actionable metrics' must make it possible to correctly evaluate \( r_t \), according to Ries.
Of course the LM approach gives no guarantee of converging before running out of cash!
C. symmetries?
But most notably, Ries gives no explicit method to guess the symmetries of the domain.
This part is entirely a human black box, discretionary. Assumptions come from art, not science. 'As long as exploration is cheap and continuous, you can explore randomly' seems to be Ries's cartoon message. For example, in the case of Caroline at HP, nothing is said except 'testing'.
When Ries insists on metrics (or post-analysis), it is in fact about representation, hence fundamentally about symmetries. For example, in the case of Grockit, the initial assumption itself is revised: 'In fact, over time, through dozens of tests, it became clear that the key to student engagement was to offer them a combination of social and solo features. Students preferred having a choice of how to study.'
But a fully symmetrized $$ Social \leftrightarrow solo $$
would have warned against a pure social approach.
Actually, we can argue that Ries gives three heuristics to guess the symmetries:
a. The Five Whys has a strong flavor of hierarchical discovery, and in fact targets (ground) symmetries.
b. Transfer learning: notably from manufacturing, and Toyota.
c. Catalog of Pivots:
Zoom-in / out
Customer segment (or need)
Value capture
Engine of growth
..
In any case, Category Theory is not far off: isn't "Pivot" a wonderful intuition of ... symmetry?
Of course, MK's 'analogs and antilogs' is quite in line with Cat.
Incidentally, Ries gives examples of \( Learn * customer \), \( Learn * student \) actions without ever giving a model other than 'testing'. To give a single example, the \( social \leftrightarrow solo \) symmetry seems linked on the one hand to concepts like mimicry (cf. for example the recent theory of mirror neurons) and on the other to something like a need for intellectual order / compression (cf. Schmidhuber's magical theory of creativity).
Statistical learning is: \( Learn * data \), and many methods exist.
But it is especially in the case of the sciences (mathematics, physics, biology, ...) that the action
$$ S = Learn * phenomena $$
manifests itself through a gigantic theoretical and empirical production.
If the action is \( Learn * X \) for some object \( X \), then it seems interesting to learn the functor
$$ Learn * X \rightarrow S $$
whatever \( X \) is.
In finance (and beyond, in social science), the functor was formalized through econophysics. To give an example: the RFIM (Random Field Ising Model) is, according to Bouchaud et al., a plausible paradigm - i.e. a symmetry - cf. e.g. "Crises and collective socio-economic phenomena: simple models and challenges".
Incidentally, physics' recent interest in statistical learning (cf. Mezard, 'physics-statistics-and-information-the-defi-of-mass-data' in La Jaune et la Rouge, the Mallat site at ENS, "Learning as categorification III", etc.) marks a re-symmetrization:
$$ S \leftrightarrow Learn * X $$
But as Mezard says: "Contrary to what is sometimes said, the irruption of massive data into the study of complex systems is not going to take the place of theory. It is always necessary and even more difficult to understand, analyze, and build a model, but the theorist can rely on new and powerful statistical tools. "
Conclusion: the Lean Management approach is motivated mainly by the profitability constraint. This constraint, though it has the merit of bringing the entrepreneur's reflection back to (the objective observation of) reality, is essentially a transfer from the (age-old) experimental method of the sciences.
Why not push this transfer / categorification further?
Tuesday, 17 January 2017
No equilibrium theorem
1. In the classic 'How markets slowly digest changes in supply and demand', Bouchaud et al. (B): "Then begins a kind of hide and seek game, where each side attempts to guess the available liquidity on the other side. A 'tit-for-tat' process then starts, whereby market orders trigger limit orders and limit orders attracts market orders", 6.5.4
2. Incidentally, tit-for-tat (TT) refers to a strategy of the iterated PD (prisoner dilemma)
3. What can be the meaning of game theory (GT) looming within a paper that keeps its distance from the Rational Equilibrium approach and is critical of K85 (Kyle 1985)?
4. We can argue for a methodological isomorphism between theoretical physics TP ('Physics and learning') and GT:
\begin{array}{r c l}
\ TP & \leftrightarrow & GT \\
(SO (3) / local / etc) \quad symmetries & \rightarrow & (infinite) \quad strategic \quad regress \\
\end{array}
5. See, for example, 'Rational interaction', 'game theory, symmetry, and scientific discovery', HW Brock
6. We would be tempted to conclude that B "fails to learn": he does not recognize the natural (theoretical) space of the domain he studies, i.e. its symmetries.
But GT suggests that "physical" dynamics have a good chance of breaking down in finance.
7. In physics, causality \( F^i = m \, a^i \) (or even the field equations in GR) establishes the causal link between force and acceleration.
8. K is still within this paradigm: \( imb = (1/\lambda) \, \delta p \), cf. the 'mm second law'.
9. As is often the case with this type of article, equilibrium is obtained only conditionally on unrealistic implicit-coordination assumptions (K explicitly excludes any form of manipulation; insider and mm each assume the other agents play their optimal strategy).
10. A more natural framework is the PD: in a game of sharing P&L (= long-term equilibrium), the two agents can play cooperation (i.e. a kind of à la Kyle equilibrium, corresponding to a kind of average historical behavior of these agents), or deviate significantly from it, fooling the other's expectations.
11. More precisely, in K, the fundamental symmetry is written simply
\begin{array}{r c l}
\ & K & \\
mm & \rightarrow & insider \\
\end{array}
Mm and insider know all about one another; the reasoning rules are CK (common knowledge). They could reverse their respective roles, but they do not: the symmetry holds only in K(nowledge) and not in A(ct).
12. The literature on the PD is considerable (and for good reason); in iterated games, strategies such as TT or win-stay-lose-switch not only appear theoretically (including in an evolutionary framework) but are empirically observed (cf. Axelrod 84; 'Evolutionary Dynamics', Nowak).
13. Note that this takes place in a framework of payoffs without any uncertainty: the payoff matrix is perfectly known.
There is therefore nothing at stake in learning a representation of the world.
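A minimal sketch of this setting (standard PD payoffs assumed; TT against an always-defector, just to show that the whole game lives in the strategies, not in learning the world):

```python
# Iterated PD with tit-for-tat (TT): the payoff matrix is perfectly known,
# so nothing about the world remains to be learned; strategies interact.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    return "C" if not history else history[-1]  # copy opponent's last move

def always_defect(history):
    return "D"

h1, h2, score = [], [], [0, 0]
for _ in range(10):
    m1, m2 = tit_for_tat(h2), always_defect(h1)
    p1, p2 = PAYOFF[(m1, m2)]
    score[0] += p1; score[1] += p2
    h1.append(m1); h2.append(m2)
print(score)  # [9, 14]: TT is exploited only once, then both get (1, 1)
```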
14. Projected into the framework of K, this amounts to saying that the representation (imb) is not the issue.
On the other hand, \( \lambda \) cannot be stable.
And this non-stationarity is fundamental, in the sense that it results from the symmetry of the (GT) space underlying the (finance) space.
In a GT frame, we now have a substitution in Act between mm and insider:
\begin{array}{r c l}
\ & K, A & \\
mm & \rightarrow & insider \\
\end{array}
The mm can mimic the insider, and conversely.