Retrieving information from a text is
one of the classic tasks of text mining.
A summary can consist of highlighting the
concepts that are repeated most often in a text: this is an absolute-frequency
approach.
Alternatively, one can start from a reference
distribution and highlight the concepts that are relatively more frequent in the text
than in the reference.
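To make the contrast with absolute frequency concrete, here is a minimal sketch of a relative-frequency ranking. The function name, the additive smoothing, and the token lists are illustrative assumptions; the post does not say how the ratio to the reference is actually computed.

```python
from collections import Counter


def relative_frequency_ranking(doc_tokens, ref_tokens, smoothing=1.0):
    """Rank the document's terms by (frequency in document) / (frequency in reference).

    Assumption: additive smoothing is used so that terms absent from the
    reference corpus do not cause a division by zero.
    """
    doc_counts = Counter(doc_tokens)
    ref_counts = Counter(ref_tokens)
    doc_total = sum(doc_counts.values())
    ref_total = sum(ref_counts.values())
    vocab_size = len(set(doc_counts) | set(ref_counts))

    scores = {}
    for term, count in doc_counts.items():
        p_doc = count / doc_total
        p_ref = (ref_counts[term] + smoothing) / (ref_total + smoothing * vocab_size)
        scores[term] = p_doc / p_ref

    # Highest ratio first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda term: (-scores[term], term))
```

With a toy document in which "problem" and "the" are equally frequent, an absolute count cannot separate them, whereas the ratio to a reference in which "the" is common pushes "problem" to the top.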
This approach can be iterated by
looking at the n-grams that are relatively more frequent.
These n-grams may or may not be
made up of concepts that are adjacent in the sentence (modulo the stop words, which will have been removed beforehand). In general, the two cases can be combined.
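The two cases (adjacent or not) can be illustrated with a small bigram extractor. This is only a sketch: the stop-word list is a toy one, tokenization is a plain whitespace split, and the `max_gap` parameter is a hypothetical name for how far apart two content words may be.

```python
# Toy stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "an", "of", "to", "is", "in", "and"})


def content_words(sentence):
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in sentence.lower().split() if word not in STOP_WORDS]


def bigrams(sentence, max_gap=0):
    """Return ordered pairs of content words from one sentence.

    max_gap=0 keeps only words that are adjacent once stop words are removed;
    max_gap=g also pairs words separated by up to g other content words.
    """
    words = content_words(sentence)
    pairs = []
    for i, left in enumerate(words):
        for j in range(i + 1, min(i + 2 + max_gap, len(words))):
            pairs.append((left, words[j]))
    return pairs
```

Calling `bigrams` with `max_gap=0` and with a larger gap, then merging the two counts, is one way to combine both cases.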
Take as an example Polya's well-known
text, ‘How to Solve It’.
Ordered by frequency relative to
a benchmark (here Wikipedia), we obtain the various concepts and their associations of the form \(x \rightarrow y\), likewise
in order of frequency.
For example \( auxiliari\_problem \rightarrow origin\_problem \), etc.
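A table of associations \(x \rightarrow y\) could be built as in the sketch below. It assumes that two concepts are associated when they occur in the same sentence and that associates are ordered by raw co-occurrence count; the post does not specify either choice, so both are assumptions, as is the function name.

```python
from collections import Counter, defaultdict


def association_table(sentences_concepts):
    """Map each concept x to its associated concepts y, most frequent first.

    sentences_concepts: one list of concepts per sentence.
    Assumption: association = co-occurrence within a sentence.
    """
    cooccurrence = defaultdict(Counter)
    for concepts in sentences_concepts:
        unique = set(concepts)
        for x in unique:
            for y in unique:
                if x != y:
                    cooccurrence[x][y] += 1

    # Sort associates by decreasing count, then alphabetically to break ties.
    return {
        x: [y for y, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
        for x, counts in cooccurrence.items()
    }
```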
Concepts emerge that are well known
to readers of the book: auxiliary problem, signs of progress, variation of
the problem, heuristic reasoning...
Polya conveys, like no one else, an art of
discovery that he does not hesitate to compare with the great voyages of the 15th century.
This representation of the text also makes it possible to
build an automatic summary, by ranking the sentences that best represent
the structures \( x_i \rightarrow Y_i=\{ Y_{j}^{i} \} \):
AUXILIARI_PROBLEM
The auxiliary problem was, as a special case,
in fact much less ambitious than the original problem
To sum up, we used the less difficult, less
ambitious, special, auxiliary problem as a stepping stone in solving the more
difficult, more ambitious, general, original problem
ORIGIN_PROBLEM
Convertible reductions are, in a certain
respect, more important and more desirable than other ways to introduce
auxiliary problems, but auxiliary problems which are not equivalent to the
original problem may also be very useful
SIGN_PROGRESS
The day before that memorable date on which
they sighted the island of San Salvador, as the floating objects in the water
became so frequent, they thought: "It looks Signs of Progress as if we were approaching some land”
Our undertaking may be important or unimportant,
our problem of any kind when we are working intensely, we watch eagerly for
signs of progress as Columbus and his companions watched for signs of
approaching land
WORK_BACKWARD
Modern Heuristic : There are articles
discussing methodical questions often important in elementary mathematics, as
pappus, WORKING BACKWARDS (already quoted under 3) , reductio AD ABSURDUM AND
INDIRECT PROOF, INDUCTION AND MATHEMATICAL INDUCTION, SETTING UP EQUATIONS, TEST
BY DIMENSION, and WHY PROOFS
Analysis is neatly defined by pappus, and it
is a useful term, describing a typical way of devising a plan, starting from
the unknown (or the conclusion) and working backwards, toward the data (or the
hypothesis)
LOOK_UNKNOWN
There are, however, questions and suggestions
which are frequently helpful, as look at the unknown
There is a suggestion that puts our finger on
an essential common point: Look at the unknown
VARIAT_PROBLEM
Variation of the problem may lead us to
auxiliary ELEMENTS, or to the discovery of a more accessible auxiliary PROBLEM
Variation of the problem may lead to some
appropriate auxiliary problem: // you cannot solve the proposed problem, try to
solve first some related problem
DECOMPOS_RECOMBIN
Many questions aim at the variation of the
problem by specified means, as going back to the definition, using analogy,
generalization, SPECIALIZATION, DECOMPOSING AND RECOMBINING
There are certain modes of varying the problem
which are typically useful, as going back to the definition, DECOMPOSING AND
RECOMBINING, introducing AUXILIARY ELEMENTS, GENERALIZATION, SPECIALIZATION,
and the use of ANALOGY
USE_RESULT
Using the result of the auxiliary problem we
easily solve our original problem (we have to complete the parallelogram)
We may use the result of the auxiliary problem
DRAW_FIGUR
We start the detailed consideration of such a
problem by drawing a figure containing the unknown and the data, all these
elements being assembled as it is prescribed by the condition of the problem
HEURIST_REASON
It is concerned with the nature of heuristic
reasoning and, by extension, with a kind of reasoning which is nondemonstrative
although important and which we shall call, for lack of a better term,
plausible reasoning
We could call the reasoning that underlies
this kind of evidence "heuristic reasoning" or "inductive
reasoning" or (if we wish to avoid stretching the meaning of existing
terms) "plausible reasoning
BRIGHT_IDEA
A sudden advance toward the solution is called
a bright idea, a good idea, a happy thought, a brain-wave (in German there is a
more technical term, Einfalt)
" Bright idea, or
"good idea," or "seeing the light," is a colloquial
expression describing a sudden advance toward the solution
SPECIAL_CASE
This auxiliary problem is a special case of
the original problem (the extreme special case in which one of the two ships is
at rest)
INTRODUC_AUXILIARI_ELEMENT
In general, having recollected a formerly
solved related problem and wishing to use it for our present one, we must often
ask: Should we introduce some auxiliary element in order to make its use
possible
We aim at such an effect when, thinking about
the possible use of a formerly solved related problem, we ask: Should you
introduce some auxiliary element in order to make its use possible
KNOW_RELAT_PROBLEM
Setting a routine problem, the teacher thrusts
under the nose of the student an immediate and decisive answer to the question:
Do you know a related problem
Let us go back to the situation as it
presented itself at the beginning of section 10 when the question was asked: Do
you know a related problem
ANALOG
We may vary the problem by decomposing and
RECOMBiNiNG its elements, or by going back to the definition of certain of its
terms, or we may use the great resources of generalization, specialization, and
analogy
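The sentence selection behind the summary above could be sketched as follows. The scoring rule (sum of rank-based weights of the associates found in a sentence) and all names are assumptions made for illustration; the post only says that the sentences best representing \( x_i \rightarrow Y_i \) are ranked first.

```python
def best_sentences(x, associates, sentences, top_k=2):
    """Return the top_k sentences that best represent x -> associates.

    sentences: list of (text, set_of_concepts) pairs.
    Assumption: an associate at rank r contributes a weight of 1 / (r + 1),
    and only sentences that contain x itself are candidates.
    """
    weights = {y: 1.0 / (rank + 1) for rank, y in enumerate(associates)}
    scored = []
    for position, (text, concepts) in enumerate(sentences):
        if x not in concepts:
            continue
        score = sum(weights.get(concept, 0.0) for concept in concepts)
        # Negative score so that sorting puts the best sentence first;
        # position keeps the original order among equal scores.
        scored.append((-score, position, text))
    scored.sort()
    return [text for _, _, text in scored[:top_k]]
```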
The representation can be compressed further
by projecting Y onto \( X= \{ x_i, \; i<N \} \), i.e. by restricting the \(Y_{j}^{k}\) to the \( x_i, \; i<N \), bearing in mind that the \( x_i \) are ordered by relative frequency.
More precisely, we look for the occurrences of the \(x_i \) in each \( \{Y_{j}^{k}, \; j<cut \} \), which requires
\(x_i \) to be ‘close’ to \(x_k \) (within the first \( cut \) elements of \( \{ Y_{j}^{k} \}_j \)). This gives a
new table \( x_i \rightarrow Z^i = \{x_k^i\} \), which
we sort by decreasing \( \| Z^i \| \):
the concept \(x_0 \) is then the most ‘central’ in that it is the most
connected.
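Read this way, the table \( Z \) is an inverted index over the truncated association lists. The sketch below follows that reading; the function and argument names are illustrative, and excluding a concept from its own \( Z^i \) is an assumption.

```python
def projection_table(concepts, associations, cut=10):
    """Build x_i -> Z^i and sort it by decreasing size of Z^i.

    concepts: x_0 .. x_{N-1}, ordered by relative frequency.
    associations: dict mapping x_k to its ordered list of associates Y^k.
    Z^i collects every x_k whose first `cut` associates include x_i.
    """
    kept = set(concepts)
    z = {x: [] for x in concepts}
    for x_k in concepts:
        for y in associations.get(x_k, [])[:cut]:
            # Associates outside X are dropped: this is the projection onto X.
            if y in kept and y != x_k:
                z[y].append(x_k)
    # sorted() is stable, so equal sizes keep the relative-frequency order.
    return sorted(z.items(), key=lambda item: -len(item[1]))
```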
We then remove the \( x_i \) that appear in some \( Z^k \) with \( i>k \). Finally, we also choose to remove \( x_i \) if 50% of \( Z^i \) lies
within a single \( Z^k \), \( i>k \). The
results below are for ‘How to Solve It’, starting from nearly 1000 \(x_i \) with cut =
10.
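The two pruning rules could be sketched as follows. Several details are assumptions rather than something the post states: candidates are compared only with clusters that have already been kept, "50%" is read as "at least 50%", and concepts with an empty \( Z^i \) are discarded.

```python
def prune_clusters(table, overlap=0.5):
    """Keep only cluster heads that are neither absorbed nor redundant.

    table: list of (x_i, Z^i) pairs sorted by decreasing size of Z^i.
    Returns the kept (x_i, set(Z^i)) pairs in the same order.
    """
    kept = []
    for x_i, z_i in table:
        members = set(z_i)
        if not members:
            # Assumption: a concept with no neighbors is not a cluster.
            continue
        # Rule 1: x_i already belongs to a more central cluster.
        absorbed = any(x_i in z_k for _, z_k in kept)
        # Rule 2: at least `overlap` of Z^i lies inside one more central Z^k.
        redundant = any(
            len(members & z_k) >= overlap * len(members) for _, z_k in kept
        )
        if not (absorbed or redundant):
            kept.append((x_i, members))
    return kept
```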
We see that the first theme, ‘origin_problem’,
has 169 neighbors, including: 'auxiliari_problem',
'use_result',
'special_case',
'simpler_analog_problem',
'introduc_auxiliari',
'less_ambiti',
'restat',
'devis',
'reconsid',
'tri',
'step',
'familiar',
'simpler',
'vari',
'deriv',
'easier',
'auxiliari',
'various',
'combin',
'passag',
'modifi',
which are indeed the neighbors one would expect for the
Variation-Comparison theme.
We also find the theme
‘sign_progress’, which points to ‘progress_achiev’ and ‘approach_land’, but also
to a psychological side:
‘subconsci_work’, ‘suspect’, ‘mental’…
The theme ‘part_condit’ points to
‘decompos_recombin’ and ‘general_special’, but also to ‘restat’ and ‘tri’, which
already appear in the first cluster, ‘origin_problem’.
‘technic_term’
points to ‘heuristic_reason’, ‘bright_idea’, ‘relat’.
‘auxiliari_element’ points to 'auxiliari_problem',
‘variat_problem’, ‘analog’, ‘familiar’, ‘relat’,
and therefore seems fairly close to ‘origin_problem’.
Another interesting cluster, ‘plausibl_reason’, points
to 'heurist_reason', 'point_view', 'heurist_syllog', 'induct', 'infer'.
We obtain slightly more clusters with cut
= 5:
Using the graph visualization and clustering software Delphi, we recover essentially the same results. Below are the upper and then the lower part of the same graph.




