HAL: latest deposits from SAMM
Thursday 19 March 2015

[hal01133175] Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation
Detecting early signs of failures (anomalies) in complex systems is one of the main goals of preventive maintenance. In particular, it makes it possible to avoid actual failures by (re)scheduling maintenance operations in a way that optimizes maintenance costs. Aircraft engine health monitoring is one representative example of a field in which anomaly detection is crucial. Manufacturers collect large amounts of engine-related data during flights, which are used, among other applications, to detect anomalies. This article introduces and studies a generic methodology for building automatic detectors of early signs of anomalies in a way that builds upon human expertise and that remains understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
Thursday 5 March 2015

[hal01122393] The Dynamic Random Subgraph Model for the Clustering of Evolving Networks
In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these techniques are suited to static networks, that is to say, they do not take the temporal dimension into account. This work is motivated by the need to analyze evolving networks for which a decomposition into subgraphs is given. Therefore, in this paper, we consider the random subgraph model (RSM), which was proposed recently to model networks through latent clusters built within known partitions. Using a state space model to characterize the cluster proportions, RSM is then extended in order to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM). A variational expectation maximization (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to an update step which involves a new state space model from which the parameters along with the hidden states can be estimated using the standard Kalman filter and Rauch-Tung-Striebel (RTS) smoother. Simulated data sets are considered to assess the proposed methodology. Finally, dRSM along with the corresponding VEM algorithm is applied to an original maritime network built from printed Lloyd's voyage records.
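As a rough illustration of the last inference step mentioned above, here is a minimal sketch of a Kalman filter followed by an RTS smoother. It assumes a scalar random-walk state observed with Gaussian noise, which is far simpler than the state space model of the paper; the function names, the noise variances `q` and `r`, and the observations are all hypothetical.

```python
def kalman_filter(observations, q, r, m0=0.0, p0=1.0):
    """Filter a scalar random walk x_t = x_{t-1} + w_t observed as y_t = x_t + v_t.

    q and r are the variances of w_t and v_t; m0 and p0 describe the prior on the state.
    """
    means, variances = [], []
    m, p = m0, p0
    for y in observations:
        p_pred = p + q                 # prediction step (the mean is unchanged)
        gain = p_pred / (p_pred + r)   # Kalman gain
        m = m + gain * (y - m)         # correction with the new observation
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances


def rts_smoother(means, variances, q):
    """Backward Rauch-Tung-Striebel pass over the filtered estimates."""
    s_means, s_vars = list(means), list(variances)
    for t in range(len(means) - 2, -1, -1):
        p_pred = variances[t] + q      # predicted variance at time t + 1
        g = variances[t] / p_pred      # smoother gain
        s_means[t] = means[t] + g * (s_means[t + 1] - means[t])
        s_vars[t] = variances[t] + g * g * (s_vars[t + 1] - p_pred)
    return s_means, s_vars


observations = [0.9, 1.1, 1.0, 1.3, 0.8, 1.2]  # made-up data
filtered_means, filtered_vars = kalman_filter(observations, q=0.1, r=0.5)
smoothed_means, smoothed_vars = rts_smoother(filtered_means, filtered_vars, q=0.1)
```

The smoothed variances are never larger than the filtered ones, because the backward pass also uses the observations that come after each time step.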
Wednesday 31 December 2014

[hal01099026] Uniform Exponential Stability of Discrete Evolution Families on Space of $p$-Periodic Sequences
[...]

[hal01099024] Criterion For The Exponential Stability of Discrete Evolution Family Over Banach Spaces
Let $T(1)$ be the algebraic generator of the discrete semigroup $\textbf{T}=\{T(n)\}_{n\geq 0}$. We prove that the system $x_{n+1}=T(1)x_n$ is uniformly exponentially stable if and only if for each real number $\mu$ and each $q$-periodic sequence $z(n)$ with $z(0)=0$ the unique solution of the Cauchy Problem \begin{equation*} \left\{ \begin{split} x_{n+1} &= T(1)x_{n}+e^{i\mu(n+1)}z(n+1), \\ x_0&= 0 \end{split} \right.\eqno{(T(1), \mu,0)} \end{equation*} is bounded. We also extend the above result to the $q$-periodic system $y_{n+1} = A_{n}y_{n}$, i.e. we prove that the system $y_{n+1} = A_{n}y_{n}$ is uniformly exponentially stable if and only if for each real number $\mu$ and each $q$-periodic sequence $z(n)$ with $z(0)=0$ the unique solution of the Cauchy Problem \begin{equation*} \left\{ \begin{split} y_{n+1} &= A_{n}y_{n}+e^{i\mu(n+1)}z(n+1), \\ y_0&= 0 \end{split} \right.\eqno{(A_{n}, \mu,0)} \end{equation*} is bounded. Here, $A_{n}$ is a sequence of bounded linear operators on a Banach space $\mathcal{X}$.
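The boundedness condition can be tried out numerically in the simplest possible setting. The sketch below assumes a one-dimensional space in which $T(1)$ is multiplication by a scalar `a`; it is only an illustration for one value of $\mu$, not a proof, and the function name and test values are hypothetical.

```python
import cmath


def max_modulus(a, mu, z_period, steps):
    """Iterate x_{n+1} = a*x_n + exp(i*mu*(n+1))*z(n+1) from x_0 = 0.

    z_period lists one period of the q-periodic sequence z, with z_period[0] = z(0) = 0.
    Returns the largest |x_n| seen over the requested number of steps.
    """
    q = len(z_period)
    x = 0j
    largest = 0.0
    for n in range(steps):
        forcing = cmath.exp(1j * mu * (n + 1)) * z_period[(n + 1) % q]
        x = a * x + forcing
        largest = max(largest, abs(x))
    return largest


z_period = [0.0, 1.0]                                       # 2-periodic, z(0) = 0
stable_case = max_modulus(0.5, 0.3, z_period, steps=1000)   # |a| < 1
resonant_case = max_modulus(1.0, 0.0, z_period, steps=1000) # |a| = 1, mu = 0
```

With `a = 0.5` the solution stays below `1 / (1 - 0.5) = 2`, whereas with `a = 1` and `mu = 0` it grows by one every second step, so the system that is not exponentially stable produces an unbounded solution for this choice of forcing.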

[hal01099019] New aspects of nonautonomous discrete systems stability
We prove that a discrete evolution family ${\bf U}:=\{U(n,m):\; n\geq m\in \mathbb{Z}_+\}$ of bounded linear operators acting on a complex Banach space $X$ is uniformly exponentially stable if and only if for each forcing term $(f(n))_{n\in \mathbb{Z}_+}$ belonging to $AP_0(\mathbb{Z}_+, X)$, the solution of the discrete Cauchy Problem $$ \left\{ \begin{array}{lc} x(n+1)=A(n)x(n)+f(n), n\in \mathbb{Z}_+ \\ x(0)=0 \end{array} \right. $$ belongs to $AP_0(\mathbb{Z}_+, X)$, where the operator-valued sequence $(A(n))_{n\in \mathbb{Z}_+}$ generates the evolution family ${\bf U}$. The approach we use is based on the theory of discrete evolution semigroups associated with this family.
Saturday 6 December 2014

[hal01086633] A DYNAMIC RANDOM SUBGRAPH MODEL. A STUDY OF THE ENRON SCANDAL
Abstract. In recent years, many random graph models have been proposed to extract information from networks in a variety of fields. The principle of these models is to look for groups of nodes with homogeneous connection profiles. Most of these models are suited to static networks with binary or discrete edges and do not take the temporal dimension into account. This work is motivated by the need to analyze a dynamic network describing the electronic communications (emails) between the employees of the Enron company, where social positions play an important role. In this article we propose an extension to the dynamic setting of the RSM random graph model, which was recently proposed to model, by means of latent groups, static networks for which a partition into subgraphs is known. Our approach relies on a state-space model to describe the evolution over time of the proportions of the latent groups. The resulting model is called the dynamic random subgraph model (dRSM), and a variational EM (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to a new state-space model from which the parameters as well as the hidden states can be estimated using the Kalman filter and the Rauch-Tung-Striebel (RTS) smoother. The methodology is finally applied to the Enron email data set and reveals an anticipatory reaction of the managers, compared with the other employees, to the coming scandal.
Thursday 4 December 2014

[hal01090085] Mary Astell's words in A Serious Proposal to the Ladies (part I), a lexicographic inquiry with NooJ
In the following article we elected to study with NooJ the lexis of a 17th-century text, Mary Astell's seminal essay, A Serious Proposal to the Ladies, part I, published in 1694. We first focused on the semantics to see how Astell builds her vindication of the female sex, and which words she uses to sensitise women to their alienated condition and promote their education. Then we studied the morphology of the lexemes used by the author (which differs from that of contemporary English), thanks to the NooJ tools we have devised for this purpose. NooJ has great functionalities for lexicographic work. Its commands and graphs prove to be most efficient in spotting archaic words or variants in spelling. Introduction. In our previous articles, we have studied the singularities of 17th-century English within the framework of a diachronic analysis, thanks to syntactical and morphological graphs and to the dictionaries we have compiled from a corpus that may be expanded over time. Our early work was based on a limited corpus of English travel literature to Greece in the 17th century. This article deals with a late seventeenth-century text written by a woman philosopher and essayist, Mary Astell (1666–1731), considered one of the first English feminists. Astell wrote her essay at a time in English history when women were "the weaker vessel" and their main business in life was to charm and please men by their looks and submissiveness. In this essay we will see how NooJ can help us analyse Astell's rhetoric (what point of view does she adopt, does she speak in her own name or in the name of all women, how does she represent men, women and their relationships in the text, what are the goals of education?). Then we will turn our attention to the morphology of words in the text and use NooJ commands and graphs to carry out a lexicographic inquiry into Astell's lexemes.
Friday 28 November 2014

[hal01088420] IN THE PURSUIT OF A LOST MANUSCRIPT: PTOLEMY’S PLANISPHAERIUM
Research work on texts often needs to rely upon the latest scientific methods to improve its results. When dealing with old manuscripts, which take us far back in time and often contain obscure passages, the latest software techniques may help. For the scrutiny of one of our manuscripts, an Arabic translation of a now lost Greek Treatise, why not use the NooJ program? We start by giving the content of the original Treatise. Then, by analysing words (their frequency, their spelling) and sentences, we strive to better understand the conditions of the Arabic copy carried out in the 9th Century in the Middle East, but known only through 13th-century copies made in Iran.
Tuesday 25 November 2014
Saturday 15 November 2014
Wednesday 5 November 2014
Tuesday 28 October 2014

[hal01077039] There is no variational characterization of the cycles in the method of periodic projections
The method of periodic projections consists in iterating projections onto m closed convex subsets of a Hilbert space according to a periodic sweeping strategy. In the presence of m ≥ 3 sets, a longstanding question going back to the 1960s is whether the limit cycles obtained by such a process can be characterized as the minimizers of a certain functional. In this paper we answer this question in the negative. Projection algorithms that minimize smooth convex functions over a product of convex sets are also discussed.
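To make the sweeping strategy concrete, here is a minimal sketch of periodic projections in the plane. The three pairwise disjoint unit disks are a made-up example, chosen only because their projections have a closed form; all names are hypothetical.

```python
import math


def disk_projector(center, radius):
    """Return the projection onto the closed disk with the given center and radius."""
    def project(point):
        dx, dy = point[0] - center[0], point[1] - center[1]
        dist = math.hypot(dx, dy)
        if dist <= radius:
            return point               # already inside the disk
        scale = radius / dist          # pull the point back onto the boundary
        return (center[0] + dx * scale, center[1] + dy * scale)
    return project


def periodic_projections(projectors, start, sweeps=200):
    """Apply the projectors cyclically and return the points visited in the last sweep."""
    point = start
    cycle = []
    for _ in range(sweeps):
        cycle = []
        for project in projectors:
            point = project(point)
            cycle.append(point)
    return cycle


centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]   # three disjoint unit disks
projectors = [disk_projector(c, 1.0) for c in centers]
cycle = periodic_projections(projectors, start=(10.0, 10.0))
```

Because the sets have no common point, the iterates do not settle on a single point; in this example they approach a cycle with one point in each disk, and it is such limit cycles that the paper shows cannot in general be characterized as minimizers of a functional.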
Monday 13 October 2014

[halshs01025095] Are autographs integrating the global art market? The case of hedonic prices for French autographs (1960-2005)
The market for autographs has become more open to international buyers since 1990. Our data set features a large sample of store and auction sales for selected authors every five years from 1960 to 2005. The estimation of a hedonic price function shows that page count, type of author, date and type of the document, together with consumer and asset price indices, explain more than one half of the price differences. Authors who are more often sold at auctions (hence more likely to attract international demand) carry a 28% premium when sold in stores. The (real) price of autographs increased by 222% during the period, while the hedonic price increased by 190%. With a growing correlation between French autograph prices and the art market index, as well as a supply function responsive to market valuation and trends, the French autograph market has become more integrated in the global art market since the 1990s.
Tuesday 16 September 2014

[hal01017853] Anomaly Detection Based on Aggregation of Indicators
Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the origin of the problem that produced the anomaly is also essential. This paper introduces a general methodology that can assist human operators who aim at classifying monitoring signals. The main idea is to leverage expert knowledge by generating a very large number of indicators. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. The parameters of the classifier have been optimized indirectly by the selection process. The methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.

[hal01064529] Anomaly Detection Based on Indicators Aggregation
Automatic anomaly detection is a major issue in various areas. Beyond mere detection, the identification of the source of the problem that produced the anomaly is also essential. This is particularly the case in aircraft engine health monitoring, where detecting early signs of failure (anomalies) and helping the engine owner to implement the appropriate maintenance operations efficiently (fixing the source of the anomaly) are of crucial importance to reduce the costs attached to unscheduled maintenance. This paper introduces a general methodology that aims at classifying monitoring signals into normal ones and several classes of abnormal ones. The main idea is to leverage expert knowledge by generating a very large number of binary indicators. Each indicator corresponds to a fully parametrized anomaly detector built from parametric anomaly scores designed by experts. A feature selection method is used to keep only the most discriminant indicators, which are used as inputs of a Naive Bayes classifier. This gives an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real-world engines.
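The pipeline described in this abstract can be sketched end to end on toy data. Everything below is an assumption made for illustration: the simulated level-shift signals, the mean-shift score standing in for an expert-designed score, the parameter grids, and the simple frequency-gap filter used in place of whatever feature selection method the paper relies on. With two classes only (normal and abnormal), it is also simpler than the multi-class setting of the abstract.

```python
import math
import random


def make_signal(rng, shift):
    """Hypothetical signal: 50 noisy samples with a level shift from sample 25 on."""
    return [rng.gauss(0.0, 1.0) + (shift if t >= 25 else 0.0) for t in range(50)]


def mean_shift_score(signal, split):
    """Stand-in parametric anomaly score: |mean after split - mean before split|."""
    before, after = signal[:split], signal[split:]
    return abs(sum(after) / len(after) - sum(before) / len(before))


def build_indicators(signal, splits, thresholds):
    """One binary indicator per (split, threshold) parameter pair."""
    return [int(mean_shift_score(signal, s) > th) for s in splits for th in thresholds]


def relevance(X, y, j):
    """Gap between the firing frequencies of indicator j in the two classes."""
    pos = [x[j] for x, label in zip(X, y) if label == 1]
    neg = [x[j] for x, label in zip(X, y) if label == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def fit_naive_bayes(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for the indicator vector x."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(math.log(p if v else 1.0 - p) for v, p in zip(x, probs))
    return max(model, key=log_posterior)


rng = random.Random(0)
signals = [make_signal(rng, 0.0) for _ in range(100)] + [make_signal(rng, 3.0) for _ in range(100)]
labels = [0] * 100 + [1] * 100
X_all = [build_indicators(s, splits=[10, 25, 40], thresholds=[0.5, 1.0, 1.5, 2.0]) for s in signals]

# Keep the three indicators whose firing frequencies differ most between classes.
ranked = sorted(range(len(X_all[0])), key=lambda j: relevance(X_all, labels, j), reverse=True)
selected = ranked[:3]
X = [[x[j] for j in selected] for x in X_all]
model = fit_naive_bayes(X, labels)
accuracy = sum(predict(model, x) == label for x, label in zip(X, labels)) / len(labels)
```

Note that `accuracy` is measured on the training signals themselves, so it only shows that the pieces fit together. Each retained indicator maps back to a readable rule of the form "the mean shift around sample `split` exceeds `threshold`", which is the kind of interpretability the abstract emphasizes.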
Monday 15 September 2014

[hal00984395] Overlapping clustering methods for networks
Networks allow the representation of interactions between objects. Their structures are often complex to explore and require algorithmic and statistical tools to summarize them. One possible approach is to cluster their vertices into groups having similar connectivity patterns. This chapter aims at presenting an overview of clustering methods for network vertices. Common algorithms for finding community structure are detailed. The well-known Stochastic Block Model (SBM) is then introduced, and its generalization to overlapping mixed membership structure closes the chapter. Examples of application are also presented, and the main hypotheses underlying the presented algorithms are discussed.

[hal01063831] Online relational and multiple relational SOM
In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some examples, data can be known only through their pairwise dissimilarities or through multiple dissimilarities, each of them describing a particular feature of the data set. Several variants of the Self-Organizing Map (SOM) algorithm were introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM allows representing these prototypes by a virtual linear combination of all elements in the data set, referring to a pseudo-Euclidean framework. In the present article, an online version of relational SOM is introduced and studied. As in the Euclidean framework, this online algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. In a more general case, this stochastic version allows us to integrate an additional stochastic gradient descent step in the algorithm which can tune the respective weights of several dissimilarities in an optimal way: the resulting \emph{multiple relational SOM} thus has the ability to integrate several sources of data of different types, or to make a consensus between several dissimilarities describing the same data. The algorithms introduced in this manuscript are tested on several data sets, including categorical data and graphs. Online relational SOM is currently available in the R package SOMbrero that can be downloaded at http://sombrero.r-forge.r-project.org or directly tested on its Web User Interface at http://shiny.nathalievilla.org/sombrero.
Sunday 7 September 2014

[hal01061236] On the multiplier rules
We establish new first-order necessary conditions of optimality for finite-dimensional problems with inequality constraints and for problems with equality and inequality constraints, in the form of John's theorem and in the form of the Karush-Kuhn-Tucker theorem. In comparison with existing results, we weaken the assumptions of continuity and differentiability.
Tuesday 26 August 2014

[hal01058431] A Methodology for the Diagnostic of Aircraft Engine Based on Indicators Aggregation
Aircraft engine manufacturers collect large amounts of engine-related data during flights. These data are used to detect anomalies in the engines in order to help companies optimize their maintenance costs. This article introduces and studies a generic methodology for building automatic detectors of early signs of anomalies in a way that is understandable by the human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. The best indicators are selected via a classical forward scheme, leading to a much reduced number of indicators that are tuned to a data set. We illustrate the interest of the method on simulated data which contain realistic early signs of anomalies.
Tuesday 5 August 2014

[hal01053673] Exploration of a large database of French notarial acts with social network methods
This article illustrates how mathematical and statistical tools designed to handle relational data may be useful to help decipher the most important features and defects of a large historical database and to gain knowledge about a corpus made of several thousand documents. Such a relational model is generally enough to address a wide variety of problems, including most databases containing relational tables. In mathematics, it is referred to as a 'network' or a 'graph'. The article's purpose is to emphasize how a relevant relational model of a historical corpus can serve as a theoretical framework which makes available automatic data mining methods designed for graphs. By such methods, for one thing, consistency checking can be performed so as to extract possible transcription errors or interpretation errors during the transcription automatically. Moreover, when the database is so large that a human being is unable to gain much knowledge by even an exhaustive manual exploration, relational data mining can help elucidate the database's main features. First, the macroscopic structure of the relations between entities can be emphasized with the help of network summaries automatically produced by classification methods. A complementary point of view is obtained via local summaries of the relation structure: a set of networkrelated indicators can be calculated for each entity, singling out, for instance, highly connected entities. Finally, visualisation methods dedicated to graphs can be used to give the user an intuitive understanding of the database. Additional information can be superimposed on such network visualisations, making it possible intuitively to link the relations between entities using attributes that describe each entity. This overall approach is here illustrated with a huge corpus of medieval notarial acts, containing several thousand transactions and involving a comparable number of persons.
Monday 7 July 2014

[hal01018732] SOMbrero: an R package for numeric and non-numeric Self-Organizing Maps
This paper presents SOMbrero, a new R package for self-organizing maps. Along with the standard SOM algorithm for numeric data, it implements self-organizing maps for contingency tables (''Korresp'') and for dissimilarity data (''relational SOM''), all relying on stochastic (i.e., online) training. It offers many graphical outputs and diagnostic tools, and comes with a user-friendly web graphical interface, based on the shiny R package.
Friday 4 July 2014

[hal01018374] Bagged kernel SOM
In a number of real-life applications, the user is interested in analyzing non-vectorial data, for which kernels are useful tools that embed data into an (implicit) Euclidean space. However, when using such approaches with prototype-based methods, the computational time is related to the number of observations (because the prototypes are expressed as convex combinations of the original data). Also, a side effect of the method is that the interpretability of the prototypes is lost. In the present paper, we propose to overcome these two issues by using a bagging approach. The results are illustrated on simulated data sets and compared to alternatives found in the literature.