HAL : derniers dépôts du SAMM

samedi 18 juillet 2015

  • [hal-01169573] Anomaly Detection Based on Confidence Intervals Using SOM with an Application to Health Monitoring
    We develop an application of SOM for the task of anomaly detection and visualization. To remove the effect of exogenous independent variables, we use a correction model which is more accurate than the usual one, since we apply different linear models in each cluster of context. We do not assume any particular probability distribution of the data and the detection method is based on the distance of new data to the Kohonen map learned with corrected healthy data. We apply the proposed method to the detection of aircraft engine anomalies.

  • [hal-01175731] Multiple dissimilarity SOM for clustering and visualizing graphs with node and edge attributes
    When wanting to understand the way a graph G is structured and how the relations it models organize groups of entities, clustering and visualization can be combined to provide the user with a global overview of the graph, on the form of a projected graph: a simplified graph is visualized in which the nodes correspond to a cluster of nodes in the original graph G (with a size proportional to the number of nodes that are classified inside this cluster) and the edges between two nodes have a width proportional to the number of links between the nodes of G classified in the two corresponding clusters. This approach can be trickier when additional attributes (numerical or factors) describe the nodes of G or when the edges of G are of different types and should be treated separately: the simplified representation should then represent similarities for all sets of information. In this proposal, we present a variant of Self-Organizing Maps (SOM), which is adapted to data described by one or several (dis)similarities or kernels recently published in (Olteanu & Villa-Vialaneix, 2015) and which is able to combine clustering and visualization for this kind of graphs.

vendredi 10 juillet 2015

  • [hal-01173774] A usufel lemma for Lagrange multiplier rules in infinite dimension.
    We give some reasonable and usable conditions on a sequence of norm one in a dual banach space under which the sequence does not converges to the origin in the $w^*$-topology. These requirements help to ensure that the Lagrange multipliers are nontrivial, when we are interested for example on the infinite dimensional infinite-horizon Pontryagin Principles for discrete-time problems.

jeudi 9 juillet 2015

  • [hal-01171557] Classification et visualisation de graphes avec SOMbrero
    Récemment, les données structurées et notamment les graphes ont connu un intérêt croissant. Celles-ci ont en effet de multiples applications en sciences humaines et sociales, en biologie ou en informatique. Pour comprendre les structures complexes modélisées par les graphes, une approche courante consiste à combiner classification des sommets du graphe avec visualisation : cela permet de mettre en valeur la structure macroscopique du graphe (classes denses de sommets et leurs relations) avant de se focaliser sur les détails à l'intérieur de telle ou telle classe. En particulier, le graphe est parfois représenté sur la base de la classification, de manière simplifiée : chaque classe est représentée par un méta-sommet, d'aire proportionnelle au nombre de sommets du graphe initial affectés à la classe correspondante, et les relations entre les méta-sommets sont représentées par des arêtes dont l'épaisseur est proportionnelle au nombre de liens entre les sommets des deux classes. Cette proposition se focalisera sur l'utilisation du package R SOMbrero pour l'exploration de graphes et la recherche d'une représentation simplifiée comme décrite plus haut.

vendredi 3 juillet 2015

  • [hal-01169677] Any law of group metric invariant is an inf-convolution.
    In this article, we bring a new light on the concept of the inf-convolution operation $\oplus$ and provides additional informations to the work started in \cite{Ba1} and \cite{Ba2}. It is shown that any internal law of group metric invariant (even quasigroup) can be considered as an inf-convolution. Consequently, the operation of the inf-convolution of functions on a group metric invariant is in reality an extension of the internal law of $X$ to spaces of functions on $X$. We give an example of monoid $(S(X),\oplus)$ for the inf-convolution structure, (which is dense in the set of all $1$-Lipschitz bounded from bellow functions) for which, the map $\arg\min : (S(X),\oplus) \rightarrow (X,.)$ is a (single valued) monoid morphism. It is also proved that, given a group complete metric invariant $(X,d)$, the complete metric space $(\mathcal{K}(X),d_{\infty})$ of all Katetov maps from $X$ to $\R$ equiped with the inf-convolution has a natural monoid structure which provides the following fact: the group of all isometric automorphisms $Aut_{Iso}(\mathcal{K}(X))$ of the monoid $\mathcal{K}(X)$, is isomorphic to the group of all isometric automorphisms $Aut_{Iso}(X)$ of the group $X$. On the other hand, we prove that the subset $\mathcal{K}_C(X)$ of $\mathcal{K}(X)$ of convex functions on a Banach space $X$, can be endowed with a convex cone structure in which $X$ embeds isometrically as Banach space.

mercredi 1er juillet 2015

  • [hal-01169515] Analysis of Professional Trajectories using Disconnected Self-Organizing Maps
    In this paper we address an important economic question. Is there, as mainstream economic theory asserts it, an homogeneous labor market with mechanisms which govern supply and demand for work, producing an equilibrium with its remarkable properties? Using the Panel Study of Income Dynamics (PSID) collected on the period 1984-2003, we study the situations of American workers with respect to employment. The data include all heads of household (men or women) as well as the partners who are on the labor market, working or not. They are extracted from the complete survey and we compute a few relevant features which characterize the worker's situations. To perform this analysis, we suggest using a Self-Organizing Map (SOM, Kohonen algorithm) with specific structure based on planar graphs, with disconnected components (called D-SOM), especially interesting for clustering. We compare the results to those obtained with a classical SOM grid and a star-shaped map (called SOS). Each component of D-SOM takes the form of a string and corresponds to an organized cluster. From this clustering, we study the trajectories of the individuals among the classes by using the transition probability matrices for each period and the corresponding stationary distributions. As a matter of fact, we find clear evidence of heterogeneous parts, each one with high homo-geneity, representing situations well identified in terms of activity and wage levels and in degree of stability in the workplace. These results and their interpretation in economic terms contribute to the debate about flexibility which is commonly seen as a way to obtain a better level of equilibrium on the labor market.

vendredi 26 juin 2015

  • [hal-01168120] How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining
    This article is an extended version of a paper presented in the WSOM'2012 conference [1]. We display a combination of factorial projections, SOM algorithm and graph techniques applied to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts. The reason for this is that they focus on the deviation from the independence between words and manuscripts. Still, we also want to discover and characterize the common vocabulary among the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhood between inputs in a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. We call them fickle and use them to improve both Kohonen map robustness and significance of FCA visualization. Finally we use graph algorithmic to exploit this fickleness for classification of words.

mercredi 24 juin 2015

  • [hal-01166849] Graphs in machine learning: an introduction
    Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when they are available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and un-supervised, data can be relational (augmented with one or several global graphs) as described above, or graph valued. In this latter case, each object of interest is given as a full graph (possibly completed by other characteristics). In this context, natural tasks include graph clustering (as in producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc. 1 Real networks One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 30s. Since then, there has been a growing interest in graph analysis associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields. In Biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in social sciences networks are used to represent relation ties between actors [66, 56, 36, 34]. Other examples include powergrids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We will particularly focus on clustering methods where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network is

mardi 23 juin 2015

  • [hal-00866094] mu-Limit Sets of Cellular Automata from a Computational Complexity Perspective
    This paper concerns $\mu$-limit sets of cellular automata: sets of configurations made of words whose probability to appear does not vanish with time, starting from an initial $\mu$-random configuration. More precisely, we investigate the computational complexity of these sets and of related decision problems. Main results: first, $\mu$-limit sets can have a $\Sigma_3^0$-hard language, second, they can contain only $\alpha$-complex configurations, third, any non-trivial property concerning them is at least $\Pi_3^0$-hard. We prove complexity upper bounds, study restrictions of these questions to particular classes of CA, and different types of (non-)convergence of the measure of a word during the evolution.

vendredi 19 juin 2015

  • [hal-01164797] The inf-onvolution between algebra and optimization. Appliations to the Banah-Stone theorem.
    This work generalize and extend results obtained reently in [2] from the Banah spaces framework to the groups framework. We study abstract classe of functions monoids for the inf-onvolution struture and gives a complete description of the group of unit of such monoids. We then apply this results to obtain various versions of the Banah-Stone theorem for the inf-convolution struture in the group framework. We also give as consequene an algebraic proof of the Banah-Dieudonée theorem. Our tehniques are based on a new optimization result.

samedi 13 juin 2015

  • [hal-01163390] Reducing offline evaluation bias of collaborative filtering algorithms
    Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. It thus influences the way users interact with the system and, as a consequence, bias the evaluation of the performance of a recommendation algorithm computed using historical data (via offline evaluation). This paper presents a new application of a weighted offline evaluation to reduce this bias for collaborative filtering algorithms.

  • [hal-01162980] Using the Mean Absolute Percentage Error for Regression Models
    We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We show that universal consistency of Empirical Risk Minimization remains possible using the MAPE instead of the MAE.

  • [hal-01163367] Exact ICL maximization in a non-stationary time extension of the latent block model for dynamic networks
    The latent block model (LBM) is a flexible probabilistic tool to describe interactions between node sets in bipartite networks, but it does not account for interactions of time varying intensity between nodes in unknown classes. In this paper we propose a non stationary temporal extension of the LBM that clusters simultaneously the two node sets of a bipartite network and constructs classes of time intervals on which interactions are stationary. The number of clusters as well as the membership to classes are obtained by maximizing the exact complete-data integrated likelihood relying on a greedy search approach. Experiments on simulated and real data are carried out in order to assess the proposed methodology.

  • [hal-01162981] Search Strategies for Binary Feature Selection for a Naive Bayes Classifier
    We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability out-perform filter approaches while retaining a reasonable computational cost.

mercredi 3 juin 2015

  • [hal-01159001] Asymptotic behavior of compositions of under-relaxed nonexpansive operators
    In general there exists no relationship between the fixed point sets of the composition and of the average of a family of nonexpansive operators in Hilbert spaces. In this paper, we establish an asymptotic principle connecting the cycles generated by under-relaxed compositions of nonexpansive operators to the fixed points of the average of these operators. In the special case when the operators are projectors onto closed convex sets, we prove a conjecture by De Pierro which has so far been established only for projections onto affine subspaces.

vendredi 29 mai 2015

  • [hal-01157696] Asymptotic behavior of compositions of under-relaxed nonexpansive operators
    In general there exists no relationship between the fixed point sets of the composition and of the average of a family of nonexpansive operators in Hilbert spaces. In this paper, we establish an asymptotic principle connecting the cycles generated by under-relaxed compositions of nonexpansive operators to the fixed points of the average of these operators. In the special case when the operators are projectors onto closed convex sets, we prove a conjecture by De Pierro which has so far been established only for projections onto affine subspaces.

vendredi 22 mai 2015

    Résumé. — Ces dernières années, de nombreux modèles de graphes aléatoires ont été proposés pour extraire des informations à partir de réseaux dans des domaines variés. Le principe de ces modèles consiste à chercher des groupes de nœuds ayant des profils de connexion homogènes. La plupart de ces modèles sont adaptés pour des réseaux statiques ayant des arêtes binaires ou discrètes mais sans prendre en compte la dimension temporelle. Ce travail est motivé par la nécessité d'analyser un réseau dynamique décrivant les communications électroniques (e-mail) entre les employés de l'entreprise Enron où les positions sociales jouent un rôle important. Nous proposons dans cet article une extension au cadre dynamique du modèle de graphe aléatoire RSM qui a été récemment proposé pour modéliser à l'aide de groupes latents des réseaux statiques pour lesquels une partition en sous-graphes est connue. Notre approche est basée sur l'utilisation d'un state-space model pour modéliser l'évolution au cours du temps des proportions des groupes latents. Le modèle ainsi obtenu est appelé modèle de sous-graphes aléatoires dynamiques (dRSM) et un algorithme de type EM variationnel (VEM) est proposé pour en effectuer l'inférence. Nous montrons que les approximations variationnelles conduisent à un nouveau state-space model à partir duquel les paramètres ainsi que les états cachés peuvent être estimés en utilisant le filtre de Kalman et le Rauch-Tung-Striebel (RTS) smoother. La méthodologie est finalement appliquée au jeu des données d'e-mails de l'entreprise Enron et permet de mettre en évidence une réaction anticipée des cadres par rapport aux autres employés concernant le scandale à venir.

jeudi 19 mars 2015

  • [hal-01133175] Interpretable Aircraft Engine Diagnostic via Expert Indicator Aggregation
    Detecting early signs of failures (anomalies) in complex systems is one of the main goal of preventive maintenance. It allows in particular to avoid actual failures by (re)scheduling maintenance operations in a way that optimizes maintenance costs. Aircraft engine health monitoring is one representative example of a field in which anomaly detection is crucial. Manufacturers collect large amount of engine related data during flights which are used, among other applications, to detect anomalies. This article introduces and studies a generic methodology that allows one to build automatic early signs of anomaly detection in a way that builds upon human expertise and that remains understandable by human operators who make the final maintenance decision. The main idea of the method is to generate a very large number of binary indicators based on parametric anomaly scores designed by experts, complemented by simple aggregations of those scores. A feature selection method is used to keep only the most discriminant indicators which are used as inputs of a Naive Bayes classifier. This give an interpretable classifier based on interpretable anomaly detectors whose parameters have been optimized indirectly by the selection process. The proposed methodology is evaluated on simulated data designed to reproduce some of the anomaly types observed in real world engines.

jeudi 5 mars 2015

  • [hal-01122393] The Dynamic Random Subgraph Model for the Clustering of Evolving Networks
    In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogenous connection profiles. Most of these techniques are suitable for static networks, that is to say, not taking into account the temporal dimension. This work is motivated by the need of analyzing evolving networks where a decomposition of the networks into subgraphs is given. Therefore, in this paper, we consider the random subgraph model (RSM) which was proposed recently to model networks through latent clusters built within known partitions. Using a state space model to characterize the cluster proportions, RSM is then extended in order to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM). A variational expectation maximization (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to an update step which involves a new state space model from which the parameters along with the hidden states can be estimated using the standard Kalman filter and Rauch-Tung-Striebel (RTS) smoother. Simulated data sets are considered to assess the proposed methodology. Finally, dRSM along with the corresponding VEM algorithm are applied to an original maritime network built from printed Lloyd's voyage records.

mercredi 31 décembre 2014

  • [hal-01099026] Uniform Exponential Stability of Discrete Evolution Families on Space of $p$-Periodic Sequences

  • [hal-01099024] Criterion For The Exponential Stability of Discrete Evolution Family Over Banach Spaces
    Let $T(1)$ be the algebraic generator of the discrete semigroup $\textbf{T}=\{T(n)\}_{n\geq 0}$. We prove that the system $x_{n+1}=T(1)x_n$ is uniformly exponentially stable if and only if for each real number $\mu$ and each $q$-periodic sequence $z(n)$ with $z(0)=0$ the unique solution of the Cauchy Problem \begin{equation*} \left\{ \begin{split} % \nonumber to remove numbering (before each equation) x_{n+1} &= T(1)x_{n}+e^{i\mu(n+1)}z(n+1), \\ x_0&= 0 \end{split} \right.\eqno{(T(1), \mu,0)} \end{equation*} is bounded. We also extend the above result to $q$-periodic system $y_{n+1} = A_{n}y_{n}$ i.e. we proved that the system $y_{n+1} = A_{n}y_{n}$ is uniformly exponentially stable if and only if for each real number $\mu$ and each $q$-periodic sequence $z(n)$, with $z(0)=0$ the unique solution of the Cauchy Problem \begin{equation*} \left\{ \begin{split} % \nonumber to remove numbering (before each equation) y_{n+1} &= A_{n}y_{n}+e^{i\mu(n+1)}z(n+1), \\ y_0&= 0 \end{split} \right.\eqno{(A_{n}, \mu,0)} \end{equation*} is bounded. Here, $A_{n}$ is a sequence of bounded linear operators on Banach space $\mathcal{X}$.

  • [hal-01099019] New aspects of nonautonomous discrete systems stability
    We prove that a discrete evolution family ${\bf U}:=\{U(n,m):\; n\geq m\in \mathbb{Z}_+\}$ of bounded linear operators acting on a complex Banach space $X$ is uniformly esponentially stable if and only if for each forcing term $(f(n))_{n\in \mathbb{Z}_+}$ belonging to $AP_0(\mathbb{Z}_+, X)$, the solution of the discrete Cauchy Problem $$ \left\{ \begin{array}{lc} x(n+1)=A(n)x(n)+f(n), n\in \mathbb{Z}_+ \\ x(0)=0 \end{array} \right. $$ belongs to $AP_0(\mathbb{Z}_+, X)$, where the operators-valued sequence $(A(n))_{n\in \mathbb{Z}_+}$ generates the evolution family ${\bf U}$. The approach we use is based on the theory of discrete evolution semigroups associated to this family.

jeudi 4 décembre 2014

  • [hal-01090085] Mary Astell's words in A Serious Proposal to the Ladies (part I), a lexicographic inquiry with NooJ
    In the following article we elected to study with NooJ the lexis of a 17 th century text, Mary Astell's seminal essay, A Serious Proposal to the Ladies, part I, published in 1694. We first focused on the semantics to see how Astell builds her vindication of the female sex, which words she uses to sensitise women to their alienated condition and promote their education. Then we studied the morphology of the lexemes (which is different from contemporary English) used by the author, thanks to the NooJ tools we have devised for this purpose. NooJ has great functionalities for lexicographic work. Its commands and graphs prove to be most efficient in the spotting of archaic words or variants in spelling. Introduction In our previous articles, we have studied the singularities of 17 th century English within the framework of a diachronic analysis thanks to syntactical and morphological graphs and thanks to the dictionaries we have compiled from a corpus that may be expanded overtime. Our early work was based on a limited corpus of English travel literature to Greece in the 17 th century. This article deals with a late seventeenth century text written by a woman philosopher and essayist, Mary Astell (1666–1731), considered as one of the first English feminists. Astell wrote her essay at a time in English history when women were "the weaker vessel" and their main business in life was to charm and please men by their looks and submissiveness. In this essay we will see how NooJ can help us analyse Astell's rhetoric (what point of view does she adopt, does she speak in her own name, in the name of all women, what is her representation of men and women and their relationships in the text, what are the goals of education?). Then we will turn our attention to the morphology of words in the text and use NooJ commands and graphs to carry out a lexicographic inquiry into Astell's lexemes.

vendredi 28 novembre 2014

    Research work on texts often needs to rely upon the latest scientific methods to improve its results. When talking about old manuscripts, which make us go far behind, and often exhibit dark zones, the latest software techniques may help. For the scrutiny of one of our manuscripts, an Arabic translation of a now lost Greek Treatise, why not use the NooJ program ?We start by giving the content of the original Treatise. Then, by the analysis of words - their frequency, their spelling – and sentences, we strive to better understand the conditions of the Arabic copy carried out in the 9th Century, in the Middle East, bur only known by copies of the 13th Century performed in Iran.

mardi 25 novembre 2014

  • [hal-01086201] On isometries of Product of normed linear spaces.
    We give a condition on norms under which two vector normed spaces $X$ and $Y$ are isometrically isomorphic if and only if $X\times \R$ and $Y\times \R$ are isometrically isomorphic. We also prove that this result fail for arbitrary norms even if $X=Y=\R^2$ by building a generic counterexamples.

samedi 15 novembre 2014

  • [hal-01082632] A classification of isomorphisms and isometries on functions spaces.
    We establish in this article a formula which will allow to classify isometries as well as partial isometries between spaces of functions. Our result applies in a non compact framework and for abstract class of functions spaces included in the space of continuous and bounded functions on complete metric space. The result of this article is in particular a generalization of the Banach-Stone theorem. We use the techniques of differentiability as in the original proof of Banach for convex functions generalizing the norm of uniform convergence. We use also a duality result in a non convex framework.

mercredi 5 novembre 2014


Du site syndiqué

  • HAL : derniers dépôts du SAMM











MASHS 2014 Modèles et Apprentissage en Sciences Humaines et Sociales

ESANN 2014 : European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning

WSOM 2014 - 10th Workshop on Self-Organizing Maps