HAL: latest SAMM deposits

Wednesday 10 February 2016

  • [hal-01263540] Modelling time evolving interactions in networks through a non stationary extension of stochastic block models
    The stochastic block model (SBM) describes interactions between nodes of a network following a probabilistic approach. Nodes belong to hidden clusters and the probabilities of interactions only depend on these clusters. Interactions of time varying intensity are not taken into account. By partitioning the whole time horizon, in which interactions are observed, we develop a non stationary extension of the SBM, allowing us to simultaneously cluster the nodes of a network and the fixed time intervals in which interactions take place. The number of clusters as well as memberships to clusters are finally obtained through the maximization of the complete-data integrated likelihood relying on a greedy search approach. Experiments are carried out in order to assess the proposed methodology.
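    The basic (stationary) SBM described in the abstract can be illustrated with a minimal sketch. It assumes an undirected graph without self-loops and Bernoulli edges; the function name `sample_sbm` and its parameters are hypothetical, and the sketch does not implement the paper's non stationary extension or its greedy inference.

```python
import random


def sample_sbm(clusters, block_probs, seed=0):
    """Sample an undirected adjacency matrix from a basic SBM.

    clusters[i] is the (normally hidden) cluster of node i, and
    block_probs[k][l] is the probability of an edge between a node of
    cluster k and a node of cluster l. Both are illustrative inputs.
    """
    rng = random.Random(seed)
    n = len(clusters)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two clusters.
            p = block_probs[clusters[i]][clusters[j]]
            if rng.random() < p:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Two communities with dense within-cluster and sparse between-cluster links.
example = sample_sbm([0, 0, 0, 1, 1, 1], [[0.9, 0.1], [0.1, 0.9]], seed=42)
```

    In an inference setting the cluster labels are unknown and are what the method recovers; here they are given only to generate data.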

Tuesday 9 February 2016

  • [hal-01270293] Is the corporate elite disintegrating? Interlock boards and the Mizruchi hypothesis
    This paper proposes an approach for comparing interlocked board networks over time to test for statistically significant change. In addition to contributing to the conversation about whether the Mizruchi hypothesis (that a disintegration of power is occurring within the corporate elite) holds or not, we propose novel methods to handle a longitudinal investigation of a series of social networks where the nodes undergo a few modifications at each time point. Methodologically, our contribution is twofold: we extend a Bayesian model hitherto applied to compare two time periods to a longer time period, and we define and employ the concept of a hull of a sequence of social networks, which makes it possible to circumvent the problem of changing nodes over time.

Wednesday 3 February 2016

  • [hal-01264886] Infinite-horizon problems under periodicity constraint
    We study some infinite-horizon optimization problems on spaces of periodic functions for non periodic Lagrangians. The main strategy relies on the reduction to finite horizon thanks to the introduction of an averaging operator. We then provide existence results and necessary optimality conditions in which the corresponding averaged Lagrangian appears.

    In the infinite-horizon and discrete-time framework we establish maximum principles of Pontryagin under assumptions which are weaker than those of existing results. We avoid several assumptions of continuity, of Fréchet-differentiability and of linear independence. MSC 2010: 49J21, 65K05, 39A99.

  • [hal-01265147] Limited operators and differentiability
    Given Banach spaces Y and X and a linear continuous operator T : Y → X, we prove that T is a limited operator if and only if, for every convex continuous function f : X → R and every point y ∈ Y, f ∘ T is Fréchet-differentiable at y ∈ Y whenever f is Gâteaux-differentiable at T(y) ∈ X. Some consequences will be given.

Wednesday 27 January 2016

  • [hal-01261122] Country-scale Exploratory Analysis of Call Detail Records through the Lens of Data Grid Models
    Call Detail Records (CDRs) are data recorded by telecommunications companies, consisting of basic information related to several dimensions of the calls made through the network: the source, destination, date and time of calls. CDR data analysis has received much attention in recent years since it might reveal valuable information about human behavior. It has shown high added value in many application domains such as community analysis or network planning. In this paper, we suggest a generic methodology based on data grid models for summarizing information contained in CDR data. The method is based on a parameter-free estimation of the joint distribution of the variables that describe the calls. We also suggest several well-founded criteria that allow one to browse the summary at various granularities and to explore the summary by means of insightful visualizations. The method handles network graph data, temporal sequence data as well as user mobility data stemming from original CDR data. We show the relevance of our methodology on real-world CDR data from Ivory Coast for various case studies, such as network planning strategy and yield management pricing strategy.

Friday 22 January 2016

  • [hal-01259983] Semiparametric stationarity and fractional unit roots tests based on data-driven multidimensional increment ratio statistics
    In this paper, we show that the central limit theorem (CLT) satisfied by the data-driven Multidimensional Increment Ratio (MIR) estimator of the memory parameter d established in Bardet and Dola (2012) for d ∈ (−0.5, 0.5) can be extended to a semiparametric class of Gaussian fractionally integrated processes with memory parameter d ∈ (−0.5, 1.25). Since the asymptotic variance of this CLT can be estimated, data-driven MIR tests can be constructed for the two cases of stationarity and non-stationarity: two tests distinguishing the hypotheses d < 0.5 and d ≥ 0.5, as well as a fractional unit roots test distinguishing the case d = 1 from the case d < 1. Simulations on numerous kinds of short-memory, long-memory and non-stationary processes show both the high accuracy and the robustness of this MIR estimator compared to those of usual semiparametric estimators. They also attest to the reasonable efficiency of MIR tests compared to other usual stationarity tests or fractional unit roots tests. Keywords: Gaussian fractionally integrated processes; semiparametric estimators of the memory parameter; test of long-memory; stationarity test; fractional unit roots test.

Wednesday 20 January 2016

Wednesday 13 January 2016

  • [hal-01254346] Maxima of Two Random Walks: Universal Statistics of Lead Changes
    We investigate statistics of lead changes of the maxima of two discrete-time random walks in one dimension. We show that the average number of lead changes grows as π^(−1) ln t in the long-time limit. We present theoretical and numerical evidence that this asymptotic behavior is universal. Specifically, this behavior is independent of the jump distribution: the same asymptotic underlies standard Brownian motion and symmetric Lévy flights. We also show that the probability to have at most n lead changes behaves as t^(−1/4) (ln t)^n for Brownian motion and as t^(−β(µ)) (ln t)^n for symmetric Lévy flights with index µ. The decay exponent β ≡ β(µ) varies continuously with the Lévy index when 0 < µ < 2, while β = 1/4 for µ > 2.
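    The quantity studied in this abstract, the number of lead changes between the running maxima of two random walks, can be estimated numerically with a short simulation. The sketch below is only an illustration under stated assumptions: both walks start at the origin (which counts as the initial maximum), steps are Gaussian, ties leave the current leader unchanged, and all function names are hypothetical.

```python
import random


def running_maxima(steps):
    """Return the running maximum of a walk started at 0 after each step."""
    position, best, maxima = 0.0, 0.0, []
    for step in steps:
        position += step
        best = max(best, position)
        maxima.append(best)
    return maxima


def count_lead_changes(steps_a, steps_b):
    """Count how often the strictly larger running maximum switches walks."""
    leader = 0  # +1 if walk A leads, -1 if walk B leads, 0 before any lead
    changes = 0
    for max_a, max_b in zip(running_maxima(steps_a), running_maxima(steps_b)):
        if max_a == max_b:
            continue  # assumed convention: a tie does not change the leader
        current = 1 if max_a > max_b else -1
        if leader != 0 and current != leader:
            changes += 1
        leader = current
    return changes


def average_lead_changes(n_steps, n_runs, seed=0):
    """Monte Carlo average of the lead-change count with Gaussian steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_runs):
        steps_a = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        steps_b = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
        total += count_lead_changes(steps_a, steps_b)
    return total / n_runs
```

    Calling `average_lead_changes` for increasing `n_steps` gives a rough empirical view of the logarithmic growth stated in the abstract; the asymptotic regime may require long walks and many runs to become visible.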

Saturday 9 January 2016

  • [hal-01253191] Pontryagin principle for a Mayer problem governed by a delay functional differential equation
    We establish Pontryagin principles for a Mayer's optimal control problem governed by a functional differential equation. The control functions are piecewise continuous and the state functions are piecewise continuously differentiable. To do that, we follow the method created by Philippe Michel for systems governed by ordinary differential equations, and we use properties of the resolvent of a linear functional differential equation.

  • [hal-01253186] Pontryagin principle for a Mayer problem governed by a delay functional differential equation
    We establish Pontryagin principles for a Mayer's optimal control problem governed by a functional differential equation. The control functions are piecewise continuous and the state functions are piecewise continuously differentiable. To do that, we follow the method created by Philippe Michel for systems governed by ordinary differential equations, and we use properties of the resolvent of a linear functional differential equation.

Sunday 3 January 2016

Tuesday 22 December 2015

Friday 11 December 2015

  • [hal-01122393] The Dynamic Random Subgraph Model for the Clustering of Evolving Networks
    In recent years, many clustering methods have been proposed to extract information from networks. The principle is to look for groups of vertices with homogeneous connection profiles. Most of these techniques are suitable for static networks, that is to say, they do not take the temporal dimension into account. This work is motivated by the need to analyze evolving networks where a decomposition of the networks into subgraphs is given. Therefore, in this paper, we consider the random subgraph model (RSM) which was proposed recently to model networks through latent clusters built within known partitions. Using a state space model to characterize the cluster proportions, RSM is then extended in order to deal with dynamic networks. We call the latter the dynamic random subgraph model (dRSM). A variational expectation maximization (VEM) algorithm is proposed to perform inference. We show that the variational approximations lead to an update step which involves a new state space model from which the parameters along with the hidden states can be estimated using the standard Kalman filter and Rauch-Tung-Striebel (RTS) smoother. Simulated data sets are considered to assess the proposed methodology. Finally, dRSM along with the corresponding VEM algorithm are applied to an original maritime network built from printed Lloyd's voyage records.
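    As a reminder of what the standard Kalman filter mentioned in the abstract computes, here is a minimal sketch for a scalar local-level model (random-walk state observed with noise). This is an assumption-laden illustration, not the state space model of the paper: the model, the function name and the parameters are hypothetical, and the RTS smoothing pass is not included.

```python
def kalman_filter_local_level(observations, process_var, obs_var,
                              mean0=0.0, var0=1.0):
    """Filter a scalar local-level model.

    Assumed model: x_t = x_{t-1} + w_t with Var(w_t) = process_var,
    and y_t = x_t + v_t with Var(v_t) = obs_var.
    Returns the filtered means and variances after each observation.
    """
    mean, var = mean0, var0
    means, variances = [], []
    for y in observations:
        # Prediction step: the state mean is unchanged, uncertainty grows.
        var_pred = var + process_var
        # Update step: blend the prediction and the observation.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        means.append(mean)
        variances.append(var)
    return means, variances
```

    With no process noise the filtered mean moves toward the observations and the variance shrinks at each step, which is the behavior the update equations encode.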

Friday 27 November 2015

  • [hal-01232672] Using SOMbrero for clustering and visualizing graphs
    Graphs have attracted a burst of attention in recent years, with applications to social science, biology, computer science... In the present paper, we illustrate how self-organizing maps (SOM) can be used to shed light on the structure of the graph, performing clustering of the graph together with visualization of a simplified graph. In particular, we present the R package SOMbrero which implements a stochastic version of the so-called relational algorithm: the method is able to process any dissimilarity data and several dissimilarities adapted to graphs are described and compared. The use of the package is illustrated on two real-world datasets: one, included in the package itself, is small enough to allow for a full investigation of how the choice of a dissimilarity to measure the proximity between the vertices influences the results. The other example comes from an application in biology and is based on a large bipartite graph of chemical reactions with several thousand vertices.

Tuesday 10 November 2015

  • [tel-01225739] Statistical tools for processing indicators for the diagnosis and prognosis of aircraft engines
    Detecting signs of anomalies in a complex system is one of the main objectives of preventive maintenance in industry. It makes it possible to avoid a failure or to limit the degradation of a component by bringing a maintenance operation forward. Health Monitoring of aircraft engines is one of the industrial fields in which this anomaly detection is a major issue. Engine manufacturers such as Snecma therefore collect large quantities of engine data during each flight. The goal is to detect automatically, from these data, the cases where an engine deviates from its normal behavior. More precisely, Snecma develops applications that help prevent engine failures by detecting anomalies. This thesis presents how the knowledge of Snecma's experts is exploited to process these engine data. This first piece of work highlighted the difficulties of processing the data, whether they concern data storage or the definition of the processing algorithms themselves. The thesis then proposes a methodology for combining expert knowledge with machine learning methods while meeting the requirements of an engine manufacturer such as Snecma. These include the need to fuse varied information, error control and the interpretability of diagnostic results. To this end, the methodology directly exploits the data produced by the processing algorithms developed by the experts themselves. This is made possible by a necessary homogenization of the data, in other words by putting them into a common format that then allows them to be fused.
    Homogenizing the data makes it possible to use (supervised) classification algorithms, whose purpose is to automatically group into classes individuals (here, engines) of the same nature on the basis of the information provided, without losing the temporal information. It also makes it possible to directly exploit the monitoring applications set up by the domain experts to detect anomalies. In this way, the methodology provided by the thesis remains understandable to the domain experts. Before the fusion is actually carried out, a variable selection algorithm is used. The thesis describes how the selection process allows an automatic calibration of the monitoring applications developed by the domain experts. In addition, this selection partly meets Snecma's first requirement concerning the interpretability of results. Ultimately, the methodology presented in this thesis aims to help Snecma make anomaly labels converge across all its users. It also aims to facilitate and encourage the creation of a single database gathering, on the one hand, all the measurements taken on the engines and their transformations and, on the other hand, engine-related information that may be relevant, such as the results of expert analyses or part replacement dates. With the database thus usable, this thesis can then propose a labeling tool that can be used to improve, through data labeling, the supervised selection and classification algorithms.

Saturday 7 November 2015

    We establish necessary conditions of optimality for discrete-time infinite-horizon optimal control in the presence of constraints at infinity. These necessary conditions take the form of weak and strong Pontryagin principles. We use a functional analytic framework and multiplier rules in Banach (sequence) spaces. We establish new properties of Nemytskii operators in sequence spaces. We also provide sufficient conditions of optimality. MSC 2010: 49J21, 65K05, 39A99.

Thursday 5 November 2015

  • [hal-01222395] Study of a bias in the offline evaluation of a recommendation algorithm
    Recommendation systems have been integrated into the majority of large online systems to filter and rank information according to user profiles. They thus influence the way users interact with the system and, as a consequence, bias the evaluation of the performance of a recommendation algorithm computed using historical data (via offline evaluation). This paper describes this bias and discusses the relevance of a weighted offline evaluation to reduce this bias for different classes of recommendation algorithms.

  • [hal-01222649] Co-Clustering Network-Constrained Trajectory Data
    Recently, clustering moving object trajectories has kept gaining interest from both the data mining and machine learning communities. This problem, however, was studied mainly and extensively in the setting where moving objects can move freely in Euclidean space. In this paper, we study the problem of clustering trajectories of vehicles whose movement is restricted by the underlying road network. We model relations between these trajectories and road segments as a bipartite graph and we try to cluster its vertices. We demonstrate our approaches on synthetic data and show how they could be useful in inferring knowledge about the flow dynamics and the behavior of the drivers using the road network.

  • [hal-01222403] Lasso based feature selection for malaria risk exposure prediction
    In life sciences, experts generally use empirical knowledge to recode variables, choose interactions and perform selection by classical approaches. The aim of this work is to provide an automatic learning algorithm for variable selection, which can show whether experts can be helped in their decisions or simply replaced by the machine, and which can improve their knowledge and results. The Lasso method can detect the optimal subset of variables for estimation and prediction under some conditions. In this paper, we propose a novel approach which automatically uses all available variables and all interactions. By a double cross-validation combined with Lasso, we select a best subset of variables, and with a GLM through a simple cross-validation we perform predictions. The algorithm ensures the stability and the consistency of estimators.
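    The way the Lasso performs variable selection, by shrinking some coefficients exactly to zero, can be shown with a small coordinate-descent sketch. It is a generic illustration under stated assumptions (squared-error loss with penalty `lam`, no intercept, a fixed number of sweeps), not the double cross-validation and GLM pipeline of the paper; all names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of rows, y a list of responses. No intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho, scale = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j (partial residual).
                fitted = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted)
                scale += X[i][j] ** 2
            rho /= n
            scale /= n
            beta[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return beta


def selected_variables(beta):
    """Indices of the variables kept by the Lasso (nonzero coefficients)."""
    return [j for j, b in enumerate(beta) if b != 0.0]
```

    In practice the penalty `lam` would be chosen by cross-validation, which is the role the abstract assigns to its double cross-validation step.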

Thursday 22 October 2015

  • [hal-01217172] Model selection and clustering in stochastic block models with the exact integrated complete data likelihood
    The stochastic block model (SBM) is a mixture model for the clustering of nodes in networks. The SBM has now been employed for more than a decade to analyze very different types of networks in many scientific fields, including biology and the social sciences. Recently, an analytical expression based on the collapsing of the SBM parameters has been proposed, in combination with a sampling procedure that allows the clustering of the vertices and the estimation of the number of clusters to be performed simultaneously. Although the corresponding algorithm can technically accommodate up to 10 000 nodes and millions of edges, the Markov chain, however, tends to exhibit poor mixing properties, that is, low acceptance rates, for large networks. Therefore, the number of clusters tends to be highly overestimated, even for a very large number of samples. In this article, we rely on a similar expression, which we call the integrated complete data log likelihood, and propose a greedy inference algorithm that focuses on maximizing this exact quantity. This algorithm incurs a smaller computational cost than existing inference techniques for the SBM and can be employed to analyze large networks (several tens of thousands of nodes and millions of edges) with no convergence problems. Using toy datasets, the algorithm exhibits improvements over existing strategies, both in terms of clustering and model selection. An application to a network of blogs related to illustrations and comics is also provided.

Tuesday 13 October 2015

  • [hal-01214534] Magistrates who entered through a lateral route
    In France, magistrates of the judiciary are recruited mainly through competitive examinations: besides a main route reserved for students, there are three other types of examination, namely those intended for civil servants, those for experienced private-sector lawyers and, more occasionally, exceptional or complementary examinations. Other forms of recruitment, based on qualifications, consist in directly integrating experienced professionals as "auditeurs de justice" (trainee magistrates) or directly as magistrates. 26% of magistrates of the judiciary entered the corps through one of these so-called "lateral" routes. The research addresses the issues attached to recruiting magistrates through these various "lateral routes", drawing on the available administrative data and conducting a large dedicated survey. It focuses in particular on the careers and professional paths of the magistrates concerned: how do they differ from other magistrates? What were their motivations, and how do they assess their careers and activities? Do the various routes of access to the corps retain lasting distinctive features? Is this ultimately a diversification of profiles that meets the wishes of the promoters of these routes, which are supposed to bring the judiciary closer to the people it serves?

