# Laboratoire de Probabilités, Statistique et Modélisation

### Overview

In its current form, the Laboratoire de Probabilités, Statistique et Modélisation was created on 1 January 1999 by the merger of the former Laboratoire de probabilité of Université Paris 6 with the Probabilités et statistique team of Université Paris Diderot.

The laboratory has about 70 permanent faculty members (teaching and research staff), 50 PhD students, and an administrative team of 6. It also hosts two second-year master's programs, which together account for more than 200 students each year.

The laboratory's research falls within applied mathematics and focuses on the modeling, description, and estimation of random phenomena. Its research topics are highly varied, spanning fundamental mathematics as well as applications in fields as diverse as medicine, the human sciences, astrophysics, insurance, and finance.

### Research themes

#### 1. Ergodic theory and dynamical systems

#### 2. Stochastic modeling

#### 3. Brownian motion and stochastic calculus

#### 4. Statistics

### Research teams

The laboratory comprises six teams:

- Ergodic theory and dynamical systems,
- Stochastic modeling,
- Brownian motion and stochastic calculus,
- Statistics,
- Numerical probability and mathematical finance,
- Probability, statistics, and biology.

### [hal-00481055] Report card and indicators of quality in the Seine Estuary: from a scientific approach to an operational tool.

Date: 5 May 2010 - 19:19

Desc: [...]

### [hal-03796030] From individual-based epidemic models to McKendrick-von Foerster PDEs: a guide to modeling and inferring COVID-19 dynamics

Date: 4 Oct 2022 - 12:51

Desc: We present a unifying, tractable approach for studying the spread of viruses causing complex diseases that need to be modeled using a large number of types (e.g., infective stage, clinical state, risk factor class). We show that, by recording each infected individual's infection age, i.e., the time elapsed since infection: (1) the age distribution $n(t, a)$ of the population at time $t$ can be described by means of a first-order, one-dimensional partial differential equation (PDE) known as the McKendrick-von Foerster equation; (2) the frequency of type $i$ at time $t$ is simply obtained by integrating the probability $p(a, i)$ of being in state $i$ at age $a$ against the age distribution $n(t, a)$. The advantage of this approach is three-fold. First, regardless of the number of types, macroscopic observables (e.g., incidence or prevalence of each type) only rely on a one-dimensional PDE "decorated" with types. This representation induces a simple methodology based on the McKendrick-von Foerster PDE with Poisson sampling to infer and forecast the epidemic. We illustrate this technique using French data from the COVID-19 epidemic. Second, our approach generalizes and simplifies standard compartmental models using high-dimensional systems of ordinary differential equations (ODEs) to account for disease complexity. We show that such models can always be rewritten in our framework, thus providing a low-dimensional yet equivalent representation of these complex models. Third, beyond the simplicity of the approach, we show that our population model naturally appears as a universal scaling limit of a large class of fully stochastic individual-based epidemic models, where the initial condition of the PDE emerges as the limiting age structure of an exponentially growing population starting from a single individual.
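The two numbered statements in this abstract can be illustrated with a small numerical sketch. It assumes the textbook form of the McKendrick-von Foerster equation, $\partial_t n + \partial_a n = 0$ with the renewal boundary condition $n(t, 0) = \int \tau(a)\, n(t, a)\, da$. The abstract does not state the exact equation, so the transmission rate `tau`, the absence of a removal term, and the function names `step` and `type_frequency` are assumptions made for illustration, not the authors' implementation.

```python
def step(n, tau, da):
    """Advance the age density n by one time step dt = da.

    n[k] approximates n(t, k * da). Infection age grows at the same speed as
    time, so along characteristics the density shifts by one grid cell.
    """
    # Renewal boundary condition (Riemann sum):
    # n(t + dt, 0) = integral of tau(a) * n(t, a) da.
    # `tau` is a hypothetical age-dependent transmission rate.
    new_infections = sum(t * x for t, x in zip(tau, n)) * da
    # Individuals older than the last grid age leave the grid.
    return [new_infections] + n[:-1]


def type_frequency(n, p_i, da):
    """Integrate p(a, i) against n(t, a) for one fixed type i (Riemann sum)."""
    return sum(p * x for p, x in zip(p_i, n)) * da


# Toy grid with three age classes of width 1 (illustrative numbers only).
n0 = [1.0, 0.0, 0.0]
tau = [2.0, 0.0, 0.0]
n1 = step(n0, tau, da=1.0)
```

Choosing the time step equal to the age step makes the transport exact on the grid, so the only approximation in this sketch is the Riemann sum used for the two integrals.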

### [hal-01519688] Nonlinear projection methods for visualizing Barcode data and application on two data sets

Date: 9 May 2017 - 11:17

Desc: Developing tools for visualizing DNA sequences is an important issue in the Barcoding context. Visualizing Barcode data can be put in a purely statistical context: unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives, visualizing and finding structure in the data. Multidimensional scaling (MDS) and Self-organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower dimensional manifold: MDS looks for a projection that best preserves pairwise distances while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data and the conditions necessary for their proper implementation were not satisfied for Barcode data. We developed a workflow consisting of four steps: collapse data into distinct sequences; compute a dissimilarity matrix; run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; project the results using MDS. This methodology was applied to Astraptes fulgerator and Hylomyscus, an African rodent with debated taxonomy. We obtained very good results for both data sets. The results were robust against unbalanced species. All the species in Astraptes were well displayed in very distinct groups in the various visualizations, except for LOHAMP and FABOV, which were mixed up. For Hylomyscus, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species.
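The first two steps of the workflow described in this abstract (collapsing the data into distinct sequences and computing a dissimilarity matrix) can be sketched as follows. The abstract does not say which dissimilarity is used, so the p-distance (proportion of differing sites between aligned sequences) is an assumption, as are the function names and the toy sequences. The SOM and MDS steps are not covered here.

```python
from itertools import combinations


def collapse(sequences):
    """Step 1: keep one copy of each distinct sequence, in first-seen order."""
    return list(dict.fromkeys(sequences))


def p_distance(s, t):
    """Proportion of aligned sites at which two sequences differ (assumed metric)."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def dissimilarity_matrix(sequences):
    """Step 2: symmetric matrix of pairwise p-distances with a zero diagonal."""
    m = len(sequences)
    d = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        d[i][j] = d[j][i] = p_distance(sequences[i], sequences[j])
    return d


# Hypothetical aligned toy sequences, not real barcode data.
distinct = collapse(["ACGT", "ACGT", "ACGA", "TCGA"])
d = dissimilarity_matrix(distinct)
```

A matrix of this kind is the input that dissimilarity-based variants of SOM and MDS operate on, which is why the workflow computes it before any projection.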

### [inserm-00663565] DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

Date: 27 Jan 2012 - 13:15

Desc: BACKGROUND: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. RESULTS: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. CONCLUSION: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
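The "one nearest neighbour" method singled out in this abstract can be sketched in a few lines: a query sequence is assigned the species of the closest reference sequence. The Hamming distance on aligned sequences, the function names, and the toy reference set are assumptions for illustration. The study's actual distance and data are not given in the abstract.

```python
def hamming(s, t):
    """Number of aligned sites at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


def one_nearest_neighbour(query, references):
    """Return the species label of the reference sequence closest to `query`.

    `references` is a list of (sequence, species) pairs. On ties, the
    reference that appears first in the list wins.
    """
    _, species = min(references, key=lambda ref: hamming(query, ref[0]))
    return species


# Hypothetical reference library, not real barcode data.
references = [("AAAA", "sp1"), ("AATT", "sp1"), ("GGGG", "sp2")]
assigned = one_nearest_neighbour("GGGA", references)
```

The method has no parameters to tune, which may be one reason it behaves consistently across data set configurations, although the abstract itself does not give an explanation.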

### [hal-02881014] Lymphopoiesis in transgenic mice over-expressing Artemis

Date: 25 June 2020 - 14:25

Desc: Artemis is a factor of the non-homologous end joining pathway involved in DNA double-strand break repair that has a critical role in V(D)J recombination. Mutations in DCLRE1C/ARTEMIS gene result in radiosensitive severe combined immunodeficiency in humans owing to a lack of mature T and B cells. Given the known drawbacks of allogeneic hematopoietic stem cell transplantation (HSCT), gene therapy appears as a promising alternative for these patients. However, the safety of an unregulated expression of Artemis has to be established. We developed a transgenic mouse model expressing human Artemis under the control of the strong CMV early enhancer/chicken beta actin promoter through knock-in at the ROSA26 locus to analyze this issue. Transgenic mice present a normal development, maturation and function of T and B cells with no signs of lymphopoietic malignancies for up to 15 months. These results suggest that the over-expression of Artemis in mice (up to 40 times) has no deleterious effects in early and mature lymphoid cells and support the safety of gene therapy as a possible curative treatment for Artemis-deficient patients.

### Other contacts

U.F.R. Mathématiques

Sophie-Germain

75013 PARIS