DEST Seminars & Videos – Department of Statistics

DEST Seminars


Online Seminars on COVID-19

The seminars were organized by professors Luiz H. Duczmal and Glaura C. Franco.

The seminars are available on our YouTube channel “Video conferencia do DEST”.

List of DEST Seminars

YEAR 2024 – 2nd SEMESTER

29/10/2024 at 13:00 – Location: Room 2076 – ICEx/UFMG

Denis Rustand (KAUST, Saudi Arabia).

Title: Fast, accurate, and flexible Bayesian survival modeling with the R package INLAjoint.

Abstract: This presentation introduces INLAjoint, a user-friendly R package that simplifies the fitting of various survival models using the computationally efficient Integrated Nested Laplace Approximations (INLA) method. INLA offers a significant speed advantage over traditional Markov Chain Monte Carlo (MCMC) methods while maintaining accuracy in parameter estimation. INLAjoint supports a wide range of survival models, including proportional hazards, multi-state, and joint models for multivariate longitudinal and survival data. Joint models, which link multiple regression submodels through correlated or shared random effects, can be computationally intensive. In this context, we underscore the significant reduction in computation time achieved by INLA when compared to MCMC, without compromising accuracy. Beyond model fitting, the talk provides practical guidance on using the INLAjoint R package, including detailed syntax examples. A key application of joint models is dynamic prediction, which involves estimating the risk of an event (e.g., death or disease progression) based on changes in longitudinal outcomes over time. INLAjoint enables the estimation of dynamic risk predictions and can update these predictions as new longitudinal data become available. This makes INLAjoint a valuable tool for analyzing complex health data.

25/10/2024 at 13:30 – Location: Room 2076 – ICEx/UFMG

Heitor Soares Ramos Filho (Department of Computer Science, UFMG).

Title: Deep metric learning and applications.

Abstract: The talk will address Deep Metric Learning (DML), a subfield of machine learning focused on learning representations of data in a metric space. The main goal is to use deep neural networks to maximize the similarity between similar instances and the distance between dissimilar instances. In this talk, we will introduce some basic concepts and applications, highlighting some contributions of our research group to the field.
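DML models are often trained with losses that compare distances between embeddings. As a minimal sketch, assuming the common triplet-margin loss (one standard DML objective, not necessarily the one used by the speaker's group), the function below penalizes an anchor that is not closer to a positive (same class) than to a negative (different class) by at least a margin. Plain Python lists stand in for the embeddings a deep network would produce; the names `triplet_loss` and `euclidean` and the margin value are illustrative assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive; otherwise it is positive and training
    would push the embeddings apart.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Well-separated triplet: no loss.
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [3.0, 0.0]))
# Negative closer to the anchor than the positive: positive loss.
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))
```

In a real DML pipeline this loss would be averaged over many mined triplets and minimized with respect to the network weights.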

11/10/2024 at 13:30 – Location: Room 2076 – ICEx/UFMG

Vinícius Diniz Mayrink (Department of Statistics, UFMG).

Title: Spatial functional data analysis: irregular spacing and Bernstein polynomials.

Abstract: Spatial Functional Data (SFD) analysis is an emerging statistical framework that combines Functional Data Analysis (FDA) and spatial dependency modeling. Unlike traditional statistical methods, which treat data as scalar values or vectors, SFD considers data as continuous functions, allowing for a more comprehensive understanding of their behavior and variability. This approach is well-suited for analyzing data collected over time, space, or any other continuous domain. SFD has found applications in various fields, including economics, finance, medicine, environmental science, and engineering. This study proposes new functional Gaussian models incorporating spatial dependence structures, focusing on irregularly spaced data and reflecting spatially correlated curves. The model is based on Bernstein polynomial (BP) basis functions and utilizes a Bayesian approach for estimating unknown quantities and parameters. The study explores the advantages and limitations of the BP model in capturing complex shapes and patterns while ensuring numerical stability. The main contributions of this work include the development of an innovative model designed for SFD using BP, the inclusion of a random effect to address associations between irregularly spaced observations, and a comprehensive simulation study to evaluate the models’ performance under various scenarios. The work also presents a real application to temperature data from Mexico City, illustrating the proposed model in practice. This is joint work with Alexander Burbano-Moreno.
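To make the Bernstein polynomial (BP) basis concrete: the degree-n basis on [0, 1] is b_{k,n}(t) = C(n, k) t^k (1 − t)^(n−k), for k = 0, ..., n, and a curve is represented as a weighted sum of these functions. The sketch below is illustrative only (the degree, coefficients, and evaluation points are arbitrary, and the function names are hypothetical); it checks two standard properties: the basis functions are nonnegative and sum to one at every t, and coefficients k/n reproduce the line f(t) = t.

```python
from math import comb

def bernstein_basis(n, t):
    """Return [b_{0,n}(t), ..., b_{n,n}(t)] for t in [0, 1]."""
    return [comb(n, k) * t**k * (1 - t) ** (n - k) for k in range(n + 1)]

def bp_curve(coefs, t):
    """Evaluate f(t) = sum_k coefs[k] * b_{k,n}(t), with n = len(coefs) - 1."""
    basis = bernstein_basis(len(coefs) - 1, t)
    return sum(c * b for c, b in zip(coefs, basis))

# Partition of unity: the basis values sum to 1 at any t.
print(sum(bernstein_basis(5, 0.3)))
# Coefficients k/n reproduce f(t) = t exactly (up to rounding).
print(bp_curve([k / 4 for k in range(5)], 0.7))
```

In a Bayesian BP model the coefficients become unknown parameters with a prior; the bounded, nonnegative basis is part of what keeps such expansions numerically stable.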

04/10/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Renata Rojas Guerra (Department of Statistics, UFSM).

Title: Rayleigh generalized autoregressive score model for interpreting SAR image data.

Abstract: This work introduces the Rayleigh generalized autoregressive score model (Ray-GAS), a dynamic model useful for interpreting synthetic aperture radar (SAR) data. It is derived from the generalized autoregressive score (GAS) framework by assuming that the conditional mean of the Rayleigh distribution is a parameter that varies with the image index. Estimation, diagnostic, and forecasting tools are developed for the new model. In addition, we carry out numerical experiments with simulated data and with amplitude data from a single-look SAR image covering forest and lake regions. The results illustrate the usefulness of the Ray-GAS model for understanding the stochastic behavior of SAR amplitude returns and for filtering them. This is joint work with Miguel R. Pena-Ramirez and Fábio M. Bayer.
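To make the GAS idea concrete, here is a minimal simulation-and-filtering sketch under our own assumptions, not the specification used in the talk. The Rayleigh distribution is parameterized by its mean μ_t, with density f(y; μ) = (π y / (2μ²)) exp(−π y² / (4μ²)). The log-mean λ_t = log μ_t follows a GAS(1,1) recursion λ_{t+1} = ω + α s_t + β λ_t, where s_t is the score of log f with respect to λ_t scaled by the inverse Fisher information (which equals 4), giving s_t = (π y_t² / (2μ_t²) − 2) / 4. The log link, the scaling choice, and the parameter values are assumptions.

```python
import math
import random

def rayleigh_logpdf(y, mu):
    """Log-density of a Rayleigh variable parameterized by its mean mu."""
    return math.log(math.pi * y / (2 * mu**2)) - math.pi * y**2 / (4 * mu**2)

def scaled_score(y, mu):
    """Score w.r.t. log(mu), scaled by the inverse Fisher information (= 4)."""
    return (math.pi * y**2 / (2 * mu**2) - 2) / 4

def simulate(n, omega, alpha, beta, seed=1):
    """Draw y_1..y_n from the assumed log-link Rayleigh GAS(1,1) model."""
    rng = random.Random(seed)
    lam = omega / (1 - beta)                  # unconditional level of lambda
    ys = []
    for _ in range(n):
        mu = math.exp(lam)
        sigma = mu * math.sqrt(2 / math.pi)   # Rayleigh scale giving mean mu
        y = sigma * math.sqrt(-2 * math.log(1 - rng.random()))  # inverse CDF
        ys.append(y)
        lam = omega + alpha * scaled_score(y, mu) + beta * lam
    return ys

def gas_filter(y, omega, alpha, beta):
    """Run the recursion over observed data; return filtered means and log-lik."""
    lam = omega / (1 - beta)
    mus, loglik = [], 0.0
    for yt in y:
        mu = math.exp(lam)
        mus.append(mu)
        loglik += rayleigh_logpdf(yt, mu)
        lam = omega + alpha * scaled_score(yt, mu) + beta * lam
    return mus, loglik

ys = simulate(500, omega=0.1, alpha=0.3, beta=0.9)
mus, loglik = gas_filter(ys, omega=0.1, alpha=0.3, beta=0.9)
print(round(loglik, 2))
```

Maximum likelihood estimation would maximize `loglik` over (ω, α, β); the filtered path `mus` is what a filtering application would use.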

YEAR 2024 – 1st SEMESTER

21/06/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

James Sweeney (University of Limerick, Ireland).

Title: What is the impact of postcodes on Dublin house prices?

Abstract: Accurate and efficient valuation of property is of utmost importance in a variety of settings, including when securing mortgage finance to purchase a property, or where residential property taxes are set as a percentage of a property’s resale value. Internationally, resale-based property taxes are most common due to ease of implementation and the difficulty of establishing site values. In an Irish context, property valuations are currently based on comparison to recently sold neighbouring properties. However, this approach is limited by low property turnover. National property taxes based on property value, as opposed to site value, also act as a disincentive to undertake improvement works due to the ensuing increased tax burden. We have developed a spatial hedonic regression model that separates the spatial and non-spatial contributions of property features to resale value. We mitigate the issue of low property turnover through geographic correlation, borrowing information across multiple property types and finishes. We investigate the impact of address mislabelling on predictive performance, where vendors have erroneously given a more affluent postcode, and evaluate the contribution of improvement works to increased values. Our flexible geo-spatial model outperforms all competitors across a number of different evaluation metrics, including the accuracy of both price prediction and associated uncertainty intervals. While our models are applied in an Irish context, the ability to accurately value properties in markets with low property turnover and to quantify the value contributions of specific property features has widespread application. The ability to separate spatial and non-spatial contributions to a property’s value also provides an avenue to site-value based property taxes.

14/06/2024 at 13:30 – Location: YouTube channel Seminários DEST-UFMG

Aritra Halder (Drexel University, USA).

Title: Bayesian Modeling with Spatial Curvature Processes

Abstract: Spatial process models are widely used for modeling point-referenced variables arising from diverse scientific domains. Analyzing the resulting random surface provides deeper insights into the nature of latent dependence within the studied response. We develop Bayesian modeling and inference for rapid changes on the response surface to assess directional curvature along a given trajectory. Such trajectories or curves of rapid change, often referred to as wombling boundaries, occur in geographic space in the form of rivers in a flood plain, roads, mountains, plateaus, or other topographic features leading to high gradients on the response surface. We demonstrate fully model-based Bayesian inference on directional curvature processes to analyze differential behavior in responses along wombling boundaries. We illustrate our methodology with a number of simulated experiments followed by multiple applications featuring the Boston Housing data, the Meuse river data, and temperature data from the Northeastern United States. Supplementary materials for this article are available online.

07/06/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Raquel Borges (Intel Corporation, USA).

Title: Generalized predictive comparisons for complex model interpretation.

Abstract: Machine learning algorithms and models constitute the dominant set of predictive methods for a wide range of complex, real-world processes and domains. However, in general, it is difficult to interpret and validate the patterns and insights inferred by the models. We propose a methodology based on generalized predictive comparisons that interprets multiple inputs, and interesting functional forms of them, in order to learn and interpret the underlying relationships between inputs and the outcome that complex models infer. We demonstrate the broad scope and significance of our generalized predictive comparison methodology through illustrative simulations and case studies.

24/05/2024 at 13:30 – Location: Room 2076 – ICEx/UFMG

Guilherme Moura (Department of Economics and International Relations, UFSC).

Title: Regularized Autoregressive Wishart Stochastic Volatility.

Abstract: We introduce an extension to Uhlig’s (1994) Wishart Stochastic Volatility model, designed to regularize its covariance predictions toward a specified prior reference matrix. This regularization ensures the stationarity of the observed process and stabilizes the eigenvalues of the predictions. Our method maintains closed-form sequential updating formulas for filtering, prediction, and likelihood evaluation, facilitating practical implementation. Furthermore, we enhance the variance discounting scheme inherent in such models to accommodate varying forgetting rates over time and across different directions in the vector space of observations via directional forgetting. In an empirical portfolio selection application involving up to 1,000 assets, we demonstrate the potential of our proposed approach. It effectively stabilizes the eigenvalues of covariance matrix predictions and generates portfolios with lower risk compared to several benchmark models.

10/05/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Xia Wang (University of Cincinnati, USA).

Title: Variable selection for zero-inflated Poisson regression model.

Abstract: The study implements an efficient algorithm for variable selection in the zero-inflated count regression model based on Pólya-Gamma latent variables. This leads to a closed-form posterior conditional distribution under a logistic link function for modeling the excess zeros, which helps overcome the computational disadvantage of the logistic link compared to a probit link. Simulation studies examine the efficacy of the proposed model in selecting important variables, as well as how the choice between the two commonly used links, probit and logit, influences the variable selection results. The proposed model and its comparison with other methods are also illustrated through an application to a German healthcare dataset. This is joint work with Haichao Zhang.
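For reference, the zero-inflated Poisson (ZIP) model mixes a point mass at zero with a Poisson count: P(Y = 0) = π + (1 − π) e^(−λ) and P(Y = y) = (1 − π) λ^y e^(−λ) / y! for y ≥ 1. A minimal sketch of this probability mass function, assuming a logistic link for the zero-inflation probability, π = 1 / (1 + exp(−z'γ)), and a log link for the Poisson mean, λ = exp(x'β); the function name, covariates, and coefficients are illustrative. The Pólya-Gamma augmentation used for variable selection in the talk is not shown.

```python
import math

def zip_pmf(y, x, beta, z, gamma):
    """P(Y = y) under a zero-inflated Poisson regression.

    pi  = logistic(z . gamma): probability of a structural (excess) zero
    lam = exp(x . beta):       mean of the Poisson count component
    """
    pi = 1 / (1 + math.exp(-sum(zi * gi for zi, gi in zip(z, gamma))))
    lam = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    poisson = math.exp(-lam) * lam**y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson  # zeros come from both components
    return (1 - pi) * poisson

x, z = [1.0, 0.5], [1.0, -1.0]  # intercept plus one covariate in each part
beta, gamma = [0.2, 0.8], [-0.5, 0.3]
print(zip_pmf(0, x, beta, z, gamma))
print(sum(zip_pmf(k, x, beta, z, gamma) for k in range(60)))  # close to 1
```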

03/05/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Håvard Rue (KAUST, Saudi Arabia).

Title: Cross-validation for dependent data.

Abstract: I will discuss our new take on cross-validation (CV) for dependent data. Traditional uses of CV, like leave-one-out CV, are justified by independence-like assumptions. With dependent data, leave-one-out CV makes less sense, as we are evaluating interpolation properties rather than prediction properties. We can adapt the CV idea to dependent data by removing a set of “nearby” data points (to be defined) before predicting, but the issue is then how to do this in practice, which is less evident for more involved models. I will discuss our approach in the context of Latent Gaussian Models (LGM), where we can automatically select appropriate groups of data to remove before predicting one data point. I will also discuss some new results about group-CV for log-Gaussian Cox processes. The new group-CV approach is available in the R-INLA package.
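The basic idea of removing nearby points before predicting can be illustrated with a simple, non-automatic version for an ordered series. This sketch assumes a fixed "halo" of h positions on each side of the held-out point (our own illustrative choice; the automatic group construction for latent Gaussian models in R-INLA is not reproduced here). The predictor is a deliberately naive nearest-neighbour interpolator, so that leave-one-out CV (h = 0) visibly rewards interpolation while leave-group-out CV (h > 0) measures prediction at a distance. The function names and data are hypothetical.

```python
def group_cv_folds(n, h):
    """For each index i, return (i, training indices) with |j - i| <= h removed."""
    return [(i, [j for j in range(n) if abs(j - i) > h]) for i in range(n)]

def cv_mse(y, h):
    """CV mean squared error of a nearest-neighbour predictor.

    The prediction for y[i] is the mean of the training values whose index
    is closest to i, so it depends only on data outside the removed group.
    """
    errors = []
    for i, train in group_cv_folds(len(y), h):
        dmin = min(abs(j - i) for j in train)
        nearest = [y[j] for j in train if abs(j - i) == dmin]
        pred = sum(nearest) / len(nearest)
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)

# A smooth, strongly autocorrelated series.
y = [0.1, 0.4, 0.9, 1.3, 1.2, 0.8, 0.3, -0.2, -0.6, -0.4]
print(cv_mse(y, h=0))  # leave-one-out: neighbours remain, error is small
print(cv_mse(y, h=2))  # leave-group-out: neighbours within 2 steps removed
```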

26/04/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Marcelo Bourguignon Pereira (Department of Statistics, UFRN).

Title: The weighted beta regression for modeling bounded data.

Abstract: A two-parameter weighted beta distribution, which has many similarities to the beta distribution, is introduced for modeling bounded data. We propose a class of regression models where the response follows a weighted beta distribution and the two shape parameters that index the weighted beta distribution are related to covariates and regression parameters. The proposed regression model is a natural, strong competitor of the beta regression model. We study mathematical and statistical properties of the distribution and provide a useful interpretation of the parameters. The maximum likelihood method is used for estimating the model parameters. Simulation studies are conducted to investigate the performance of the maximum likelihood estimators and the asymptotic confidence intervals of the parameters. An application of the proposed regression model to real bounded data is presented. Joint work with professors Diego I. Gallardo (Universidad del Bío-Bío) and Roberto Vila (UnB).

19/04/2024 at 13:30 – Location: Room 2076 – ICEx/UFMG

Adrian P. H. Luna (Department of Statistics, UFMG).

Title: Graph Neural Networks (GGN).

Abstract: Neural networks are among the most popular artificial intelligence (AI) methods for solving complex problems in which a large amount of information is available. Convolutional networks make up an important part of these methods. Graph neural networks (GGN) are network models with graph convolution operators, designed for information structured as graphs, for example spatial information, semantic structures, and collaboration networks. Recent results on the asymptotic behavior of neural networks, used to determine the rate at which training converges to the network solutions, rely on approximating neural networks by models from statistical mechanics. This allows us to better characterize the problem of multiple non-convexity associated with neural networks and how it manifests in GGN models. We will also present an application that uses a GGN model for epidemiological forecasting of dengue on the UFMG campus.

12/04/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Matthias Katzfuss (University of Wisconsin-Madison, USA).

Title: Probabilistic function estimation via nearest-neighbor directed acyclic graphs.

Abstract: We consider probabilistic inference on continuous functions or fields, such as time series, geospatial fields, response surfaces of computer models, or regression functions. Gaussian processes (GPs) are popular models for such applications, but Gaussian assumptions are too restrictive in many settings. Sparse autoregressive structures corresponding to nearest-neighbor directed acyclic graphs (NN-DAGs) can lead to scalable, accurate, and flexible inference. We provide a number of examples, including so-called Vecchia approximations of GPs, and autoregressive GPs for learning high-dimensional spatial distributions from a small number of training samples (e.g., for climate-model emulation). When the function of interest is latent, we propose a novel framework for variational inference targeting its potentially non-Gaussian posterior. We make NN-DAG assumptions for both the prior and variational families, with highly expressive conditional distributions in the variational family. Scalable model fitting can be achieved via doubly stochastic variational optimization with polylogarithmic time complexity per iteration based on reduced ancestor sets.
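In a Vecchia approximation, the joint density of the ordered values is replaced by a product of conditionals, p(y_1) ∏_{i>1} p(y_i | y_{c(i)}), where each conditioning set c(i) holds at most m previously ordered locations closest to location i; these sets are the parents in the nearest-neighbor DAG. The sketch below builds such sets by brute force in O(n²) time, which is fine for illustration; practical implementations use orderings such as maximin and faster neighbor searches. The coordinates, m, and the function name are illustrative assumptions.

```python
import math

def nn_conditioning_sets(coords, m):
    """For each location i (in the given order), return the sorted indices
    of the at most m nearest locations among those ordered before it."""
    sets = []
    for i, ci in enumerate(coords):
        # Only earlier locations are eligible, which keeps the graph acyclic.
        previous = sorted(range(i), key=lambda j: math.dist(ci, coords[j]))
        sets.append(sorted(previous[:m]))
    return sets

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
for i, c in enumerate(nn_conditioning_sets(coords, m=2)):
    print(i, c)
```

Because every set has size at most m, evaluating the approximate likelihood needs only small m-by-m solves per location, which is the source of the linear scaling in n.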

05/04/2024 at 13:30 – Location: Room 2076 – ICEx/UFMG

Uriel M. Silva (Department of Statistics, UFMG).

Title: Topics in Inference for State Space Models via SMC.

Abstract: This seminar will present the basic theory of inference via SMC (Sequential Monte Carlo, also known as particle filters) in state space models, as well as some open problems in the area. Aspects of classical and Bayesian inference for the parameters of this class of models will be discussed, in particular the possibility of implementing RMHMC-type algorithms (Riemannian Manifold Hamiltonian Monte Carlo).
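As a minimal sketch of the SMC machinery, the bootstrap particle filter below targets a linear Gaussian state space model chosen only for illustration: x_t = φ x_{t−1} + v_t with v_t ~ N(0, σ_v²), y_t = x_t + w_t with w_t ~ N(0, σ_w²), and x_0 drawn from the stationary N(0, σ_v² / (1 − φ²)). At each step it propagates particles through the state equation, weights them by the observation density, accumulates the log-likelihood estimate, and resamples multinomially. The model, parameter values, and particle count are assumptions; for this model the Kalman filter gives the exact log-likelihood, which makes it a convenient check. Classical or Bayesian parameter inference would be built on top of this likelihood estimate.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd**2) - 0.5 * ((x - mean) / sd) ** 2

def bootstrap_pf(y, phi, sv, sw, n_particles=1000, seed=0):
    """Return filtered means E[x_t | y_1:t] and the log-likelihood estimate."""
    rng = random.Random(seed)
    sd0 = sv / math.sqrt(1 - phi**2)  # stationary sd of x_0
    particles = [rng.gauss(0.0, sd0) for _ in range(n_particles)]
    means, loglik = [], 0.0
    for yt in y:
        # 1. Propagate through the state equation (the "bootstrap" proposal).
        particles = [phi * x + rng.gauss(0.0, sv) for x in particles]
        # 2. Weight by the observation density (log-sum-exp for stability).
        logw = [normal_logpdf(yt, x, sw) for x in particles]
        wmax = max(logw)
        w = [math.exp(lw - wmax) for lw in logw]
        wsum = sum(w)
        loglik += wmax + math.log(wsum / n_particles)
        means.append(sum(wi * x for wi, x in zip(w, particles)) / wsum)
        # 3. Multinomial resampling.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, loglik

def kalman_loglik(y, phi, sv, sw):
    """Exact log-likelihood of the same model via the Kalman filter."""
    m, P, ll = 0.0, sv**2 / (1 - phi**2), 0.0
    for yt in y:
        m, P = phi * m, phi**2 * P + sv**2  # predict
        S = P + sw**2
        ll += normal_logpdf(yt, m, math.sqrt(S))
        K = P / S                           # update
        m, P = m + K * (yt - m), (1 - K) * P
    return ll

# Simulate data, then compare the particle estimate with the exact value.
rng = random.Random(42)
x, y = 0.0, []
for _ in range(50):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    y.append(x + rng.gauss(0.0, 0.5))
_, ll_pf = bootstrap_pf(y, 0.9, 1.0, 0.5, n_particles=2000)
print(ll_pf, kalman_loglik(y, 0.9, 1.0, 0.5))
```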

22/03/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Ying MacNab (University of British Columbia, Canada).

Title: On Gaussian Markov random field, spatial dependence representation, and local influence function.

Abstract: Gaussian Markov random fields (GMRF) and their multivariate extensions (MGMRFs) are powerful tools for modeling probabilistic interactions of directly related variables. As an important category of graphical models, they are commonly used in spatial statistics (e.g., disease mapping, small area estimation, spatial ecology) and Bayesian statistics, and their actual and potential applications are far-reaching (e.g., artificial intelligence, deep learning, image processing, computer vision, spatial biology). In this presentation, I give an overview of my recent work on spatial dependence representations for selected (adaptive) (M)GMRF parameterizations and introduce the notions of (a)symmetric local influence, cross-local influence, and associated local and cross-local influence functions. Some recent applications in the contexts of Bayesian (spatial, multivariate, and spatiotemporal) disease mapping and small-area estimation will be presented.

15/03/2024 at 13:30 – Location: Zoom and YouTube channel Seminários DEST-UFMG

Pedro Luiz Ramos (PUC, Chile)

Title: Asymptotic properties of generalized closed-form maximum likelihood estimators.

Abstract: The maximum likelihood estimator (MLE) is pivotal in statistical inference, yet its application is often hindered by the absence of closed-form solutions for many models. This poses challenges in real-time computation scenarios, particularly within embedded systems technology, where numerical methods are impractical. This study introduces a generalized form of the MLE that yields closed-form estimators under certain conditions. We derive the asymptotic properties of the proposed estimator and demonstrate that our approach retains key properties such as invariance under one-to-one transformations, strong consistency, and an asymptotic normal distribution. The effectiveness of the generalized MLE is exemplified through its application to the Gamma, Nakagami, and Beta distributions, showcasing improvements over the traditional MLE. Additionally, we extend this methodology to a bivariate gamma distribution, successfully deriving closed-form estimators. This advancement presents significant implications for real-time statistical analysis across various applications.
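To give a concrete sense of what a non-iterative estimator for the gamma distribution (shape k, scale θ) can look like, the sketch below implements closed-form estimators known in the literature: θ̂ = (1/n) Σ x_i log x_i − x̄ · (1/n) Σ log x_i and k̂ = x̄ / θ̂. We present them only as an illustration of the idea, not as the exact generalized MLE derived in the talk. They are consistent because, for gamma data, Cov(X, log X) = θ and E[X] = kθ. No numerical optimization is needed, which is the property the abstract emphasizes for embedded, real-time settings. The function name, sample size, and true parameter values are illustrative.

```python
import math
import random

def gamma_closed_form(x):
    """Closed-form estimates (k_hat, theta_hat) of gamma shape and scale."""
    n = len(x)
    mean_x = sum(x) / n
    mean_log = sum(math.log(v) for v in x) / n
    mean_xlog = sum(v * math.log(v) for v in x) / n
    theta = mean_xlog - mean_x * mean_log  # sample Cov(X, log X)
    return mean_x / theta, theta           # k_hat * theta_hat equals mean_x

rng = random.Random(7)
sample = [rng.gammavariate(2.0, 3.0) for _ in range(20000)]  # shape 2, scale 3
k_hat, theta_hat = gamma_closed_form(sample)
print(k_hat, theta_hat)
```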

YEAR 2023 – 2nd SEMESTER

24/11/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Samuel Faria Cândido (PhD student, DEST/UFMG)

Title: Bayesian Nonstationary and Nonparametric Covariance Estimation for Large Spatial Data.

Abstract: In spatial statistics, it is often assumed that the spatial field of interest is stationary and its covariance has a simple parametric form, but these assumptions are not appropriate in many applications. Given replicate observations of a Gaussian spatial field, we propose nonstationary and nonparametric Bayesian inference on the spatial dependence. Instead of estimating the entries of the covariance matrix, whose number is quadratic in the number of spatial locations, the idea is to infer a near-linear number of nonzero entries in a sparse Cholesky factor of the precision matrix. Our prior assumptions are motivated by recent results on the exponential decay of the entries of this Cholesky factor for Matérn-type covariances under a specific ordering scheme. Our methods are highly scalable and parallelizable. We conduct numerical comparisons and apply our methodology to climate-model output, enabling statistical emulation of an expensive physical model. Reference: Kidd B. and Katzfuss M. (2022) Bayesian Analysis, 17, 1, 291-351.

17/11/2023 at 13:30 – Location: Room 2076 – ICEx/UFMG

Maíra Soalheiro (PhD student, DEST/UFMG)

Title: Modelling for Poisson process intensities over irregular spatial domains.

Abstract: We develop nonparametric Bayesian modelling approaches for Poisson processes, using weighted combinations of structured beta densities to represent the point process intensity function. For a regular spatial domain, such as the unit square, the model construction implies a Bernstein-Dirichlet prior for the Poisson process density, which supports general inference for point process functionals. The key contribution of the methodology is two classes of flexible and computationally efficient models for spatial Poisson process intensities over irregular domains. We address the choice or estimation of the number of beta basis densities and develop methods for prior specification and posterior simulation for full inference about functionals of the point process. The methodology is illustrated with both synthetic and real data sets. Reference: Zhao C. and Kottas A. (2021) Preprint arXiv:2106.04654

10/11/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Abhirup Datta (Department of Biostatistics, Johns Hopkins University, USA)

Title: Combining machine learning with Gaussian processes for geospatial data.

Abstract: Spatial generalized linear mixed models, consisting of a linear covariate effect and a Gaussian Process (GP) distributed spatial random effect, are widely used for analyses of geospatial data. We consider the setting where the covariate effect is non-linear and propose modeling it using a flexible machine learning algorithm like random forests or deep neural networks. We propose well-principled extensions of these methods for estimating non-linear covariate effects in spatial mixed models where the spatial correlation is still modeled using a GP. The basic principle is guided by how ordinary least squares extends to generalized least squares for linear models to account for dependence. We demonstrate how the same extension can be done for machine learning approaches like random forests and neural networks. We provide extensive theoretical and empirical support for the methods and show how they fare better than naïve or brute-force approaches to using machine learning algorithms for spatially correlated data. We demonstrate the RandomForestsGLS R package that implements this extension for random forests.

20/10/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Fernando A. Quintana (PUC, Chile).

Title: Childhood Obesity in Singapore: a Bayesian Nonparametric Approach

Abstract: Overweight and obesity in adults are known to be associated with increased risk of metabolic and cardiovascular diseases. Obesity has now reached epidemic proportions, increasingly affecting children. Therefore, it is important to understand whether this condition persists from early life to childhood and whether different patterns can be detected to inform intervention policies. Our motivating application is a study of temporal patterns of obesity in children from Southeast Asia. Our main focus is on clustering obesity patterns after adjusting for the effect of baseline information. Specifically, we consider a joint model for height and weight over time. Measurements are taken every six months from birth. To allow for data-driven clustering of trajectories, we assume a vector autoregressive sampling model with a dependent logit stick-breaking prior. Simulation studies show good performance of the proposed model in capturing overall growth patterns, as compared to other alternatives. We also fit the model to the motivating dataset and discuss the results, in particular highlighting cluster differences. We found four large clusters, corresponding to subgroups of children, though two of them are similar in terms of both height and weight at each time point. We provide an interpretation of these clusters in terms of combinations of predictors.
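The logit stick-breaking construction turns a sequence of real-valued predictors into mixture weights: w_1 = σ(η_1) and w_k = σ(η_k) ∏_{j<k} (1 − σ(η_j)), with σ the logistic function. In a dependent version, η_k depends on covariates, for example η_k = x'α_k, so cluster probabilities change with baseline information. The sketch below is a truncated version with K components in which the last stick takes all remaining mass; the coefficients, covariate values, and function names are illustrative assumptions, not the model fitted in the talk.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def logit_stick_breaking(x, alphas):
    """Covariate-dependent mixture weights from K - 1 linear predictors.

    alphas[k] is the coefficient vector for stick k, so eta_k = x . alphas[k].
    The final (K-th) weight takes the remaining mass, so weights sum to 1.
    """
    weights, remaining = [], 1.0
    for a in alphas:
        v = logistic(sum(xi * ai for xi, ai in zip(x, a)))  # fraction broken off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

alphas = [[0.5, -1.0], [0.0, 0.8], [-0.3, 0.2]]  # K = 4 components
for x in ([1.0, 0.0], [1.0, 2.0]):  # intercept plus one covariate
    print([round(w, 3) for w in logit_stick_breaking(x, alphas)])
```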

29/09/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Luis Carvalho (Boston University, USA)

Title: Deviance matrix factorization.

Abstract: We investigate a general matrix factorization for deviance-based data losses, extending the ubiquitous singular value decomposition beyond squared error loss. While similar approaches have been explored before, our method leverages classical statistical methodology from generalized linear models (GLMs) and provides an efficient algorithm that is flexible enough to allow for structural zeros via entry weights. Moreover, by adapting results from GLM theory, we provide support for these decompositions by (i) showing strong consistency under the GLM setup, (ii) checking the adequacy of a chosen exponential family via a generalized Hosmer-Lemeshow test, and (iii) determining the rank of the decomposition via a maximum eigenvalue gap method. To further support our findings, we conduct simulation studies to assess robustness to decomposition assumptions and extensive case studies using benchmark datasets from image face recognition, natural language processing, network analysis, and biomedical studies. Our theoretical and empirical results indicate that the proposed decomposition is more flexible, general, and robust, and can thus provide improved performance when compared to similar methods. To facilitate applications, an R package with efficient model fitting and family and rank determination is also provided.
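To see what a deviance-based loss replaces: for a Gaussian model the deviance reduces to the squared error that the truncated SVD minimizes, whereas for counts a natural choice is the Poisson deviance, 2 Σ w_ij [y_ij log(y_ij / μ_ij) − (y_ij − μ_ij)], with y log y taken as 0 when y = 0. The function below evaluates this weighted loss for a matrix of fitted means; setting w_ij = 0 drops an entry, which is one way entry weights can encode structural zeros. The means come from an arbitrary rank-1 factorization under a log link, an illustrative assumption; the fitting algorithm and R package from the talk are not shown.

```python
import math

def poisson_deviance(Y, M, W=None):
    """Weighted Poisson deviance between a count matrix Y and means M."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            w = 1.0 if W is None else W[i][j]
            mu = M[i][j]
            ylogy = y * math.log(y / mu) if y > 0 else 0.0
            total += 2.0 * w * (ylogy - (y - mu))  # each unit deviance is >= 0
    return total

# Means from a rank-1 factorization under a log link: mu_ij = exp(u_i * v_j).
u, v = [0.5, 1.0], [1.0, 0.2, 0.8]
M = [[math.exp(ui * vj) for vj in v] for ui in u]
Y = [[2, 1, 1], [3, 0, 2]]
W = [[1, 1, 1], [1, 0, 1]]  # entry (1, 1) treated as a structural zero
print(poisson_deviance(Y, M), poisson_deviance(Y, M, W))
```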

22/09/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Fernanda L. Schumacher (The Ohio State University, USA)

Title: Penalized Estimation of Scale Mixture of Skew-Normal Linear Mixed Models using Hamiltonian Monte Carlo

Abstract: In clinical trials, studies often present longitudinal or clustered data. These studies are commonly analyzed using linear mixed models, and for mathematical convenience, it is usually assumed that both the random effect and the error term follow normal distributions. These restrictive assumptions, however, may result in a lack of robustness against departures from the normal distribution and invalid statistical inferences. An interesting extension that makes these models more flexible by accounting for skewness and heavy tails is the scale mixture of skew-normal class of distributions. Nevertheless, a practical problem may arise when modeling distributions derived from the skew-normal: the maximum likelihood estimate of the parameter that regulates skewness may diverge. In this work, this anomaly is illustrated, and an alternative Bayesian estimation via Hamiltonian Monte Carlo is proposed.

15/09/2023 at 13:30 – Location: Room 2076 – ICEx/UFMG

Wilhelm Alexander C. Steinmetz (Department of Mathematics, UFMG)

Title: Foundations of Mathematics and Empirically Informed Philosophy.

Abstract: In this talk, I seek to address how other sciences, such as Cognitive Science, Neuroscience, Cultural Anthropology, Developmental Psychology, Mathematics Education, and History of Mathematics, can shed light on philosophical questions concerning the Foundations of Mathematics.

01/09/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Gregory J. Matthews (Loyola University Chicago, USA).

Title: Completion of Partially Observed Curves Using Hot Deck Type Imputation

Abstract: Statistical shape analysis of curves is well developed when curves are fully observed. This work considers partially observed curves and develops methods for curve completion or imputation by leveraging tools from the statistical analysis of the shape of fully observed curves, which enables sensible curve completions. The method is implemented on a dataset of partially observed bovid teeth arising from a biological anthropology application, and the completed teeth are classified using a shape distance on the set of curves.

18/08/2023 at 13:30 – Location: Room 2076 – ICEx/UFMG

Shariq Mohammed (Department of Biostatistics, Boston University, USA)

Title: Layered Variable Selection for Multivariate Bayesian Regression: A Case Study in Imaging-Genomics

Abstract: We propose a statistical framework to integrate radiological magnetic resonance imaging (MRI) and genomic data to identify the underlying radiogenomic associations in lower-grade gliomas (LGG). We devise a novel imaging phenotype by dividing the tumor region into concentric spherical layers that mimic the tumor evolution process. MRI data within each layer are represented by voxel-intensity-based probability density functions, which capture the complete information about tumor heterogeneity. Under a Riemannian-geometric framework, these densities are mapped to a vector of principal component scores, which act as imaging phenotypes. Subsequently, we build Bayesian variable selection models for each layer with the imaging phenotypes as the response and the genomic markers as predictors. Our novel hierarchical prior formulation incorporates the interior-to-exterior structure of the layers and the correlation between the genomic markers. We employ a computationally efficient Expectation-Maximization-based strategy for estimation. With a focus on the cancer driver genes in LGG, we discuss some biologically relevant findings.

YEAR 2023 – 1st SEMESTER

30/06/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Adriana dos Santos Lima (PhD student, DEST/UFMG)

Title: Regression analysis of interval-censored failure time data with possibly crossing hazards

Abstract: Interval-censored failure time data occur in many areas, especially in medical follow-up studies such as clinical trials, and consequently many methods have been developed for the problem. However, most of the existing approaches cannot deal with situations where the hazard functions may cross each other. To address this, we develop a sieve maximum likelihood estimation procedure based on the short-term and long-term hazard ratio model. In the method, I-splines are used to approximate the underlying unknown function. An extensive simulation study was conducted to assess the finite sample properties of the presented procedure, and it suggests that the method works well in practical situations. The analysis of a motivating example is also provided. Reference: Zhang H., Wang P. and Sun J. (2018) Statistics in Medicine, 37, 5, 786-775.

23/06/2023 at 13:30 – Location: YouTube channel Seminários DEST – UFMG

Michael Willig (University of Connecticut, USA)

Title: Patterns in ecology: Circular statistics, methodological concerns and bootstrap approaches.

Abstract: In this talk, we present results of an ongoing project designed to (1) demonstrate, via a number of exemplar data sets, how application of classical circular statistics in some designs can lead to erroneous and counterintuitive conclusions; (2) develop a bootstrap approach to overcome limitations associated with marginal totals; (3) apply this bootstrap approach to the exemplar data sets to highlight its salient improvement; and (4) apply both circular statistics (i.e., the Rayleigh and Hermans-Rasson tests) and the proposed bootstrap approach to reproductive phenologies derived from well-studied mammal species from the Amazon of Peru. Finally, we wish to promote collaborations between statisticians and ecologists to address questions in temporal biology.
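For reference, the classical Rayleigh test mentioned in the abstract checks angular data (for example, months of the year mapped to angles on the annual cycle) for departure from circular uniformity toward a single mean direction. A minimal sketch: compute the mean resultant length R̄ = |(1/n) Σ (cos θ_i, sin θ_i)|, the statistic Z = n R̄², and the first-order large-sample approximation p ≈ exp(−Z); small samples call for a corrected approximation. The month data below are made up for illustration, and the bootstrap approach proposed in the talk is not reproduced here.

```python
import math

def rayleigh_test(angles):
    """Rayleigh test of circular uniformity; returns (R_bar, Z, approx. p)."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r_bar = math.hypot(c, s)       # mean resultant length, in [0, 1]
    z = n * r_bar**2
    return r_bar, z, math.exp(-z)  # first-order large-sample p-value

# Hypothetical event months (1-12), converted to angles on the annual cycle.
months = [1, 2, 2, 3, 3, 3, 4, 12, 1, 2]
angles = [2 * math.pi * (m - 1) / 12 for m in months]
print(rayleigh_test(angles))
```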

02/06/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Clarice G. B. Demétrio (ESALQ – USP)

Título: Extended Poisson-Tweedie: properties and regression models for count data.

Resumo: We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form $\mu + \phi\mu^{p}$, where $\mu$ is the mean, and $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively (Bonat et al., 2018; 18: 24–49). The models are fitted using an estimating function approach obtained by combining the quasi-score and Pearson estimating functions for estimation of the regression and dispersion parameters, respectively. This provides a flexible and efficient regression methodology for a comprehensive family of count models including Hermite, Neyman Type A, Pólya-Aeppli, negative binomial and Poisson-inverse Gaussian. The estimating function approach allows us to extend the Poisson-Tweedie distributions to deal with underdispersed count data by allowing negative values for the dispersion parameter. Furthermore, the Poisson-Tweedie family can automatically adapt to highly skewed count data with excessive zeros, without the need to introduce zero-inflated or hurdle components, simply by estimating the power parameter. Thus, the proposed models offer a unified framework to deal with under-, equi- and overdispersed, zero-inflated and heavy-tailed count data. The computational implementation of the proposed models is fast, relying only on a simple Newton scoring algorithm. Simulation studies showed that the estimating function approach provides unbiased and consistent estimators for both regression and dispersion parameters. We highlight the ability of the Poisson-Tweedie distributions to deal with count data through a consideration of dispersion, zero-inflation and heavy-tail indexes, and illustrate their application with four data analyses.
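To make the variance structure concrete: with Var(Y) = μ + φμ^p, the dispersion index DI = Var(Y)/E(Y) = 1 + φμ^(p−1) exceeds 1 for φ > 0 (overdispersion) and falls below 1 for φ < 0 (underdispersion), as long as the variance stays positive. A minimal sketch with hypothetical function names; to our understanding, the special cases listed in the abstract correspond to p = 0 (Hermite), 1 (Neyman Type A), 1.5 (Pólya-Aeppli), 2 (negative binomial) and 3 (Poisson-inverse Gaussian), which is worth confirming in Bonat et al. (2018).

```python
def pt_variance(mu, phi, p):
    """Extended Poisson-Tweedie variance: Var(Y) = mu + phi * mu**p."""
    var = mu + phi * mu ** p
    if var <= 0:
        raise ValueError("phi too negative: the variance must stay positive")
    return var

def dispersion_index(mu, phi, p):
    """DI = Var(Y) / E(Y): > 1 over-, = 1 equi-, < 1 underdispersion."""
    return pt_variance(mu, phi, p) / mu
```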

26/05/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Angélica M. Tortola Ribeiro (Universidade Tecnológica Federal do Paraná)

Título: A Kronecker-based covariance model for multivariate geostatistical data

Resumo: In this work, we propose a covariance function specification for spatially continuous multivariate data. The model is based on the Kronecker product of covariance matrices for Gaussian random fields. The structure is valid for different marginal covariance functions, allowing different variables to have different spatial dependence structures, which makes it more flexible. The model parameters can vary over their usual domains, which makes estimation less constrained than in other classical approaches. Reduced computational times and easy generalization to higher dimensions follow from the model definition. The simple structure of the model, combined with the interpretability of its parameters and the computational time for inference, makes this model a promising candidate for modeling spatially continuous multivariate data.
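A minimal NumPy sketch of a Kronecker-type construction with variable-specific spatial structures. We assume (our assumption, modeled on the generalized Kronecker form used in multivariate covariance generalized linear models, not necessarily the speaker's exact specification) that Σ = B(Ω ⊗ I_n)Bᵀ, where Ω is a p × p between-variable covariance matrix and B = blockdiag(L_1, ..., L_p) stacks Cholesky factors of each variable's own spatial correlation matrix R_i. The diagonal blocks are then ω_ii R_i, so each variable keeps its own correlation function, and Σ is positive definite whenever Ω is. Exponential correlations and all names are illustrative choices.

```python
import numpy as np

def exp_corr(coords, phi):
    """Exponential spatial correlation exp(-d / phi) for an (n, 2) array of sites."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-d / phi)

def kronecker_cov(coords, phis, omega):
    """Sigma = B (Omega kron I_n) B^T with B = blockdiag(L_1, ..., L_p),
    where L_i is the Cholesky factor of variable i's spatial correlation."""
    n, p = coords.shape[0], len(phis)
    B = np.zeros((n * p, n * p))
    for i, phi in enumerate(phis):
        # Variable i gets its own range parameter, hence its own spatial structure
        B[i * n:(i + 1) * n, i * n:(i + 1) * n] = np.linalg.cholesky(exp_corr(coords, phi))
    return B @ np.kron(omega, np.eye(n)) @ B.T
```

The ordering is variable-major: rows 0 to n − 1 belong to the first variable, the next n rows to the second, and so on.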

19/05/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Tamara Broderick (Department of Electrical Engineering and Computer Science, MIT, EUA).

Título: An Automatic Finite-Sample Robustness Check: Can Dropping a Little Data Change Conclusions?

Resumo: Practitioners will often analyze a data sample with the goal of applying any conclusions to a new population. For instance, if economists conclude microcredit is effective at alleviating poverty based on observed data, policymakers might decide to distribute microcredit in other locations or future years. Typically, the original data is not a perfect random sample from the population where policy is applied — but researchers might feel comfortable generalizing anyway so long as deviations from random sampling are small, and the corresponding impact on conclusions is small as well. Conversely, researchers might worry if a very small proportion of the data sample was instrumental to the original conclusion. So we propose a method to assess the sensitivity of statistical conclusions to the removal of a very small fraction of the data set. Manually checking all small data subsets is computationally infeasible, so we propose an approximation based on the classical influence function. Our method is automatically computable for common estimators. We provide finite-sample error bounds on approximation performance and a low-cost exact lower bound on sensitivity. We find that sensitivity is driven by a signal-to-noise ratio in the inference problem, does not disappear asymptotically, and is not decided by misspecification. Empirically we find that many data analyses are robust, but the conclusions of several influential economics papers can be changed by removing (much) less than 1% of the data.
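The idea of approximating the effect of dropping data with an influence function can be illustrated on the sample mean. Giving each observation a weight and differentiating at equal weights, dropping a set S changes the mean by approximately −(1/n)Σ_{i∈S}(x_i − x̄); to make the mean as small as possible by removing a fraction α of the data, one drops the ⌊αn⌋ points with the largest influence. A minimal sketch under these assumptions; the method in the talk covers general estimators (such as regression coefficients) and comes with error bounds not shown here, and the function name is hypothetical.

```python
import math

def amip_mean(x, alpha):
    """Approximate the largest decrease in the sample mean achievable by
    dropping a fraction alpha of the data, and compare with the exact refit."""
    n = len(x)
    mean = sum(x) / n
    # Influence score of observation i: d(mean)/d(weight_i) at equal weights
    influence = [(xi - mean) / n for xi in x]
    k = math.floor(alpha * n)
    # Dropping the k most positive influences lowers the mean the most (to first order)
    drop = set(sorted(range(n), key=lambda i: influence[i], reverse=True)[:k])
    approx_change = -sum(influence[i] for i in drop)
    kept = [xi for i, xi in enumerate(x) if i not in drop]
    exact_change = sum(kept) / len(kept) - mean
    return drop, approx_change, exact_change
```

For the mean, the linear approximation equals exactly (n − k)/n times the true change. With one value of 10 and nine values of −1, the mean is 0.1, and dropping a single observation (10% of the data) flips its sign.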

12/05/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Alexandre L. Rodrigues (Departamento de Estatística, UFES).

Título: A conditional machine learning classification approach for spatio-temporal risk assessment of crime data.

Resumo: Crime data analysis is an essential source of information to aid social and political decision-makers regarding the allocation of public security resources. Computer-aided dispatch systems and technological advances in geographic information systems have made analysing and visualising historical spatial and temporal records of crimes a vital part of police operations and strategy. We look at our motivating crime problem as a spatio-temporal point pattern. Using a conditional approach based on properties of Poisson point processes, we transform the spatio-temporal point process prediction problem into a classification problem. We create spatio-temporal handcrafted features to link future and past events and use machine learning algorithms to learn behavioural patterns from the data. The fitted model is then used to carry out the reverse transformation, i.e. to perform spatio-temporal risk predictions based on the outcomes of the classification problem. Our procedure has theoretical formalism from point process theory and gains flexibility and computational efficiency inherited from the machine learning field. We show its performance under some simulated scenarios and in a real application to spatio-temporal prediction and risk assessment of homicides in Bogotá, Colombia.

05/05/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Valdério A. Reisen (UFES)

Título: M-quantile estimation for GARCH models.

Resumo: M-regression and quantile methods have been suggested to estimate generalized autoregressive conditionally heteroscedastic (GARCH) models. In this paper, we propose an M-quantile approach, which combines quantile and M-regression to obtain a robust estimator of the conditional volatility when the data have abrupt observations or heavy-tailed distributions. Some technical issues are discussed and Monte Carlo experiments are conducted to show that the M-quantile approach appears to be more resistant against additive outliers than M-regression and quantile methods. The usefulness of the method is illustrated on two financial datasets.
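The M-quantile idea can be illustrated in its simplest, location-only form (Breckling and Chambers, 1988): the M-quantile of order q solves Σ ψ_q((x_i − θ)/s) = 0, where ψ_q is the Huber function weighted by 2q for positive residuals and by 2(1 − q) for negative ones. A minimal sketch assuming a known scale s and the Huber tuning constant c = 1.345; the conditional-volatility (GARCH) estimator from the talk is not reproduced, and the names are hypothetical.

```python
def huber_psi(r, c=1.345):
    """Huber psi function: identity in [-c, c], clipped outside."""
    return max(-c, min(c, r))

def m_quantile(x, q, c=1.345, scale=1.0, tol=1e-10):
    """M-quantile of order q of a sample, found by bisection on the
    (non-increasing) estimating function sum_i psi_q((x_i - theta) / scale)."""
    def score(theta):
        total = 0.0
        for xi in x:
            r = (xi - theta) / scale
            weight = 2 * q if r > 0 else 2 * (1 - q)  # asymmetric weighting
            total += weight * huber_psi(r, c)
        return total

    lo, hi = min(x), max(x)  # score(lo) >= 0 >= score(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With q = 0.5 this reduces to a Huber M-estimate of location: for the sample 1, 2, 3, 4, 100 it returns 3, while the sample mean is 22.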

28/04/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Bruno Sansó (University of California Santa Cruz, EUA).

Título: Non-Gaussian geostatistical models using nearest neighbors processes.

Resumo: We present a framework for non-Gaussian spatial processes that encompasses large distribution families. Spatial dependence for a set of irregularly scattered locations is described with a mixture of pairwise kernels. Focusing on the nearest neighbors of a given location, within a reference set, we obtain a valid spatial process: the nearest neighbor mixture process (NNMP). We develop conditions to construct general NNMP models with arbitrary pre-specified marginal distributions. Essentially, NNMPs are specified by a bivariate distribution, with suitable marginals, used to specify the mixture transition kernels. Such a distribution can be spatially varying, to capture non-homogeneous spatial features. The mixture structure of the model allows for efficient MCMC-based exploration of the posterior distribution of the model parameters, even for a relatively large number of locations. We illustrate the capabilities of NNMPs with observations corresponding to distributions with different non-Gaussian characteristics: long tails, compact support, skewness, and discrete values.
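A minimal pure-Python sketch of the nearest-neighbor mixture idea, under simplifying assumptions of our own: sites are processed in a fixed order, each site mixes with equal weights over its nearest previously processed sites (the reference set), and each mixture component is the conditional distribution of a Gaussian-copula bivariate law with Exp(1) margins. Because every component preserves the Exp(1) marginal, the simulated field is skewed and non-Gaussian while each site keeps an Exp(1) marginal distribution. The copula, weights and names are illustrative, not the speaker's specification.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def _exp_to_normal(y):
    """Map an Exp(1) value to N(0, 1) through its CDF F(y) = 1 - exp(-y)."""
    return _STD_NORMAL.inv_cdf(-math.expm1(-y))

def _normal_to_exp(z):
    """Map an N(0, 1) value to Exp(1) through the inverse CDF -log(1 - Phi(z))."""
    return -math.log1p(-_STD_NORMAL.cdf(z))

def simulate_nnmp(coords, n_neighbors=3, rho=0.8, seed=1):
    """Sequentially simulate a toy nearest-neighbor mixture process with Exp(1) marginals."""
    rng = random.Random(seed)
    y = [rng.expovariate(1.0)]  # first site: draw from the marginal
    for i in range(1, len(coords)):
        # Reference set: the n_neighbors closest previously simulated sites
        prev = sorted(range(i), key=lambda j: math.dist(coords[i], coords[j]))[:n_neighbors]
        j = rng.choice(prev)  # equal-weight mixture: pick one component
        # Gaussian-copula conditional given the chosen neighbor's value
        z = rng.gauss(rho * _exp_to_normal(y[j]), math.sqrt(1 - rho ** 2))
        y.append(_normal_to_exp(z))
    return y
```

Swapping the Exp(1) transforms for another continuous CDF and its inverse changes the marginal family without touching the mixture structure.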

14/04/2023 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Gabriel O. Assunção (IBGE)

Título: Methodological Aspects of the Integrated Household Survey System (Sistema Integrado de Pesquisas Domiciliares).

Resumo: The sample household surveys conducted by the Brazilian Institute of Geography and Statistics (IBGE) are fundamental for portraying Brazil with the information needed to understand its reality and to exercise citizenship, and they are essential for the formulation of public policies. To this end, IBGE implemented the Integrated Household Survey System (SIPD) in 2011 to integrate all of its sample household surveys through a common sampling infrastructure, a common selection frame, and a common sample, the Master Sample (Amostra Mestra). This seminar presents the methodological aspects of the SIPD and the Master Sample.

31/03/2023 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Jorge Mateu (Universitat Jaume I, Espanha).

Título: Statistical models for the analysis, prediction and monitoring of space-time data. Applications to infectious diseases and crime.

Resumo: We present several statistical approaches to understand the underlying temporal and spatial dynamics of infectious diseases (with a focus on Covid-19 data) that can lead to informed and timely public health policies. Most studies of infectious diseases report overall infection figures at the state or county level, i.e., the aggregated number of cases in a particular region at a given time. In contrast, we focus on analysing high-resolution Covid-19 datasets in the form of spatio-temporal point patterns, which offer vital insights into the spatio-temporal interaction between individuals with respect to disease spread in a metropolis.

24/03/2023 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Carl Schmertmann (Florida State University, EUA).

Título: Bayesian estimation of small-area mortality with under-registration of deaths.

Resumo: Sampling variance makes it difficult to estimate demographic rates for small areas. Moreover, in many countries the death registration system is imperfect, with a degree of coverage that varies across regions. We develop a Bayesian mortality model that handles both problems simultaneously. The model incorporates external estimates of local under-registration through prior distributions for the parameters that define the degree of coverage. We apply the model to 2009-2011 data to produce estimates of mortality rates and life expectancy at birth, together with the uncertainty in these estimates, for all Brazilian microregions.

17/03/2023 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Bernardo N. B. Lima (Departamento de Matemática, UFMG).

Título: Drunkards, bets, and electrical circuits.

Resumo: The random walk is one of the most interesting and widely studied objects in probability. We will describe its simplest version and explore, again in the simplest case, a mathematical theory that also arises in other, seemingly very different contexts.
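The drunkards, bets and electrical circuits of the title presumably refer to the classical correspondence between random walks, gambler's ruin and electrical networks (as in Doyle and Snell, Random Walks and Electric Networks); this connection is our assumption. For the simple symmetric random walk on {0, 1, ..., N}, the probability h(k) of reaching N before 0 when starting from k satisfies h(k) = (h(k−1) + h(k+1))/2 with h(0) = 0 and h(N) = 1, which is exactly Kirchhoff's condition for the voltage at node k of a chain of equal resistors with node 0 grounded and node N held at 1 volt. A minimal sketch (hypothetical function name) solves this discrete Dirichlet problem by Gauss-Seidel relaxation and recovers the gambler's-ruin answer h(k) = k/N.

```python
def hitting_probabilities(n, tol=1e-13, max_iter=100_000):
    """Probability that a simple symmetric random walk started at k hits n
    before 0, computed as node voltages in a chain of n equal resistors
    with node 0 grounded and node n held at 1 volt (Gauss-Seidel relaxation)."""
    h = [0.0] * (n + 1)
    h[n] = 1.0  # boundary conditions: h(0) = 0, h(n) = 1
    for _ in range(max_iter):
        change = 0.0
        for k in range(1, n):
            new = 0.5 * (h[k - 1] + h[k + 1])  # harmonic: average of the neighbors
            change = max(change, abs(new - h[k]))
            h[k] = new
        if change < tol:
            break
    return h
```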

ANO DE 2022 – 2º SEMESTRE

02/12/2022 às 13:00h – Local: Canal do Youtube Seminários DEST – UFMG

Gracielle A. Araújo (Doutoranda, DEST/UFMG)

Título: Bayesian methods for neural networks and related models.

Resumo: Models such as feed-forward neural networks and certain other structures investigated in the computer science literature are not amenable to closed-form Bayesian analysis. The paper reviews the various approaches taken to overcome this difficulty, involving the use of Gaussian approximations, Markov chain Monte Carlo simulation routines, and a class of non-Gaussian but “deterministic” approximations called variational approximations. Reference: Titterington D. M. (2004), Bayesian methods for neural networks and related models. Statistical Science, 19, 1, 128-139.

25/11/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Caio G. B. Balieiro (Doutorando, DEST/UFMG)

Título: Modeling spatial variation in leukemia survival data.

Resumo: In this article we combine ideas from spatial statistics with lifetime data analysis techniques to investigate possible spatial variation in survival of adult acute myeloid leukemia patients in northwest England. Exploratory analysis suggests both clinically and statistically significant variation in survival rates across the region. A multivariate gamma frailty model incorporating spatial dependence is proposed and applied, with results confirming the dependence of hazard on location. Reference: Henderson R., Shimakura S. and Gorst D. (2002), Modeling spatial variation in leukemia survival data. Journal of the American Statistical Association, 97, 460, 965-972.

18/11/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Marco Antonio T. Aucahuasi (Doutorando, DEST/UFMG)

Título: A brief overview of Markov chains and coalescing particles.

Resumo: In this talk we present a brief review of the theory of Markov chains and mixing times, with some examples. We also present an interacting particle system with the following dynamics: at time 0, we begin with a particle at each integer in [0,n]. At each positive integer time, one of the particles remaining in [1,n] is chosen at random and moves one unit to the left, coalescing with any particle that might already be there. How long does it take until all particles coalesce (at 0)? Advisor: Roger W. C. Silva.
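Because the abstract specifies the particle dynamics completely, they can be simulated directly. A minimal sketch (function name hypothetical) that stores occupied sites in a set, so that a particle stepping onto an occupied site merges with it automatically; site 0 is treated as absorbing. Since each step moves one particle one unit, the coalescence time is at least n (the particle starting at n must move n times) and at most n(n + 1)/2 (the total distance if nothing ever merged).

```python
import random

def coalescence_time(n, seed=None):
    """Simulate the coalescing-particle dynamics: one particle at each integer
    in [0, n]; at each step a particle in [1, n] is chosen uniformly, moves
    one unit left and merges with any particle already there.
    Returns the number of steps until every particle has reached 0."""
    rng = random.Random(seed)
    occupied = set(range(1, n + 1))  # sites in [1, n] holding a particle
    steps = 0
    while occupied:
        x = rng.choice(sorted(occupied))  # sorted so runs are reproducible per seed
        occupied.remove(x)
        if x - 1 > 0:
            occupied.add(x - 1)  # set semantics merge coalescing particles
        steps += 1
    return steps
```

Averaging `coalescence_time(n, seed=s)` over many seeds gives a Monte Carlo estimate of the expected coalescence time for a given n.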

11/11/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Otávio A. S. Lima (Doutorando, DEST/UFMG)

Título: The word percolation model.

Resumo: The word percolation problem was introduced by Itai Benjamini and Harry Kesten in 1995 as a generalization of the Bernoulli percolation model. This talk introduces the model and presents some results.

04/11/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Thaís P. Galletti (Departamento de Estatística, UFMG)

Título: Generating synthetic geographic coordinates for confidential databases, with an application to COVID-19 data from Montes Claros, MG.

Resumo: With the growing production of data in recent decades, one of the main problems is the violation of individuals' privacy. The challenge is to develop mechanisms that preserve the confidentiality of the data while still allowing them to be released and used for statistical analysis. Multiple imputation methods for simulating synthetic data have proven to be an attractive alternative for this type of problem and can be applied even to spatial locations. The goal of this work is to propose an extension of the methodology for generating synthetic geographic coordinates that accommodates discrete and continuous covariates, and to apply the method to impute synthetic locations of individuals with suspected COVID-19 in the city of Montes Claros, MG.

21/10/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Rafael Izbicki (Departamento de Estatística, UFSCar)

Título: Diagnostics and recalibration of predictive distributions.

Resumo: Uncertainty quantification is crucial for assessing the predictive ability of AI algorithms. A large body of work (including normalizing flows and Bayesian neural networks) has been devoted to describing the entire predictive distribution (PD) of a target variable Y given input features X. However, off-the-shelf PDs are usually far from being conditionally calibrated; i.e., the probability of occurrence of an event given input X can be significantly different from the predicted probability. Most current research on predictive inference (such as conformal prediction) concerns constructing calibrated prediction sets only. It is often believed that the problem of obtaining and assessing entire conditionally calibrated PDs is too challenging. In this work, we show that recalibration, as well as validation of full/entire PDs, are indeed attainable goals in practice. Our proposed method relies on the idea of regressing probability integral transform (PIT) scores against X. This regression gives full diagnostics of conditional coverage across the entire feature space and can be used to recalibrate misspecified PDs. We benchmark our corrected prediction bands against oracle bands and state-of-the-art predictive inference algorithms for synthetic data, including settings with a distributional shift. Finally, we produce calibrated PDs for two applications: (i) probabilistic forecasting based on sequences of satellite images, and (ii) estimation of galaxy distances based on imaging data (photometric redshifts).
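The PIT score of an observation y under a predictive CDF F(·|x) is F(y|x); under conditional calibration it is Uniform(0, 1) at every x, so the local coverage P(PIT ≤ α | x) should equal α everywhere. A minimal sketch, assuming Gaussian predictive distributions and replacing the regression of PIT values on X described in the talk with a crude binned estimate of local coverage (function names hypothetical):

```python
from statistics import NormalDist

def pit_values(ys, mus, sigmas):
    """PIT scores F(y | x) for Gaussian predictive distributions N(mu, sigma^2)."""
    return [NormalDist(m, s).cdf(y) for y, m, s in zip(ys, mus, sigmas)]

def binned_coverage(xs, pits, alpha, edges):
    """Fraction of PIT values <= alpha within each x-bin [edges[b], edges[b+1]).
    Under conditional calibration every bin should be close to alpha."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [p for x, p in zip(xs, pits) if lo <= x < hi]
        out.append(sum(p <= alpha for p in in_bin) / len(in_bin) if in_bin else None)
    return out
```

Bins whose coverage departs clearly from α indicate regions of feature space where the predictive distribution is miscalibrated.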

14/10/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Fábio M. Bayer (Departamento de Estatística, UFSM)

Título: Circular k-nearest neighbors.

Resumo: Circular data arise in many areas of science and require specific statistical methods. For regression, the literature offers parametric regression models for circular data, which assume particular circular probability distributions when fitting. In machine learning, on the other hand, a supervised approach to predicting continuous data involves nonparametric regression models, which may be unsuitable when the response variable is circular. In this seminar, I will present a new machine learning model for predicting circular data, called circular k-nearest neighbors. Joint work with Maicon Facco.
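The reason ordinary k-nearest neighbors breaks down for a circular response is averaging: the arithmetic mean of 350° and 10° is 180°, while their circular mean is 0°. A minimal sketch of a circular k-NN regressor with a scalar covariate, assuming the prediction is the circular mean of the k nearest responses; the speaker's method may use other distances, weights or aggregation, and the function name is hypothetical.

```python
import math

def circular_knn_predict(x_train, theta_train, x_new, k=3):
    """Predict a circular response (radians) at x_new as the circular mean
    of the responses of the k nearest training points in covariate space."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    s = sum(math.sin(theta_train[i]) for i in nearest)
    c = sum(math.cos(theta_train[i]) for i in nearest)
    # Circular mean, mapped to [0, 2*pi); undefined if s and c are both zero
    return math.atan2(s, c) % (2 * math.pi)
```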

07/10/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Valdério A. Reisen (UFES, UFMG, Université Paris-Saclay, UFBA)

Título: M-regression estimation methods and robust PCA in mixed linear models. An application to quantify the statistical association between forced expiratory volume and pollutants.

Resumo: This seminar discusses the use of M-regression estimation methods and PCA tools (robust and non-robust) in mixed models with time series covariates. An application to the relationship between exposure to air pollution and forced expiratory volume in the first second (FEV1) motivates the use of the proposed methodology in real problems.

30/09/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Jussiane N. Gonçalves (Departamento de Estatística, UFMG)

Título: A novel regression model for correlated count data.

Resumo: The premise of independence among subjects in the same cluster/group often fails in practice, and models that rely on such an untenable assumption can produce misleading results. To overcome this severe deficiency, we introduce a new regression model to handle overdispersed and correlated clustered counts. To account for correlation within clusters, we propose a Poisson regression model where the observations within the same cluster are driven by the same latent random effect that follows the Birnbaum-Saunders distribution with a parameter that controls the strength of dependence among the individuals. This novel multivariate count model is called Clustered Poisson Birnbaum-Saunders (CPBS) regression. The CPBS model is analytically tractable, and its moment structure can be explicitly obtained. Estimation of parameters is performed through the maximum likelihood method, and an Expectation-Maximization (EM) algorithm is also developed. Simulation results to evaluate the finite-sample performance of our proposed estimators are presented. We also discuss diagnostic tools for checking model adequacy. An empirical application concerning the number of inpatient admissions by individuals to hospital emergency rooms, from the Medical Expenditure Panel Survey (MEPS) conducted by the United States Agency for Health Research and Quality, illustrates the usefulness of our proposed methodology.

23/09/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Fábio N. Demarqui (Departamento de Estatística, UFMG)

Título: A class of models for survival data with cure fraction and crossing survivals.

Resumo: In this talk, we introduce a new class of models to fit survival data with cure fraction and crossing survivals. The class of models proposed in this work has some attractive features: i) it is built upon a well-known unified two-stage process that possesses an appealing biological motivation in terms of incidence-latency of disease; ii) the incidence sub-model can be modeled by the Bernoulli, Poisson, negative binomial and Bell distributions; iii) the Yang and Prentice (YP) regression structure assumed to model the latency sub-model allows the model to accommodate survival data with crossing survivals, and it further includes the well-known proportional hazards (PH) and proportional odds (PO) models as particular cases; iv) the baseline survival distribution can be modeled parametrically (under the assumption of any parametric distribution), or semiparametrically (by either the piecewise exponential distribution or the Bernstein polynomials), providing greater flexibility for the modeling process; v) the likelihood function is available in closed-form expressions, leading to more straightforward inferential procedures. An extensive simulation study was carried out to investigate the asymptotic properties of the proposed class of models using the R package survcure, developed to fit the models belonging to the proposed class. We illustrate the usefulness of the proposed model through the analysis of a real dataset involving patients diagnosed with melanoma cancer previously investigated in the literature. The results obtained suggest that the proposed model arises as a flexible and attractive alternative to model survival data with cure fraction and crossing survivals.
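In the unified two-stage (incidence-latency) formulation, if the number of latent causes N has probability generating function G and each cause has latency survival S(t), the population survival is E[S(t)^N] = G(S(t)). With N ~ Poisson(θ) this gives S_pop(t) = exp(−θ(1 − S(t))) and a cure fraction exp(−θ); with N ~ Bernoulli(p) it gives the mixture S_pop(t) = (1 − p) + pS(t) and a cure fraction 1 − p. A minimal sketch of these two incidence cases, using an exponential latency distribution purely as an illustrative assumption; the YP latency structure, the negative binomial and Bell cases, and the survcure package are not reproduced, and the function names are hypothetical.

```python
import math

def pop_survival_poisson(t, theta, rate):
    """Population survival with Poisson(theta) incidence and Exp(rate) latency."""
    s = math.exp(-rate * t)              # latency survival S(t)
    return math.exp(-theta * (1.0 - s))  # E[S(t)^N] with N ~ Poisson(theta)

def pop_survival_bernoulli(t, p_susceptible, rate):
    """Population survival with Bernoulli incidence (mixture cure model)."""
    s = math.exp(-rate * t)
    return (1.0 - p_susceptible) + p_susceptible * s
```

Both curves start at 1 and level off at the cure fraction instead of decaying to 0, which is the defining feature of cure rate models.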

09/09/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Flávio B. Gonçalves (Departamento de Estatística, UFMG)

Título: Beyond Gaussian processes: flexible Bayesian modeling and inference for geostatistical processes.

Resumo: In this talk, I will present a novel family of geostatistical models to account for features that cannot be properly accommodated by traditional Gaussian processes. The family is specified hierarchically, through a latent Poisson process, and combines the infinite-dimensional dynamics of Gaussian processes with that of any multivariate continuous distribution. The resulting process is called the Poisson-Gaussian Mixture Process (POGAMP). Whereas assigning an arbitrary continuous distribution as the finite-dimensional distributions of a geostatistical process usually leads to an invalid process, the finite-dimensional distributions of the POGAMP can be arbitrarily close to any continuous distribution and still define a valid process. Formal results establishing the existence and some important properties of the POGAMP, such as absolute continuity with respect to a Gaussian process measure, are provided. Also, an MCMC algorithm is carefully devised to perform Bayesian inference when the POGAMP is discretely observed in some space domain.

02/09/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Simon Lunagomez (ITAM, México)

Título: Latent space modelling of hypergraph data.

Resumo: The increasing prevalence of relational data describing interactions among a target population has motivated a wide literature on statistical network analysis. In many applications, interactions may involve more than two members of the population and this data is more appropriately represented by a hypergraph. In this paper, we present a model for hypergraph data which extends the well-established latent space approach for graphs and, by drawing a connection to constructs from computational topology, we develop a model whose likelihood is inexpensive to compute. A delayed-acceptance MCMC scheme is proposed to obtain posterior samples and we rely on Bookstein coordinates to remove the identifiability issues associated with the latent representation. We theoretically examine the degree distribution of hypergraphs generated under our framework and, through simulation, we investigate the flexibility of our model and consider estimation of predictive distributions. Finally, we explore the application of our model to two real-world datasets. This is joint work with Kathryn Turnbull, Christopher Nemeth and Edoardo Airoldi.

ANO DE 2022 – 1º SEMESTRE

15/07/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Guilherme L. Oliveira (CEFET – MG)

Título: An overview of Bayesian models for underreported count data: theory and applications.

Resumo: Count data is collected in many fields such as criminology, demography and epidemiology to assess or monitor the associated risks. In Brazil, this type of data usually comes from official registration systems which are prone to under-registration: only a fraction of the true (but unobserved) counts are reported. In this talk, some statistical approaches for correcting underreporting in count data will be discussed. The methods are based on the definition of a Poisson regression model for the observed data along with the specification of an auxiliary structure for modeling the reporting process. The inference is made under the Bayesian framework and it depends on the sort of prior information that is available. Applications consider Brazilian data on infant mortality, syphilis and tuberculosis, in which the correction of underreporting bias is very important for accurate surveillance, intervention and control by the government.

08/07/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Uriel M. Silva (Observatório de Saúde Urbana de BH, UFMG)

Título: A unified framework for sequential parameter learning with regularization in state space models.

Resumo: A unified framework for sequential parameter learning in state space models is proposed. This framework is capable of accommodating several other algorithms found in the literature as special cases, and this generality is achieved mainly by providing an alternative formalism to the role of regularization in this setting. In order to illustrate its flexibility, three novel algorithms are developed within this framework, including an improved and fully-adapted version of the celebrated Liu and West filter. These regularization techniques are associated with efficient resampling schemes, and their use is illustrated in challenging nonlinear settings with both synthetic and real-world data.
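The Liu and West filter mentioned above regularizes parameter particles with a kernel shrinkage step (Liu and West, 2001): each particle θ_i is pulled toward the particle mean, m_i = aθ_i + (1 − a)θ̄, and then jittered with variance h²V, where V is the particle variance, a = (3δ − 1)/(2δ) for a discount factor δ, and h² = 1 − a². Shrinking before jittering preserves the particle mean and variance, avoiding the over-dispersion caused by naive jittering. A minimal one-parameter sketch of that step only (not the full filter or the improved version from the talk; the function name is hypothetical):

```python
import random

def liu_west_step(particles, weights, delta=0.99, rng=None):
    """Kernel shrinkage and jitter for a scalar parameter (Liu and West, 2001).
    Returns the shrunk kernel locations and the jittered particles."""
    rng = rng or random.Random()
    a = (3 * delta - 1) / (2 * delta)  # shrinkage factor
    h2 = 1 - a ** 2                    # kernel variance factor
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalized weights
    mean = sum(wi * p for wi, p in zip(w, particles))
    var = sum(wi * (p - mean) ** 2 for wi, p in zip(w, particles))
    # Shrink toward the mean so that adding kernel noise with variance
    # h2 * var restores, rather than inflates, the particle variance
    locations = [a * p + (1 - a) * mean for p in particles]
    jittered = [rng.gauss(m, (h2 * var) ** 0.5) for m in locations]
    return locations, jittered
```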

01/07/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Esther Salazar (FDA – Food and Drug Administration, EUA)

Título: Flexible models for heterogeneous multiview data: applications to behavioral and fMRI data.

Resumo: We present a probabilistic framework for learning with heterogeneous multiview data where some views are given as ordinal, binary, or real-valued feature matrices, and some views as similarity matrices. Our framework has the following distinguishing aspects: (i) a unified latent factor model for integrating information from diverse feature (ordinal, binary, real) and similarity-based views, and predicting the missing data in each view, leveraging view correlations; (ii) seamless adaptation to binary/multiclass classification where data consists of multiple feature and/or similarity-based views; and (iii) an efficient, variational inference algorithm which is especially flexible in modeling the views with ordinal-valued data (by learning the cutpoints for the ordinal data), and extends naturally to streaming data settings. Our framework subsumes methods such as multiview learning and multiple kernel learning as special cases. We demonstrate the effectiveness of our framework on several real-world and benchmark datasets.

24/06/2022 às 13:30h – Local: Sala 2076 – ICEx/UFMG

Douglas R. M. Azevedo (R/Shiny developer, Appsilon)

Título: Flexible link function with asymptotes: estimating the SUS population in Brazil.

Resumo: The estimation of hidden sub-populations is a hard task that appears in many fields. For example, public health planning in Brazil depends crucially on the number of people who hold a private health insurance plan and, hence, rarely use the public services. Different sources of information about these sub-populations may be available at different geographical levels. The available information can be transferred between these different geographic levels to improve the estimation of the hidden population size. In this study, we propose a model that uses individual-level information to learn about the dependence between the response variable and explanatory variables by proposing a family of link functions with asymptotes that are flexible enough to represent the real aspects of the data and robust to departures from the model. We use the fitted model to estimate the size of the sub-population at any desired level. We illustrate our methodology by estimating the sub-population that uses the public health system in each neighborhood of large cities in Brazil.

10/06/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Artur J. Lemonte (Departamento de Estatística, UFRN)

Título: On the local power of the LR, Wald, score and gradient tests under orthogonality.

Resumo: We study the local power of the LR, Wald, score and gradient tests in the presence of a parameter vector, say omega, that is orthogonal to the remaining parameters. We show that some of the coefficients that define the local power of the tests remain unchanged regardless of whether omega is known or needs to be estimated, whereas the others can be written as the sum of two terms: the first is the corresponding term obtained as if omega were known, and the second is an additional term arising from the fact that omega is unknown. We apply the general result to the class of nonlinear Student-t regression models.
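For intuition, the four statistics can be computed in a scalar toy problem with no nuisance parameter (the orthogonal-nuisance setting of the talk is not reproduced): an i.i.d. exponential sample with rate λ and H0: λ = λ0. With log-likelihood ℓ(λ) = n log λ − λΣx_i, score U(λ) = n/λ − Σx_i and Fisher information I(λ) = n/λ², the statistics are LR = 2[ℓ(λ̂) − ℓ(λ0)], Wald = (λ̂ − λ0)²I(λ̂), score = U(λ0)²/I(λ0) and gradient = U(λ0)(λ̂ − λ0) (Terrell's statistic). All four are asymptotically χ²₁ under H0. A minimal sketch with a hypothetical function name:

```python
import math

def four_tests_exponential(x, lam0):
    """LR, Wald, score and gradient statistics for H0: rate = lam0,
    given an i.i.d. exponential sample x (scalar parameter, no nuisance)."""
    n, total = len(x), sum(x)
    lam_hat = n / total  # maximum likelihood estimate of the rate

    def loglik(lam):
        return n * math.log(lam) - lam * total

    def score(lam):  # U(lambda) = d loglik / d lambda
        return n / lam - total

    def info(lam):  # Fisher information I(lambda)
        return n / lam ** 2

    return {
        "LR": 2 * (loglik(lam_hat) - loglik(lam0)),
        "Wald": (lam_hat - lam0) ** 2 * info(lam_hat),
        "score": score(lam0) ** 2 / info(lam0),
        "gradient": score(lam0) * (lam_hat - lam0),
    }
```

For ten observations equal to 1 (so λ̂ = 1) and λ0 = 1.2, the Wald and score statistics are both 0.4, the gradient statistic is 1/3 and the LR statistic is about 0.354: close to one another, as expected when λ0 is near λ̂.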

03/06/2022 às 13:30h – Local: Canal do Youtube Seminários DEST – UFMG

Frederico M. Almeida (Pós-doc, Escola de Nutrição, UFOP)

Título: Modified score function for monotone likelihood in the semiparametric mixture cure model.

Resumo: Cure fraction models are intended to analyze lifetime data from populations where some individuals are immune to the event under study, and they allow joint estimation of the distributions related to cured and susceptible subjects, as opposed to the usual approach that ignores the cure rate. In situations involving small sample sizes with many censored times, maximum likelihood may yield non-finite coefficient estimates. This phenomenon is commonly known as monotone likelihood (ML), and it occurs in the Cox and logistic regression models when many categorical and unbalanced covariates are present. An existing solution to prevent the issue is based on the Firth correction, originally developed to reduce estimation bias; the method ensures finite estimates by penalizing the likelihood function. In the context of mixture cure models, the ML issue is rarely discussed in the literature; therefore, this topic can be seen as the first contribution of our paper. The second major contribution, not well addressed elsewhere, is the study of the ML issue in mixture cure modeling under the flexibility of a semiparametric framework for the baseline hazard. We derive the modified score function based on the Firth approach and explore the finite-sample properties of the estimators via a Monte Carlo scheme. The simulation results indicate that the performance of the estimators of coefficients related to binary covariates is strongly affected by the degree of imbalance. A real illustration (melanoma data) is discussed using a relatively novel data set collected in a Brazilian university hospital.
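Monotone likelihood and its Firth-type fix are easiest to see in logistic regression, which the abstract also mentions. Under complete separation the ordinary MLE diverges, whereas Firth's modified score U*(β) = Xᵀ(y − π + h ∘ (1/2 − π)), with h the diagonal of the weighted hat matrix (the formulation popularized by Heinze and Schemper), has a finite root. A minimal NumPy sketch for logistic regression only; the semiparametric mixture cure version from the talk is not reproduced, and the function names and the step cap are our own choices.

```python
import numpy as np

def firth_score(X, y, beta):
    """Firth's modified score for logistic regression, plus (X'WX)^{-1}."""
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    w = pi * (1.0 - pi)
    info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)  # diagonal of the weighted hat matrix
    return X.T @ (y - pi + h * (0.5 - pi)), info_inv

def firth_logistic(X, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Solve the modified score equations with Fisher-scoring-type steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        u_star, info_inv = firth_score(X, y, beta)
        step = info_inv @ u_star
        largest = np.max(np.abs(step))
        if largest > max_step:  # cap very large steps for stability
            step = step * (max_step / largest)
        beta = beta + step
        if largest < tol:
            break
    return beta
```

With x = −2, −1, 1, 2 and y = 0, 0, 1, 1 the data are completely separated, so the ordinary slope estimate diverges, while this iteration returns a finite, positive slope.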

27/05/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Alessandro J. Q. Sarnaglia (Department of Statistics, UFES)

Title: Bayesian segmented regression for count data: application to estimating the critical threshold of air pollution for hospital admissions.

Abstract: Air pollution is a problem faced in many parts of the world. In particular, as several studies have shown, particulate matter with a diameter below 10 µm (PM10) is considered one of the pollutants most harmful to health. From a public-health standpoint, this impact is often investigated by studying the effect of PM10 concentration on the number of hospital admissions. The central aim of this work is to determine the PM10 concentration level above which children aged 10 or younger become more vulnerable, leading, as a consequence, to an increase in hospital admissions for respiratory causes. To this end, we use segmented regression modeling from a Bayesian point of view. As noted in the literature, the likelihood in this case is not differentiable at the breakpoint, which is a challenge for derivative-based methods. We therefore propose using the Laplace approximation to sample from the posterior distribution, relying on a reparameterization of the model and on bootstrap methods to specify the covariance matrix used in this approximation. Through a simulation study, we compare this method with different procedures from the literature in terms of the accuracy of the estimates and computing time. The results show that the proposed methodology outperforms the existing ones: its coverage probabilities are higher and closer to the nominal 95% confidence level, and it is more precise, with shorter interval widths. Finally, we apply this methodology to study the effect of PM10 and meteorological variables on the number of daily admissions for respiratory causes at a hospital in Espírito Santo, Brazil.
As a result, we find that the critical threshold of PM10 above which child admissions increase is around 34 µg/m³, which is below the 50 µg/m³ reference set by the World Health Organization (WHO). A similar result was previously obtained by Sarnaglia et al. (2021) from a frequentist point of view.
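The difficulty with the breakpoint can be seen in a small sketch. The code below is a frequentist profile-likelihood stand-in, not the Bayesian Laplace-approximation method of the talk: for each candidate threshold τ on a grid it fits the Poisson log-linear model log μ = b0 + b1·max(x − τ, 0) by IRLS and keeps the τ with the largest log-likelihood, so no derivative with respect to τ is ever needed. The function names, the single covariate, and the noise-free synthetic data are assumptions made for illustration only.

```python
import math


def fit_poisson_irls(x1, y, max_iter=100, tol=1e-12):
    """Fit log(mu) = b0 + b1 * x1 by IRLS; return (b0, b1, loglik without the log(y!) term)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        # weighted normal equations X'WX b = X'Wz with w = mu, z = eta + (y - mu) / mu
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi in zip(x1, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu
            s00 += mu
            s01 += mu * xi
            s11 += mu * xi * xi
            t0 += mu * z
            t1 += mu * xi * z
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * t0 - s01 * t1) / det
        new_b1 = (s00 * t1 - s01 * t0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    loglik = sum(yi * (b0 + b1 * xi) - math.exp(b0 + b1 * xi) for xi, yi in zip(x1, y))
    return b0, b1, loglik


def profile_threshold(x, y, grid):
    """Profile likelihood over the breakpoint tau in log(mu) = b0 + b1 * max(x - tau, 0)."""
    best = None
    for tau in grid:
        hinge = [max(xi - tau, 0.0) for xi in x]  # zero below tau, linear above it
        b0, b1, ll = fit_poisson_irls(hinge, y)
        if best is None or ll > best[3]:
            best = (tau, b0, b1, ll)
    return best


# Synthetic, noise-free illustration (not real data): expected counts are flat below x = 40.
x = [float(i) for i in range(100)]
y = [math.exp(math.log(5.0) + 0.02 * max(xi - 40, 0.0)) for xi in x]
tau_hat, b0_hat, b1_hat, _ = profile_threshold(x, y, range(10, 90))
print(tau_hat, round(b1_hat, 4))
```

A grid over τ sidesteps the kink but ignores the uncertainty in τ; quantifying that uncertainty is what the posterior approximation in the talk addresses.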

20/05/2022 at 13:30 – Venue: Room 2076 – ICEx/UFMG

Marcelo R. Hilário (Department of Mathematics, UFMG)

Title: Law of large numbers for random walks in dynamic random environments

Abstract: Random walks in random environments model the behavior of a particle whose motion is subject to the influence of a disordered medium. The transition kernel that governs the motion of the random walk depends on a family of random variables indexed by space, called the random environment. This environment can be static, when the variables are held constant, or dynamic, when they also evolve stochastically in time. In this talk we will discuss some recent results on the asymptotic behavior of the walk when the environment is dynamic. In particular, we will present a technique for proving the law of large numbers for these processes when the environment is one-dimensional and exhibits strong space-time correlations.

13/05/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Silvia L. P. Ferrari (Department of Statistics, IME, USP)

Title: Robust estimation in beta regression via maximum Lq-likelihood

Abstract: Beta regression models are widely used for modeling continuous data limited to the unit interval, such as proportions, fractions, and rates. Inference for the parameters of beta regression models is commonly based on maximum likelihood estimation, which is known to be sensitive to discrepant observations. In some cases, a single atypical data point can lead to severe bias and erroneous conclusions about the features of interest. In this work, we develop a robust estimation procedure for beta regression models based on the maximization of a reparameterized Lq-likelihood. The new estimator offers a trade-off between robustness and efficiency through a tuning constant. To select the optimal value of the tuning constant, we propose a data-driven method that ensures full efficiency in the absence of outliers. We also improve on an alternative robust estimator by applying our data-driven method to select its optimum tuning constant. Monte Carlo simulations suggest marked robustness of the two robust estimators, with little loss of efficiency, when the proposed selection scheme for the tuning constant is employed. Applications to three datasets are presented and discussed. As a by-product of the proposed methodology, residual diagnostic plots based on robust fits highlight outliers that would be masked under maximum likelihood estimation. Joint work with Terezinha K. A. Ribeiro.
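A minimal sketch of the Lq-likelihood idea, assuming an intercept-only model with the beta precision φ held fixed and a plain grid search: each observation contributes Lq(f) = (f^(1−q) − 1)/(1 − q) instead of log f, so observations with tiny density (outliers) have bounded influence. It omits the reparameterization and the data-driven choice of q described in the abstract; q = 0.8, φ = 50, and the data are arbitrary, hypothetical choices.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi): mean mu, precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def lq(log_u, q):
    """Lq(u) = (u**(1 - q) - 1) / (1 - q), computed from log(u); equals log(u) at q = 1."""
    if q == 1.0:
        return log_u
    return (math.exp((1.0 - q) * log_u) - 1.0) / (1.0 - q)


def fit_mean(ys, phi, q, grid_size=2000):
    """Maximize sum_i Lq(f(y_i; mu, phi)) over a grid of mu in (0, 1), phi held fixed."""
    best_mu, best_val = None, -math.inf
    for k in range(1, grid_size):
        mu = k / grid_size
        val = sum(lq(beta_logpdf(y, mu, phi), q) for y in ys)
        if val > best_val:
            best_mu, best_val = mu, val
    return best_mu


# Hypothetical data: ten proportions near 0.3 plus one outlier at 0.99.
ys = [0.25, 0.28, 0.30, 0.30, 0.32, 0.27, 0.33, 0.29, 0.31, 0.30, 0.99]
mle = fit_mean(ys, phi=50.0, q=1.0)     # ordinary maximum likelihood
robust = fit_mean(ys, phi=50.0, q=0.8)  # maximum Lq-likelihood
print(mle, robust)
```

With these numbers the q = 1 (maximum likelihood) estimate is pulled toward the outlier, while the q = 0.8 estimate stays close to the bulk of the data.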

06/05/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Mark D. Risser (Lawrence Berkeley National Laboratory, USA)

Title: Bayesian inference for high-dimensional nonstationary Gaussian processes

Abstract: In spite of the diverse literature on nonstationary spatial modelling and approximate Gaussian process (GP) methods, there are no general approaches for conducting fully Bayesian inference for moderately sized nonstationary spatial data sets on a personal laptop. For statisticians and data scientists who wish to conduct posterior inference and prediction with appropriate uncertainty quantification, the lack of such approaches and software is a limitation. In this work, we develop methodology for implementing formal Bayesian inference for a general class of nonstationary GPs. Our novel approach uses pre-existing frameworks for characterizing nonstationarity in a new way while using modern GP likelihood approximations. Posterior sampling is implemented using flexible MCMC methods, with nonstationary posterior prediction conducted as a post-processing step. We demonstrate our novel methods on three data sets, ranging from several hundred to over several thousand locations. All of our methods are implemented in the freely available BayesNSGP software package for R.

29/04/2022 at 13:30 – Venue: Room 2076 – ICEx/UFMG

Marcelo A. Costa (Department of Production Engineering, UFMG)

Title: Dynamic time scan forecasting for multi-step wind speed prediction.

Abstract: Multi-step forecasting of wind speed time series, especially for day-ahead and longer horizons, is still a challenging problem in the wind energy sector. In this paper, a novel analog-based methodology for multi-step forecasting of univariate time series, named dynamic time scan forecasting (DTSF), is presented. DTSF is a fast forecasting methodology for large data sets, which makes it well suited to forecasting renewable energy features such as wind speed, for which standard statistical and soft computing methods have limitations. A scan procedure is applied throughout the time series to identify similar patterns, named best matches. Instead of the Euclidean distance, more flexible similarity functions based on polynomial regression models are dynamically estimated, and goodness-of-fit statistics are used to find the best matches. The observed values following the best matches and the fitted similarity functions are used to predict k steps ahead, as well as to compute forecasting intervals. An ensemble version of the method, named eDTSF, combines predictions obtained with different sets of parameters, further improving forecasting performance. Remarkably, eDTSF achieved competitive results for multi-step forecasting of wind speed time series, even in situations of very high variability, compared with eleven selected competing forecasting methods.
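The scan-and-match procedure can be sketched as follows. This is a simplified reading of DTSF, assuming degree-1 (linear) similarity functions, R² as the goodness-of-fit statistic, and a plain average over the best matches; the function names are hypothetical, and neither the forecasting intervals nor the eDTSF ensemble are shown.

```python
import math


def linear_fit(x, y):
    """Least-squares fit y ~ a + b * x; return (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return my, 0.0, 0.0
    b = sxy / sxx
    return my - b * mx, b, sxy * sxy / (sxx * syy)


def scan_forecast(series, window, horizon, n_matches):
    """Analog forecast in the spirit of DTSF, using linear similarity functions only."""
    target = series[-window:]  # the most recent window
    candidates = []
    # every past window that is followed by `horizon` observed values
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        a, b, r2 = linear_fit(past, target)  # similarity function: target ~ a + b * past
        after = series[start + window:start + window + horizon]
        candidates.append((r2, [a + b * v for v in after]))
    candidates.sort(key=lambda c: c[0], reverse=True)  # best goodness of fit first
    best = candidates[:n_matches]
    # average the mapped continuations of the best matches
    return [sum(path[h] for _, path in best) / len(best) for h in range(horizon)]


# Noise-free periodic example: the forecast should reproduce the next part of the cycle.
series = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(120)]
forecast = scan_forecast(series, window=24, horizon=6, n_matches=3)
truth = [10 + 3 * math.sin(2 * math.pi * t / 12) for t in range(120, 126)]
```

Fitting a + b·x rather than requiring an exact match lets a past window that differs only in level or scale still count as a good analog, which is the main advantage of a regression-based similarity over a raw distance.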

08/04/2022 at 13:30 – Venue: Room 2076 – ICEx/UFMG

Enrico A. Colosimo (Department of Statistics, UFMG)

Title: Clinical prediction models.

Abstract: Clinical prediction models are built to identify patients or individuals with a higher probability of developing a specific event, usually a disease or death. These predictions are used to change lifestyles, guide therapeutic decisions, and stratify by severity, among other purposes. This work was motivated by the need to build a risk score for patients with Chagas cardiomyopathy from a cohort followed in the Jequitinhonha Valley region, in the state of Minas Gerais. Predictions were first obtained from baseline data and then, as the cohort was followed longitudinally, made dynamic. In this talk we will present the fundamental steps for building static and dynamic prediction scores and illustrate them with the results obtained for the Jequitinhonha Valley study.

25/02/2022 at 14:30 – Venue: YouTube channel Seminários DEST – UFMG

César Macieira (PhD student, Department of Statistics, UFMG)

Title: Clustering discrete data through the multinomial mixture model.

Abstract: In this paper, the multinomial mixture model is studied through a maximum likelihood approach. The convergence of the maximum likelihood estimator to a set with certain characteristics of interest is shown. A method for selecting the number of mixture components, developed from the form of the maximum likelihood estimator, is also presented, and a simulation study is then carried out to examine its behavior. Finally, two applications of multinomial mixtures to real data are presented. Reference: J. Portela (2008). Communications in Statistics – Theory and Methods, 37, 20, 3250-3263.
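The abstract concerns maximum likelihood estimation for the multinomial mixture. One standard way to compute that estimator, the EM algorithm, is sketched below. It is a swapped-in illustration, not the component-selection method of the referenced paper, and the deterministic initialization, the helper `safe_log`, and the toy counts are assumptions.

```python
import math


def safe_log(p):
    return math.log(p) if p > 0.0 else -math.inf


def em_multinomial_mixture(counts, k, n_iter=100):
    """EM for a k-component multinomial mixture; returns (weights, probs, loglik trace).

    The multinomial coefficient is the same for every component, so it cancels in
    the E-step and is omitted from the log-likelihood trace."""
    n, d = len(counts), len(counts[0])
    # deterministic start: component j seeded from a spread-out row, with add-one smoothing
    probs = []
    for j in range(k):
        row = [c + 1.0 for c in counts[j * n // k]]
        total = sum(row)
        probs.append([c / total for c in row])
    weights = [1.0 / k] * k
    trace = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities (responsibilities) via log-sum-exp
        resp, loglik = [], 0.0
        for x in counts:
            logs = [safe_log(weights[j])
                    + sum(xi * safe_log(probs[j][i]) for i, xi in enumerate(x) if xi > 0)
                    for j in range(k)]
            m = max(logs)
            log_total = m + math.log(sum(math.exp(v - m) for v in logs))
            loglik += log_total
            resp.append([math.exp(v - log_total) for v in logs])
        trace.append(loglik)
        # M-step: mixing weights and weighted category frequencies per component
        weights = [sum(r[j] for r in resp) / n for j in range(k)]
        new_probs = []
        for j in range(k):
            totals = [sum(r[j] * x[i] for r, x in zip(resp, counts)) for i in range(d)]
            s = sum(totals)
            new_probs.append([t / s for t in totals] if s > 0 else probs[j])
        probs = new_probs
    return weights, probs, trace


# Hypothetical counts over three categories, with two visible groups of rows.
counts = [[8, 1, 1], [7, 2, 1], [9, 0, 1], [1, 1, 8], [0, 2, 8], [1, 0, 9]]
weights, probs, trace = em_multinomial_mixture(counts, k=2)
```

Each EM iteration cannot decrease the log-likelihood, which makes `trace` a convenient sanity check when experimenting with different numbers of components.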

25/02/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Jonathan S. Matias (PhD student, Department of Statistics, UFMG)

Title: M-Estimation in GARCH models.

Abstract: This paper derives the asymptotic normality of a class of M-estimators in the generalized autoregressive conditional heteroskedastic (GARCH) model. The class of estimators includes the least absolute deviation and Huber estimators, in addition to the well-known quasi maximum likelihood estimator. For some estimators, the asymptotic normality results are obtained only under the assumption that a fractional unconditional moment of the error distribution exists, together with some mild smoothness and moment assumptions on the score function. Reference: K. Mukherjee (2008). Econometric Theory, 24, 6, 1530-1553.

18/02/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Ricardo Cunha Pedroso (PhD student, Department of Statistics, UFMG)

Title: Dependent modeling of temporal sequences of random partitions.

Abstract: The seminar will present the paper “Dependent Modeling of Temporal Sequences of Random Partitions” by Page et al. (2021), in which the authors propose a model for sequences of dependent random partitions when the main interest is identifying clusters. The conditional and marginal properties of the joint model for the partitions are presented, along with Bayesian computational strategies for estimation. A simulation study for the case of temporal dependence shows that the model produces partition estimates that evolve smoothly; finally, the model is applied to environmental data that exhibit spatio-temporal dependence. Reference: Page G.L., Quintana F.A., Dahl D.B. (2021). Journal of Computational and Graphical Statistics, doi 10.1080/10618600.2021.1987255.

11/02/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Gabriel Oliveira Assunção (PhD student, Department of Statistics, UFMG)

Title: Data augmentation approaches for NLP.

Abstract: Interest in data augmentation for Natural Language Processing (NLP) has recently grown because of work in low-resource domains, new kinds of tasks, and the popularity of large-scale neural networks, which require large amounts of training data. Despite this interest, the area remains underexplored. This talk will present some existing methods from the literature for data augmentation in NLP, together with their applications and challenges. Reference: Feng S.Y., Gangal V., Wei J., Chandar S., Vosoughi S., Mitamura T., Hovy E. (2021). A survey of data augmentation approaches for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 968-988, Association for Computational Linguistics.

04/02/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Gisele de Oliveira Maia (PhD student, Department of Statistics, UFMG)

Title: Observation-driven models for Poisson counts.

Abstract: The seminar presents the paper Observation-driven models for Poisson counts, by Davis et al. (2003), which addresses the generalized linear autoregressive moving average model for count time series in which the conditional distribution of the series, given its past observations and covariates, is Poisson. The structure and properties of the model, the parameter estimation method, and the asymptotic properties of the estimators will be presented. Simulations are presented to provide further insight into the behavior of the estimators. Finally, an application is described: a regression model for daily counts of asthma cases at a hospital in Sydney, Australia. Reference: Davis R.A., Dunsmuir W.T.M., Streett S.B. (2003). Observation‐driven models for Poisson counts. Biometrika, 90, 4, 777-790.
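In an observation-driven model the serial dependence is built from past residuals of the observed counts rather than from an independent latent process. The sketch below simulates one common way of writing such a recursion, with a single autoregressive coefficient φ and a single moving-average coefficient θ; the exact parameterization in Davis et al. (2003) may differ, and the function names, the scalar covariate, and the residual scaling λ = 0.5 are assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_glarma(x, beta, phi, theta, lam=0.5, seed=0):
    """Simulate counts with an observation-driven Poisson recursion (one AR, one MA term):
        W_t = beta * x_t + Z_t,   Y_t | past ~ Poisson(mu_t = exp(W_t))
        Z_t = phi * (Z_{t-1} + e_{t-1}) + theta * e_{t-1}
        e_t = (Y_t - mu_t) / mu_t ** lam   (lam = 0.5 gives Pearson residuals)"""
    rng = random.Random(seed)
    z_prev = e_prev = 0.0
    ys, zs = [], []
    for xt in x:
        z = phi * (z_prev + e_prev) + theta * e_prev
        mu = math.exp(beta * xt + z)
        y = poisson_draw(mu, rng)
        e = (y - mu) / mu ** lam
        ys.append(y)
        zs.append(z)
        z_prev, e_prev = z, e
    return ys, zs


# With phi = theta = 0 the recursion switches off and Y_t follows a plain Poisson GLM.
ys0, zs0 = simulate_glarma([1.0] * 200, beta=1.0, phi=0.0, theta=0.0, seed=1)
ys1, zs1 = simulate_glarma([1.0] * 200, beta=1.0, phi=0.3, theta=0.2, seed=1)
```

Because Z_t depends only on past observations, the likelihood factorizes into one-step-ahead Poisson terms, which is what makes these models convenient to estimate.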

28/01/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Camila B. Zeller (Department of Statistics, UFJF)

Title: Estimation in the multivariate linear regression models with skew scale mixtures of normal distributions.

Abstract: In this paper, we present recent results on multivariate linear regression models in which the random errors follow multivariate skew scale mixtures of normal distributions. This class of distributions includes the scale mixtures of multivariate normal distributions as special cases and provides flexibility in capturing a wide variety of asymmetric behaviors. We implemented the ECM (Expectation/Conditional Maximization) algorithm and obtained closed-form expressions for all the parameter estimators of the proposed model. The proposed algorithm and methods are implemented in the new R package skewMLRM. Finally, a real data set is analyzed to show the usefulness of the package.

21/01/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Silvana Schneider (IME, UFRGS)

Title: An approach for long-term survival data with dependent censoring.

Abstract: In this paper, we propose a likelihood-based approach for long-term multivariate survival data that accommodates dependent censoring. The association between lifetimes and dependent censoring is accommodated through the conditional approach of frailty models. The marginal distributions can be modeled with Weibull or piecewise exponential (PE) distributions. A Monte Carlo Expectation-Maximization algorithm is developed to compute the proposed estimators. The simulation study shows a small relative bias and coverage probabilities near the nominal value. Finally, to evaluate the life dynamics of free-ranging dogs, taking into account all characteristics of the data, including long-term survival, we analyze the survival times of stray dogs in India.

14/01/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Rosangela H. Loschi (Department of Statistics, UFMG)

Title: Handling categorical features with many levels using a product partition model.

Abstract: A common difficulty in data analysis is how to handle categorical predictors with a large number of levels or categories. Few proposals have been developed to tackle this important and frequent problem. We introduce a generative model that simultaneously carries out the model fitting and the aggregation of the categorical levels into larger groups. We represent the categorical predictor by a graph whose nodes are the categories and establish a probability distribution over meaningful partitions of this graph. Conditional on the observed data, we obtain a posterior distribution for the aggregation of levels, allowing inference about the most probable clustering of the categories. Simultaneously, we obtain inference on all the other regression model parameters. We compare our method with state-of-the-art methods, showing that it has equally good predictive performance and yields more interpretable results. Our approach balances accuracy and interpretability, a current important concern in statistics and machine learning. Joint work with: Tulio Criscuolo (Google-USA), Renato Assunção (ESRI, USA), Wagner Meira (DCC, UFMG) and Danna Cruz (Universidad del Rosario, Co).

07/01/2022 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

P. Richard Hahn (Arizona State University, USA)

Title: Feature selection for causal effect estimation.

Abstract: This paper defines the notion of a minimal control function, on the basis of which a novel regression penalty is devised that is unbiased for average treatment effects. The development of the new approach combines insights from three distinct methodological traditions for studying causal effect estimation: potential outcomes, causal diagrams, and structural models with additive errors. It is demonstrated that naive feature selection and/or regularization approaches to treatment effect estimation can exhibit severe bias for average and conditional average treatment effects.

YEAR 2021 – 2nd SEMESTER

17/12/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Roger W. C. Silva (Department of Statistics – UFMG)

Title: Constrained-degree percolation in random environment.

Abstract: We consider the constrained-degree percolation model in a random environment on the square lattice. In this model, each vertex v has an independent random constraint κ_v, which takes the value j ∈ {0, 1, 2, 3} with probability ρ_j. Each edge e attempts to open at a uniform random time U_e in [0, 1], independently of all other edges. It succeeds if, at time U_e, both its end-vertices have degrees strictly smaller than their respective constraints. We show that this model undergoes a non-trivial phase transition when ρ_3 is sufficiently large. The proof consists of a decoupling inequality, the continuity of the probability of local events, and a coarse-graining argument. Joint work with Diogo Santos and Rémy Sanchis.
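The dynamics described above are straightforward to simulate on a finite box, which can help build intuition even though the results of the talk concern the infinite square lattice. This sketch assumes free boundary conditions on an n × n box; the function names and the example values of ρ are hypothetical.

```python
import random


def constrained_degree_percolation(n, rho, seed=0):
    """Constrained-degree percolation on an n x n box of Z^2 with free boundary.

    Each vertex v gets an independent constraint kappa_v = j with probability rho[j];
    each edge tries to open at an independent Uniform[0, 1] time and succeeds only if
    both end-vertices currently have degree strictly below their constraints."""
    rng = random.Random(seed)
    sites = [(i, j) for i in range(n) for j in range(n)]
    kappa = {v: rng.choices(range(len(rho)), weights=rho)[0] for v in sites}
    edges = [((i, j), (i + 1, j)) for i in range(n - 1) for j in range(n)]
    edges += [((i, j), (i, j + 1)) for i in range(n) for j in range(n - 1)]
    clock = sorted((rng.random(), e) for e in edges)  # process attempts in time order
    degree = {v: 0 for v in sites}
    open_edges = []
    for _, (u, v) in clock:
        if degree[u] < kappa[u] and degree[v] < kappa[v]:
            degree[u] += 1
            degree[v] += 1
            open_edges.append((u, v))
    return open_edges, degree, kappa


def largest_cluster(n, open_edges):
    """Size of the largest open cluster, via union-find with path halving."""
    parent = {(i, j): (i, j) for i in range(n) for j in range(n)}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in open_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for s in list(parent):
        r = find(s)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())


edges, degree, kappa = constrained_degree_percolation(30, [0.05, 0.05, 0.2, 0.7], seed=3)
empty_edges, _, _ = constrained_degree_percolation(5, [1.0, 0.0, 0.0, 0.0])
print(len(edges), largest_cluster(30, edges))
```

Repeating the run over increasing n and several values of ρ_3 gives a rough, finite-size picture of the transition; it is not a substitute for the proof.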

10/12/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Airlane P. Alencar (Department of Statistics, IME, USP).

Title: GARMA models for time series – modified GARMA and other distributions.

Abstract: In many real problems, we want to analyze whether there are changes in trend, seasonality, and covariate effects in time series. We can consider linear regression models and generalized linear models that account for autocorrelation, fitting regression models with SARMA errors and GARMA models (Benjamin et al. 2003). Because of multicollinearity, we propose a modified GARMA model (Albarracin et al. 2019). Considering other distributions for the usual GARMA models, such as the Conway-Maxwell Poisson (Melo and Alencar, 2020), allows for over-, under-, and equidispersion. Joint work with: Orlando Y.E. Albarracin (IME-USP), Moizes Melo (UFRN) and Linda Lee Ho (EP-USP).

03/12/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Helton Saulo B. Santos (Department of Statistics, UnB).

Title: Autoregressive conditional duration models.

Abstract: Autoregressive conditional duration (ACD) models are used mainly to handle duration data from financial transactions. Such data carry useful information about market activity. In this talk, the original ACD model and some variants are presented.
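The original exponential ACD(1,1) specification of Engle and Russell (1998) models a duration as x_i = ψ_i ε_i, with conditional mean ψ_i = ω + α x_{i−1} + β ψ_{i−1} and unit-mean exponential errors ε_i. A minimal simulation and quasi-log-likelihood sketch follows, assuming α + β < 1; the initialization of the recursion and the function names are our own choices.

```python
import math
import random


def simulate_eacd(n, omega, alpha, beta, seed=0):
    """Exponential ACD(1,1): x_i = psi_i * eps_i, eps_i ~ Exp(1),
    psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1} (requires alpha + beta < 1)."""
    rng = random.Random(seed)
    psi = omega / (1.0 - alpha - beta)  # start at the unconditional mean duration
    x_prev = psi
    xs, psis = [], []
    for _ in range(n):
        psi = omega + alpha * x_prev + beta * psi
        x = psi * rng.expovariate(1.0)
        xs.append(x)
        psis.append(psi)
        x_prev = x
    return xs, psis


def eacd_loglik(xs, omega, alpha, beta):
    """Exponential (quasi-)log-likelihood: -sum(log psi_i + x_i / psi_i)."""
    psi = x_prev = sum(xs) / len(xs)  # initialize the recursion at the sample mean
    ll = 0.0
    for x in xs:
        psi = omega + alpha * x_prev + beta * psi
        ll -= math.log(psi) + x / psi
        x_prev = x
    return ll


xs, psis = simulate_eacd(100_000, omega=0.1, alpha=0.1, beta=0.8, seed=7)
flat_xs, flat_psis = simulate_eacd(50, omega=0.5, alpha=0.0, beta=0.0)
print(sum(xs) / len(xs))  # should be near omega / (1 - alpha - beta) = 1
```

The structure mirrors GARCH(1,1), with durations in place of squared returns, which is why many ACD variants are obtained by transplanting GARCH extensions.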

26/11/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Rodrigo Lambert (FAMAT, UFU).

Title: The overlap function in the context of Poincaré recurrence.

Abstract: The study of first return times and of the overlap function in symbolic sequences has strong appeal for applications. From genetics, with the study of DNA sequences, to information theory, with the study of compression algorithms, the subject is a potentially useful tool for tackling problems from different areas of knowledge. In this talk, I will begin by motivating and giving the definitions of first return time and overlap function, and comment on some known results from the literature. Finally, I will present a recent result obtained with E. A. Rada-Mora (UFABC).
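For a finite word of length n, a common way to formalize self-overlap is the set of shifts k for which the suffix of length n − k equals the prefix of length n − k. In the full shift, the first return time of the corresponding cylinder to itself is the smallest such k, or n if there is none. The sketch below assumes these standard definitions, which may differ in detail from those used in the talk; the function names are hypothetical.

```python
def overlap_shifts(word):
    """Shifts k in 1..n-1 at which the word overlaps itself: word[k:] == word[:n - k]."""
    n = len(word)
    return [k for k in range(1, n) if word[k:] == word[:n - k]]


def first_return_time(word):
    """Smallest k >= 1 such that the cylinder [word] can return to itself after k steps
    (full shift): the smallest period of the word, or n if it has no self-overlap."""
    shifts = overlap_shifts(word)
    return shifts[0] if shifts else len(word)


for w in ["abab", "abc", "aaaa", "abaab"]:
    print(w, overlap_shifts(w), first_return_time(w))
```

A word such as "abc", with no self-overlap, cannot return before it has been read completely, whereas highly periodic words like "aaaa" return after a single step.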

19/11/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Vera Lúcia Damasceno Tomazella (Department of Statistics, UFSCar).

Title: Nonproportional hazards model with a frailty term for modeling subgroups with evidence of long-term survivors: Application to a lung cancer dataset.

Abstract: With advances in medical treatments for cancer, an increase in the life expectancy of patients undergoing new treatments is expected. Consequently, the field of statistics has evolved to offer increasingly flexible models that explain such results better. In this paper, we present a lung cancer dataset with some covariates that exhibit nonproportional hazards (NPHs). In addition, long‐term survivors are observed in some subgroups. The proposed model is based on the generalized time‐dependent logistic model, with a time effect for each subgroup and a random effect term (frailty). In practice, some essential covariates go unobserved for various reasons, and frailty models are useful for quantifying the resulting unobservable heterogeneity. The frailty distribution adopted is the weighted Lindley distribution, which has several interesting properties, such as a closed-form Laplace transform and a flexible probability density function. The proposed model allows for NPHs and long‐term survivors in subgroups. We illustrate the model by applying it to data from patients diagnosed with lung cancer in the state of São Paulo, Brazil.

12/11/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

João Batista de Morais Pereira (DME, UFRJ).

Title: Spatial confounding in hurdle multilevel beta models: the case of the Brazilian Mathematical Olympics for Public Schools.

Abstract: Among the many disparities for which Brazil is known is the difference in performance across students who attend the three administrative levels of Brazilian public schools: federal, state and municipal. Our main goal is to investigate whether student performance in the Brazilian Mathematical Olympics for Public Schools is associated with school administrative level and student gender. For this, we propose a hurdle hierarchical beta model for the scores of students who took the examination in the second phase of these Olympics in 2013. The mean of the beta model incorporates fixed and random effects at the student and school levels. We explore different distributions for the random school effect. As the posterior distributions of some fixed effects change in the presence, and with the distribution, of the random school effects, we also explore models that constrain the random school effects to the orthogonal complement of the fixed effects. We conclude that male students perform slightly better than female students and that, on average, federal schools perform substantially better than state or municipal schools. However, some of the best municipal and state schools perform as well as some federal schools. We hypothesize that this is due to individual teachers who successfully motivate and prepare their students to perform well in the mathematical Olympics. Joint work with Widemberg Nobre, Igor Silva and Alexandra Schmidt.

05/11/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Fernanda De Bastiani (Department of Statistics, UFPE).

Title: Flexible regression with GAMLSS.

Abstract: GAMLSS (Generalized Additive Models for Location, Scale and Shape) can be regarded as an appropriate regression tool for data sets in which the distribution of the response variable may be a very flexible parametric distribution (not restricted to the exponential family) and in which all parameters of the distribution (not only the mean) can be modeled using linear or smooth functions of the explanatory variables. GAMLSS provides a framework for addressing problems such as choosing an appropriate distribution for the response variable and explaining how this distribution, and its parameters, vary across different values of the explanatory variables. It considers different additive terms for modeling the distribution parameters, such as linear, nonparametric smoothing, and random-effect terms. Different model selection techniques and diagnostics for checking model adequacy will also be covered.

29/10/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Caio Lucidius Naberezny Azevedo (Department of Statistics, Unicamp).

Title: Bayesian longitudinal item response modeling with multivariate asymmetric serial dependencies.

Abstract: It is usually impossible to impose experimental conditions in large-scale longitudinal (observational) studies in education. This increases the risk of bias due, for instance, to unobserved heterogeneity, missing background variables, and dropouts. A flexible statistical model is required to suit the nature of observational assessment data and to account for unexplained heterogeneity. A general class of longitudinal item response theory (IRT) models is proposed, in which growth in performance can be monitored using a skewed multivariate normal distribution for the latent variables. Change in performance and unexplained heterogeneity are addressed through structured covariance patterns and skewed multivariate latent variable distributions. The Cholesky decomposition of the covariance matrix is used to model the dependence structure. A novel multivariate skew-normal distribution is defined by the antedependence model with centered skew-normal distributed errors. A hybrid MCMC approach is developed for parameter estimation, model-fit assessment, and model comparison. Results of simulation studies show good parameter recovery. A longitudinal assessment study by the Brazilian federal government is considered to show the performance of the general LIRT model. Joint work with: José Roberto S. Santos (Department of Statistics and Applied Mathematics, Federal University of Ceará – Brazil) and Jean-Paul Fox (Department of Research Methodology, Measurement and Data Analysis, University of Twente – Netherlands).

22/10/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Wagner Hugo Bonat (Department of Statistics, UFPR)

Title: Multivariate covariance generalized linear models with applications in R.

Abstract: In this talk I will present a recently proposed framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a generalized Kronecker product. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The covariance structure is modelled by means of a covariance link function combined with a matrix linear predictor involving known matrices. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. McGLMs provide a unified approach to a wide variety of types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal data. Furthermore, I present the computational implementation in R through the package mcglm. Illustrations include mixed models, longitudinal data analysis, spatial models for areal data, models for mixed outcomes, and multivariate models for count data using the Poisson-Tweedie distribution.

YEAR 2021 – 1st SEMESTER

10/09/2021 at 13:30 – Venue: YouTube channel Seminários DEST – UFMG

Alisson Carlos da Costa Silva (PhD student in Statistics, DEST/UFMG)

Title: Cure frailty models for survival data: Application to recurrences for breast cancer and to hospital readmissions for colorectal cancer.

Abstract: Owing to the natural evolution of a disease, several events often arise after a first treatment for the same subject. For example, patients with a primary invasive breast cancer treated with breast-conserving surgery may experience breast cancer recurrences, metastases or death. A certain proportion of subjects in the population who are not expected to experience the events of interest are considered to be ‘cured’ or non-susceptible. To model correlated failure time data incorporating a surviving fraction, we compare several forms of cure rate frailty models. In the first model, already proposed in the literature, non-susceptible patients are those who are not expected to experience the event of interest over a sufficiently long period of time. The other proposed models account for the possibility of cure after each event. We illustrate the cure frailty models with two data sets: the first is used to analyse time-dependent prognostic factors associated with breast cancer recurrences, metastases, new primary malignancy and death; the second to analyse successive rehospitalizations of patients diagnosed with colorectal cancer. Estimates were obtained by maximizing the likelihood with SAS proc NLMIXED for a piecewise constant hazards model. Compared with the simple frailty model, the proposed methods show great potential for modelling multivariate survival data with long-term survivors (‘cured’ individuals). Reference: Rondeau V, Schaffner E, Corbiere F, Gonzalez JR & Mathoulin-Pélissier S (2013), Statistical Methods in Medical Research, 22, 3, 243-260.

03/09/2021, exceptionally at 15:00 – Venue: YouTube channel Seminários DEST – UFMG

Ming-Hui Chen (University of Connecticut, USA)

Title: A power prior approach for leveraging external longitudinal and competing risks survival data within the joint modeling framework.

Abstract: In this paper, we propose a new partial borrowing-by-parts power prior for carrying out the analysis of co-longitudinal and survival data within the joint modeling framework. The borrowing-by-parts power prior facilitates borrowing information from a subset of the data, from a subset of the model parameters, or from different parts of the joint model. The deviance information criterion is used to quantify the gain in the fit of the current longitudinal and survival data when leveraging external co-data. A Markov chain Monte Carlo sampling algorithm is developed for carrying out Bayesian computations. The proposed methodology is motivated by two large concurrent clinical trials: the Selenium and Vitamin E Cancer Prevention Trial (SELECT) and the Prostate, Lung, Colon, Ovarian (PLCO) prevention trial. In both trials, longitudinal biomarkers and competing risks survival data were collected. A detailed analysis of the PLCO and SELECT data is conducted to demonstrate the usefulness of the proposed methodology. This is joint work with Md. Tuhin Sheikh, Jonathan A. Gelfond, and Joseph G. Ibrahim.

27/08/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Jun Yan (University of Connecticut, EUA)

Título: Brownian motion governed by telegraph process in modeling high-frequency financial series.

Resumo: The classic Markov regime-switching model is a discrete-time model, which cannot naturally handle irregularly spaced time series. We propose a continuous-time regime-switching model with two states. In each state, the process is a Brownian motion with state-specific drift and volatility. The unobserved states are characterized by a telegraph process with exponential holding times, which is a continuous-time Markov process. Inference for the model parameters with discretely observed time series is developed on the basis of the hidden Markov model. Closed-form expressions for the likelihood are obtained using dynamic programming together with occupation time results for telegraph processes. For high-frequency data, a fast approximation reduces the computing time drastically without much loss of accuracy. The performance of the method is validated in a simulation study. In an application to a collection of stock prices, the model is found to be competitive with the popular GARCH model.
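To make the model concrete, here is a minimal simulation sketch on a regular time grid, assuming two states with exit rates `rates[0]` and `rates[1]` (hypothetical names) and starting in state 0. It also shows why occupation times matter: given the time τ₀, τ₁ spent in each state during a step, the increment is Gaussian with mean μ₀τ₀ + μ₁τ₁ and variance σ₀²τ₀ + σ₁²τ₁, so the grid values are simulated exactly. This is not the speakers' estimation code.

```python
import math
import random


def simulate_telegraph_bm(t_end, dt, rates, drifts, vols, seed=0):
    """Brownian motion whose drift/volatility switch according to a two-state telegraph process.

    rates[s]: rate of leaving state s (exponential holding times).
    drifts[s], vols[s]: drift and volatility while in state s.
    Returns grid times, the state at each grid time, and the process values.
    """
    rng = random.Random(seed)
    state = 0
    time_to_switch = rng.expovariate(rates[state])  # residual holding time in current state
    x = 0.0
    times, states, path = [0.0], [state], [x]
    for k in range(1, int(round(t_end / dt)) + 1):
        occupation = [0.0, 0.0]  # time spent in each state during this step
        remaining = dt
        while time_to_switch < remaining:
            occupation[state] += time_to_switch
            remaining -= time_to_switch
            state = 1 - state
            time_to_switch = rng.expovariate(rates[state])
        occupation[state] += remaining
        time_to_switch -= remaining
        # Conditional on occupation times, the increment is exactly Gaussian.
        mean = drifts[0] * occupation[0] + drifts[1] * occupation[1]
        var = vols[0] ** 2 * occupation[0] + vols[1] ** 2 * occupation[1]
        x += rng.gauss(mean, math.sqrt(var))
        times.append(k * dt)
        states.append(state)
        path.append(x)
    return times, states, path
```

Observing only `path` at the grid times, with `states` hidden, gives the kind of discretely observed series the hidden Markov likelihood is built for.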

20/08/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Thais C. O. Fonseca (DME – IM, UFRJ)

Título: Can you render your Lattes? A Bayesian Network modelling of digital preservation risks.

Resumo: Digital records comprise primary sources which may be physical, born-digital or digitised. They are under threat from rapidly evolving technology, outdated policies, and a skills gap across the archives sector. Thus, the preservation of digital material is a challenge for which many archives feel underprepared and ill-equipped. This talk presents the results of the Safeguarding the Nation’s Memory Project which aimed to help archivists manage digital preservation risks through the creation of a new quantitative risk management framework. This project has produced the web-based app DiAGRAM (the Digital Archiving Graphical Risk Assessment Model) which quantifies the effect on preservation risk of various actions and interventions. This work brings Bayesian Network methods into the digital heritage sphere for the first time through close collaboration with specialists in this field. Soft elicitation was used to identify the most likely elements contributing to digital preservation and their interrelations. Where good quality data was not available, expert elicitation based on the IDEA protocol was applied.

13/08/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Fernando F. Nascimento (Depto. de Estatística, UFPI)

Título: A regression model for the tail and non-tail parts of threshold exceedance models, applied to maximum and minimum temperature data.

Resumo: Occurrences linked to significant climate change have increased in recent years. These changes may be influenced by a set of covariates, such as temperature, location and the time at which they occur. Analysing the relationship between the elements and factors that influence the behaviour of such events is extremely relevant for decision-making aimed at minimizing, or even avoiding, possible damage and losses. This work extends three models: that of Behrens et al. (2004), which uses a GPD for the tail and a Gamma distribution for the non-tail part; that of Nascimento (2012), which combines the Generalized Pareto Distribution (GPD) for data above a threshold with a mixture of Gammas for values below it; and that of Nascimento et al. (2011), which places a regression structure on all tail parameters in an extreme value analysis. Using maximum temperatures in cities of the United States and minimum temperatures in cities of the State of Rio de Janeiro, the aim of this work was to incorporate a regression structure for the parameters of the whole distribution, including the parameters of the distribution below the tail. The proposed model uses a Gamma distribution for values below the threshold and a GPD for values above it. Parameters were estimated by Markov chain Monte Carlo (MCMC). The model captures behaviour characteristic of every location and time of year and provides better predictive power for key extreme value quantities, such as extreme quantiles.
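The threshold mixture of the Behrens et al. (2004) type can be written as a single density: the Gamma density below a threshold u, and above u the GPD density of the excess weighted by the Gamma tail mass 1 − G(u). The sketch below assumes a shape/rate Gamma parameterization and hypothetical function names; it leaves out the regression structure and the MCMC fit. The Gamma CDF uses the series for the regularized lower incomplete gamma function, since Python's standard library does not provide one.

```python
import math


def gamma_cdf(x, shape, rate):
    """Gamma(shape, rate) CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (shape + n)
        total += term
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def gamma_pdf(x, shape, rate):
    if x <= 0:
        return 0.0
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(x) - rate * x - math.lgamma(shape))


def gpd_pdf(y, sigma, xi):
    """GPD density of an excess y >= 0 with scale sigma and shape xi."""
    if y < 0:
        return 0.0
    if xi == 0:
        return math.exp(-y / sigma) / sigma
    base = 1 + xi * y / sigma
    if base <= 0:  # beyond the upper endpoint when xi < 0
        return 0.0
    return base ** (-1 / xi - 1) / sigma


def gamma_gpd_pdf(x, shape, rate, u, sigma, xi):
    """Gamma bulk below threshold u; above u, the GPD density times the Gamma tail mass."""
    if x <= u:
        return gamma_pdf(x, shape, rate)
    return (1 - gamma_cdf(u, shape, rate)) * gpd_pdf(x - u, sigma, xi)
```

The weighting by 1 − G(u) makes the density integrate to one; in the regression extension described above, parameters such as `shape`, `rate`, `sigma` and `xi` would become functions of covariates like location and season.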

06/08/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Michelle F. Miranda (University of Victoria, Canadá)

Título: A computationally scalable Bayesian method for simultaneous detection of activation signatures and background connectivity for task fMRI data.

Resumo: Task-based functional magnetic resonance imaging (fMRI) studies are a powerful tool to understand human sensory, cognitive, and emotional processes. To optimally perform a task, the brain enters a task state, and it needs to maintain it throughout the task. It is hypothesized that this is done by brain modulation of task-dependent connection patterns. We will use the term “background connectivity” for the task-dependent modulations that are due to variations in ongoing brain activity instead of stimulus-driven activity. We propose a unified modelling approach to estimate activation signatures and background connectivity in the working-memory task of the Human Connectome Project. Our model involves a new hybrid tensor spatial-temporal basis strategy that enables scalable computing, yet it captures nearby and distant intervoxel correlation and long-memory temporal correlation. The spatial basis is a composite hybrid transform with two levels: the first accounts for within-ROI correlation, and the second between-ROI distant correlation. Our basis space model increases sensitivity for identifying activation signatures, partly driven by the induced background connectivity that itself can be summarized to reveal biological insights.

30/07/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Luis Mauricio Castro Cepero (PUC, Chile)

Título: Modelling point referenced spatial count data: a Poisson process approach.

Resumo: Random fields are useful mathematical tools for representing natural phenomena with complex dependence structures in space and/or time. In particular, the Gaussian random field is commonly used due to its attractive properties and mathematical tractability. However, this assumption can be restrictive when dealing with count data. To deal with this situation, we propose a random field with a Poisson marginal distribution, obtained by using a sequence of independent copies of a random field with an exponential marginal distribution as ‘inter-arrival times’ in the framework of counting renewal processes. Our proposal can be viewed as a spatial generalization of the Poisson process. Unlike the classical hierarchical Poisson Log-Gaussian model, our proposal generates a (non-)stationary random field that is mean-square continuous and has Poisson marginal distributions. For the proposed Poisson spatial random field, analytic expressions for the covariance function and the bivariate distribution are provided. In an extensive simulation study, we investigate the weighted pairwise likelihood as a method for estimating the Poisson random field parameters. Finally, the effectiveness of our methodology is illustrated by an analysis of reindeer pellet-group survey data, where a zero-inflated version of the proposed model is compared with zero-inflated Poisson Log-Gaussian and Poisson Gaussian copula models.
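The renewal construction rests on a classical fact: if inter-arrival times are independent Exponential(λ), the number of arrivals in [0, 1] is Poisson(λ). The sketch below checks this at a single location with independent exponentials. It does not reproduce the spatial dependence that the proposed field introduces by using copies of an exponential random field, and the function name is hypothetical.

```python
import random


def renewal_count(rate, rng, horizon=1.0):
    """Number of arrivals in [0, horizon] when inter-arrival times are Exponential(rate)."""
    t, count = rng.expovariate(rate), 0
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)
    return count


rng = random.Random(7)
draws = [renewal_count(3.0, rng) for _ in range(20000)]
print(sum(draws) / len(draws))  # close to 3.0, the Poisson mean
```

In the spatial version, the exponential inter-arrival times at nearby locations are correlated, which is what induces correlation between the resulting Poisson counts.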

23/07/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Leonardo Soares Bastos (Fiocruz, Rio de Janeiro)

Título: Nowcasting COVID-19 deaths and hospitalized cases in Brazil

Resumo: The coronavirus disease (COVID-19) pandemic continues to cause a massive burden in the world, especially in countries such as Brazil, with poor implementation of strategies to mitigate the transmission of SARS-CoV-2. The number of cases, severe cases, and deaths by COVID-19 are important indicators of how the COVID-19 epidemic is affecting a particular region and can be used by decision-makers to act in order to reduce morbidity and mortality. However, a common problem with surveillance data is reporting delays, whereby cases and deaths are recorded in the surveillance system days or even weeks after they occurred. Statistical models can estimate the actual number of cases, severe cases, and deaths by COVID-19 accounting for the delays (nowcasting). We proposed a Bayesian hierarchical model to nowcast deaths and hospitalised cases for Brazil and also for the 27 federal units. Finally, we provide some general discussion about the COVID-19 situation in Brazil.

16/07/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Marina Silva Paez (DME – UFRJ, Rio de Janeiro)

Título: Anisotropy via spatial deformation in different spatio-temporal geostatistical models.

Resumo: In this seminar I will present different classes of geostatistical models that handle anisotropy through deformation processes. In short, spatial deformation consists of a transformation from R² to R² that maps the geographic coordinates of the region of interest S (possibly anisotropic) to a new latent space D (isotropic by construction). The first proposal is a geostatistical model for univariate spatio-temporal phenomena that are non-stationary and exhibit outlying observations. We propose modelling through a Student-t process to describe heavy-tailed data, with separable spatial and temporal components. Temporal variation is incorporated through dynamic models, and the purely spatial component assumes dependence through the specification of a spatial correlation function. We handle anisotropy through the spatial deformation of Sampson and Guttorp (1992) and, since we adopt the Bayesian paradigm, we build on the approach of Schmidt and O’Hagan (2003). The second proposal deals with multivariate spatio-temporal models. We build on the modelling proposed by Paez et al. (2008), which presents a class of hierarchical dynamic models for matrix-variate observations (here, the matrix accounts for the space and time dimensions). Dynamic models are again used to handle temporal variation. To relax the isotropy assumption made in that work, this research proposes an extension of Paez et al. (2008) that accommodates anisotropic surfaces. As already mentioned, inference is Bayesian, and we use MCMC to sample from the posterior distribution of the model parameters. The models are first tested on simulated data and then applied to environmental data sets. Collaborators: Fidel E. C. Morales, Dimitris Politis, Jacek Leskow and Rodrigo Bulhões.

09/07/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Daniel Takata Gomes (ENCE – IBGE, Rio de Janeiro)

Título: Usain Bolt vs. Michael Phelps: computing a performance index in sports based on extreme value theory.

Resumo: The International Swimming Federation (FINA) uses a points system that allows results from different events to be compared. The system matters for several reasons: it is used as a criterion for awarding prizes in competitions and for selecting national teams. Points are assigned using only the world records of the official events as a reference. This work proposes a new index based on the probability distribution of the marks of the fastest swimmers in the history of each event. By Extreme Value Theory, this distribution, under certain conditions, converges to a generalized Pareto distribution. Comparisons are based on the exceedance probabilities of the swimmers' marks. Performances of athletes from different sports, namely athletics and swimming, are also compared, with the aim of assessing who achieved the more extreme result between two of the greatest names in sports history: the Jamaican Usain Bolt and the American Michael Phelps.
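The comparison rests on GPD exceedance probabilities. Under a GPD with scale σ and shape ξ for excesses over a threshold u, P(X > x | X > u) = (1 + ξ(x − u)/σ)^(−1/ξ), with the exponential limit exp(−(x − u)/σ) when ξ = 0. The sketch below is a minimal helper with a hypothetical name. It assumes marks have been oriented so that larger means better (for example, using negative times); the fitted parameters for each event would come from the data and are not shown here.

```python
import math


def gpd_exceedance(x, u, sigma, xi):
    """P(X > x | X > u) when excesses over threshold u follow a GPD(sigma, xi)."""
    if x <= u:
        return 1.0
    z = (x - u) / sigma
    if xi == 0:
        return math.exp(-z)
    base = 1 + xi * z
    if base <= 0:  # beyond the finite upper endpoint when xi < 0
        return 0.0
    return base ** (-1 / xi)
```

Each mark would be evaluated under the model fitted to its own event. The smaller exceedance probability then indicates the more extreme performance, which is what makes marks from different events, or different sports, comparable.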

02/07/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Florencia Leonardi (IME – USP, São Paulo)

Título: Interaction structure detection for discrete Markov random fields on graphs.

Resumo: Discrete Markov random fields on graphs, also known as graphical models in the statistical literature, have become popular in recent years owing to their flexibility in capturing conditional dependence relations between variables. They have been applied to many different problems in fields such as Biology, the Social Sciences and Neuroscience. Graphical models are, in a sense, “finite” versions of general random fields or Gibbs distributions, classical models in stochastic processes and statistical mechanics. In this talk I will address the problem of estimating the interaction structure of the variables (their conditional dependencies) through a penalized pseudo-likelihood criterion. First, we introduce a criterion for estimating the interaction neighbourhood of a single node, which is then combined with the other neighbourhoods to obtain an estimator of the underlying graph. I will show consistency results for the estimator that do not assume positivity of the conditional probabilities, as is usual in the literature. These results open the possibility of extending these models to sparse settings, where many parameters are zero. I will also present some ongoing extensions of these results to processes satisfying mixing-type conditions.

25/06/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Alexandra Mello Schmidt (McGill University, Canadá)

Título: A zero-state coupled Markov switching Poisson model for spatio-temporal infectious disease counts.

Resumo: Spatio-temporal counts of infectious disease cases often contain an excess of zeros. Existing zero-inflated Poisson models applied to such data do not adequately capture the switching of the disease between periods of presence and absence over time. As an alternative, we develop a new zero-state coupled Markov switching Poisson model, under which the disease switches between periods of presence and absence in each area through a series of partially hidden nonhomogeneous Markov chains coupled between neighboring locations. When the disease is present, an autoregressive Poisson model generates the cases, with a possible 0 representing the disease being undetected. Bayesian inference and prediction are illustrated using spatio-temporal counts of dengue fever cases in Rio de Janeiro, Brazil. This is joint work with Dirk Douwes-Schultz.

18/06/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Eduardo Fonseca Mendes (EMAp – FGV, Rio de Janeiro)

Título: Sparsity-dependent generalized information criteria for regularized M-estimators.

Resumo: Regularized M-estimators are widely used due to their ability to recover a low-dimensional model in high-dimensional scenarios. Some recent efforts on this subject have focused on creating a unified framework for establishing oracle bounds and deriving conditions for support recovery. Under this same framework, we propose a new Generalized Information Criterion (GIC) that takes into account the sparsity pattern one wishes to recover. We obtain sufficient conditions for model selection consistency of the GIC and path consistency of regularized M-estimators. In other words, we show that, under conditions on the penalty function, the GIC can be used to select the regularization parameter so that the sequence of model subspaces contains the true model with probability converging to one. This allows practical use of the GIC for model selection in high-dimensional scenarios. We illustrate these conditions on examples including LASSO regression and group-sparse generalized linear regression.

11/06/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Mariana Cúri (ICMC – USP, São Carlos)

Título: Role of deep learning in multidimensional item theory models with correlated latent variables.

Resumo: Artificial neural networks with a specific autoencoding structure are capable of estimating parameters for the Multidimensional Logistic 2-Parameter (ML2P) model in Item Response Theory, but with limitations, such as uncorrelated latent traits. In this work, we extend variational autoencoders (VAE) to estimate item parameters and correlated latent abilities, and directly compare the ML2P-VAE method to more traditional parameter estimation methods. In addition, we show that the ML2P-VAE method is capable of estimating parameters for models with high numbers of latent variables with low computational cost, where traditional methods are infeasible for high dimensionality.
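For readers unfamiliar with the ML2P model, the sketch below writes out its item response function and draws a pair of correlated latent abilities through a Cholesky factor. It assumes the form P(correct | θ) = sigmoid(Σₖ aₖθₖ − b); sign conventions for the item parameter b vary across papers. The function names are hypothetical. The VAE estimation procedure from the talk is not shown.

```python
import math
import random


def ml2p_prob(theta, a, b):
    """ML2P item response function: P(correct | theta) = sigmoid(sum_k a_k * theta_k - b)."""
    logit = sum(ak * tk for ak, tk in zip(a, theta)) - b
    return 1.0 / (1.0 + math.exp(-logit))


def correlated_abilities(rho, rng):
    """Draw a 2-D standard normal ability vector with correlation rho (Cholesky factor)."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return [z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2]


rng = random.Random(3)
theta = correlated_abilities(0.6, rng)
p = ml2p_prob(theta, a=[1.2, 0.8], b=0.5)  # hypothetical item discriminations and difficulty
response = int(rng.random() < p)  # simulated 0/1 answer to this item
```

The correlation between latent traits is the feature that earlier autoencoder-based estimators could not handle. Simulated data like this is a common way to check whether an estimator recovers both the item parameters and that correlation.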

04/06/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Wagner Barreto de Souza (KAUST, Arábia Saudita)

Título: Flexible bivariate INGARCH process with a broad range of contemporaneous correlation.

Resumo: We propose a novel flexible bivariate conditional Poisson (BCP) INteger-valued Generalized AutoRegressive Conditional Heteroscedastic (INGARCH) model for correlated count time series data. Our proposed BCP-INGARCH model is mathematically tractable, and its main advantage over existing bivariate INGARCH models is its ability to capture a broad range (both negative and positive) of contemporaneous cross-correlation, which is a non-trivial advancement. Properties of stationarity and ergodicity for the BCP-INGARCH process are developed. Parameters are estimated by conditional maximum likelihood (CML), and the finite-sample behavior of the estimators is investigated through simulation studies. Asymptotic properties of the CML estimators are derived. Additional simulation studies compare methods of obtaining standard errors of the parameter estimates, where a bootstrap option is shown to be advantageous. Hypothesis testing methods for the presence of contemporaneous correlation between the time series are presented and evaluated. We apply our methodology to monthly counts of hepatitis cases in two nearby Brazilian cities, which are highly cross-correlated. The data analysis demonstrates the importance of considering a bivariate model allowing for a wide range of contemporaneous correlation in real-life applications. Joint work with Luiza S.C. Piancastelli (UCD-Ireland) and Hernando Ombao (KAUST-Saudi Arabia). ArXiv link: https://arxiv.org/pdf/2011.08799.pdf.
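The building block behind INGARCH models is a conditional Poisson count whose intensity follows a GARCH-like recursion. The sketch below simulates only the univariate Poisson INGARCH(1,1), λₜ = ω + αyₜ₋₁ + βλₜ₋₁, with hypothetical parameter values. It does not implement the bivariate BCP-INGARCH model or its contemporaneous correlation. With α + β < 1, the stationary mean is ω / (1 − α − β).

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small intensities used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_ingarch11(n, omega, alpha, beta, seed=0):
    """Univariate Poisson INGARCH(1,1): lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1}."""
    if alpha + beta >= 1:
        raise ValueError("need alpha + beta < 1 for a stationary mean")
    rng = random.Random(seed)
    lam = omega / (1 - alpha - beta)  # start at the stationary mean
    y = poisson_draw(lam, rng)
    counts, intensities = [y], [lam]
    for _ in range(n - 1):
        lam = omega + alpha * y + beta * lam
        y = poisson_draw(lam, rng)
        counts.append(y)
        intensities.append(lam)
    return counts, intensities
```

A bivariate extension needs a joint conditional distribution for the two counts given their intensities. The choice of that distribution is what determines which range of contemporaneous correlation the model can reach.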

28/05/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Eduardo Gutiérrez-Peña (Universidad Nacional Autónoma de México)

Título: General dependence structures for some models based on exponential families with quadratic variance functions.

Resumo: We describe a procedure to introduce general dependence structures on a set of random variables. These include order-q moving average-type structures, as well as seasonal, periodic and spatial dependencies. The invariant marginal distribution can be in any family that is conjugate to an exponential family with quadratic variance functions. Dependence is induced via latent variables whose conditional distribution mirrors the sampling distribution in a Bayesian conjugate analysis of such exponential families. We obtain strict stationarity as a special case. Joint work with Luis E. Nieto-Barajas, ITAM.
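The simplest instance of the latent-variable idea is a first-order chain with Gamma marginals and Poisson latent variables. Suppose xₜ ~ Gamma(a, b) and yₜ | xₜ ~ Poisson(c·xₜ). If xₜ₊₁ | yₜ is drawn from the conjugate posterior Gamma(a + yₜ, b + c), then xₜ₊₁ again has the Gamma(a, b) marginal. The sketch below assumes a shape/rate parameterization and hypothetical names. It covers only this order-1 case, not the moving-average, seasonal or spatial structures in the talk.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate intensities."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_gamma_poisson_chain(n, a, b, c, seed=0):
    """Stationary chain with Gamma(shape a, rate b) marginals, linked through Poisson latents.

    y_t | x_t ~ Poisson(c * x_t) plays the role of the sampling distribution, and
    x_{t+1} | y_t ~ Gamma(a + y_t, b + c) is the conjugate posterior of x given y.
    """
    rng = random.Random(seed)
    x = rng.gammavariate(a, 1.0 / b)  # gammavariate takes shape and *scale*
    path = [x]
    for _ in range(n - 1):
        y = poisson_draw(c * x, rng)
        x = rng.gammavariate(a + y, 1.0 / (b + c))
        path.append(x)
    return path
```

Under this construction the lag-1 correlation is c / (b + c), so c controls the strength of dependence while the marginal stays fixed.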

ANO DE 2020 – 2º SEMESTRE

26/03/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Danilo Gilberto de Oliveira Valadares (Doutorando – DEST/UFMG)

Título: Optimal maintenance time for repairable systems.

Resumo: A repairable system is one that, when a failure occurs, is not discarded but restored to a given operating condition through an adjustment/repair process. This work discusses minimal repairs after failures and preventive repairs at predetermined times, with the goal of finding the preventive maintenance interval that minimizes the expected maintenance cost. After a minimal repair, the equipment returns to operation as good as old; after preventive maintenance, it returns to operation as good as new. The failure process was modelled as a Non-Homogeneous Poisson Process with a Power Law failure intensity function, whose parameters were estimated using the classical (frequentist) approach. The results were illustrated with a real data set containing the failure history of power transformers.
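Under the assumptions stated above (minimal repair leaves the equipment as good as old, preventive maintenance makes it as good as new), the expected number of failures in a maintenance cycle of length τ is the power-law mean function (τ/η)^β. The expected cost per unit time is therefore C(τ) = (C_pm + C_mr(τ/η)^β)/τ. Setting C′(τ) = 0 gives τ* = η(C_pm / ((β − 1)C_mr))^(1/β), which is finite only when β > 1. The sketch below implements this and the time-truncated maximum likelihood estimates β̂ = n / Σ ln(T/tᵢ) and η̂ = T / n^(1/β̂). The cost names `c_pm` and `c_mr` are hypothetical, and this is not necessarily the exact cost model used in the talk.

```python
import math


def power_law_mle(failure_times, horizon):
    """MLEs (beta, eta) of a power-law NHPP observed on [0, horizon] (time-truncated data).

    Intensity: (beta / eta) * (t / eta) ** (beta - 1); mean function: (t / eta) ** beta.
    """
    n = len(failure_times)
    beta = n / sum(math.log(horizon / t) for t in failure_times)
    eta = horizon / n ** (1.0 / beta)
    return beta, eta


def cost_rate(tau, beta, eta, c_pm, c_mr):
    """Expected cost per unit time: one preventive maintenance plus minimal repairs per cycle."""
    return (c_pm + c_mr * (tau / eta) ** beta) / tau


def optimal_pm_interval(beta, eta, c_pm, c_mr):
    """Minimizer of cost_rate; a finite optimum exists only when failures accelerate (beta > 1)."""
    if beta <= 1:
        raise ValueError("no finite optimum unless beta > 1")
    return eta * (c_pm / ((beta - 1) * c_mr)) ** (1.0 / beta)
```

In practice, β̂ and η̂ from the failure history would be plugged into `optimal_pm_interval` together with the relative costs of a preventive action and a minimal repair.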

26/03/2021 às 14:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Renata Mendonça Rodrigues Vasconcelos (Doutoranda – DEST/UFMG)

Título: Risk-adjusted monitoring of time to event in the presence of long-term survivors.

Resumo: Control charts are very useful tools in Statistical Process Control (SPC) because they help detect changes in production quality and allow investigation of the possible causes present in the process. The risk-adjusted CUSUM (cumulative sum) arises in this context to incorporate each individual's specific risk through regression structures. From this perspective, this work considered a setting in which patients undergoing medical procedures may have different risks of death depending on their individual characteristics. A risk-adjusted CUSUM control chart (RAST CUSUM) was then proposed for monitoring patients' survival times, incorporating standard parametric survival models. However, these models do not allow for the possibility that a patient is cured. The risk-adjusted RACUF CUSUM chart was proposed, based on a cure fraction model, as an extension of the RAST CUSUM for monitoring survival data with a cure fraction. The proposed control chart is illustrated with simulated data and with a real data set of heart failure patients treated at the Instituto do Coração (InCor), Universidade de São Paulo, Brazil.
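A risk-adjusted CUSUM accumulates patient-level log-likelihood ratios. Consider testing a proportional increase ρ in hazard against a patient-specific baseline hazard with cumulative hazard Λᵢ(t) (risk-adjusted through covariates). A patient with death indicator δ contributes W = δ·log ρ − (ρ − 1)Λᵢ(t), and the chart is Sᵢ = max(0, Sᵢ₋₁ + Wᵢ), signalling when Sᵢ > h. The sketch below treats each patient once, at their final follow-up, which is a simplification. The names are hypothetical, and the cure-fraction likelihood of the RACUF CUSUM is not implemented.

```python
import math


def risk_adjusted_cusum(patients, rho, h):
    """Upper CUSUM on log-likelihood-ratio weights for a proportional hazard shift rho.

    patients: sequence of (cum_hazard, died), where cum_hazard is the patient's
    risk-adjusted baseline cumulative hazard at the observed time and died is 0 or 1.
    Returns the CUSUM path and the index of the first signal (None if no signal).
    """
    s, path, signal = 0.0, [], None
    for i, (cum_hazard, died) in enumerate(patients):
        w = died * math.log(rho) - (rho - 1.0) * cum_hazard
        s = max(0.0, s + w)
        path.append(s)
        if signal is None and s > h:
            signal = i
    return path, signal
```

Deaths of low-risk patients (small Λᵢ) push the statistic up most, while long survival of high-risk patients pulls it back towards zero. This is how risk adjustment keeps case mix from triggering false alarms.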

19/03/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Alvaro Alexander Burbano Moreno (Doutorando – DEST/UFMG)

Título: Hierarchical Bayesian models for predicting spatially correlated curves.

Resumo: Functional data analysis (FDA) has emerged as a new area of statistical research with many applications. In FDA the units are functions or curves, and the observed discrete data are converted into functions using various smoothing procedures. These data are then analysed using traditional statistical methods to extract information from the functions. In some FDA applications the assumption of conditional independence is reasonable; however, it may not hold in spatial settings. In this article the authors present new wavelet-based Bayesian models for spatially correlated functional data. These models allow the observed curves to be regularized in space and curves to be predicted at unobserved locations. Performance is compared across several prior distributions for the wavelet coefficients using a posterior predictive criterion. The proposal is illustrated with data measuring porosity at several depths of soil boreholes.

19/03/2021 às 14:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Cássius Henrique Xavier Oliveira (Doutorando – DEST/UFMG)

Título: A Bayesian joint model of recurrent events and a terminal event.

Resumo: Recurrent events can be stopped by a terminal event, a situation that commonly occurs in biomedical and clinical studies. Here, dependent censoring arises because of potential dependence between the two event processes, leading to invalid inference if the recurrent events are analysed alone. The joint frailty model is one of the most widely used approaches to jointly model these two processes by sharing the same frailty term. One important assumption is that the recurrent and terminal event processes are conditionally independent given the subject-level frailty; however, this can be violated when the dependence also varies with time-varying covariates across recurrences. Furthermore, under traditional frailty modelling the marginal correlation between the two event processes has no closed-form expression and only a vague interpretation. To fill these gaps, we propose a novel joint frailty-copula approach to model recurrent events and a terminal event under relaxed assumptions. A Metropolis–Hastings-within-Gibbs sampler is used for parameter estimation. Extensive simulation studies are conducted to evaluate the efficiency, robustness and predictive performance of our proposal. The simulation results show that, compared with the joint frailty model, the proposal has smaller bias and mean squared error when the conditional independence assumption is violated. Finally, we apply our method to a real example extracted from the MarketScan database to study the association between recurrent strokes and mortality.

12/03/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Leonardo Angelo Soares da Silva (Doutorando – DEST/UFMG)

Título: A new bound for the acyclic edge chromatic number.

Resumo: This talk presents a new upper bound on the acyclic edge chromatic number a'(G) (the minimum number of colors in a proper edge coloring with no two-colored cycle) of a graph G with maximum degree Δ, namely a'(G) ≤ 3.569(Δ − 1). The starting point is a probabilistic analysis of a similar algorithm, carried out earlier by Giotis et al., which gave the bound a'(G) ≤ 3.74(Δ − 1). The authors revisited and slightly modified the method of Giotis et al., thereby improving the bound on a'(G).
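To make the definition of a'(G) concrete, the sketch below checks whether a given edge coloring is acyclic. The coloring must be proper, and for every pair of colors the edges using those two colors must form a forest, which union-find detects. The function names are hypothetical. It only verifies the definition and does not implement the randomized algorithm whose analysis yields the bound.

```python
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_acyclic_edge_coloring(edges, color):
    """True if `color` (edge -> color) is proper and contains no two-colored cycle."""
    used = set()
    for edge in edges:
        for vertex in edge:
            key = (vertex, color[edge])
            if key in used:  # two edges at this vertex share a color
                return False
            used.add(key)
    for c1, c2 in combinations(set(color.values()), 2):
        parent = {}
        for u, v in edges:
            if color[(u, v)] in (c1, c2):
                ru, rv = _find(parent, u), _find(parent, v)
                if ru == rv:  # this edge closes a cycle using only colors c1 and c2
                    return False
                parent[ru] = rv
    return True


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_acyclic_edge_coloring(cycle, dict(zip(cycle, "abab"))))  # two-colored cycle
print(is_acyclic_edge_coloring(cycle, dict(zip(cycle, "abac"))))
```

For the 4-cycle (Δ = 2), any proper 2-edge-coloring is itself a two-colored cycle, so a'(C₄) = 3, which is within the 3.569(Δ − 1) bound.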

05/03/2021 às 14:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Larissa Natany Almeida Martins (Doutoranda – DEST/UFMG)

Título: A Bayesian network approach for population synthesis.

Resumo: Agent-based micro-simulation models require a complete list of agents with detailed demographic/socioeconomic information for the purpose of behavior modeling and simulation. This paper introduces a new alternative for population synthesis based on Bayesian networks. A Bayesian network is a graphical representation of a joint probability distribution, encoding probabilistic relationships among a set of variables in an efficient way. Similar to the previously developed probabilistic approach, in this paper we consider the population synthesis problem to be the inference of a joint probability distribution. In this sense, the Bayesian network model becomes an efficient tool that allows us to compactly represent and reproduce the structure of the population system while preserving privacy and confidentiality. We demonstrate and assess the performance of this approach in generating a synthetic population for Singapore, using the Household Interview Travel Survey (HITS) data as the known test population. Our results show that the Bayesian network approach is powerful in characterizing the underlying joint distribution, while overfitting of the data is avoided as far as possible.
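Once a Bayesian network has been learned, synthetic agents can be generated by ancestral sampling: visit the variables in a topological order and draw each one given its already-sampled parents. The joint probability factorizes as the product of P(node | parents). The network below (age, work, car) and all its probabilities are made up for illustration, and so are the function names. In the paper, the structure and parameters would be learned from survey microdata such as HITS.

```python
import random

# Hypothetical network: age -> work, and (age, work) -> car; each variable has two string labels.
PARENTS = {"age": (), "work": ("age",), "car": ("age", "work")}
ORDER = ["age", "work", "car"]  # a topological order of the DAG
CPT = {
    "age": {(): {"young": 0.4, "old": 0.6}},
    "work": {("young",): {"yes": 0.7, "no": 0.3}, ("old",): {"yes": 0.5, "no": 0.5}},
    "car": {
        ("young", "yes"): {"yes": 0.4, "no": 0.6},
        ("young", "no"): {"yes": 0.1, "no": 0.9},
        ("old", "yes"): {"yes": 0.8, "no": 0.2},
        ("old", "no"): {"yes": 0.5, "no": 0.5},
    },
}


def joint_prob(record):
    """Factorized joint probability: product over nodes of P(node | parents)."""
    p = 1.0
    for var in ORDER:
        key = tuple(record[q] for q in PARENTS[var])
        p *= CPT[var][key][record[var]]
    return p


def sample_agent(rng):
    """Ancestral sampling: draw each variable given its already-sampled parents."""
    record = {}
    for var in ORDER:
        dist = CPT[var][tuple(record[q] for q in PARENTS[var])]
        values, weights = zip(*dist.items())
        record[var] = rng.choices(values, weights=weights)[0]
    return record


rng = random.Random(2)
synthetic_population = [sample_agent(rng) for _ in range(1000)]
```

Because each agent is drawn from the fitted distribution rather than copied from survey records, this style of generation is what allows the approach to preserve confidentiality.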

05/03/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Marcio Augusto Ferreira Rodrigues (Doutorando – DEST/UFMG)

Título: Semiparametric regression analysis of interval-censored competing risks data.

Resumo: Interval-censored competing risks data arise when each study subject may experience an event or failure from one of several causes and the failure time is not observed directly but rather is known to lie in an interval between two examinations. We formulate the effects of possibly time-varying (external) covariates on the cumulative incidence or sub-distribution function of competing risks (i.e., the marginal probability of failure from a specific cause) through a broad class of semiparametric regression models that captures both proportional and non-proportional hazards structures for the sub-distribution. We allow each subject to have an arbitrary number of examinations and accommodate missing information on the cause of failure. We consider nonparametric maximum likelihood estimation and devise a fast and stable EM-type algorithm for its computation. We then establish the consistency, asymptotic normality, and semiparametric efficiency of the resulting estimators for the regression parameters by appealing to modern empirical process theory. In addition, we show through extensive simulation studies that the proposed methods perform well in realistic situations. Finally, we provide an application to a study on HIV-1 infection with different viral subtypes.

26/02/2021 às 13:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Ana Júlia Alves Câmara (Doutoranda – DEST/UFMG)

Título: On generalized additive models with dependent time series covariate.

Resumo: The generalized additive model (GAM) is a standard statistical methodology and is frequently used in various fields of applied data analysis where the response variable is non-normal, e.g., integer valued, and the explanatory variables are continuous, typically normally distributed. Standard assumptions of this model, among others, are that the explanatory variables are independent and identically distributed vectors which are not multicollinear. To handle multicollinearity and serial dependence together, a new hybrid model, called the GAM-PCA-VAR model, was proposed in [17]; it combines the GAM with principal component analysis (PCA) and the vector autoregressive (VAR) model. In this paper, some properties of the GAM-PCA-VAR model are discussed theoretically and verified by simulation. A real data set is also analysed with the aim of describing the association between respiratory disease and air pollution concentrations.

19/02/2021 às 14:30 hs – Local: Canal do Youtube: Seminários DEST – UFMG

Marta Cristina Colozza Bianchi (Doutoranda – DEST/UFMG)

Título: Mixture models with Markovian dependence for identifying atypical observations in irregularly spaced time series.

Resumo: This seminar presents two Bayesian mixture models with Markovian dependence. The modelling is motivated by two applications involving thousands of gene expression measurements in tumours of some types of cancer, whose locations are mapped on chromosomes, defining irregularly spaced series. This type of model was proposed in Mayrink and Gonçalves (2017) with an application to microarray data, and extended in Mayrink and Gonçalves (2020) with an application to RNA-Seq data. In both studies, the goal is to identify atypical observations. In the microarray context, the aim is to detect genomic regions associated with high expression values (overexpression), which define clusters of consecutive observations. In the RNA-Seq analysis, the goal is to find two types of chromosomal regions: overexpression and underexpression. High or low gene expression is important for studying cancer progression, since it identifies regions containing genes with differential activity in the disease. In Mayrink and Gonçalves (2017), the model is a mixture of distributions with ordered means such that the last component accommodates overexpressed genes. In the 2020 work, the first and last mixture components capture the underexpressed and overexpressed genes, respectively. The model is flexible enough to handle the irregular spacing of the data efficiently, using the distance between neighbouring expressions to infer whether Markovian dependence is present. This dependence plays a key role in detecting the regions of interest. Bayesian inference is carried out by indirect sampling via MCMC algorithms.

19/02/2021 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Magda Carvalho Pires – DEST/UFMG (joint work with Milena S. Marcolino, Lucas E. F. Ramos, Rafael T. Silva, Luana M. Oliveira et al.)

Title: ABC₂-SPH risk score for in-hospital mortality in COVID-19 patients: development, external validation and comparison with other available scores.

Abstract: Coronavirus disease 2019 (COVID-19), caused by the SARS-CoV-2 virus, is still the main global health, social and economic challenge. In this context, a fast and efficient assessment of disease prognosis is needed to optimize the allocation of health care and human resources and to enable early identification of, and intervention for, patients at higher risk of a poor outcome. Rapid scoring systems, which combine different variables to estimate the risk of a poor outcome, may therefore be extremely helpful for assessing these patients quickly and effectively in the emergency department. Following international guidelines, generalized additive models and LASSO logistic regression were used to develop a prediction model for in-hospital mortality, based on 3978 patients admitted between March and July 2020. The model was validated in 1054 patients admitted from August to 30 September 2020, as well as in an external cohort of 474 Spanish patients. Our ABC2-SPH score showed good discrimination, calibration and overall performance in both Brazilian cohorts, but in the Spanish cohort mortality was somewhat underestimated in patients at very high (>25%) risk. The ABC2-SPH score is implemented in a freely available online risk calculator (https://abc2sph.com/).

Preprint: www.medrxiv.org/content/10.1101/2021.02.01.21250306v1

05/02/2021 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Joseph Lucas (Senior Research Scientist at Caravan Health, USA)

Title: A practical guide to prediction using temporal event data.

Abstract: We look at some practical aspects of prediction using (potentially high-dimensional) temporal event data. The talk will touch on (i) feature extraction, (ii) overfitting, (iii) using a model-agnostic approach, (iv) variable importance, and (v) managing computing resources. We will demonstrate the techniques by building models to predict device failures from connected monitors (low-dimensional) and to predict end-of-life events from medical records and claims data (high-dimensional).

29/01/2021 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Fabrizio Ruggeri (CNR IMATI, Italy)

Title: Likelihood-Free Parameter Estimation for Dynamic Queueing Networks: the case of the immigration queue at an international airport.

Abstract:

Many complex real-world systems, such as airport terminals, manufacturing processes and hospitals, are modelled with networks of queues. To estimate parameters, restrictive assumptions are usually placed on these models; for instance, arrival and service distributions are assumed to be time-invariant. So-called dynamic queueing networks (DQNs) violate this assumption: they are more realistic but do not allow for likelihood-based parameter estimation. We consider the problem of using data to estimate the parameters of a DQN. This is the first example of Approximate Bayesian Computation (ABC) being used for parameter inference in DQNs. We combine computationally efficient simulation of DQNs with ABC and an estimator of the maximum mean discrepancy. DQNs are simulated efficiently with Queue Departure Computation (a simulation technique we propose), without the need for time-invariance assumptions, and parameters are inferred from data without strict data-collection requirements. Forecasts are made that account for parameter uncertainty. We embed this queueing simulation within an ABC sampler to estimate the parameters of DQNs in a straightforward manner. We motivate and demonstrate this work with the example of passengers arriving at passport control in an international airport.

Joint work with Anthony Ebert, Kerrie Mengersen, Paul Wu, Antonietta Mira and Ritabrata Dutta. Available: https://arxiv.org/abs/1804.02526
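The abstract pairs a queue simulator with Approximate Bayesian Computation. As a much simpler illustration of that pairing (not the speakers' Queue Departure Computation, and using a crude summary distance instead of their maximum mean discrepancy estimator), the sketch below runs plain ABC rejection for a single first-in-first-out server with exponential service times. Departure times follow the standard single-server recursion d_i = max(a_i, d_{i-1}) + s_i. The uniform prior, the mean-sojourn summary statistic, the tolerance and all function names are illustrative assumptions.

```python
import random
import statistics

# Illustrative sketch only: plain ABC rejection for ONE first-in-first-out server.
# It is not the speakers' Queue Departure Computation or their MMD-based sampler;
# the prior, summary statistic and tolerance below are assumptions.


def simulate_departures(arrivals, service_rate, rng):
    """Departure times of a single FIFO server: d_i = max(a_i, d_{i-1}) + s_i."""
    departures, last = [], 0.0
    for a in arrivals:
        start = max(a, last)  # service begins once the customer is present and the server is free
        last = start + rng.expovariate(service_rate)  # exponential service time s_i
        departures.append(last)
    return departures


def mean_sojourn(arrivals, departures):
    """Summary statistic: average time in the system (waiting plus service)."""
    return statistics.fmean(d - a for a, d in zip(arrivals, departures))


def abc_rejection(arrivals, observed_summary, n_draws, tol, rng):
    """Keep prior draws of the service rate whose simulated summary lies within tol."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.5, 5.0)  # assumed Uniform(0.5, 5) prior on the service rate
        simulated = simulate_departures(arrivals, rate, rng)
        if abs(mean_sojourn(arrivals, simulated) - observed_summary) < tol:
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic "observed" data: Poisson arrivals at rate 1, true service rate 2.
    arrivals, t = [], 0.0
    for _ in range(200):
        t += rng.expovariate(1.0)
        arrivals.append(t)
    observed = mean_sojourn(arrivals, simulate_departures(arrivals, 2.0, rng))
    posterior = abc_rejection(arrivals, observed, n_draws=2000, tol=0.1, rng=rng)
    print(len(posterior), "accepted draws")
    if posterior:
        print("approximate posterior mean of service rate:", statistics.fmean(posterior))
```

Shrinking `tol` trades acceptance rate for accuracy; the approach described in the abstract replaces this single summary with a distance between whole simulated and observed data sets.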

22/01/2021 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Francisco Cribari-Neto (Department of Statistics, UFPE)

Title: Improved testing inferences for beta regressions with parametric mean link function.

Abstract:

Beta regressions are widely used for modeling random variables that assume values in the standard unit interval, (0,1), such as rates, proportions, and income concentration indices. Parameter estimation is typically performed via maximum likelihood, and hypothesis testing inferences on the model parameters are commonly carried out using the likelihood ratio test. Such a test, however, may deliver inaccurate inferences when the sample size is small. It is thus important to develop alternative testing procedures that are more accurate when the sample contains only a few observations. In this paper, we introduce the beta regression model with parametric mean link function and derive two modified likelihood ratio test statistics for that class of models. We provide simulation evidence showing that the new tests usually outperform the standard likelihood ratio test in samples of small to moderate size. We also present and discuss two empirical applications.

15/01/2021 at 14:30 – Venue: YouTube channel: Seminários DEST – UFMG

Manuel Galea (Pontificia Universidad Católica de Chile)

Title: Robust inference in the Capital Assets Pricing Model using the multivariate t-distribution.

Abstract:

In this work we consider the Capital Asset Pricing Model under the multivariate t-distribution with finite second moment. This distribution, which includes the normal distribution as a limiting case, offers a more flexible framework for modeling asset returns. The main objective is to develop statistical inference tools, such as parameter estimation and linear hypothesis tests, for asset pricing models, with an emphasis on the Capital Asset Pricing Model (CAPM). An extension of the CAPM, the Multifactor Asset Pricing Model (MAPM), is also discussed. A simple algorithm to estimate the model parameters, including the kurtosis parameter, is implemented. Analytical expressions for the score function and the Fisher information matrix are provided. For linear hypothesis tests, the four most widely used tests (likelihood-ratio, Wald, score, and gradient statistics) are considered. To test mean-variance efficiency, explicit expressions for these four test statistics are also presented. The results are illustrated using two real data sets: one from the Chilean stock market and another from the New York Stock Exchange. In both data sets, the asset pricing model under the multivariate t-distribution fits well, clearly better than the asset pricing model under the normality assumption.

08/01/2021 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Ivair Silva (UFOP)

Title: Fixed-Length Confidence Intervals Following a Sequential Test with Binomial Data.

Abstract:

Sample size and time to detect a signal are key performance measures in statistical sequential hypothesis testing. While the former should be optimized in Phase III clinical trials, minimizing the latter is of major importance in post-market drug and vaccine safety surveillance of adverse events. In practice, however, even when strong evidence indicates that surveillance could be stopped to reach a test decision, it may be necessary to continue collecting data to improve the precision of the point estimator. For binomial data, this paper presents a linear programming framework to find the optimal alpha spending that provides fixed-width and fixed-accuracy confidence intervals for the relative risk parameter. The solution permits minimization of the expected time to signal or of the expected sample size, as needed. In addition, the method is extended to group sequential testing with variable Bernoulli probabilities. To illustrate, we use simulated data mimicking actual clinical trials of experimental COVID-19 treatments.

YEAR 2020 – 1ST SEMESTER

11/12/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Marcos Prates (DEST-UFMG)

Title: Spatial Confounding Beyond Generalized Linear Mixed Models: Extension to Shared Components and Spatial Frailty Models.

Abstract:

Spatial confounding is defined as the confounding between the fixed and spatial random effects in generalized linear mixed models (GLMMs). It has gained attention in recent years, as it may generate unexpected results in modeling. We introduce solutions to alleviate spatial confounding beyond GLMMs for two families of statistical models. In shared component models, multiple count responses are recorded at each spatial location and may exhibit similar spatial patterns; therefore, the spatial effect terms may be shared between the outcomes, in addition to outcome-specific spatial patterns. Our proposal relies on modified spatial structures for each shared component and specific effect. Spatial frailty models can incorporate spatially structured effects, and it is common to observe more than one sample unit per area, which means that the supports of the fixed and spatial effects differ. We therefore introduce a projection-based approach for reducing the dimension of the data. An R package named “RASCO: An R package to Alleviate Spatial Confounding” is provided. Cases of lung and bronchus cancer in the state of California are investigated under both methodologies, and the results demonstrate the efficiency of the proposed methodology.

04/12/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Guido Moreira (Postdoc, DEST-UFMG)

Title: Analysis of presence-only data via exact Bayes, with model and effects identification.

Abstract: This paper provides an exact modeling approach for the analysis of presence-only ecological data. The approach is also based on the frequently used inhomogeneous Poisson process but, unlike other approaches, does not rely on model approximations to perform inference. Exactness is achieved via a data augmentation scheme. One of the augmented processes can be interpreted as the unobserved occurrences of the relevant species, and its posterior distribution can be used to predict the species over the study region beyond the observer bias. The data augmentation also provides a natural Gibbs sampler for Bayesian inference through MCMC. The proposal achieves a better AUC prediction metric than the traditional Poisson process whose intensity function is log-linear in the covariates, which is currently the standard method. Additionally, an identifiability problem that arises in the traditional model does not affect our proposal, and this is verified in analyses of real ecological data.
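The abstract builds on an inhomogeneous Poisson process with log-linear intensity. To show what such a process looks like, the sketch below simulates one on the unit square by Lewis-Shedler thinning and then applies a second, independent thinning as a stand-in for imperfect detection, producing "presence-only" style records. This is a generic forward simulation, not the exact data-augmentation sampler described in the talk; the covariate, the coefficients and the detection surface are hypothetical.

```python
import math
import random

# Generic forward simulation of an inhomogeneous Poisson process (IPP) by thinning.
# The intensity, covariate and detection probability used below are hypothetical.


def poisson_count(mean, rng):
    """Poisson(mean) draw: number of rate-`mean` exponential arrivals in [0, 1)."""
    count, t = 0, rng.expovariate(mean)
    while t < 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count


def simulate_ipp(intensity, lam_max, rng):
    """Lewis-Shedler thinning on the unit square; requires intensity(s) <= lam_max."""
    points = []
    for _ in range(poisson_count(lam_max, rng)):  # homogeneous proposal with rate lam_max
        s = (rng.random(), rng.random())
        if rng.random() < intensity(s) / lam_max:  # keep with probability lambda(s) / lam_max
            points.append(s)
    return points


def observe(points, detect_prob, rng):
    """Independent thinning by a detection probability: the recorded presences."""
    return [s for s in points if rng.random() < detect_prob(s)]


def log_linear_intensity(s):
    # Assumed lambda(s) = exp(beta0 + beta1 * z(s)), beta0 = 3, beta1 = 1, z(s) = x-coordinate.
    return math.exp(3.0 + 1.0 * s[0])


def observer_effort(s):
    # Hypothetical detection probability that decreases with the y-coordinate.
    return 0.8 - 0.6 * s[1]


if __name__ == "__main__":
    rng = random.Random(2)
    occurrences = simulate_ipp(log_linear_intensity, math.exp(4.0), rng)  # exp(4) bounds lambda
    records = observe(occurrences, observer_effort, rng)
    print(len(occurrences), "occurrences,", len(records), "presence-only records")
```

Only `records` would be available in a presence-only study; the unrecorded `occurrences` play the role of the unobserved process that the abstract's data augmentation reconstructs.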

06/11/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Murray Pollock (Newcastle University, UK)

Title: The Restore Process – Practical CFTP by enriching Markov processes.

Abstract: We develop a new class of Markov processes comprising local dynamics governed by a fixed Markov process, enriched with regenerations from a fixed distribution at a state-dependent rate. We give conditions under which such processes possess a given target distribution as their invariant measure, making them amenable for use within Monte Carlo methodologies. Enrichment imparts a number of desirable theoretical and methodological properties, including straightforward conditions for the process to be uniformly ergodic and to possess a coupling from the past construction that enables exact sampling from the invariant distribution. Joint work with David Steinsaltz, Gareth Roberts and Andi Wang.

30/10/2020 at 10:00 – Venue: YouTube channel: Seminários DEST – UFMG

Leonardo Brandão (UFMG – Seminários 1B)

Title: The poly-logWeibull model applied to space-time interpolation of temperature.

Abstract: In this paper, a multivariate log-Weibull model for spatially dependent data is defined by marginalizing a conditional Pareto distribution with respect to a shared spatial random effect of alpha-stable distributions. Some properties of this new model are derived, and procedures for estimation and inference are discussed. An application is developed to study observed temperature data sets collected from weather stations in the Brazilian Amazon.

Paper by A. L. Mota, M. S. De Lima, F. N. Demarqui and L. H. Duczmal.

23/10/2020 at 14:30 – Venue: YouTube channel: Seminários DEST – UFMG

Ramsés H. Mena (IIMAS-UNAM, Mexico)

Title: Beta-binomial stick-breaking non-parametric prior.

Abstract: A new class of nonparametric prior distributions, termed the Beta-Binomial stick-breaking process, is proposed. By allowing the underlying length random variables to be dependent through a Markov chain with Beta marginals, an appealing discrete random probability measure arises. The chain’s dependence parameter controls the ordering of the stick-breaking weights and thus tunes the model’s label-switching ability. Also, by tuning this parameter, the resulting class contains the Dirichlet process and the Geometric process priors as particular cases, which is of interest for MCMC implementations.

Some properties of the model are discussed and a density estimation algorithm is proposed and tested with simulated datasets.

Reference: Gil-Leyva, M.F., Mena, R.H. and Nicoleris, T. (2020). Beta-Binomial stick-breaking non-parametric prior. Electronic Journal of Statistics, 14, 1479-1507. https://doi.org/10.1214/20-EJS1694.
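To make the stick-breaking construction concrete, the sketch below computes truncated weights for the two special cases named in the abstract: independent Beta(1, θ) lengths (the Dirichlet process) and a single shared length v (the geometric process, shown here for a fixed v). It does not implement the Beta-Binomial Markov chain that links the lengths in the proposed prior; the truncation level, parameter values and function names are illustrative.

```python
import random

# Illustrative sketch: truncated stick-breaking weights for the two special cases
# mentioned in the abstract. The Beta-Binomial Markov chain itself is NOT implemented.


def stick_breaking_weights(concentration, n_atoms, rng):
    """Dirichlet-process weights: i.i.d. lengths v_j ~ Beta(1, concentration)."""
    weights, remaining = [], 1.0  # `remaining` is the stick length not yet broken off
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)  # w_j = v_j * prod_{l<j} (1 - v_l)
        remaining *= 1.0 - v
    return weights, remaining  # `remaining` is the mass lost to truncation


def geometric_weights(v, n_atoms):
    """Geometric-process weights given one shared length v: w_j = v * (1 - v)**(j - 1)."""
    return [v * (1.0 - v) ** j for j in range(n_atoms)]


if __name__ == "__main__":
    rng = random.Random(0)
    w_dp, leftover = stick_breaking_weights(concentration=2.0, n_atoms=10, rng=rng)
    print("DP weights:", [round(w, 3) for w in w_dp], "leftover mass:", round(leftover, 3))
    # Geometric weights are always decreasing; DP weights need not be ordered.
    print("Geometric weights:", [round(w, 3) for w in geometric_weights(0.3, 10)])
```

The contrast in the last comment is the ordering behaviour that, according to the abstract, the chain's dependence parameter tunes between these two extremes.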

16/10/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Raffaele Argiento (Department of Statistics, Università Cattolica del Sacro Cuore)

Title: Is Infinity That Far? A Bayesian Nonparametric Perspective of Finite Mixture Models.

Abstract: Mixture models are one of the most widely used statistical tools when dealing with data from heterogeneous populations. This talk considers the long-standing debate over finite and infinite mixtures and brings the two modelling strategies together by showing that a finite mixture is simply a realization of a point process. Following a Bayesian nonparametric perspective, we introduce a new class of priors: the Normalized Independent Point Processes. We investigate the probabilistic properties of this new class. Moreover, we design a conditional algorithm for finite mixture models with a random number of components that overcomes the challenges associated with the Reversible Jump scheme and the recently proposed marginal algorithms. We illustrate our model on real data and discuss a relevant application in population genetics.

09/10/2020 at 14:00 – Venue: YouTube channel: Seminários DEST – UFMG

Peter Müller (University of Texas)

Title: Bayesian Categorical Matrix Factorization via Double Feature Allocation.

Abstract: We propose a categorical matrix factorization method to infer latent diseases from electronic health records data. A latent disease is defined as an unknown cause that induces a set of common symptoms for a group of patients. The proposed approach is based on a novel double feature allocation model which simultaneously allocates features to the rows and the columns of a categorical matrix. Using a Bayesian approach, available prior information on known diseases greatly improves identifiability of latent diseases. This includes known diagnoses for patients and known associations of diseases with symptoms. For application to large data sets, as they naturally arise in electronic health records, we develop a divide-and-conquer Monte Carlo algorithm, which allows inference for the proposed double feature allocation model and for a wide range of related Bayesian nonparametric mixture models and random subsets. We validate the proposed approach by simulation studies, including misspecified models and comparison with sparse latent factor models. In an application to Chinese electronic health records (EHR) data, we find results that agree with related clinical and medical knowledge.

References:

1) Bayesian Double Feature Allocation for Phenotyping with Electronic Health Records, Yang Ni, Peter Mueller, Yuan Ji

https://arxiv.org/abs/1809.08988

Journal of the American Statistical Association, in press.

2) Consensus Monte Carlo for Random Subsets using Shared Anchors, Yang Ni, Yuan Ji, and Peter Mueller

https://arxiv.org/abs/1906.12309

Journal of Computational and Graphical Statistics, in press.

02/10/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Alexandre Galvão Patriota (USP)

Title: Elliptical regression models with general parameterization.

Abstract: In this seminar I will present some of the asymptotic results developed for elliptical regression models with general parameterization. These models include mixed models, heteroscedastic nonlinear models and errors-in-variables models, among others.

25/09/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Kelly Cristina Mota Gonçalves (DME-UFRJ)

Title: Bayesian dynamic quantile linear models and some extensions.

Abstract: The main aim of this talk is to present a new class of models, named dynamic quantile linear models. It combines dynamic linear models with distribution-free quantile regression, producing a robust statistical method. This class of models provides richer information on the effects of the predictors than traditional mean regression does, and it is largely insensitive to heteroscedasticity and outliers, accommodating the non-normal errors often encountered in practical applications. Bayesian inference for quantile regression proceeds by forming the likelihood function based on the asymmetric Laplace distribution, and a location-scale mixture representation of this distribution yields analytical expressions for the conditional posterior densities of the model. Thus, Bayesian inference for dynamic quantile linear models can be performed using an efficient Markov chain Monte Carlo algorithm or a fast sequential procedure suited to high-dimensional predictive modeling applications with massive data. Finally, a hierarchical extension, useful to account for structural features in the dataset, will also be presented.
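The abstract forms the quantile-regression likelihood from the asymmetric Laplace (AL) distribution. A minimal static sketch of why this works, assuming a single sample with no regression or dynamic structure: the AL log density is a constant minus the check loss ρ_τ(u) = u(τ - 1{u < 0}), so maximizing the AL likelihood in its location is the same as minimizing the total check loss, whose minimizer is a sample τ-quantile. Function names are illustrative; the search over data points is valid because the total check loss is piecewise linear with kinks only at the observations.

```python
import math

# Static illustration only: no dynamic linear model, covariates or MCMC.


def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_log_density(y, mu, sigma, tau):
    """Log density of the asymmetric Laplace distribution AL(mu, sigma, tau)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def sample_quantile_by_check_loss(data, tau):
    """Minimise sum_i rho_tau(y_i - m) over candidate locations m taken from the data."""
    return min(data, key=lambda m: sum(check_loss(y - m, tau) for y in data))


def sample_quantile_by_ald(data, tau, sigma=1.0):
    # Maximising the AL log-likelihood in mu is the same problem: the log term is constant.
    return max(data, key=lambda m: sum(ald_log_density(y, m, sigma, tau) for y in data))


if __name__ == "__main__":
    data = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0]
    print(sample_quantile_by_check_loss(data, 0.5))  # 4.0, the sample median
    print(sample_quantile_by_ald(data, 0.5))         # 4.0 as well
```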

18/09/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Denise Duarte (DEST-UFMG)

Title: Affinity network models

Abstract: Complex Network Analysis is currently one of the most popular approaches in the literature on relational data, and statistical analysis of social networks has sought to keep pace with this growth. Probabilistic random graph models have been widely used to model the phenomena studied in social networks statistically. However, social networks have characteristics that differ from those of random graph models with independent edges. The aim of this work is to present and study a random graph model in which links (edges) are based on vertex characteristics, allowing more realistic modeling of a network. We propose a broad family of models, which we call Affinity Network Models, in which connections are valued according to a function that measures the affinity between the actors in the network. Moreover, connections are formed using a cutoff on the value of this affinity function, according to the level of affinity desired by the researcher. To illustrate the behavior of our model, we carried out a Monte Carlo simulation study for one of the affinity functions described in this work. We calibrated the model's generating parameters by analyzing its topological measures and comparing them with those of graphs with the same affinity distribution but independently drawn edges. The study shows that the Affinity Network Model captures important characteristics of social networks. Joint work with Wesley H.S. Pereira and Rodrigo B. Ribeiro.
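The construction described above (an affinity function on vertex characteristics, then a cutoff that decides which pairs are connected) can be sketched in a few lines. The affinity function exp(-|x_u - x_v|) on a single scalar attribute, the Gaussian attributes and the cutoff values are hypothetical choices for illustration; the families of affinity functions studied in the talk are not reproduced here.

```python
import math
import random
from itertools import combinations

# Hypothetical affinity network: one scalar attribute per actor, an assumed affinity
# function, and an edge whenever the affinity reaches the chosen cutoff.


def affinity(x_u, x_v):
    # Assumed affinity in (0, 1]: actors with closer attribute values are more similar.
    return math.exp(-abs(x_u - x_v))


def affinity_graph(features, cutoff):
    """Edge set {(u, v) : u < v and affinity(x_u, x_v) >= cutoff}."""
    return {(u, v) for u, v in combinations(range(len(features)), 2)
            if affinity(features[u], features[v]) >= cutoff}


def mean_degree(n_vertices, edges):
    return 2.0 * len(edges) / n_vertices


if __name__ == "__main__":
    rng = random.Random(7)
    x = [rng.gauss(0.0, 1.0) for _ in range(100)]  # attributes of 100 actors
    for cutoff in (0.5, 0.8, 0.95):  # a higher cutoff demands more affinity per link
        edges = affinity_graph(x, cutoff)
        print(f"cutoff={cutoff}: {len(edges)} edges, mean degree {mean_degree(len(x), edges):.2f}")
```

Because edges depend on shared attributes rather than being drawn independently, actors with similar attribute values tend to form densely connected groups, which is the kind of structure the abstract contrasts with independent-edge random graphs.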

11/09/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Flávio Bambirra Gonçalves (DEST-UFMG)

Title: Exact and computationally efficient Bayesian inference for generalized Markov modulated Poisson processes

Abstract: Statistical modeling of point patterns is an important and common problem in several areas. The Poisson process is the most common process used for this purpose, in particular its generalization that considers the intensity function to be stochastic. This is called a Cox process, and different choices to model the dynamics of the intensity give rise to a wide range of models. We present a new class of unidimensional Cox process models in which the intensity function assumes parametric functional forms that switch among themselves according to a continuous-time Markov chain. A novel methodology is proposed to perform exact Bayesian inference based on MCMC algorithms. The term exact refers to the fact that no discrete-time approximation is used and Monte Carlo error is the only source of inaccuracy. The reliability of the algorithms depends on a variety of specifications, which are carefully addressed, resulting in a computationally efficient (in terms of computing time) algorithm that can be used with large datasets. Simulated and real examples are presented to illustrate the efficiency and applicability of the proposed methodology. A specific model to fit epidemic curves is proposed and used to analyze data from Dengue Fever in Brazil and COVID-19 in some countries. This is joint work with Livia Dutra and Roger Silva.
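To illustrate the kind of process described above, the sketch below simulates a classic two-state Markov modulated Poisson process: a hidden continuous-time Markov chain switches between states, and events arrive as a homogeneous Poisson process whose rate depends on the current state. The generalized model in the talk lets the intensity take parametric functional forms within each state, which this sketch does not attempt, and it performs no inference. The rates, horizon and function name are illustrative assumptions.

```python
import random

# Forward simulation of a classic two-state MMPP (constant intensity in each state).
# It does not implement the generalized intensities or the exact MCMC from the talk.


def simulate_mmpp(rates, switch_rates, horizon, rng, state=0):
    """Simulate a two-state Markov modulated Poisson process on [0, horizon).

    rates[k]        -- event intensity while the hidden chain is in state k (may be 0)
    switch_rates[k] -- rate of leaving state k for the other state (must be > 0)
    Returns (event_times, switch_times).
    """
    t, events, switches = 0.0, [], []
    while t < horizon:
        sojourn_end = min(t + rng.expovariate(switch_rates[state]), horizon)
        # Homogeneous Poisson events within the sojourn, generated by exponential gaps.
        s = t + rng.expovariate(rates[state]) if rates[state] > 0 else float("inf")
        while s < sojourn_end:
            events.append(s)
            s += rng.expovariate(rates[state])
        if sojourn_end < horizon:
            switches.append(sojourn_end)
        t = sojourn_end
        state = 1 - state  # the hidden chain jumps to the other state
    return events, switches


if __name__ == "__main__":
    rng = random.Random(3)
    # Assumed rates: a quiet state (0.5 events per unit time) and an outbreak state (20).
    events, switches = simulate_mmpp((0.5, 20.0), (0.1, 0.2), 100.0, rng)
    print(len(events), "events,", len(switches), "regime switches")
```

Restarting the exponential gaps at each switch is valid because of the memoryless property of the exponential distribution.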

04/09/2020 at 13:00 – Venue: YouTube channel: Seminários DEST – UFMG

Hedibert Freitas Lopes (Insper-SP)

Title: The Illusion of the Illusion of Sparsity

Abstract: The emergence of Big Data raises the question of how to model statistical series when there is a large number of possible regressors. This article addresses the issue by comparing dense and sparse models in a Bayesian approach that allows for variable selection and shrinkage. We discuss the results reached by Giannone, Lenza, and Primiceri (2018) through a “Spike-and-Slab” prior, which suggest an “illusion of sparsity” in economic datasets, as no clear patterns of sparsity could be found. We further revise the posterior distributions of the model and propose three experiments to evaluate the robustness of the adopted prior distribution. We find that the model indirectly induces variable selection and shrinkage, which suggests that the “illusion of sparsity” is, itself, an illusion. Note: joint work with Bruno Vinicius Nunes Fava; it was part of his 2019 undergraduate final project in Economics at Insper. Bruno starts his PhD in Economics at Northwestern University in August 2020.
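A spike-and-slab prior, as mentioned in the abstract, sets each coefficient exactly to zero (the "spike") with some probability and otherwise draws it from a continuous "slab". The sketch below draws coefficients from a generic version with a Gaussian slab; the inclusion probabilities and slab scales are assumed values chosen only to contrast a few large coefficients with many tiny ones, and the sketch does not reproduce the prior specification or hyperpriors of Giannone, Lenza, and Primiceri (2018).

```python
import random

# Illustrative draws from a generic spike-and-slab prior; inclusion probability and
# slab scale are assumed values, not those of any specific study.


def draw_spike_and_slab(n_coef, inclusion_prob, slab_sd, rng):
    """Each coefficient is exactly 0 with prob. 1 - inclusion_prob, else N(0, slab_sd^2)."""
    return [rng.gauss(0.0, slab_sd) if rng.random() < inclusion_prob else 0.0
            for _ in range(n_coef)]


def share_nonzero(coefs):
    """Fraction of coefficients that are not exactly zero (the realised model size)."""
    return sum(c != 0.0 for c in coefs) / len(coefs)


if __name__ == "__main__":
    rng = random.Random(4)
    for q, sd in [(0.1, 1.0), (0.9, 0.05)]:
        beta = draw_spike_and_slab(1000, q, sd, rng)
        large = sum(abs(b) > 0.1 for b in beta) / len(beta)
        print(f"q={q}, slab_sd={sd}: nonzero share={share_nonzero(beta):.2f}, "
              f"share with |beta|>0.1={large:.2f}")
```

The second setting is "dense" in that most coefficients are nonzero, yet few are large, so counting nonzero coefficients alone can blur the line between selection and shrinkage, which is the kind of distinction the abstract examines.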

28/08/2020 at 11:00 – Venue: YouTube channel: Seminários DEST – UFMG

Oliver Stone and Theo Economou (Institute for Data Science and Artificial Intelligence, University of Exeter)

Title: Bayesian Hierarchical Frameworks for Correcting Under-reporting and Delayed Reporting of Count Data

Abstract: The Covid-19 pandemic has brought renewed attention to the limitations of systems that report cases and deaths, specifically under-reporting and delayed reporting. In this two-part seminar, we will discuss Bayesian hierarchical approaches to correcting these issues, to enable enhanced monitoring and decision-making. Finally, we will demonstrate how the framework for correcting delayed reporting can be used for now-casting and forecasting of English hospital deaths from Covid-19.

21/08/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Rafael Bassi Stern (UFSCar)

Title: CD-Split: Efficient Conformal Regions in High Dimensions

14/08/2020 at 13:30 – Venue: YouTube channel: Seminários DEST – UFMG

Luiz Max Carvalho (EMAP-FGV)

Title: Adaptive Markov chain Monte Carlo on the space of time-calibrated trees

20/03/2020 at 13:30 – Venue: Room 2076 – ICEx

Luiz Max Carvalho (EMAP-FGV)

Title: Efficient transition kernels for Bayesian phylogenetics

13/03/2020 at 13:30 – Venue: Room 2076 – ICEx

Vinicius Mayrink (DEST-UFMG)

Title: Structural equation modeling with time dependence: an application comparing Brazilian energy distributors

YEAR 2019 – 2ND SEMESTER

06/12/2019 at 14:30 – Venue: Room 2076 – ICEx

Walmir dos Reis Miranda Filho (DEST-UFMG)

Title: Frailty and Copula Models: Similarities and Differences

04/12/2019 at 14:30 – Venue: Room 2076 – ICEx

Daiane Zuanetti (UFSCar)

Title: Subset nonparametric Bayesian clustering – an application in genetic data

04/12/2019 at 13:30 – Venue: Room 2076 – ICEx

Rafael Izbicki (UFSCar)

Title: Quantification under prior probability shift: the ratio estimator and its extensions

29/11/2019 at 13:30 – Venue: Room 2076 – ICEx

Edson Ferreira (DEST)

Title: Context Tree Estimation for Not Necessarily Finite Memory Processes, Via BIC and MDL

22/11/2019 at 14:30 – Venue: Room 2076 – ICEx

Renan Xavier Cortes (Anglo American)

Title: Building open-source tools in Python for Spatio-Temporal Data and Modelling

22/11/2019 at 13:30 – Venue: Room 2076 – ICEx

Patricia Viana (DEST-UFMG)

Title: Bayesian Cluster Analysis: Point Estimation and Credible Balls

08/11/2019 at 13:30 – Venue: Room 2076 – ICEx

Jussiane Gonçalves (DEST-UFMG)

Title: Zero-inflated mixed Poisson regression models

01/11/2019 at 14:30 – Venue: Room 2076 – ICEx

Diogo Carlos dos Santos (UFMG)

Title: The degree-restricted percolation process

01/11/2019 at 13:30 – Venue: Room 2076 – ICEx

Guilherme Ludwig (UNICAMP)

Title: Interacting cluster point process model for epidermal nerve fibers

25/10/2019 at 13:30 – Venue: Room 2076 – ICEx

Glaura C. Franco (DEST-UFMG)

Title: Non-Gaussian Time Series Models

18/10/2019 at 13:30 – Venue: Room 2076 – ICEx

Ronald Dickman (Physics-UFMG)

Title: Steady-state thermodynamics and phase coexistence far from equilibrium

11/10/2019 at 13:30 – Venue: Room 2076 – ICEx

Dani Gamerman (DEST and UFRJ)

Title: Hierarchical modeling in high-dimensional problems.

04/10/2019 at 13:30 – Venue: Room 2076 – ICEx

Fabricio Murai (DCC-UFMG)

Title: Reasoning from Partially Observed Networks: Sampling, Estimation and Models.

27/09/2019 at 13:30 – Venue: Room 2076 – ICEx

Ilka Afonso Reis (DEST/UFMG)

Title: A brief tour of psychometrics: my experience with instrument validation.

20/09/2019 at 13:00 – Venue: Room 2076 – ICEx

Vera Tomazella (UFSCar)

Title: Defective Models Induced By Gamma Frailty Term for Survival Data With Cured Fraction

13/09/2019 at 13:30 – Venue: Room 2076 – ICEx

Roberto Nalon (DCC – Big Data)

Title: Detecting Spatial Clusters of Disease Infection Risk Using Sparsely Sampled Social Media Mobility Patterns

11/09/2019 at 11:00 – Venue: LCC – ICEx

Ian M. Danilevicz (DEST)

Title: An overview of robust spectral estimators

06/09/2019 at 13:30 – Venue: Auditorium III – ICEx

DATA SCIENCE SEMINAR: McKinsey & Company

On Friday, 6 September, the Department of Statistics will host, in Auditorium III of ICEx, Data Scientist Marcus Watari and Data Engineer Daniel Golhiardi, both consultants at McKinsey & Company. They will give a seminar for graduate students in Statistics, Chemistry, Physics, Computer Science, Mathematics and Electrical Engineering, presenting cases of how data science has been applied in real contexts across different industries, as well as the challenges and opportunities of a career as a Data Engineer or Data Scientist.

23/08/2019 at 13:30 – Venue: Room 2076 – ICEx

Marcelo Azevedo Costa (Production Engineering – UFMG)

Title: Failure detection in robotic arms using statistical modeling, machine learning and hybrid gradient boosting

09/08/2019 at 13:30 – Venue: Room 2076 – ICEx

Michel Spira (Department of Mathematics – UFMG)

Title: Mathematics and the Vitruvian Man

01/08/2019 at 14:00 – Venue: Room 3060 – ICEx

Alexandre Gaudillière (CNRS-Marseille); joint work with A. Bianchi (Università di Padova), P. Milanesi (Université d’Aix-Marseille) and M. E. Vares (UFRJ)

Title: Exponential transition law for the kinetic Ising model.

YEAR 2019 – 1ST SEMESTER

05/07/2019 at 14:30 – Venue: Room 2076 – ICEx

Fernanda Gabriely Batista Mendes (DEST)

SEMINAR 2 – Title: Construction of a stationary Markov chain.

05/07/2019 at 13:30 – Venue: Room 2076 – ICEx

Adrian Luna (DEST)

SEMINAR 1 – Title: Random networks: some challenges.

28/06/2019 at 15:00 – Venue: Room 2076 – ICEx

Hernando Ombao – Biostatistics Research Group – STAT Program – King Abdullah University of Science and Technology (KAUST, Saudi Arabia)

Title: Exploring the Dependence Structure in Multivariate Time Series.

28/06/2019 at 14:00 – Venue: Room 2076 – ICEx

Hernando Ombao – Biostatistics Research Group – STAT Program – King Abdullah University of Science and Technology (KAUST, Saudi Arabia)

Title: Spectral and Coherence Analysis: Basic Ideas and Applications.

07/06/2019 at 13:30 – Venue: Room 2076 – ICEx

Michelle Miranda (University of Victoria, Canada)

Title: Modeling Modern Data Objects: Statistical Methods for Ultra-high Dimensionality and Intricate Correlation Structures.

31/05/2019 at 13:30 – Venue: Auditorium B 106 – CAD3

Magda Carvalho Pires (DEST-UFMG)

Title: Current status data with informative censoring and misclassification

This seminar is part of the program of the 5th Commemorative Meeting for Statistician's Day (V Encontro Comemorativo do Dia do Estatístico). Please register at: http://www.est.ufmg.br/diadoestatistico/inscricoes.html

24/05/2019 at 13:30 – Chico Soares (Professor Emeritus, UFMG)

Title: My Statistics

17/05/2019 at 13:30 – Marcos O. Prates (EST-UFMG)

Title: Assessing spatial confounding in Bayesian shared component disease mapping models via SPOCK: With applications to SEER cancer data

03/05/2019 at 13:30 – Afrânio M. C. Vieira (UFSCar)

Title: Item Response Models modified for known and unknown sources of heterogeneity.

26/04/2019 at 13:30 – Fábio Nogueira Demarqui (DEST-UFMG)

Title: A unified semiparametric approach to model survival data with crossing survival curves

12/04/2019 at 13:30 – Guilherme Augusto Veloso (PG-EST)

Title: Sequential Bayesian Analysis of Multivariate Count Data

29/03/2019 at 14:00 – Frederico R. B. Cruz (DEST-UFMG)

Title: Estimation and Optimization in Queues, with Applications

29/03/2019 at 13:00 – Euloge Clovis Kenne Pagui (Università di Padova, Italy)

Title: Bias reducing adjusted score functions for monotone likelihood in Cox Regression

21/03/2019 at 14:00 – Nitis Mukhopadhyay (Department of Statistics – University of Connecticut)

Title: On Asymptotic Normality of Standardized Stopping Times with Illustrations

ANO DE 2018 – 2º SEMESTRE

07/12/2018 às 13:30h – Douglas Mateus da Silva

Título: Estimador subsemble espacial para dados massivos em geoestatística.

30/11/2018 às 13:30h – Juliana Vilela Bastos (Coordenadora do Programa Traumatismos Dentários da Faculdade de Odontologia da UFMG)

Título: Metodologia e Estatística na Pesquisa em Traumatismos Dentários

30/11/2018 às 14:30h – Profa. Jussiane Gonçalves (UFMG)

Título: Modelagem de sobredispersão tempo-dependente em dados de contagem longitudinal

23/11/2018 às 10:00h – Prof. Murray Pollock (Un. of Warwick)

Título: Modelo de regressão de Cox com verossimilhança monótona

23/11/2018 às 13:30h – Frederico Machado Almeida

Título: Confusion: Developing an information-theoretic secure approach for multiple parties to pool and unify statistical data, distributions and inferences.

23/11/2018 às 14:30h – Luis Alejandro Másmela Caita

Título: Imputação Múltipla para dados ausentes de maneira não-aleatória

09/11/2018 at 13:30 – Arthur Tarso Rego

Title: A State-Space Model Approach to Financial Time Series

26/10/2018 at 14:30 – Guilherme Aguilar

Title: Bayesian linear regression models with flexible error distributions

26/10/2018 at 13:30 – Danna L. Cruz

Title: Spatial disease mapping using Directed Acyclic Graph Auto-Regressive (DAGAR) models

19/10/2018 at 14:30 – Prof. Thais C. O. Fonseca (DME-UFRJ)

Title: Reference Bayesian analysis for hierarchical models

19/10/2018 at 13:30 – Prof. Karthik Bharath (University of Nottingham, UK)

Title: Geometric statistical methods for imaging data

28/09/2018 at 13:30 – Larissa Sayuri Futino C. dos Santos (UFMG)

Title: Broadening Horizons: Seeing the World Through Other Eyes

14/09/2018 at 13:30 – Prof. Tohid Ardeshiri (Linköping University, Sweden)

Title: Analytical Approximations for Bayesian Inference

31/08/2018 at 14:30 – Josemar Rodrigues (UFSCar)

Title: Bayesian superposition of pure-birth destructive cure processes for tumor latency

31/08/2018 at 13:30 – Reinaldo B. Arellano-Valle (Pontificia Universidad Católica de Chile)

Title: Scale and Shape Mixtures of Multivariate Skew-Normal Distributions

24/08/2018 at 13:30 – Roger W. C. da Silva (DEST)

Title: Dimensional Crossover in Anisotropic Percolation on Z^{d+s}

17/08/2018 (Friday) at 13:30 – Ali Abolhassani (Department of Mathematical Sciences, Isfahan University of Technology, Isfahan, Iran)

Title: Bell Spatial Scan Statistics

08/08/2018 (Wednesday) at 10:30 – Silvia L. P. Ferrari (USP)

Title: Box-Cox t random intercept model for estimating usual nutrient intake distributions

YEAR 2018 – 1st SEMESTER

22/06/2018 at 14:30 – Túlio Lima (Department of Statistics – UFMG)

Title: Comparison between risk measures and ruin probability for the calculation of solvency capital for a long-term guarantee.

18/05/2018 at 13:30 – Rodrigo Bernardo da Silva (Department of Statistics, UFPb)

Title: Flexible and Robust Mixed Poisson INGARCH Models.

11/05/2018 at 13:30 – Vinicius D. Mayrink (Department of Statistics, UFMG)

Title: Extending JAGS: Piecewise Exponential Distribution and Geostatistics.

04/05/2018 at 13:30 – Caio L. N. Azevedo – Department of Statistics, IMECC, Unicamp

Title: Time series and multilevel modeling for longitudinal item response theory data

27/04/2018 at 13:30 – Valdério A. Reisen – UFES

Title: An overview of robust spectral estimators and their applications.

20/04/2018 at 13:30 – Pedro O. S. Vaz de Melo

Title: Football and Politics Are Not for Arguing, They Are for Analyzing!

13/04/2018 at 13:30 – Carolina Silva Pena – Office of the Pro-Rector for Undergraduate Studies – UFMG

Title: A new item response theory model to adjust data allowing examinee choice

YEAR 2017 – 2nd SEMESTER

01/12/2017 at 13:30 – Milton Pifano (DEST)

Title: Data clustering using generalized spatio-temporal dynamic factor analysis with interactions.

24/11/2017 at 13:30 – Guilherme L. de Oliveira (DEST)

Title: Spatial Product Partition Models.

24/11/2017 at 14:30 – Gabriela Oliveira (DEST)

Title: Probabilistic Aspects of the Laplace Distribution.

17/11/2017 at 13:30 – Alexandre Gaudillière (Aix Marseille Université, CNRS)

Title: Intertwining Wavelets.

17/11/2017 at 14:30 – Douglas Mesquita (DEST)

Title: Spatial Confounding in Frailty Models.

10/11/2017 at 13:30 – Erick Amorim (DEST-UFMG)

Title: Clustering via the Dirichlet Process and the Factor Model with Interactions

10/11/2017 at 14:30 – Rafael Alves (DEST-UFMG)

Title: Markov Graphs.

27/10/2017 at 13:30 – Juliana Freitas de Mello e Silva (DEST-UFMG)

Title: Joint Modeling of Longitudinal and Survival Data.

20/10/2017 at 13:30 – Flávio Bambirra Gonçalves (DEST-UFMG)

Title: A Monte Carlo toolbox to solve intractable statistical problems: from retrospective sampling to Bernoulli Factories

29/09/2017 at 13:30 – Gilvan Ramalho Guedes (Department of Demography, UFMG)

Title: Climate Change and the Economy: Impacts on Regional Vulnerability, Labor Supply, and Insurance Demand

22/09/2017 at 13:30 – Fernando Quintana (PUC-Chile)

Title: Covariate-Dependent Mixture Models Induced by Determinantal Point Processes and Some Applications

15/09/2017 at 13:30 – Stats4Good Group (DEST-UFMG)

Title: Statistics for Good

01/09/2017 at 13:30 – Thais Paiva (DEST-UFMG)

Title: Imputation of multivariate continuous data with nonignorable missingness

25/08/2017 at 13:30 – Bernardo Nunes Borges de Lima (MAT-UFMG)

Title: The Magical de Bruijn Sequence

18/08/2017 at 11:10 – Sokol Ndreca (DEST)

Title: Asymptotics for the queueing system with exponentially delayed arrivals

16/08/2017 at 11:30 (exceptionally) – Iddo Ben-Ari (University of Connecticut – USA)

Title: Cut-off for a random walk with catastrophes

YEAR 2017 – 1st SEMESTER

11/08/2017 at 14:30 – Ying Sun (King Abdullah University of Science and Technology (KAUST), Saudi Arabia)

Title: Visualization and Assessment of Spatio-temporal Covariance Properties

11/08/2017 at 13:30 – Marc G. Genton (King Abdullah University of Science and Technology (KAUST), Saudi Arabia)

Title: Directional Outlyingness for Multivariate Functional Data

07/07/2017 at 13:30 – Bárbara da Costa Campos Dias

Title: Exact Bayesian inference in spatio-temporal Cox processes driven by multivariate Gaussian processes

30/06/2017 at 13:30 – Uriel Moreira Silva

Title: Particle-based Inference in Hidden Markov Models

23/06/2017 at 13:30 – Prof. Alexandre B. Simas (MAT-UFPb)

Title: Principal Components Analysis for Semimartingales and Stochastic PDE

09/06/2017 at 13:30 – Prof. Adrian P. H. Luna (DEST/UFMG)

Title: Mixtures of Gibbs Distributions

02/06/2017 at 13:30 – Prof. Bernardo Lanza Queiroz (CEDEPLAR/UFMG)

Title: National and subnational experience with estimating the extent and trend in completeness of registration of deaths in Brazil and other developing countries

19/05/2017 at 13:15 (**Exceptionally**) – Prof. Fredy Castellares (DEST/UFMG)

Title: The Multiple Poisson Process and the Bell Distribution

28/04/2017 at 13:15 (**Exceptionally**) – Prof. Bernardo Nunes Borges de Lima (MAT/UFMG)

Title: The Magical de Bruijn Sequence

07/04/2017 at 13:30 – Prof. Marcos Oliveira Prates (DEST/UFMG)

Title: A Tour of Applications and Problems in the Different Areas of Statistics to Which I Have Devoted My Time.

31/03/2017 at 13:30 – Prof. Denise Duarte (DEST/UFMG)

Title: Inference for Stochastically Contaminated Variable-Length Markov Chains

24/03/2017 at 13:30 – Prof. Renato Martins Assunção (DCC/UFMG)

Title: From Fisher to “Big Data”: Continuities and Discontinuities


Vinícius Diniz Mayrink

Since 1976
