Model-Based Clustering and Classification for Data Science: With Applications in R

Bouveyron C

Description

Model-based Clustering and Classification
for Data Science
Cluster analysis consists of methods for finding groups in data automatically.
Most methods have been heuristic and leave open such central questions as: How
many clusters are there? Which clustering method should I use? How should
I handle outliers? Classification involves assigning new observations to groups given
previously classified observations, and also has open questions about parameter
tuning, robustness and uncertainty assessment. This book frames cluster analysis
and classification in terms of statistical models, thus yielding principled
estimation, testing and prediction methods, and soundly-based answers to the central
questions. It develops the basic ideas of model-based clustering and classification
in an accessible but rigorous way, using extensive real-world data examples and
providing R code for many methods, and describes modern developments for
high-dimensional data and for networks. It explains recent methodological advances,
such as Bayesian regularization methods, non-Gaussian model-based clustering,
cluster merging, variable selection, semi-supervised classification, robust
classification, clustering of functional data, text and images, and co-clustering. Written for
advanced undergraduates and beginning graduate students in data science, as well as
researchers and practitioners, it assumes basic knowledge of multivariate calculus,
linear algebra, probability and statistics.
Charles Bouveyron is Professor of Statistics at Université Côte d'Azur
and the Chair of Excellence in Data Science at Inria Sophia-Antipolis. He has
published extensively on model-based clustering, particularly for networks and high-
dimensional data.
Gilles Celeux is Director of Research Emeritus at Inria Saclay Île-de-France.
He is one of the founding researchers in model-based clustering, having published
extensively in the area for 35 years.
T. Brendan Murphy is Professor of Statistics at University College Dublin. His
research interests include model-based clustering, classification, network modeling
and latent variable modeling.
Adrian E. Raftery is Professor of Statistics and Sociology at the University
of Washington, Seattle. He was one of the founding researchers in model-based
clustering, having published in the area since 1984.

Details

Year of publication
2019
Format
djvu