Advances in Data Science

Book description

Data science unifies statistics, data analysis and machine learning to achieve a better understanding of the masses of data that are produced today, and to improve prediction. Special kinds of data (symbolic, network, complex, compositional) are increasingly frequent in data science. These data require specific methodologies, but reference works in this field are lacking.

Advances in Data Science fills this gap. It presents a collection of up-to-date contributions by eminent scholars following two international workshops held in Beijing and Paris. The 10 chapters are organized into four parts: Symbolic Data, Complex Data, Network Data and Clustering. They include fundamental contributions, as well as applications to several domains, including business and the social sciences.

Table of contents

  1. Cover
  2. Preface
  3. Part 1: Symbolic Data
    1. 1 Explanatory Tools for Machine Learning in the Symbolic Data Analysis Framework
      1. 1.1. Introduction
      2. 1.2. Introduction to Symbolic Data Analysis
      3. 1.3. Symbolic data tables from Dynamic Clustering Method and EM
      4. 1.4. Criteria for ranking individuals, classes and their bar chart descriptive symbolic variables
      5. 1.5. Two directions of research
      6. 1.6. Conclusion
      7. 1.7. References
    2. 2 Likelihood in the Symbolic Context
      1. 2.1. Introduction
      2. 2.2. Probabilistic setting
      3. 2.3. Parametric models for p = 1
      4. 2.4. Nonparametric estimation for p = 1
      5. 2.5. Density models for p ≥ 2
      6. 2.6. Conclusion
      7. 2.7. References
    3. 3 Dimension Reduction and Visualization of Symbolic Interval-Valued Data Using Sliced Inverse Regression
      1. 3.1. Introduction
      2. 3.2. PCA for interval-valued data and the sliced inverse regression
      3. 3.3. SIR for interval-valued data
      4. 3.4. Projections and visualization in DR subspace
      5. 3.5. Some computational issues
      6. 3.6. Simulation studies
      7. 3.7. A real data example: face recognition data
      8. 3.8. Conclusion and discussion
      9. 3.9. References
    4. 4 On the “Complexity” of Social Reality. Some Reflections About the Use of Symbolic Data Analysis in Social Sciences
      1. 4.1. Introduction
      2. 4.2. Social sciences facing “complexity”
      3. 4.3. Symbolic data analysis in the social sciences: an example
      4. 4.4. Conclusion
      5. 4.5. References
  4. Part 2: Complex Data
    1. 5 A Spatial Dependence Measure and Prediction of Georeferenced Data Streams Summarized by Histograms
      1. 5.1. Introduction
      2. 5.2. Processing setup
      3. 5.3. Main definitions
      4. 5.4. Online summarization of a data stream through CluStream for Histogram data
      5. 5.5. Spatial dependence monitoring: a variogram for histogram data
      6. 5.6. Ordinary kriging for histogram data
      7. 5.7. Experimental results on real data
      8. 5.8. Conclusion
      9. 5.9. References
    2. 6 Incremental Calculation Framework for Complex Data
      1. 6.1. Introduction
      2. 6.2. Basic data
      3. 6.3. Incremental calculation of complex data
      4. 6.4. Simulation studies
      5. 6.5. Conclusion
      6. 6.6. Acknowledgment
      7. 6.7. References
  5. Part 3: Network Data
    1. 7 Recommender Systems and Attributed Networks
      1. 7.1. Introduction
      2. 7.2. Recommender systems
      3. 7.3. Social networks
      4. 7.4. Using social networks for recommendation
      5. 7.5. Experiments
      6. 7.6. Perspectives
      7. 7.7. References
    2. 8 Attributed Networks Partitioning Based on Modularity Optimization
      1. 8.1. Introduction
      2. 8.2. Related work
      3. 8.3. Inertia based modularity
      4. 8.4. I-Louvain
      5. 8.5. Incremental computation of the modularity gain
      6. 8.6. Evaluation of I-Louvain method
      7. 8.7. Conclusion
      8. 8.8. References
  6. Part 4: Clustering
    1. 9 A Novel Clustering Method with Automatic Weighting of Tables and Variables
      1. 9.1. Introduction
      2. 9.2. Related Work
      3. 9.3. Definitions, notations and objective
      4. 9.4. Hard clustering with automated weighting of tables and variables
      5. 9.5. Applications: UCI data sets
      6. 9.6. Conclusion
      7. 9.7. References
    2. 10 Clustering and Generalized ANOVA for Symbolic Data Constructed from Open Data
      1. 10.1. Introduction
      2. 10.2. Data description based on discrete (membership) distributions
      3. 10.3. Clustering
      4. 10.4. Generalized ANOVA
      5. 10.5. Conclusion
      6. 10.6. References
  7. List of Authors
  8. Index
  9. End User License Agreement

Product information

  • Title: Advances in Data Science
  • Author(s): Edwin Diday, Rong Guan, Gilbert Saporta, Huiwen Wang
  • Release date: February 2020
  • Publisher(s): Wiley-ISTE
  • ISBN: 9781786305763