Network Science with Python

Book description

Discover how graph networks enable a new approach to data science, combining theoretical and practical methods, in this expert Python guide, printed in color

Key Features

  • Create networks using data points and information
  • Learn to visualize and analyze networks to better understand communities
  • Explore the use of network data in both supervised and unsupervised machine learning projects
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Network analysis is often taught with tiny or toy data sets, which limits both what you learn and how you can apply it. Network Science with Python helps you extract relevant data, draw conclusions, and build networks using practical, industry-standard data sets. You’ll begin by learning the basics of natural language processing, network science, and social network analysis, then move on to programmatically building and analyzing networks. You’ll get a hands-on understanding of data sources, data extraction, working with the data, and drawing insights from it. This is a hands-on book grounded in theory, with specific technical and mathematical details for future reference. As you progress, you’ll learn to construct and clean networks, conduct network analysis, egocentric network analysis, and community detection, and use network data with machine learning. You’ll also explore network analysis concepts, from basic to advanced.

By the end of the book, you’ll be able to identify network data and use it to extract unconventional insights to comprehend the complex world around you.

What you will learn

  • Explore NLP, network science, and social network analysis
  • Apply the tech stack used for NLP, network science, and analysis
  • Extract insights from NLP and network data
  • Generate personalized NLP and network projects
  • Authenticate and scrape tweets, connections, the web, and data streams
  • Discover the use of network data in machine learning projects

Who this book is for

Network Science with Python demonstrates how programming and social science can be combined to find new insights. Data scientists, NLP engineers, software engineers, social scientists, and data science students will find this book useful. An intermediate level of Python programming is a prerequisite. Readers from both social science and programming backgrounds will gain a new perspective and add a feather to their cap.

Table of contents

  1. Network Science with Python
  2. Acknowledgements
  3. Contributors
  4. About the author
  5. About the reviewers
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. Download the example code files
    4. Conventions used
    5. Get in touch
    6. Share Your Thoughts
    7. Download a free PDF copy of this book
  7. Part 1: Getting Started with Natural Language Processing and Networks
  8. Chapter 1: Introducing Natural Language Processing
    1. Technical requirements
    2. What is NLP?
    3. Why NLP in a network analysis book?
    4. A very brief history of NLP
    5. How has NLP helped me?
      1. Simple text analysis
      2. Community sentiment analysis
      3. Answer previously unanswerable questions
      4. Safety and security
    6. Common uses for NLP
      1. True/False – Presence/Absence
      2. Regular expressions (regex)
      3. Word counts
      4. Sentiment analysis
      5. Information extraction
      6. Community detection
      7. Clustering
    7. Advanced uses of NLP
      1. Chatbots and conversational agents
      2. Language modeling
      3. Text summarization
      4. Topic discovery and modeling
      5. Text-to-speech and speech-to-text conversion
      6. MT
      7. Personal assistants
    8. How can a beginner get started with NLP?
      1. Start with a simple idea
      2. Accounts that post most frequently
      3. Accounts mentioned most frequently
      4. Top 10 data science hashtags
      5. Additional questions or action items from simple analysis
    9. Summary
  9. Chapter 2: Network Analysis
    1. The confusion behind networks
    2. What is this network stuff?
      1. Graph theory
      2. Social network analysis
      3. Network science
    3. Resources for learning about network analysis
      1. Notebook interfaces
      2. IDEs
      3. Network datasets
      4. Kaggle datasets
      5. NetworkX and scikit-network graph generators
      6. Creating your own datasets
      7. NetworkX and articles
    4. Common network use cases
      1. Mapping production dataflow
      2. Mapping community interactions
      3. Mapping literary social networks
      4. Mapping historical social networks
      5. Mapping language
      6. Mapping dark networks
      7. Market research
      8. Finding specific content
      9. Creating ML training data
    5. Advanced network use cases
      1. Graph ML
      2. Recommendation systems
    6. Getting started with networks
      1. Example – K-pop implementation
    7. Summary
    8. Further reading
  10. Chapter 3: Useful Python Libraries
    1. Technical requirements
    2. Using notebooks
    3. Data analysis and processing
      1. pandas
      2. NumPy
    4. Data visualization
      1. Matplotlib
      2. Seaborn
      3. Plotly
    5. NLP
      1. Natural Language Toolkit
      2. Setup
      3. Starter functionality
      4. Documentation
      5. spaCy
    6. Network analysis and visualization
    7. NetworkX
      1. scikit-network
    8. ML
      1. scikit-learn
      2. Karate Club
      3. spaCy (revisited)
    9. Summary
  11. Part 2: Graph Construction and Cleanup
  12. Chapter 4: NLP and Network Synergy
    1. Technical requirements
    2. Why are we learning about NLP in a network book?
    3. Asking questions to tell a story
    4. Introducing web scraping
      1. Introducing BeautifulSoup
      2. Loading and scraping data with BeautifulSoup
    5. Choosing between libraries, APIs, and source data
    6. Using NLTK for PoS tagging
    7. Using spaCy for PoS tagging and NER
      1. SpaCy PoS tagging
      2. SpaCy NER
    8. Converting entity lists into network data
    9. Converting network data into networks
    10. Doing a network visualization spot check
    11. Additional NLP and network considerations
      1. Data cleanup
      2. Comparing PoS tagging and NER
      3. Scraping considerations
    12. Summary
  13. Chapter 5: Even Easier Scraping!
    1. Technical requirements
    2. Why cover Requests and BeautifulSoup?
      1. Introducing Newspaper3k
      2. What is Newspaper3k?
      3. What are Newspaper3k’s uses?
    3. Getting started with Newspaper3k
      1. Scraping all news URLs from a website
      2. Scraping a news story from a website
      3. Scraping nicely and blending in
      4. Converting text into network data
      5. End-to-end Newspaper3k scraping and network visualization
    4. Introducing the Twitter Python Library
      1. What is the Twitter Python Library?
      2. What are the Twitter Library’s uses?
      3. What data can be harvested from Twitter?
      4. Getting Twitter API access
      5. Authenticating with Twitter
      6. Scraping user tweets
      7. Scraping user following
      8. Scraping user followers
      9. Scraping using search terms
      10. Converting Twitter tweets into network data
      11. End-to-end Twitter scraping
    5. Summary
  14. Chapter 6: Graph Construction and Cleaning
    1. Technical requirements
    2. Creating a graph from an edge list
      1. Types of graphs
      2. Summarizing graphs
    3. Listing nodes
    4. Removing nodes
    5. Quick visual inspection
    6. Adding nodes
      1. Adding edges
    7. Renaming nodes
    8. Removing edges
    9. Persisting the network
    10. Simulating an attack
    11. Summary
  15. Part 3: Network Science and Social Network Analysis
  16. Chapter 7: Whole Network Analysis
    1. Technical requirements
    2. Creating baseline WNA questions
      1. Revised SNA questions
      2. Social network analysis revisited
    3. WNA in action
      1. Loading data and creating networks
      2. Network size and complexity
      3. Network visualization and thoughts
      4. Important nodes
      5. Degrees
      6. Degree centrality
      7. Betweenness centrality
      8. Closeness centrality
      9. PageRank
      10. Edge centralities
    4. Comparing centralities
    5. Visualizing subgraphs
    6. Investigating islands and continents – connected components
      1. Communities
      2. Bridges
    7. Understanding layers with k_core and k_corona
      1. k_core
      2. k_corona
    8. Challenge yourself!
    9. Summary
  17. Chapter 8: Egocentric Network Analysis
    1. Technical requirements
    2. Egocentric network analysis
      1. Uses for egocentric network analysis
      2. Explaining the analysis methodology
    3. Investigating ego nodes and connections
      1. Ego 1 – Valjean
      2. Ego 2 – Marius
      3. Ego 3 – Gavroche
      4. Ego 4 – Joly
      5. Insights between egos
    4. Identifying other research opportunities
    5. Summary
  18. Chapter 9: Community Detection
    1. Technical requirements
    2. Introducing community detection
    3. Getting started with community detection
    4. Exploring connected components
    5. Using the Louvain method
      1. How does it work?
      2. The Louvain method in action!
    6. Using label propagation
      1. How does it work?
      2. Label propagation in action!
    7. Using the Girvan-Newman algorithm
      1. How does it work?
      2. Girvan-Newman algorithm in action!
    8. Other approaches to community detection
    9. Summary
  19. Chapter 10: Supervised Machine Learning on Network Data
    1. Technical requirements
    2. Introducing ML
    3. Beginning with ML
    4. Data preparation and feature engineering
      1. Degrees
      2. Clustering
      3. Triangles
      4. Betweenness centrality
      5. Closeness centrality
      6. PageRank
      7. Adjacency matrix
      8. Merging DataFrames
      9. Adding labels
    5. Selecting a model
    6. Preparing the data
    7. Training and validating the model
    8. Model insights
    9. Other use cases
    10. Summary
  20. Chapter 11: Unsupervised Machine Learning on Network Data
    1. Technical requirements
    2. What is unsupervised ML?
    3. Introducing Karate Club
    4. Network science options
    5. Uses of unsupervised ML on network data
      1. Community detection
      2. Graph embeddings
    6. Constructing a graph
    7. Community detection in action
      1. SCD
    8. Graph embeddings in action
      1. FEATHER
      2. NodeSketch
      3. RandNE
      4. Other models
    9. Using embeddings in supervised ML
      1. Pros and cons
      2. Loss of explainability and insights
      3. An easier workflow for classification and clustering
    10. Summary
  21. Index
    1. Why subscribe?
  22. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Network Science with Python
  • Author(s): David Knickerbocker
  • Release date: February 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781801073691