Graph Data Science with Neo4j

Book description

Supercharge your data with the limitless potential of Neo4j 5, the premier graph database for cutting-edge machine learning

Purchase of the print or Kindle book includes a free PDF eBook

Key Features

  • Extract meaningful information from graph data with Neo4j's latest version 5
  • Integrate graph algorithms into a regular machine learning pipeline in Python
  • Learn the core principles of the Graph Data Science Library to make predictions and create data science pipelines

Book Description

Neo4j, along with its Graph Data Science (GDS) library, is a complete solution to store, query, and analyze graph data. As graph databases become more popular among developers, data scientists are increasingly likely to encounter them in their careers, which makes working with graph algorithms an essential skill for extracting contextual information and improving model prediction performance.

This practical guide to Neo4j and the GDS library lets data scientists working with Python put their knowledge to work. It offers step-by-step explanations of essential concepts and practical instructions for implementing data science techniques on graph data using Neo4j version 5 and its associated libraries. You'll start by querying Neo4j with Cypher and learn how to characterize graph datasets. As you get the hang of running graph algorithms on graph data stored in Neo4j, you'll understand the new and advanced capabilities of the GDS library that enable you to make predictions and write data science pipelines. Using the newly released GDS Python client, you'll be able to integrate graph algorithms into your ML pipeline.

By the end of this book, you'll be able to take advantage of the relationships in your dataset to improve your current model and make other types of elaborate predictions.
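
To make the idea of using relationships as model inputs concrete, here is a minimal sketch in plain Python. It is not taken from the book and does not use Neo4j or the GDS library. The edge list, the node names, and the `age` column are hypothetical toy data, and node degree stands in for whatever graph algorithm would supply the feature in practice.

```python
from collections import Counter

# Hypothetical toy data (an assumption, not from the book): undirected edges
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
]

# Hypothetical tabular features that a model might already use
rows = {
    "alice": {"age": 34},
    "bob": {"age": 27},
    "carol": {"age": 45},
    "dave": {"age": 19},
    "erin": {"age": 52},  # appears in no edge
}


def node_degrees(edge_list):
    """Count how many edges touch each node (undirected degree)."""
    degrees = Counter()
    for source, target in edge_list:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def add_degree_feature(table, edge_list):
    """Return a copy of the table with a 'degree' column added to every row."""
    degrees = node_degrees(edge_list)
    return {
        node: {**features, "degree": degrees.get(node, 0)}
        for node, features in table.items()
    }


enriched = add_degree_feature(rows, edges)
for node, features in enriched.items():
    print(node, features)
```

The enriched rows can then be passed to any tabular model. In a Neo4j setup, the degree would presumably be computed inside the database rather than in Python.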

What you will learn

  • Use the Cypher query language to query graph databases such as Neo4j
  • Build graph datasets from your own data and public knowledge graphs
  • Make graph-specific predictions such as link prediction
  • Explore the latest version of Neo4j to build a graph data science pipeline
  • Run a scikit-learn prediction algorithm with graph data
  • Train a predictive embedding algorithm in GDS and manage the model store

Who this book is for

If you're a data scientist or data professional with a foundation in the basics of Neo4j and are now ready to understand how to build advanced analytics solutions, you'll find this graph data science book useful. Familiarity with the major components of a data science project in Python and Neo4j is necessary to follow the concepts covered in this book.

Table of contents

  1. Graph Data Science with Neo4j
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share Your Thoughts
    8. Download a free PDF copy of this book
  6. Part 1 – Creating Graph Data in Neo4j
  7. Chapter 1: Introducing and Installing Neo4j
    1. Technical requirements
    2. What is a graph database?
      1. Databases
      2. Graph database
    3. Finding or creating a graph database
      1. A note about the graph dataset’s format
      2. Modeling your data as a graph
    4. Neo4j in the graph databases landscape
      1. Neo4j ecosystem
    5. Setting up Neo4j
      1. Downloading and starting Neo4j Desktop
      2. Creating our first Neo4j database
      3. Creating a database in the cloud – Neo4j Aura
    6. Inserting data into Neo4j with Cypher, the Neo4j query language
    7. Extracting data from Neo4j with Cypher pattern matching
    8. Summary
    9. Further reading
    10. Exercises
  8. Chapter 2: Importing Data into Neo4j to Build a Knowledge Graph
    1. Technical requirements
    2. Importing CSV data into Neo4j with Cypher
      1. Discovering the Netflix dataset
      2. Defining the graph schema
      3. Importing data
    3. Introducing the APOC library to deal with JSON data
      1. Browsing the dataset
      2. Getting to know and installing the APOC plugin
      3. Loading data
      4. Dealing with temporal data
    4. Discovering the Wikidata public knowledge graph
      1. Data format
      2. Query language – SPARQL
    5. Enriching our graph with Wikidata information
      1. Loading data into Neo4j for one person
      2. Importing data for all people
    6. Dealing with spatial data in Neo4j
    7. Importing data in the cloud
    8. Summary
    9. Further reading
    10. Exercises
  9. Part 2 – Exploring and Characterizing Graph Data with Neo4j
  10. Chapter 3: Characterizing a Graph Dataset
    1. Technical requirements
    2. Characterizing a graph from its node and edge properties
      1. Link direction
      2. Link weight
      3. Node type
    3. Computing the graph degree distribution
      1. Definition of a node’s degree
      2. Computing the node degree with Cypher
      3. Visualizing the degree distribution with NeoDash
    4. Installing and using the Neo4j Python driver
      1. Counting node labels and relationship types in Python
      2. Building the degree distribution of a graph
      3. Improved degree distribution
    5. Learning about other characterizing metrics
      1. Triangle count
      2. Clustering coefficient
    6. Summary
    7. Further reading
    8. Exercises
  11. Chapter 4: Using Graph Algorithms to Characterize a Graph Dataset
    1. Technical requirements
    2. Digging into the Neo4j GDS library
      1. GDS content
      2. Installing the GDS library with Neo4j Desktop
      3. GDS project workflow
    3. Projecting a graph for use by GDS
      1. Native projections
      2. Cypher projections
    4. Computing a node’s degree with GDS
      1. stream mode
      2. The YIELD keyword
      3. write mode
      4. mutate mode
      5. Algorithm configuration
      6. Other centrality metrics
    5. Understanding a graph’s structure by looking for communities
      1. Number of components
      2. Modularity and the Louvain algorithm
    6. Summary
    7. Further reading
  12. Chapter 5: Visualizing Graph Data
    1. Technical requirements
    2. The complexity of graph data visualization
      1. Physical networks
      2. General case
    3. Visualizing a small graph with networkx and matplotlib
      1. Visualizing a graph with known coordinates
      2. Visualizing a graph with unknown coordinates
      3. Configuring object display
    4. Discovering the Neo4j Bloom graph application
      1. What is Bloom?
      2. Bloom installation
      3. Selecting data with Neo4j Bloom
      4. Configuring the scene in Bloom
    5. Visualizing large graphs with Gephi
      1. Installing Gephi and its required plugin
      2. Using APOC Extended to synchronize Neo4j and Gephi
      3. Configuring the view in Gephi
    6. Summary
    7. Further reading
    8. Exercises
  13. Part 3 – Making Predictions on a Graph
  14. Chapter 6: Building a Machine Learning Model with Graph Features
    1. Technical requirements
    2. Introducing the GDS Python client
      1. GDS Python principles
      2. Input and output types
      3. Creating a projected graph from Python
    3. Running GDS algorithms from Python and extracting data in a dataframe
      1. write mode
      2. stream mode
      3. Dropping the projected graph
    4. Using features from graph algorithms in a scikit-learn pipeline
      1. Machine learning tasks with graphs
      2. Our task
      3. Computing features
      4. Extracting and visualizing data
      5. Building the model
    5. Summary
    6. Further reading
    7. Exercise
  15. Chapter 7: Automatically Extracting Features with Graph Embeddings for Machine Learning
    1. Technical requirements
    2. Introducing graph embedding algorithms
      1. Defining embeddings
      2. Graph embedding classification
    3. Using a transductive graph embedding algorithm
      1. Understanding the Node2Vec algorithm
      2. Using Node2Vec with GDS
    4. Training an inductive embedding algorithm
      1. Understanding GraphSAGE
      2. Introducing the GDS model catalog
      3. Training GraphSAGE with GDS
    5. Computing new node representations
    6. Summary
    7. Further reading
    8. Exercises
  16. Chapter 8: Building a GDS Pipeline for Node Classification Model Training
    1. Technical requirements
    2. The GDS pipelines
      1. What is a pipeline?
    3. Building and training a pipeline
      1. Creating the pipeline and choosing the features
      2. Setting the pipeline configuration
      3. Training the pipeline
    4. Making predictions
      1. Computing the confusion matrix
    5. Using embedding features
      1. Choosing the graph embedding algorithm to use
      2. Training using Node2Vec
      3. Training using GraphSAGE
    6. Summary
    7. Further reading
    8. Exercise
  17. Chapter 9: Predicting Future Edges
    1. Technical requirements
    2. Introducing the LP problem
      1. LP examples
      2. LP with the Netflix dataset
      3. Framing an LP problem
    3. LP features
      1. Topological features
      2. Features based on node properties
    4. Building an LP pipeline with the GDS
      1. Creating and configuring the pipeline
      2. Pipeline training and testing
    5. Summary
    6. Further reading
  18. Chapter 10: Writing Your Custom Graph Algorithms with the Pregel API in Java
    1. Technical requirements
    2. Introducing the Pregel API
      1. GDS’s features
      2. The Pregel API
    3. Implementing the PageRank algorithm
      1. The PageRank algorithm
      2. Simple Python implementation
      3. Pregel Java implementation
      4. Implementing the tolerance-stopping criteria
    4. Testing our code
      1. Test for the PageRank class
      2. Test for the PageRankTol class
    5. Using our algorithm from Cypher
      1. Adding annotations
      2. Building the JAR file
      3. Updating the Neo4j configuration
      4. Testing our procedure
    6. Summary
    7. Further reading
    8. Exercises
  19. Index
    1. Why subscribe?
  20. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Graph Data Science with Neo4j
  • Author(s): Estelle Scifo
  • Release date: January 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781804612743