Create an API key to follow along with this guide.
To obtain an API key:
- Go to Settings → API.
- Select “Create new API key”.
- Copy the entire key.
This example uses the Dune Python client, a framework for interacting with Dune’s official API. Dune Client can be found here and installed with:
pip install dune-client
- Credit consumption for each Dune API action is estimated and printed out with a ⛽ symbol. To learn more about credits, please visit this page.
- The network was constructed using one month of Farcaster data, ending on October 23, 2023.
Prerequisite
- This guide assumes you have a basic understanding of what a graph (or network) is, including terminology like nodes, edges, weights, and directions.
- If you need help here, please try resources like Medium and YouTube, which are just a Google search away.
- If you need a specific recommendation, check this out, in both written format and video format
Setup and Dataset
Construct Dataset
To build a network, we need nodes and their relationships. Refer to this Dune query for the data required.
- Nodes are identified by a unique “fid” (Farcaster ID).
- Relationships (edges) between nodes are based on four actions: follow, like, repost, and comment.
- Interaction weights are: follow = 5, like = 1, repost = 3, comment = 2. Edges are directed, so an interaction from A to B is counted separately from one from B to A.
- These interaction weights are parameterized variables; adjust them as needed.
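To make the weighting concrete, here is a minimal, dependency-free sketch of how interaction rows could be turned into weighted directed edges. The row layout `(source_fid, target_fid, action)`, the sample data, and the choice to sum weights per directed pair are assumptions for illustration; the actual Dune query may aggregate differently.

```python
from collections import defaultdict

# Interaction weights as stated in the guide; adjust as needed.
WEIGHTS = {"follow": 5, "like": 1, "repost": 3, "comment": 2}


def build_weighted_edges(interactions, weights=WEIGHTS):
    """Sum interaction weights per directed (source_fid, target_fid) pair."""
    totals = defaultdict(int)
    for source_fid, target_fid, action in interactions:
        totals[(source_fid, target_fid)] += weights[action]
    return dict(totals)


# Hypothetical sample rows: (source_fid, target_fid, action)
sample = [
    (1, 2, "follow"),
    (1, 2, "like"),
    (2, 1, "comment"),
    (3, 1, "repost"),
    (3, 1, "like"),
]
edges = build_weighted_edges(sample)
```

Because direction matters, `(1, 2)` and `(2, 1)` are kept as separate edges with their own totals.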
Packages and Imports
Install required packages with:
Environment Setup for Dune API Python Client
For the Dune API, obtain an API key:
- Go to Settings → API.
- Select “Create new API key”.
- Copy the entire key.
Import Data into Notebook and Construct the DiGraph
There are two methods to export data from Dune into the notebook:
- Get Latest Result: retrieve the latest result, which is faster as it bypasses execution.
- Query a Query
Then construct the DiGraph with NetworkX’s from_pandas_edgelist function.
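As a rough illustration of the resulting data structure, the sketch below reads an edge list into a plain directed adjacency dict using only the standard library. It is not the notebook's code: the CSV text and the column names `source_fid`, `target_fid`, and `weight` are hypothetical, and in the notebook the same step would be done on a DataFrame with NetworkX.

```python
import csv
import io

# Hypothetical export of the edge list; the column names are assumptions.
CSV_TEXT = """source_fid,target_fid,weight
1,2,6
2,1,2
3,1,4
"""


def read_digraph(csv_file):
    """Return {source: {target: weight}} with every node present as a key."""
    graph = {}
    for row in csv.DictReader(csv_file):
        src, dst = int(row["source_fid"]), int(row["target_fid"])
        graph.setdefault(src, {})[dst] = float(row["weight"])
        graph.setdefault(dst, {})  # keep nodes that only receive edges
    return graph


graph = read_digraph(io.StringIO(CSV_TEXT))
```

Each key is a node, and each inner dict maps a successor to the edge weight, which mirrors how a weighted directed graph stores its edges.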
📊 Part 1: Summary Statistics
Let’s start by assessing our graph’s summary statistics. Metrics like density and average degree show that Farcaster users are sparsely connected. The plotted degree distribution and the number of strongly connected components reveal a few highly connected users and significant segmentation into isolated clusters. Here’s a brief code snippet. For the comprehensive version, visit the GitHub notebook.
- Density
- Avg Degree
- Degree Distribution
- Strongly Connected Components
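The four statistics above can be sketched without any dependencies as follows. This is an illustrative sketch, not the notebook's code: it assumes the graph is a dict mapping every node to the set of its successors, treats the graph as unweighted, and uses a tiny made-up example. NetworkX provides equivalents such as `nx.density` and `nx.number_strongly_connected_components`.

```python
from collections import Counter


def density(graph):
    """Directed density: edges divided by the n * (n - 1) possible edges."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values())
    return m / (n * (n - 1)) if n > 1 else 0.0


def average_degree(graph):
    """Average of in-degree plus out-degree, which equals 2m / n."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values())
    return 2 * m / n if n else 0.0


def degree_distribution(graph):
    """Map each total degree value to the number of nodes that have it."""
    degree = {node: len(nbrs) for node, nbrs in graph.items()}
    for nbrs in graph.values():
        for dst in nbrs:
            degree[dst] += 1
    return Counter(degree.values())


def strongly_connected_components(graph):
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    # Pass 1: record nodes in order of DFS completion.
    order, seen = [], set()
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, nbrs = stack[-1]
            for nxt in nbrs:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all successors explored
                order.append(node)
                stack.pop()
    # Pass 2: DFS on the reversed graph in reverse completion order.
    reverse = {node: [] for node in graph}
    for src, nbrs in graph.items():
        for dst in nbrs:
            reverse[dst].append(src)
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Hypothetical toy graph: 1 and 2 follow each other, 3 follows 1, 4 is isolated.
g = {1: {2}, 2: {1}, 3: {1}, 4: set()}
sccs = strongly_connected_components(g)
```

A low density together with many small strongly connected components is the pattern the guide describes for the Farcaster network.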
📏 Part 2: Centrality Measures
Centrality measures identify important nodes in a network, but “importance” can be defined in many ways. In this guide, we will explore:
- Degree Centrality: Gauges a node’s exposure by counting its connected edges.
- Betweenness Centrality: Measures a node’s control over communication between others.
- Closeness Centrality: Assesses how fast information spreads from one node to all others.
- Eigenvector Centrality: Evaluates influence based on the importance of connected nodes.
- PageRank: Used by Google, it considers the structure of incoming links.
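To show how two of these measures work under the hood, here is a small dependency-free sketch of in-degree centrality and a PageRank power iteration. It assumes an unweighted graph stored as a dict of successor sets, a damping factor of 0.85, and uniform redistribution of rank from nodes with no outgoing edges; these are common defaults, not settings taken from the notebook. In NetworkX the corresponding calls are `nx.in_degree_centrality` and `nx.pagerank`.

```python
def in_degree_centrality(graph):
    """Fraction of the other n - 1 nodes that point at each node."""
    n = len(graph)
    counts = {node: 0 for node in graph}
    for nbrs in graph.values():
        for dst in nbrs:
            counts[dst] += 1
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: c / (n - 1) for node, c in counts.items()}


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration; rank of dangling nodes is spread uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(max_iter):
        dangling = sum(rank[node] for node, nbrs in graph.items() if not nbrs)
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in graph}
        for src, nbrs in graph.items():
            if nbrs:
                share = damping * rank[src] / len(nbrs)
                for dst in nbrs:
                    new_rank[dst] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in graph)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: everyone points at node 1, and 1 points back at 2.
g = {1: {2}, 2: {1}, 3: {1}, 4: {1}}
centrality = in_degree_centrality(g)
ranks = pagerank(g)
```

Node 2 has only one incoming link, but it comes from the most important node, so its PageRank ends up well above nodes 3 and 4. This is the sense in which PageRank considers the structure of incoming links rather than just their count.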
🎨 Bonus part: Visualizing Networks
Visualization enhances any analysis, but visualizing extensive networks like ours can be a maze. Let’s navigate two visualization paths you can adapt further:
- In Gephi: Gephi is a good tool for presenting large networks comprehensively. Export the graph from Python with nx.write_gml(G, "file_name.gml") and dive into Gephi. Tweak node sizes, colors, and labels to distill insights.
- In Python notebook: for a more focused lens, visualize the top 200 nodes by degree with NetworkX. Node color denotes out-degree: a blue node engages less, while a red one is more active. Node size illustrates in-degree, reflecting the interactions received.
- Static
- Interactive
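Before plotting in the notebook, the top nodes and their visual attributes have to be selected. A minimal sketch of that preparation step is shown below; it assumes the graph is a dict of successor sets, ranks nodes by total degree, and takes sizes and colors from degrees in the full graph rather than the subgraph. The function name `top_k_view` is hypothetical, and the actual drawing (for example with NetworkX and Matplotlib) is left out.

```python
def top_k_view(graph, k=200):
    """Keep the k highest-degree nodes and derive per-node plot attributes."""
    out_deg = {node: len(nbrs) for node, nbrs in graph.items()}
    in_deg = {node: 0 for node in graph}
    for nbrs in graph.values():
        for dst in nbrs:
            in_deg[dst] += 1
    ranked = sorted(graph, key=lambda n: in_deg[n] + out_deg[n], reverse=True)
    keep = set(ranked[:k])
    # Induced subgraph: only edges whose endpoints are both kept.
    subgraph = {n: {d for d in graph[n] if d in keep} for n in keep}
    sizes = {n: in_deg[n] for n in keep}    # node size: interactions received
    colors = {n: out_deg[n] for n in keep}  # node color: how active the node is
    return subgraph, sizes, colors


# Hypothetical toy graph, keeping only the top 2 nodes.
g = {1: {2}, 2: {1}, 3: {1}, 4: {1}}
subgraph, sizes, colors = top_k_view(g, k=2)
```

The `sizes` and `colors` dicts can then be mapped onto a color scale and marker sizes by whichever plotting library you use.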
🔍 Part 3: Graph Sampling
Graph sampling is essential for handling large networks. Random Node Sampling (RNS) and Random Walk Sampling (RWS) are two simple methods we will cover here. Again, below is a brief code snippet. For the comprehensive version, visit the GitHub notebook.
- Random Node Sampling
- Random Walk Sampling
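Both sampling methods can be sketched in a few lines of standard-library Python. This is an illustrative version under stated assumptions, not the notebook's code: the graph is a dict of successor sets, the walk follows outgoing edges only, and it jumps to a random node at dead ends or with a restart probability of 0.15 so it cannot get stuck in a small cycle. The `restart` and `max_steps` parameters are choices made for this sketch.

```python
import random


def induced_subgraph(graph, keep):
    """Keep only the chosen nodes and the edges between them."""
    return {n: {d for d in graph[n] if d in keep} for n in keep}


def random_node_sample(graph, k, seed=0):
    """RNS: pick k nodes uniformly at random and keep the edges among them."""
    rng = random.Random(seed)
    keep = set(rng.sample(sorted(graph), k))
    return induced_subgraph(graph, keep)


def random_walk_sample(graph, k, seed=0, restart=0.15, max_steps=10_000):
    """RWS: walk along outgoing edges until k distinct nodes are visited."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    current = rng.choice(nodes)
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= k:
            break
        successors = sorted(graph[current])
        if not successors or rng.random() < restart:
            current = rng.choice(nodes)  # dead end or random restart
        else:
            current = rng.choice(successors)
        visited.add(current)
    return induced_subgraph(graph, visited)


# Hypothetical toy graph: a 6-node ring where each node also skips one ahead.
g = {i: {(i + 1) % 6, (i + 2) % 6} for i in range(6)}
node_sample = random_node_sample(g, 3)
walk_sample = random_walk_sample(g, 4)
```

RNS tends to break up neighborhoods because nodes are chosen independently, while RWS keeps nodes that are close to each other, so the two samples can look quite different on a sparse network.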
🏘️ Part 4: Community Detection
Community detection, akin to clustering in traditional data analysis, reveals hidden structures within our sampled graph. While you might lean towards classic algorithms like k-means, specialized methods like Louvain, which focuses on modularity, offer tailored insights. But remember, there’s no universal solution; your choice should fit your graph’s nature and your objectives. Here we illustrate how to use the Louvain and Girvan-Newman methods to perform community detection on our sampled network (derived by random node sampling).
- Louvain
- Girvan-Newman
- Example community detection
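Since Louvain works by searching for a partition with high modularity, it helps to see how that score is computed. The sketch below is not Louvain or Girvan-Newman itself; it only evaluates the modularity of a given partition, assuming an undirected, unweighted edge list and a made-up example of two triangles joined by a single edge. In NetworkX, the algorithms themselves are available as `nx.community.louvain_communities` and `nx.community.girvan_newman`.

```python
def modularity(edges, communities):
    """Q = sum over communities of L_c / m - (d_c / (2m))^2.

    L_c is the number of edges inside community c, d_c is the sum of the
    degrees of its nodes, and m is the total number of edges.
    """
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


# Hypothetical toy graph: two triangles connected by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
```

Splitting the graph along the bridge gives a modularity of 5/14 (about 0.357), while putting every node in one community gives 0. Girvan-Newman would arrive at the same split from the other direction, by removing the bridge as the edge with the highest betweenness.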