Farcaster Social Network Analysis

Create an API key to follow along with this guide. To obtain an API key:

Go to Settings → API.
Select “Create new API key”.
Copy the entire key.

In the age of digital interconnectivity, platforms like Facebook and Instagram leverage the intricate webs of user relationships to drive their strategies, from identifying top influencers to deploying targeted ads to deciding what content to recommend in users’ feeds. Yet obtaining and analyzing real-world social network data is no small feat, because most platforms guard this data closely. For those with a budding interest in graph theory or network analysis, this guide will pave your path: we walk you through analyzing data from Farcaster, a fully decentralized social network, using Python and the Dune API.

👣 In this guide, we’ll cover the basics of network analysis in four parts: (1) 📊 summary statistics, (2) 📏 centrality measures, (3) 🔍 graph sampling, and (4) 🏘️ community detection.

📌 Dive into our Github notebook to follow the code as you read this guide.

This example uses the Dune Python client, a framework for interacting with Dune’s official API. The Dune Client can be found here and installed with pip install dune-client.
Credit consumption for each Dune API action is estimated and printed out with a ⛽ symbol. To learn more about credits, please visit this page.
The network was constructed using one month of Farcaster data, ending on October 23, 2023.

Prerequisite

This guide assumes a basic understanding of what a graph (or network) is, including terminology like nodes, edges, weights, and direction.
If you need a refresher, resources on Medium and YouTube are just a Google search away.
If you want a specific recommendation, check this out, available in both written and video format.

Setup and Dataset

Construct Dataset

To build a network, we need nodes and their relationships. Refer to this Dune query for the required data.

Nodes are identified by a unique “fid” (Farcaster ID).
Relationships (edges) between nodes are based on four actions: follow, like, repost, and comment.
Interaction weights are: follow = 5, like = 1, repost = 3, comment = 2. Edges are directed: an action by user A toward user B creates an edge from A to B.
These weights are parameterized variables in the query; adjust them as needed.
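The weighting itself happens inside the Dune query. As a rough illustration of the idea, the pandas sketch below assumes a hypothetical table of raw interactions with columns from_user, to_user, and action. These column names and the per-pair summation are assumptions, not the query’s actual SQL. The sketch maps each action to its weight and sums the points per directed user pair into a total_points column like the one used later in this guide.

```python
import pandas as pd

# Weights from this guide; in the Dune query they are adjustable parameters
ACTION_WEIGHTS = {"follow": 5, "like": 1, "repost": 3, "comment": 2}


def build_edge_list(interactions: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw interactions into one weighted, directed edge per (from_user, to_user) pair.

    Assumes hypothetical columns: from_user, to_user, action.
    """
    df = interactions.copy()
    df["points"] = df["action"].map(ACTION_WEIGHTS)
    return (
        df.groupby(["from_user", "to_user"], as_index=False)["points"]
        .sum()
        .rename(columns={"points": "total_points"})
    )


# Hypothetical raw interactions between three fids
interactions = pd.DataFrame(
    {
        "from_user": [1, 1, 2, 3],
        "to_user": [2, 2, 1, 1],
        "action": ["follow", "like", "comment", "repost"],
    }
)
edge_list = build_edge_list(interactions)
print(edge_list)
```

Because edges are directed, (1, 2) and (2, 1) stay separate rows. User 1’s interactions with user 2 add up to 5 + 1 = 6 points, while user 2’s comment on user 1 is worth 2.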

Packages and Imports

Install required packages with:

pip install -r requirements.txt

This is the content of requirements.txt:

pandas
networkx
python-louvain
matplotlib
plotly
python-dotenv
dune-client
nltk

Then, load the necessary libraries:

import pandas as pd
import networkx as nx
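import community as community_louvain  # provided by the python-louvain package
from networkx import community  # NetworkX's built-in community algorithms (e.g. Girvan-Newman)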
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import json
from dune_client.types import QueryParameter
from dune_client.client import DuneClient
from dune_client.query import QueryBase
from multiprocessing import Pool
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from collections import Counter
import random
import itertools
import warnings

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # word_tokenize needs this resource in newer NLTK releases

Environment Setup for Dune API Python Client

For the Dune API, obtain an API key:

Go to Settings → API.
Select “Create new API key”.
Copy the entire key.

Export your API key (python-dotenv, included in the requirements, can also read these variables from a .env file):

DUNE_API_KEY=<insert your key>
DUNE_API_REQUEST_TIMEOUT=120

Adjust the timeout as necessary.

Import Data into Notebook and Construct the DiGraph

There are two methods to get data from Dune into the notebook:

First: Get Latest Result
Second: Query a Query

The first method, shown here, retrieves the latest result. It is faster because it bypasses query execution.

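# Create the client from the DUNE_API_KEY / DUNE_API_REQUEST_TIMEOUT environment variables
dune = DuneClient.from_env()

# Fetch the latest stored result of query 3078764; max_age_hours controls how old a cached result may be
query_result = pd.DataFrame(dune.get_latest_result(3078764, max_age_hours=72).result.rows)
# can call get_latest_result_dataframe once newest version of Dune Client is released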

To build a DiGraph (graph with directed edges), simply use NetworkX’s from_pandas_edgelist function.

G = nx.from_pandas_edgelist(query_result,
                            source='from_user',
                            target='to_user',
                            edge_attr='total_points',
                            create_using=nx.DiGraph())
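One subtlety is worth checking. With create_using=nx.DiGraph(), a pair of users can hold only one edge. If the DataFrame contains repeated (from_user, to_user) rows, later rows overwrite the earlier total_points value rather than adding to it. If your query already returns one row per pair, this changes nothing. The toy example below uses hypothetical handles and points to show the overwrite, and one way to sum duplicates with pandas first. to_digraph is a hypothetical helper that wraps the call above.

```python
import networkx as nx
import pandas as pd

# Hypothetical edge list with a repeated (alice -> bob) pair
rows = pd.DataFrame(
    {
        "from_user": ["alice", "alice", "bob"],
        "to_user": ["bob", "bob", "alice"],
        "total_points": [5, 1, 2],
    }
)


def to_digraph(df: pd.DataFrame) -> nx.DiGraph:
    """Build a directed graph with total_points stored as an edge attribute."""
    return nx.from_pandas_edgelist(
        df,
        source="from_user",
        target="to_user",
        edge_attr="total_points",
        create_using=nx.DiGraph(),
    )


# Without aggregation, the last (alice -> bob) row wins
G_raw = to_digraph(rows)

# Summing duplicate pairs first keeps all interaction points
aggregated = rows.groupby(["from_user", "to_user"], as_index=False)["total_points"].sum()
G_sum = to_digraph(aggregated)

print(G_raw["alice"]["bob"]["total_points"], G_sum["alice"]["bob"]["total_points"])
```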

With our setup complete, let’s dive into the summary statistics of the social network we’ve constructed.

📊 Part 1: Summary Statistics

Let’s start with the graph’s summary statistics. Metrics like density and average degree show that Farcaster users are sparsely connected. The degree distribution reveals only a few highly connected users, and the number of strongly connected components shows significant fragmentation into isolated clusters. Here’s a brief code snippet; for the comprehensive version, visit the Github notebook.


density = nx.density(G)
print(f"Density: {density:.4f}")
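The density snippet covers only one of the four statistics mentioned above. The following is a minimal sketch of the other three using standard NetworkX functions. The notebook’s exact implementation may differ, and summary_statistics is a hypothetical helper name.

```python
from collections import Counter

import networkx as nx


def summary_statistics(G: nx.DiGraph) -> dict:
    """Compute the four summary statistics discussed above for a directed graph."""
    n = G.number_of_nodes()
    m = G.number_of_edges()
    # Every directed edge adds 1 to one node's out-degree and 1 to another's in-degree,
    # so the average total (in + out) degree is 2m / n
    avg_degree = 2 * m / n if n else 0.0
    # Degree distribution: how many nodes have each total degree
    degree_distribution = Counter(d for _, d in G.degree())
    # Strongly connected components: maximal sets where every node can reach every other
    sccs = list(nx.strongly_connected_components(G))
    return {
        "density": nx.density(G),
        "avg_degree": avg_degree,
        "degree_distribution": dict(sorted(degree_distribution.items())),
        "num_sccs": len(sccs),
        "largest_scc_size": max((len(c) for c in sccs), default=0),
    }


# Hypothetical toy graph: 1 <-> 2 form a cycle, 3 is only reachable from 2
toy = nx.DiGraph([(1, 2), (2, 1), (2, 3)])
print(summary_statistics(toy))
```

To plot the degree distribution, pass its keys and values to plt.bar. A log-scaled y-axis usually makes heavy-tailed social data easier to read.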

📏 Part 2: Centrality Measures

Centrality measures identify important nodes in a network, but “importance” can be defined in many ways. In this guide, we will explore:

Degree Centrality: Gauges a node’s exposure by counting its connected edges.

degree_centrality = nx.degree_centrality(G)

Betweenness Centrality: Measures a node’s control over communication between others.

betweenness_centrality = nx.betweenness_centrality(G, normalized=True) # on large graphs, pass k=1000 to approximate using 1,000 sampled source nodes

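Closeness Centrality: Assesses how close a node is to all other nodes in terms of shortest-path distance. For directed graphs, NetworkX uses incoming distances (how quickly others can reach the node); apply it to G.reverse() to measure how quickly a node’s information spreads outward instead.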

closeness_centrality = nx.closeness_centrality(G)

Eigenvector Centrality: Evaluates influence based on the importance of connected nodes.

eigenvector_centrality = nx.eigenvector_centrality(G, max_iter=200)

PageRank: Originally developed at Google, it scores a node by the number and importance of the nodes linking to it.

pagerank = nx.pagerank(G, alpha=0.85)

Again, these are brief code snippets; for the comprehensive version, visit the Github notebook. 👑 And the influencer crown? Unsurprisingly, it goes to Farcaster’s founder, Dan (dwr.eth). Of course, who else could it be? 😉
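A note on weights: the DiGraph stores interaction strength in an edge attribute named total_points, but none of the centrality calls above pass a weight argument, so they treat every edge equally. PageRank’s default weight='weight' finds no such attribute and falls back to 1. For weighted scores, name the attribute explicitly. In PageRank and eigenvector centrality, a larger weight means a stronger link. Betweenness (weight=...) and closeness (distance=...) instead treat the attribute as a path length, so interaction points would first need converting into distances, for example 1 / total_points. That conversion is an assumption you should validate. The sketch below uses a hypothetical toy graph and in-degree to show how a ranking can flip once weights count; top_k is a hypothetical helper.

```python
import networkx as nx


def top_k(scores, k=5):
    """Return the k (node, score) pairs with the highest scores."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical toy graph: bob receives many low-point interactions,
# alice receives a single high-point one
G = nx.DiGraph()
G.add_weighted_edges_from(
    [("alice", "bob", 1), ("carol", "bob", 1), ("dave", "bob", 1), ("carol", "alice", 5)],
    weight="total_points",
)

# Unweighted in-degree counts edges; the weighted version sums total_points
print(top_k(dict(G.in_degree()), k=2))
print(top_k(dict(G.in_degree(weight="total_points")), k=2))

# The same idea applies to PageRank: name the attribute explicitly, e.g.
# nx.pagerank(G, alpha=0.85, weight="total_points")
```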

🎨 Bonus part: Visualizing Networks

Visualization enhances any analysis, but visualizing extensive networks like ours can be a maze. Let’s navigate two visualization paths you can adapt further:

In Gephi: Gephi is a good tool for presenting large networks comprehensively. Export the graph from Python with nx.write_gml(G, "file_name.gml"), open the file in Gephi, and tweak node sizes, colors, and labels to distill insights.
In a Python notebook: for a more focused view, visualize the top 200 nodes by degree with NetworkX. Node color denotes out-degree: a blue node engages less, while a red one is more active. Node size reflects in-degree, i.e. the interactions a user receives.


# Keep only the 200 nodes with the highest total degree (in + out)
top_nodes = [n for n, _ in sorted(G.degree(), key=lambda x: x[1], reverse=True)[:200]]
H = G.subgraph(top_nodes)

# Layout (fixed seed so the picture is reproducible)
pos = nx.spring_layout(H, seed=42)

# Node Colors (heatmap-style based on each user's out-degree in the full graph)
out_degrees = [G.out_degree(n) for n in H.nodes()]
max_out_degree = max(out_degrees) or 1  # guard against division by zero
colors = plt.cm.coolwarm([d/max_out_degree for d in out_degrees])

# Adjusting alpha for visibility
alpha_values = [0.7 + 0.3 * (d/max_out_degree) for d in out_degrees]

# Node Sizes (based on in-degree in the full graph)
in_degrees = [G.in_degree(n) for n in H.nodes()]
size_factor = 10  # adjust this value to fit your needs
sizes = [d * size_factor for d in in_degrees]

# Set the figure size
plt.figure(figsize=(15, 10))

# Draw Nodes and Edges of the top-200 subgraph
nx.draw_networkx_nodes(H, pos, node_color=colors, node_size=sizes, alpha=alpha_values)
nx.draw_networkx_edges(H, pos, edge_color='gray', alpha=0.5)  # Set edge color to gray or any distinct color you prefer
nx.draw_networkx_labels(H, pos)  # Optional: if you want labels on nodes

plt.title("Top 200 influential users in Farcaster network")
plt.show()

In our interactive plot, briang (large, blue) contrasts with n (smaller, red). This suggests that briang receives many interactions while initiating relatively few, whereas n is the more active initiator in recent activity.

🔍 Part 3: Graph Sampling

Graph sampling is essential for handling large networks. Random Node Sampling (RNS) and Random Walk Sampling (RWS) are two simple methods we will cover here. Again, below is a brief code snippet. For the comprehensive version, visit the Github notebook.


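def random_node_sampling(graph, num_nodes, seed=None):
    if seed is not None:
        random.seed(seed)  # Set the random seed if provided

    # random.sample needs a sequence (Python 3.11+ rejects sets and set-like views), so convert the NodeView to a list
    sampled_nodes = random.sample(list(graph.nodes()), num_nodes)
    return graph.subgraph(sampled_nodes)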
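Only Random Node Sampling is shown above. The following is a minimal sketch of one common Random Walk Sampling variant (random walk with restart), not the notebook’s code. Starting from a random node, the walk repeatedly moves to a random successor. With probability restart_prob (0.15 is an assumed default), it jumps back to the start node instead. If it reaches a node with no outgoing edges, it jumps to a fresh random start. max_steps is a hypothetical safety cap in case the walk is trapped in a small region.

```python
import random

import networkx as nx


def random_walk_sampling(graph, num_nodes, restart_prob=0.15, max_steps=100_000, seed=None):
    """Sketch of random walk sampling with restarts; returns the induced subgraph of visited nodes."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    nodes = list(graph.nodes())
    if not nodes:
        return graph.copy()
    num_nodes = min(num_nodes, len(nodes))
    start = rng.choice(nodes)
    current = start
    sampled = {current}
    steps = 0
    while len(sampled) < num_nodes and steps < max_steps:
        steps += 1
        # Follow edge direction on a DiGraph; any neighbor on an undirected graph
        if graph.is_directed():
            neighbors = list(graph.successors(current))
        else:
            neighbors = list(graph.neighbors(current))
        if not neighbors:
            # Dead end (no outgoing edges): jump to a fresh random start node
            start = rng.choice(nodes)
            current = start
        elif rng.random() < restart_prob:
            # Restart: fly back to the current start node
            current = start
        else:
            current = rng.choice(neighbors)
        sampled.add(current)
    # .copy() turns the read-only subgraph view into an independent graph
    return graph.subgraph(sampled).copy()


# Usage on a hypothetical random directed graph
G_demo = nx.gnp_random_graph(200, 0.05, seed=1, directed=True)
sample = random_walk_sampling(G_demo, 50, seed=42)
print(sample.number_of_nodes(), sample.number_of_edges())
```

Unlike random node sampling, a walk keeps neighboring users together, so the sample tends to preserve more local structure. The trade-off is a bias toward well-connected regions.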

🏘️ Part 4: Community Detection

Community detection, akin to clustering in traditional data analysis, reveals hidden structures within our sampled graph. While you might lean towards classic algorithms like k-means, specialized methods like the Louvain method, which optimizes modularity, offer tailored insights. But remember, there’s no universal solution; your choice should match your graph’s nature and your objectives. Here we illustrate how to use the Louvain and Girvan-Newman methods to perform community detection on our sampled network (derived by random node sampling).


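def louvain_community(G, n=3, draw=True):
    # python-louvain's best_partition only accepts undirected graphs, so drop edge directions first
    U = G.to_undirected()
    partition = community_louvain.best_partition(U)
    pos = nx.spring_layout(U)
    # draw_communities, get_community_stats, and print_top_communities are helpers from the full notebook (not shown here)
    draw_communities(U, partition, pos, draw)
    num_communities, average_size, sorted_communities = get_community_stats(partition)
    print(f"Number of detected communities: {num_communities}")
    print(f"Average community size: {average_size:.2f} nodes\n")
    print_top_communities(n, sorted_communities)
    return sorted_communities[:n] if n else sorted_communities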
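The Girvan-Newman code is not shown in this guide. The following minimal sketch uses NetworkX’s built-in girvan_newman and modularity functions from networkx.algorithms.community. Picking the level with the highest modularity and capping the search at max_levels are our assumptions, and girvan_newman_community is a hypothetical helper name. Like the Louvain function, it runs on an undirected copy of the graph. Girvan-Newman recomputes edge betweenness after every removal, so it is practical only on small, sampled graphs.

```python
import itertools

import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity


def girvan_newman_community(G, max_levels=10):
    """Return (communities, modularity) for the best of the first max_levels Girvan-Newman splits."""
    U = G.to_undirected()  # run on an undirected copy, as with Louvain
    if U.number_of_edges() == 0:
        # Nothing to split; modularity is undefined without edges
        return [{node} for node in U.nodes()], 0.0
    best_parts, best_q = None, float("-inf")
    # girvan_newman yields successively finer partitions as high-betweenness edges are removed
    for level in itertools.islice(girvan_newman(U), max_levels):
        parts = [set(c) for c in level]
        q = modularity(U, parts)
        if q > best_q:
            best_parts, best_q = parts, q
    # Largest communities first (our choice of ordering)
    return sorted(best_parts, key=len, reverse=True), best_q


# Usage on NetworkX's built-in Zachary karate club graph
communities, q = girvan_newman_community(nx.karate_club_graph(), max_levels=5)
print(len(communities), round(q, 3))
```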

With these communities in hand, it’s time to dig deeper. Here we take the top 3 communities detected in the steps above, use the Dune API with query parameters to retrieve the posts of each community, and perform a word frequency analysis. This sheds light on the distinctive conversation topics of each group.

# Create a figure and a grid of subplots
fig, axs = plt.subplots(3, 1, figsize=(20, 15))

for idx, result_list in enumerate(result_louvain_random_node): # replace the results here
    # Joining items and formatting string
    quoted_items = "', '".join(result_list)
    formatted_string = f"'{quoted_items}'"

    # Define the query
    query = QueryBase(
        name="Farcaster texts lookup by handles",
        query_id=3128246, # https://dune.com/queries/3128246
        params=[
            QueryParameter.text_type(name="farcaster_handles", value=formatted_string),
            QueryParameter.number_type(name="past_n_months", value=1),
        ],
    )
    query_result = dune.run_query_dataframe(query=query, performance="large")  # Specify large cluster for faster runtime

    est_credits = query_result.size/1_000 + 20 # 20 credits for large cluster
    print(f"⛽ Estimated credit consumption from this run is {est_credits:,.1f}")


    # Concatenating all text data from the 'text' column into a single string
    text_data = query_result['text'].str.cat(sep=' ')

    # Tokenization
    tokens = word_tokenize(text_data.lower())  # Converts to lowercase

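    # Removing stopwords (build the set once; calling stopwords.words() for every token is slow)
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word.isalpha() and word not in stop_words]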

    # Word frequency
    word_freq = Counter(tokens)

    # Get the most common words and their counts
    words, counts = zip(*word_freq.most_common(10))

    # Plotting on the corresponding subplot
    axs[idx].bar(words, counts)
    axs[idx].set_xlabel('Words')
    axs[idx].set_ylabel('Frequency')
    axs[idx].set_title(f'Word Frequency Analysis - List {idx + 1}')

    # Set the font size of the x-axis labels
    axs[idx].tick_params(axis='x', labelsize=15)  # adjust labelsize as needed


# Adjust the spacing between the plots
plt.tight_layout()
plt.show()

Beyond word frequencies, text analysis lets you venture into topic modeling or sentiment analysis. There’s a universe of analytical adventures waiting!

Conclusion

We’ve crafted a primer on conducting social network analysis on Farcaster with the Dune API and Python. Farcaster’s decentralized data offers a unique perspective, especially as mainstream platforms like Facebook restrict access and even Twitter is tightening its grip. We covered summary statistics, centrality measures, graph sampling, and community detection on real-world data, and showed how to query Dune’s data via its API and export it for your own analysis. If your curiosity beckons and you’re keen to experiment, let this guide be your companion. Let’s get connected! If you have questions, thoughts about this guide, or ideas for future guides, let me know via LinkedIn, Twitter, or Warpcast!
