Google+ Dataset

Dataset Summary

This published dataset, consisting of four Google+ snapshots, is a subset of the dataset studied in our IMC'12 paper. Each snapshot includes both the directed social structure and node attributes, which together can be represented as a Social-Attribute Network: users are social nodes joined by directed social links, and attributes are attribute nodes joined to users by undirected attribute links. Snapshots 3 and 4 were crawled after Google+ was opened to the public.

Table I. Dataset summary

            # Social nodes  # Social links  # Attri nodes  # Attri links  Crawled time  TimeID
snapshot 1       4,693,129      47,130,325        991,545      3,644,103  Jul. 2011          0
snapshot 2      17,091,929     271,915,755      3,108,141     14,693,125  Aug. 2011          1
snapshot 3      26,244,659     410,445,770      4,147,389     19,344,382  Sep. 2011          2
snapshot 4      28,942,911     462,994,069      4,443,631     20,592,962  Oct. 2011          3


Dataset Format

Directed social structure

UserIDFrom UserIDTo TimeID

Each line corresponds to a directed link. UserIDs are anonymized as integers starting from 0. TimeID is 0, 1, 2, or 3, indicating the snapshot in which this directed link first appears.
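A minimal parsing sketch for the social-link file follows. It assumes whitespace-separated fields and no header row (the page does not say whether the file has one), and it skips blank lines. The names `SocialLink` and `parse_social_links` and the field names `src`, `dst`, `time_id` are hypothetical, chosen for illustration only.

```python
from typing import Iterable, Iterator, NamedTuple


class SocialLink(NamedTuple):
    """One directed social link; field names are our own, not the dataset's."""
    src: int      # UserIDFrom (anonymized, >= 0)
    dst: int      # UserIDTo (anonymized, >= 0)
    time_id: int  # 0-3: snapshot in which the link first appears


def parse_social_links(lines: Iterable[str]) -> Iterator[SocialLink]:
    """Parse 'UserIDFrom UserIDTo TimeID' lines.

    Assumes whitespace-separated fields and no header row; blank lines are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        src, dst, time_id = map(int, fields)
        # Check against the documented ranges: IDs start at 0, TimeID is 0-3.
        if src < 0 or dst < 0 or time_id not in range(4):
            raise ValueError(f"line {lineno}: value outside documented range")
        yield SocialLink(src, dst, time_id)
```

Because the function accepts any iterable of strings, an open file object can be passed directly, e.g. `with open(path) as f: links = list(parse_social_links(f))`, where `path` stands for whatever the downloaded file is called.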

Node attributes

UserID AttriID TimeID

Each line corresponds to an undirected attribute link between a user and an attribute. AttriIDs are anonymized as negative integers starting from -1. Again, TimeID is 0, 1, 2, or 3, indicating the snapshot in which this link first appears.
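Because attribute links are undirected, it is often convenient to index them from both the user side and the attribute side. The sketch below does that, assuming whitespace-separated fields and no header row; the function names are hypothetical. Since UserIDs start at 0 and AttriIDs start at -1, the two ID ranges never overlap, which the parser uses as a sanity check.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

AttrLink = Tuple[int, int, int]  # (UserID, AttriID, TimeID)


def parse_attribute_links(lines: Iterable[str]) -> List[AttrLink]:
    """Parse 'UserID AttriID TimeID' lines (whitespace-separated, no header assumed)."""
    links: List[AttrLink] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        user_id, attri_id, time_id = map(int, fields)
        # UserIDs are >= 0 and AttriIDs are <= -1, so the ranges cannot collide.
        if user_id < 0 or attri_id >= 0:
            raise ValueError(f"unexpected IDs in line: {line!r}")
        links.append((user_id, attri_id, time_id))
    return links


def index_attribute_links(
    links: Iterable[AttrLink],
) -> Tuple[DefaultDict[int, Set[int]], DefaultDict[int, Set[int]]]:
    """Index undirected attribute links from both endpoints."""
    attrs_of_user: DefaultDict[int, Set[int]] = defaultdict(set)
    users_with_attr: DefaultDict[int, Set[int]] = defaultdict(set)
    for user_id, attri_id, _time_id in links:
        attrs_of_user[user_id].add(attri_id)
        users_with_attr[attri_id].add(user_id)
    return attrs_of_user, users_with_attr
```

For example, the lines `0 -1 0`, `0 -2 1`, and `3 -1 2` give user 0 the attributes {-1, -2} and give attribute -1 the users {0, 3}.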

Attribute types

AttriID AttriType

Each line corresponds to an attribute. AttriType is one of employer, school, major, or places_lived.
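The attribute-type file maps each AttriID to a label, so a dictionary is a natural in-memory form. The sketch below assumes whitespace-separated fields, no header row, and type labels without spaces (true of the four labels listed above); `parse_attribute_types` and `count_by_type` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_attribute_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map each AttriID to its AttriType from 'AttriID AttriType' lines."""
    types: Dict[int, str] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        attri_id_text, attri_type = fields  # exactly two fields expected
        types[int(attri_id_text)] = attri_type
    return types


def count_by_type(types: Dict[int, str]) -> Counter:
    """Count distinct attribute nodes per AttriType."""
    return Counter(types.values())
```

Joining this dictionary with the attribute links lets you, for instance, keep only `school` attributes for a user. A `Counter` returns 0 for labels that never occur, so missing types need no special handling.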

Reconstructing the t-th Snapshot

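To obtain the t-th snapshot (t = 1, 2, 3, 4), keep all social links and attribute links whose TimeID is less than t. For example, snapshot 1 contains only the links with TimeID 0, while snapshot 4 contains every link in the dataset.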
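The rule above amounts to a single filter on TimeID. The minimal sketch below applies it to already-parsed records, assuming each record is a tuple whose last field is the TimeID; this covers both `(UserIDFrom, UserIDTo, TimeID)` and `(UserID, AttriID, TimeID)`. The function name `snapshot` is hypothetical.

```python
from typing import Iterable, List, Sequence, TypeVar

Record = TypeVar("Record", bound=Sequence[int])


def snapshot(records: Iterable[Record], t: int) -> List[Record]:
    """Return the records belonging to the t-th snapshot (t = 1, 2, 3, 4).

    A link belongs to snapshot t if it first appeared in an earlier or the
    same crawl, i.e. its TimeID (last field) is less than t.
    """
    if t not in (1, 2, 3, 4):
        raise ValueError(f"t must be 1, 2, 3 or 4, got {t}")
    return [r for r in records if r[-1] < t]
```

Since each line records only when a link first appears, this filter treats every link as present in all later snapshots; the same call works for the social-link and attribute-link lists.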



Downloading the Dataset

Note: the raw dataset was originally crawled by Emil Stefanov, Richard Shin, and Elaine Shi, and then processed by Neil Gong.