Dataset Summary
This published dataset, consisting of four Google+ snapshots, is a subset of the dataset studied in our IMC'12 paper. Each snapshot includes both the directed social structure and node attributes, which together can be represented as a Social-Attribute Network (SAN). Snapshots 3 and 4 were crawled after Google+ was opened to the public.
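As a rough illustration of how the two kinds of data fit into one SAN, the sketch below keeps directed social links between users and undirected links between users and attribute nodes. It relies on the ID conventions described under Dataset Format (UserIDs are non-negative, AttriIDs are negative), so both node types can share one integer ID space. The class `SocialAttributeNetwork` and its methods are hypothetical names chosen for this example and are not part of the dataset release.

```python
from typing import Dict, Set


class SocialAttributeNetwork:
    """Hypothetical in-memory SAN: user ids are >= 0, attribute ids are < 0."""

    def __init__(self) -> None:
        self.out_links: Dict[int, Set[int]] = {}   # user -> users it links to
        self.in_links: Dict[int, Set[int]] = {}    # user -> users linking to it
        self.attr_links: Dict[int, Set[int]] = {}  # node -> neighbors via attribute links

    def add_social_link(self, u: int, v: int) -> None:
        """Add a directed social link u -> v between two users."""
        if u < 0 or v < 0:
            raise ValueError("social links connect users (non-negative ids)")
        self.out_links.setdefault(u, set()).add(v)
        self.in_links.setdefault(v, set()).add(u)

    def add_attribute_link(self, user: int, attr: int) -> None:
        """Add an undirected link between a user and an attribute node."""
        if user < 0 or attr >= 0:
            raise ValueError("attribute links connect a user to a negative attribute id")
        # Store both directions because attribute links are undirected.
        self.attr_links.setdefault(user, set()).add(attr)
        self.attr_links.setdefault(attr, set()).add(user)

    def is_reciprocal(self, u: int, v: int) -> bool:
        """Return True if both u -> v and v -> u are present."""
        return v in self.out_links.get(u, set()) and u in self.out_links.get(v, set())


san = SocialAttributeNetwork()
san.add_social_link(0, 1)
san.add_social_link(1, 0)
san.add_social_link(2, 0)
san.add_attribute_link(0, -1)
print(san.is_reciprocal(0, 1))  # True
print(san.is_reciprocal(2, 0))  # False
print(sorted(san.attr_links[-1]))  # [0]
```

Storing out-links and in-links separately keeps both follower and followee lookups cheap, at the cost of holding each social link twice in memory.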

Snapshot | #Social nodes | #Social links | #Attribute nodes | #Attribute links | Crawled time | TimeID
---|---|---|---|---|---|---
snapshot 1 | 4,693,129 | 47,130,325 | 991,545 | 3,644,103 | Jul. 2011 | 0
snapshot 2 | 17,091,929 | 271,915,755 | 3,108,141 | 14,693,125 | Aug. 2011 | 1
snapshot 3 | 26,244,659 | 410,445,770 | 4,147,389 | 19,344,382 | Sep. 2011 | 2
snapshot 4 | 28,942,911 | 462,994,069 | 4,443,631 | 20,592,962 | Oct. 2011 | 3
Dataset Format
Directed social structure
UserIDFrom UserIDTo TimeID
Each line corresponds to a directed link. UserIDs are anonymized as integers starting from 0. TimeID is 0, 1, 2, or 3, indicating the snapshot in which this directed link first appears.
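A minimal reader for the directed social links might look like the sketch below. It assumes each line holds exactly three whitespace-separated integers in the order UserIDFrom, UserIDTo, TimeID; the function name `parse_social_links` is hypothetical, and the sample lines are made up for illustration rather than taken from the data.

```python
from typing import Iterable, Iterator, Tuple


def parse_social_links(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (user_from, user_to, time_id) for each directed social link.

    Assumes whitespace-separated integer fields, one link per line.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        user_from, user_to, time_id = (int(x) for x in fields)
        if user_from < 0 or user_to < 0:
            raise ValueError(f"UserIDs must be non-negative: {line!r}")
        if time_id not in (0, 1, 2, 3):
            raise ValueError(f"unexpected TimeID: {line!r}")
        yield user_from, user_to, time_id


# Usage with in-memory lines; for the real file, pass an open file handle instead.
sample = ["0 1 0", "1 0 2", "2 0 1"]
links = list(parse_social_links(sample))
print(links)  # [(0, 1, 0), (1, 0, 2), (2, 0, 1)]
```

Because links are directed, `0 1` and `1 0` are separate records; a reciprocal relationship shows up as two lines, possibly with different TimeIDs.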
Node attributes
UserID AttriID TimeID
Each line corresponds to an undirected attribute link between a user and an attribute. AttriIDs are anonymized as negative integers starting from -1. Again, TimeID is 0, 1, 2, or 3, indicating the snapshot in which this link first appears.
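The attribute links can be grouped per user with a sketch like the following, assuming the same whitespace-separated layout (UserID, AttriID, TimeID). The function name `load_user_attributes` is hypothetical, and the sample lines are invented; the TimeID field is read but ignored here, since this view only asks which attributes a user ever had.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_user_attributes(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each UserID to the set of AttriIDs it is linked to."""
    attrs: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # TimeID is parsed but not needed for this per-user view.
        user_id, attri_id, _time_id = (int(x) for x in fields)
        if attri_id >= 0:
            raise ValueError(f"AttriIDs should be negative: {line!r}")
        attrs[user_id].add(attri_id)
    return dict(attrs)


sample = ["0 -1 0", "0 -2 1", "5 -1 3"]
user_attrs = load_user_attributes(sample)
print(sorted(user_attrs[0]))  # [-2, -1]
```

The negative-ID check is a cheap sanity test that the columns were not swapped when reading the file.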
Attribute types
AttriID AttriType
Each line corresponds to an attribute. AttriType is one of employer, school, major, or places_lived.
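To look up the type of an attribute node, the attribute-type lines can be loaded into a dictionary. This sketch assumes two whitespace-separated fields per line and that the type strings are spelled exactly as listed above; `load_attribute_types` and `KNOWN_TYPES` are hypothetical names, and the sample lines are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed spellings, copied from the description of AttriType.
KNOWN_TYPES = {"employer", "school", "major", "places_lived"}


def load_attribute_types(lines: Iterable[str]) -> Dict[int, str]:
    """Map each AttriID to its AttriType."""
    types: Dict[int, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        attri_id, attri_type = line.split()
        if attri_type not in KNOWN_TYPES:
            raise ValueError(f"unknown AttriType: {line!r}")
        types[int(attri_id)] = attri_type
    return types


sample = ["-1 employer", "-2 school", "-3 school", "-4 places_lived"]
types = load_attribute_types(sample)
print(types[-2])  # school
print(Counter(types.values())["school"])  # 2
```

Combined with the per-user attribute sets, this lets you ask questions such as which users share an employer attribute.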
Reconstructing the t-th Snapshot
To obtain the t-th snapshot, keep all edges (social links and attribute links) whose TimeID is less than t, where t = 1, 2, 3, 4. For example, snapshot 2 consists of the links with TimeID 0 or 1.
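The filtering rule can be sketched as follows, assuming links have already been parsed into `(source, target, TimeID)` triples; the same triple shape works for social links and for attribute links, whose targets are negative AttriIDs. The function name `snapshot_edges` and the sample edges are hypothetical.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int, int]  # (source id, target id, TimeID)


def snapshot_edges(edges: Iterable[Edge], t: int) -> List[Edge]:
    """Return the edges present in the t-th snapshot (t = 1, 2, 3, 4)."""
    if t not in (1, 2, 3, 4):
        raise ValueError("t must be 1, 2, 3 or 4")
    # An edge belongs to snapshot t if it first appeared before TimeID t.
    return [e for e in edges if e[2] < t]


# Mixed sample: three social links and one attribute link (negative target).
edges = [(0, 1, 0), (1, 0, 2), (2, 0, 1), (3, -1, 3)]
print(snapshot_edges(edges, 1))  # [(0, 1, 0)]
print(snapshot_edges(edges, 3))  # [(0, 1, 0), (1, 0, 2), (2, 0, 1)]
```

Since each link is recorded only once, at its first appearance, snapshots are cumulative: every edge in snapshot t is also in snapshot t+1.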
Papers
- Neil Zhenqiang Gong and Wenchang Xu. "Reciprocal versus Parasocial Relationships in Online Social Networks". Springer Social Network Analysis and Mining (SNAM), 4(1), 2014.
- Neil Zhenqiang Gong, Wenchang Xu, Ling Huang, Prateek Mittal, Emil Stefanov, Vyas Sekar, and Dawn Song. "Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+". ACM/USENIX Internet Measurement Conference (IMC), 2012. Acceptance rate: 45/183=24.6%.
- Neil Zhenqiang Gong, Ameet Talwalkar, Lester Mackey, Ling Huang, Richard Shin, Emil Stefanov, Elaine Shi, and Dawn Song. "Jointly Predicting Links and Inferring Attributes using a Social-Attribute Network (SAN)". ACM Workshop on Social Network Mining and Analysis (SNA-KDD), co-located with KDD, 2012.
Downloading the Dataset
Click here to download.
Note: the raw dataset was originally crawled by Emil Stefanov, Richard Shin, and Elaine Shi, and then processed by Neil Gong.