metapath2vec: Scalable Representation Learning for Heterogeneous Networks


Paper Information

pdf | slides | poster | video

Download all data and code in one zip file, metapath2vec.zip (6.7 GB), which includes parts B, C, D, E, and F below (DropBox | Baidu Cloud).

Bibtex:
@inproceedings{dong2017metapath2vec,
title={metapath2vec: Scalable Representation Learning for Heterogeneous Networks},
author={Dong, Yuxiao and Chawla, Nitesh V and Swami, Ananthram},
booktitle={KDD '17},
pages={135--144},
year={2017},
organization={ACM}
}

Citation:
Yuxiao Dong, Nitesh V. Chawla, and Ananthram Swami. 2017. metapath2vec: Scalable Representation Learning for Heterogeneous Networks. In KDD'17. 135–144.


A. Raw Network Data

1. AMiner Computer Science (CS) Data: The CS dataset consists of 1,693,531 computer scientists and 3,194,405 papers from 3,883 computer science venues (both conferences and journals) held until 2016. We construct a heterogeneous collaboration network with three types of nodes: authors, papers, and venues.

Citation: Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. 2008. ArnetMiner: Extraction and Mining of Academic Social Networks. In KDD'08. 990–998.

2. Database and Information System (DBIS) Data: The DBIS dataset was constructed and used by Sun et al. It covers 464 venues, their top 5,000 authors, and the corresponding 72,902 publications. We also construct a heterogeneous collaboration network from DBIS, in which a link may connect two authors, one author and one paper, or one paper and one venue.

Citation: Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. 2011. PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks. In VLDB'11. 992–1003.
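
The network construction described in this section can be illustrated with a minimal sketch. It assumes a hypothetical record format of (paper id, author ids, venue id) and toy identifiers; it is not the format of the released files, and the function name `build_hetero_network` is invented for illustration. It creates the three node types and the three link types mentioned above (author-author, author-paper, paper-venue).

```python
from collections import defaultdict

def build_hetero_network(papers):
    """Build typed adjacency sets from (paper_id, author_ids, venue_id) records.

    The record layout is an assumption made for this sketch only.
    """
    node_type = {}
    neighbors = defaultdict(set)

    def add_edge(u, v):
        # Links are undirected, so store both directions.
        neighbors[u].add(v)
        neighbors[v].add(u)

    for paper_id, author_ids, venue_id in papers:
        node_type[paper_id] = "paper"
        node_type[venue_id] = "venue"
        add_edge(paper_id, venue_id)              # paper-venue link
        for author in author_ids:
            node_type[author] = "author"
            add_edge(author, paper_id)            # author-paper link
        for i, first in enumerate(author_ids):
            for second in author_ids[i + 1:]:
                add_edge(first, second)           # author-author (co-author) link
    return node_type, neighbors

# Toy input: two papers, three authors, two venues.
papers = [
    ("p1", ["a1", "a2"], "v1"),
    ("p2", ["a2", "a3"], "v2"),
]
node_type, neighbors = build_hetero_network(papers)
```

Keeping the node type in a separate dictionary lets later steps, such as a type-constrained random walk, filter neighbors by type without re-reading the raw data.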


B. Cleaned Network Data for Generating Paths

1. AMiner CS Collaboration Network (net_aminer.zip 146 MB)
2. DBIS Collaboration Network (net_dbis.zip 3 MB)

C. Generated Paths by Meta-Path Based Random Walkers

0. Python code for generating ACA meta-paths (py4genMetaPaths.py 4 KB)
1. AMiner CAC Meta-Paths (in_aminer.zip 4.5 GB)
2. DBIS CAC Meta-Paths (in_dbis.zip 315 MB)
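
As a rough illustration of what a meta-path based random walker does, the sketch below walks a toy venue-author graph and only steps to neighbors whose type matches the next type in a cyclic scheme. It is not the released py4genMetaPaths.py; the function name `metapath_walk`, the toy graph, and the direct venue-author adjacency (skipping paper nodes) are all assumptions made for this example.

```python
import random

def metapath_walk(start, neighbors, node_type, scheme, walk_length, rng):
    """Random walk constrained by a cyclic type scheme, e.g. venue -> author -> venue."""
    walk = [start]
    pos = scheme.index(node_type[start])  # where the start node sits in the scheme
    while len(walk) < walk_length:
        pos = (pos + 1) % len(scheme)
        # Only neighbors of the type required by the scheme are eligible.
        candidates = sorted(
            n for n in neighbors[walk[-1]] if node_type[n] == scheme[pos]
        )
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy graph: venues linked directly to the authors who published there.
node_type = {"v1": "venue", "v2": "venue",
             "a1": "author", "a2": "author", "a3": "author"}
neighbors = {
    "v1": ["a1", "a2"], "v2": ["a2", "a3"],
    "a1": ["v1"], "a2": ["v1", "v2"], "a3": ["v2"],
}
scheme = ["venue", "author"]
rng = random.Random(0)  # fixed seed so repeated runs give the same walk
walk = metapath_walk("v1", neighbors, node_type, scheme, 7, rng)
print(" ".join(walk))
```

Calling the function repeatedly for every start node yields a corpus of type-constrained paths. The `w1000` and `l100` tokens in the file names in section E suggest 1000 walks per node with walk length 100, though that reading is an inference from the names.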

D. Code---metapath2vec & metapath2vec++ (code_metapath2vec.zip 40 KB)


E. Latent Vector Representations Learned by metapath2vec & metapath2vec++

1. AMiner CS Node Representations (out_aminer.zip 1.7 GB)
metapath2vec++: m2vpp.aminer2017.w1000.l100.txt.size128.window7.negative5.txt
metapath2vec: m2v.aminer2017.w1000.l100.txt.size128.window7.negative5.txt

2. DBIS Node Representations (out_dbis.zip 61 MB)
metapath2vec++: m2vpp.dbis.w1000.l100.txt.size128.window7.negative5.txt
metapath2vec: m2v.dbis.w1000.l100.txt.size128.window7.negative5.txt
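
The file names above appear to encode the training settings. The sketch below splits a name on dots and extracts the letter-plus-number tokens. The interpretation of each token (`w` as walks per node, `l` as walk length, `size` as embedding dimension, `window` as context window, `negative` as number of negative samples) is an assumption based on the names, and `parse_embedding_filename` is a hypothetical helper, not part of the released code.

```python
import re

def parse_embedding_filename(name):
    """Extract model, dataset, and numeric settings from an embedding file name."""
    tokens = name.split(".")
    info = {"model": tokens[0], "dataset": tokens[1]}
    for token in tokens[2:]:
        # Keep tokens of the form <letters><digits>, e.g. 'size128'; skip 'txt'.
        match = re.fullmatch(r"([a-z]+)(\d+)", token)
        if match:
            info[match.group(1)] = int(match.group(2))
    return info

settings = parse_embedding_filename(
    "m2vpp.aminer2017.w1000.l100.txt.size128.window7.negative5.txt"
)
print(settings)
```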


F. Ground Truth Labeled by Google Scholar Metrics 2016 for Multi-Label Node Classification and Clustering

GS-Labeled results for AMiner Data (label.zip 2 MB)
C1: 8-area 133 venues label file
C2: 8-area 246,678 authors label file

1. Computing Systems: 1.1 IEEE Trans. Parallel Distrib. Syst.; 1.2 NSDI; 1.3 Future Generation Comp. Syst.; 1.4 ISCA; 1.5 ASPLOS; 1.6 SC; 1.7 CLOUD; 1.8 HPCA; 1.9 FAST; 1.10 MICRO; 1.11 IPDPS; 1.12 SIGMETRICS Performance Evaluation Review; 1.13 EuroSys; 1.14 SoCC; 1.15 IEEE Trans. Services Computing; 1.16 ICDCS; 1.17 USENIX Annual Technical Conference; 1.18 J. Parallel Distrib. Comput.; 1.19 CCGRID

2. Theoretical Computer Science: 2.1 STOC; 2.2 FOCS; 2.3 SODA; 2.4 SIAM J. Comput.; 2.5 J. Comput. Syst. Sci.; 2.6 Theor. Comput. Sci.; 2.7 ICALP; 2.8 Algorithmica; 2.9 Logical Methods in Computer Science; 2.10 J. Autom. Reasoning; 2.11 SPAA; 2.12 Random Struct. Algorithms; 2.13 ACM Trans. Algorithms; 2.14 Theory of Computing; 2.15 STACS

3. Computer Networks & Wireless Communication: 3.1 IEEE Communications Magazine; 3.2 IEEE Communications Surveys and Tutorials; 3.3 IEEE Trans. Wireless Communications; 3.4 INFOCOM; 3.5 IEEE Journal on Selected Areas in Communications; 3.6 IEEE Trans. Vehicular Technology; 3.7 SIGCOMM; 3.8 IEEE Trans. Mob. Comput.; 3.9 IEEE Trans. Communications; 3.10 IEEE/ACM Trans. Netw.; 3.11 IEEE Wireless Commun.; 3.12 J. Network and Computer Applications; 3.13 Computer Networks; 3.14 IEEE Communications Letters; 3.15 Computer Communications; 3.16 ICC; 3.17 Internet Measurement Conference; 3.18 GLOBECOM; 3.19 MobiCom

4. Computer Graphics: 4.1 ACM Trans. Graph.; 4.2 IEEE Trans. Vis. Comput. Graph.; 4.3 Comput. Graph. Forum; 4.4 The Visual Computer; 4.5 VAST; 4.6 PacificVis; 4.7 IEEE Computer Graphics and Applications; 4.8 SIGGRAPH; 4.9 SI3D; 4.10 Computer Aided Geometric Design; 4.11 Web3D; 4.12 Graphical Models; 4.13 Eurographics; 4.14 Graphics Interface; 4.15 LDAV; 4.16 GRAPP/IVAPP; 4.17 Journal of Visualization and Computer Animation; 4.18 VRST

5. Human Computer Interaction: 5.1 CHI; 5.2 CSCW; 5.3 UIST; 5.4 UbiComp; 5.5 IEEE Trans. Affective Computing; 5.6 HRI; 5.7 Int. J. Hum.-Comput. Stud.; 5.8 MobileHCI; 5.9 ACM Trans. Comput.-Hum. Interact.; 5.10 Interacting with Computers; 5.11 ICMI; 5.12 ISMAR; 5.13 Int. J. Hum. Comput. Interaction; 5.14 IUI; 5.15 INTERACT; 5.16 Tangible and Embedded Interaction; 5.17 IEEE Trans. Haptics

6. Computational Linguistics: 6.1 ACL; 6.2 EMNLP; 6.3 HLT-NAACL; 6.4 LREC; 6.5 Computational Linguistics; 6.6 EACL; 6.7 COLING; 6.8 Language Resources and Evaluation; 6.9 IJCNLP; 6.10 CoNLL; 6.11 TACL; 6.13 WMT; 6.14 SLT; 6.15 CICLing; 6.16 ICSC; 6.17 RANLP; 6.18 TAC; 6.19 Natural Language Engineering

7. Computer Vision & Pattern Recognition: 7.1 CVPR; 7.2 IEEE Trans. Pattern Anal. Mach. Intell.; 7.3 ICCV; 7.4 IEEE Trans. Image Processing; 7.5 ECCV; 7.6 Pattern Recognition; 7.7 International Journal of Computer Vision; 7.8 Pattern Recognition Letters; 7.9 Computer Vision and Image Understanding; 7.10 Image Vision Comput.; 7.11 ICIP; 7.12 CVPRWorkshops; 7.13 ICCVWorkshops

8. Databases & Information Systems: 8.1 WWW; 8.2 VLDB; 8.3 IEEE Trans. Knowl. Data Eng.; 8.4 SIGMOD Conference; 8.5 ICWSM; 8.6 WSDM; 8.7 ICDE; 8.8 SIGIR; 8.9 CIKM; 8.10 Knowl. Inf. Syst.; 8.11 ACM TIST; 8.12 RecSys; 8.13 VLDBJ.; 8.14 PVLDB
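
For readers who want the area listing above in machine-readable form, the following sketch parses a line of the form "N. Area name: N.1 Venue; N.2 Venue; ..." into a venue-to-area mapping. It works on the listing as printed on this page, not on the files in label.zip, whose format is not described here. The helper name `parse_area_line` is hypothetical, and whether the venue strings match those in the released network files is not verified.

```python
import re

def parse_area_line(line):
    """Parse 'N. Area name: N.1 Venue; N.2 Venue; ...' into (id, name, venues)."""
    header, _, body = line.partition(": ")
    area_id_text, _, area_name = header.partition(". ")
    venues = []
    for entry in body.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Drop the leading 'N.M ' item number and keep the venue name as written.
        match = re.match(r"^\d+\.\d+\s+(.*)$", entry)
        venues.append(match.group(1) if match else entry)
    return int(area_id_text), area_name, venues

# Shortened sample lines in the same shape as the listing above.
lines = [
    "2. Theoretical Computer Science: 2.1 STOC; 2.2 FOCS; 2.3 SODA",
    "8. Databases & Information Systems: 8.1 WWW; 8.2 VLDB; 8.13 VLDBJ.",
]
venue_to_area = {}
for line in lines:
    area_id, area_name, venues = parse_area_line(line)
    for venue in venues:
        venue_to_area[venue] = area_id
```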