Yuxiao Dong
Assistant Professor
Department of Computer Science
Tsinghua University, Beijing


Publications | Students | Tutorials | Talks | Services



Yuxiao is an Assistant Professor of Computer Science at Tsinghua University, where he is a member of the Knowledge Engineering Group (@ThuKEG). He received his Ph.D. in Computer Science from the University of Notre Dame. Before joining Tsinghua, he was a researcher at Meta AI and Microsoft Research Redmond.

His research focuses on data mining, graph representation learning, social & information networks, and foundation models. His recent research with collaborators includes the Heterogeneous Graph Transformer (HGT), network embedding (NetMF, NetSMF, ProNE, SketchNE), graph pre-training (GraphMAE, GraphMAE2, kgTransformer, GPT-GNN, GCC), and language pre-training (GLM-130B, WebGLM, ChatGLM, CodeGeeX) algorithms, some of which have been deployed in billion-scale applications at AMiner, Facebook, and Microsoft and nominated for best paper awards at WWW'22, WWW'19, and WSDM'15. He was selected as one of the IJCAI'22 Early Career Spotlights and received the 2017 ACM SIGKDD Doctoral Dissertation Award Honorable Mention and the 2022 ACM SIGKDD Rising Star Award.

More information about his experience and research can be found on LinkedIn and Google Scholar.

He is looking for self-motivated students to work on graph representation learning, graph neural networks, social networks, and LLM / foundation models.

  1. 2023.03: @ThuKEG & partners open-sourced ChatGLM-6B, a model pre-trained on 1T tokens with SFT & RLHF. github huggingface (trending #1 between Mar. 18--30)
  2. 2023.02: @ThuKEG & partners started an invite-only alpha test of ChatGLM (chatglm.cn). blog (Chinese)
  3. 2022.11: Invited Talks/Tutorials on Graph Representation Learning and Pre-Training at Renmin U., Beijing Jiaotong U., Tsinghua ML Course, etc. Thanks to all collaborators! slides
  4. 2022.08: @ThuKEG releases GLM-130B---an open bilingual pre-trained model with 130 billion parameters. Kudos to the team! ICLR'23 paper blog github & model download

Publications

  • all : conference | journal | pre-print | products & best & top*
  • graph representation learning : graph neural nets | heterogeneous & knowledge graphs | pre-training & self-supervised | network embedding | theory | scalability | data & benchmarks
  • social & information networks : user modeling & profiling | link prediction | recommendation | science of science
  • foundation models : pre-trained LLMs | graph pre-training
    1. ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation.
      Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong.
      arXiv:2304.05977, 2023. pdf code
    2. WebGLM: Towards An Efficient Web-enhanced Question Answering System with Human Preference.
      Xiao Liu, Hanyu Lai, Yu Hao, Yifan Xu, Aohan Zeng, Zhengxiao Du, Peng Zhang, Yuxiao Dong, Jie Tang.
      KDD'23 (Proc. of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2023.
    3. CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X.
      Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang.
      KDD'23 (Proc. of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2023.
      pdf code&model blog VS Code JetBrains
    4. BatchSampler: Sampling Mini-Batches for Contrastive Learning in Vision, Language, and Graphs.
      Zhen Yang, Tinglin Huang, Ming Ding, Yuxiao Dong, Zhitao Ying, Yukuo Cen, Yangliao Geng, Jie Tang.
      KDD'23 (Proc. of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2023.
    5. WinGNN: Dynamic Graph Neural Networks with Random Gradient Aggregation Window.
      Yifan Zhu, Cong Fangpeng, Dan Zhang, Wenwen Gong, Qika Lin, Wenzheng Feng, Yuxiao Dong, Jie Tang.
      KDD'23 (Proc. of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2023.
    6. GLM-130B: An Open Bilingual Pre-trained Model.
      Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Zhiyuan Liu, Peng Zhang, Yuxiao Dong, Jie Tang.
      ICLR'23 (Proc. of the 11th International Conference on Learning Representations), 2023.
      pdf code&model ChatGLM-6B ChatGLM
    7. SketchNE: Embedding Billion-Scale Networks Accurately in One Hour.
      Yuyang Xie, Yuxiao Dong, Jiezhong Qiu, Wenjian Yu, Xu Feng, Jie Tang.
      TKDE'23 (IEEE Transactions on Knowledge and Data Engineering), 2023.
      pdf code
    8. GraphMAE2: A Decoding-enhanced Masked Self-supervised Graph Learner.
      Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, Jie Tang.
      WWW'23 (Proceedings of The Web Conference 2023), 2023.
      pdf code&data
    9. GraphMAE: Self-Supervised Masked Graph Autoencoders.
      Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, Jie Tang.
      KDD'22 (Proc. of the 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2022. Full Research Paper.
      pdf code&data
    10. Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries.
      Xiao Liu, Shiyu Zhao, Kai Su, Yukuo Cen, Jiezhong Qiu, Mengdi Zhang, Wei Wu, Yuxiao Dong, Jie Tang.
      KDD'22 (Proc. of the 28th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2022. Full Research Paper.
      pdf code&data
    11. GRAND+: Scalable Graph Random Neural Networks.
      Wenzheng Feng, Yuxiao Dong, Huang Tinglin, Ziqi Yin, Xu Cheng, Evgeny Kharlamov, Jie Tang.
      WWW'22 (Proceedings of The Web Conference 2022), 2022.
      pdf code
    12. SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs.
      Xiao Liu, Haoyun Hong, Xinghao Wang, Zeyi Chen, Evgeny Kharlamov, Yuxiao Dong, Jie Tang.
      WWW'22 (Proceedings of The Web Conference 2022), 2022.
      pdf code best paper candidate
    13. ClusterSCL: Cluster-Aware Supervised Contrastive Learning on Graphs.
      Yanling Wang, Jing Zhang, Haoyang Li, Yuxiao Dong, Hongzhi Yin, Cuiping Li, Hong Chen.
      WWW'22 (Proceedings of The Web Conference 2022), 2022.
      pdf code
    14. Adaptive Diffusion in Graph Neural Networks.
      Jialin Zhao, Yuxiao Dong, Ming Ding, Evgeny Kharlamov, Jie Tang.
      NeurIPS'21 (Proc. of the 35th Annual Conference on Neural Information Processing Systems), 2021.
      pdf code
    15. OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs.
      Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, Jure Leskovec.
      NeurIPS'21 D&B (Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks), 2021.
      pdf OGB-LSC@KDDCUP 2021 OGB
    16. Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning.
      Qinkai Zheng, Xu Zou, Yuxiao Dong, Yukuo Cen, Da Yin, Jiarong Xu, Yang Yang, Jie Tang.
      NeurIPS'21 D&B (Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks), 2021.
      pdf GRB leaderboard
    17. A Large-Scale Database for Graph Representation Learning.
      Scott Freitas, Yuxiao Dong, Joshua Neil, Duen Horng Chau.
      NeurIPS'21 D&B (Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks), 2021.
      pdf MalNet data
    18. TDGIA: Effective Injection Attacks on Graph Neural Networks
      Xu Zou, Qinkai Zheng, Yuxiao Dong, Xinyu Guan, Evgeny Kharlamov, Jialiang Lu, Jie Tang.
      KDD'21 (Proc. of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2021. Full Research Paper.
      pdf code
    19. Are We Really Making Much Progress? Revisiting, Benchmarking and Refining the Heterogeneous Graph Neural Networks
      Qingsong Lv, Ming Ding, Qiang Liu, Yuxiang Chen, Wenzheng Feng, Siming He, Chang Zhou, Jianguo Jiang, Yuxiao Dong, Jie Tang.
      KDD'21 (Proc. of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2021. Full Research Paper.
      pdf code&data HGB leaderboard
    20. GPT-GNN: Generative Pre-Training of Graph Neural Networks
      Ziniu Hu, Yuxiao Dong, Kuansan Wang, Kai-Wei Chang, Yizhou Sun
      KDD'20 (Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2020. Full Research Paper.
      pdf code bibtex in fb products! 6th most cited paper in KDD'20
    21. GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
      Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, Jie Tang
      KDD'20 (Proc. of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2020. Full Research Paper.
      pdf code bibtex most cited paper in KDD'20
    22. Open Graph Benchmark: Datasets for Machine Learning on Graphs
      Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec
      NeurIPS'20 (Proc. of the 34th Annual Conference on Neural Information Processing Systems), 2020. Spotlight Paper (385/9454).
      pdf OGB leaderboard
    23. Graph Random Neural Networks for Semi-Supervised Learning on Graphs
      Wenzheng Feng=, Jie Zhang=, Yuxiao Dong, Yu Han, Huanbo Luan, Qian Xu, Qiang Yang, Evgeny Kharlamov, Jie Tang
      NeurIPS'20 (Proc. of the 34th Annual Conference on Neural Information Processing Systems), 2020. Oral Paper (105/9454).
      pdf slides poster code bibtex
    24. Heterogeneous Graph Transformer
      Ziniu Hu, Yuxiao Dong, Kuansan Wang, Yizhou Sun.
      WWW'20 (Proc. of the 2020 Web Conference), short paper, oral.
      pdf data&code (pyG) code (DGL) bibtex in pyG in dgl most cited in WWW'20 in msft & fb products!
    25. Heterogeneous Network Representation Learning
      Yuxiao Dong, Ziniu Hu, Kuansan Wang, Yizhou Sun, Jie Tang.
      IJCAI'20 (Proc. of the 29th International Joint Conference on Artificial Intelligence), 2020.
      pdf slides poster OGB ogbn-mag leaderboard OGB-LSC mag240m leaderboard HGB leaderboard bibtex
    26. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization
      Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang and Jie Tang
      WWW'19 (Proc. of the 2019 Web Conference), 2019. Full paper (Oral).
      pdf code slides poster bibtex best paper candidate in msft products!
    27. ProNE: Fast and Scalable Network Representation Learning
      Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, Ming Ding.
      IJCAI'19 (Proc. of the 28th International Joint Conference on Artificial Intelligence), 2019. Full paper (Oral).
      pdf code bibtex
    28. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs
      Fanjin Zhang, Xiao Liu, Jie Tang, Yuxiao Dong, Peiran Yao, Jie Zhang, Xiaotao Gu, Yan Wang, Bin Shao, Rui Li, Kuansan Wang
      KDD'19 (Proc. of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2019. Full Applied Data Science Paper (Oral), 6.4%.
      pdf code data bibtex
    29. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec.
      Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang.
      WSDM'18 (Proc. of the 11th ACM International Conference on Web Search and Data Mining), 2018. Full paper (Oral), 16%.
      pdf code slides bibtex Microsoft Research Blog 2nd most cited paper in WSDM'18
    30. metapath2vec: Scalable Representation Learning for Heterogeneous Networks.
      Yuxiao Dong, Nitesh V. Chawla, Ananthram Swami.
      KDD'17 (Proc. of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), 2017. Full Research Paper (Oral), 8.5%.
      pdf data&code slides poster video bibtex in dgl most cited paper in KDD'17 in msft products!

    Students & Interns

    I'm very lucky to have had the opportunity to work with these brilliant students and interns (ordered by year & last name).

    1. 2020, Scott Freitas, 3rd year Ph.D. student at Georgia Tech, summer intern@MSR (co-mentor)
    2. 2020, Ziniu Hu, 2nd year Ph.D. student at UCLA, summer intern@MSR
    3. 2020, Namyong Park, 3rd year Ph.D. student at CMU, summer intern@MSR
    4. 2020, Yu Zhang, 3rd year Ph.D. student at UIUC, summer intern@MSR (co-mentor)
    5. 2019, Ziniu Hu, 1st year Ph.D. student at UCLA, summer intern@MSR
    6. 2019, Jialin Zhao, 1st year Master's student at UW&Tsinghua GIX, visiting student@MSR
    7. 2018, Yian Yin, 2nd year Ph.D. student at Northwestern, visiting student@MSR
    8. 2017, Jiezhong Qiu, 1st year Ph.D. student at Tsinghua, summer intern@MSR (with Dr. Hao Ma)

    Tutorials & Guest Lectures

    1. 2020, Guest Lecture, Knowledge Graph, Computer Science Department, Stanford University
    2. 2020, Guest Lecture, Advanced Machine Learning, Computer Science Department, Tsinghua University
    3. 2020, Tutorial (Invited), Graph Representation Learning, ECML/PKDD'20 slides
    4. 2019, Tutorial, Representation Learning on Networks (full day), WWW'19 slides
    5. 2019, Tutorial, Learning from Networks (full day), KDD'19 slides
    6. 2018, Tutorial, Computational Models for Social and Information Network Analysis, KDD'18 slides

    Invited Talks

    1. 2022: Invited Talk at IJCAI Early Career Spotlight
    2. 2021: Invited Talk at Tsinghua University, Computer Science Department
    3. 2020: Invited Talk at Beijing AI Conference Knowledge & Intelligence Forum slides-b
    4. 2020: Invited Talk at CCF Young Elite Forum slides-b
    5. 2019: Invited Talk at Tsinghua University, Computer Science Department
    6. 2019: Invited Talk at INFORMS'19 (INFORMS Annual Meeting) slides-a
    7. 2019: Microsoft Security and Compliance AI Summit
    8. 2019: AI and Tensor Conference at Los Alamos National Lab slides-a
    9. 2019: Invited Talk at NetSci'19 Satellite on Quantifying Success
    10. 2019: Invited Talk at NetSci'19 Satellite on Network Representation Learning slides-a
    11. 2018: Invited Talk at NetSci'18 Higher-Order Models in Network Science Satellite (HONS'18)
    12. 2018: Invited Talk at NICO, Northwestern University, IL
    13. 2017: Invited Talk at Labs in Tsinghua University
    14. 2016: Keynote at ACM JCDL'16 Workshop on Mining Scientific Publications (WOSP'16)
    15. 2016: Invited Talks at Labs in Stanford University, Tsinghua University, & Chinese Academy of Sciences
    16. 2015: Invited Talks at Labs in Oxford University, & Hesburgh Library at University of Notre Dame

    Professional Services

    Conference Organizers:

    1. Track Co-Chair of WWW'23 Social Network Analysis and Graph Algorithms Track
    2. Program Co-Chair of ECML/PKDD'21 Applied Data Science Track
    3. Program Co-Chair of ACM/IEEE ASONAM'21 Industry Track
    4. Program Co-Chair of ECML/PKDD'20 Applied Data Science Track
    5. Program Co-Chair of National Conference on Social Media Processing (SMP'20)
    6. Workshop Co-Chair of SIAM SDM'20
    7. Deep Learning Day Co-Chair of ACM KDD'20
    8. Deep Learning Day Co-Chair of ACM KDD'19
    9. Deep Learning Day Co-Chair of ACM KDD'18

    Journal Editors:

    1. Associate Editor of IEEE Transactions on Big Data (TBD), 2020--
    2. Associate Editor of Springer Social Network Analysis and Mining (SNAM), 2021--
    3. Associate Editor of AI OPEN, 2020--
    4. Guest Editor of Special Issue "AI for COVID-19" at IEEE TBD, 2020

    Conference PC members:

    1. 2023: KDD (Senior PC), WWW (Track Co-Chair), AAAI (Senior PC), ECML/PKDD (Area Chair)
    2. 2022: KDD (Senior PC), WWW (Senior PC), AAAI (Senior PC), ECML/PKDD (Area Chair)
    3. 2021: KDD, NeurIPS, AAAI (Senior PC), ECML/PKDD (PC Co-Chair ADS), ASONAM (Industry Track Co-Chair)
    4. 2020: KDD, NeurIPS, WSDM, WWW, SDM, ECML/PKDD (PC Co-Chair ADS), SMP (PC Co-Chair)
    5. 2019: KDD, WSDM, WWW, SDM, ICDM, CIKM, AAAI
    6. 2018: KDD, WSDM, WWW, SDM, ICDM, ECML/PKDD, DSAA
    7. 2017: KDD, WSDM, WWW, SDM, ASONAM, CIKM
    8. 2016: ASONAM, CIKM
    9. 2015: ASONAM

    Competition Organizers:

    1. NeurIPS Competition OGB-LSC 2022: A Large-Scale Challenge for ML on Graphs at NeurIPS'22
    2. KDD CUP OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs at KDD'21

    Journal Reviewers:

    1. Nature Machine Intelligence
    2. Nature Human Behaviour
    3. Nature Scientific Reports
    4. JMLR, Journal of Machine Learning Research
    5. CSUR, ACM Computing Surveys
    6. TKDD, ACM Transactions on Knowledge Discovery from Data
    7. TWEB, ACM Transactions on the Web
    8. TKDE, IEEE Transactions on Knowledge and Data Engineering
    9. TMC, IEEE Transactions on Mobile Computing
    10. TBD, IEEE Transactions on Big Data

    Contact Info

    1-309, FIT Building
    Tsinghua University
    Beijing 100084, China

    yuxiaod@tsinghua.edu.cn


    *All stats are as observed on Google Scholar in Jun. 2022