Personal Social Network

- Toward Building, Search, and Mining Social Networks

(Slides at ASWC’06)

Online Prototype System (soon be publicly available)

Jie Tang, Mingcai Hong, Jing Zhang, Bangyong Liang, Limin Yao, and Juanzi Li

Tsinghua University

Department of Computer Science and Technology

10-201, East Main Building, Tsinghua University, Beijing, China, 100084

 

Abstract

In this project, we are aimed at how to automatically building a personal social network using the existing Web, how to search for information (including persons, publications, and relationships) in the constructed social network, and how to mine the social network. We propose an approach for building personal social network search based on information extraction. We category the search tasks in the personal social network as person search, publication search, and category based search. For mining the network, we currently focus on expert finding, research interest finding, and people association finding.

We have developed a system called Personal Social Network (PSN). The system has been in operation on an intranet for several months. Information of more than 450,000 persons has been gathered. Here we describe the architecture and main features of the system.

Outline

·    Feature List

·    Introduction

·    Data Sets [download]

·    Related Work

·    References

 

1 Feature List

- Personal Social Network Building

* Information Collection. Collecting useful information (e.g. documents, emails)

* Personal Information Extraction. Including: Person Profiling (Portrait, Affiliation, Homepage, Position) and Person Contact Extraction (Address, Email, Telephone, Fax)

* Integration. Integrating information from different sources (e.g. DBLP, Citeseer)

 

- Personal Social Network Search

* Person Search

* Publication Search

* Category based Search

 

- Personal Social Network Mining

* Expert Finding

    Survey

* Research Interest Finding

* People Association Finding

 

2   Introduction

With the growth of the Internet and the World Wide Web, social networks have grown quickly in number and scope. Numerous studies have shown that one of the most effective channels for disseminating or obtaining information is its informal network of collaboration, colleagues, and friends [Kautz, 1997; Golbeck, 2005]. The use of social networks is widespread in employers’ recruiting and in workers’ job-seeking, pursuit of hobbies, forming romantic relationships, local problem-solving (for example, fixing a piece of equipment) and building collaboration within or between organizations.

Personal social network is an important research area in social networks as well as Semantic Web. Many research efforts have been made so far. A person can have different types of information: personal profile (including portrait, homepage, position, affiliation, publications, and documents), contact information (including address, email, telephone, and fax number), and social network information (including personal or professional relationships between persons, e.g. friend relationship). However, the information is usually hidden in heterogeneous and distributed web pages. For example, the personal profile information may be found in one’s homepages and the contact information may be found in other pages.

In this project, we have developed a system, which is called ‘Personal Social Network’ (PSN shortly). In the rest of this section, we will introduce the main features of the system in detail.

 

2.1   Personal Social Network Building

In this project, we try to address personal network search in a novel approach. We define an ontology describing the personal information (called Person Ontology) by extending the FOAF ontology. In the ontology, we define concepts such as “Person”, “Publication”, etc. For each concept, we define datatype properties, for example, for concept “Person”, an example property can be “portrait”. We define relationships between concepts such as “hasPublication”, “hasAuthor”, “hasFriend”, etc. The key point then is how to extract the personal information from the web and how to use the extracted information to construct the personal network according to the ontology. Our proposal is to take a strategy of ‘collecting-extracting-fusing’. We first collect documents ‘related’ to a person. By ‘related’, we mean documents that really describe the person rather than the keyword-based ‘relevant’ documents in the conventional search. Next, we employ information extraction technologies to extract the personal information from the collected documents [Tang, 2005]. After that, we integrate the extracted information from the documents of different sources. We use the extracted personal information and the publication information to create instances of the concepts “Person” and “Publication”. The created instances are stored in a local ontology base. In this way, we are able to construct a personal network automatically.

Figure 1 shows the processing flow of building the network. We utilize Google API to get a list of relevant documents. Then a classification model is employed to identify whether or not a document in the list is really ‘related’ to the person. Typical related documents can be persons’ homepages, publication pages from DBLP, and persons’ introduction documents. Next, we extract personal information from the identified documents. We employ two classification models to detect the start position and the end position for each type of the personal information. And then we view the text between the start position and the end position as the target. Both token and text-line can be considered as the basic unit in the detections of our approach. As models, we use SVMs (Support Vector Machines) [Vapnik, 1998]. Features are defined in the SVM models respectively for each type of the information. In identification of a personal portrait, we use the surrounding text, image content information (e.g. whether or not one image contains a person face by face recognition), and image URL address as features. We also extract person’s publications from DBLP. After that, we integrate together the personal profile, contact information, and publication information. We view coauthors in publications as the friends and finally construct the personal network.

 

Figure 1. Processing flow of Personal Network building

2.2   Personal Social Network Search

In PSN, we provide three types of searches: person search, publication search, and category based search.

For person search, the user inputs a person name, and the system returns the information of the person. There are two sub-types of searches: offline search and online search. Given a person name, we first search in the constructed personal network. If the person can be found, the information of the person stored in the local ontology base will be displayed. (We call this type of search as offline search.) Otherwise, we perform the online search, that is extracting the personal information in real time (In the extraction process, we utilize the method same as that in the process of personal social network building). The online process may last from 1 to 10 seconds depending on the network speed. Here we focus on how to accurately extract the information from web pages. The system also supports searching with constraints, for example, the user can input a query like “Jie Tang, uni: Tsinghua”, with which the system searches for persons with the name “Jie Tang” and with its property “affiliation” containing “Tsinghua”.

For publication search, the user inputs keywords related to the publication, and the system returns publications with the most ‘relevant’ publications on the top. (We use the conventional IR model to do the publication search.) The system also tries to find the publications from the web. We have developed a module that can convert the publications of any kinds of formats (e.g. PDF, WORD, HTML, etc.) into a unified format. In this way, we are able to generate snapshots of a publication so that the user can view the publication online.

For category based search, we created a category containing 40 research fields. We classify a person to five research fields according to his research interests (personal information and publications). Then the user can select the research field to view the research people classified to it.

 

2.3   Personal Social Network Mining

We currently focus on expert finding, people association finding, and research interest finding.

In expert finding, the user inputs keywords or phrase (called topic), and the system tries to find who knows about the topic. More accurately, the system tries to find who are experts on the topic. We propose a graph-propagation based approach by making use of publications and documents that a person involved to calculate his ‘expertise degree’ on a topic. One of the basic ideas for the feature is that if a person has authored many documents on a topic, then it is very likely that he is an expert on the topic, or if the person’s name co-occurs in many times with the topic, then it is likely that he is an expert on the topic. Another idea here is that if a person knows many experts on a topic, then it is very likely that he/she is also an expert on the topic, or if the person’s name co-occurs in many times with another expert, then it is likely that he/she is also an expert on the topic.

In association search, we compute the closeness between two persons and use the network to find possible associations between two persons. The associations between two persons may be direct (e.g. the two persons coauthor a document) or indirect (e.g. one person coauthors a document with a friend of the other person). We formalize association search as a problem of near-shortest path finding. We propose an efficient algorithm to find near-shortest associations from a source to a target person. The algorithm consists of two stages. In the first stage, an enhanced heap-based Dijkstra algorithm is employed to find the shortest associations from all the other persons to the target person (also the shortest association Lmin from the source to the target person). In the second stage, a depth-first search processing is utilized to quickly find all associations that are no longer than (1+β)Lmin, where β is a user-defined threshold.

In research interest finding, we employ traditional information retrieval model to compute the probabilistic between a person and a pre-defined research topic. Then we select the high scored topics as the research interest of the person.

 

Figure 2 shows the snapshots of the PSN system. In figure 2 (a), the user types a person name, and he gets a detailed description of the person. Figure 2 (b) shows the list of gathered persons in our current system.

 

 

(a)

(b)

Figure 2. Personal Network Search system

Figure 3 shows the snapshots of expert finding and publication online-view.

 

Figure 3. Snapshot of expert search and publication search

The system will be publicly available soon and the extracted data will be freely available for research use.

 

3 Data Sets

Data sets used for expert finding and association search can be found here.

 

4 Related Work

4.1   Social Network

Many web-based social networks have been built, for example, tickle.com, friendster.com, myspace.com, etc. As well much research work has been done on social networks, in particular on semantic web-based social networks.

For instance, Golbeck et al. introduce the relationship of trust into social networks [Golbeck, 2005]. They argue that when trust is assigned on a quantitative scale, we can make computations with trust values in the network. And given a specific user in the social network, we can see the average value about the person’s trustworthiness.

The Friend of a Friend (FOAF) project is creating a Web of machine-readable pages describing people and the relationships between them and the things they create and do [Brickley, 2004]. It has the potential to be an open interchange format used by many different social networking applications. See also FOAFRealm [Kruk, 2005], FOAFnaut [Foafnaut, 2006], Flink [Mika, 2005], and MindSwap [Kalyanpur, 2006]. See [Domingos, 2005] and [Matsuo, 2003] for social network mining.

4.2   People Search

People search can be classified into several sub tasks, including: expert finding and expert profiling.

Expert Finding

Expert finding addresses the task of finding the right person with the appropriate skills and knowledge: “Who are the experts on topic X?”.

For example, Campbell et al. [Campbell, 2003] analyzed the link structure defined by authors and receivers of emails using a modified version of the Hyperlink-Induced Topic Search (HITS) algorithm to identify authorities. They showed that improvements over a content-based approach were possible given two organizations, but the number of candidates used was limited, only fifteen and nine. See also [Schwartz, 1993; Foner, 1997; Kautz, 1997; Mattox, 1999; Yimam-Seid, 2003].

From 2005, the Text REtrieval Conference (TREC) has provided a common platform with the Enterprise Search Track for researchers to empirically assess methods and techniques devised for expert finding. Much work has been done, for example, [Fu, 2005; Cao, 2005]. See [Craswell, 2005] for an overview.

For detail, we can see from here.

 

Expert Profiling

The task is to turn the expert finding task around: to profile a person is to discover, collect, and produce information of skills and knowledge of that person: “What does a person Y know?”

For example, [Culotta, 2004] presents an end-to-end system that extracts a user’s social network and its members’ contact information given the user’s email inbox. The system identifies unique people in emails, finds their Web presence, and automatically fills the fields of a contact address book using Conditional Random Fields [Lafferty, 2001].

Tang et al. propose an approach for automatically extracting the personal information from the web [Tang, 2006]. See also [Mika, 2005; Filatova; 2005; Balog, 2006].

4.3   Association Search

In association search, we are interested in the relationships between entities. For searching for people, we are aimed at finding the relationships between persons.

People Association Finding

Many uses of recommender system require finding associations from people to people, not just providing “oracular” advice. Recently, a few methods have been proposed to solve the problem. For example, Kautz et al. have developed a recommender system call ReferralWeb [Kautz, 1997]. The ReferralWeb system lets ones search and explore social networks - the networks of friends, colleagues, and co-workers - that exist on the WWW. When a user wants to search for information on a topic, the system first identifies experts on the topic and then tries to find associations (called referred chain) from the searcher to an expert. They tested the system on a 1,000 node network and a 10,000 researcher network. They focus on expert recommendation rather than efficiently finding and ranking the associations.

Balog and Rijke propose to formulate three sub tasks in the area of people association finding (also called relationship finding): connection finding, collaboration finding, and reputation analysis [Balog, 2006]. They focus on finding direct associations between persons from the average documents on the Web using information retrieval models. They propose to model association finding as a mixture of a language modeling approach to snippet retrieval and a noisy channel approach to determine the likelihood that a retrieved snippet “generates” a relationship between two persons.

Adamic and Adar have investigated the problem of association search in email networks [Adamic, 2005]. They tested how the three different properties (i.e. degree, position in the organizational hierarchy, and physical location) of the individuals in the social network can be used to improve the performance of association search. They simulate an experiments based on a small network composed of 430 individuals with an average number of 10 acquaintances.

To the best of our knowledge, no previous work has been sufficiently done on association search in large-scale social networks in the research community.

Semantic Association Search

Semantic association represents the complex relationships (not necessarily between persons) in Semantic Web. Several research efforts have been made so far. The previous work focuses on how to represent the association in a query and how to enable processing the query of association search. Limited research work has been done on efficiently searching and ranking associations between entities in the semantic data (ontology).

For instance, Anyanwu et al. propose extending the existing RDF or OWL model so as to support representing the notion of a relationship in the RDF query and processing the association query on a knowledge base [Anyanwu, 2003]. They propose to process queries about semantic associations through the use of an operator ρ (ρ-Queries). The approach has been implemented by extending the existing RDF query system, e.g. RDFSuite [Alexaki, 2001] or JENA [McBride, 2002]. Semantic association search is often performed first based on the notation of RDF property sequences. Then instances of the RDF property sequences are viewed as associations in the knowledge base. [Anyanwu, 2005] proposes a method based on information gain to measure how much information is conveyed by a result of association so as to rank the found associations.

See also [Liang, 2005; Sheth, 2005; Aleman-Meza, 2006]

4.4   Shortest Paths

Considerable works have been conducted for finding shortest path(s) in graphs. It is a fundamental problem in combinational optimization. Algorithms for shortest path [Dijkstra, 1959; Floyd, 1962], efficient shortest paths in sparse networks [Johnson, 1977], Top-k shortest paths [Eppstein, 1998; Hershberger, 2003], and near-shortest paths [Carlyle, 2005] have been proposed. See [Lawler, 1976; Brander, 1995; Hadjiconstantinou, 1999] for overviews. See also [Byers, 1984].

The algorithms for shortest path have been applied to, for instance, find the best routines of vehicles or messages, find optimal flows in networks (treated for example in [Ford, 1962]), traffic-light network[Yang, 2005], and find from the HMM graph the N most likely state sequences given the observed acoustic data [Nilsson, 2001].

 

5.   References

(Note: The references here are not necessary comprehensive. If anything missing, please let me know: )

5.1   Social Network

[Brickley, 2004] D. Brickley and L. Miller. FOAF Vocabulary Specification, Namespace Document, September 2, 2004. http://xmlns.com/foaf/0.1/.

[Domingos, 2005] P. Domingos. Mining social networks for viral marketing (short paper). IEEE Intelligent Systems, 20(1): 80-82, 2005.

[Golbeck, 2005] J. Golbeck. Web-based social networks: a survey and future directions. Technique Report. 2005

[Golbeck, 2005] J. Golbeck. Computing and Applying Trust in Web-based Social Networks. Ph.D. Dissertation, University of Maryland. College Park, 2005.

[Kalyanpur, 2006] A. Kalyanpur, B. Parsia, E. Sirin, and B. Cuenca-Grau. Repairing unsatisfiable concepts in OWL ontologies. In Proceedings of ESWC’2006.

[Kruk, 2005] S. R. Kruk and S. Decker. Semantic social collaborative filtering with FOAFRealm. In Proceedings of the Semantic Desktop Workshop, ISWC’2005. pp. 170-184

[Matsuo, 2003] Y. Matsuo, H. Tomobe, K. Hasida, and M. Ishizuka. Mining social network of conference participants from the Web. In Proceedings of Web Intelligence’2003. pp.190-193

[Mika, 2005] P. Mika. Flink: Semantic Web technology for the extraction and analysis of social networks. Web Semantics: Science, Services and Agents on the World Wide Web, (3): 211-223, October, 2005.

[Tang, 2005] Jie Tang, Juanzi Li, Hong-Jun Lu, Bangyong Liang, and Kehong Wang. iASA: Learning to Annotate the Semantic Web. Journal on Data Semantic, IV. Springer Press. Nov, 2005:110-145

[Tang, 2006] J. Tang, M. Hong, J. Zhang, B. Liang, and J. Li. A new approach to personal network search based on information extraction. Demo Paper. In Proceedings of ASWC’2006.

[Foafnaut, 2006] Foafnaut, http://www.foafnaut.org/. Sep. 2006.

 

5.2   People Search

[Balog, 2006] K. Balog and M. de Rijke. Searching for People in the Personal Work Space. In Proc. of International Workshop on Intelligent Information Access (IIIA-2006), 2006.

[Campbell, 2003] C. S. Campbell, P. P. Maglio, A. Cozzi, and B. Dom. Expertise Identification Using Email Communications. In Proc. of CIKM’03. 2003. pp.528-531

[Cao, 2005] Y. Cao, J.Liu, S. Bao, and H. Li. Research on Expert Search at Enterprise Track of TREC 2005. TREC 2005.

[Craswell, 2005] N. Craswell, A. de Vries, and I. Soboroff. Overview of the Trec-2005 Enterprise Track. TREC 2005 Conference Notebook. 2005, pp.199-205

[Culotta, 2004] A. Culotta, R. Bekkerman, and A. McCallum. Extracting social networks and contact information from email and the web. In Proceedings of Conference on Email and Spam (CEAS’04). 2004.

[Filatova, 2005] E. Filatova and J. Prager. Tell me what you do and I’ll tell you what you are: Learning occupation-related activities for biographies. In Proceedings of HLT-NAACL, 2005, pp. 49-56.

[Foner, 1997] L. N. Foner. Yenta: A Multi-agent, Referral-based Matchmaking System. In First International Conference on Autonomous Agents. 1997.

[Fu, 2005] Y. Fu, W. Yu, Y. Li, Y. Liu, M. Zhang, and S. Ma. THUIR at TREC 2005: Enterprise Track. TREC 2005.

[Kautz, 1997] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining Social Networks and Collaborative Filtering. Communication of The ACM, vol. 30 no. 3, March 1997.

[Mattox, 1999] D. Mattox, M. T. Maybury, D. Morey. Enterprise Expert and knowledge Discovery. In Proc. of  HCI'1999. 1999, pp.303-307

[Schwartz, 1993] M. F. Schwartz and D. C. M. Wood. Discovering Shared Interests Using Graph Analysis. Communications of the ACM, Vol. 36(8):78-89, 1993.

[Yimam-Seid, 2003] D. Yimam-Seid and A. Kobsa. Expert Finding Systems for Organizations: Problem and Domain Analysis and the Demoir Approach. Journal of Organizational Computing and Electronic Commerce, Vol. 13(1):1-24, 2003.

 

5.3   Association Search

[Adamic, 2005] L. A. Adamic and E. Adar. How to search a social network. Social Networks, 27(3):187-203, July 2005.

[Aleman-Meza, 2006] B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar, A. Joshi, and T. Finin. Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. In Proceedings of WWW’2006. pp.407-416.

[Alexaki, 2001] S. Alexaki, G. Karvounarakis, V. Christophides, D. Plexousakis, and K. Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. In 2nd International Workshop on the Semantic Web, Hong Kong, 2001. pp. 1-13

[Anyanwu, 2003] K. Anyanwu, and A. P. Sheth. ρ-Queries: enabling querying for semantic associations on the semantic web. In Proceedings of WWW’2003. pp.690-699

[Anyanwu, 2005] K. Anyanwu, A. Maduko, and A. P. Sheth. SemRank: ranking complex relationship search results on the semantic web. In Proceedings of WWW’2005. pp.117-127

[Balog, 2006] K. Balog and M. de Rijke. Searching for People in the Personal Work Space. In Proc. of International Workshop on Intelligent Information Access (IIIA-2006), 2006.

[Kautz, 1997] H. Kautz, B. Selman, and M. Shah. ReferralWeb: Combining Social Networks and Collaborative Filtering. Communication of The ACM, vol. 30 no. 3, March 1997.

[Liang, 2005] B. Liang, J. Tang, and J. Li. Association search in Semantic Web: search + inference. Poster Paper. In Proceedings of WWW’2005.pp992-993

[McBride, 2002] B. McBride. Jena: A Semantic Web toolkit. IEEE Internet Computing, 2002: 55-59

[Sheth, 2005] A. Sheth, B. Aleman-Meza, I. B. Arpinar, C. Halaschek, C. Ramakrishnan, C. Bertram, Y. Warke, D. Avant, F. S. Arpinar, K. Anyanwu, and K. Kochut. Semantic Association Identification and Knowledge Discovery for National Security Applications. Special Issue of Journal of Database Management on Database Technology for Enhancing National Security, Eds: L. Zhou and W. Kim, (2005) (Accepted).

 

5.4   Shortest Paths

[Brander, 1995] A. Brander and M. Sinclair. A comparative study of K-shortest path algorithms. In Proc. of 11th UK Performance Engineering Workshop, pages 370-379, 1995.

[Byers, 1984] T.H. Byers and M.S. Waterman. Determining all optimal and near-optimal solutions when solving shortest path problems by dynamic programming. Operations Research, 32:1381-1384, 1984.

[Carlyle, 2005] W. Matthew Carlyle and R. Kevin Wood. Near-shortest and K-shortest simple paths. Networks, 46(2):98-109, September 2005.

[Dijkstra, 1959] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269-271, 1959.

[Eppstein, 1998] David Eppstein. Finding the k shortest paths. SIAM J. Comput., 1998: 652-673

[Floyd, 1962] Robert W. Floyd. Algorithm 97: Shortest path. Communications of the ACM, 5(6):345-348, June 1962.

[Ford, 1962] L. R. Jr Ford and D. R. Fulkerson. Flows in Networks. Princeton U Press, Princeton, N. J., 1962.

[Hadjiconstantinou, 1999] Eleni Hadjiconstantinou and Nicos Christofides. An efficient implementation of an algorithm for finding K shortest simple paths. Networks, 1999: 88-101

[Hershberger, 2003] John Hershberger, Matthew Maxel, Subhash Suri. Finding the k shortest simple paths: a new algorithm and its implementation. In Proceedings of ALENEX’2003. pp.26-36

[Johnson, 1977] Donald B. Johnson. Efficient algorithms for shortest paths in sparse networks. J. ACM, 1977: 1-13

[Lawler, 1976] E. Lawler, Combinational Optimization, Networks and Matroids. New York: Holt, Rinehert and Winston, 1976.

[Nilsson, 2001] Dennis Nilsson, Jacob Goldberger. Sequentially finding the n-best list in Hidden Markov Models. In Proceedings of IJCAI’2001. pp.1280-1285

[Yang, 2005] Hsu-Hao Yang and Yen-Liang Chen. Finding K shortest looping paths in a traffic-light network. Computers & OR, 2005: 571-581

 

 

 

Visites since Oct. 20, 2006. Free Web Counter

Last updated date: Oct. 20, 2006, by Jie Tang.