Personal
Social Network
-
Toward Building, Search, and Mining Social Networks
(Slides
at ASWC’06)
Online Prototype System (soon be publicly available)
Jie
Tang, Mingcai Hong, Jing Zhang, Bangyong Liang, Limin Yao, and Juanzi Li
Department of Computer Science and Technology
10-201,
Abstract
In this project, we are aimed at how to automatically building a
personal social network using the existing Web, how to search for information
(including persons, publications, and relationships) in the constructed social
network, and how to mine the social network. We propose an approach for building personal social
network search based on information extraction. We category the search tasks in
the personal social network as person search, publication search, and category
based search. For mining the network, we currently focus on expert finding,
research interest finding, and people association finding.
We have
developed a system called Personal Social Network (PSN). The system has been in
operation on an intranet for several months. Information of more than 450,000
persons has been gathered. Here we
describe the architecture and main features of the system.
Outline
- Personal Social Network Building
* Information
Collection. Collecting useful information (e.g. documents, emails)
* Personal
Information Extraction. Including: Person Profiling (Portrait, Affiliation, Homepage,
Position) and Person Contact Extraction (Address, Email, Telephone, Fax)
* Integration. Integrating
information from different sources (e.g. DBLP, Citeseer)
- Personal Social Network Search
* Person Search
* Publication
Search
* Category based
Search
- Personal Social Network Mining
* Expert Finding
* Research
Interest Finding
* People
Association Finding
With the growth of the Internet and the World Wide Web, social networks
have grown quickly in number and scope. Numerous studies have shown that one of
the most effective channels for disseminating or obtaining information is its
informal network of collaboration, colleagues, and friends [Kautz, 1997;
Golbeck, 2005]. The use of social networks is widespread in employers’
recruiting and in workers’ job-seeking, pursuit of hobbies, forming
romantic relationships, local problem-solving (for example, fixing a piece of
equipment) and building collaboration within or between organizations.
Personal social network is an important research area
in social networks as well as Semantic Web. Many research efforts have been
made so far. A person can have different types of information: personal profile
(including portrait, homepage, position, affiliation, publications, and
documents), contact information (including address, email, telephone, and fax
number), and social network information (including personal or professional
relationships between persons, e.g. friend relationship). However, the
information is usually hidden in heterogeneous and distributed web pages.
For example, the personal profile information may be found in one’s
homepages and the contact information may be found in other pages.
In this project, we have developed a system, which is
called ‘Personal Social Network’ (PSN shortly). In the rest of this
section, we will introduce the main features of the system in detail.
2.1 Personal Social Network Building
In this project, we try to address personal network search in a novel
approach. We define an ontology describing the personal information (called Person
Ontology) by extending the FOAF ontology. In the ontology, we define concepts
such as “Person”, “Publication”, etc. For each concept,
we define datatype properties, for example, for concept “Person”,
an example property can be “portrait”. We define relationships
between concepts such as “hasPublication”, “hasAuthor”,
“hasFriend”, etc. The key point then is how to extract the personal
information from the web and how to use the extracted information to construct
the personal network according to the ontology. Our proposal is to take a
strategy of ‘collecting-extracting-fusing’. We first collect
documents ‘related’ to a person. By ‘related’, we mean
documents that really describe the person rather than the keyword-based
‘relevant’ documents in the conventional search. Next, we employ
information extraction technologies to extract the personal information from
the collected documents [Tang,
2005]. After that, we integrate the extracted information from the documents of
different sources. We use the extracted personal information and the
publication information to create instances of the concepts “Person” and
“Publication”. The created instances are stored in a local ontology
base. In this way, we are able to construct a personal network automatically.
Figure 1 shows the processing flow of building the
network. We utilize Google API to get a list of relevant documents. Then a
classification model is employed to identify whether or not a document in the
list is really ‘related’ to the person. Typical related documents
can be persons’ homepages, publication pages from DBLP, and
persons’ introduction documents. Next, we extract personal information
from the identified documents. We employ two classification models to detect
the start position and the end position for each type of the personal
information. And then we view the text between the start position and the end
position as the target. Both token and text-line can be considered as the basic
unit in the detections of our approach. As models, we use SVMs (Support Vector
Machines) [Vapnik, 1998]. Features are defined in the SVM models respectively
for each type of the information. In identification of a personal portrait, we
use the surrounding text, image content information (e.g. whether or not one image contains a
person face by face recognition), and image URL address as features. We also
extract person’s publications from DBLP. After that, we integrate
together the personal profile, contact information, and publication
information. We view coauthors in publications as the friends and finally
construct the personal network.
Figure 1. Processing flow of Personal Network building
2.2 Personal Social Network Search
In PSN, we provide three types of searches:
person search, publication search, and category based search.
For person search, the user inputs a person name, and
the system returns the information of the person. There are two sub-types of
searches: offline search and online search. Given a person name, we first
search in the constructed personal network. If the person can be found, the
information of the person stored in the local ontology base will be
displayed. (We call this type
of search as offline search.) Otherwise, we perform the online search, that is
extracting the personal information in real time (In the extraction process, we
utilize the method same as that in the process of personal social network
building). The online process may last from 1 to 10 seconds depending on the
network speed. Here we focus on how to accurately extract the information from
web pages. The system also supports searching with constraints, for example,
the user can input a query like “Jie Tang, uni: Tsinghua”, with
which the system searches for persons with the name “Jie Tang” and with its
property “affiliation” containing “Tsinghua”.
For publication search, the user inputs keywords
related to the publication, and the system returns publications with the most
‘relevant’ publications on the top. (We use the conventional IR
model to do the publication search.) The system also tries to find the
publications from the web. We have developed a module that can convert the
publications of any kinds of formats (e.g. PDF, WORD, HTML, etc.) into a
unified format. In this way, we are able to generate snapshots of a publication
so that the user can view the publication online.
For category based search, we created a category
containing 40 research fields. We classify a person to five research fields
according to his research interests (personal information and publications).
Then the user can select the research field to view the research people
classified to it.
2.3 Personal Social Network Mining
We currently focus on expert finding, people association finding, and
research interest finding.
In expert finding, the user inputs keywords or phrase
(called topic), and the system tries to find who knows about the topic. More
accurately, the system tries to find who are experts on the topic. We propose a
graph-propagation based approach by making use of publications and documents
that a person involved to calculate his ‘expertise degree’ on a
topic. One of the basic ideas
for the feature is that if a person has authored many documents on a topic,
then it is very likely that he is an expert on the topic, or if the
person’s name co-occurs in many times with the topic, then it is likely
that he is an expert on the topic. Another idea here is that if a person knows
many experts on a topic, then it is very likely that he/she is also an expert
on the topic, or if the person’s name co-occurs in many times with
another expert, then it is likely that he/she is also an expert on the topic.
In association search, we compute the closeness
between two persons and use the network to find possible associations between
two persons. The associations between two persons may be direct (e.g. the two
persons coauthor a document) or indirect (e.g. one person coauthors a document
with a friend of the other person). We formalize association search as a
problem of near-shortest path finding. We propose an efficient algorithm to
find near-shortest associations from a source to a target person. The algorithm
consists of two stages. In the first stage, an enhanced heap-based Dijkstra
algorithm is employed to find the shortest associations from all the other
persons to the target person (also the shortest association Lmin from the source to the
target person). In the second stage, a depth-first search processing is
utilized to quickly find all associations that are no longer than (1+β)Lmin, where β
is a user-defined threshold.
In research interest finding, we employ traditional
information retrieval model to compute the probabilistic between a person and a
pre-defined research topic. Then we select the high scored topics as the
research interest of the person.
Figure 2 shows the snapshots of the PSN system. In
figure 2 (a), the user types a person name, and he gets a detailed description
of the person. Figure 2 (b) shows the list of gathered persons in our current
system.
|
|
|
|
(a) |
(b) |
Figure 2. Personal Network Search system
Figure 3 shows the snapshots of expert finding and
publication online-view.
|
|
|
Figure 3. Snapshot of expert search and publication search
The system will be publicly
available soon and the extracted data will be freely available for research
use.
Data sets used for expert finding and
association search can be found here.
4.1 Social Network
Many web-based social networks have been
built, for example, tickle.com, friendster.com, myspace.com, etc. As well much
research work has been done on social networks, in particular on semantic
web-based social networks.
For instance, Golbeck et al. introduce the relationship
of trust into social networks [Golbeck, 2005]. They argue that when trust is assigned
on a quantitative scale, we can make computations with trust values in the
network. And given a specific user in the social network, we can see the
average value about the person’s trustworthiness.
The Friend of a Friend
(FOAF) project is creating a Web of machine-readable pages describing people
and the relationships between them and the things they create and do [Brickley,
2004]. It has the potential to be an open interchange format used by many
different social networking applications. See also FOAFRealm [Kruk, 2005],
FOAFnaut [Foafnaut, 2006], Flink [Mika, 2005], and MindSwap [Kalyanpur, 2006].
See [Domingos, 2005] and [Matsuo, 2003] for social network mining.
4.2 People Search
People search can be classified into several
sub tasks, including: expert finding and expert profiling.
Expert finding addresses the task of finding the right person with the
appropriate skills and knowledge: “Who are the experts on topic
X?”.
For example, Campbell et
al. [
From 2005, the Text REtrieval Conference (TREC) has
provided a common platform with the Enterprise Search Track for researchers to
empirically assess methods and techniques devised for expert finding. Much work
has been done, for example, [Fu, 2005; Cao, 2005]. See [Craswell, 2005] for an
overview.
For detail, we can see from
here.
The task is to turn the expert finding task
around: to profile a person is to discover, collect, and produce information of
skills and knowledge of that person: “What does a person Y know?”
For example, [Culotta, 2004] presents an end-to-end
system that extracts a user’s social network and its members’
contact information given the user’s email inbox. The system identifies
unique people in emails, finds their Web presence, and automatically fills the
fields of a contact address book using Conditional Random Fields [Lafferty,
2001].
Tang et al. propose an
approach for automatically extracting the personal information from the web [Tang,
2006]. See also [Mika, 2005;
Filatova; 2005; Balog, 2006].
4.3 Association Search
In association search, we are interested in
the relationships between entities. For searching for people, we are aimed at
finding the relationships between persons.
Many uses of recommender system require
finding associations from people to people, not just providing
“oracular” advice. Recently, a few methods have been proposed to
solve the problem. For example, Kautz
et al. have developed a recommender system call ReferralWeb [Kautz, 1997]. The
ReferralWeb system lets ones search and explore social networks - the networks
of friends, colleagues, and co-workers - that exist on the WWW. When a user
wants to search for information on a topic, the system first identifies experts
on the topic and then tries to find associations (called referred chain) from
the searcher to an expert. They tested the system on a 1,000 node network and a
10,000 researcher network. They focus on expert recommendation rather than efficiently finding and ranking the
associations.
Balog and Rijke propose to
formulate three sub tasks in the area of people association finding (also
called relationship finding): connection finding, collaboration finding, and
reputation analysis [Balog, 2006]. They focus on finding direct associations
between persons from the average documents on the Web using information
retrieval models. They propose to model association finding as a mixture of a
language modeling approach to snippet retrieval and a noisy channel approach to
determine the likelihood that a retrieved snippet “generates” a
relationship between two persons.
Adamic and Adar have
investigated the problem of association search in email networks [Adamic,
2005]. They tested how the three different properties (i.e. degree, position in
the organizational hierarchy, and physical location) of the individuals in the
social network can be used to improve the performance of association search.
They simulate an experiments based on a small network composed of 430
individuals with an average number of 10 acquaintances.
To the best of our knowledge, no previous work has
been sufficiently done on association search in large-scale social networks in the research community.
Semantic association represents the complex relationships (not
necessarily between persons) in Semantic Web. Several research efforts have
been made so far. The previous work focuses on how to represent the association
in a query and how to enable processing the query of association search.
Limited research work has been done on efficiently searching and ranking
associations between entities in the semantic data (ontology).
For instance, Anyanwu et al. propose extending the
existing RDF or OWL model so as to support representing the notion of a
relationship in the RDF query and processing the association query on a
knowledge base [Anyanwu, 2003]. They propose to process queries about semantic
associations through the use of an operator ρ
(ρ-Queries). The approach has
been implemented by extending the existing RDF query system, e.g. RDFSuite
[Alexaki, 2001] or
See also [Liang, 2005; Sheth, 2005; Aleman-Meza, 2006]
4.4
Shortest Paths
Considerable works have been conducted for finding shortest path(s) in
graphs. It is a fundamental problem in combinational optimization. Algorithms
for shortest path [Dijkstra, 1959; Floyd, 1962], efficient shortest paths in
sparse networks [Johnson, 1977], Top-k shortest paths [Eppstein, 1998;
Hershberger, 2003], and near-shortest paths [Carlyle, 2005] have been proposed.
See [Lawler, 1976; Brander, 1995; Hadjiconstantinou, 1999] for overviews. See
also [Byers, 1984].
The algorithms for shortest path have been applied to,
for instance, find the best routines of vehicles or messages, find optimal
flows in networks (treated for example in [Ford, 1962]), traffic-light
network[Yang, 2005], and find from the HMM graph the N most likely state sequences given the observed acoustic data
[Nilsson, 2001].
(Note: The
references here are not necessary comprehensive. If anything missing, please
let me know: )
5.1 Social Network
[Brickley, 2004]
D. Brickley and L. Miller. FOAF Vocabulary Specification, Namespace Document,
September 2, 2004. http://xmlns.com/foaf/0.1/.
[Domingos, 2005]
P. Domingos. Mining social networks for viral marketing (short paper). IEEE
Intelligent Systems, 20(1): 80-82, 2005.
[Golbeck, 2005]
J. Golbeck. Web-based social networks: a survey and future directions.
Technique Report. 2005
[Golbeck, 2005]
J. Golbeck. Computing and Applying Trust in Web-based Social Networks. Ph.D. Dissertation,
[Kalyanpur, 2006]
A. Kalyanpur, B. Parsia, E. Sirin, and B. Cuenca-Grau. Repairing unsatisfiable
concepts in OWL ontologies. In Proceedings of ESWC’2006.
[Kruk, 2005] S.
R. Kruk and S. Decker. Semantic social collaborative filtering with FOAFRealm.
In Proceedings of the Semantic Desktop Workshop, ISWC’2005. pp. 170-184
[Matsuo, 2003] Y.
Matsuo, H. Tomobe, K. Hasida, and M. Ishizuka. Mining social network of
conference participants from the Web. In Proceedings of Web
Intelligence’2003. pp.190-193
[Mika, 2005] P.
Mika. Flink: Semantic Web technology for the extraction and analysis of social
networks. Web Semantics: Science,
Services and Agents on the World Wide Web, (3): 211-223, October, 2005.
[Tang, 2005] Jie
Tang, Juanzi Li, Hong-Jun Lu, Bangyong Liang, and Kehong Wang. iASA: Learning
to Annotate the Semantic Web. Journal on Data Semantic, IV. Springer Press.
Nov, 2005:110-145
[Tang, 2006] J.
Tang, M. Hong, J. Zhang, B. Liang, and J. Li. A new approach to personal
network search based on information extraction. Demo Paper. In Proceedings of
ASWC’2006.
[Foafnaut, 2006]
Foafnaut, http://www.foafnaut.org/. Sep. 2006.
5.2 People Search
[Balog, 2006] K. Balog and M. de Rijke. Searching for People in the
Personal Work Space. In Proc. of International Workshop on Intelligent
Information Access (IIIA-2006), 2006.
[Campbell, 2003] C. S. Campbell, P. P. Maglio, A.
Cozzi, and B. Dom. Expertise Identification Using Email Communications. In Proc. of
CIKM’03. 2003. pp.528-531
[Cao, 2005] Y.
Cao, J.Liu, S. Bao, and H. Li. Research on Expert Search at Enterprise Track of
TREC 2005. TREC 2005.
[Craswell, 2005]
N. Craswell, A. de Vries, and I. Soboroff. Overview of the Trec-2005
[Culotta, 2004]
A. Culotta, R. Bekkerman, and A. McCallum. Extracting social networks and
contact information from email and the web. In Proceedings of Conference on
Email and Spam (CEAS’04). 2004.
[Filatova, 2005]
E. Filatova and J. Prager. Tell me what you do and I’ll tell you what you
are: Learning occupation-related activities for biographies. In Proceedings of
HLT-NAACL, 2005, pp. 49-56.
[Foner, 1997] L.
N. Foner. Yenta: A Multi-agent, Referral-based Matchmaking System. In First
International Conference on Autonomous Agents. 1997.
[Fu, 2005] Y. Fu,
W. Yu, Y. Li, Y. Liu, M. Zhang, and S. Ma. THUIR at TREC 2005:
[Kautz, 1997] H.
Kautz, B. Selman, and M. Shah. ReferralWeb: Combining Social Networks and
Collaborative Filtering. Communication of The ACM, vol. 30 no. 3, March 1997.
[Mattox, 1999] D. Mattox, M. T. Maybury, D. Morey. Enterprise Expert and
knowledge Discovery. In Proc. of
HCI'1999. 1999, pp.303-307
[Schwartz, 1993] M. F.
Schwartz and D. C. M. Wood. Discovering Shared Interests Using Graph Analysis.
Communications of the ACM, Vol. 36(8):78-89, 1993.
[Yimam-Seid, 2003] D. Yimam-Seid and A. Kobsa. Expert Finding Systems for
Organizations: Problem and Domain Analysis and the Demoir Approach. Journal of
Organizational Computing and Electronic Commerce, Vol. 13(1):1-24, 2003.
5.3 Association Search
[Adamic, 2005] L. A. Adamic and E. Adar. How to search
a social network. Social Networks,
27(3):187-203, July 2005.
[Aleman-Meza, 2006] B. Aleman-Meza, M. Nagarajan, C.
Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar, A. Joshi, and T.
Finin. Semantic Analytics on Social Networks: Experiences in Addressing the
Problem of Conflict of Interest Detection. In Proceedings of WWW’2006.
pp.407-416.
[Alexaki, 2001] S. Alexaki, G. Karvounarakis, V.
Christophides, D. Plexousakis, and K. Tolle. The ICS-FORTH RDFSuite: Managing
Voluminous RDF Description Bases. In 2nd International Workshop on the Semantic
Web, Hong Kong, 2001. pp. 1-13
[Anyanwu, 2003] K. Anyanwu, and A. P. Sheth.
ρ-Queries: enabling querying for semantic associations on the semantic
web. In Proceedings of WWW’2003. pp.690-699
[Anyanwu, 2005] K. Anyanwu, A. Maduko, and A. P.
Sheth. SemRank: ranking complex relationship search results on the semantic
web. In Proceedings of WWW’2005. pp.117-127
[Balog, 2006] K. Balog and
M. de Rijke. Searching for People in the Personal Work Space. In Proc. of
International Workshop on Intelligent Information Access (IIIA-2006), 2006.
[Kautz, 1997] H. Kautz, B.
Selman, and M. Shah. ReferralWeb:
Combining Social Networks and Collaborative Filtering. Communication of The
ACM, vol. 30 no. 3, March 1997.
[Liang, 2005] B. Liang, J. Tang, and J. Li.
Association search in Semantic Web: search + inference. Poster Paper. In
Proceedings of WWW’2005.pp992-993
[McBride, 2002] B. McBride.
[Sheth, 2005] A. Sheth, B. Aleman-Meza, I. B. Arpinar,
C. Halaschek, C. Ramakrishnan, C. Bertram, Y. Warke, D. Avant, F. S. Arpinar,
K. Anyanwu, and K. Kochut. Semantic Association Identification and Knowledge
Discovery for National Security Applications. Special Issue of Journal of
Database Management on Database Technology for Enhancing National Security,
Eds: L. Zhou and W. Kim, (2005) (Accepted).
5.4
Shortest Paths
[Brander, 1995] A. Brander and M. Sinclair. A
comparative study of K-shortest path algorithms. In Proc. of 11th UK
Performance Engineering Workshop, pages 370-379, 1995.
[Byers, 1984] T.H. Byers and M.S. Waterman.
Determining all optimal and near-optimal solutions when solving shortest path
problems by dynamic programming. Operations
Research, 32:1381-1384, 1984.
[Carlyle, 2005] W. Matthew Carlyle and R. Kevin Wood.
Near-shortest and K-shortest simple paths. Networks, 46(2):98-109, September
2005.
[Dijkstra, 1959] E. Dijkstra. A note on two problems
in connexion with graphs. Numerische
Mathematik, 1:269-271, 1959.
[Eppstein, 1998] David Eppstein. Finding the k shortest paths.
[Floyd, 1962] Robert W. Floyd. Algorithm 97: Shortest
path. Communications of the ACM,
5(6):345-348, June 1962.
[Ford, 1962] L. R. Jr Ford and D. R. Fulkerson. Flows
in Networks. Princeton U Press,
[Hadjiconstantinou, 1999] Eleni Hadjiconstantinou and
Nicos Christofides. An efficient implementation of an algorithm for finding K
shortest simple paths. Networks, 1999: 88-101
[Hershberger, 2003] John Hershberger, Matthew Maxel,
Subhash Suri. Finding the k shortest
simple paths: a new algorithm and its implementation. In Proceedings of
ALENEX’2003. pp.26-36
[Johnson, 1977] Donald B. Johnson. Efficient
algorithms for shortest paths in sparse networks. J. ACM, 1977: 1-13
[Lawler, 1976] E. Lawler, Combinational Optimization,
Networks and Matroids.
[Nilsson, 2001] Dennis Nilsson, Jacob Goldberger.
Sequentially finding the n-best list in Hidden Markov Models. In Proceedings of
IJCAI’2001. pp.1280-1285
[Yang, 2005] Hsu-Hao Yang and Yen-Liang Chen. Finding K shortest looping paths in a
traffic-light network. Computers & OR,
2005: 571-581
Last updated date: Oct. 20, 2006,
by Jie Tang.