NETWORKING
AND SECURITY:
1.
ON WIRELESS SCHEDULING ALGORITHMS FOR MINIMIZING
THE QUEUE-OVERFLOW PROBABILITY
In this paper, we
are interested in wireless scheduling algorithms for the downlink of a single
cell that can minimize the queue-overflow probability. Specifically, in a
large-deviation setting, we are interested in algorithms that maximize the asymptotic
decay-rate of the queue-overflow probability, as the queue-overflow threshold
approaches infinity. We first derive an upper bound on the decay-rate of the
queue-overflow probability over all scheduling policies. We then focus on a
class of scheduling algorithms collectively referred to as the “α-algorithms”.
For a given α ≥ 1, the α-algorithm picks for service, at each time, the user with the largest product of the transmission rate and the backlog raised to the power α. We show that when the overflow metric is appropriately modified, the minimum cost-to-overflow under the α-algorithm can be achieved by a simple linear path, and it can be
written as the solution of a vector-optimization problem. Using this structural
property, we then show that when α approaches infinity, the α-algorithms asymptotically
achieve the largest decay-rate of the queue overflow probability. Finally, this
result enables us to design scheduling algorithms that are both
close-to-optimal in terms of the asymptotic decay-rate of the overflow
probability, and empirically shown to maintain small queue-overflow
probabilities over queue-length ranges of practical interest.
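To make the scheduling rule concrete, here is a minimal Python sketch of the α-rule described above; the function name, inputs, and data layout are illustrative assumptions, not from the paper:

    def alpha_schedule(rates, backlogs, alpha=1.0):
        """Serve the user maximizing rate * backlog**alpha this slot."""
        assert len(rates) == len(backlogs) and alpha >= 1.0
        return max(range(len(rates)),
                   key=lambda i: rates[i] * backlogs[i] ** alpha)

    # As alpha grows, the rule weights backlog more heavily: here user 1
    # wins despite a lower rate because of its long queue.
    print(alpha_schedule([5.0, 3.0, 4.0], [2.0, 10.0, 1.0], alpha=2.0))  # -> 1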
2.
EFFICIENT AND DYNAMIC ROUTING TOPOLOGY INFERENCE
FROM END-TO-END MEASUREMENTS
Inferring the routing topology and link performance from
a node to a set of other nodes is an important component in network monitoring
and application design. In this paper we propose a general framework for
designing topology inference algorithms based on additive metrics. The
framework can flexibly fuse information from multiple measurements to achieve better
estimation accuracy. We develop computationally efficient (polynomial-time)
topology inference algorithms based on the framework. We prove that the
probability of correct topology inference of our algorithms converges to one
exponentially fast in the number of probing packets. In particular, for applications where nodes may join or leave frequently, such as overlay network construction, application-layer multicast, and peer-to-peer file sharing/streaming,
we propose a novel sequential topology inference algorithm which significantly
reduces the probing overhead and can efficiently handle node dynamics. We
demonstrate the effectiveness of the proposed inference algorithms via Internet
experiments.
KNOWLEDGE
AND DATA MINING:
3.
BINRANK: SCALING DYNAMIC AUTHORITY-BASED SEARCH
USING MATERIALIZED SUBGRAPHS
Dynamic authority-based keyword search algorithms,
such as ObjectRank and personalized PageRank, leverage semantic link
information to provide high-quality, high-recall search over databases and the Web. Conceptually, these algorithms require a query-time PageRank-style iterative computation over the full graph. This computation is too expensive
for large graphs, and not feasible at query time. Alternatively, building an
index of precomputed results for some or all keywords involves very expensive
preprocessing. We introduce BinRank, a system that approximates ObjectRank
results by utilizing a hybrid approach inspired by materialized views in
traditional query processing. We materialize a number of relatively small
subsets of the data graph in such a way that any keyword query can be answered
by running ObjectRank on only one of the subgraphs. BinRank generates the
subgraphs by partitioning all the terms in the corpus based on their
co-occurrence, executing ObjectRank for each partition using the terms to
generate a set of random walk starting points, and keeping only those objects
that receive non-negligible scores. The intuition is that a subgraph that
contains all objects and links relevant to a set of related terms should have
all the information needed to rank objects with respect to one of these terms.
We demonstrate that BinRank can achieve subsecond query execution time on the
English Wikipedia data set, while producing high-quality search results that
closely approximate the results of ObjectRank on the original graph. The
Wikipedia link graph contains about 10^8 edges, which is at least two orders of
magnitude larger than what prior state of the art dynamic authority-based
search systems have been able to demonstrate. Our experimental evaluation
investigates the trade-off between query execution time, quality of the
results, and storage requirements of BinRank.
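For readers unfamiliar with the underlying computation, the following is a hedged sketch of a personalized-PageRank-style iteration restarted at keyword "base" nodes, the kind of query-time work BinRank confines to a materialized subgraph; the graph encoding and damping factor are assumptions:

    def personalized_pagerank(adj, base, d=0.85, iters=50):
        """adj: {node: [out-neighbors]}; base: set of restart nodes."""
        nodes = list(adj)
        score = {v: (1.0 / len(base) if v in base else 0.0) for v in nodes}
        for _ in range(iters):
            nxt = {v: ((1 - d) / len(base) if v in base else 0.0) for v in nodes}
            for u in nodes:
                out = adj[u]
                if not out:
                    continue                      # ignore dangling nodes
                share = d * score[u] / len(out)
                for v in out:
                    nxt[v] += share
            score = nxt
        return score

    g = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    print(personalized_pagerank(g, base={"a"}))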
4.
CLOSENESS: A NEW PRIVACY MEASURE FOR DATA
PUBLISHING
The k-anonymity privacy
requirement for publishing microdata requires that each equivalence class (i.e.,
a set of records that are indistinguishable from each other with respect to
certain “identifying” attributes) contains at least k records. Recently,
several authors have recognized that k-anonymity cannot prevent attribute
disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this article, we show that ℓ-diversity has a number of limitations. In particular,
it is neither necessary nor sufficient to prevent attribute disclosure.
Motivated by these limitations, we propose a new notion of privacy called
“closeness”. We first present the base model t-closeness, which requires that
the distribution of a sensitive attribute in any equivalence class is close to
the distribution of the attribute in the overall table (i.e., the distance
between the two distributions should be no more than a threshold t). We then
propose a more flexible privacy model called (n, t)-closeness that offers
higher utility. We describe our desiderata for designing a distance measure
between two probability distributions and present two distance measures. We
discuss the rationale for using closeness as a privacy measure and illustrate its
advantages through examples and experiments.
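As a concrete illustration of the t-closeness check, the sketch below uses total variation distance purely as a stand-in; the paper presents its own two distance measures (including Earth Mover's Distance), so treat this as illustrative only:

    def t_close(class_counts, table_counts, t):
        """True if the class's sensitive-value distribution is within t
        of the whole table's distribution (total variation distance)."""
        values = set(class_counts) | set(table_counts)
        n_c, n_t = sum(class_counts.values()), sum(table_counts.values())
        dist = 0.5 * sum(abs(class_counts.get(v, 0) / n_c -
                             table_counts.get(v, 0) / n_t) for v in values)
        return dist <= t

    # Class is 75% flu vs. 50% table-wide: distance 0.25, so 0.3-close.
    print(t_close({"flu": 3, "cancer": 1}, {"flu": 50, "cancer": 50}, t=0.3))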
5.
DATA LEAKAGE DETECTION
We study the following
problem: A data distributor has given sensitive data to a set of supposedly
trusted agents (third parties). Some of the data is leaked and found in an
unauthorized place (e.g., on the web or somebody’s laptop). The distributor
must assess the likelihood that the leaked data came from one or more agents,
as opposed to having been independently gathered by other means. We propose
data allocation strategies (across the agents) that improve the probability of
identifying leakages. These methods do not rely on alterations of the released
data (e.g., watermarks). In some cases we can also inject “realistic but fake”
data records to further improve our chances of detecting leakage and
identifying the guilty party.
6.
PAM: AN EFFICIENT AND PRIVACY-AWARE MONITORING
FRAMEWORK FOR CONTINUOUSLY MOVING OBJECTS
Efficiency and privacy
are two fundamental issues in moving object monitoring. This paper proposes a
privacy-aware monitoring (PAM) framework that addresses both issues. The
framework distinguishes itself from the existing work by being the first to
holistically address the issues of location updating in terms of monitoring accuracy, efficiency, and privacy: in particular, when and how mobile clients should send location updates to the server. Based on the notions of safe region
and most probable result, PAM performs location updates only when they would
likely alter the query results. Furthermore, by designing various client update
strategies, the framework is flexible and able to optimize accuracy, privacy,
or efficiency. We develop efficient query evaluation/reevaluation and safe
region computation algorithms in the framework. The experimental results show
that PAM substantially outperforms traditional schemes in terms of monitoring
accuracy, CPU cost, and scalability while achieving close-to-optimal
communication cost.
7.
P2P REPUTATION MANAGEMENT USING DISTRIBUTED
IDENTITIES AND DECENTRALIZED RECOMMENDATION CHAINS
Peer-to-peer (P2P)
networks are vulnerable to peers who cheat, propagate malicious code, leech on
the network, or simply do not cooperate. The traditional security techniques developed for centralized distributed systems, such as client-server networks, are insufficient for P2P networks owing to their decentralized nature.
The absence of a central authority in a P2P network poses unique challenges for
reputation management in the network. These challenges include identity
management of the peers, secure reputation data management, Sybil attacks, and
above all, availability of reputation data. In this paper, we present a
cryptographic protocol for ensuring secure and timely availability of the
reputation data of a peer to other peers at extremely low costs. The past
behavior of the peer is encapsulated in its digital reputation, and is
subsequently used to predict its future actions. As a result, a peer’s
reputation motivates it to cooperate and desist from malicious activities. The
cryptographic protocol is coupled with self-certification and cryptographic mechanisms for identity management and for countering Sybil attacks. We illustrate
the security and the efficiency of the system analytically and by means of
simulations in a completely decentralized Gnutella-like P2P network.
8.
MANAGING MULTIDIMENSIONAL HISTORICAL AGGREGATE DATA
IN UNSTRUCTURED P2P NETWORKS
A P2P-based framework
supporting the extraction of aggregates from historical multidimensional data
is proposed, which provides efficient and robust query evaluation. When a data
population is published, data are summarized in a synopsis, consisting of an
index built on top of a set of subsynopses (storing compressed representations
of distinct data portions). The index and the subsynopses are distributed
across the network, and suitable replication mechanisms taking into account the
query workload and network conditions are employed that provide the appropriate
coverage for both the index and the subsynopses.
9.
Deriving Concept-Based User Profiles from Search
Engine Logs
User profiling is a fundamental component of any personalization application. Most existing user profiling strategies are based
on objects that users are interested in (i.e., positive preferences), but not
the objects that users dislike (i.e., negative preferences). In this paper, we
focus on search engine personalization and develop several concept-based user
profiling methods that are based on both positive and negative preferences. We
evaluate the proposed methods against our previously proposed personalized
query clustering method. Experimental results show that profiles which capture
and utilize both the user’s positive and negative preferences perform the
best. An important result from the experiments is that profiles with negative
preferences can increase the separation between similar and dissimilar queries.
The separation provides a clear threshold for an agglomerative clustering algorithm
to terminate and improve the overall quality of the resulting query clusters.
10. Parallelizing
Itinerary-Based KNN Query Processing in Wireless Sensor Networks
Wireless sensor networks have been proposed for
facilitating various monitoring applications (e.g., environmental monitoring
and military surveillance) over a wide geographical region. In these applications,
spatial queries that
collect data from wireless sensor networks play an important role. One such
query is the K-Nearest Neighbor (KNN) query that facilitates collection of sensor data samples based on a given
query location and the number of samples specified (i.e., K). Recently, itinerary-based KNN query
processing techniques, which propagate queries and collect data along a
predetermined itinerary, have been developed. Prior studies demonstrate that
itinerary-based KNN query processing algorithms are able to achieve better
energy efficiency than other existing algorithms developed upon tree-based
network infrastructures. However, how to derive itineraries for KNN queries based
on different performance requirements remains a challenging problem. In this
paper, we propose a Parallel Concentric-circle Itinerary-based KNN (PCIKNN) query processing technique that derives different itineraries by optimizing
either query latency or energy consumption. The performance of
PCIKNN is analyzed mathematically and evaluated through extensive experiments.
Experimental results show that PCIKNN outperforms the state-of-the-art
techniques.
11. Towards
an Automatic Detection of Sensitive Information in a Database
In order to validate
user requirements, tests are often conducted on real data. However,
developments and tests are increasingly outsourced, leading companies to
provide external staff with real confidential data. A solution to this problem
is known as Data Scrambling. Many algorithms aim at smartly replacing true data
by false but realistic ones. However, nothing has been developed to automate the crucial task of detecting the data to be scrambled. In this paper, we
propose an innovative approach - and its implementation as an expert system -
to achieve the automatic detection of the candidate attributes for scrambling.
Our approach is mainly based on semantic rules that determine which concepts
have to be scrambled, and on a linguistic component that retrieves the
attributes that semantically correspond to these concepts. Since attributes cannot be considered independently of each other, we also address the challenging problem of propagating the scrambling throughout the whole database. An important contribution of our approach is to provide a semantic modeling
of sensitive data. This knowledge is made available through production rules,
operationalizing the sensitive data detection.
12. ViDE: A
Vision-Based Approach for Deep Web Data Extraction
Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from
deep Web pages is a challenging problem due to the underlying intricate
structures of such pages. Until now, a large number of techniques have been proposed
to address this problem, but all of them have inherent limitations because they
are Web-page-programming-language dependent. Since Web pages are a popular two-dimensional medium, their contents are always displayed in regular patterns for users to browse. This motivates us to seek a different way of performing deep Web data extraction that overcomes the limitations of previous works by exploiting common visual features of deep Web pages. In this paper, a novel vision-based approach that is Web-page-programming-language independent is proposed. This approach primarily utilizes the visual features on the deep Web
pages to implement deep Web data extraction, including data record extraction
and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to
produce perfect extraction. Our experiments on a large set of Web databases
show that the proposed vision-based approach is highly effective for deep Web
data extraction.
13. An
UpDown Directed Acyclic Graph Approach for Sequential Pattern Mining
Traditional pattern growth-based approaches for
sequential pattern mining derive length-(k+1) patterns based on the projected databases of length-k patterns recursively. At each level of recursion, they
unidirectionally grow the length of detected patterns by one along the suffix
of detected patterns, which needs k levels of recursion to find a length-k pattern. In this paper, a novel data structure, UpDown Directed
Acyclic Graph (UDDAG), is invented for efficient sequential pattern mining.
UDDAG allows bidirectional pattern growth along both ends of detected patterns.
Thus, a length-k pattern can be detected in ⌊log₂k⌋ + 1 levels of recursion at best, which results in fewer levels of recursion and faster pattern growth.
When minSup is large such that the average pattern length is close to 1, UDDAG
and PrefixSpan have similar performance because the problem degenerates into a frequent-item-counting problem. However, UDDAG scales up much better. It often
outperforms PrefixSpan by almost one order of magnitude in scalability tests.
UDDAG is also considerably faster than Spade and LapinSpam. Except for extreme
cases, UDDAG uses comparable memory to that of PrefixSpan and less memory than
Spade and LapinSpam. Additionally, the special feature of UDDAG enables its extension
toward applications involving searching in large spaces.
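A quick numeric check of the recursion-depth claim above: bidirectional growth can at best double a detected pattern's length per level, so a length-k pattern needs ⌊log₂k⌋ + 1 levels rather than k:

    import math
    for k in (1, 2, 8, 100):
        print(k, "->", math.floor(math.log2(k)) + 1)   # e.g. k=8 needs 4, not 8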
14. Domain-Driven
Classification Based on Multiple Criteria and Multiple Constraint-Level Programming
for Intelligent Credit Scoring
Extracting knowledge from the transaction
records and the personal data of credit card holders has great profit potential
for the banking industry. The challenge is to detect/predict bankruptcy and to keep and recruit profitable customers. However, grouping and targeting
credit card customers by traditional data-driven mining often does not directly
meet the needs of the banking industry, because data-driven mining
automatically generates classification outputs that are imprecise, meaningless,
and beyond users’ control. In this paper, we provide a novel domain-driven
classification method that takes advantage of multiple criteria and multiple
constraint-level programming for intelligent credit scoring. The method involves credit scoring to produce a set of customers’ scores that makes the classification results actionable and controllable through human interaction during the scoring process. Domain knowledge and experts’ experience parameters are built into the criteria and constraint functions of the mathematical programming, and human-machine conversation is employed to generate an efficient and precise solution. Experiments based on various data sets validated the effectiveness
and efficiency of the proposed methods.
15. Feature
Selection Using f-Information Measures in Fuzzy Approximation
Spaces
The selection of nonredundant and relevant
features of real-valued data sets is a highly challenging problem. A novel feature
selection method is presented here based on fuzzy-rough sets by maximizing the
relevance and minimizing the redundancy of the selected features. By
introducing the fuzzy equivalence partition matrix, a novel representation of
Shannon’s entropy for fuzzy approximation spaces is proposed to measure the
relevance and redundancy of features suitable for real-valued data sets. The
fuzzy equivalence partition matrix also offers an efficient way to calculate many other information measures, termed f-information measures. Several f-information measures are shown to be effective for selecting
nonredundant and relevant features of real-valued data sets. This paper
compares the performance of different f-information measures for feature selection in fuzzy approximation spaces.
Some quantitative indexes are introduced based on fuzzy-rough sets for evaluating the performance of the proposed method. The effectiveness of the
proposed method, along with a comparison with other methods, is demonstrated on
a set of real-life data sets.
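Mutual information is the best-known member of the f-information family (take f(x) = x log x). The sketch below computes it from a joint distribution as a feature-relevance score; in the paper these probabilities would come from the fuzzy equivalence partition matrix, whereas here they are simply assumed:

    import math

    def mutual_information(joint):
        """joint[i][j]: P(feature bin i, class j)."""
        px = [sum(row) for row in joint]
        py = [sum(col) for col in zip(*joint)]
        mi = 0.0
        for i, row in enumerate(joint):
            for j, pxy in enumerate(row):
                if pxy > 0:
                    mi += pxy * math.log(pxy / (px[i] * py[j]))
        return mi

    print(mutual_information([[0.25, 0.05], [0.05, 0.65]]))  # high relevance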
MOBILE
COMPUTING:
16.
SECURE DATA COLLECTION IN WIRELESS SENSOR NETWORKS
USING RANDOMIZED DISPERSIVE ROUTES
Compromised-node and denial-of-service
are two key attacks in wireless sensor networks (WSNs). In this paper, we study
routing mechanisms that circumvent (bypass) black holes formed by these attacks.
We argue that existing multi-path routing approaches are vulnerable to such
attacks, mainly due to their deterministic nature. So once an adversary
acquires the routing algorithm, it can compute the same routes known to the
source, and hence endanger all information sent over these routes. In this paper,
we develop mechanisms that generate randomized multipath routes. Under our
design, the routes taken by the “shares” of different packets change over time.
So even if the routing algorithm becomes known to the adversary, the adversary
still cannot pinpoint the routes traversed by each packet. Besides randomness,
the routes generated by our mechanisms are also highly dispersive and
energy-efficient, making them quite capable of bypassing black holes at low
energy cost. Extensive simulations are conducted to verify the validity of our
mechanisms.
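A hedged sketch of the randomized dissemination idea: a packet is split into shares (e.g., by secret sharing) and each share takes an independent random walk toward the sink, so there is no fixed route for an adversary to learn. The topology and next-hop rule here are illustrative, not the paper's mechanisms:

    import random

    def random_route(neighbors, src, sink, max_hops=20):
        """Pick a uniformly random next hop each step; return the path."""
        path, node = [src], src
        for _ in range(max_hops):
            if node == sink:
                break
            node = random.choice(neighbors[node])
            path.append(node)
        return path

    topo = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    for share in range(3):                 # three shares, three random routes
        print("share", share, "->", random_route(topo, src=0, sink=3))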
17.
A Novel Dual-Index Design to Efficiently Support
Snapshot Location-Based Query Processing in Mobile Environments
Location-based services have become increasingly popular in recent years. Many applications aim to support a large number of users in metro areas (i.e., dense networks). To cope with this challenge, we present a
framework that supports location-based services on MOVing objects in road
Networks (MOVNet, for short) [26]. MOVNet’s dual-index design utilizes an
on-disk R-tree to store the network connectivities and an in-memory grid
structure to maintain moving object position updates. In this paper, we extend
the functionality of MOVNet to support snapshot range queries as well as
snapshot k nearest
neighbor queries. Given an arbitrary edge in the space, we analyze the minimum
and maximum number of grid cells that are possibly affected. We show that the
maximum bound can be used in snapshot range query processing to prune the
search space. We demonstrate via theoretical analysis and experimental results
that MOVNet yields excellent performance with various networks while scaling to
a very large number of moving objects.
PARALLEL
AND DISTRIBUTED SYSTEMS:
18.
PRIVACY-CONSCIOUS LOCATION-BASED QUERIES IN MOBILE
ENVIRONMENTS
In location-based
services, users with location-aware mobile devices are able to make queries
about their surroundings anywhere and at any time. While this ubiquitous
computing paradigm brings great convenience for information access, it also
raises concerns over potential intrusion into user location privacy. To protect
location privacy, one typical approach is to cloak user locations into spatial
regions based on user-specified privacy requirements, and to transform
location-based queries into region-based queries. In this paper, we identify
and address three new issues concerning this location cloaking approach. First,
we study the representation of cloaking regions and show that a circular region
generally leads to a small result size for region-based queries. Second, we
develop a mobility-aware location cloaking technique to resist trace analysis
attacks. Two cloaking algorithms, namely MaxAccu_Cloak and MinComm_Cloak, are
designed based on different performance objectives. Finally, we develop an
efficient polynomial algorithm for evaluating circular-region-based kNN
queries. Two query processing modes, namely bulk and progressive, are presented
to return query results either all at once or in an incremental manner.
Experimental results show that our proposed mobility-aware cloaking algorithms
significantly improve the quality of location cloaking in terms of an entropy
measure without compromising much on query latency or communication cost.
Moreover, the progressive query processing mode achieves a shorter response
time than the bulk mode by parallelizing the query evaluation and result
transmission.
SECURE
COMPUTING:
19.
Using Web-Referral Architectures to Mitigate
Denial-of-Service Threats
The web is a complicated graph, with millions of websites interlinked. In this paper, we propose to use this web site graph structure to mitigate flooding attacks on a website, using a new web referral architecture for privileged service (“WRAPS”). WRAPS allows a legitimate client to
obtain a privilege URL through
a simple click on a referral hyperlink, from a website trusted by the target
website. Using that URL, the client can get privileged access to the target
website in a manner that is far less vulnerable to a distributed
denial-of-service (DDoS) flooding attack than normal access would be. WRAPS
does not require changes to web client software and is extremely lightweight
for referrer websites, which makes its deployment easy. The massive scale of
the web site graph could deter attempts to isolate a website through blocking
all referrers. We present the design of WRAPS, and the implementation of a prototype
system used to evaluate our proposal. Our empirical study demonstrates that
WRAPS enables legitimate clients to connect to a website smoothly in spite of a
very intensive flooding attack, at the cost of small overheads on the website’s
ISP’s edge routers. We discuss the security properties of WRAPS and a simple
approach to encourage many small websites to help protect an important site
during DoS attacks.
SOFTWARE
ENGINEERING:
20.
Vulnerability Discovery with Attack Injection
The increasing reliance put on networked
computer systems demands higher levels of dependability. This is even more relevant
as new threats and forms of attack are constantly being revealed, compromising
the security of systems. This paper addresses this problem by presenting an
attack injection methodology for the automatic discovery of vulnerabilities in
software components. The proposed methodology, implemented in AJECT, follows an approach similar to that of hackers and security analysts to discover vulnerabilities in network-connected servers. AJECT uses a specification of the server’s
communication protocol and predefined test case generation algorithms to
automatically create a large number of attacks. Then, while it injects these
attacks through the network, it monitors the execution of the server in the
target system and the responses returned to the clients. The observation of an
unexpected behavior suggests the presence of a vulnerability that was triggered
by some particular attack (or group of attacks). This attack can then be used
to reproduce the anomaly and to assist the removal of the error. To assess the
usefulness of this approach, several attack injection campaigns were performed
with 16 publicly available POP and IMAP servers. The results show that AJECT
could effectively be used to locate vulnerabilities, even on well-known servers
tested throughout the years.
21.
On Event-Based Middleware for Location-Aware
Mobile Applications
As mobile
applications become more widespread, programming paradigms and middleware
architectures designed to support their development are becoming increasingly
important. The event-based programming paradigm is a strong candidate for the
development of mobile applications due to its inherent support for the loose
coupling between components required by mobile applications. However, existing
middleware that supports the event-based programming paradigm is not well suited
to supporting location-aware mobile applications in which highly mobile
components come together dynamically to collaborate at some location. This
paper presents a number of techniques including location-independent
announcement and subscription coupled with location-dependent filtering and
event delivery that can be used by event-based middleware to support such
collaboration. We describe how these techniques have been implemented in STEAM,
an event-based middleware with a fully decentralized architecture, which is
particularly well suited to deployment in ad hoc network environments. The cost
of such location-based event dissemination and the benefits of distributed
event filtering are evaluated.
CLOUD
COMPUTING:
22.
An Efficient Hybrid Peer-to-Peer System for
Distributed Data Sharing
Peer-to-peer overlay networks are widely used in
distributed systems. Based on whether a regular topology is maintained among
peers, peer-to-peer networks can be divided into two categories: structured
peer-to-peer networks in which peers are connected by a regular topology, and
unstructured peer-to-peer networks in which the topology is arbitrary.
Structured peer-to-peer networks usually can provide efficient and accurate
services but need to spend a lot of effort in maintaining the regular topology.
On the other hand, unstructured peer-to-peer networks are extremely resilient to frequent peer joining and leaving, but this is usually achieved at the
expense of efficiency. The objective of this work is to design a hybrid peer-to-peer
system for distributed data sharing which combines the advantages of both types
of peer-to-peer networks and minimizes their disadvantages. The proposed hybrid peer-to-peer system is composed of two parts: the first part is a structured core network which forms the backbone of the hybrid system; the second part is
made of multiple unstructured peer-to-peer networks each of which is attached
to a node in the core network. The structured core network can accurately narrow a data lookup down to a particular unstructured network, while the
unstructured networks provide a low-cost mechanism for peers to join or leave
the system freely. A data lookup operation first checks the local unstructured network,
and then, the structured network. This two-tier hierarchy can decouple the
flexibility of the system from its efficiency. Our simulation results demonstrate that the hybrid peer-to-peer system can combine the efficiency of the structured peer-to-peer network with the flexibility of the unstructured peer-to-peer network and achieve a good balance between the two types of networks.
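The two-tier lookup order is easy to picture in code. A minimal sketch, with both lookups stubbed and all interfaces assumed:

    def hybrid_lookup(key, local_flood, core_route):
        hit = local_flood(key)        # cheap, best-effort search nearby
        if hit is not None:
            return hit
        return core_route(key)        # accurate DHT-style routing in the core

    local = {"song.mp3": "peer-17"}.get
    core = lambda key: "core-peer-for:" + key
    print(hybrid_lookup("song.mp3", local, core))   # resolved locally
    print(hybrid_lookup("movie.avi", local, core))  # falls back to the core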
23.
A Model for Reducing Power Consumption in
Peer-to-Peer Systems
Information systems
based on the cloud computing model and peer-to-peer (P2P) model are now getting
popular. In the cloud computing model, a cloud of servers supports thin clients with various types of services, such as Web pages and databases. In the P2P model, on the other hand, every computer is a peer and there is no centralized coordinator. It is becoming increasingly important to discuss how to reduce the total electric power consumption of computers in information systems in order to realize an eco-society. In this paper, we consider a Web type of application on P2P overlay networks. First, we discuss a model for showing how much electric power each server peer consumes to perform Web requests from client peers. Then, we discuss algorithms by which a client peer selects a server peer from a collection of server peers so that the total power consumption is reduced while constraints such as deadlines are satisfied. Lastly, we evaluate the algorithms in terms of the total power
consumption and throughput compared with traditional round robin algorithms.
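The selection idea can be sketched as follows: among the server peers that can meet the deadline, pick the one expected to consume the least power. The latency and power figures below are placeholders, not the paper's model:

    def select_server(servers, deadline):
        """servers: list of (name, expected_latency, expected_power)."""
        feasible = [s for s in servers if s[1] <= deadline]
        if not feasible:
            return None               # no peer can satisfy the deadline
        return min(feasible, key=lambda s: s[2])[0]

    peers = [("p1", 0.8, 40.0), ("p2", 0.3, 55.0), ("p3", 0.5, 35.0)]
    print(select_server(peers, deadline=0.6))   # -> "p3"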
24.
A
cloud oriented approach for people-provided services
Web 2.0
started a paradigm shift concerning content generation. Users spend hours browsing and producing content to share, most of the time freely. During these activities, their laptops and home PCs are considerably underused. By combining this idle processing capacity with increasingly fast internet access, every computer can be considered a computational “nearby” resource. The question is: what to do with them? This paper explores how these
resources can be used to give something back to the user in a secure and
private way.
INTERNET
AND WEB APPLICATION:
25.
A
Solution Model and Tool for Supporting the Negotiation of Security Decisions in
E-Business Collaborations
Sharing, comparing, and negotiating security-related actions and requirements across businesses has always been a complicated matter. Issues arise due to semantic gaps, disparity
always been a complicated matter. Issues arise due to semantic gaps, disparity
in security documentation and formats, and incomplete security-related
information during negotiations, to say the least. As collaborations amongst
e-businesses in particular increase, there is a growing, implicit need to
address these issues and ease companies’ deliberations on security. Our research
has investigated this topic in substantial detail, and in this paper we present
a novel solution model and tool for supporting businesses through these tasks.
Initial evaluation results and feedback from interviewed security professionals
affirm the use and suitability of our proposals in supporting the security
actions negotiation process.
26. WSOTF: An Automatic Conformance Testing Tool for Web Services Composition
This paper presents WSOTF (Web Service composition Online Testing Framework), an automatic conformance testing tool with timing constraints that works from a formal specification (TEFSM: Timed Extended Finite State Machine) of a web services composition, implemented by an online testing algorithm. The algorithm combines the ideas of test execution and debugging to generate and execute test cases simultaneously. In this tool, the complete test scenario (timed test case suite and data) is built during test execution. The tool focuses on unit testing, meaning that only the service composition under test is exercised, while all of its partners are simulated by WSOTF. The tool also takes timing constraints and synchronous time delays into account, and can be used for debugging, which is otherwise difficult when developing a composition of Web services.
27. TRIC: An
Infrastructure for Trust and Reputation Across Virtual Communities
Virtual communities are gradually becoming an
inherent part of modern life. Trust and reputation systems are considered key enablers of virtual communities: they support the accumulation of member reputation information and leverage this information to increase the likelihood of successful member interactions and to better protect the community from fraudulent members. Reputation
information is an important part of a user's identity, which makes this data both sensitive and desirable for communities to share. Reputation sharing is also motivated by the individual user, who can leverage her standing in new communities using the reputation she has gained in others. In
previous work we have presented the Cross-Community Reputation (CCR) model for
sharing reputation knowledge among communities. The CCR model identifies the fundamental
terms required for a meaningful sharing of reputation information among communities
and proposes means to make the information sharing
feasible. This paper presents the TRIC infrastructure for enabling CCR
systems. The major concerns of TRIC are to protect the user's rights to privacy and control over her data, and to draw architectural guidelines that can be implemented in more than one way. We explain the major
components required for this type of infrastructure and discuss
known design alternatives and their tradeoffs in the
case of TRIC.
28. Crawling Online
Social Graphs
Extensive research has been conducted on
top of online social
networks (OSNs), while little attention has been paid to the data collection
process. Due to the large scale of OSNs and their privacy control policies, a
partial data set is often used for analysis. The data set analyzed is decided
by many factors including the choice of seeds, node selection algorithms, and
the sample size. These factors may introduce biases and further contaminate or
even skew the results. To evaluate the impact of different factors, this paper
examines the OSN graph crawling
problem, where the nodes are OSN users and the edges are the links (or
relationship) among these users. More specifically, by looking at various
factors in the crawling process, the following
problems are addressed in this paper: 1) Efficiency: How fast different
crawlers discover nodes/links; 2) Sensitivity: How different OSNs and the
number of protected users affect crawlers; 3) Bias: How major graph properties are skewed. To the best of our knowledge,
our simulations on four real world online social graphs provide the first
in-depth empirical answers to these questions.
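As a baseline reference, here is a toy BFS crawler of the kind such studies start from; "protected" users are discovered but not expanded, mirroring the privacy effect the paper measures. The graph is made up:

    from collections import deque

    def bfs_crawl(graph, seed, protected=frozenset(), budget=100):
        seen, q = {seed}, deque([seed])
        while q and len(seen) < budget:
            u = q.popleft()
            if u in protected:
                continue               # visible, but friend list is hidden
            for v in graph.get(u, []):
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        return seen

    g = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
    print(bfs_crawl(g, "a", protected={"b"}))   # "d" still reached via "c"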
29. Combining the Missing Link: an Incremental Topic Model
of Document Content and Hyperlink
The content and
structure of linked
information such as sets of web pages or research
paper archives are dynamic and keep changing. Even though different methods have been proposed to exploit both the link structure and the content information, no existing approach can effectively deal with this evolution. We
propose a novel joint model, called Link-IPLSI, to combine texts
and links in a topic modeling framework incrementally. The
model takes advantage of
a novel link updating technique that can cope with dynamic changes in online document streams in a fast and scalable way. Furthermore, an
adaptive asymmetric learning method is adopted to freely control the assignment of weights to
terms and citations. Experimental results on two different sources of online information demonstrate the
time saving strength of our method and indicate that
our model leads to systematic improvements in the quality of classification
and link prediction.
30. Query-Aware Complex
Object Buffer Management in XML Information Retrieval
In this paper, we
analyse the data access characteristics of a typical XML
information retrieval
system and propose a new query-aware buffer replacement algorithm based on prediction of Minimum Reuse Distance (MRD for short). The algorithm predicts an object's next reference distance according to the retrieval system's running status and replaces the objects that have the maximum reuse distances. The factors
considered in the replacement algorithm include the
access frequency, creation cost, and size of objects,
as well as the queries being executed. By taking into account the queries
currently running or queuing in the system, the MRD algorithm can predict the reuse distances of index data objects more accurately.
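A stripped-down sketch of MRD-style eviction: evict the cached object whose predicted next reuse is farthest away, scanning the queued queries to predict distances. The weighting by access frequency, creation cost, and object size described above is omitted here:

    def victim(cache, pending_queries):
        """cache: object ids; pending_queries: list of sets of object ids."""
        def reuse_distance(obj):
            for d, q in enumerate(pending_queries):
                if obj in q:
                    return d
            return float("inf")        # no predicted reuse at all
        return max(cache, key=reuse_distance)

    print(victim(["idx_a", "idx_b", "idx_c"],
                 [{"idx_b"}, {"idx_a", "idx_b"}]))   # -> "idx_c"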
31. Recording
How-Provenance on Probabilistic Databases
Tracking data provenance
(or lineage) has become increasingly important in many large-scale
applications, and a few methods for recording data provenance have been proposed recently. However, most previous work focuses on deterministic databases; the exception is Trio-style lineage, which targets probabilistic databases, a much more challenging setting because of the exponential growth of possible-world instances and the dependence among intermediate tuples. This paper proposes an approach, named PHP-tree, to model how-provenance over probabilistic databases. We also show how to evaluate probabilities based on a PHP-tree. Compared with Trio-style lineage, our approach is independent of intermediate results and can calculate the probability in both cases of restricted and complete propagation of data provenance. Detailed
experimental results show the effectiveness, efficiency and scalability of our
proposed model.
32. Efficient Peer-to-Peer
Similarity Query Processing for High-dimensional Data
Objects such as digital images, text documents, or DNA sequences are usually represented in a high-dimensional feature space. A fundamental issue in peer-to-peer (P2P) systems is to support efficient similarity search for high-dimensional data in metric spaces. Prior works suffer from some fundamental limitations, such as not being adaptive to highly dynamic networks, poor search efficiency under skewed data distributions, and large maintenance overhead. In
this study, we propose an efficient scheme, Dragon, to support P2P similarity
search in metric spaces. Dragon achieves the efficiency through the following
designs: 1) Dragon is based on our previously designed P2P network, Phoenix,
which has the optimal routing efficiency in dynamic scenarios. 2) We design a
locality-preserving naming algorithm and a routing tree for
each peer in Phoenix to
support range queries. A radius-estimation method is proposed to transform a kNN query into a range query. 3) A load-balancing algorithm is given to keep query processing efficient under skewed data distributions. Extensive experiments verify the
superiority of Dragon over existing works.
33. Two Novel Adaptive
Symbolic Representations for
Similarity Search in Time Series Databases
Over the last decade, we have seen an
increasing level of interest in time
series data mining due to its variety of real-world applications. Numerous representation models of time
series have been proposed for data mining, including
piecewise polynomial models, spectral models, and the recently proposed symbolic models, such as Symbolic
Aggregate approXimation (SAX) and its multiresolution extension, indexable Symbolic Aggregate approXimation (iSAX). In spite of the many advantages of dimensionality/numerosity reduction and lower-bounding distance measures, the quality of the SAX approximation is highly dependent on how closely the time series follows a Gaussian distribution, especially at reduced dimensionality. In this paper, we
introduce a novel adaptive
symbolic approach based on the combination of SAX and the k-means algorithm, which we call adaptive SAX (aSAX). The proposed representation greatly outperforms the classic SAX not only on highly Gaussian-distributed datasets, but also on datasets lacking a Gaussian distribution, over a variety of reduced dimensionalities. In addition to being
competitive with, or superior to, the classic SAX, we extend aSAX to the
multiresolution symbolic representation
called indexable adaptive SAX (iaSAX). Our empirical
experiments with real-world time series datasets
confirm the theoretical analyses as well as the efficiency of the two proposed algorithms in
terms of the tightness of lower bound, pruning power and number of random disk
accesses.
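The core of aSAX is replacing SAX's fixed Gaussian breakpoints with data-adaptive ones. A hedged sketch: run 1-D k-means on the (normalized, PAA-reduced) values and take midpoints between sorted centroids as breakpoints; the toy k-means and data here are illustrative:

    def kmeans_1d(xs, k, iters=20):
        xs_sorted = sorted(xs)
        cs = [xs_sorted[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
        for _ in range(iters):
            groups = [[] for _ in range(k)]
            for x in xs:
                groups[min(range(k), key=lambda i: abs(x - cs[i]))].append(x)
            cs = [sum(g) / len(g) if g else cs[i] for i, g in enumerate(groups)]
        return sorted(cs)

    data = [0.1, 0.2, 0.15, 2.9, 3.1, 3.0, 7.8, 8.1]
    cents = kmeans_1d(data, k=3)
    breaks = [(a + b) / 2 for a, b in zip(cents, cents[1:])]
    print(breaks)    # adaptive breakpoints separating the 3 symbol regions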
34. An Asymmetric
Similarity Measure for Tag Clustering on Flickr
Web 2.0 tools and environments have made tagging, the act of assigning keywords to on-line objects, a popular way to annotate shared
resources. The success of now-prominent tagging
systems makes tagging "the natural way for people to classify objects as well as an attractive way to discover new material". One of
the most challenging problems is to harvest the semantics from these systems,
which can support a number of applications, including tag
clustering and tag
recommendation. We conduct detailed studies on
different types of tag relations and tag similarity measures, and propose a scalable measure
that we name Reliability Factor Similarity Measure (RFSM). We compare it with two other measures having similar scalability by integrating them
into hierarchical clustering methods and performing tag clustering on a subset of Flickr data. The
results suggest that RFSM outperforms those two measures when applied for tag clustering purposes. We also present an alternative way of utilizing discovered tag relations to set up tag
refining rules in order to deal with some noise in the initial tag sets, which can in turn improve the precision of tag relations.
35. Effective Clustering of Dense and Concentrated Online
Communities
Most clustering algorithms tend to separate large-scale online communities into several meaningful sub-communities by extracting cut points and cut edges. However, these algorithms are not effective
on dense and concentrated graphs which do not have any meaningful cut
points. Common problems with the previous algorithms are as follows. First, the
size of the first cluster
is too large as it may contain many incompatible users. Second, the quality and the purity of the clusters are very low. Third, only the dominant first cluster is found to be meaningful. To address these
problems, we first propose a graph transformation to separate large-scale online communities into two
different types of meaningful subgraphs. The first
subgraph is the intimacy graph and the second is the
reputation graph. Then, we present the effective
algorithms for discovering good sub-communities and for excluding incompatible users in these subgraphs.
The experimental results show that our algorithms allow for extracting more
suitable and meaningful sub-communities
than the previous work in dense online
networks.
36. Graph-based Recommendation on Social Networks
Recommender systems have emerged as an
essential response to the rapidly growing digital information phenomenon in
which users are finding it more and more difficult to locate the right
information at the right time. Web 2.0 systems allow users not only to rate resources but also to assign tags to them. Tags play a significant role in Web 2.0. They can be used for navigation, browsing, recommendation, and so on. In
this paper, we propose a novel recommendation
algorithm, which is based on
social networks. The social network is established
among users and items, taking into account both the information of ratings and
tags. We consider users' co-tagging behaviors and add the similarity
relationship to the graph to enhance the
performance. Our algorithm is based on Random Walk with Restarts but provides a more natural and efficient way to represent social networks. Because the influence of tags is considered, the
transition matrix is denser and the recommendation
is more accurate. By evaluating our new algorithm on a real-life dataset and comparing it to a baseline algorithm used in many real-world recommender systems, we conclude that our method performs better than the baseline.
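A compact sketch of Random Walk with Restarts over a tiny user-item graph, ranking items for user 0 by visiting probability; the 4-node adjacency matrix and restart probability are made up for illustration (requires numpy):

    import numpy as np

    A = np.array([[0, 0, 1, 1],      # user0 -- item2, item3
                  [0, 0, 1, 0],      # user1 -- item2
                  [1, 1, 0, 0],
                  [1, 0, 0, 0]], dtype=float)
    P = A / A.sum(axis=1, keepdims=True)    # row-stochastic transitions
    c = 0.15                                 # restart probability
    e = np.array([1.0, 0.0, 0.0, 0.0])      # restart at user 0
    r = e.copy()
    for _ in range(100):
        r = (1 - c) * P.T @ r + c * e
    print(r[2:])    # item scores; item2 ranks above item3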
37. Improving a News Recommendation System in Adapting to Interests of
a User with Storage of a Constant Size
It is desirable to have a system that can recommend news articles according to a user's interests, which change with time. In this paper, we attempt to improve a news recommendation system based on supervised classification by integrating a clustering method into the system, in order to adapt flexibly to a user's varied interests. To follow changes of interest over time, not only the classifier but also the clustering module of the system must be easily updatable. We construct clusters in the feature space by combining one-dimensional clusters, so that the storage required for updating the clusters stays constant. We let the data distribution in each one-dimensional space be influenced by the clustering results from another one-dimensional space, thereby taking into account the data distribution in the original multidimensional space. The main contribution of this paper is a method that achieves both goals: improving the performance of a news recommendation system and keeping the amount of data held for updating clusters constant. Experimental results are presented, and the strengths and weaknesses of our proposal are discussed.
38. Dynamically
Analyzing Time Constraints in Workflow Systems with Fixed-Date Constraint
In workflow management systems
(WFMSs), time management plays an essential role in controlling the lifecycle of business processes.
In particular, run-time analysis of time constraints is necessary to help process managers proactively detect possible deadline violations and handle them appropriately. Traditional time constraint analyses either present deterministic results, which are too restrictive in highly uncertain workflow processes, or only consider static analysis at workflow build-time. To address this issue, this paper proposes a dynamic approach for analyzing time
constraints during process execution. To be
specific, based on a Petri-net-extended stochastic model, this approach first analyzes activity instances' continuous probabilities of
satisfying time constraints
when a process instance is initiated. Afterwards, during the execution of this
process instance, the approach dynamically updates
these probabilities whenever an activity instance is completed. Moreover, an example process instance from a real-world WFMS demonstrates the practicality of our approach.
39. Retrieving
Cultural Heritage Information with Google Earth
In recent years, a great number of digital
archives have been built in the field of cultural heritage preservation. Besides providing a systematic way to store and handle digitized versions of valuable artifacts and archival items, these archives allow scientists, preservation officers, and other professionals to access documents and information that may otherwise be unavailable. Opening the cultural
archives for interoperation as well as public access is often a desirable objective.
For this purpose, a user interface everybody knows and understands would be a
good choice. In this paper, we propose to use Google
Earth as a search and retrieval client for public
access to digital archives. A mapping from a metadata model comprising geospatial, temporal, and thematic tags onto Google's KML is presented. This way, Google Earth can easily be extended with
cultural heritage information as stored in one or more digital archives. Any
user all around the world can use Google Earth to locate, display and access the desired information by simply downloading and installing the
appropriate KML file.
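The KML mapping can be as simple as emitting a Placemark per archive item. A minimal hand-rolled sketch (the metadata values are invented; a real export would serialize the archive's actual geospatial and temporal tags):

    placemark = """<?xml version="1.0" encoding="UTF-8"?>
    <kml xmlns="http://www.opengis.net/kml/2.2">
      <Placemark>
        <name>Sample archive item</name>
        <description>Digitized artifact with geospatial metadata</description>
        <Point><coordinates>16.3738,48.2082,0</coordinates></Point>
      </Placemark>
    </kml>"""
    with open("archive_item.kml", "w") as f:    # open this file in Google Earth
        f.write(placemark)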
40. Hash Tree: A New Hybrid Index for Flash Disks
Flash disks have become a popular alternative to magnetic disks due to their fast I/O speed and other features such as small size, shock resistance, energy efficiency, and non-volatility. However, flash disks are also characterized by out-of-place updates and asymmetric I/O latencies for read, write, and erase operations, which introduce new challenges into the indexing
mechanism for flash disks. Traditional index
structures do not take the flash I/O characteristics
into account and therefore perform poorly. To address this problem, we present in this paper a new hybrid index structure for flash disks called HashTree, which aims at better update performance while keeping relatively high search efficiency. HashTree uses a hash-based index to split the indexed records into several buckets, and a new tree structure named FTree to organize the records within each bucket. Compared with previous tree-based indexes, our policy can reduce the costs of
maintaining the hierarchical tree structure. We also
introduce a tuning mechanism into HashTree so that we can obtain an appropriate trade-off between search performance and update
performance. We conducted an experiment on a
commercial SSD to evaluate the performance of HashTree. Our experimental results show that HashTree outperforms competitors such as BFTL and FD-tree in terms of both update performance and overall performance.
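Structurally, the design hashes a key to a bucket and keeps a small ordered structure per bucket, so updates touch one bucket while searches stay cheap. In this sketch a plain sorted list stands in for FTree, whose internals are specific to the paper:

    import bisect

    class HashTreeSketch:
        def __init__(self, nbuckets=8):
            self.buckets = [[] for _ in range(nbuckets)]

        def insert(self, key, value):
            b = self.buckets[hash(key) % len(self.buckets)]
            bisect.insort(b, (key, value))        # ordered within the bucket

        def search(self, key):
            b = self.buckets[hash(key) % len(self.buckets)]
            i = bisect.bisect_left(b, (key,))
            return b[i][1] if i < len(b) and b[i][0] == key else None

    ht = HashTreeSketch()
    ht.insert(42, "record-42")
    print(ht.search(42))    # -> "record-42"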
41. Service Collaboration Network: A Novel Mechanism for Web
Service Management
This paper proposes a
novel approach to service
management and monitoring on the basis of service collaboration relations
that can be extracted from service composition
history. Accordingly, an undirected and weighted service collaboration network is constructed. On the network, two abstractions are defined: service centrality, which measures a service's collaboration capability compared to other services over a long period, and service activeness, which measures the frequency of a service's collaboration with others over a short period. The capability of managing services among Open APIs is investigated and demonstrated. Experimental data are collected from ProgrammableWeb for the period from 2005-09-14 to 2009-09-21. Experiments show that the defined metrics properly reflect the service properties.
42. Suggesting Topic-Based
Query Terms as You Type
Query term suggestion, which interactively expands queries, is an indispensable technique for helping users formulate high-quality queries, and it has attracted much attention in the web search community. Existing methods usually suggest terms based on statistics in documents as well as query logs and external dictionaries, neglecting the fact that topic information is crucial because it helps retrieve topically relevant documents. To give users instant gratification, we propose a novel term suggestion method: as the user types in a query letter by letter, we instantly suggest terms that are topically coherent with the query and could retrieve relevant documents. For
effectively suggesting highly relevant terms, we propose a generative model by incorporating the
topical coherence of terms. The model learns the
topics from the underlying documents based on Latent
Dirichlet Allocation (LDA). For achieving the goal of instant query suggestion, we use a trie structure to index and
access terms. We devise an efficient top-k algorithm
to suggest terms as users type in queries. Experimental results show that our approach not
only improves the effectiveness of term suggestion,
but also achieves better efficiency and scalability.
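A hedged sketch of the trie side of such a system: index terms with a precomputed score (a raw weight here, standing in for the LDA-based topical coherence), then return the top-k completions as each letter arrives:

    class TrieNode(dict):
        score = None                      # set on nodes that end a term

    def insert(root, term, score):
        node = root
        for ch in term:
            node = node.setdefault(ch, TrieNode())
        node.score = score

    def suggest(root, prefix, k=2):
        node = root
        for ch in prefix:                 # walk down as the user types
            if ch not in node:
                return []
            node = node[ch]
        found, stack = [], [(prefix, node)]
        while stack:                      # collect completions under the prefix
            word, n = stack.pop()
            if n.score is not None:
                found.append((n.score, word))
            stack.extend((word + c, child) for c, child in n.items())
        return [w for _, w in sorted(found, reverse=True)[:k]]

    root = TrieNode()
    for t, s in [("data", 5), ("database", 9), ("datamining", 7)]:
        insert(root, t, s)
    print(suggest(root, "dat"))   # -> ['database', 'datamining']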
43. Optimizing Query Processing for the Hidden Web
The term Deep Web (sometimes also called Hidden
Web) refers to the data
content that is created dynamically as the result of
a specific search on the Web.
In this respect, such content resides outside web pages and is only accessible through interaction with the web site, typically via HTML forms. It is believed
that the size of the Deep
Web is several orders of magnitude larger than that
of the so-called Surface Web,
i.e., the web that is
accessible and indexable by search engines.
44. A Metric for Measuring Members’ Contribution to
Information Propagation in Social Network
Sites
The phenomenon of propagation
is universal in our daily life. For
example, infectious diseases can be transmitted from one person to another, hot news is disseminated widely on the Internet, and classic passages written by popular users can be shared by many other users on online social network sites. With the emergence of online social network sites such as Facebook, YouTube, and Flickr, many studies have tried to analyse patterns of information propagation and to design effective viral marketing strategies on these sites. However, few metrics have been designed to measure the characteristics of information propagation. In this
paper, we propose a novel metric
for measuring members' contribution to information propagation in online social
network sites. As a case study, we analyse large-scale traces of members' contribution to photo dissemination on Flickr and find that members' contribution follows a power-law distribution, which reveals that most of the information is created and propagated by a few members.
45. Domain-oriented
Deep Web Data Sources’ Discovery and Identification
As the Deep Web contains a tremendous number of well-structured data sources, how to integrate data sources in the Deep Web has become a hot topic in current research. Accurately discovering and identifying Deep Web data sources related to a specific domain have become key issues. We propose a Domain-Oriented Deep Web data
source Discovery method (DO-DWD)
and a novel Domain Identification strategy of Deep
Web data sources (DIDW). In the discovery
stage, we use machine learning algorithms and some
heuristic rules to find the query interfaces of the data sources; in the identification stage, we identify Deep Web data sources associated with the domain by calculating the relevance between a query interface and the domain based on semantic similarity. Finally, extensive experiments on a real data set show that DO-DWD and DIDW achieve high correctness and accuracy.
46. Towards Video
Management over Relational Database
Video has become popular
in our daily life for both professional and consumer applications. Both low-level video processing and high-level semantic video analysis are computationally critical tasks in many application domains. Most current video computing tools are developed for specific analytic tasks; they lack higher-level interoperability with the database and treat it merely as a relational data storage engine rather than an analytic platform, which causes inefficient data access and massive amounts of data movement. In this paper, we study how to
support video data management over a relational database, and present our initial solutions for the video data storage mechanism, video data access methods, and efficient video analytics. We also describe our ongoing prototype system, HybVideo, which is developed with a novel architecture. It integrates the above solutions to tackle the major challenges of providing a platform for both storage and analysis of video data.
47. Phosphor: A
Cloud based DRM Scheme with SimCard
As 3G networks provide enhanced data transport capabilities, a considerable number of mobile applications and services involving masses of unstructured digital content (e.g., video and audio) are available. Meanwhile, piracy and illegal distribution of this digital content are severe issues. Digital Rights Management (DRM) aims at protecting unstructured digital content from being abused by regulating its usage. However, existing mobile DRM schemes more or less suffer from poor compatibility, poor practicality, or high cost. Besides, to the best of our knowledge, few of the existing DRM schemes address efficient unstructured data management. In this paper, we propose Phosphor, a cloud-based mobile DRM scheme with a SIM card. In detail, an unstructured data management system is adopted to provide efficient data management services in the DRM backend (servers and systems). Meanwhile, the SIM card is introduced into Phosphor, which not only reduces cost but also provides higher security. We have implemented our DRM scheme, which demonstrates that Phosphor is efficient, secure, and practical.
SERVICES COMPUTING:
48. Rule-Based Coordination
of Distributed Web Service
Transactions
Current approaches to transactional support
of distributed processes
in service-oriented environments are limited to
scenarios where the participant initiating the process maintains a controlling
position throughout the lifetime of the process.
This constraint impedes support of complex processes
where participants may only possess limited local views on the overall process.
Furthermore, there is little support of dynamic
aspects: failure or exit of participants usually
leads to cancelation of the whole process. In this
paper, we address these limitations by introducing a framework that strengthens
the role of the coordinator and allows for largely
autonomous coordination of
dynamic processes. We first discuss motivating examples and analyze existing
approaches to transactional coordination.
Subsequently, we present our framework TracG, which is based
on WS-BusinessActivity. It contains at its core a set of
rules for deciding on the ongoing confirmation or cancelation status of participants' work and protocol extensions for
monitoring the progress of a process. Various types of participant vitality for a process are distinguished,
facilitating the controlled exit of nonvital participants
as well as continuation of a process in case of tolerable failures. The implementation of the framework is presented and discussed regarding
interoperability issues.