D. Zhang, Y. Du, and L. Hu,
"On Monitoring the top-k Unsafe Places",
Proc. of 24th International Conference on Data Engineering (ICDE),
Cancun, Mexico, 2008.
(Acceptance rate for full paper with short presentation: 19.2%)
(PDF)
(PPT)
(Poster)
In a city, protecting units such as police cars move around and protect places such as banks and residential buildings. Different places may require different numbers of protecting units nearby. If a place has fewer protecting units around it than it requires, it is an unsafe place. We studied the Continuous Top-k Unsafe Places (CTUP) query, which continuously monitors the k least safe places while the protecting units keep sending their location updates to the server. The CTUP query is a novel addition to the family of continuous location-based queries, an emerging area driven by recent advances in dynamic location-aware environments.
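As a minimal illustration of what the query computes (a brute-force Python sketch that recomputes from scratch; the paper's contribution is monitoring this incrementally under location updates, and all names and the protection-radius model here are illustrative):

```python
import math

def top_k_unsafe(places, units, k):
    """Rank places by safety deficit: required units minus units
    within the place's protection radius. `places` holds
    (x, y, radius, required) tuples, `units` holds (x, y) positions.
    Brute-force sketch only; not the paper's incremental algorithm."""
    def nearby(px, py, r):
        return sum(1 for ux, uy in units
                   if math.hypot(ux - px, uy - py) <= r)

    deficits = [(req - nearby(x, y, r), (x, y))
                for x, y, r, req in places]
    deficits.sort(key=lambda d: -d[0])   # least safe first
    return deficits[:k]
```

A place with a positive deficit is unsafe; the top-k list is simply the k largest deficits.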
T. Xia, D. Zhang, and Y. Tao,
"On Skylining with Flexible Dominance Relation (poster paper)",
Proc. of 24th International Conference on Data Engineering (ICDE),
Cancun, Mexico, 2008.
(Acceptance rate: 31%)
(PDF)
(Poster)
Given a set of d-dimensional objects, a skyline query finds the objects (the "skyline") that are not dominated by any other object. However, skylines do not always provide useful query results, and existing variants of the skyline query have at least one of the following drawbacks: (1) the size of the skyline cannot be controlled, or can only be increased or only decreased but not both; (2) skyline objects have no built-in ranks; (3) skylines do not reflect users' weights (preferences) on different dimensions. We proposed a unified approach, the ε-skyline, to address all three drawbacks. We explored the properties of ε-skylines and proposed several algorithms to compute them.
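One plausible reading of a flexible, ε-relaxed dominance relation can be sketched in a few lines (this is an illustration of the general idea only, not the paper's exact definition, which also incorporates per-dimension weights):

```python
def eps_skyline(points, eps):
    """Epsilon-relaxed skyline, smaller-is-better on every dimension.
    Here p eps-dominates q if p[i] <= q[i] + eps on all dimensions,
    strictly on at least one; a positive eps shrinks the skyline and
    a negative eps grows it, so the size can move both ways.
    Illustrative definition only."""
    def dominates(p, q):
        return (all(pi <= qi + eps for pi, qi in zip(p, q))
                and any(pi < qi + eps for pi, qi in zip(p, q)))
    return [q for q in points
            if not any(dominates(p, q) for p in points if p != q)]
```

With eps = 0 this reduces to the standard skyline; raising eps lets near-dominated points be pruned, which is one way the result size becomes controllable in both directions.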
T. Xia and D. Zhang,
"Refreshing the Sky: The Compressed Skycube with Efficient Support for Frequent Updates",
Proc. of 25th ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pages 491-502, Chicago, IL, 2006.
(Acceptance rate: 13.0%)
(PDF)
(PPT)
Given a set of multi-attribute objects (where different dimensions may not be comparable), the skyline is the subset of the most important objects according to a given set of dimensions. For instance, the skyline of NBA players with respect to (rebounds, points) consists of the players who do not have both fewer rebounds and fewer points than some other player. To efficiently support the skyline query for an arbitrary set of dimensions, one straightforward approach is to pre-compute all skylines. But this approach has two drawbacks. First, it occupies too much space, as the number of skylines is exponential in the number of dimensions. Second, an update to the base table may force an expensive reconstruction of the whole structure. This NSF-funded project produced the Compressed Skycube, a very compact representation of the union of all skylines that supports efficient query and update algorithms.
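The naive pre-computation that the Compressed Skycube improves on can be sketched directly; the 2^d - 1 subspaces it enumerates are exactly the space blow-up described above (player numbers and dimension names are illustrative):

```python
from itertools import combinations

def subspace_skylines(objects, dims):
    """Naively pre-compute the skyline of every non-empty subset of
    dimensions (larger-is-better, as with rebounds and points).
    There are 2^d - 1 subspaces, so both space and per-update cost
    explode -- the baseline the Compressed Skycube avoids."""
    def dominates(a, b, sub):
        return (all(a[i] >= b[i] for i in sub)
                and any(a[i] > b[i] for i in sub))
    result = {}
    for r in range(1, len(dims) + 1):
        for sub in combinations(range(len(dims)), r):
            result[tuple(dims[i] for i in sub)] = [
                o for o in objects
                if not any(dominates(p, o, sub) for p in objects if p != o)]
    return result
```

For three players (10, 20), (15, 5), (8, 25), the single-dimension skylines each contain one player, while the full (rebounds, points) skyline contains all three, since none dominates another on both dimensions.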
J. M. Kang, M. F. Mokbel, S. Shekhar, T. Xia, and D. Zhang,
"Continuous Evaluation of Monochromatic and Bichromatic Reverse
Nearest Neighbors",
Proc. of 23rd International Conference on Data Engineering
(ICDE), Istanbul, Turkey, 2007.
(Acceptance rate: 18.5%)
(PDF)
(PPT)
T. Xia and D. Zhang,
"Continuous Reverse Nearest Neighbor Monitoring",
Proc. of 22nd International Conference on Data Engineering
(ICDE), Atlanta, Georgia, 2006.
(Acceptance rate: 19.5%)
(PDF)
(PPT)
In a battlefield, some soldiers carry medical equipment. The reverse nearest neighbors (RNNs) of a soldier X with medical equipment are the soldiers who are spatially closer to X than to any other soldier with medical equipment; they are the soldiers who may need to receive help from X. Continuously monitoring the RNNs in real time, while all soldiers are moving, is challenging. There are two versions of the problem. In the monochromatic version, there is a single data set. In the bichromatic version, there are two data sets, e.g. a set of soldiers with medical equipment and a set of soldiers without. We solved both versions efficiently by providing monitoring regions: the spatial regions in which location updates may affect the query result. The efficiency of the proposed solutions lies in the fact that the monitoring regions are very small compared with the whole space, and therefore most location updates happen outside them.
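The bichromatic query itself is easy to state as a brute-force check (a sketch of the definition only; the papers' contribution is avoiding exactly this full recomputation on every update, and all names are illustrative):

```python
import math

def bichromatic_rnn(q, servers, clients):
    """Bichromatic reverse nearest neighbors of server q: the
    clients whose nearest server is q. Think of `servers` as
    soldiers with medical equipment and `clients` as the rest.
    Brute force over all pairs, recomputed from scratch."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return [c for c in clients
            if all(dist(c, q) <= dist(c, s)
                   for s in servers if s != q)]
```

Rerunning this on every location update is what makes the continuous version expensive; the monitoring regions restrict attention to the few updates that can actually change this answer.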
Y. Du, T. Xia, Y. Tao, D. Zhang, and F. Zhu,
"On Multidimensional k-Anonymity with Local Recoding Generalization (poster paper)",
Proc. of 23rd International Conference on Data Engineering
(ICDE), Istanbul, Turkey, 2007.
(Acceptance rate: 27.6%)
(PDF)
(PPT)
To enable data publishing while protecting privacy, simply removing the name attribute from all records is not enough, because combinations of other attributes may reveal the true identity. For example, the combination (age=26, startyear=2001) may be unique and therefore may reveal the identity of the corresponding person; such attributes are called quasi-identifiers. A commonly used method is to convert the original database table into a k-anonymous table, by replacing each value of a quasi-identifier attribute with a range. For instance, (age=26, startyear=2001) may be replaced by (age in [25,30], startyear in [2000,2004]). Mapped to the multi-dimensional space of age and startyear, every record becomes a rectangle instead of a point. If every rectangle corresponds to at least k records, privacy is preserved. Subject to achieving k-anonymity, the rectangles should be as small as possible, so as to enable accurate aggregation queries such as "find the total number of people who joined after 2000." Local recoding generalization (LRG) allows different rectangles to overlap in space. While providing more accurate aggregation, LRG is harder to compute: in fact, we proved that it is NP-hard both to find the table with maximum quality and to find a solution with an approximation ratio <= 1.25. We developed three algorithms that strike a good balance between approximation ratio and time complexity.
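Verifying k-anonymity of a given generalized table is straightforward; it is choosing the generalization that keeps the ranges small which is NP-hard. A minimal sketch of the verification step (table layout is illustrative):

```python
from collections import Counter

def is_k_anonymous(generalized, k):
    """Check k-anonymity: every record's generalized
    quasi-identifier tuple -- here a tuple of (low, high) ranges
    such as ((25, 30), (2000, 2004)) -- must be shared by at least
    k records. Only verifies a given generalization; it does not
    find one."""
    counts = Counter(generalized)
    return all(c >= k for c in counts.values())
```

In the paper's terms, each distinct tuple of ranges is one rectangle in quasi-identifier space, and the check requires every rectangle to cover at least k records.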
D. Zhang, Y. Du, T. Xia, and Y. Tao,
"Progressive Computation of The Min-Dist Optimal-Location Query",
Proc. of 32nd International Conference on Very Large Data Bases (VLDB), pages 643-654, Seoul, Korea, 2006.
(Acceptance rate: 13.9%)
(PDF)
(PPT)
Y. Du, D. Zhang and T. Xia,
"The Optimal Location Query",
Proc. of 9th International Symposium on Spatial and
Temporal Databases (SSTD), pages 163-180, Angra dos Reis, Brazil, 2005.
(Acceptance rate: 31.2%)
(PDF)
(PPT)
The optimal location for a potential new store can be defined either as a location that maximizes the number of customers who would consider the new store their closest store, or as a location that maximizes the customers' combined saving in traveling distance to the nearest store. In both cases, there appear to be infinitely many candidate locations to check. This NSF-funded research produced efficient algorithms that find exact answers to the optimal location queries, along with theorems proving the correctness of the algorithms.
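Evaluating a single candidate under the min-dist (combined-saving) definition is simple; the hard part the papers solve is searching the conceptually infinite candidate space exactly. A sketch of the per-candidate objective (L1 distance and all names are illustrative assumptions):

```python
def min_dist_saving(candidate, stores, customers):
    """Combined travel-distance saving if a new store opens at
    `candidate`: for each customer, how much closer its nearest
    store becomes. Uses L1 (Manhattan) distance as a simple
    stand-in; brute force over customers, illustrative only."""
    def d(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])
    saving = 0.0
    for c in customers:
        cur = min(d(c, s) for s in stores)
        saving += max(0.0, cur - d(c, candidate))
    return saving
```

The optimal location is the candidate maximizing this quantity; since every point in the plane is a candidate, enumerating them is impossible, which is what motivates the exact algorithms in the papers.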
E. Kanoulas, Y. Du, T. Xia, and D. Zhang,
"Finding Fastest Paths on A Road Network with Speed Patterns",
Proc. of 22nd International Conference on Data Engineering
(ICDE), Atlanta, Georgia, 2006.
(Acceptance rate: 19.5%)
(PDF)
(PPT)
Existing GIS systems that give driving directions, such as MapQuest.com or live.local.com, do not take the leaving and/or arrival times as input, so they produce the same route regardless of the time and day for which the driving directions are needed. However, speed on the road changes over time; for instance, during rush hours, inbound highways to metropolitan cities are very slow. The proposed method assumes a speed pattern is given for each road segment in the road network, and solves the fastest-path query taking a leaving or arrival time interval as input. For instance, the query can be: find all the fastest paths if the leaving time is between 7 and 9; e.g. if the leaving time is in [7:00, 7:43], take route A, and take route B otherwise. The ability to handle a leaving or arrival time interval rather than a single time instant differentiates the problem from straightforward extensions of shortest-path algorithms like Dijkstra or A*.
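The single-instant version of the problem is a standard time-dependent Dijkstra, sketched below for contrast (the paper handles a whole leaving-time interval at once, which this sketch does not; the toy graph and FIFO-edge assumption are ours):

```python
import heapq

def fastest_path(graph, source, target, leave_time):
    """Time-dependent Dijkstra for ONE leaving instant.
    graph[u] is a list of (v, travel_time_fn) where
    travel_time_fn(t) gives the edge's travel time when entered
    at time t (a speed pattern). Assumes FIFO edges: leaving
    later never lets you arrive earlier. Returns arrival time."""
    best = {source: leave_time}
    heap = [(leave_time, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if u == target:
            return t
        if t > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, tt in graph[u]:
            arrive = t + tt(t)
            if arrive < best.get(v, float("inf")):
                best[v] = arrive
                heapq.heappush(heap, (arrive, v))
    return float("inf")
```

On a toy network where edge A-B is fast before 7 and slow afterward, leaving at 6 favors the route through B while leaving at 7 favors the route through C; answering the query for the whole interval [6, 9] at once, rather than per instant, is the paper's problem.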
T. Xia, D. Zhang, E. Kanoulas, and Y. Du,
"On Computing Top-t Most Influential Spatial Sites",
Proc. of the 31st International Conference on Very Large Data Bases (VLDB),
Trondheim, Norway, 2005.
(Acceptance rate: 16.5%)
(PDF)
(PPT)
Consider a site as a wireless server and an object as a mobile user. The influence of a site is defined as the number of objects that have the site as their nearest server. Given a query range Q, the query finds the t sites in Q whose influences are the largest. The NSF-funded research produced an algorithm incorporating novel pruning techniques based on a new metric called minExistDNN; it is faster than straightforward algorithms by multiple orders of magnitude.
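A brute-force baseline makes the query concrete (this full scan over all object-site pairs is exactly what the minExistDNN pruning avoids; coordinates and names are illustrative):

```python
import math

def top_t_influential(sites, objects, query_rect, t):
    """Influence of a site = number of objects whose nearest site
    (over ALL sites, not just those in the range) is that site.
    Return the t sites inside query_rect with largest influence.
    Brute force, not the paper's pruned algorithm."""
    def d(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    influence = {s: 0 for s in sites}
    for o in objects:
        nearest = min(sites, key=lambda s: d(o, s))
        influence[nearest] += 1
    (x1, y1), (x2, y2) = query_rect
    in_q = [s for s in sites if x1 <= s[0] <= x2 and y1 <= s[1] <= y2]
    return sorted(in_q, key=lambda s: -influence[s])[:t]
```

Note that influence depends on sites outside Q as well, since they can "steal" objects from sites inside Q; this is part of what makes pruning non-trivial.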
T. Xia and D. Zhang,
"Improving the R*-tree with Outlier Handling Techniques",
Proc. of 13th ACM International Symposium on Advances in Geographic Information
Systems (GIS), pages 125-134, Bremen, Germany, 2005.
(Acceptance rate: 33%)
(PDF)
(PPT)
The R*-tree is a state-of-the-art spatial index structure. Although widely used, it has several drawbacks. First, since all objects are stored at the leaf level and a page must be at least half full, any outlier object ends up in a leaf page with a large minimum bounding rectangle (MBR); large MBRs negatively affect query performance. Another problem is that the metric the R*-tree uses to pick "boundary" objects to remove from an overflowing page (for re-insertion) is not optimal. We explored the idea of allowing outlier objects to be stored in index nodes of the R-tree, and proposed novel gain and loss metrics to guide the selection of boundary objects from a node. Extensive experiments verified that the proposed optimizations improve R*-tree performance for all widely used queries. We therefore propose using the R^{o}-tree instead of the R*-tree whenever applicable.
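The MBR inflation that motivates outlier handling is easy to quantify (a tiny illustrative sketch, not the paper's code):

```python
def mbr_area(points):
    """Area of the minimum bounding rectangle of a 2D point set."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```

A tight cluster of four unit-square corners has MBR area 1; adding a single outlier at (100, 100) inflates the same page's MBR area to 10000, which is why diverting outliers away from leaf pages pays off in query performance.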