|Privacy-preserving Release of Re-identifiable Moving Object Data|
|Written by Hui Wang|
Location-aware devices such as GSM mobile phones, GPS-enabled PDAs, location sensors, and active RFID tags have been used extensively in recent years. These devices enable new location-based applications, and their wide use has generated a huge collection of spatial-temporal moving object data that can serve various data analysis purposes. However, publishing these mobility data threatens individuals' privacy, since raw trajectory data provide location information that can identify individuals and, potentially, reveal their private information. In this article, we review some existing solutions to privacy-preserving publishing of moving object data and discuss unaddressed issues.
1. Introduction
Location-aware devices, for example, GSM mobile phones, GPS-enabled PDAs, location sensors, and active RFID tags, have been used extensively in recent years. The wide use of these location-aware devices has generated a huge collection of spatial-temporal moving object data. These data can be used for various data analysis purposes such as city traffic control, mobility management, urban planning, and location-based service advertisements. The research interest in these data is reflected in the large number of spatial-temporal data mining techniques developed in recent years [(Ashbrook & Starner, 2003), (Lee & Han, 2007), (Lee, Han, & Li, 2008), (Lee J., Han, Li, & Gonzalez, 2008), (Jeung, Liu, Shen, & Zhou), (Li, Han, Kim, & Gonzalez, 2008), (Gonzalez, Li, Han, & Hector, 2007), (D'Auria, Nanni, & Pedreschi, 2006), (Zheng, Xie, & Ma)].
Releasing the collected location data may help an attacker discover personal and sensitive information such as individuals' habits, social customs, and religious and sexual preferences. It therefore raises serious privacy concerns. Unfortunately, simply replacing users' real identifiers (name, social security number, etc.) with pseudonyms is insufficient to guarantee anonymity. The problem lies in quasi-identifier locations, i.e., sets of locations that can be linked to external information to re-identify individuals: with the help of additional data sources, the attacker may be able to trace the anonymous location data back to individuals. Therefore, it is vital to design effective anonymization techniques that defend moving object databases against re-identification attacks based on location-based adversary knowledge.
2. Re-identification Attack on Moving Object Datasets
In the context of publishing relational databases, it has been shown that simply removing explicit identifiers (e.g., name) does not preserve privacy when the adversary has some background knowledge about the individuals whose records are included in the dataset. Sweeney (Sweeney, 2002) illustrates that 87% of the U.S. population can be uniquely identified based on 5-digit zip code, gender, and date of birth. These attributes are called quasi-identifiers (QIDs), and the adversary may learn their values from publicly available sources such as a voter list. An individual can be identified from published data by simply joining the QID attributes with an external data source.
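The join described above can be illustrated with a small sketch. All records, names, and the helper `link` below are hypothetical and made up for illustration; they are not taken from (Sweeney, 2002).

```python
from collections import defaultdict

# Hypothetical "de-identified" release: names removed, QID attributes kept.
released = [
    {"zip": "02138", "gender": "F", "dob": "1965-07-12", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "M", "dob": "1971-03-02", "diagnosis": "diabetes"},
    {"zip": "02139", "gender": "M", "dob": "1971-03-02", "diagnosis": "flu"},
]

# Hypothetical public source (e.g., a voter list) that still carries names.
voters = [
    {"name": "Alice", "zip": "02138", "gender": "F", "dob": "1965-07-12"},
    {"name": "Bob", "zip": "02139", "gender": "M", "dob": "1971-03-02"},
    {"name": "Carol", "zip": "02140", "gender": "F", "dob": "1980-01-01"},
]

QID = ("zip", "gender", "dob")


def link(released_rows, external_rows, qid=QID):
    """Join the two sources on the QID attributes.

    Returns a dict mapping each external name to the list of released
    records that share its QID values (the candidate records).
    """
    index = defaultdict(list)
    for rec in released_rows:
        index[tuple(rec[a] for a in qid)].append(rec)
    return {
        person["name"]: index.get(tuple(person[a] for a in qid), [])
        for person in external_rows
    }


matches = link(released, voters)
# A person is uniquely re-identified when exactly one released record matches.
reidentified = {
    name: recs[0]["diagnosis"] for name, recs in matches.items() if len(recs) == 1
}
```

In this toy data, one person matches exactly one released record and is therefore re-identified, while another matches two records and remains ambiguous between them.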
The same risk of re-identification exists for moving object databases: the attacker may know a set of quasi-identifier (QID) locations that uniquely identifies a moving object. A re-identification attack can be considered successful if the attacker can re-identify any individual moving object with high probability by associating it with a set of locations. As an example, (Krumm, 2007) shows that, based on two-week GPS tracks from 172 individuals, the home address (with median error below 60 meters) and identity (with success above 5%) of these individuals were successfully identified by joining GPS traces with a reverse geocoder and a Web-based white-pages directory.
Yarovoy et al. (Yarovoy, Bonchi, Lakshmanan, & Wang, 2009) define the QID of each moving object O as a minimal set of timestamps of O such that the number of moving objects that have the same positions as O at those timestamps is less than l, where l is a threshold that bounds the likelihood of re-identification. When l equals 1, the QID of O uniquely determines O.
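A minimal sketch of this definition follows, under stated assumptions: positions are exact grid cells, the count excludes O itself (so that l = 1 means no other object matches), and the QID is found by brute-force enumeration. The functions `matching_others` and `minimal_qid` and the toy database are hypothetical; this is not the procedure of Yarovoy et al.

```python
from itertools import combinations


def matching_others(db, obj, timestamps):
    """Other objects whose positions equal obj's at every given timestamp."""
    return {
        other
        for other, traj in db.items()
        if other != obj
        and all(traj.get(t) == db[obj][t] for t in timestamps)
    }


def minimal_qid(db, obj, l):
    """Smallest timestamp set at which fewer than l other objects match obj.

    Enumerates subsets by increasing size, so the first hit has minimum
    cardinality and is therefore minimal. Exponential in the worst case.
    """
    times = sorted(db[obj])
    for size in range(1, len(times) + 1):
        for ts in combinations(times, size):
            if len(matching_others(db, obj, ts)) < l:
                return set(ts)
    return None  # obj cannot be narrowed down below the threshold


# Toy database: object -> {timestamp: (x, y)}
db = {
    "O1": {1: (0, 0), 2: (1, 1), 3: (2, 2)},
    "O2": {1: (0, 0), 2: (1, 1), 3: (5, 5)},
    "O3": {1: (0, 0), 2: (4, 4), 3: (2, 2)},
}

qid_unique = minimal_qid(db, "O1", l=1)
qid_loose = minimal_qid(db, "O1", l=2)
```

Here no single timestamp isolates O1, but two timestamps together do, which shows why the QID is a set of timestamps and not a single one.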
3. Privacy Preserving Publishing of Moving Object Databases
The problem of location privacy has been well studied in the context of location-based services (Deutsch, Hull, Vyas, & Zhao, 2010), (Gedik & Liu, 2005), (Grunwalddepartment, 2003), (Kido, Yanagisawa, & Satoh), (Chow, Mokbel, & Aref, 2006). Most works define the privacy risk as linking of requests for services and locations to specific mobile users. Works in (Duckham & Kulik, 2005), (Gruteser & Hoh, 2005) de-identify a given request or a location by using perturbation and obfuscation techniques. (Gedik & Liu, 2005) unlinks individual location points belonging to a user. (Ghinita, Zhao, Papadias, & Kalnis, 2005) replaces the exact location of a user U with a so-called anonymizing spatial region (ASR) that contains at least K - 1 other users. Furthermore, it defines a reciprocity privacy model which requires that a set of K users always be grouped together for a given K. As anonymization may ruin data utility, (Ghinita, Kalnis, Khoshgozaran, Shahabi, & Tan, 2008) proposes to use private location-dependent queries based on the theory of Private Information Retrieval (PIR), instead of using an anonymizer.
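To make the idea of an anonymizing spatial region concrete, the sketch below builds a bounding rectangle around a user and that user's K - 1 nearest neighbours. This is a naive nearest-neighbour construction chosen for illustration; it is not the algorithm of Ghinita et al., and it does not enforce the reciprocity property mentioned above. The function name `asr` and the sample coordinates are hypothetical.

```python
import math


def asr(users, target, k):
    """Bounding rectangle covering `target` and its k-1 nearest users.

    users: dict mapping user id -> (x, y).
    Returns ((min_x, min_y, max_x, max_y), members).
    """
    if len(users) < k:
        raise ValueError("fewer than k users available")
    origin = users[target]
    # The target is at distance 0 from itself, so it sorts first.
    nearest = sorted(users, key=lambda u: math.dist(users[u], origin))[:k]
    xs = [users[u][0] for u in nearest]
    ys = [users[u][1] for u in nearest]
    return (min(xs), min(ys), max(xs), max(ys)), set(nearest)


users = {"u": (0, 0), "a": (1, 0), "b": (0, 2), "c": (10, 10)}
region, members = asr(users, "u", k=3)
```

Because the rectangle for one member of the group need not be the same rectangle the other members would get, an adversary who knows the construction may still narrow down the requester; the reciprocity requirement is meant to rule this out.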
A few recent studies have addressed the issue of privacy-preserving publishing of a single snapshot of moving object databases (Abul, Bonchi, & Nanni, 2008), (Terrovitis & Mamoulis, 2008), (Yarovoy, Bonchi, Lakshmanan, & Wang, 2009). The underlying anonymization principle is similar: a user is k-anonymous when his or her spatial and temporal information is indistinguishable from that of at least k - 1 others. In particular, Terrovitis et al. (Terrovitis & Mamoulis, 2008) suppress certain points in the trajectories, so that the probability that any trajectory can be associated with (i.e., re-identified as) a real person is no larger than Pbr, where Pbr is a given breach probability threshold.
Abul et al. (Abul, Bonchi, & Nanni, 2008) propose a novel privacy model based on co-localization that exploits the inherent uncertainty of the moving objects' whereabouts. In their model, the trajectory of a moving object is no longer a polyline in a three-dimensional space; instead, it is a cylindrical volume whose radius represents the possible location imprecision. Objects that move within the same cylinder are indistinguishable from each other. They define (k, δ)-anonymity for moving object databases, requiring that every (k, δ)-anonymity set contain at least k trajectories that are co-localized w.r.t. the cylinder radius δ. They apply trajectory clustering and spatial translation (i.e., moving the necessary points in space so that a trajectory lies within the anonymity cylinder) to achieve (k, δ)-anonymity.
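The co-localization condition can be checked with a short sketch. It assumes trajectories sampled at the same discrete timestamps and reads "co-localized w.r.t. δ" as "at every timestamp the two positions are at most δ apart"; both the reading and the function names are assumptions made for illustration, not the authors' implementation, and the clustering and spatial translation steps are not shown.

```python
import math
from itertools import combinations


def co_localized(t1, t2, delta):
    """True if two trajectories stay within delta of each other at all times.

    Each trajectory is a dict mapping timestamp -> (x, y); trajectories
    with different timestamp sets are treated as not co-localized.
    """
    if t1.keys() != t2.keys():
        return False
    return all(math.dist(t1[t], t2[t]) <= delta for t in t1)


def is_k_delta_anonymity_set(trajectories, k, delta):
    """True if there are at least k trajectories, pairwise co-localized."""
    return len(trajectories) >= k and all(
        co_localized(a, b, delta) for a, b in combinations(trajectories, 2)
    )


a = {0: (0.0, 0.0), 1: (1.0, 0.0)}
b = {0: (0.0, 0.5), 1: (1.0, 0.5)}
c = {0: (0.0, 3.0), 1: (1.0, 3.0)}

ok = is_k_delta_anonymity_set([a, b], k=2, delta=1.0)
too_far = is_k_delta_anonymity_set([a, b, c], k=2, delta=1.0)
```

In this toy example, trajectory `c` would have to be translated towards `a` and `b` (or placed in another cluster) before the three could form a valid anonymity set.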
Yarovoy et al. (Yarovoy, Bonchi, Lakshmanan, & Wang, 2009) assume that every moving object may have a distinct quasi-identifier (QID) that consists of a set of timestamps. They define a k-anonymity privacy model that requires every moving object to be indistinguishable from at least k - 1 other moving objects w.r.t. its QID. Because the QIDs of different moving objects may not be identical, designing the anonymization algorithm is challenging: the anonymization groups of moving objects may not be disjoint. Overlapping anonymization groups force revisits of earlier generalizations and possible re-generalization of existing anonymization groups with other objects. Yarovoy et al. propose two anonymization algorithms that use space-filling curves for fast computation of anonymization groups.
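The role of a space-filling curve can be sketched as follows: map each object's grid cell to its position on a Hilbert curve, sort by that position, and cut the sorted order into runs of k objects, so that nearby objects tend to fall into the same group. This is a simplified single-timestamp illustration with hypothetical names (`hilbert_index`, `group_by_curve`); it is not either of the two algorithms of Yarovoy et al. and it ignores per-object QIDs and overlapping groups.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def group_by_curve(points, k, n):
    """Sort objects by Hilbert index and cut the order into groups of >= k."""
    order = sorted(points, key=lambda o: hilbert_index(n, *points[o]))
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Merge a short trailing group into its predecessor to keep size >= k.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups


# Toy snapshot on a 4 x 4 grid: object -> (x, y) cell.
points = {
    "a": (0, 0), "b": (0, 1), "e": (1, 1),
    "c": (3, 3), "d": (3, 2), "f": (2, 3),
}
groups = group_by_curve(points, k=3, n=4)
```

Sorting costs O(m log m) for m objects, which is the source of the speed-up over comparing all pairs of objects; the price is that curve order only approximates spatial proximity.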
4. Remaining Challenges
How to obtain quasi-identifiers? With today's positioning systems, the locations of moving objects can be very accurate; they can be represented as pairs of x- and y-coordinates. In practice, however, a set of spatial areas may be sufficient to identify moving objects with high probability. Examples of such spatial areas include residential areas, office buildings, and public places (e.g., an oncology clinic). Monreale et al. (Monreale, Trasarti, Renso, Pedreschi, & Bogorny, 2010) assume that these spatial areas are defined by the data publisher. However, it is difficult to define these area-based QIDs, as the data publisher does not always have full knowledge of the adversary's information. It is interesting to explore how to define QIDs at a good granularity. Moreover, most of the existing work assumes that the quasi-identifiers are either provided directly by the users when they subscribe to the location-based service or are part of the users' personalized settings. We argue that in practice the quasi-identifiers are application dependent and cannot be assumed to be known a priori. How to compute QIDs from the trajectories is another interesting research direction.
How to publish moving object databases with updates? Most of the past work focuses on a static database that is published as a single release. This fails to capture the highly dynamic nature of moving object data. Continuous publishing of dynamic moving object databases helps discover shifts in movement patterns and outliers, which is important in many applications such as traffic management, human activity analysis, surveillance, and security. However, continuous publishing of moving object databases poses unique challenges for privacy protection. Because multiple releases of the same dataset are correlated, anonymizing each release independently may fail to resist the sequential QID-match attack and may leak private information. Therefore, choosing the anonymization of each release is a demanding task.
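Why independently anonymized releases can leak information is illustrated by the following sketch of a simple intersection across releases. It assumes that records carry stable pseudonyms from one release to the next and that, in each release, the adversary can determine the group of records consistent with the target's QID. These assumptions, the function `candidates_after`, and the sample groups are hypothetical; the article does not define the sequential QID-match attack in detail, so this is only one plausible instance of a correlated-release attack.

```python
from functools import reduce


def candidates_after(candidate_sets):
    """Pseudonyms consistent with the target in every release seen so far."""
    if not candidate_sets:
        return set()
    return reduce(set.intersection, candidate_sets)


# Each release puts the target in a group of size 3 (k = 3), but the
# groups were formed independently and therefore differ between releases.
release_groups = [
    {"p1", "p2", "p3"},
    {"p2", "p3", "p4"},
    {"p3", "p5", "p6"},
]

remaining = [
    candidates_after(release_groups[: i + 1])
    for i in range(len(release_groups))
]
```

Each release on its own satisfies k = 3, yet the set of candidates shrinks with every additional release. A scheme for continuous publishing therefore has to take earlier releases into account when forming the groups of a new one.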
Given the importance of privacy for moving objects, there is an increasing need for novel and robust anonymization techniques for publishing moving object data that can be re-identified by linking to external information. The central question for privacy in publishing moving object databases is how to design anonymization schemes that defend against privacy attacks while incurring a minimal amount of information loss. In this article, we reviewed some existing solutions to privacy-preserving publishing of moving object data. We also discussed unaddressed issues.