Data Privacy – Need for an Economic Perspective

Privacy and… sheep

The standard scenario considered in most data privacy discussions is one in which data describing people is collected, essentially for free, by an individual (or an organization), who then processes that data to obtain knowledge valuable to them. This knowledge often becomes an asset to which a monetary value can be assigned. Exaggerating only slightly, a parallel can be drawn with the shearing of sheep: the sheep (people) give away, often unknowingly, their woolen fleece (the data describing them). The fleece is then processed by scouring, spinning and weaving (data cleaning, data engineering, model building), and all the participants in the process – except the sheep – reap the profit, either monetary or as value added to their operations. The sheep go on growing more fleece, until the next shearing.

Economics – ABC

This parallel makes us think about approaching data privacy from an economic perspective. We believe that such a view may allow us to better understand privacy as a subject of exchange, and therefore as a good in the economic sense. It will also help us to capture the multiple dimensions of data privacy: the value of the data to its owner as well as to the party that wants to use it, the cost (or effort) of attacking the data in a privacy breach, the cost of protecting the data against attacks, etc.

Economics has a long track record of providing quantitative models of exchanges in a social context that can be reasoned with to measure and predict specific concepts. As a discipline, economics attempts to provide us with an understanding of the actions of agents in the process of exchanging goods. This exchange is aimed at the consumption of goods in order to satisfy needs. The exchange may involve money, measurable or virtual goods, and may result in delayed or immediate consumption. The detailed description of an exchange is called the organization of a market, or simply a market. It is important to note that a complex market is defined not only by agreed-upon rules stating the terms and conditions of the exchange process, but also involves consensus on social constraints – goals to be achieved and threats perceived as socially undesirable. For our purposes we identify society with the institutions it has evolved to make decisions (state, government, etc.), which we call the regulator.

A data market

We propose to bring the concept of a market into the analysis of data privacy. With this approach, agents exchange their data (data about themselves that they own, in the sense of consenting to the collection and use of the data describing them for specific purposes) for a “price”. This leads to the concept of a data market, which we believe can address some aspects of data privacy. Such markets are the force governing the collection and use of the “traces of economic activities, of searches of information, of connections to people and of their spatial movements” (see the Introduction to this issue of the Privacy Observatory Magazine). In our view there is no evidence that the spontaneous rise of such markets will address the societal constraints one wants to observe in data privacy, so we propose to turn to market design for a solution.

Given this market context, the following questions seem critical:

  1. Who are the parties in the transactions, and what are the exchange mechanisms?
  2. What are the characteristics of the product?
  3. How is the product – knowledge acquired from the aggregated data – to be valued?
  4. Could there be a fair and efficient market (regulation)?

We have looked in detail into questions 1-3 in our earlier paper [Matwin, Szapiro 10], which relates these issues to the existing literature. Here we summarize our preliminary answers to these questions and consider question 4.

For the sake of simplicity we consider two types of agents: individual data owners and data aggregators. The two kinds of agents interact in Aggregate Data Privacy Markets (ADPMs). Members of the first group exchange (give up control of) their personal data for a price. Specifically, individuals (agents) trade their data with people or organizations interested in aggregating the data in order to build models from the data aggregates. These models are used in knowledge tools (e.g. recommender systems, marketing profiles, risk profiles, etc.) that add value to the company's business. We therefore have a free market organization, i.e. the terms and trades of transactions are left to the parties. In most practical situations the price paid by the aggregators is zero, since data owners value the future implications of this exchange as insignificant. A consequence of the zero price is the phenomenon known in economics as free riding: data aggregators obtain a valuable good (information about individual clients needed for model building) for free.

Data need more than pure market forces

In the data privacy context, another undesirable consequence of the market is the potential social exclusion that may result from using aggregated data. In extreme situations such exclusion amounts to discrimination. Profiles obtained from the data may be used to exclude society members from services (e.g. health insurance) or opportunities (credit, jobs) – often implicitly and incorrectly. The lack of acceptance of social exclusion is an example of a social norm that is upheld not only by the regulator but also by individuals. A regulator will usually not accept mechanisms leading to social exclusion.

This market perspective on data privacy immediately faces a serious difficulty: the deficit of satisfaction may not be perceived by consumers during the transaction, but only later. An agent may freely give away their data for a marketing study, only to be bombarded by product offers. Or she may voluntarily participate in a medical study, only to be ranked as a high-risk individual to whom higher insurance premiums apply. This may create a deficit of privacy, perceived only after the data “exchange”. From an economic point of view, this deficit of privacy creates a need to heal the pain, and actions are considered which meet that need. For each action an intrinsic value is determined, and its utility results from the profit it yields in satisfying the need to decrease the privacy deficit. In this framework decisions result from personal and societal preferences (values), which can be reinforced by material or other explicit incentives (laws) and by social sanctions or rewards (norms). Norms and incentives can strengthen or undermine each other. The preferences of agents are defined by their utilities, which are known only to them.

Theoretically, in an ADPM we assume that individuals maximize a utility involving the evaluation of their own privacy, the profit from incentives to give it up, the consequences and costs of the actions involved in giving or refusing their data, etc. In practice, it may turn out that conveying true private information is not rational for an agent – it leads to a worse exchange outcome (and thus a lower level of satisfaction of needs). This means that the resulting allocation of goods may not be socially optimal, i.e. the allocation among agents does not maximize the joint (sum of) utilities of all agents, and another allocation could be better. In ADPMs we face exactly this situation: data aggregators are not interested in informing data owners of their potential privacy deficits and the ensuing risks. This results in socially non-optimal allocations (e.g. people without medical insurance). Furthermore, there is an asymmetry of information: just as the sheep are not aware of the value of their wool, individual agents are usually not aware of the value of their data to the aggregators, who are well aware of the value of the aggregated data. The existing market is therefore deficient, and there is a need to decrease both the information asymmetry and free riding in ADPMs.
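To make the notion of a socially non-optimal allocation concrete, here is a minimal numerical sketch. The utility figures are purely illustrative assumptions of ours, not drawn from any study or market data; the point is only that the allocation preferred by the aggregator need not be the one maximizing the sum of utilities.

```python
# Toy illustration: comparing the sum of utilities under two allocations.
# All utility numbers are invented for illustration only.

# Allocation A: data are given away at zero price (the typical case);
# the aggregator captures the value, the owner later perceives a privacy deficit.
allocation_zero_price = {"data_owner": -2.0, "aggregator": 10.0}

# Allocation B: the owner is informed of the risks and compensated;
# the aggregator pays for the data and for disclosing its intended use.
allocation_compensated = {"data_owner": 3.0, "aggregator": 6.0}

def social_welfare(allocation):
    """Social optimality yardstick: the sum of all agents' utilities."""
    return sum(allocation.values())

for name, alloc in (("zero price", allocation_zero_price),
                    ("compensated", allocation_compensated)):
    print(f"{name:12s} -> welfare {social_welfare(alloc):+.1f}")

# The zero-price outcome is best for the aggregator (10 > 6),
# yet the other allocation yields a higher sum of utilities (9 > 8):
# the spontaneous market outcome is not socially optimal.
```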

Early ideas on how to “fix” the data market

In the context of ADPMs, socially desirable behaviors can be achieved by rewarding or punishing data aggregators for revealing or hiding information about the future use of data. One extreme proposal suggested involving state-run financial institutions to enforce agreements between data owners and data aggregators on future profits from the sale of aggregated data [Laudon 96]. In an economic approach, the motivating rewards and punishments would be material. In a psychological perspective, incentives result from shaping collective identity or social perceptions of correct behavior. A legal view would price violations of regulations according to their economic and social valuation.

Solutions extending classical pure market mechanisms have been proposed in the economic literature (e.g. the recent work of [Bénabou and Tirole 11]). While the authors do not venture into the area of privacy, we believe that their proposal to introduce norms into markets may be useful for building “improved” data privacy markets. They show how norms and incentive mechanisms, taking into account the different behaviors of market participants, can lead to optimized markets. The creation of these mechanisms is delegated to the regulator, who defines incentives – used in the rules by which individual actions are compared – and therefore creates a framework for individual rationality. Optimality of such a mechanism is achieved by choosing incentives for which the maximization of individual utilities also maximizes the aggregate outcome. Bénabou and Tirole provide a stylized analytical model involving formal and informal interactions, allowing a quantitative analysis that leads to interpretable conclusions.

To summarize, we advocate an economic model of data privacy. We believe that market mechanisms can be used to understand (and improve) the exchanges involved in situations where data privacy concerns arise. We recognize the limitations of the “pure” market approach and propose to address them by enhancing the market model with social norms and incentive mechanisms. This may lead to a fair participation of the sheep in the benefits their wool brings to society.

References

  • [Bénabou and Tirole 11] Bénabou, R., Tirole, J., “Laws and Norms”, http://econ.as.nyu.edu/docs/IO/16878/Benabou_20101019.pdf
  • [Laudon 96] Laudon, K. (1996), “Markets and privacy”, CACM 39(9), 92-104.
  • [Matwin, Szapiro 10] Matwin, S., Szapiro, T., “Data Privacy: From Technology to Economics”, Springer Studies in Computational Intelligence, vol. 263, pp. 43-74, 2010.

Big data mining, fairness and privacy

We live in times of unprecedented opportunities for sensing, storing and analyzing micro-data on human activities at an extreme level of detail and resolution, at the scale of entire societies. Wireless networks and mobile devices record the traces of our movements. Search engines record the logs of our queries for finding information on the web. Automated payment systems record the tracks of our purchases. Social networking services record our connections to friends, colleagues, collaborators.

Ultimately, these big data of human activity are at the heart of the very idea of a knowledge society: a society where decisions – small or big, by businesses or policy makers or ordinary citizens – can be informed by reliable knowledge, distilled from the ubiquitous digital traces generated as a side effect of our living. Increasingly sophisticated data analysis and data mining techniques support knowledge discovery from human activity data, enabling the extraction of models, patterns, profiles, simulations, what-if scenarios, and rules of human and social behavior – a steady supply of knowledge which is needed to support a knowledge-based society. The Data Deluge special report in “The Economist” in February 2010 shows exactly how this way of thinking is now permeating the entire society, not just scientists.

The capability to collect and analyze massive amounts of data has already transformed fields such as biology and physics, and now human activity data are driving the emergence of a data-driven “computational social science”: the analysis of our digital traces can create new comprehensive pictures of individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies.

The paradigm shift towards human knowledge discovery comes, therefore, with unprecedented opportunities and risks: we the people are at the same time the donors of the data that fuel knowledge discovery and the beneficiaries – or the targets – of the resulting knowledge services. The paradoxical situation we face today, though, is that we are fully running the risks without fully grasping the opportunities of big data: on the one hand, we feel that our private space is vanishing in the digital, online world, and that our personal data might be used without feedback or control; on the other hand, the very same data are locked up in the databases of global corporations, which use privacy as a reason (or excuse?) for not sharing them with science and society at large.

In the CNN show Fast Future Forward, the anchor-woman asked the panel of futurists: “Which major challenge are we going to have to deal with 10 years out?” The answer was: “We’ll have to completely reverse our orientation to privacy. The reality is that we don’t have privacy anymore: you use your cell phone, you drive your car, you go on-line, and it’s gone.” Although the debate on vanishing privacy has been going on for years, it is striking that it is now posed as a major problem in a popular prime-time show on future trends in society and technology. The message is clear: privacy, once a human right, is becoming a chimera in the digital era.

At the other extreme, knowledge discovery and data science run the risk of becoming the exclusive and secret domain of private companies – Internet corporations such as Google, Facebook, Yahoo, or big telecom operators – and government agencies, e.g. for national security. For these data custodians, privacy is a very good excuse to protect their interests and not share the data, while users are not really aware of how the data they generated are used. Alternatively, there might emerge a caste of privileged academic or industry researchers who are granted access to private big data from which they produce results that cannot be assessed or replicated, because the data they are based on cannot be shared with the scientific community. Neither scenario will serve the long-term public interest of accumulating, verifying, and disseminating knowledge.

Should we really give up and surrender to a digital Wild West, where people are exposed to risks, without protection, transparency, and trust, while society as a whole and science get little reward relative to the opportunities offered by the secluded big data? We believe that another knowledge technology is possible: a fair knowledge discovery framework can be devised, where opportunities are preserved and risks are kept under control – a technology that sets big data free for knowledge discovery, while protecting people from privacy intrusion and unfair discrimination.

In order to fully understand the risks, we should consider that the knowledge life-cycle has two distinct, yet intertwined, phases: knowledge discovery and knowledge deployment. In the first step, knowledge is extracted from the data; in the second step, the discovered knowledge is used in support of decision making; the two steps may repeat over and over again, either in off-line or in real-time mode. For instance, knowledge discovery from patients’ health records may produce a model which predicts the insurgence of a disease given a patient’s demographics, conditions and clinical history; knowledge deployment may consist in the design of a focused prevention campaign for the predicted disease, based on profiles highlighted by the discovered model. Hence, people are both the data providers and the subjects of profiling. In our vision, the risks in each of the two steps in the knowledge life-cycle are:

  1. Privacy violation: during knowledge discovery, the risk is unintentional or deliberate intrusion into the personal data of the data subjects, namely, of the (possibly unaware) people whose data are being collected, analyzed and mined;
  2. Discrimination: during knowledge deployment, the risk is the unfair use of the discovered knowledge in making discriminatory decisions about the (possibly unaware) people who are classified, or profiled.

Continuing the example, individual patient records are needed to build a prediction model for the disease, but everyone’s right to privacy means that his/her health conditions shall not be revealed to anybody without his/her specific control and consent. Moreover, once the disease prediction model has been created, it might also be used to profile an applicant for health insurance or a mortgage, possibly without any transparency and control. It is also clear, from the example, how the two issues of profiling and privacy are strongly intertwined: the knowledge of a health risk profile may lead both to discrimination and to privacy violation, for the very simple fact that it may tell something intimate about a person, who might even be unaware of it.

Privacy intrusion and discrimination prevent the acceptance of human knowledge discovery: if not adequately countered, they can undermine the idea of a fair and democratic knowledge society. The key observation is that they have to be countered together: focusing on one, but ignoring the other, does not suffice. Guaranteeing data privacy while discovering discriminatory profiles for social sorting is not so reassuring: it is just a polite way of doing something very nasty. So is mining knowledge for public health and social utility if, as a side effect, the personal sensitive information that feeds the discovery process is disclosed or used for purposes other than those for which it has been collected, putting people in danger. On the contrary, protecting data privacy and fighting discrimination help each other: methods for data privacy are needed to make very sensitive personal information available for the discovery of discrimination. If there is a chance to create a trustworthy technology for knowledge discovery and deployment, it is with a holistic approach, not attempted so far, which faces privacy and discrimination as two sides of the same coin, leveraging interdisciplinarity across IT and law. The result of this collaboration should enhance trust and social acceptance, not on the basis of individual ignorance of the risks of sharing one’s data, but on a reliable form of risk measurement. By building tools that provide feedback and calculated transparency about the risk of being identified and/or discriminated against, the idea of consent and opt-in may become meaningful once again.

In summary, a research challenge for the information society is the definition of a theoretical, methodological and operational framework for fair knowledge discovery in support of the knowledge society, where fairness refers to privacy-preserving knowledge discovery and discrimination-aware knowledge deployment. The framework should stem from its legal and IT foundations, articulating data science, analytics, knowledge representation, ontologies, disclosure control, law and jurisprudence of data privacy and discrimination, and quantitative theories thereof. We need novel, disruptive technologies for the construction of human knowledge discovery systems that, by design, offer native techno-juridical safeguards of data protection and against discrimination. We also need a new generation of tools to support legal protection and the fight against privacy violation and discrimination, powered by data mining, data analytics, data security, and advanced data management techniques.

The general objective should be the reformulation of the foundations of data mining in such a way that privacy protection and discrimination prevention are embedded in the foundations themselves, dealing with every moment in the data-knowledge life-cycle: from (off-line and on-line) data capture, to data mining and analytics, up to the deployment of the extracted models. We know that technologies are neither good nor bad in principle, but they never come out as neutral. Privacy protection and discrimination prevention have to be included in the basic theory supporting the construction of technologies. Finally, the notions of privacy, anonymity and discrimination are the object of laws and regulations and are in continuous development. This implies that the technologies for data mining and its deployment must be flexible enough to embody rules and definitions that may change over time and adapt to different contexts.
The debate around data mining, fairness, privacy, and the knowledge society is going to become central not only in scientific research, but also in the policy agenda at national and supra-national level. We conclude our discussion by proposing a list of the top ten research questions, as a contribution to set the research roadmap around fair knowledge discovery and data mining.

  1. How to define fairness: how to actually measure whether privacy is violated, identity is disclosed, or discrimination took place? (A toy measurement sketch follows this list.)
  2. How to set data free: how to make human activity data available for knowledge discovery, in specific contexts, while ensuring that freed data have a reliable measure of fairness?
  3. How to set knowledge free, while ensuring that mined knowledge has been constructed without an unfair bias? How to guarantee that a model does not discriminate and does not compromise privacy?
  4. How to adapt fairness to different contexts, which raise different legitimate expectations with regard to privacy and non-discrimination? To what extent should differentiation on the basis of a higher predicted risk be considered discriminatory? Is this a legal issue or an ethical issue?
  5. How to make the data mining process parametric w.r.t a set of constraints specifying the privacy and anti-discrimination rules that should be embedded in freed data and mined knowledge? How to take into account in the process the analytical questions that will emerge after having mined the data?
  6. How to prove that a given software is fair, if it is claimed to be fair? How can the relevant authorities check whether e.g. a privacy policy or a non-discrimination policy is complied with?
  7. What incentive does the industry have to opt for fair data mining technologies? What incentive do individual consumers have to opt for service providers that employ such technologies?
  8. How to provide transparency about services that employ profiling and are potentially discriminatory? Which effective remedies do citizens have if they suspect unfair discrimination on the basis of data mining technologies (cf. art. 12 Directive 95/46/EC and the Anti-Discrimination Directives 2000/43/EC 2000/78/EC, 2004/113/EC, 2006/54/EC)?
  9. Which incentives could be provided by the European legislator to employ fair data mining technologies (both on the side of the industry and on the side of individual consumers)? E.g., compulsory certification, new legal obligations, technical auditability schemes, new individual rights, new data protection competences?
  10. How to specify and implement fairness on-line, so as to guarantee privacy and discrimination freedom at the very moment of data acquisition or capture?
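As a purely illustrative starting point for question 1 – nothing in the list prescribes these particular metrics – the sketch below computes two widely used proxies: a k-anonymity check as a crude indicator of identity disclosure, and the statistical parity difference as a crude discrimination measure. The dataset, column roles and thresholds are our own assumptions.

```python
# Illustrative sketch for question 1: two simple fairness/privacy measures.
# The records, attribute roles and protected group are invented for illustration.
from collections import Counter

records = [
    # (zip_code, age_band, gender, got_loan)
    ("1011", "30-40", "F", True),
    ("1011", "30-40", "F", False),
    ("1011", "30-40", "M", True),
    ("2022", "20-30", "M", True),
    ("2022", "20-30", "M", True),
    ("2022", "20-30", "F", False),
]

def k_anonymity(rows, quasi_identifiers=(0, 1)):
    """Size of the smallest equivalence class over the quasi-identifiers.
    A small k means individuals are easy to single out (identity disclosure)."""
    classes = Counter(tuple(r[i] for i in quasi_identifiers) for r in rows)
    return min(classes.values())

def statistical_parity_difference(rows, group_index=2, protected="F", outcome_index=3):
    """Difference in positive-outcome rates between the protected group and the rest."""
    protected_rows = [r for r in rows if r[group_index] == protected]
    other_rows = [r for r in rows if r[group_index] != protected]
    def rate(rs):
        return sum(r[outcome_index] for r in rs) / len(rs)
    return rate(protected_rows) - rate(other_rows)

print("k-anonymity over (zip, age):", k_anonymity(records))
print("statistical parity difference (F vs rest):",
      round(statistical_parity_difference(records), 2))
```

In this toy data the protected group receives loans at a markedly lower rate, which such a measure would flag for closer legal and ethical scrutiny; real deployments would of course need far richer measures and context.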

Predicting Data that People Refuse to Disclose

Data mining technologies are very good at predicting missing data. Datasets that are partially incomplete or incorrect can be completed or corrected by predicting characteristics. Completing or correcting data implies ascribing new characteristics to people. However, sometimes data are missing exactly because people refuse to disclose particular data, especially when these data are sensitive personal data. In general, people have a right to refuse disclosure of their personal data. This perspective on privacy, focusing on individual autonomy, is usually referred to as informational self-determination. When data mining technologies can easily deduce or predict missing data within slight margins of error, this undermines the right to informational self-determination. In fact, using predictions to fill in blanks may decrease the accuracy of the data, as predicted data may be less accurate than data provided by data subjects. As a result, paradoxically, people may be inclined to provide the personal data themselves.

Introduction

Data mining technologies are useful tools for profiling, i.e., ascribing characteristics to individuals or groups of people. Furthermore, most data mining technologies are very good at dealing with datasets that are incomplete or incorrect. Missing data generally do not constitute a problem when searching for patterns, as long as the total amount of missing data is not too large compared to the amount of data available. Hence, with the help of data mining predictions, the blanks (missing data) in datasets can be filled in. Similarly, by predicting characteristics that are available, statements can be made about the accuracy of these data. Such predictions may show that the available characteristics are probably incorrect, and these can subsequently be corrected.
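As a minimal illustration of this idea – not a description of any particular system; the attributes and data are invented, and scikit-learn is used only as an example library – a classifier trained on complete records can predict the value that was left blank in an incomplete record:

```python
# Minimal sketch: predicting a missing attribute from the attributes that are present.
# Data and column semantics are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

# Complete records: [age, yearly_income_k], with the attribute "owns_car" known.
X_complete = [[25, 20], [47, 65], [52, 80], [31, 30], [60, 90], [22, 18]]
owns_car = [0, 1, 1, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_complete, owns_car)

# A record where the person did not disclose car ownership:
incomplete_record = [[45, 70]]
prediction = model.predict(incomplete_record)[0]
print("Predicted (undisclosed) value:", bool(prediction))  # the blank is filled in
```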

However, sometimes data are missing because people did not want to provide them. Data subjects, i.e., the people the data in databases relate to, may have good reasons not to provide particular data. For instance, people may consider such information not to be someone else’s business, they may consider disclosure as not improving their reputation, or they may fear disadvantageous judgments of others about themselves. Some information may not be considered appropriate for disclosure to anyone, but more often information may not be considered appropriate for disclosure to particular people or institutions. For instance, people may want to share medical information with their physician and their hospital, but not with their car insurance company, employer or supermarket. People may want to discuss their sexual preferences with friends, but not with their parents. Such a partitioning of social spheres is referred to as audience segregation. In short, people may prefer that others who collect, process and analyse data have some blanks about them in their databases.

Informational Self-Determination

In fact, to some extent, people even have a right to refuse disclosure of their personal information. Everyone has a right to privacy, according to Article 12 of the Universal Declaration of Human Rights. What this right to privacy means exactly is not entirely clear. When it comes to informational privacy (as opposed to, for instance, spatial privacy), a commonly used definition (particularly in the United States) is that of Westin, who refers to privacy in terms of control over information. Privacy is a person’s right to determine for himself when, how, and to what extent information about him is communicated to others. This definition is sometimes referred to as informational self-determination and has a strong focus on the autonomy of the individual.

The traditional ways of collecting personal data are either directly, i.e., by asking data subjects for the data, or indirectly, for instance, by buying datasets or coupling databases. Predicting missing data is also a way of indirect data collection. This is rather new, as data mining technologies allow this way of indirect data collection on a large scale.

European Data Protection Legislation

Current European legislation regulates the collection and processing of personal data, but not the collection and processing of anonymous data. For this reason, data controllers may prefer to process anonymous data, which allows profiling at an aggregate (group) level. Despite false negatives and false positives, such profiles may be sufficiently accurate for decision-making. The characteristics may be valid for the group even though they may not be valid for individual group members as such. Predicting that people driving white cars cause fewer traffic accidents on average, or that people who regularly eat peanut butter live longer on average, may be (hypothetical) data mining results based on anonymous databases.

Ascribing an anonymous profile to a data subject (if John drives a white car, then he is likely to be a careful driver; if Sue regularly eats peanut butter, then she is likely to live long) implies ascribing personal data to individuals. This process creates new personal data. These personal data, unlike the data that a data subject voluntarily provided to a data controller, are much more difficult for the data subject to get to know (both their existence and their contents). In fact, characteristics may be attributed to people that they did not know about themselves (such as life expectancies or credit default risks). People may be grouped with other individuals unknown to them.

This process may seem harmless, but it may be considered less harmless by the individuals involved when information is combined and used to predict or deduce, with slight margins of error, particular sensitive data. Furthermore, predicting or deducing missing values and subsequently ascribing them to individuals creates friction with the informed consent of those individuals. In Europe, in many cases (though not always), data subjects have a right to consent to the use of their data. When people do not know the ways in which their personal data are processed, which characteristics are ascribed to them, and what the consequences of this are, it is very difficult for them to object.

Redlining

In the past, some financial institutions drew red lines on maps around entire neighbourhoods they deemed off-limits for loans. This practice, known as “redlining”, is now strictly illegal.

Discrimination on the basis of race, sex, marital status, etc. is illegal, but group profiling may link these characteristics to seemingly trivial ones – such as geographic location or credit history – on which underwriting decisions are then based.


Discrimination Issues

Under discrimination laws, several characteristics are considered unacceptable for decision-making. For instance, ethnic background or gender should not be used to select job applicants. However, everyone knows that a trivial attribute like a name can often predict the ethnicity or gender of a person. The same may be true for attributes like profession (there are still very few female airline pilots or male obstetricians) or zip code (some neighbourhoods are predominantly ‘black’ whereas others are predominantly ‘white’).

The use of data mining may further increase the possibilities of predicting sensitive characteristics. From a legal perspective, no employer looking for a new employee is allowed to ask for these characteristics and no job applicant has to provide them, but it is obvious that anti-discrimination legislation is nevertheless extremely difficult to enforce. The point here is that hiding particular characteristics is not sufficient. In fact, research has shown that leaving sensitive data like ethnic background and gender out of a database may still yield discriminatory data mining results.
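The point that simply hiding the sensitive attribute is not enough can be illustrated with a small, entirely hypothetical sketch: a model trained without the gender column can still reproduce historically biased decisions through a correlated proxy such as zip code. The data and attribute names below are invented, and scikit-learn is used only as an example library.

```python
# Illustrative sketch: removing the sensitive attribute does not remove the bias.
# All data are invented; zip code acts as a proxy for the protected attribute.
from sklearn.tree import DecisionTreeClassifier

# Historical (biased) hiring decisions. Gender correlates strongly with zip code.
#            zip, years_experience, gender (0/1), hired
history = [
    (1, 5, 0, 1), (1, 3, 0, 1), (1, 2, 0, 1),
    (2, 5, 1, 0), (2, 6, 1, 0), (2, 4, 1, 0),
]

# Train WITHOUT the gender column.
X_no_gender = [[zip_code, exp] for zip_code, exp, _gender, _hired in history]
y = [hired for *_rest, hired in history]

model = DecisionTreeClassifier(random_state=0).fit(X_no_gender, y)

# Two equally experienced applicants from the two neighbourhoods:
print(model.predict([[1, 5], [2, 5]]))  # likely [1 0]: the proxy reproduces the bias
```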

Solutions in Code

Are we running out of solutions? Using anonymous data is not really an option, as data may sooner or later be ascribed to individuals again. In fact, when identifying characteristics, such as name, address or social security number, are missing, data mining technologies and database coupling may also be used to predict the missing characteristics. Deleting sensitive data from databases does not work either, as these sensitive characteristics may also be predicted. Prohibiting data mining altogether (a radical measure) is not realistic with the enormous amounts of data we are facing in our information society, as it would imply less insight into, and overview of, the data available.

There are some (rather advanced) solutions, however. These solutions require combining technological measures and legal measures. From a legal perspective it may be advisable not to focus on (a priori) access-limiting measures regarding input data, but rather on (a posteriori) accountability and transparency. Instead of limiting access to data, which is increasingly hard to enforce in a world of automated and interlinked databases and information networks, the question of how data can and may be used is stressed.

But this also requires technological measures. For instance, the architecture of data mining technologies can be adjusted (‘solutions in code’) to create a value-sensitive design that incorporates legal, ethical and social aspects in the early stages of development of these technologies. This is exactly what privacy-preserving data mining techniques aim at. They may aim at protecting against identity disclosure or attribute disclosure, but also at preventing or protecting the inferred data mining results. Similarly, discrimination-free data mining techniques have been developed, integrating legal and ethical constraints into data mining algorithms.

Conclusion

Even though people may not want to disclose particular personal data to others, it may be possible to predict these data. Predictions can be based on other data available on these individuals and data available on the groups to whom they belong. Data mining technologies may be very useful to complete missing parts of the data in large datasets. However, when people explicitly refuse to disclose these data, predicting the missing characteristics may challenge their right to informational self-determination. Hence, their privacy and autonomy may be infringed.

The predictions of missing values usually contain margins of error. When the data mining results are used for decision-making, the decisions may contain errors as well. Since the predicted data may be less accurate than the data provided by individuals, people may be inclined to provide the (more correct) data themselves, to ensure (more) just decisions. This is a privacy paradox.


Bibliography

  1. Frawley, W.J., Piatetsky-Shapiro, G. and Matheus, C.J. (1993) Knowledge Discovery in Databases; an overview, In: Knowledge Discovery in Databases, G. Piatetsky-Shapiro and W.J. Frawley (eds.) Menlo Park, California: AAAI Press / The MIT Press.
  2. Adriaans, P. and Zantinge, D. (1996) Data mining, Harlow, England: Addison Wesley Longman.
  3. Fayyad, U-M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996) Advances in knowledge discovery and data mining. Menlo Park, California: AAAI Press / The MIT Press.
  4. Even though a digital person is only a limited representation of a real person. Solove, D. (2004) The Digital Person; Technology and Privacy in the Information Age, New York: New York University Press.
  5. Van den Berg, B. and Leenes, R. (2010) Audience Segregation in Social Network Sites, In: Proceedings for SocialCom2010/PASSAT2010 (Second IEEE International Conference on Social Computing/Second IEEE International Conference on Privacy, Security, Risk and Trust). Minneapolis (Minnesota, USA): IEEE: 1111-1117.
  6. Westin, A. (1967) Privacy and Freedom. London: Bodley Head.
  7. Other common definitions of the right to privacy are the right to be let alone, see Warren and Brandeis (1890) and the right to respect for one’s private and family life (Article 8 of the European Convention on Human Rights and Fundamental Freedoms).
  8. Zarsky, T. Z. (2003) “Mine Your Own Business!”: Making the Case for the Implications of the Data Mining of Personal Information in the Forum of Public Opinion. Yale Journal of Law & Technology, 5, pp. 56.
  9. Vedder, A. H. (1999) KDD: The Challenge to Individualism, In: Ethics and Information Technology, Nr. 1, p. 275-281.
  10. For more on risks of profiling, see Schermer, B.W. (2011) The Limits of privacy in automated profiling and data mining. Computer Law & Security Review, Volume 27, Issue 7, p. 45-52.
  11. Verwer and Calders. (2010) Three Naive Bayes Approaches for Discrimination-Free Classification, in: Data Mining: special issue with selected papers from ECML-PKDD 2010; Springer
  12. Pedreschi, D., Ruggieri, S., and Turini F. (2008) Discrimination-aware Data Mining. 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2008): 560-568. ACM, August 2008.
  13. Ohm, P. (2010) Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, UCLA Law Review, Vol. 57, p. 1701-1765.
  14. Custers B.H.M. (2010) Data Mining with Discrimination Sensitive and Privacy Sensitive Attributes. Proceedings of ISP 2010, International Conference on Information Security and Privacy, 12-14 July 2010, Orlando, Florida.
  15. Weitzner, D.J., Abelson, H. et al. (2006) Transparent Accountable Data Mining: New Strategies for Privacy Protection. MIT Technical Report, Cambridge: MIT.
  16. For more information, see also: http://wwwis.win.tue.nl/~tcalders/dadm/doku.php
  17. Lessig, L. (2006) Code Version 2.0, New York: Basic Books.
  18. Friedman, B., Kahn, P.H., Jr., and Borning, A. (2006). Value Sensitive Design and information systems. In: P. Zhang and D. Galletta (eds.) Human-Computer Interaction in Management Information Systems: Foundations, Armonk, New York; London, England: M.E. Sharpe, pp. 348–372.
  19. Lindell, Y., and Pinkas, B. (2002) Privacy preserving data mining, Journal of Cryptology, no. 15, p. 177-206.
  20. Calders, T., and Verwer, S. (2010) Three Naive Bayes Approaches for Discrimination-Free Classification. Special issue of ECML/PKDD.
  21. Custers, B.H.M. (2004) The Power of Knowledge Tilburg: Wolf Legal Publishers, p. 157.

Privacy Through Awareness – Introducing Real-Time Feedback

Introduction

Now that more than 20 years have passed since Mark Weiser identified this as the century of Ubiquitous Computing (ubicomp) [4], it seems we are no closer to giving users a handle on what is happening with their data. As Weiser noted,

The problem, while often couched in terms of privacy, is really one of control. If the computational system is invisible as well as extensive, it becomes hard to know what is controlling what, what is connected to what, where information is flowing, how it is being used, what is broken (as compared to what is working correctly, but not helpfully), and what are the consequences of any given action (including simply walking into a room).

What is especially interesting in Weiser’s quote is the notion of future consequences. Ubicomp technologies offer the potential of capturing and storing large datasets of users’ behaviour, preferences, activities or movements. They allow us to look into the past, to check what we liked, where we have been or what we did. Company owners can track their assets using GPS devices, parents can track their children via mobile phones, big supermarkets can monitor our purchases using loyalty cards and banks can find our location when we are using cash machines. CCTV, Google Maps Street View and Webcam technologies allow us to access remote locations simply by observing those locations or objects and people inside. Technological advances in storage, aggregation, and extraction of information both online and offline raise several privacy concerns that have an impact on the acceptance of the new technologies [5].

While one reason for this problem is technological invention, we cannot blame the technology for all privacy problems in Ubiquitous Computing. Nguyen and Mynatt note that a ubicomp system is not limited to devices of different sizes connected through a wireless network. It encompasses three environments in which people live, work and interact with each other: technical, physical and social [9]. A ubicomp system is an ecology of devices (technology layer) situated in physical space (physical layer), in which people are connected (social layer).

Ubicomp technologies are just the beginning of this new information society, in which humans and the technology co-exist. They change our culture and the ways we interact with information and other people. They open new ways for communication, in which sharing personal information becomes a part of everyday communications.

The privacy risk we see here is sharing without realizing the consequences, i.e. a lack of awareness. Here we describe one approach to mitigating this risk. We report on our investigation into the efficacy of real-time feedback – a privacy-protection technology that helps people be aware of the future implications of their privacy-related choices.

Introducing Real-Time Feedback

Although significant attempts have been made to support privacy awareness [6, 10], the design of usable awareness interfaces remains a big challenge. Therefore, in our work we focused on user interaction design and users’ reactions to the technology.

We borrowed from Erickson and Kellogg’s concept of social translucence [8], which supports awareness through a shared understanding that enforces accountability by making things visible to one another. While their approach was mainly addressed to the computer-supported collaborative work domain, we see a strong benefit in incorporating social translucence into privacy-aware ubicomp systems for the following reasons:

  • First, since ubicomp systems encompass technical, physical and social environments, social translucence offers a more natural approach to communication, in which the information flow between the three environments can be more effective.
  • Second, besides supporting visibility and awareness, the third characteristic of social translucence – accountability – offers great promise for stimulating privacy-respecting behaviour and enforcing social norms in digital systems. It is a useful design feature, as it might minimize the practical burden of privacy management highlighted by Hong [10].

In our work, awareness is achieved through real-time feedback as the method of informing users about how their information is being used. We define feedback as the notification of information disclosure, where the notification specifies what information about the person is disclosed, when, and to whom. This definition is drawn from the work of Bellotti and Sellen [7].

Usage scenario: Buddy Tracker

We decided to evaluate our approach in the domain of mobile location-sharing applications. For this reason we developed Buddy Tracker, a social networking app that allows peers to share their location in real time. We see mobile technology as a challenging design domain for awareness systems and interfaces: its context-awareness potential, the richness of input and output methods, and robust hardware offer great promise for designing novel interactions for feedback on mobile devices.

Real-Time Feedback: How does it work?

Every time a user of Buddy Tracker checks another user’s location, the system automatically sends a notification to the data owner, informing them about every check made on their location. Because both the requester and the data owner are aware of this notification process, the system supports awareness. Buddy Tracker supports several sensory dimensions for representing feedback, such as a flashing screen, an LED, an auditory message in natural language, vibration, or a graphical element on the screen. Example visual notifications are presented in Figure 1.

Figure 1. Selected visual representations for real-time feedback

To provide a richer experience and minimize the negative effect of interruptions caused by inappropriate feedback representation, we implemented a feedback adaptation mechanism. Our system is capable of sensing the user’s context and adapting its behaviour accordingly. For example, vibration is used when the phone is detected to be in a pocket, and a flashing LED light is used when the user is watching a video.

The basic scenario illustrating our approach is presented in Figure 2. A user of the client application (U1) sends a request to view the location of a fellow user (U2) to the Buddy Tracker server. The server generates a response containing U2’s location information and sends it to U1. Additionally, the server generates a feedback response, which is sent to U2, informing them that U1 viewed their location. Both the data requester (U1) and the data owner (U2) are users of the Buddy Tracker client application.

Figure 2. Real-time feedback – a simple usage scenario
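The interaction in Figure 2 can be summarised in a short sketch. This is a simplified stand-in rather than the actual Buddy Tracker implementation: the function names, data structures and the context-to-modality mapping are our assumptions for illustration only.

```python
# Simplified sketch of the Figure 2 interaction (not the real Buddy Tracker code).
# Names, data structures and the context-to-modality mapping are assumptions.

locations = {"U2": (51.03, -0.72)}       # last known positions of users
contexts = {"U2": "phone_in_pocket"}     # sensed context of the data owner

def choose_modality(context):
    """Pick the least intrusive feedback representation for the sensed context."""
    return {
        "phone_in_pocket": "vibration",
        "watching_video": "flashing_led",
    }.get(context, "on_screen_notification")

def notify(owner, requester):
    """Real-time feedback: tell the data owner that their location was viewed."""
    modality = choose_modality(contexts[owner])
    print(f"feedback -> {owner}: '{requester} viewed your location' via {modality}")

def handle_location_request(requester, owner):
    """Return the owner's location AND send real-time feedback to the owner."""
    notify(owner, requester)             # the disclosure is made visible
    return locations[owner]

print("U1 receives:", handle_location_request("U1", "U2"))
```

The design choice this sketch highlights is that the feedback is generated by the same request-handling path that releases the data, so no disclosure can happen silently.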

Examining Real-Time Feedback: Does it work?

We conducted several studies (focus groups, interviews, field trials) to:

  1. explore users’ reactions to the concept of real-time feedback as a means for supporting awareness and enforcing social norms in privacy-sensitive systems; and
  2. examine the impact of a socially translucent system on the enforcement of social norms. We were also interested in assessing whether associating contextual factors with users’ preferences can minimize the intrusiveness while maximizing the effectiveness of real-time notifications.

A total of 27 participants used our technology for 3 weeks. They were split into smaller groups and asked to use the Buddy Tracker application in their daily routine. People could share their location, set privacy preferences and check who had viewed them. Depending on the user’s context, real-time feedback was delivered in the form of vibration, sound, a flashing light, or a textual or graphical interface presented to the tracked person immediately after they had been looked up.

We designed and built a privacy-awareness system capable of adapting to the user’s context, which improved the user experience and had a positive impact on the acceptance of this technology. Moreover, we observed that the introduction of real-time feedback had a definite effect on the participants’ use of the system; it did not stop them from using it, but it did limit usage to the situations where they felt they had an obligation from the data owner to check his or her location. In other words, real-time feedback helped us introduce social norms into digital system usage practice.

Our studies indicate that one’s privacy can be protected with little to no effort by making things visible to one another. We showed that visibility, represented in the form of real-time notifications, resulted in better awareness of the extent to which the system works. This shows that a socially translucent architecture successfully enforces accountability and limits the number of unmotivated and unreasonable location requests, which in consequence helps preserve one’s privacy.

 

Concluding Remarks

There is no strong consensus in the HCI community as to how privacy-awareness interfaces should be built. The ideas presented in this work provide a new starting point for designers of privacy-aware systems and go some way towards addressing the problem of awareness interfaces, which has been recognized as one of the key challenges for future work on privacy in HCI [5].

We showed that incorporating feedback in digital systems could be used to enforce social norms. We hope this work contributes to a discussion about novel ways of achieving privacy, including those that nudge people towards privacy-respecting behaviour. More studies are needed to explore what other design features can help people make better privacy choices.

More details about this work can be found in [1, 2, 3].

 

Bibliography

  1. Mancini, C, Rogers Y., Thomas K., Joinson N. A., Price B. A., Bandara A. K., Jędrzejczyk Ł., Nuseibeh, B., “In the Best Families: Tracking and Relationships”. Proceedings of the 29th International Conference on Human Factors in Computing Systems. ACM CHI. 2011.
  2. Jędrzejczyk, Ł, Price B. A, Bandara A. K., Nuseibeh B. A., “On the Impact of Real-time Feedback on Users’ Behaviour in Mobile Location-sharing Applications.” In Proceedings of SOUPS ’10. Redmond, WA, USA. 2010.
  3. Ł Jędrzejczyk, C Mancini, D Corapi, B A Price, A K Bandara, Nuseibeh B. A., “Learning from Context: A Field Study of Privacy Awareness System for Mobile Devices”. Technical Report 2011/07. 2011.
  4. Weiser, M, “The Computer for the 21st Century”. Scientific American 265 (3): 94–104. 1991.
  5. Iachello, G, Hong J., “End-User Privacy in Human-Computer Interaction”. Now Publishers Inc. 2007.
  6. Langheinrich, M,. “Personal Privacy in Ubiquitous Computing. Tools and System Support.” PhD Thesis, Zürich, Switzerland: Swiss Federal Institute of Technology (ETH Zürich). 2005.
  7. Bellotti, V, Sellen A., “Design for Privacy in Ubiquitous Computing Environments.” In Proceedings of ECSCW ’93, 77–92. Milan, Italy: Kluwer Academic Publishers. 1993.
  8. Erickson, T, Kellogg W., “Social Translucence: An Approach to Designing Systems That Support Social Processes.” ACM Transactions on Computer-Human Interaction (TOCHI) 7 (1): 59–83. 2000.
  9. Nguyen, D, Mynatt E., “Privacy Mirrors: Making Ubicomp Visible.” In Human Factors in Computing Systems: CHI 2001 (Workshop on Building the User Experience in Ubiquitous Computing). 2001.
  10. Hong, J I., “An Architecture for Privacy-Sensitive Ubiquitous Computing”. Unpublished PhD Thesis, University of California at Berkeley, Computer Science Division. 2005.

Distinguishing between public and private places

Public places

In terms of privacy, streets, parks, stores, and museums are public places. In contrast, residences are private places.

What determines whether a given location is public is whether it is open to the public.

Fees don’t make a public place private; a road is public, even if it has tolls. And a museum is public, even if the entrance charge is high.

Ownership is not relevant to determining whether a given location is public or private. A privately-owned museum is a public place. A store is a public place, even if the land it sits on is privately owned. On the other hand, a residence is private, even if it is owned by the government.

Public places and privacy

People have fewer privacy rights in public places. The specifics vary by country and there are endless questions that arise. But, to take an example, Google’s Street View is able to show all that it does because streets are public. And I doubt that Google would get as far with Bedroom View or Bathroom View.

So, distinguishing between public and private spaces might help us effectuate appropriate privacy policies and practices.

Detailed maps

Happily, there exist maps with detail on property uses down to the parcel level. In essence, these maps are souped-up versions of the map in your car navigation device which can help you find gas stations, tourist destinations, hospitals, etc.

Some governmental units make this map data available for free. And commercial map providers enhance and standardize these offerings. So, the data is often available.

Complication – Privacy in Public Places

It’s surprisingly easy to identify someone solely from a track of their movements. The location that someone visits most frequently is usually their home. And, using that location and publicly available data (e.g. phone directories), it is often possible to identify the individual (Krumm, 2007).

And, even if we removed from the track all locations within the person’s home, the track would still show the person’s travels to and from their home. So, even a track showing only movements in the public space would compromise anonymity.

Mobile Carriers’ Data

In fact, it is the relative ease of identifying someone from a track of their movements that blocks cell phone companies from using their detailed and voluminous tracking data. Carriers are required to be able to locate any subscriber who calls for emergency services. And they have to keep records of a subscriber’s location each time they make or receive a call.

But carriers don’t generally resell this data because it is ‘personally identifiable,’ even if it does not include the subscribers’ name or address (Zhang, et al. 2011).

The Aggregation Solution

The simplest and most common solution to the risk of identifying who made a given individual track is to get rid of individual tracks by aggregating (combining) many tracks; once many tracks have been combined, it is no longer possible to discern the movements of a single individual and to identify that individual. Aggregation provides effective privacy protection.

However, aggregation involves large data loss. When you combine lots of detailed tracks, you lose a lot of specificity about where a given trip started, where the people making a given trip started their day, etc.

To some degree, the data loss is intentional; aggregation is trying to lose exactly the data that make it possible to identify individuals. But the concern is that the data loss is greater than needed, that aggregate data are difficult to analyze, and that the loss limits the insight that can be obtained.

Anonymizing tracking data

With two tweaks, it is possible to use the distinction between public and private places to anonymize individual location tracks. If the method succeeds, then individual tracks can be used as data, without aggregation.

First, while minor streets may be public places, they typically don’t have enough traffic to provide sufficient anonymization. So, it’s necessary to blur locations on minor streets, as well as locations in the home itself.

The individual’s locations within this residential neighborhood are blurred to the point in the center. The blurred appearance of the pin there indicates that this pin represents many GPS readings.

 

Figure 1. A track segment with all locations within the residential neighborhood blurred to a single center point

The track segment in Figure 1 shows the person’s precise location on the busy, major road. But, all of the locations within the residential neighborhood are represented by the single point in the center.

In the same vein, some public places (e.g. restaurants) may not provide sufficient anonymization. And the decision may be made to treat these locations as private.

Second, the longer one is followed, the more distinctive one’s track becomes. Tracking someone for one day seems to reduce the de-anonymization risk to an acceptable level (Zhang, et al. 2011). Happily, the regularity of people’s travel patterns limits the data lost by shortening the tracking period (Gonzalez, 2008).
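A minimal sketch of the first tweak, the blurring step, is given below. The residential-area test is a crude stand-in for a real parcel-level map lookup, and the coordinates are invented for illustration: points inside the private or low-traffic area collapse to a single center point, while points on the major road keep their precision.

```python
# Minimal sketch of the blurring step in location anonymization.
# The "residential area" test stands in for a parcel-level map lookup;
# coordinates are invented for illustration.

def in_residential_area(lat, lon):
    """Placeholder for a lookup in a detailed, parcel-level map."""
    return 40.7000 <= lat <= 40.7020 and -74.0020 <= lon <= -74.0000

def blur_track(track):
    """Collapse all points in the private/low-traffic area to their center."""
    private_points = [p for p in track if in_residential_area(*p)]
    center = None
    if private_points:
        center = (sum(p[0] for p in private_points) / len(private_points),
                  sum(p[1] for p in private_points) / len(private_points))
    return [center if in_residential_area(lat, lon) else (lat, lon)
            for lat, lon in track]

track = [(40.7010, -74.0010),   # inside the residential neighborhood
         (40.7012, -74.0008),   # inside the residential neighborhood
         (40.7100, -73.9900)]   # on the busy major road: kept precise
print(blur_track(track))
```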

Uses for the method

Based on where the individual begins or ends his day, it is possible to probabilistically assign each track a demographic profile. For example, if a track began and ended its day in an upper-middle-class area where 70% of the residents are college educated, one could assign a 70% probability that the track’s owner has completed college.

Individual tracks might be useful for retail site selection, transportation planning, and profiling attendees at a mall or sporting event.

Identifying public spaces can also help provide services to the user. For example, a person is typically more comfortable sharing her location with her friends when she is in a public place (Toch, et al. 2010). So, using detailed maps to distinguish public and private places might be useful in location sharing apps.

Also, it may emerge that people are more willing to receive advertisements or offers when they are in public.

Limitation – data accuracy

The method outlined here is called ‘location anonymization.’

In addition to the detailed maps, location anonymization requires tracking data precise enough to make the maps useful. If the tracking data is accurate to within 100 m, then the random error will swamp distinctions relying on parcels that average 25 m.

So, it seems unlikely that the method would work with cell-tower data. But GPS-derived data would seem to have sufficient accuracy.

Opt-in Data Collection

Of course, one must obtain the tracks before anonymizing them. And, even if more than 50% of people are carrying GPS-enabled smart phones, consent should be obtained before using these devices for data collection.

Potentially, the movements of those using a location sharing app could provide the data input. Alternatively, the data input could come from location data gathered from people’s use of a search engine or mapping/traffic application. In those situations, opt-in consent could be obtained as part of the registration process, and subjects would be receiving a benefit in exchange for their data.

Alternatively, a random sample of individuals could be recruited and offered an incentive to allow themselves to be tracked for 24 hours, with the understanding that the resulting track would be anonymized. Presumably, this method would involve a larger up-front cost, but the opt-in consent procedure would be less susceptible to criticism, and randomly selecting the individuals would likely result in a more representative sample and better data.

Prospects

Location anonymization is patented in the US (Wood, 2012/I) and a small study seems to show that it works with GPS-derived tracks (Wood, 2012/II). But it has not been implemented on a large scale, the anonymization risks have not been quantified, and it remains to be seen how the method competes against big data approaches relying on aggregation.

References

  1. Gonzalez, M. C., Hidalgo, C. A., and Barabasi, A.-L. Understanding individual human mobility patterns. Nature, 453 (2008), 779-782.

  2. Krumm, J. Inference attacks on location tracks. In PERVASIVE’07: Proceedings of the 5th international conference on Pervasive computing. Springer-Verlag, (2007), 127-143.

  3. Toch, E., Cranshaw, J., Drielsma, P. H., Tsai, J. Y., Kelley, P. G., Springfield, J., Cranor, L., Hong, J., and Sadeh, N. Empirical Models of Privacy in Location Sharing. UbiComp, ACM. (2010)

  4. Wood, J. Method of Providing Location-Based Information from Portable Devices. United States Patent 8,185,131.  (2012)

  5. Wood, J. Preserving Location Privacy by Distinguishing between Public and Private Spaces. UbiComp poster. (2012)

Privacy and Trust in the Social Web

Introduction

The “social revolution” introduced by “classic” Online Social Network (OSN) websites (e.g., Facebook, MySpace, Twitter) and, later, by media sharing websites (e.g., Flickr, YouTube, and Instagram) has led to a situation in which the Web as we used to know it is rapidly evolving to incorporate more and more social aspects. In this Social Web vision, users and their resources are linked together via multiple, different kinds of relationships, crossing the boundaries of the specific services used and their related technologies.

In recent years, user interactions have usually been represented by social graphs, which describe the online relationships between individuals, and interest graphs, which describe the network of people who share interests but do not necessarily know each other personally (e.g., followers on Twitter, items bought on e-commerce websites, searches on the Web).

Nowadays, the strict separation between the above definitions is bound to be outdated. In fact, we can already see “interest graph aspects” in applications based on social graphs (e.g., the possibility for a Facebook user to receive other users’ public updates even if they are not in his/her social graph and he/she does not know them personally), and “social graph aspects” in interest-based applications (e.g., the possibility for users to restrict their searches and data sharing to particular ‘circles’ in Google+). Moreover, in both cases, not only are users connected to each other but, depending on the specific context of the social/interest graph, resources can also be involved in relationships with users (and with other resources).

The current scenario

From the above discussion, it emerges that the traditional representation of a social network as a graph composed only of symmetric user-to-user relationships is no longer enough to express the complexity of social interactions. On the contrary, the concept of multiple types of social relationships is emerging as one of the key issues in the generation of the so-called augmented/multi-level social graphs (Atzori, Iera, & Morabito, 2011) (Breslin & Decker, 2007) (Kazienko, Musial, & Kajdanowicz, 2011). These can be, within the same social network, graphs connecting users and resources via different “actions” (e.g., a user x “likes” a user y’s resource; a user x “shares” a resource with a user y; a user y “follows” a user x based on his/her interests). They can also be graphs spanning different social networks, merging the different relationships that a user has on different networks for different purposes (e.g., a user x can be a “friend of” a user y on one social network and a simple “follower” of y on another).
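
Purely as an illustration, the following Python sketch shows one plausible data structure for such an augmented/multi-level graph, with typed edges connecting users and resources across networks. The names (`Node`, `Edge`, `AugmentedSocialGraph`) and the example relations are invented for this sketch and are not part of any existing specification.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str    # "user" or "resource"
    ident: str

@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    relation: str   # e.g. "likes", "shares", "follows", "friend_of"
    network: str    # e.g. "facebook", "twitter"

class AugmentedSocialGraph:
    """Multi-level graph: users and resources linked by typed, per-network edges."""
    def __init__(self):
        self._out = defaultdict(set)

    def add(self, edge: Edge):
        self._out[edge.source].add(edge)

    def relations(self, source: Node, relation=None, network=None):
        """All edges leaving `source`, optionally filtered by relation/network."""
        return [e for e in self._out[source]
                if (relation is None or e.relation == relation)
                and (network is None or e.network == network)]

# Example: x "likes" y's photo on one network and "follows" y on another.
x, y = Node("user", "x"), Node("user", "y")
photo = Node("resource", "y/photo42")
g = AugmentedSocialGraph()
g.add(Edge(x, photo, "likes", "facebook"))
g.add(Edge(x, y, "follows", "twitter"))
```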

This trend is also witnessed by developments from major Social Network players: see, for instance, the Open Graph protocol (The Open Graph Protocol) developed by Facebook, or the OpenSocial public specification (Open Social), which is followed by Google and MySpace (together with a number of other social networks).

Privacy and the Social Web

It clearly emerges that, in such a social scenario, there is a high risk of being exposed to various privacy attacks. In fact, especially in the Social Web, not only is users’ personal information exposed to privacy risks, but these risks may also propagate to anyone else, or any other resource, that is part of the user’s augmented/multi-level social graph.

After an initial phase where privacy mechanisms were sparse or absent, with the majority of user profiles and resources accessible to other members, several research efforts have been carried out to mitigate these problems, leading to tools that help users be more privacy-aware. Notable examples are: relationship-based access control mechanisms (Carminati & Ferrari, 2010), tools supporting privacy preference specification (Fang & Le Fevre, 2010) (Liu, Gummadi, Krishnamurthy, & Mislove, 2011), as well as the more expressive and complex privacy settings recently adopted by commercial OSNs, such as Facebook and Google+.

Despite the relevance of these proposals, current solutions to prevent information disclosure in OSNs and, more generally, in the Social Web unfortunately present two main shortcomings:

they are ineffective for a large, decentralized system like the World Wide Web, where it is easy to aggregate information and it is often possible to infer “private” information that is not explicitly accessible;

the complexity of privacy mechanisms de facto forces users not to exploit their full potential, leaving a large amount of information publicly available.

Trust as a means to protect Privacy in the Social Web

Trust can be defined as “the extent to which a given party is willing to depend on something or somebody in a given situation with a feeling of relative security, even though negative consequences are possible” (Josang, Ismail, & Boyd, 2007). In the Social Web, in order to interact with others (even strangers), users are willing to risk the negative consequences connected to the possible misuse of disclosed information, because of the benefits in terms of social interaction they aspire to. Measuring this extent is, especially in virtual communities, an important way to evaluate the degree of uncertainty associated with possible future interactions and, consequently, a means to provide users with access control mechanisms that take this risk into account.

Trust modeling and computation have been explored in the context of OSNs (Maheswaran, Hon, & Ghunaim, 2007) (Borzymek & Sydow, 2010) (DuBois, Golbeck, & Srinivasan, 2011) (Nepal, Sherchan, & Paris, 2011), but most of the proposed techniques are based on probabilistic approaches and on the concept of trust transitivity among users (Jøsang, 2006) (Liu, Wang, & Orgun, 2011). How useful such approaches are is still much debated. For instance, according to social psychology, trust decays slowly over the first few hops from a source participant, and then decays quickly until the trust value approaches its minimum. In addition, there is still no conceptual model on top of which privacy tools, as well as conscious user decisions about information sharing or friendship dynamics, can be designed, even in recent approaches that try to address these issues (Falcone & Castelfranchi, 2010) (Adali, Wallace, Qian, & Vijayakumar, 2011) (Tang, Gao, & Liu, 2012).

Characteristics of a Trust Model for the Social Web

We believe that a suitable model to compute trust in the Social Web, for access control, privacy preservation and, more generally, privacy-aware decision making, should have the following main characteristics:

multi-dimensional: trust computation should be based not only on different topological aspects of the social network, but also on a variety of other dimensions, such as users’ actions and characteristics.

based on controlled transitivity: the majority of proposals that have appeared so far evaluate trust among users who are not directly connected, regardless of the distance (i.e., the depth of the paths). On the contrary, it is necessary to make clear how and to what extent trust is transitive along a social trust path, also using multi-dimensional social pattern discovery to drive the definition of innovative methods for transitive trust computation (see the sketch after this list).

time-dependent: in judging how much an action (or opinion) should impact a trust relationship, we should consider the frequency with which a user has performed this kind of action (or received this opinion). For instance, the impact on the trust value should be greater if the system verifies that a tagging action causing a privacy leak is repeated over time, rather than being an isolated event, as this suggests that the user is consciously misbehaving.

privacy-aware: trust computation usually requires access to personal information and/or logging some of the user’s actions; such activities should be carried out in a privacy-preserving way.
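
For concreteness, the sketch below illustrates one simple way to realize controlled transitivity: trust is propagated only up to a maximum path depth and is discounted at each additional hop. The function name, decay model, and parameters are assumptions made for this illustration; it is a baseline sketch, not the multi-dimensional computation advocated above.

```python
def transitive_trust(direct_trust, source, target, max_depth=3, decay=0.5):
    """Depth-limited transitive trust with a per-hop decay factor.

    direct_trust : dict mapping user -> {neighbor: trust value in [0, 1]}
    max_depth    : paths longer than this contribute nothing (controlled transitivity)
    decay        : multiplicative penalty applied for each additional hop
    """
    best = 0.0
    stack = [(source, 1.0, 0, {source})]
    while stack:
        node, trust_so_far, depth, visited = stack.pop()
        if depth >= max_depth:
            continue  # stop propagating beyond the allowed depth
        for neighbor, t in direct_trust.get(node, {}).items():
            if neighbor in visited:
                continue
            value = trust_so_far * t * (decay ** depth)
            if neighbor == target:
                best = max(best, value)  # keep the strongest trust path found
            else:
                stack.append((neighbor, value, depth + 1, visited | {neighbor}))
    return best

# Example: x trusts a (0.9), a trusts y (0.8); the indirect value is discounted.
print(transitive_trust({"x": {"a": 0.9}, "a": {"y": 0.8}}, "x", "y"))  # 0.36
```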

A Concrete Proposal

Our idea is to build a Multi-dimensional and Event-based Trust Layer on top of any social environment, via an augmented social graph that aggregates all the information gathered from the Social Web about users and their resources (e.g., actions, opinions, user profile attributes), in order to evaluate users’ trust relationships.

To keep track of the evolution of the augmented graph and to evaluate trust accordingly, we believe that a workable solution is to exploit Complex Event Processing (CEP) systems (Luckham, 2002), which are able to detect interesting events or event patterns in data streams and react to them when critical situations arise.

The idea is, therefore, to:

  • gather from the augmented social graph all the events that change the social interactions on the graph (i.e., edges creation/deletion/modification),
  • encode them into streams,
  • evaluate over them a set of meaningful event patterns,
  • specify a set of customizable trust rules that associate a given trust value with the involved users when some meaningful event patterns occur.

Let us consider, for example, Facebook as a target scenario. A domain expert can define a trust rule stating that a user x becomes “untrusted” with respect to y after a certain number of “de-tagging” actions have been executed by y on images tagged by x.
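
The following toy Python sketch illustrates the semantics of such a rule outside any real CEP engine; the event format, field names, and threshold are invented for illustration, and a real deployment would express the rule as a pattern query over the event stream.

```python
from collections import defaultdict

class DetaggingTrustRule:
    """Toy trust rule: after `threshold` de-tagging actions performed by y on
    images tagged by x, x is marked 'untrusted' with respect to y."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.detag_count = defaultdict(int)   # (tagger, detagger) -> count
        self.trust = {}                       # (tagger, detagger) -> label

    def on_event(self, event):
        # event: e.g. {"type": "detag", "tagger": "x", "actor": "y", "image": ...}
        if event.get("type") != "detag":
            return
        key = (event["tagger"], event["actor"])
        self.detag_count[key] += 1
        if self.detag_count[key] >= self.threshold:
            self.trust[key] = "untrusted"

# Feeding a small stream of events through the rule:
rule = DetaggingTrustRule(threshold=2)
for e in [{"type": "detag", "tagger": "x", "actor": "y", "image": "img1"},
          {"type": "detag", "tagger": "x", "actor": "y", "image": "img2"}]:
    rule.on_event(e)
assert rule.trust[("x", "y")] == "untrusted"
```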

[Figure 1: Architecture of the Trust Layer, with a Complex Event Processing engine (a) and an event-log-based alternative (b)]

According to our proposal, trust rules are monitored in the Trust Layer by a Complex Event Processing engine (see Figure 1, component (a)), so as to immediately detect changes on the augmented social graph that imply a new trust value for the involved users. Note that a real-time estimation of users’ trust values might be fundamental in scenarios where trust is a key parameter in the decision process. However, given the huge number of possible changes in a social environment, the CEP-based architecture might impose a high overhead due to the continuous event monitoring and evaluation of trust rules. As an alternative, an event-log-based architecture can be exploited (see Figure 1, component (b)), over which trust rules are periodically evaluated.

Conclusions

Preventing the disclosure of users’ personal information, and consequently privacy attacks, is fundamental in a highly interactive social environment like the Social Web. We are convinced that, without requiring users to master complex privacy settings tools, we can use trust to tune those settings automatically, by dynamically analyzing multi-level user interactions. To this end, we have discussed an architecture to build a multi-level and event-based trust layer on top of the Social Web. To make this proposal effective, a variety of research issues still need to be addressed, such as the efficiency and privacy guarantees of trust computation, or the method used to identify interesting trust patterns and the corresponding trust rules. Some preliminary results can be found in (Carminati, Ferrari, & Viviani, 2012).

References

Adali, S., Wallace, W. A., Qian, Y., & Vijayakumar, P. (2011). A Unified Framework for Trust in Composite Networks. Proceedings of the 13th AAMAS Workshop on Trust in Agent Societies.

Atzori, L., Iera, A., & Morabito, G. (2011). SIoT: Giving a Social Structure to the Internet of Things. IEEE Communications Letters, 15 (11), 1193–1195.

Borzymek, P., & Sydow, M. (2010). Trust and distrust prediction in social network with combined graphical and review-based attributes. KES-AMSTA’10 Proceedings of the 4th KES international conference on Agent and multi-agent systems: technologies and applications, Part I, (p. 122-131).

Breslin, J., & Decker, S. (2007). The Future of Social Networks on the Internet: The Need for Semantics. IEEE Internet Computing, 11, 86-90.

Carminati, B., & Ferrari, E. (2010). Privacy-aware Access Control in Social Networks: Issues and Solutions. In Privacy and Anonymity in Information Management Systems, Advanced Information and Knowledge Processing (p. 181-195). London: Springer.

Carminati, B., Ferrari, E., & Viviani, M. (2012). A Multi-dimensional and Event-based Model for Trust Computation in the Social Web. SocInfo 2012: The 4th International Conference on Social Informatics, 5–7 December 2012. Proceedings (To appear).

Cook, K. S., & Rice, E. (2006). Social Exchange Theory. In Handbook of Social Psychology (p. 53-76).

DuBois, T., Golbeck, J., & Srinivasan, A. (2011). Predicting Trust and Distrust in Social Networks. Privacy, Security, Risk and Trust (Passat), 2011 IEEE 3rd International Conference on Social Computing (SocialCom), (p. 418-424).

Falcone, R., & Castelfranchi, C. (2010). Trust and Transitivity: A Complex Deceptive Relationship. Proceedings of the 12th AAMAS Workshop on Trust in Agent Societies.

Fang, L., & Le Fevre, K. (2010). Privacy Wizards for Social Networking Sites. International Conference on World Wide Web (WWW 2010), (p. 351-360).

Jøsang, A. (2006). Exploring different types of trust propagation. iTrust’06 Proceedings of the 4th international conference on Trust Management.

Josang, A., Ismail, R., & Boyd, C. (2007). A survey of trust and reputation systems for online service provision. Decision Support Systems, 43 (2), 618-644.

Kazienko, P., Musial, K., & Kajdanowicz, T. (2011). Multidimensional Social Network in the Social Recommender System. IEEE Transactions on Systems, Man, and Cybernetics, 41 (4), 746–759.

Liu, G., Wang, Y., & Orgun, M. A. (2011). Trust transitivity in complex social networks. AAAI Conference on Artificial Intelligence.

Liu, Y., Gummadi, K. P., Krishnamurthy, B., & Mislove, A. (2011). Analyzing Facebook Privacy Settings: User Expectations vs. Reality. IMC ’11 Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement, (p. 61-70).

Luckham, D. (2002). The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley Longman Publishing Co.

Maheswaran, M., Hon, C. T., & Ghunaim, A. (2007). Towards a Gravity-Based Trust Model for Social Networking Systems. Distributed Computing Systems Workshops, 2007. ICDCSW ’07. 27th International Conference on, (p. 24).

Nepal, S., Sherchan, W., & Paris, C. (2011). STrust: A Trust Model for Social Networks. Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on, (p. 841-846).

Open Social. (n.d.). Retrieved from http://docs.opensocial.org

Squicciarini, A. C., Heng, X., & Xiaolong, Z. (2011). CoPE: Enabling collaborative privacy management in online social networks. Journal of the American Society for Information Science and Technology, 62 (3), 521–534.

Tang, J., Gao, H., & Liu, H. (2012). mTrust: discerning multi-faceted trust in a connected world. WSDM ’12 Proceedings of the fifth ACM International Conference on Web Search and Data Mining.

Taylor, H., Yochem, A., Phillips, L., & Martinez, F. (2009). Event-Driven Architecture: How SOA Enables the Real-Time Enterprise. Addison-Wesley Professional.