Can data sets of credit card transaction details help identify a specific person? Do large-scale data sets of human behavior have the potential to re-identify specific users? A study of credit card data has raised alarms in several quarters, including among privacy advocates, by showing that it takes only a tiny amount of personal information to strip away people’s anonymity.
A group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, the amount charged and the name of the store. Although information such as account numbers and names that would identify a particular person was removed from the transaction data, the scientists said the uniqueness of people’s behavior made it easy to single them out.
The study, titled “Unique in the Shopping Mall: On the Re-identifiability of Credit Card Metadata,” found that just four random pieces of information were enough to re-identify 90% of the shoppers as unique individuals and to uncover their records. The researchers also calculated that knowing the price of a transaction increases the risk of re-identification by 22%, on average. In addition, the uniqueness of behavior, combined with publicly available information such as Instagram or Twitter posts, could make it possible to re-identify people’s records by name.
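To make the idea of "uniqueness" concrete, here is a minimal sketch of how such a figure could be estimated. It is an illustration under stated assumptions, not the researchers' code: the function name `estimate_unicity`, the `(user, day, shop)` record layout and the toy records are all hypothetical. For each shopper, it draws a few known points from that shopper's history and checks whether anyone else in the data set matches all of them.

```python
import random
from collections import defaultdict

def estimate_unicity(transactions, p=4, seed=0):
    """Fraction of users singled out by p randomly chosen points of their own trace.

    transactions: iterable of (user_id, day, shop) tuples (hypothetical layout).
    Users with fewer than p distinct points are skipped.
    """
    rng = random.Random(seed)

    # Each user's trace is the set of (day, shop) points they appear at.
    traces = defaultdict(set)
    for user, day, shop in transactions:
        traces[user].add((day, shop))

    # Inverted index: point -> set of users seen at that point.
    index = defaultdict(set)
    for user, points in traces.items():
        for point in points:
            index[point].add(user)

    eligible = [u for u, pts in traces.items() if len(pts) >= p]
    if not eligible:
        return 0.0

    unique = 0
    for user in eligible:
        known = rng.sample(sorted(traces[user]), p)
        # Everyone whose trace contains all of the known points.
        candidates = set.intersection(*(index[point] for point in known))
        if candidates == {user}:
            unique += 1
    return unique / len(eligible)

# Made-up toy records: A and B behave identically, C differs on one visit.
toy = [
    ("A", 1, "bakery"), ("A", 2, "cafe"),
    ("B", 1, "bakery"), ("B", 2, "cafe"),
    ("C", 1, "bakery"), ("C", 3, "gym"),
]
result = estimate_unicity(toy, p=2)
print(f"unicity with 2 known points: {result:.2f}")
```

In the toy data only C can be singled out, so one shopper in three is unique. On a real data set the same calculation would be run at much larger scale and typically repeated over many random draws.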
“The message is that we ought to rethink and reformulate the way we think about data protection,” said Yves-Alexandre de Montjoye, a graduate student in computational privacy at the M.I.T. Media Lab who was the lead author of the study. The study has raised serious questions about the standard methods that many companies, hospitals and government agencies currently use to anonymize their records. When it comes to sensitive personal information, “the open sharing of raw data sets is not the future,” Mr. de Montjoye said.
To reveal a person’s identity, an attacker only needs to correlate the metadata with information about that person from an outside source. One such correlation attack became famous last year, when the New York City Taxi and Limousine Commission released a data set of the times, routes and cab fares for 173 million rides. Passenger names were not included. But with the help of the many websites and blogs devoted to celebrity spotting, it was easy to find time-stamped photos of celebrities getting in and out of taxis and to work out which celebrities paid which fares. A reporter at Gawker used this approach to re-identify Kourtney Kardashian, Ashlee Simpson and other celebrities.
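A correlation attack of this kind amounts to a simple join between the "anonymous" records and an outside source. The sketch below is a hypothetical illustration, not the method the reporter used: the function `link_sightings`, the field names and every record in it are invented for the example. It links a time-stamped sighting to a ride when exactly one ride started at the same place within a couple of minutes.

```python
from datetime import datetime, timedelta

def link_sightings(rides, sightings, tolerance=timedelta(minutes=2)):
    """Link each outside sighting to a ride when exactly one ride fits.

    rides: dicts with 'pickup_time', 'pickup_place', 'fare' (no names).
    sightings: dicts with 'name', 'time', 'place' (e.g. from a dated photo).
    All field names are assumptions made for this sketch.
    """
    matches = {}
    for sighting in sightings:
        candidates = [
            ride for ride in rides
            if ride["pickup_place"] == sighting["place"]
            and abs(ride["pickup_time"] - sighting["time"]) <= tolerance
        ]
        # Only an unambiguous match re-identifies the passenger.
        if len(candidates) == 1:
            matches[sighting["name"]] = candidates[0]
    return matches

# Made-up records for illustration only.
rides = [
    {"pickup_time": datetime(2020, 1, 1, 18, 3), "pickup_place": "W 4th St", "fare": 12.50},
    {"pickup_time": datetime(2020, 1, 1, 18, 30), "pickup_place": "W 4th St", "fare": 8.00},
    {"pickup_time": datetime(2020, 1, 1, 18, 4), "pickup_place": "Broadway", "fare": 20.00},
]
sightings = [
    {"name": "Person A", "time": datetime(2020, 1, 1, 18, 2), "place": "W 4th St"},
    {"name": "Person B", "time": datetime(2020, 1, 1, 21, 0), "place": "Broadway"},
]
linked = link_sightings(rides, sightings)
for name, ride in linked.items():
    print(name, "paid", ride["fare"])
```

Here the sighting of "Person A" fits only one ride, so the fare attached to that ride is no longer anonymous, while "Person B" stays unlinked because no ride fits. The tolerance window is a design choice: a wider window catches imprecise timestamps but produces more ambiguous matches.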
The lesson for big companies and institutions is this: if they are to continue making these kinds of data sets widely available, they should quantitatively assess the risks of re-identification. A data set’s lack of names, home addresses, phone numbers or other obvious identifiers does not make it anonymous, nor does it make it safe to release to the public or to third parties, where it could compromise individuals’ privacy.