Nobody can have escaped the news that Facebook data was allegedly used by Cambridge Analytica (CA) in an attempt to influence the outcome of US elections and possibly the Brexit referendum.
At the most basic level, what Cambridge Analytica did was build a segmentation model: collect as much available data as possible and use it to divide people into groups with similar interests. That is exactly what every marketing analytics company has tried to do for the last five years. I’m not surprised this happened; I had pretty much assumed it was happening anyway. So why are people so shocked? Did any of these companies really know enough that we should be worried?
Marketing segmentation is not new; in fact it is really simple. Imagine you sell bread and you can send marketing messages for brown bread or white bread. You could just send one message at a time to everyone, but most of the time it will only resonate with about half of your customers. It would be better to figure out who likes white bread and who likes brown, and send the most appropriate message to each customer. This is marketing segmentation: taking your customers and dividing them into groups based on their preferences.
The data collection behind it is not new either. Take the Tesco Clubcard (and almost any other store loyalty card): for years these schemes have been collecting data about what you buy in store and using that data to market to you. Some companies even sell that data to other companies, who can combine it with data from multiple sources to get an even broader picture. You’ve always had to remember to tick a ‘do not sell my info’ option, and in some cases that option has been hidden in the terms and conditions of the service.
It has always been relatively hard to divide customers into very detailed segments, because understanding the behaviour of millions of customers performing millions of transactions requires working with a vast amount of data. Previously, companies worked at the basic level of looking for customers who bought product X, lived in a certain area or had spent more than Y. Recent advances in cloud computing and machine learning mean that it is now possible to process hundreds of millions or billions of transactions and find detailed patterns that would be impossible to find with human analysis. This is the technological leap that makes hyper targeting possible.
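To make the "basic level" concrete, here is a minimal sketch of that older, rule-based style of segmentation. Everything in it is made up for illustration: the `Customer` fields, the town names, the spend threshold and the segment labels are my assumptions, not anyone's real system.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    area: str
    products: set = field(default_factory=set)
    total_spend: float = 0.0


def basic_segments(customer, product="brown bread", area="Leeds", min_spend=100.0):
    """Return the simple rule-based segments a customer falls into."""
    segments = []
    if product in customer.products:       # bought product X
        segments.append(f"bought:{product}")
    if customer.area == area:              # lives in a certain area
        segments.append(f"area:{area}")
    if customer.total_spend > min_spend:   # has spent more than Y
        segments.append("high_spender")
    return segments


# Hypothetical customers, purely for illustration.
customers = [
    Customer("A", "Leeds", {"brown bread"}, 150.0),
    Customer("B", "York", {"white bread"}, 20.0),
]
segments_by_customer = {c.name: basic_segments(c) for c in customers}
```

Three hand-written rules like these are easy to reason about, which is exactly their limit. The machine learning approach swaps them for patterns found across far more data than a person could inspect, and that part is not shown here.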
The Data Industry
Media reports would have you believe that the work Cambridge Analytica were doing was so far ahead of the industry that they were using technology to do nefarious things previously unthought of. This is not true: companies all over the world are using machine learning and personalisation to encourage you to buy more.
Many online retailers, travel sites and traditional retailers have built profiles of their customers by comparing each customer’s behaviour to that of other customers, and by combining the data you have given them with data bought from elsewhere and correlating it with publicly available datasets.
This doesn’t even need a PhD in Data Science. In England the government publishes something called the Index of Multiple Deprivation, which is available down to a level called LSOA (Lower Super Output Area), usually a few streets. It measures how rich or poor an area is on average. Sure, there is always variation, but at that level of granularity it is not much. From that LSOA dataset I can map to postcode (using another free dataset).
From here all I need is your address and I can have a pretty good idea of your income, and therefore how much you are likely to spend with me, without you even purchasing from me. Once you make a purchase I can see how much you spent and what products you bought, compare you to customers who previously bought the same type of things from me, and score your value to me even more accurately. You have not explicitly given me anything but your address and perhaps made a purchase. Now imagine I am a retailer you interact with regularly, especially online: I can see where you live, what you look at and don’t buy, what device you browse on, when you are online and where you have things shipped.
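The address lookup described above is little more than two dictionary lookups. Here is a rough sketch, assuming the two free datasets have already been loaded into memory. The postcodes, area codes and decile values below are invented placeholders, not real data, and the function names are mine.

```python
# Invented placeholder data; real tables would be loaded from the published files.
POSTCODE_TO_LSOA = {"ZZ1 1AA": "LSOA_A", "ZZ9 9ZZ": "LSOA_B"}
# Deprivation decile per area: 1 = most deprived 10% of areas, 10 = least deprived.
LSOA_TO_DECILE = {"LSOA_A": 2, "LSOA_B": 9}


def normalise_postcode(raw):
    """Upper-case a postcode and put a single space before its last three characters."""
    compact = raw.replace(" ", "").upper()
    return f"{compact[:-3]} {compact[-3:]}"


def deprivation_decile(postcode):
    """Map postcode -> small area -> deprivation decile, or None if unknown."""
    lsoa = POSTCODE_TO_LSOA.get(normalise_postcode(postcode))
    if lsoa is None:
        return None
    return LSOA_TO_DECILE.get(lsoa)
```

Calling `deprivation_decile("zz11aa")` on the placeholder data gives 2, so a retailer would guess that address sits in a relatively deprived area. It is an area average, not a fact about the individual, which is the variation mentioned above.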
Having things you like on Facebook is super useful too. Ever wondered why companies want you to like them on Facebook? Once you like a page on Facebook, that business is able to market to you directly as well as correlate (at an aggregate level) what people who like their page also like. This helps them to target marketing on Facebook at those groups of people who are likely to also be interested in their products. This is fundamentally Facebook’s business model: to allow businesses to find and market to those people who might also be interested in their business or agenda.
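The "people who like this page also like" idea is just counting at heart. The toy sketch below shows the principle only. It is not Facebook’s API or method, and the users and page names are invented.

```python
from collections import Counter

# Invented example data: which pages each (anonymous) user has liked.
likes_by_user = {
    "u1": {"Bakery", "Cycling Club", "Coffee Shop"},
    "u2": {"Bakery", "Coffee Shop"},
    "u3": {"Bakery", "Cycling Club"},
    "u4": {"Coffee Shop", "Football Team"},
}


def also_liked(page, likes_by_user):
    """Count, in aggregate, the other pages liked by people who like `page`."""
    counts = Counter()
    for liked in likes_by_user.values():
        if page in liked:
            counts.update(liked - {page})
    return counts


bakery_audience = also_liked("Bakery", likes_by_user)
```

In this made-up data, two of the three people who like the bakery also like the coffee shop, so coffee shop fans look like a promising group to show bakery adverts to. No individual is identified; only the totals matter.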
The difference between this and CA? Facebook do it behind the scenes within their own business; CA did exactly the same thing from outside it. Every company you share data with on Facebook requires explicit approval, given by liking a page or approving an app. Everyone CA got data on had either approved an app or had security settings that allowed their data to be shared with apps via friends, something that was known at the time by those who were reasonably tech savvy. Many of my techier friends had adjusted their security settings to prevent this.
What became very clear when Mark Zuckerberg appeared before the Senate is that politicians do not understand the internet (the average age of a Senator is 62), or what some very smart people have been able to do to piece together thousands of pieces of data on an individual. People at large do not really understand the value that sits in their data, and marketeers don’t really want them to, as people would stop sharing it and make marketeers’ lives much, much harder.
Am I surprised this happened? Absolutely not, I assumed this type of targeting was going on in all elections by all sides. Some were more technically advanced than others but is that not the name of the game? Obama won in 2008 because his campaign understood better than his rivals that online marketing worked. To me CA just proves that some campaigners, albeit potentially nefarious foreign agents, understood the game better than others.
Does that mean Facebook were wrong? I don’t think so, it means that the political machine has not kept up with technological advances. Don’t blame the tools, blame the workmen. When the next election cycle comes this will all happen again but next time everyone will be using hyper targeting.
Google have been ‘reading’ my emails through Gmail since the early 2000s. I accepted it because it meant I got great, free email. I could have paid for an inferior service elsewhere that kept my messages private but that wasn’t the choice I made, along with millions of others.
Worried your personal information is all over the place? In most cases we put it there. We filled in our Facebook profile, wanted loyalty points from our favourite retailer, used messaging apps owned by social media companies to talk to our friends, used Gmail. We explicitly gave these companies access to our data. If that worries you then you can opt out. Don’t use these services, but don’t assume the corporations are there to protect you. They are there to make money and they will exploit every seemingly insignificant interaction you have with them to improve their bottom line.
At the end of the day we all have a choice about what we share, but people need to stop thinking that any data they put on these services is private and protected. The cost of many of these great, free services, be they social networks or loyalty cards, is that we are the product. Most of the time that helps us, as we get great money-saving offers, but it always helps the companies and their profits first.