How Cambridge Analytica’s Facebook Targeting Model Actually Worked – According to the Person Who Built It

Matthew Hindman

The researcher whose work is at the center of the Facebook-Cambridge Analytica data analysis and political advertising uproar has revealed that his method worked much like the one Netflix uses to recommend movies.

In an email to me, Cambridge University scholar Aleksandr Kogan explained how his statistical model processed Facebook data for Cambridge Analytica. The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age and gender.

If confirmed, Kogan’s account would mean the digital modeling Cambridge Analytica used was hardly the virtual crystal ball a few have claimed. Yet the numbers Kogan provides also show what is – and isn’t – actually possible by combining personal data with machine learning for political ends.

Regarding one key public concern, though, Kogan’s numbers suggest that information on users’ personalities or “psychographics” was just a small part of how the model targeted citizens. It was not a personality model strictly speaking, but rather one that boiled down demographics, social influences, personality and everything else into a big correlated lump. This soak-up-all-the-correlation-and-call-it-personality approach seems to have created a valuable campaign tool, even if the product being sold wasn’t quite as it was billed.
The promise of personality targeting

In the wake of the revelations that Trump campaign consultants Cambridge Analytica used data from 50 million Facebook users to target digital political advertising during the 2016 U.S. presidential election, Facebook has lost billions in stock market value, governments on both sides of the Atlantic have opened investigations, and a nascent social movement is calling on users to #DeleteFacebook.

But a key question has remained unanswered: Was Cambridge Analytica really able to effectively target campaign messages to citizens based on their personality characteristics – or even their “inner demons,” as a company whistleblower alleged?

If anyone would know what Cambridge Analytica did with its massive trove of Facebook data, it would be Aleksandr Kogan and Joseph Chancellor. It was their startup Global Science Research that collected profile information from 270,000 Facebook users and tens of millions of their friends using a personality test app called “thisisyourdigitallife.”

Part of my own research focuses on understanding machine learning methods, and my forthcoming book discusses how digital firms use recommendation models to build audiences. I had a hunch about how Kogan and Chancellor’s model worked.

So I emailed Kogan to ask. Kogan is still a researcher at Cambridge University; his collaborator Chancellor now works at Facebook. In a remarkable display of academic courtesy, Kogan answered.
His response requires some unpacking, and some background.
From the Netflix Prize to “psychometrics”

Back in 2006, when it was still a DVD-by-mail company, Netflix offered a prize of $1 million to anyone who developed a better way to make predictions about users’ movie rankings than the company already had. A surprise top competitor was an independent software developer using the pseudonym Simon Funk, whose basic approach was ultimately incorporated into all the top teams’ entries. Funk adapted a technique called “singular value decomposition,” condensing users’ ratings of movies into a series of factors or components – essentially a set of inferred categories, ranked by importance. As Funk explained in a blog post,

“So, for instance, a category might correspond to action movies, with movies with a lot of action at the top, and slow movies at the bottom, and correspondingly users who like action movies at the top, and those who prefer slow movies at the bottom.”

Factors are artificial categories, which are not always like the kind of categories humans would come up with. The most important factor in Funk’s early Netflix model was defined by users who loved films like “Pearl Harbor” and “The Wedding Planner” while also hating movies like “Lost in Translation” or “Eternal Sunshine of the Spotless Mind.” His model showed how machine learning can find correlations among groups of people, and groups of movies, that humans themselves would never spot.

Funk’s general approach used the 50 or 100 most important factors for both users and movies to make a decent guess at how every user would rate every movie. This method, often called dimensionality reduction or matrix factorization, was not new. Political science researchers had shown that similar techniques using roll-call vote data could predict the votes of members of Congress with 90 percent accuracy. In psychology the “Big Five” model had also been used to predict behavior by clustering together personality questions that tended to be answered similarly.

Still, Funk’s model was a major advance: It allowed the technique to work well with huge data sets, even those with lots of missing data – like the Netflix dataset, where a typical user rated only a few dozen films out of the thousands in the company’s library. More than a decade after the Netflix Prize competition ended, SVD-based methods, or related models for implicit data, are still the tool of choice for many websites to predict what users will read, watch, or buy.
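The factorization idea itself can be sketched in a few lines of Python. This is a toy illustration, not Funk’s actual code: the ratings matrix, the two “taste” factors and the movie counts are all invented here.

```python
import numpy as np

# Toy complete ratings matrix: 4 users x 5 movies, built from two
# hypothetical "taste" factors (say, action vs. romance) so its rank is 2.
user_factors = np.array([[5, 0], [4, 1], [0, 5], [1, 4]], dtype=float)
movie_factors = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0.5, 0.5]])
ratings = user_factors @ movie_factors.T

# Truncated SVD: keep only the top k factors, then rebuild the matrix.
# Each row of U scores a user on the inferred categories; each column
# of Vt scores a movie on them.
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Since this toy matrix really has only two underlying factors, the k = 2 reconstruction reproduces every rating. Funk’s practical breakthrough, by contrast, was estimating such factors by gradient descent using only the ratings that actually exist – a textbook SVD like the one above requires a complete matrix.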

These models can predict other things, too.
Facebook knows if you are a Republican

In 2013, Cambridge University researchers Michal Kosinski, David Stillwell and Thore Graepel published an article on the predictive power of Facebook data, using information gathered through an online personality test. Their initial analysis was nearly identical to that used on the Netflix Prize, using SVD to categorize both users and things they “liked” into the top 100 factors.

The paper showed that a factor model made with users’ Facebook “likes” alone was 95 percent accurate at distinguishing between black and white respondents, 93 percent accurate at distinguishing men from women, and 88 percent accurate at distinguishing people who identified as gay men from men who identified as straight. It could even correctly distinguish Republicans from Democrats 85 percent of the time. It was also useful, though not as accurate, for predicting users’ scores on the “Big Five” personality test.
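Why a likes-based factor model can separate groups so cleanly is easy to demonstrate on synthetic data. In this sketch (the users, pages and group structure are all invented; this is not the researchers’ data or code), the single top SVD factor of a binary likes matrix recovers which of two groups each user belongs to:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary "likes" matrix: 100 users x 40 pages.
# The first 50 users mostly like pages 0-19; the other 50 mostly
# like pages 20-39 (two groups with different tastes).
likes = np.zeros((100, 40))
likes[:50, :20] = rng.random((50, 20)) < 0.7
likes[:50, 20:] = rng.random((50, 20)) < 0.1
likes[50:, :20] = rng.random((50, 20)) < 0.1
likes[50:, 20:] = rng.random((50, 20)) < 0.7

# Center the columns and take the SVD; score each user on the top factor.
X = likes - likes.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, 0]

# Orient the factor so the first group scores positive, then classify
# users by the sign of their score on that one factor.
if scores[:50].mean() < 0:
    scores = -scores
accuracy = ((scores[:50] > 0).mean() + (scores[50:] <= 0).mean()) / 2
```

With group structure this strong, one factor classifies nearly every user correctly. Real Facebook data is messier, but the mechanism – correlated likes collapsing into a factor that doubles as a group label – is the same one the paper exploited.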

There was public outcry in response; within weeks Facebook had made users’ likes private by default.

Kogan and Chancellor, also Cambridge University researchers at the time, were starting to use Facebook data for election targeting as part of a collaboration with Cambridge Analytica’s parent firm SCL. Kogan invited Kosinski and Stillwell to join his project, but it didn’t work out. Kosinski reportedly suspected Kogan and Chancellor might have reverse-engineered the Facebook “likes” model for Cambridge Analytica. Kogan denied this, saying his project “built all our models using our own data, collected using our own software.”
What did Kogan and Chancellor actually do?

As I followed the developments in the story, it became clear Kogan and Chancellor had indeed collected plenty of their own data through the thisisyourdigitallife app. They certainly could have built a predictive SVD model like that featured in Kosinski and Stillwell’s published research.

So I emailed Kogan to ask if that was what he had done. Somewhat to my surprise, he wrote back.

“We didn’t just use SVD,” he wrote, noting that SVD can struggle when some users have many more “likes” than others. Instead, Kogan explained, “The technique was something we actually developed ourselves … It’s not something that is in the public domain.” Without going into details, Kogan described their method as “a multi-step co-occurrence approach.”

However, his message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.
How accurate was it?

Kogan suggested the exact model used doesn’t matter much, though – what matters is the accuracy of its predictions. According to Kogan, the “correlation between predicted and actual scores … was around [30 percent] for all the personality dimensions.” By comparison, a person’s previous Big Five scores are about 70 to 80 percent accurate in predicting their scores when they retake the test.

Kogan’s accuracy claims cannot be independently verified, of course. And anyone in the midst of such a high-profile scandal might have incentive to understate his or her contribution. In his appearance on CNN, Kogan explained to an increasingly incredulous Anderson Cooper that, in fact, the models had actually not worked very well.

Aleksandr Kogan answers questions on CNN.

In fact, the accuracy Kogan claims seems a bit low, but plausible. Kosinski, Stillwell and Graepel reported comparable or slightly better results, as have several other academic studies using digital footprints to predict personality (though some of those studies had more data than just Facebook “likes”). It is surprising that Kogan and Chancellor would go to the trouble of designing their own proprietary model if off-the-shelf solutions would seem to be just as accurate.

Importantly, though, the model’s accuracy on personality scores allows comparisons of Kogan’s results with other research. Published models with equivalent accuracy in predicting personality are all much more accurate at guessing demographics and political variables.

For instance, the similar Kosinski-Stillwell-Graepel SVD model was 85 percent accurate in guessing political party affiliation, even without using any profile information other than likes. Kogan’s model had similar or better accuracy. Adding even a small amount of information about friends or users’ demographics would likely boost this accuracy above 90 percent. Guesses about gender, race, sexual orientation and other characteristics would likely be more than 90 percent accurate too.

Critically, these guesses would be particularly good for the most active Facebook users – the people the model was primarily used to target. Users with less activity to analyze are likely not on Facebook much anyway.
When psychographics is mostly demographics

Knowing how the model is built helps explain Cambridge Analytica’s apparently contradictory statements about the role – or lack thereof – that personality profiling and psychographics played in its modeling. They’re all technically consistent with what Kogan describes.

A model like Kogan’s would give estimates for every variable available on any group of users. That means it would automatically estimate the Big Five personality scores for every voter. But these personality scores are the output of the model, not the input. All the model knows is that certain Facebook likes, and certain users, tend to be grouped together.

With this model, Cambridge Analytica could say that it was identifying people with low openness to experience and high neuroticism. But the same model, with the exact same predictions for every user, could just as accurately claim to be identifying less educated older Republican men.
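A small sketch makes the point concrete. The numbers below are entirely synthetic (invented latent factors and made-up labels, not anything from Kogan’s model): once the latent factors exist, estimating a “personality” score and estimating a demographic label are the identical least-squares operation on the identical inputs – only the training labels differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical latent factors for 200 users (as if from an SVD of likes),
# plus an intercept column for the regressions below.
factors = rng.normal(size=(200, 5))
X = np.column_stack([np.ones(200), factors])

# Two synthetic targets that happen to load on the same factor:
# a "neuroticism" score and a party-membership flag.
neuroticism = factors[:, 0] + 0.1 * rng.normal(size=200)
party = (factors[:, 0] > 0).astype(float)

# The same fitting step serves both targets.
w_pers, *_ = np.linalg.lstsq(X, neuroticism, rcond=None)
w_party, *_ = np.linalg.lstsq(X, party, rcond=None)

pers_pred = X @ w_pers
party_pred = (X @ w_party > 0.5).astype(float)
```

Both predictions come out accurate here because both labels are correlated with the same underlying factor – which is exactly why the same model can be marketed as either psychographic or demographic targeting.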

Kogan’s information also helps clarify the confusion about whether Cambridge Analytica actually deleted its trove of Facebook data, when models built from the data seem to still be circulating, and even being developed further.

The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It’s as if Cambridge Analytica took a very high-resolution photo, resized it to be smaller, and then deleted the original. The photo still exists – and as long as Cambridge Analytica’s models exist, the data effectively does too.
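Back-of-the-envelope arithmetic shows how much smaller the “resized photo” is. The 50 million users comes from the story; the page count and factor count here are illustrative assumptions, not reported figures:

```python
# Cells needed to store the full user-by-page matrix versus a factor
# model that keeps k scores per user and k scores per page.
n_users, n_pages, k = 50_000_000, 100_000, 100

full_matrix_cells = n_users * n_pages          # the "original photo"
factor_model_cells = k * (n_users + n_pages)   # the "resized" copy
compression = full_matrix_cells / factor_model_cells
```

With these numbers the factor model is roughly a thousand times smaller than the raw matrix, yet it preserves exactly the correlational structure the targeting relies on – which is why deleting the raw data alone would not destroy the capability.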