How Cambridge Analytica's Facebook Targeting Model Actually Worked - According To The Individual Who Built It

by Matthew Hindman

In an email to me, Cambridge University scholar Aleksandr Kogan explained how his statistical model processed Facebook data for Cambridge Analytica. The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age and gender.

If confirmed, Kogan’s account would mean the digital modeling Cambridge Analytica used fell short of what a few have claimed. Yet the numbers Kogan provides also show what is - and isn’t - actually possible by combining personal data with machine learning for political ends.

Regarding one key public concern, though, Kogan’s numbers suggest that information on users’ personalities or “psychographics” was only a modest part of how the model targeted citizens. It was not a personality model strictly speaking, but rather one that boiled down demographics, social influences, personality and everything else into a big correlated lump. This soak-up-all-the-correlation-and-call-it-personality approach seems to have created a valuable campaign tool, even if the product being sold wasn’t quite as it was billed.

The promise of personality targeting

In the wake of the revelations that Trump campaign consultants Cambridge Analytica used data from 50 million Facebook users to target digital political advertising during the 2016 U.S. presidential election, Facebook has [...]

[...] independent software developer using the pseudonym Simon Funk, whose basic approach was ultimately incorporated into all the top teams’ entries. Funk adapted a technique called “singular value decomposition,” condensing users’ ratings of movies into a series of factors or components - essentially a set of inferred categories, ranked by importance. As Funk explained in a blog post,

“So, for instance, a category might represent action movies, with movies with a lot of action at the top, and slow movies at the bottom, and correspondingly users who like action movies at the top, and those who prefer slow movies at the bottom.”

Factors are artificial categories, which are not always like the kind of categories humans would come up with. The most important factor in Funk’s early Netflix model was defined by users who loved films like “Pearl Harbor” and “The Wedding Planner” while also hating movies like “Lost in Translation” or “Eternal Sunshine of the Spotless Mind.” His model showed how machine learning can find correlations among groups of people, and groups of movies, that humans themselves would never spot.

Funk’s general approach used the 50 or 100 most important factors for both users and movies to make a decent guess at how every user would rate every movie. This method, often called dimensionality reduction or matrix factorization, was not new. Political science researchers had shown that similar techniques using roll-call vote data could predict the votes of members of Congress with 90 percent accuracy. In psychology the “Big Five” model had also been used to predict behavior by clustering together personality questions that tended to be answered similarly.

Still, Funk’s model was a big advance: It allowed the technique to work well with huge data sets, even those with lots of missing data - like the Netflix dataset, where a typical user rated only a few dozen films out of the thousands in the company’s library. More than a decade after the Netflix Prize contest ended, SVD-based methods, or related models for implicit data, are still the tool of choice for many websites to predict what users will read, watch, or buy.
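To make the idea concrete, here is a minimal sketch of Funk-style matrix factorization: user and movie factors are fitted by stochastic gradient descent over the observed ratings only, so missing ratings never have to be filled in. It is an illustration under stated assumptions, not Funk's or Netflix's actual code: the function names (`train_factors`, `predict`, `rmse`), the hyperparameters and the tiny ratings table are all invented for this example.

```python
import random

def train_factors(ratings, n_users, n_items, n_factors=2,
                  lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user and item factors by SGD, looping over observed ratings only."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    # Small random starting values for every factor.
    users = [[rng.uniform(-0.01, 0.01) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(-0.01, 0.01) for _ in range(n_factors)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(users[u], items[i])))
            for f in range(n_factors):
                uf, vf = users[u][f], items[i][f]
                # Nudge both factors to shrink the error, with mild regularization.
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return mean, users, items

def predict(model, u, i):
    """Predicted rating: global mean plus the user-item factor dot product."""
    mean, users, items = model
    return mean + sum(a * b for a, b in zip(users[u], items[i]))

def rmse(model, ratings):
    """Root-mean-square error over the given (user, item, rating) triples."""
    total = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

# Invented toy data: users 0-1 like movies 0-1 and dislike movies 2-3,
# users 2-3 the reverse. Only 10 of the 16 possible ratings are observed.
ratings = [
    (0, 0, 5), (0, 1, 5), (0, 2, 1),
    (1, 0, 5), (1, 2, 1), (1, 3, 1),
    (2, 0, 1), (2, 2, 5),
    (3, 1, 1), (3, 3, 5),
]

untrained = train_factors(ratings, n_users=4, n_items=4, epochs=0)
model = train_factors(ratings, n_users=4, n_items=4)

print("RMSE before and after training:", rmse(untrained, ratings), rmse(model, ratings))
# Guesses for two ratings that were never observed.
print("user 1 on movie 1:", predict(model, 1, 1))
print("user 3 on movie 0:", predict(model, 3, 0))
```

The two unseen pairs at the end are the point of the exercise: the model has no rating for them, yet the learned factors place each user and movie on the same inferred scale, which is what allows a guess.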

These models can predict other things, too.

Facebook knows if you are a Republican

In 2013, Cambridge University researchers Michal Kosinski, David Stillwell and Thore Graepel published an article on the predictive power of Facebook data, using information gathered through an online personality test. Their initial analysis was nearly identical to that used on the Netflix Prize, using SVD to categorize both users and things they “liked” into the top 100 factors.

The paper showed that a factor model made with users’ Facebook “likes” alone was 95 percent accurate at distinguishing between black and white respondents, 93 percent accurate at distinguishing men from women, and 88 percent accurate at distinguishing people who identified as gay men from men who identified as straight. It could even correctly distinguish Republicans from Democrats 85 percent of the time. It was also useful, though not as accurate, for predicting users’ scores on the “Big Five” personality test.

There was public outcry in response; within weeks Facebook had made users’ likes private by default.

Kogan and Chancellor, also Cambridge University researchers at the time, were starting to use Facebook data for election targeting as part of a collaboration with Cambridge Analytica’s parent firm SCL. Kogan invited Kosinski and Stillwell to join his project, but it didn’t work out. Kosinski reportedly suspected Kogan and Chancellor might have reverse-engineered the Facebook “likes” model for Cambridge Analytica. Kogan denied this, saying his project “built all our models using our own data, collected using our own software.”

What did Kogan and Chancellor actually do?

As I followed the developments in the story, it became clear Kogan and Chancellor had indeed collected plenty of their own data through the thisisyourdigitallife app. They certainly could have built a predictive SVD model like that featured in Kosinski and Stillwell’s published research.

So I emailed Kogan to ask if that was what he had done. Somewhat to my surprise, he wrote back.

“We didn’t exactly use SVD,” he wrote, noting that SVD can struggle when some users have many more “likes” than others. Instead, Kogan explained, “The technique was something we actually developed ourselves … It’s not something that is in the public domain.” Without going into details, Kogan described their method as “a multi-step co-occurrence approach.”

However, his message went on to confirm that his approach was indeed similar to SVD or other matrix factorization methods, like in the Netflix Prize competition, and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.

How accurate was it?

Kogan suggested the exact model used doesn’t matter much, though - what matters is the accuracy of its predictions. According to Kogan, the “correlation between predicted and actual scores … was around [30 percent] for all the personality dimensions.” By comparison, a person’s previous Big Five scores are about 70 to 80 percent accurate in predicting their scores when they retake the test.
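One way to put these figures side by side is to treat them as Pearson correlations and square them, which gives the share of variance in actual scores that the predictions account for. The sketch below rests on an assumption about how to read the numbers: that “[30 percent]” means a correlation of roughly 0.30 and that the retest figures mean correlations of roughly 0.70 to 0.80. The helper names and the small score lists are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def variance_explained(r):
    """Share of variance accounted for by a linear relationship with correlation r."""
    return r * r

# Invented scores, only to show how the correlation would be computed.
predicted = [3.1, 2.8, 3.6, 3.0, 2.5, 3.3]
actual = [3.5, 2.0, 3.2, 4.1, 2.9, 2.6]
print("toy correlation:", pearson_r(predicted, actual))

# Assumed readings of the figures quoted in the text.
for label, r in [("model prediction", 0.30), ("retest, low end", 0.70), ("retest, high end", 0.80)]:
    print(label, "r =", r, "variance explained =", round(variance_explained(r), 2))
```

On that reading, a correlation of 0.30 accounts for about 9 percent of the variance in personality scores, against roughly 49 to 64 percent for a person's own earlier test result.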

Kogan’s accuracy claims cannot be independently verified, of course. And anyone in the midst of such a high-profile scandal might have incentive to understate his or her contribution. In his appearance on CNN, Kogan explained to an increasingly incredulous Anderson Cooper that, in fact, the models had actually not worked very well.

Aleksandr Kogan answers questions on CNN.

In fact, the accuracy Kogan claims seems a bit low, but plausible. Kosinski, Stillwell and Graepel reported comparable or slightly better results, as have several other academic studies using digital footprints to predict personality (though some of those studies had more data than just Facebook “likes”). It is surprising that Kogan and Chancellor would go to the trouble of designing their own proprietary model if off-the-shelf solutions would seem to be just as accurate.

Importantly, though, the model’s accuracy on personality scores allows comparisons of Kogan’s results with other research. Published models with equivalent accuracy in predicting personality are all much more accurate at guessing demographics and political variables.

For instance, the similar Kosinski-Stillwell-Graepel SVD model was 85 percent accurate in guessing party affiliation, even without using any profile information other than likes. Kogan’s model had similar or better accuracy. Adding even a small amount of information about friends or users’ demographics would likely boost this accuracy above 90 percent. Guesses about gender, race, sexual orientation and other characteristics would probably be more than 90 percent accurate too.

Critically, these guesses would be especially good for the most active Facebook users - the people the model was primarily used to target. Users with less activity to analyze are likely not on Facebook much anyway.

When psychographics is mostly demographics

Knowing how the model is built helps explain Cambridge Analytica’s seemingly contradictory statements about the role - or lack thereof - that personality profiling and psychographics played in its modeling. They’re all technically consistent with what Kogan describes.

A model like Kogan’s would give estimates for every variable available on any group of users. That means it would automatically estimate the Big Five personality scores for every voter. But these personality scores are the output of the model, not the input. All the model knows is that certain Facebook likes, and certain users, tend to be grouped together.

With this model, Cambridge Analytica could say that it was identifying people with low openness to experience and high neuroticism. But the same model, with the exact same predictions for every user, could just as accurately claim to be identifying less educated older Republican men.
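The point about labels can be shown with a few lines of code. In this sketch the targeting decision looks only at a single latent factor score per user; the personality and demographic columns are used afterwards, purely to describe who was picked. Every number, the 0.5 cutoff and the choice of attributes are invented for illustration and do not come from Kogan's data.

```python
from statistics import mean

# Invented toy rows: (latent_factor_score, neuroticism_1_to_5, age).
# Both attributes happen to rise with the latent score.
users = [
    (0.9, 4.5, 61),
    (0.8, 4.1, 58),
    (0.7, 3.9, 66),
    (0.2, 3.0, 40),
    (0.1, 2.6, 35),
    (-0.3, 2.8, 29),
    (-0.6, 2.2, 33),
    (-0.8, 2.0, 24),
]

# The model's selection rule sees only the latent score (cutoff is arbitrary).
target = [u for u in users if u[0] > 0.5]
rest = [u for u in users if u[0] <= 0.5]

def describe(group):
    """Summarize a group after the fact, using attributes the selection never saw."""
    return {
        "neuroticism": mean(u[1] for u in group),
        "age": mean(u[2] for u in group),
    }

print("targeted:", describe(target))
print("everyone else:", describe(rest))
```

The same three targeted users can be billed as “high neuroticism” or as “older”, depending on which column the summary emphasizes; the list of people does not change.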

Kogan’s information also helps clarify the confusion about whether Cambridge Analytica actually deleted its trove of Facebook data, when models built from the data seem to still be circulating, and even being developed further.

The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It’s as if Cambridge Analytica took a very high-resolution photograph, resized it to be smaller, and then deleted the original. The photo still exists - and as long as Cambridge Analytica’s models exist, the data effectively does too.
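The photograph analogy can be sketched in code: compress a small table of likes into a handful of factors, discard the table, and rebuild an approximation from the factors alone. This is an illustration under assumptions, not Cambridge Analytica's method. It uses plain power iteration with deflation as a stand-in for a library SVD routine, and the `likes` table and all function names are invented.

```python
from math import sqrt

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def leading_direction(A, iters=100):
    """Power iteration on A^T A: a unit vector near the top right singular vector."""
    At = [list(col) for col in zip(*A)]
    v = [1.0 / (j + 1) for j in range(len(A[0]))]  # arbitrary nonzero start
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def compress(A, k):
    """Return k (user_scores, page_loadings) pairs that together approximate A."""
    residual = [row[:] for row in A]
    factors = []
    for _ in range(k):
        v = leading_direction(residual)
        u = matvec(residual, v)  # each user's score on this factor
        factors.append((u, v))
        # Remove what this factor explains before looking for the next one.
        residual = [[r - ui * vj for r, vj in zip(row, v)]
                    for row, ui in zip(residual, u)]
    return factors

def reconstruct(factors, n_rows, n_cols):
    """Rebuild an approximate table from the stored factors alone."""
    return [[sum(u[i] * v[j] for u, v in factors) for j in range(n_cols)]
            for i in range(n_rows)]

def error(A, B):
    """Frobenius distance between two tables of the same shape."""
    return sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))

# Invented table: 6 users (rows) by 4 pages (columns), 1 = liked.
likes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

err1 = error(likes, reconstruct(compress(likes, 1), 6, 4))
factors2 = compress(likes, 2)
err2 = error(likes, reconstruct(factors2, 6, 4))
print("reconstruction error with 1 factor:", err1)
print("reconstruction error with 2 factors:", err2)
```

Each added factor brings the rebuilt table closer to the original, which is the sense in which a retained model can keep much of the information in data that has nominally been deleted. On a table this small the factors are barely smaller than the data; the saving only becomes large when millions of users and pages are reduced to a few dozen factors each.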

Matthew Hindman, Associate Professor of Media and Public Affairs, George Washington University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

