What is Google’s Federated Learning of Cohorts (FLoC)?

In 2019 advertisers spent an estimated 19 billion dollars on audience data and related solutions. A significant chunk of that goes to 3rd party data and Data Management Platforms (DMPs). With the removal of the third party cookie (and related workarounds), most of that money is at risk. If a proposal like FLoC is implemented, the entire 3rd party data landscape will change. Data vendors will either cease to exist or have to entirely change their business model. DMPs will likely be replaced by Customer Data Platforms. Predictions aside, let's take a look at what Google is proposing as the replacement for 3rd party audience segments.

Current State

Before we look at FLoC, here is a brief summary of how the landscape looks today (or skip to the analysis). When a user visits a website, the website owner (publisher) has placed many different pixels/bits of code that collect data about that user. With many different publishers using these pixels, data vendors record a detailed history of what websites a user visits and the actions they take on those sites. Additionally, companies like LiveRamp can import data from in-person sales (Steve bought a TV at BestBuy for $2,300 in Dallas, TX) and merge that offline data with the user's online behavior in their database.

Data vendors then use this wealth of data to create audience segments. This 170-page PDF from Oracle is a nearly exhaustive list of info about these vendors – it contains example segments like “in market for a new car”, “previously traveled to France”, “heavy reader of finance journals”, “employed by IBM”, etc.

Below are examples of the collection methodology of a few data vendors which show how and where they collect user data:

[Data vendor has] Accurate, actionable data compiled from a diverse network of online and offline publishers, and qualified premium data partners. [Data vendor] leverages all data types across online, location, offline, etc., across its 100+ terabyte data lake.

For more than seven years, [Data vendor] has created advanced data models for Fortune 1000 brands. With code on more than one million publisher sites, our publisher network yields more than 30 billion intent and interest signals from content consumption, copy and paste sharing, search keywords, and social behaviors.

Aggregated spending insights such as high or frequent spend are culled from U.S. Visa credit- and debit-card transactions. These insights are then combined with Oracle Data Cloud demographic, purchase, and other data to create Visa Audiences powered by Oracle. Visa aggregates and de-identifies all transactional data output for Visa Audiences to protect cardholder and merchant privacy.

Federated Learning of Cohorts

Replacing the above ecosystem is a monster task. Three stakeholders and their goals need to be accounted for: Users and their privacy, Publishers and their revenue, and Advertisers and their capabilities to advertise effectively.

In this new world, when you visit a website, your browsing data will never leave your web browser. Data vendors will not be able to collect and analyze your behavioral data. Instead, the browser will use a new type of machine learning to compare your behavior against billions of users and generate flocks (segments) of similar users. I won’t pretend to fully understand this machine learning, but here is a metaphor:

  • A recipe author publishes a recipe for a birthday cake
  • You read the recipe, but realize it could be better
  • You bake a few cakes testing different ingredients
  • You go back to the recipe author and propose changing 2 ingredients (more sugar, less salt)
  • 1,000 other people do the same – testing other cakes, proposing changes
  • The author groups users by their proposed changes
    • You and 350 others want it sweeter (more sugar)
    • 425 others want it less sweet (less sugar)

This is federated learning. The recipe author has no idea which ingredients you tested or what test cakes anyone baked, only that you proposed 2 changes. What they do know is the set of changes everyone proposed.

The author can now group users into segments based on the changes. You and 350 other people want it sweeter (more sugar). 425 other people want it less sweet (less sugar). We now have two flocks (segments) that are created based on users' desires, but no data was shared about what cakes each person tested.
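As a toy illustration of the cake metaphor only (this is not FLoC's actual algorithm), the sketch below keeps each baker's test cakes on their own side and passes just a proposed sugar change to the author, who then groups bakers by the direction of that change. The names `ProposedChange`, `propose_change`, and `group_by_change`, along with all the numbers, are made up for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProposedChange:
    """What a baker sends back: only the adjustment, never the test cakes."""
    baker_id: int
    sugar_delta: float  # grams added (+) or removed (-) relative to the recipe


def propose_change(baker_id, base_sugar, private_test_cakes):
    """Runs on the baker's side.

    private_test_cakes is a list of (sugar_grams, rating) pairs.
    Only the difference between the best-rated cake and the recipe is returned.
    """
    best_sugar, _ = max(private_test_cakes, key=lambda cake: cake[1])
    return ProposedChange(baker_id, best_sugar - base_sugar)


def group_by_change(changes):
    """Runs on the author's side, which only ever sees ProposedChange objects."""
    flocks = defaultdict(list)
    for change in changes:
        if change.sugar_delta > 0:
            label = "sweeter"
        elif change.sugar_delta < 0:
            label = "less_sweet"
        else:
            label = "unchanged"
        flocks[label].append(change.baker_id)
    return dict(flocks)


BASE_SUGAR = 200.0  # hypothetical grams of sugar in the published recipe

# Hypothetical private data: each baker's test cakes as (sugar_grams, rating).
private_kitchens = {
    1: [(200.0, 6), (240.0, 9), (180.0, 4)],
    2: [(200.0, 5), (160.0, 8)],
    3: [(200.0, 7), (230.0, 8)],
}

changes = [
    propose_change(baker_id, BASE_SUGAR, cakes)
    for baker_id, cakes in private_kitchens.items()
]
flocks = group_by_change(changes)
print(flocks)
```

The author ends up with a "sweeter" group and a "less_sweet" group without ever seeing `private_kitchens`, which is the point of the metaphor.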

Bringing this back to advertising, a user’s web browsing history is never shared with anyone. But at a data vendor* an AI has an idea about which users fall into which flocks. A user’s browser can see that model and, based on the user's behavior, send proposed changes to the AI. (Ex. User 12383 probably belongs in flock A74 and not in B89).

*The proposal is unclear on whether data vendors are in the picture; Google might be in this role. This is also an over-simplification of how the process works and is mildly incorrect.

Real words pulled from Google’s Federated Learning announcement in 2017

This strategy largely accomplishes Google's intentions:

  1. User data is kept entirely private, and individuals cannot be identified
  2. This acts as a replacement for 3rd party data, which drives higher CPMs, so publishers do not lose revenue
  3. Advertisers can still target users they think will be in-market for a product

Drawbacks and Changes

In the example above it is not clear why different users would be grouped into different flocks, besides a hand-wavy “their behavior is similar”. Additionally, we don’t know how (or if) those flocks would be labelled.

Legal Ramifications

Grouping users into segments without knowing the rules for segmentation creates legal issues. If you’ve ever worked at an agency or data vendor you’ve inevitably handled requests from financial institutions, healthcare/insurance companies, and real estate companies about how data is sourced. These companies follow strict legal guidelines about discrimination. FLoC’s current functionality would mean none of those industries could use it as they’d likely be unintentionally discriminating and breaking the law.

Personas are back. Audience segments are dead.

FLoC only allows an advertiser to target a group of users who behave similarly. Sound familiar? This is personas coming back to life. No longer would BestBuy target users in-market for tech products; instead they’d target high income, mid-20s males living in urban areas.

Who labels segments? Are data vendors dead?

The proposal from Google doesn’t provide answers on how segments would be labelled, outside of recommending short nondescript names (A56E) for privacy reasons. Without any labeling, FLoCs are really just semi-informed A/B tests.

While they’ve yet to be mentioned in discussions, I can see a world where data vendors are able to run analysis on flocks and provide insights to advertisers, similar to their role today. It would theoretically be possible for an advertiser to compare their 1st party data against flock data to get a ranking of the most relevant flocks (their target market).

Similarly, a data vendor may be able to analyze flocks to create personas for each flock.

“FLoC is for grouping together people whose browsing behavior seems “kinda similar”. I wouldn’t expect any one flock to be full of “users who are interested in international travel”. But maybe if you took the tens of thousands of flocks and ranked them by interest in international travel, you’d find that a hundred of them stood out as highly enriched for the people you’re trying to advertise to.” 

Michael Kleber, Google
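To make the "rank the flocks" idea concrete, here is a minimal sketch of how an advertiser might score flocks against their 1st party data. It assumes the advertiser can somehow count how many of their known customers fall in each flock and has an estimate of each flock's size; the proposal does not spell out either step, so the function `rank_flocks_by_lift`, the flock IDs, and every count below are hypothetical.

```python
def rank_flocks_by_lift(customers_per_flock, users_per_flock):
    """Rank flocks by how over-represented the advertiser's customers are.

    lift = (customer rate inside the flock) / (customer rate across all flocks)
    A lift above 1 means the flock is enriched for the advertiser's customers.
    """
    total_customers = sum(customers_per_flock.values())
    total_users = sum(users_per_flock.values())
    baseline_rate = total_customers / total_users

    lifts = {}
    for flock_id, flock_size in users_per_flock.items():
        flock_rate = customers_per_flock.get(flock_id, 0) / flock_size
        lifts[flock_id] = flock_rate / baseline_rate

    # Highest lift first: these are the flocks worth targeting.
    return sorted(lifts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs: estimated flock sizes and observed 1st party customers.
users_per_flock = {"A56E": 5000, "A74": 4000, "B89": 6000, "C12": 5000}
customers_per_flock = {"A56E": 50, "A74": 200, "B89": 30, "C12": 120}

ranking = rank_flocks_by_lift(customers_per_flock, users_per_flock)
for flock_id, lift in ranking:
    print(f"{flock_id}: lift {lift:.2f}")
```

With these made-up numbers, A74 and C12 come out enriched (lift above 1) while A56E and B89 do not. No flock needs a human-readable label for this to work, which is why unlabelled flocks could still be useful to an advertiser with enough 1st party data.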

In Summary

This article you’ve just read is actually longer than Google’s FLoC proposal. The proposal is entirely focused on technical theory and doesn’t explore any impacts to the adtech ecosystem. Within the Privacy Sandbox this is the only replacement for 3rd party audiences. Because this idea is like 2% of the way towards implementation and drastically affects almost 20 billion dollars, I don’t think it will be the winner. Put your disagreements in the comments.

Additional reading:

FLoC proposal on github

Federated learning announcement (2017)

Weird comic on federated learning from Google

Real technical stuff
