Data / Web Scraping - Metaverse Law

18 Dec

CalPrivacy’s Data Broker Enforcement Strike Force: updates and enforcement actions

Adtech, California, Data / Web Scraping, Data Brokers

On November 26, 2025, CalPrivacy (previously the CPPA) issued a decision requiring ROR Partners LLC to pay $56,600 for failure to register as a data broker under California’s Delete Act. According to the decision, the company used “billions of data points” from over 262 million Americans to create consumer profiles and audience lists, which ROR’s clients could then use for targeted advertising. This action was brought as part of CalPrivacy’s Data Broker Enforcement Strike Force, designed to investigate privacy violations by the data broker industry. As part of this effort, CalPrivacy recently issued an Enforcement Advisory highlighting data broker registration requirements related to trade names, websites and parent/subsidiary entities of data brokers. What is a data broker? By law, a data broker is defined as “a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship,” with limited exceptions for certain entities covered under other sector-specific laws. In short, they are companies that collect and sell a consumer’s personal information without directly interacting with that consumer. Data brokers commonly collect information such as email, phone number, browsing history, or location data from places like public records, commercial data, and other sources. Data brokers often then analyze, bundle and sell these profiles about consumers to other businesses. According to CalPrivacy’s DROP website, “[t]his information can be used to influence you – to buy certain products, to feel certain emotions, or even take certain actions. It can put you at greater risk of identity theft, fraud, or AI impersonations. It can also increase the chances your data is leaked or hacked.” What is the Data Broker Enforcement Strike Force? On November 19, one week prior to the ROR decision, CalPrivacy announced its creation of the Data Broker Enforcement Strike Force within its Enforcement Division. According to the announcement, “[t]he Enforcement Division will be reviewing the [data broker] industry for compliance with the data broker registration requirement in the Delete Act, as well as for compliance with the state’s comprehensive privacy law, the California Consumer Privacy Act (CCPA).” This is not the first time the California regulator has targeted data brokers. In 2024, the Enforcement Division conducted a public investigative sweep of data broker registration with a similar goal of verifying compliance with the Delete Act and the CCPA. What is the Delete Act? The Delete Act is a law that applies to data brokers and requires them to register with CalPrivacy and pay an annual fee. Additionally, data brokers must also disclose:

The number of consumer deletion requests they have received, as well as their average response time;
Whether the data broker collects certain types of sensitive information or the personal information of minors; and,
A link on their website informing customers of their rights under the CCPA.

Entities covered under the Act must register by January 31 if they operated as a data broker in the previous year, and they face a $200 penalty per day for failure to register. As of 2024, the data broker registry is maintained by CalPrivacy. The annual fee funds the registry, along with the new mechanism for allowing deletion of personal information from data brokers, called “DROP.” What is DROP? The first-of-its-kind deletion mechanism, the Data Broker Requests and Opt-Out Platform (DROP) will allow consumers to file a single request, which directs all registered data brokers to delete the consumers’ personal information immediately, and continuously every 45 days. According to the DROP website, the data that is subject to DROP may include:

Basic identifiers, including name, phone number, or email.
Behavioral data, including social media or browsing history, likes and dislikes.
Financial-related data, including payment history or spending habits.
Health-related data, including your usage of health-related apps, wearables, trackers or websites.
Location data, including where you go and how often you visit certain places.
Relationships, including your family and friends and how often you interact with them.
Inferences, including those about your lifestyle, hobbies, incomes, or even religious or philosophical beliefs, which can include history of the videos you watch, articles you read, or topics you search for.

However, the law has certain exemptions for information that is not required to be deleted. This includes information that the government makes public (property records, court filings, etc.) or information controlled by other state or federal laws, such as certain financial or health information. The intent behind the mechanism is to give consumers more control over their personal information and helps protect their privacy. DROP is expected to be available to consumers on January 1, 2026. What’s next? With the release of DROP and the establishment of the Data Broker Enforcement Strike Force, California is positioned to take data broker enforcement seriously. The decision against ROR Partners LLC was finalized one week after the Strike Force was announced, and all signs say this is the first of many enforcement efforts under this regulatory push. If your company or organization may be acting as a data broker, it is important that you understand your obligations under laws like the California Delete Act, but also other state laws. These laws may have requirements like registering as a data broker, publishing a clear privacy notice, providing specific opt-outs, and reporting certain disclosures.

16 Jan

Image depicting the flag of Texas, which is blue, white, and red, with a lone white star.

Texas sues Allstate, continuing Lone Star’s focus on vehicle data regulation

Dale Rappaneau

Data / Web Scraping, Data Brokers, United States, Vehicle Data Data Privacy, State Privacy Laws, Vehicle Data

Update: On Jan. 29, 2025, it was reported that on Jan. 12, 2025, Texas sent Kia America, Inc., a notice of their alleged violations of the Texas Data Privacy and Security Act. Kia has 30 days to cure the alleged violations. Everything is bigger in Texas, including data privacy enforcement. On January 13, 2025, Texas continued its recent regulatory focus on vehicular and geolocation data by initiating a lawsuit against Allstate and its subsidiary, Arity, alleging the companies violated numerous consumer protection and data privacy laws by unlawfully collecting, using, and selling personal, vehicular, and location data without consumers’ knowledge or consent. What led to this lawsuit? For more than half a year, Texas has been leading the regulatory enforcement of vehicular and geolocation data practices. On June 6, 2024, Texas Attorney General Ken Paxton announced that his office had opened an investigation into various car manufacturers “after widespread reporting” that those manufacturers had been secretly collecting mass amounts of data about drivers and selling that data to third parties. The “widespread reporting” cited by Paxton as the seed for the investigation was most likely a nod toward the Mozilla Foundation’s “Privacy Not Included” report published in September of 2023. The report expressly declared that modern vehicles are a “privacy nightmare” and that all 25 car brands researched for the report were labeled as having the worst privacy ever reviewed by the Foundation. The investigation initiated by Paxton in June of 2024 eventually led to a lawsuit against General Motors, filed on August 13, 2024, alleging the company engaged in false, deceptive, and misleading business practices related to its unlawful collection and sale of driving data to insurance companies without the consumers’ knowledge or consent. Following this suit, Paxton’s office sent a notice in November of 2024 to Arity, LLC, a data analytics company founded in 2016 by Allstate, alleging that Arity was in violation of Texas’s recently enacted state privacy law, the Texas Data Privacy and Security Act (the “TDPSA”). The notice identified specific provisions of the TDPSA that Arity was allegedly violating and requested that Arity cure the violations within 30 days, in accordance with the TDPSA’s cure period. But according to Texas’s petition against Allstate and Arity filed on January 13, 2025, Arity failed to cure the alleged TDPSA violations within the 30-day cure period, thereby allowing Texas to include these alleged violations in the lawsuit. What did Allstate and Arity allegedly do? According to Texas’s petition, defendants Allstate and Arity developed and integrated software into third-party apps so that when consumers downloaded the third-party app, they also “unwittingly” downloaded the defendants’ software. The defendants presented the software as “providing a necessary function,” but Texas claims the software does little more than scrape data from the third-party app. Once downloaded, the defendants’ software, through the third-party apps, monitored the consumer’s location and movement “in real-time” and collected trillions of miles of consumer driving data, including geolocation data, accelerometer data, gyroscopic data, and more. The defendants then sold that data to third parties or used it for Allstate’s insurance underwriting. To encourage third parties to integrate the defendants’ software, the defendants paid app developers and offered an incentive program that provided “generous bonus incentives” if developers increased the size of their dataset. All the while, according to Texas, consumers did not consent to, nor were they made aware of, the full extent of defendants’ collection and sale of data. Instead, defendants entered into agreements with the third-party app developers to mandate, to some degree, that certain privacy disclosures and consent language were presented by the third-party apps to consumers, but those third-party disclosures and consent, according to Texas, never mentioned the existence of the defendants, “let alone any of Defendants’ data collection or sales.” Nor did the defendants provide consumers with any of their own notices regarding their data collection practices, and even if consumers did happen to take the extra step to investigate defendants’ policies, those policies contained “untrue and contradictory statements that do not reflect Defendants’ practices.” For example, the policies expressly stated that the defendants do not sell personal data for monetary value, which Texas alleges is untrue, and the policies do not provide consumers with the ability to request that defendants stop selling their data. Taken together, Texas claims these alleged facts establish the basis for numerous legal violations, including violations of the TDPSA, the Texas Data Broker Law, and the Texas Insurance Code. Key takeaways?

Texas is – and will likely remain – focused on regulating vehicular data practices. As the saying goes, once is a coincidence, twice is a pattern, and thrice is a regulatory enforcement focus. Within the short span of half a year, Texas opened an investigation, submitted a notice to cure under the TDPSA, and initiated two lawsuits, all targeting vehicular data practices. Given the rapidity in which Texas is bringing these actions, Texas will likely continue making this an enforcement priority for the near future.
Relying on third-parties to provide notices and collect consent on your behalf may not be enough. The facts allege that the defendants had entered into agreements that, to some degree, obligated the third-party apps to provide notices and collect consumer consent for the collection and sharing of data with the defendants. Yet, according to Texas’s petition, these third-party disclosures and consent collection mechanisms failed to sufficiently inform consumers about the defendants’ data practices.
SDKs remain an area of risk. In recent years, there has been a string of federal and state enforcement action over the use of software development kits (SDKs) to collect and share data. The Federal Trade Commission entered a settlement agreement with InMarket and others; California entered a stipulated judgment of $500,000 with Tilting Point Media; and now a core fact of Texas’s petition is that the defendants developed and integrated SDKs into third-party apps to scrape data.
Carefully consider whether data is being “sold.” Under the TDPSA, a “sale” occurs when personal data is shared, disclosed, or transferred for monetary or other valuable consideration, and Texas alleges that Allstate and Arity “sold” personal data when they sold “data-based products and services for monetary value that linked a specific [consumer] to their alleged driving behavior.” Often, the language of “selling” something conjures to mind ideas of direct financial transactions – exchanging personal data expressly for money or other benefits – but regulators, including those in Texas and California, interpret “selling” personal data more broadly. Thus, companies should carefully review whether their data disclosure and access practices may constitute “selling” personal data, and if so, whether they satisfy the relevant obligations when data is being “sold.”

19 Jan

Metaverse Law featured in OC Lawyer Magazine

Jorge

AI & Algorithms, Biometrics, Data / Web Scraping, GDPR, Social Media

The Orange County Bar Association recently released the January 2024 issue of Orange County Lawyer magazine. This month, Orange County Lawyer includes an article written by Metaverse Law’s Lily Li.

Read “AI Generated Deepfakes: Potential Liability and Remedies” below or in Orange County Lawyer magazine.

[Originally published as a Feature Article: AI-Generated Deepfakes: Potential Liability and Remedies, by Lily Li, in Orange County Lawyer Magazine, January 2024, Vol. 66 No.1, page 26.]

AI-Generated Deepfakes: Potential Liability and Remedies

by Lily Li

Almost ten years ago, in Netflix’s hit series House of Cards, the Underwoods’ presidential bid is almost derailed by a leaked picture of an affair, nude shower scene and all. While the picture was real, the Underwoods were able to undermine the credibility of the leaked image by claiming it was fake—going so far as to recreate the image using a hired model, to show how “easy” it was to fabricate photos.

This episode, aptly named “The Road to Power,” highlights one of the greatest risks of disinformation and fake or synthetic media. It is not through the public’s gullibility to doctored images; it is the watering down of trust in online media, leading individuals to rely solely on friends, family, and other sources of information that echo their own beliefs and values.

Fast forward a decade, and synthetic media—also known as “deepfakes” –-are now pervasive. In early 2022, for example, a fake video of Ukrainian President Volodymyr Zelensky circulated on social media, calling for his soldiers to lay down their arms and surrender to Russia.[1] At the corporate level, deepfakes have been used to mimic a CEO’s voice to fraudulently transfer $243,000.[2] Just as troubling, and even more creepy, a “sophisticated hacking team” impersonated the CEO of cryptocurrency company Binance by using “video footage of his past TV appearances and digitally alter[ing] it to make an ‘AI hologram’ of him and trick people into meetings.”[3] At home, scammers can use deepfaked voices to mimic loved ones, or AI-powered chatbots to engage in romance scams via text messages and phone calls. This is just a front to ask the victim to wire money, send gift cards, or reveal personal information to engage in identity theft. The problem has become so severe that both the FTC and the FCC have released consumer alerts in early 2023 regarding these AI-generated scams.[4]

The ease in which generative AI can create realistic videos, voice, and text will only aggravate these concerns. Deepfakes have long relied on machine learning to iterate and become more realistic with training, but in the past, this type of technology required significant computing resources and time. Now, almost every tech product is incorporating generative AI or machine learning in some form, making this accessible to every novice programmer or script kiddie.

Given these growing risks, this article will focus on the potential liability that creators, platforms, and publishers face in creating and spreading deepfakes, as well as the challenges of pursuing remedies under existing laws. In addition, this article will discuss pending rulemaking governing deepfakes and potential steps forward.

Privacy Liability for Deepfakes

Biometrics: If deepfakes rely on scans of faceprints, facial geometry, or voiceprints to make the false video or audio, or even to train their algorithms, then biometric privacy laws may apply. The Illinois Biometric Information Privacy Act (BIPA) is one of the strictest data privacy laws in the country. It requires express written consent and meaningful disclosures prior to any use and disclosure of Illinois resident biometric data. The collection of biometric data is interpreted broadly to include faceprints and voiceprints. It provides a private right of action, up to $5,000 in statutory damages per violation, and does not require a showing of harm.[5] Earlier this year, in Cothron v. White Castle Systems, Inc.,[6] the Illinois Supreme Court went even further, confirming that each scan in violation of BIPA counts as an ongoing violation—adding further teeth to this law.

Revenge Porn Laws: To the extent the deepfakes include pornographic images, several states, like Virginia,[7] have explicitly included deepfakes within “revenge porn” laws, while other victims have pursued claims under existing revenge porn laws by claiming that the deepfakes amount to non-consensual pornography. The legal consequences vary by jurisdiction, ranging from misdemeanors to felonies with fines and jail time. New York and California also provide a private right of action for deepfake pornography.

General Data Protection Regulation (GDPR): The EU has a broad privacy law that governs use of personal data. Unlike U.S. state privacy laws, which generally allow free use of publicly available data (except for biometric processing), the EU requires all individuals, companies, and non-profits to have a lawful basis for processing any personal data—with limited exclusions for personal data “manifestly made public by the data subject.” Thus, indiscriminate scraping of social media data for deepfakes, especially where the users have limited the audience for their data, would likely violate the GDPR and be subject to fines and regulatory scrutiny.

IP, Torts, and other Remedies

Defamation: Traditional defamation claims are also applicable to deepfakes, if the plaintiff can show that the deepfake is communicated to third parties and makes false assertions that harms the plaintiff’s reputation. For public figures, plaintiffs must also show malice.

Rights of Publicity: Many states recognize a “right of publicity” to an individual’s voice or image. The damages or royalties from a right to publicity claim are proportionate to the value associated with licensing one’s image, so these types of claims are more appropriate for celebrities that ordinarily profit from licensing their image.

Copyright and Trademark: To the extent deepfakes use existing logos, photos, music, or even unique website designs to make them seem official or legitimate, this may support multiple claims of copyright and trademark infringement. Copyright holders may also send copyright takedown notices under the DMCA for infringing conduct.

Breach of Contract: If deepfakes rely on scraped content from existing sites or platforms, this may also support a breach of contract claim against the offending party (to the extent they’ve signed up and agreed to the platform’s rules). For example, in the widely publicized case, hiQ Labs, Inc. v. LinkedIn Corp., the Ninth Circuit found that hiQ breached LinkedIn’s User Agreement both through its own scraping of LinkedIn’s site and through its use of independent contractors to log into LinkedIn and do quality control of the data.[8] The Ninth Circuit noted, however, that LinkedIn was estopped from pursuing certain claims due to how much time had elapsed since its initial awareness of data scraping. Consequently, platforms that wish to rely on breach of contract claims to combat data scrapers, and potential misuse of their platforms for generative AI and deepfakes, must act swiftly and definitively. This is likely the impetus for X Corp’s (formerly Twitter) recent slew of crackdown on data scrapers, through a series of lawsuits filed in August.[9]

State Deepfake Laws: California, Texas, and Virginia have also enacted deepfake laws specific to political deepfakes, but these laws are limited in application and remedy. Texas SB 751, for instance, prohibits deepfake videos created “with intent to injure a candidate or influence the result of an election” and which are “published and distributed within thirty days of an election.” This law makes violations a Class A misdemeanor punishable by up to a year in jail and fines up to $4,000. More recently, Washington State passed a law requiring clear and transparent notices on any synthetic video or audio concerning candidates if it is related to an election. Senate Bill 5152 gives candidates a private right of action, including attorney’s fees for the prevailing party.

Limitations of Existing Remedies; Section 230 of the Communication Decency Act

There are several hurdles that would-be plaintiffs face in pursuing deepfake claims. For many torts like defamation and right of publicity, the amount of damages may be limited compared to the cost of litigation, and important First Amendment rights protect non-commercial speech that is satirical or political commentary. In addition, deepfake content can easily cross borders, so it may be difficult to find a defendant to penalize or enjoin. Consequently, instead of pursuing traditional claims, many victims rely solely on IP takedown notices, or a social media platform’s own processes to flag and remove deepfake content.

At present, Section 230 of the Communications Decency Act also shields platforms from liability for the content users upload and distribute on their platforms, as platforms generally do not constitute the “speaker” or “publisher” of such content. The line between acting as a pure platform, and contributing or generating harmful content, is increasingly blurred, however. In the recent Supreme Court case, Twitter, Inc. v. Taamneh et al,[10] plaintiffs alleged that social media platforms profited from ISIS recruitment videos and allowed ISIS to take advantage of the social media platforms’ “recommendation” algorithms that match content. While the Supreme Court declined to address the scope of 230 protections for these types of “recommendation” algorithms—the Supreme court noted that Section 230 may not protect platforms that create text, audio, or video through generative AI. In oral arguments to Google v. Gonzales, a companion case to Taamneh, Justice Gorsuch strongly implied that generative AI would fall outside of Section 230’s protections, stating: “I mean, artificial intelligence generates poetry, it generates polemics today. That—that would be content that goes beyond picking, choosing, analyzing, or digesting content. And that is not protected. Let’s—let’s assume that’s right, okay?”[11]

Going forward, we anticipate that the Illinois Biometric Information Privacy Act, and pending bills on biometric data, will likely be a more promising and lucrative way to attack platforms that explicitly use biometric data to generate or share deepfakes. In addition, as noted above, plaintiffs may have more luck pursuing claims against platforms that help create deepfake content or media using generative AI rather than solely relying on user content.

Do We Need Additional Laws?

As we can see from the patchwork of common law and statutory rights, the potential risks for creating and publishing deepfakes is many, but the best avenue for plaintiffs to pursue a remedy is unclear. Even some regulators are scratching their heads as to whether existing rules apply to deepfakes. For example, in July 2023, Public Citizen filed a petition with the Federal Election Commission (FEC), asking the FEC to amend its regulation on “fraudulent misrepresentation” at 11 C.F.R. § 110.16[12] to clarify that “the restrictions and penalties of the law and the Code of Regulations are applicable” should “candidates or their agents fraudulently misrepresent other candidates or political parties through deliberately false [AI]-generated content in campaign ads or other communications.”[13] In response, the FEC submitted a notice, soliciting public comment on this issue before making a decision on the merits of the petition.

The FTC has taken a firmer stance, stating that it does have authority to regulate AI generally, and deepfakes more specifically. In a March 2023 blog post titled “Chatbots, deepfakes, and voice clones: AI deception for sale,” the FTC noted that the “FTC Act’s prohibition on deceptive or unfair conduct can apply if you make, sell, or use a tool that is effectively designed to deceive—even if that’s not its intended or sole purpose.”[14]

Abroad, the European Union is taking an entirely different approach, developing a comprehensive law (the EU “AI Act”) that would govern artificial intelligence as a whole. The law, as drafted, requires all high-risk AI processing to undergo risk assessments for bias, safety, accuracy, and other risks. In addition, the AI Act would require transparency obligations for deepfakes, defined as “AI systems that generate or manipulate image, audio or video content.”[15] While the AI Act is still in draft form, it is likely to have as large and wide sweeping of an impact as the General Data Privacy Regulation, once it goes into effect.

Given the existing plethora of rights and remedies under the law, and the potential impact of the EU AI Act, this author does not believe that this is the right time to pursue a federal law specific to deepfakes—even though they present serious threats. In the current divisive political climate, it is likely that any proposed law will either get blocked, watered down, or if passed—fail to strike the right balance between free speech and misleading content. Instead, courts and regulators should strictly enforce existing laws that protect individual privacy and image rights, and the right to be free from false and deceptive practices. Attorneys should advise their tech clients on the risks of generative AI technologies and the potential gaps in Section 230 coverage. Finally, as private citizens, let’s remain diligent in what we read and share—and not be afraid to call out anyone who seeks to deceive.

ENDNOTES

(1) Bobby Allyn, Deepfake video of Zelenskyy could be ’tip of the iceberg’ in info war, experts warn, NPR (Mar. 16, 2022, 8:26 PM), https://www.npr.org/2022/03/16/1087062648/deepfake-video-zelenskyy-experts-war-manipulation-ukraine-russia.

(2) Catherine Stupp, Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case, Wallstreet Journal (Aug. 30, 2019, 12:52 PM), https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402.

(3) Luke Hurst, Binance executive says scammers created deepfake ’hologram’ of him to trick crypto developers, Euronews (Aug. 24, 2022, 2:47 PM), https://www.euronews.com/next/2022/08/24/binance-executive-says-scammers-created-deepfake-hologram-of-him-to-trick-crypto-developer.

(4) Alvaro Puig, Scammers use AI to enhance their family emergency schemes, Federal Trade Commission (Mar. 20, 2023), https://consumer.ftc.gov/consumer-alerts/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes; ’Grandparent’ Scams Get More Sophisticated, Federal Communications Commission, https://www.fcc.gov/grandparent-scams-get-more-sophisticated (last visited Nov. 29, 2023).

(5) See Rosenbach v. Six Flags Entertainment Corp., 2019 IL 123186 (Jan. 25, 2019).

(6) 2023 IL 128004 (Feb. 17, 2023).

(7) Va. Code Ann. § 18.2-386.2.

(8) No. 17-3301 (N.D. Cal. Nov. 4, 2022).

(9) Blair Robinson, X Corp Lawsuits Target Data Scraping, National Law Review (Aug. 17, 2023), https://www.natlawreview.com/article/x-corp-lawsuits-target-data-scraping.

(10) 598 U.S. 471 (May 18, 2023).

(11) Transcript of Oral Argument at 49, Google v. Gonzales, 598 U.S. 617 (2023) (No. 21-1333).

(12) Available at https://www.ecfr.gov/current/title-11/section-110.16.

(13) Artificial Intelligence in Campaign Ads, 88 Fed. Reg. 55606 (proposed Aug. 16, 2023), https://www.federalregister.gov/documents/2023/08/16/2023-17547/artificial-intelligence-in-campaign-ads.

(14) Michael Atleson, Chatbots, deepfakes, and voice clones: AI deception for sale, Federal Trade Commission (Mar. 20, 2023), https://www.ftc.gov/business-guidance/blog/2023/03/chatbots-deepfakes-voice-clones-ai-deception-sale.

(15) Tambiama Madiega, Artificial intelligence act, EU Legislation in Progress, European Parliament (June 2023), https://www.europarl.europa.eu/RegData/etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf.

Lily Li is a data privacy, AI, and cybersecurity lawyer and founder of Metaverse Law. She is a certified information privacy professional for the United States and Europe and is a GIAC Certified Forensic Analyst for advanced incident response and computer forensics.

21 Nov

hiQ v. LinkedIn: User Agreements in the Age of Data Scraping

Dale Rappaneau

Contract, Data / Web Scraping, Social Media, United States

On November 4, 2022, LinkedIn announced a “significant win” for the platform and its members against “personal data scraping.” The win resulted from a 6-year legal battle that asked, in part, whether LinkedIn must allow hiQ Labs to scrape data from the public profiles of LinkedIn members. Last Friday, the U.S. District Court for the Northern District of California answered that question by ruling that LinkedIn’s User Agreement “unambiguously prohibits hiQ’s scraping and unauthorized use of the scraped data.” And as such, hiQ breached LinkedIn’s User Agreement “through its own scraping of LinkedIn’s site and using scraped data.”[1] An Overview of Data Scraping Data scraping is a technique by which a computer program extracts data from another program or source. The technique typically uses scraper bots, which send a request to a specific website and, when the site responds, the bots parse and extract specific data from the site in accordance with their creators’ wishes. Scraper bots can be built for a multitude of purposes, including:

Content scraping – pulling content from a site to replicate it elsewhere.
Price scraping – extracting prices from a competitor.
Contact scraping – compiling email, phone number, and other contact information.

In today’s economy, data is key, and data scraping is an efficient means of acquiring huge amounts of specific data. Yet, this court ruling signals that companies may need to be more cautious about how and where they use data scraping bots. hiQ’s Data Scraping Violates LinkedIn’s User Agreement Founded in 2012 as a “people analytics” company, hiQ Labs provides information to businesses about their workforces. To do this, hiQ extensively relied on using automated software to scrape data from LinkedIn’s public profiles. hiQ then aggregated, analyzed, and summarized that data to create two products, “Keeper” and “Skill Mapper,” which allowed businesses to improve their employee engagement and reduce costs associated with external talent acquisition. However, in 2017, LinkedIn sent a cease-and-desist letter threatening legal action against hiQ, arguing that LinkedIn’s User Agreement prohibits data scraping. Specifically, the User Agreement states: You agree that you will not:

Scrape or copy profiles and information of others through any means (including crawlers, browser plugins and add-ons, and any other technology or manual work);

. . .

Use manual or automated software, devices, scripts[,] robots, other means or processes to access, ‘scrape,’ ‘crawl’ or ‘spider’ the Services or any related data or information;
Use bots or other automated methods to access the Services, add or download contracts, send, or redirect messages.

Court records indicate that hiQ knew about this prohibition since 2015 yet continued scraping data from LinkedIn’s public profiles and even “attempted to reverse engineer LinkedIn’s systems . . . to avoid detection by simulating human site-access behaviors.” Based on these facts, LinkedIn sought a partial summary judgment finding hiQ liable for breach of contract. From hiQ Labs’ perspective, while the above User Agreement language may appear clear, language elsewhere in the User Agreement seemed to provide users and members with a right to scrape data from public profiles. Specifically, the User Agreement provides the following when delineating members’ rights and obligations: 2. Obligations . . . When you share information, others can see, copy and use that information. . . . 3.1 Your License to LinkedIn . . .

c. We will get your consent if we want to give others the right to publish your posts beyond the Service. However, other Members and/or Visitors may access and share your content and information, consistent with your settings and degree of connection with them.

hiQ argued that the User Agreement’s statements that “Visitors may access and share your content and information consistent with your settings” and that “[w]hen you share information, others can see, copy and use that information” are inconsistent with the prohibition of scraping data. And that, as a user and member of LinkedIn who agreed to the User Agreement, hiQ read this inconsistency to mean that hiQ had the right to scrape data from public profiles. Unfortunately for hiQ, this argument failed. The court concluded that informing users that their data may be copied and used does not contradict LinkedIn’s prohibition against scraping, crawling, or spidering. “The two concepts are not mutually exclusive – a warning to members that a third party may collect their public-facing data is not a blessing for third parties to do so through expressly prohibited means.” Thus, hiQ breached LinkedIn’s User Agreement, which “clear[ly]” prohibits data scraping, by scraping LinkedIn’s site and using that scraped data. LinkedIn May Lose Despite This Victory It is important to note that, although LinkedIn considered this a victory, the court only granted partial summary judgment in favor of LinkedIn on its breach of contract claim. hiQ raised numerous defenses to LinkedIn’s breach of contract claim, including waiver and estoppel, arguing that LinkedIn knew about hiQ’s data scraping as early as 2014 yet failed to act until the cease-and-desist letter in 2017. hiQ’s argument goes, in short, that because LinkedIn knew about hiQ’s data scraping but delayed in taking legal steps to prevent it, LinkedIn either waived its right to enforce the breach of contract claim or should be estopped because hiQ reasonably relied on LinkedIn’s acquiescence to the data scraping. The court concluded that there is at least a genuine dispute of material fact as to whether LinkedIn knew about hiQ’s data scraping as early as 2014, which – if sufficiently proven – could provide grounds for hiQ to raise the defenses of waiver and estoppel. These arguments remain unresolved, and it is not clear at this time whether hiQ and LinkedIn will continue battling in court – especially given that hiQ has gone dormant since 2019 – but we will continue monitoring for further developments. Further Privacy Concerns Lastly, this case brings to mind broader legal issues regarding publicly available personal information. Under the California Consumer Privacy Act of 2018 (CCPA), as amended by the California Privacy Rights Act of 2020 (CPRA), businesses must satisfy numerous obligations when processing personal information. However, the definition of “personal information” does not include “information made available by a person to whom the consumer has disclosed the information if the consumer has not restricted the information to a specific audience.” Similarly, under the EU’s General Data Protection Regulation (GDPR), the law’s prohibition against the processing of special data categories (e.g., race, ethnicity, religion, health, etc.) does not apply if the “processing relates to personal data which are manifestly made public by the data subject.” These exceptions are reminiscent of hiQ’s argument in this case: that LinkedIn’s User Agreement expressly said that “[v]isitors [of LinkedIn] may access and share your content and information consistent with your settings.” Meaning, the users themselves provided their information to LinkedIn and purposefully, via their settings choices, made their information available to the public. Putting aside that LinkedIn’s User Agreement prohibited data scraping, hiQ’s argument raises the question: was hiQ scraping publicly available personal information, as it is understood under the GDPR and CCPA / CPRA? And if so, does that mean that hiQ would not have to comply with some requirements imposed by applicable general data protection laws? The answer will likely depend on a fact-specific inquiry on the circumstances surrounding the user content, such as (i) which data protection law applies to the data subjects in question; (ii) whether privacy settings were readily apparent to users when they initially posted their profiles/content; and (iii) whether users took affirmative actions to publicly post their information. In the meantime, businesses should remain aware that scraping personal information, even publicly available information, requires proper planning and due diligence. Key Takeaways

Data scraping remains a prevalent data collection practice, but individuals and companies may be liable for breach of contract claims stemming from data scraping practices in violation of a User Agreement.
On the other hand, if a business wants to quash a company’s known data scraping practices that violate the User Agreement, waiting too long to take legal steps may result in the business forfeiting a breach of contract claim.
Either way, this ruling indicates that companies must take User Agreements seriously, both their own (if they want to prevent data scraping) and those belonging to others (if they want to scrape data).
Lastly, a question remains as to whether the data in this case was made publicly available, as the term is understood under US and EU data regulation laws.

[1] Note: The court also concluded that hiQ separately breached LinkedIn’s User Agreement by hiring independent contractors to create fake LinkedIn accounts to conduct “quality assurance” while logged into LinkedIn by “viewing and confirming hiQ customers’ employees’ identities manually.” LinkedIn’s User Agreement expressly prohibits creating false identities.