Digital Sovereignty is About the Fight to Control Data in the AI Era

How data privacy concerns drive users to seek technological countermeasures to reshape digital dynamics.

October 17, 2023

In the last week of September, the News Media Alliance blitzed Capitol Hill to lobby for updated copyright protections for the AI era. 

The alliance represents 2,000 publishers, from Vox Media to smaller regional papers in the U.S., and it’s taking the position that any unlicensed use of its members’ and journalists’ content by generative AI companies constitutes copyright infringement, according to an Axios report. The media industry is coming to grips with the threat of its content being used to train AI algorithms that will disrupt its role as an information provider. In short, if anyone can simply ask ChatGPT for details about current or historical events, why would they bother searching for them on a news publisher’s site? Without website traffic, the news media’s business model would fall apart. 

It’s just the latest example of creators banding together to push back against AI companies’ approach to training large language models (LLMs), whose training sets comprise massive amounts of data scraped from the web and other sources. Authors George R.R. Martin and John Grisham are plaintiffs in one lawsuit, and comedian Sarah Silverman is involved in another. Software developers have filed a class action lawsuit targeting GitHub Copilot, a coding assistant trained on the open-source code stored on the site. Getty Images, visual artists, and countless others wrapped up in class action lawsuits are also joining the fray. The battle over the future of copyright will play out in courts over the next several months, with massive repercussions for how AI companies are allowed to build their algorithms.

The lingering question about whether AI companies violate copyright is casting a pall over an otherwise hot market. AI-focused firms are valued at billions of dollars and draw huge interest from businesses and consumers. Microsoft moved to keep the legal challenges from scaring off customers by making its Copilot Copyright Commitment, promising that Microsoft will defend its commercial customers if a third party sues them for copyright infringement related to their use of Microsoft Copilot, a Microsoft 365 AI feature powered by OpenAI’s GPT-4 model.  

The new battle over data rights comes five years after Europe passed the General Data Protection Regulation (GDPR), and many U.S. states are now passing their own data privacy legislation outlining similar or even more stringent protections on what companies can do with user data. GDPR has resulted in billions of dollars in fines since coming into force, penalizing tech giants that build their business models on amassing consumer data and harvesting insights from it to target ads, among other purposes. That these laws failed to cover how and when personal or copyrighted data could be used to train AI shows how quickly technology moves and how often the law struggles to keep pace with its new capabilities. 

Whether the law is settled or not, users are trying to fight against predatory data harvesting practices. 

Where the law has been settled, the Data Rights Protocol (DRP) seeks to automate the resolution of data requests between consumers and companies that comply with the laws. Where the technology is ahead of the law, users wanting to protect their data from AI training are fighting back against the algorithms with defensive tactics such as Glaze, a filter developed by Shawn Shan and his team at the University of Chicago that prevents AI from mimicking an artist’s style. Defending against data-hungry business models will become part of a digital sovereignty trend, in which consumers and businesses alike use court challenges, political lobbying, and technological countermeasures to assert control over their data and agency over their digital identity.

See More: Data Privacy: A Business Playground or Management Minefield?

Data Rights Protocol is Working to Automate the Resolution of Data Rights Requests

The U.S. may be late to the party when it comes to codifying data privacy protections into law, but the states doing so have made a flashy entrance. The California Consumer Privacy Act (CCPA) quickly became recognized as one of the most stringent privacy protections in any jurisdiction. Virginia and Colorado have also gone down the path of putting individuals in control of their data. But the CCPA is the legislation that proactive tech companies are complying with for now – in some cases voluntarily, so they can assure customers they are putting privacy first. 

The CCPA empowers consumers by allowing them to make requests to companies they deal with and direct how their data can be used or stored. Consumers can direct that their data not be sold to other parties, access their data held by a third party, and request its deletion. 

“The center point is the individual and all their personal data emanations,” explains Daniel ‘Dazza’ Greenwood, the Data Rights Protocol lead at Consumer Reports Digital Lab. “Data Rights Protocol (DRP) is about giving people ownership over their data. It’s narrowly scoped around rights articulated in existing law and must be complied with in a mandatory way by data holders.”

Built out of the Consumer Reports Innovation Lab starting in the summer of 2021, DRP is a project that seeks to move state law from manual requests involving several parties to a technical standard that allows for automated resolution of consumer data requests. It’s translating law into code and publishing it to GitHub, detailing its open-source set of API endpoints that will streamline how organizations respond to consumer data requests.  

“An important part of the protocol design is that it’s better than what’s being done. No statute says people have to adhere to the law through this protocol; it’s voluntary, and our value proposition is to make it so good that people and companies would choose to use it because it’s faster, cheaper, and better,” Greenwood says. 

The current system to resolve CCPA requests involves a consumer making a request, the business that receives the request, authorized agents that can act on behalf of consumers to make requests of many different businesses, and software providers that help organizations comply with privacy laws. Authorized agents interact with the privacy software providers to detail how data rights requests are made and received. DRP will standardize that across the industry. It will make the process more repeatable and predictable for organizations, giving them cost certainty for managing the requests.
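In code, the exchange DRP standardizes can be pictured as a small JSON conversation between an authorized agent and a covered business. The sketch below is illustrative only: the field names (`exercise`, `regime`, `relationships`), status values, and class names are assumptions approximating the spirit of the protocol, whose authoritative schemas live in DRP’s GitHub repository.

```python
import json
import uuid

def make_request(action, regime="ccpa"):
    """What an authorized agent might POST on a consumer's behalf."""
    return {
        "request_id": str(uuid.uuid4()),
        "exercise": action,              # e.g. "deletion", "access", "sale:opt_out"
        "regime": regime,                # the law the request is made under
        "relationships": ["customer"],   # claimed relationship to the business
    }

class CoveredBusiness:
    """A business (or its privacy-software provider) receiving requests."""
    def __init__(self):
        self.requests = {}

    def receive(self, req):
        # Acknowledge the request and queue it for fulfillment.
        self.requests[req["request_id"]] = {"request": req, "status": "open"}
        return {"request_id": req["request_id"], "status": "open"}

    def fulfill(self, request_id):
        # A real implementation would delete/export data before updating status.
        self.requests[request_id]["status"] = "fulfilled"
        return {"request_id": request_id, "status": "fulfilled"}

agent_req = make_request("deletion")
business = CoveredBusiness()
ack = business.receive(agent_req)
done = business.fulfill(ack["request_id"])
print(json.dumps(done))
```

The point of the standard is exactly this predictability: every agent sends the same shape of request, and every business returns the same shape of status, regardless of which privacy-software vendor sits in the middle.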

DRP published its first stable release, version 0.9, on Sept. 12. It is working with digital privacy firms OneTrust, Transcend, Incogni, and Ethyca to test and implement the protocol in production. 

The concept makes sense for organizations keen to comply with the law. But what about those with data broker business models that could see compliance as a threat to profitability?

“We’ll know definitively by 2025 and 2026 exactly what the propagation and adoption rates look like, and it’s possible that the companies you refer to as the worst offenders may be the last holdouts,” Greenwood acknowledges. “I still think this would be very worth doing because if the vast majority of companies are doing things the same way, it makes it somewhat easier to look at a small number of worst offenders when it comes time for enforcement.” 

Greenwood points to Spokeo, a founding member of DRP, as an example of a business model that depends on providing other people’s data to its customers. The website offers people lookup by name, email, phone number, and address. 

In the future, DRP could extend its utility beyond California. It will start with the CCPA but could provide the same capability in multiple jurisdictions, including where GDPR is in effect. The protocol can specify the consumer’s geography. That will start as a binary between “California” and “voluntary” but could be updated for future versions.

If DRP succeeds, it could play a role in a future where a consumer simply uses an app to assert their data rights and can expect they will be respected.

See More: Data Governance: It Takes A Village (And Good Infrastructure)

Glaze Provides Artists a Defense Against AI Mimicry

Where DRP is designed to convey more efficiently what’s already in law, Glaze is a technology that seeks to help artists who have fallen into the gap between cutting-edge technology and legal precedent. The backlash to generative AI’s ability to instantly create content in the unique style of well-known creators includes the lawsuits mentioned earlier in this article. Still, those court cases will take months, if not years, to play out, and there’s no guarantee a judge will decide to protect creators. So, the University of Chicago’s computer science department is building a tool that visual artists can use to protect their work from being used to train an AI algorithm.

Artists looking for a way to fight back against AI began reaching out to Shan because of his lab’s earlier work on the Fawkes Project in 2020. The project created a digital cloak that people could apply to their selfies before uploading them to social media, preventing the images from being used to train a facial recognition algorithm. Artists wondered if the same cloak could protect their artwork from being used for training. 

Greg Rutkowski is the most famous example of an artist whose work was mimicked by AI image generators like Stable Diffusion and Midjourney. Rutkowski became one of the most-used prompts for image generators, resulting in thousands of new images imitating his hyper-detailed fantasy style flooding the internet. But he’s not the only one affected by the phenomenon. Comics artist Sarah Andersen was shocked to learn from a fan that her style could be easily replicated with the tools, as were Karla Ortiz and a long list of other artists now listed as collaborators on Glaze. 

Current copyright law wasn’t made with AI’s capabilities in mind, Shan says. 

“Humans can’t process 2 billion images and replicate that work so well. If we had that ability, we’d have different copyright laws,” he says. “The laws have an average human being in mind. So, we’ll see what happens in the current court cases. Does the current law protect artists, or do we need new laws?”

To protect the artists they work with, the Glaze team used the open-source Stable Diffusion model to engineer a new type of cloak that protects the style of artwork. The technique is clever: it changes just enough pixels in an image to fool an AI algorithm into interpreting it differently while remaining practically invisible to the human eye. The algorithm will still understand the content of an image but not its style; if Karla Ortiz draws a dog, the ‘Glazed’ image will look to an AI as though Vincent van Gogh drew the dog, for example.

“AI sees patterns differently than human beings,” Shan explains. “They see an array of pixels, and that’s true of all AI systems. We try to maximize that gap by adding the smallest changes to our perception but the biggest in the model’s perception.” 
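That gap-maximizing idea can be illustrated with a toy sketch. This is not Glaze’s actual algorithm, which perturbs the feature space of a real diffusion model; here the “feature extractor” is an invented random linear projection, and all numbers are for illustration only. The sketch uses projected gradient steps to pull an image’s features toward a decoy style while clamping each pixel change to a small, near-imperceptible budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a model's style-feature extractor: a fixed random linear
# projection of the flattened 8x8 image (a real system would use a
# diffusion model's internal features).
W = rng.normal(size=(8, 64))

def features(img):
    return W @ img.ravel()

def cloak(img, target_feat, budget=0.05, steps=300, lr=0.005):
    """Projected gradient descent: pull the image's features toward a
    decoy style's features while clamping each pixel change to `budget`."""
    delta = np.zeros_like(img)
    for _ in range(steps):
        residual = features(img + delta) - target_feat
        grad = (W.T @ residual).reshape(img.shape)  # gradient of 0.5*||residual||^2
        delta -= lr * grad                          # move features toward the decoy
        delta = np.clip(delta, -budget, budget)     # keep the change imperceptible
    return np.clip(img + delta, 0.0, 1.0)

art = rng.random((8, 8))    # the artist's image (toy grayscale)
decoy = rng.random((8, 8))  # an image in the decoy style
cloaked = cloak(art, features(decoy))

pixel_change = np.abs(cloaked - art).max()             # small: human perception
gap_before = np.linalg.norm(features(art) - features(decoy))
gap_after = np.linalg.norm(features(cloaked) - features(decoy))
print(pixel_change, gap_before, gap_after)
```

No pixel moves by more than the budget, yet the image’s features drift measurably toward the decoy style: small in human perception, large in the model’s perception, which is the asymmetry Shan describes.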

The Glaze team demonstrated in a study that its technique was effective at foiling AI and acceptable to artists. It has since released the tool as a downloadable application and as an online version accessible via a web browser. The application has been downloaded more than 1 million times. 

While Glaze was developed on Stable Diffusion’s model, and other AI image generators use proprietary algorithms that are not available for examination, Shan says Glaze translates well to protecting against all models. The team is also trying to anticipate countermeasures. Even if large AI companies respect artists’ wishes to keep their work out of training, it is becoming increasingly trivial for anyone to train AI on a small set of images, and bad actors may try to strip Glaze from artwork they want to mimic. So far, Glaze is holding up to its intended effect.

“For students that went to art school and worked for decades to create their own unique style, they deserve a chance to make money from this,” Shan says. If Glaze can level the playing field even a small amount, it will be a victory for artists. 

What Does it Mean for Technology Leaders?

A new chapter in digital sovereignty is being written in the next couple of years as the power dynamics between individuals and organizations – especially those with AI-driven business models – are settled through court decisions and legislation. In the meantime, there is uncertainty about how organizations can maintain control over their intellectual property or whether their reputations will be tarnished if they use AI-created content. 

Current market dynamics incentivize platforms to rush to stake their claims to user data. In March, Zoom changed its terms of service in a way that appeared to permit it to harvest user data for AI training but backtracked after users protested. In July, Google updated its privacy policy to allow the company to collect and analyze information people share for AI training, and it altered its policy to state that data scraped by its web crawler could also be used for AI training unless users opt out.

Those with access to valuable data will want either to put a price on it, as coding help site Stack Overflow did by charging for API access, or to raise prices, as X (formerly Twitter) did with its APIs. There may soon be a going rate for using data in AI training.

Meanwhile, technology leaders at organizations must look at data security from a new perspective. Not only must they re-evaluate how sensitive their data is in light of AI training, but they should also get ahead of customer requests regarding personal data. 

Digital sovereignty is a two-way street and must be asserted and respected. 

How can you navigate this complex landscape of data privacy and AI copyright effectively? Let us know on Facebook, X, and LinkedIn. We’d love to hear from you!

Image Source: Shutterstock

About Expert Contributors: The Expert Contributor program is designed to help kickstart meaningful conversations around the priorities and challenges most critical to C-level executives. The insights and perspectives will help CIOs tackle what’s most important to them. We are always looking for industry thinkers who can help set the narrative for our enterprise audience. To know more about this program, and submit your ideas, reach out to the Spiceworks News & Insights Editorial team at editorial-toolbox@ziffdavis.com

Brian Jackson

Research Director, Info-Tech Research Group

As a Research Director in the CIO practice, Brian focuses on emerging trends, executive leadership strategy, and digital strategy. After more than a decade as a technology and science journalist, Brian has his fingers on the pulse of leading-edge trends and organizational best practices towards innovation. Prior to joining Info-Tech Research Group, Brian was the Editorial Director at IT World Canada, responsible for the B2B media publisher’s editorial strategy and execution across all of its publications. A leading digital thinker at the firm, Brian led IT World Canada to become the most award-winning publisher in the B2B category at the Canadian Online Publishing Awards. In addition to delivering insightful reporting across three industry-leading websites, Brian also developed, launched, and grew the firm’s YouTube channel and podcasting capabilities. Brian started his career with Discovery Channel Interactive, where he helped pioneer Canada’s first broadband video player for the web. He developed a unique web-based Live Events series, offering video coverage of landmark science experiences including a Space Shuttle launch, a dinosaur bones dig in Alberta’s badlands, a concrete canoe race competition hosted by Survivorman, and FIRST’s educational robot battles. Brian holds a Bachelor of Journalism from Carleton University. He is regularly featured as a technology expert by broadcast media including CTV, CBC, and Global affiliates.