Privacy and Generative AI – what you need to know

I’ve done some research on the question of privacy and Generative AI. My findings and thoughts…

In Cape Town, there’s a seafront pool where people once bathed in the nude.

Called Graaff’s Pool, it was long a place for men to take a dip – though women were later allowed to do the same.

As far as I can see from internet searches, the pool is no longer in use. But there’s an apocryphal story-cum-joke about it that still surfaces from time to time.

The story: There’s a wall between the pool and the nearby road, so you can’t see the bathers – but that didn’t stop someone complaining to the city council about the nudity on display. The joke: That she had to stand on a chair and peer out of the top corner of a window to get the smallest glimpse of a bare bottom.

Wasn’t it much more likely that she was invading the privacy of the bathers than that they were behaving indecently?

What does this have to do with privacy and AI?

Not all that much, except that it was one of a school of things that swam into my mind when I was researching the question for a session of my AI Explorer learning circle.

Mostly, I was wondering what we really mean when we talk about privacy. If people chose to swim in the nude in a pool where they were possibly visible to other people, had they surrendered their right to privacy? They hadn’t done that, I thought – but they had freely made a choice to enter the water without any clothes on.

Only one definition I found in my research seemed to make sense to me:

Broadly speaking, privacy is the right to be let alone, or freedom from interference or intrusion. Information privacy is the right to have some control over how your personal information is collected and used.

My own definition, distilled from all this: privacy is the right to a choice about what other people know about you.

Kinds of privacy

It turns out there are different kinds of privacy – in the case of the Graaff’s Pool bathers, bodily privacy is at stake. There’s a list of seven different kinds (see it here) – another example is location privacy, which is what the controversy about surveillance cameras is about. Your right to walk down a street unobserved is in play.

For the purposes of this article, we’re looking mostly at privacy as it relates to personal data (the right to some control over how your personal information is collected and used, especially by commercial enterprises).

What’s changed with the advent of generative AI?

Most people understand that there’s a wide range of things about their lives that they share with platforms like Google and Facebook – and they understand that they are sharing that data in exchange for convenience. If Google Maps is tracking your movements via your phone, that means it knows that on this particular trip you’re probably taking your kid to soccer practice. That’s useful, especially if the darn kid lost their boots again and you are running late and need to check what traffic is like.

In general, I’ve thought of my own interactions with AI tools like ChatGPT in much the same way: yes, it knows things about me, but on the other hand, it just gave me a super quick meal plan.

Are the two things the same, though?

Perhaps not, according to one of the articles I came across:

“…AI systems pose many of the same privacy risks we’ve been facing during the past decades of internet commercialization and mostly unrestrained data collection. The difference is the scale: AI systems are so data-hungry and intransparent that we have even less control over what information about us is collected, what it is used for, and how we might correct or remove such personal information. Today, it is basically impossible for people using online products or services to escape systematic digital surveillance across most facets of life – and AI may make matters even worse.” – Jennifer King, privacy and data policy fellow at the Stanford University Institute for Human-Centered Artificial Intelligence (Stanford HAI)

In summary:

We’ve been more-or-less-voluntarily giving personal information to Big Tech for years. Now, we’re not just giving information to Big Tech – we’re giving information to tools that Big Tech has made, and which are essentially black boxes. There’s a lot we don’t know about how these models work, and a lot we can’t predict. In journalism trainer Adam Tinworth’s memorable phrase: “we’re in the toddler-with-a-chainsaw stage of AI adoption”. Caution is the sensible thing here.

What follows is some information, lists of do’s and don’ts, and then a set of resources. Some of the material is derived from NotebookLM summaries of a set of links I gave it. I’ve rewritten and reorganised, but some of this is AI text. You can see the links and the NotebookLM work here.

What are the new scary bits, then?

Bigger = riskier: AI models are trained on massive datasets, including public platforms, user inputs, and proprietary databases, which inevitably include sensitive data. This vast volume increases the likelihood of data exposure or misuse.

The things we can’t see: Gen AI systems often lack transparency regarding what data is collected, processed, stored, and shared. They can collect users’ personal information during interactive and conversational sessions.

“No delete button”: Large Language Models (LLMs) don’t have a “delete” button or a straightforward mechanism to “unlearn” specific information. This poses difficulties for compliance with “right to be forgotten” regulations. Once data is entered into most AI systems, it is challenging to remove it.

(Note that a judge in New York recently ordered OpenAI not to delete users’ data at all. That’s because the New York Times is suing OpenAI, and the data OpenAI holds may contain evidence that needs to be available for scrutiny in the case. OpenAI is contesting that ruling.)

What are the privacy risks when using generative AI?

  • The data that Gen AI is possibly keeping about you could lead to identity theft – either because the data is “out there” or because AI tools enable bad actors to find your data more easily. On the one hand, your picture has been on the internet for years. On the other hand, the internet is not the same place it once was. Jennifer King notes, for example, that scammers are using AI voice cloning to impersonate people and then extort them over “good old-fashioned phones”. Extortion has been with us forever; the ability to easily clone voices, not so much.
  • Data can be, and is being, procured without explicit consent or knowledge, or used for purposes beyond the initial disclosure.
  • Sharing private data with generative AI tools can violate data protection laws like POPIA. This is an important thing to think about if you are a freelancer, for example: your client data should not be AI fodder.

What should you never, ever share?

  • Passwords and security information: Do not share email passwords, two-factor authentication codes, or password tips (like your mother’s maiden name or childhood pet) with chatbots. 
  • Highly sensitive or embarrassing secrets: This includes confessions of illegal activities or deeply personal admissions. Some chatbots log conversations. Even if not used for model training, these things are not secure vaults or therapists!
  • Company information: Avoid using chatbots for product prototypes, confidential meeting notes, private research or executive travel plans. Proprietary details (whether they belong to you, or the company you work for) should never be entered into a chatbot conversation.
  • Explicit or harmful content: Graphic violence, threats, or hate speech should not be part of your chatbot experience, even as a joke. Some AI systems might flag and report such content to authorities. (And I mean – why would you do this anyway?)

Don’t share these – unless you do a bunch of work first

These three things are obvious no-no’s – but they tend to live in documents whose contents we might not think about thoroughly as we interact with a chatbot.

Personally Identifiable Information (PII): Your own full name, address, ID or social security number, passport, or driver’s licence details. Think identity theft. 

What to do instead: The likely use case here is asking an AI tool to evaluate your CV. Take all those details out before feeding them into the machine.

Financial information: Never share credit card numbers, bank account details, or cryptocurrency private keys. 

What to do instead: The likely use case here is asking for help with your taxes, or with how to clear your debt. You’re going to have to remove all your personal data first.

Medical information: Do not enter medication prescriptions or medical charts. 

What to do instead: Take out personal details; instead, phrase questions generally (e.g., “What types of exercises build muscle for a woman in her 50s with a sore knee?”).
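For all three cases, you can semi-automate the scrubbing before anything goes into a chatbot. Here’s a minimal sketch in Python – the patterns are illustrative assumptions, not a complete list of what counts as personal data, so a manual read-through is still essential:

    import re

    # Illustrative patterns only -- no short list catches all personal data.
    # Order matters: the 13-digit ID pattern runs before the looser phone
    # pattern, so ID numbers don't get mislabelled as phone numbers.
    PII_PATTERNS = {
        "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "[ID_NUMBER]": re.compile(r"\b\d{13}\b"),  # e.g. a South African ID
        "[PHONE]": re.compile(r"\+?\d[\d ()-]{7,}\d"),
    }

    def scrub(text: str) -> str:
        """Replace anything matching the patterns above with a placeholder."""
        for placeholder, pattern in PII_PATTERNS.items():
            text = pattern.sub(placeholder, text)
        return text

    cv_line = "Jane Doe, jane@example.com, +27 82 555 1234, ID 8001015009087"
    print(scrub(cv_line))
    # Jane Doe, [EMAIL], [PHONE], ID [ID_NUMBER]

Notice that the name survives: names, employers and other identifying context still need a human eye before you press enter.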

So what can you do?

  • You can share non-sensitive, non-personal, and non-confidential information, or information that has been generalised, anonymised or desensitised.
  • You can (perhaps) be less cautious with enterprise-grade AI tools – that is, when your company runs its own internal AI tools.
  • If you’re really worried, try a tool like Jan AI, which runs entirely on your own computer (see the sketch after this list).
  • If you’ve done your homework and an AI provider clearly outlines their data handling policies and obtains explicit, informed consent for specific uses, and you are comfortable with those terms, certain data may be shared within those defined boundaries.
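To make the “runs entirely on your own computer” option concrete: Jan exposes an OpenAI-compatible API on your own machine, so a script can talk to a model without anything you type leaving your computer. A minimal sketch in Python, assuming Jan’s local server is enabled on its default port (1337) and using a placeholder model name – swap in whatever model you’ve actually downloaded:

    import requests  # pip install requests

    # Assumption: Jan's local API server is running at its default address,
    # http://localhost:1337/v1. This request never leaves your machine.
    response = requests.post(
        "http://localhost:1337/v1/chat/completions",
        json={
            "model": "llama3.2-3b-instruct",  # placeholder -- use your model
            "messages": [
                {"role": "user", "content": "Plan a week of quick dinners."},
            ],
        },
        timeout=120,
    )
    print(response.json()["choices"][0]["message"]["content"])

The trade-off is that a small local model won’t match the big hosted ones for quality – but the privacy question largely falls away.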

Common sense rules

  • Be selective about what you share.
  • Understand data practices: Even if the AI provider says they don’t use user inputs to train their primary chatbots, some human review can still occur for abuse flagging.
  • Read the privacy and security policies: Try to understand how they guard and protect your information (see below for some guidance on that!).
  • Opt-out where possible: You can often customise settings to prevent chat entries from being used to train models, or specify how long you want your history kept.

Final word: If you wouldn’t want it repeated, reviewed, or resurfaced later, it doesn’t belong in a chatbot inquiry.

RESOURCES

I’ve done some AI-assisted research on your behalf.

ONE

First I had Perplexity find the privacy pages of the bigger AI platforms (and checked the links, of course):

Google Gemini

Microsoft Copilot 

OpenAI (ChatGPT) 

Anthropic (Claude) 

Perplexity

DeepSeek

Canva (included because so much of what it offers now has an AI component)

TWO

I put all those links into NotebookLM and asked it for a table summarising information from those policies, using these three parameters:

Does the tool use data to train its models?

Does it offer opt-out features?

Can you delete your activity history?

See that table here:

THREE

I asked Google’s Gemini and ChatGPT to do deep research based on this prompt, with the list of links supplied:

Please scan these links and give me a brief assessment of which ones offer the most privacy protection to individual consumers.

The two reports (generated by AI, not fully fact-checked by me) are available as PDFs here:

Gemini:

ChatGPT:

My summary: Here’s how ChatGPT ranked them, from most privacy-friendly to least:

  • Claude
  • Copilot
  • OpenAI
  • Perplexity
  • Canva
  • DeepSeek
  • Gemini

Gemini ranked three of the providers like this:

  • Copilot (strong contender for protection of your privacy)
  • OpenAI (sort of OK)
  • DeepSeek (dodgy)

Gemini said that it was unable to rank itself, Anthropic, Perplexity and Canva because of lack of transparency. It said it had found “inaccessible or uninformative privacy policies” and added that “this immediate barrier to understanding data handling practices represents a significant red flag for any privacy conscious consumer”.

The immediate question that arises is why ChatGPT can confidently rank all seven platforms while Gemini finds four of the policies inaccessible or uninformative. Which means, as usual with AI results, that both reports need to be taken with a pinch of salt. It’s interesting, though, that both ranked Microsoft’s Copilot high on privacy and DeepSeek as problematic. And even Gemini, a Google product, was critical of Google’s privacy policy. It doesn’t surprise me that Google took a hit here – it is, after all, a very large combine harvester in the field of personal data.

My takeaway? I like DeepSeek but I ain’t putting anything personal in there, at all, ever. I’ll continue to try Google Gemini (I mean, what’s changed?). And I will, with a sigh, try to make Copilot work for me, again.

Main picture: AbsolutVision, Unsplash

Other things I have written

Renee’s four golden rules of AI – So much hype, so much uncertainty, so much information about artificial intelligence. Here are some guiding principles… 

How to build yourself a virtual coach with AI (for free) – So you know how to use ChatGPT (or Claude, or Gemini). But you’re sure there’s more you could be doing. Here’s how to build yourself a virtual coach with AI (for free). 

The AI tool everybody should be using – NotebookLM – Research assistant and note-taker rolled into one – allow me to introduce you to NotebookLM.

How to write a good prompt for AI – A common complaint about AI is that it just doesn’t give good answers. That’s because you are asking questions. Instead, here’s my guide to how to write a good prompt.

How can I help you make order from chaos? 

Join the Safe Hands AI Explorer learning circle!

Sign up for my Sensible Woman’s Guide to AI and Content Creation, which comes out fortnightly.

Or sign up for my personal fortnightly email newsletter In Safe Hands (it’s a place to pause and think about things).

Book a free half hour of my time here. Find out how I can help with writing, editing, project management, general clear thinking, technical hand holding, or an introduction to AI.

Contact me via email.
