
A summary of ‘Guidance on privacy and developing and training generative AI models’ (2024) by the Office of the Australian Information Commissioner

This resource was published on 29 October 2024.

Privacy is one of the risks associated with AI. To help AI developers, the Office of the Australian Information Commissioner has released comprehensive guidance on managing privacy when training and fine-tuning AI models.

TL;DR

The OAIC has released guidance on the interaction between privacy law and the development of AI. It encourages ‘privacy by design’ and focuses on obligations and best practice related to the accuracy of personal information, transparency and notification, and how and why personal information is collected (and whether that purpose is the primary purpose or a secondary one).

What

Guidance for Australian entities on how Australia’s privacy laws apply to AI

Who

Office of the Australian Information Commissioner (OAIC)

When

Published: Monday 21 October 2024

Where

Online: on the OAIC website

Why

To set out the OAIC’s expectations in relation to privacy for Australian entities developing or deploying AI

How

Outlining compliance and best practice for handling privacy in AI systems

What else

OAIC also published companion guidance for deployers of AI systems

Estimated reading time: 17 minutes

This is a summary of Guidance on privacy and developing and training generative AI models, published by the Office of the Australian Information Commissioner, Australian Government, on Monday 21 October 2024.


Summary

Australian developers of AI who are training or fine-tuning AI models using data that includes personal information need to be mindful of their obligations under the Privacy Act 1988 (Cth) (Privacy Act), including the Australian Privacy Principles (APPs). The guidance is for AI developers who are subject to the Privacy Act (i.e. APP entities and foreign organisations with an Australian link). This includes “any organisation who designs, builds, trains, adapts or combines AI models and applications. This includes adapting through fine-tuning, which refers to modifying a trained AI model (developed by them or someone else) with a smaller, targeted fine-tuning dataset to adapt it to suit more specialised use cases” [OAIC, 2024]. Even though it specifically references generative AI, “a number of the risks and issues discussed are also applicable to narrow AI systems or models that are trained using personal information.” The guidance is also useful to organisations that provide personal information to an AI developer so the developer can develop or fine-tune a generative AI model.

AI is data hungry – “… the data-driven nature of AI technologies, which rely on large datasets that often include personal information, can also create new specific privacy risks, amplify existing risks and lead to serious harms” [OAIC, 2024]. It is important that AI developers work to protect privacy.

The guidance focuses on APPs 1, 3, 5, 6 and 10 as they relate to planning and designing generative AI models and compiling the training data for them, or fine-tuning an AI model. Other privacy obligations may also exist in relation to other APPs.


Personal information in training data carries privacy obligations

There are many risks associated with the development, deployment and use of AI. Many of these have been flagged in various documents and processes, including the proposed mandatory guardrails outlined in the Proposals Paper for Introducing Mandatory Guardrails for AI in High-Risk Settings, released by the Department of Industry, Science and Resources (DISR). The Proposals Paper stated that, if mandatory guardrails for high-risk AI are introduced, they would not replace obligations under existing legislation, nor would AI developers or deployers be exempt from any existing obligations [DISR, 2024: 17]. Privacy law is one of the areas of existing regulation the Proposals Paper identifies as applying to the use of AI [DISR, 2024: 4, 12, 15, 17, 37, 39, 43, 60].

Speaking of high-risk AI, the OAIC states that relying on large quantities of personal information to train an AI system is a high privacy risk activity. Importantly, personal information in this context includes “inferred, incorrect or artificially generated information produced by AI models (such as hallucinations and deepfakes), where it is about an identified or reasonably identifiable individual” [OAIC, 2024].

There will be many cases where AI development involves personal information, meaning the Privacy Act applies to collecting, using and disclosing that information. This includes where personal information has been used to train AI models and any uses of AI that involve personal information. As such, AI developers should actively consider whether their training data includes personal information.

The OAIC expects AI developers to take a cautious approach to privacy and give it due consideration commensurate with the considerable risks for affected individuals [OAIC, 2024]. In some situations these obligations may be more acute. For example, greater caution should be exercised where personal information is of unclear provenance. The OAIC states that the guidance should be considered together with the Privacy Act and its guidelines on the Australian Privacy Principles.


Privacy by design (APP 1)

In one line, this section of the guidance encourages AI developers and deployers to be proactive about privacy. AI developers subject to the Privacy Act must ensure they comply with the APPs. They should actively consider the potential privacy risks at the planning and design stage by taking a ‘privacy by design’ approach, “embedding good privacy practices into the design specifications of technologies, business practices and physical infrastructures” [OAIC, 2024].


Types of risks

The guidance lists a number of privacy risks that may come up in the context of AI, including that:

  • People can lose control over their personal information because it may be included in AI training data without their knowledge or consent, and it may be difficult for them to know it was collected and to request it be corrected or removed.
  • Inherent biases in the training data may be replicated in AI outputs through inferences based on gender, race or age that have a discriminatory effect.
  • Poor accuracy or quality of training data can result in inaccurate outputs that appear credible, and these may have flow-on consequences such as reputational harm, misinformation or unfair decisions.
  • Transparency around the management of personal information can be difficult, as entities may not understand or be able to explain how personal information is used or how AI decisions are made.
  • De-identification can be difficult to achieve and there is potential for people to be re-identified using AI.
  • AI may be misused for improper purposes that impact people’s privacy and have broader negative consequences, such as disinformation (e.g. deepfakes), scams and identity theft, harmful or illegal content, and harmful or malicious code used for cyber attacks or other criminal activity.
  • Personal information may become exposed through a data breach involving the training data or through attacks on the model to reveal the training data.
  • Some users may disclose personal information, including sensitive information, through their interactions with an AI system without knowing the system retains or incorporates their inputs into training data.

Identifying and managing risks

AI technologies and the supply chains around them can be complex. This can make assessing the privacy impacts of an AI model or system difficult, particularly where the model is designed for a general purpose. To understand and mitigate the risks, AI developers should conduct a privacy impact assessment (PIA). Their PIA should go beyond just assessing the risks of non-compliance with privacy legislation to also consider “the broader privacy implications and risks beyond compliance, including whether a planned use of personal information will be acceptable to the community” [OAIC, 2024], as well as privacy risks that may result from the intended use. PIAs should be conducted on an ongoing basis, especially where risks have changed or models are fine-tuned.

Developers of general purpose AI systems or developers who structure their AI systems in a way that places privacy obligations on downstream users should provide the information or access needed so downstream users are able to assess privacy risk and comply with their privacy obligations. If there is any doubt whether the Privacy Act applies to a specific AI-related activity or where an AI developer is shifting privacy obligations downstream, the OAIC suggests AI developers err on the side of caution and assume the Privacy Act applies.


Accuracy when training AI (APP 10)

AI systems inherit inaccuracies and unfounded biases evident in their training data and can perpetuate and amplify them in their outputs. In some instances these may result from intentional data poisoning. AI tools are also known to produce hallucinations (i.e. inaccurate or false results). And they are probabilistic (i.e. they do not ‘understand’ the data they are trained on or the content they generate). Yet they generate outputs which appear credible, regardless of their accuracy.

Further, the accuracy and reliability of an AI system can suffer in some circumstances. They may deteriorate over time where training data becomes outdated or where the model encounters a scenario or task that differs from the training data (i.e. the model’s reasoning ability declines).

APP 10 requires AI developers to take reasonable steps to ensure the personal information they collect, use or disclose is accurate, up-to-date and complete. They must also ensure any personal information used or disclosed is relevant, having regard to the purpose of the use or disclosure.

Developers must take particular care where an AI tool will be used for high privacy risk purposes, such as making decisions that will have a legal or similarly significant effect on an individual’s rights. The reasonable steps an AI developer must take depend on the circumstances, including the sensitivity of the personal information, the nature of the developer, the possible adverse consequences for an individual if the quality of the personal information is not ensured, and the intended purpose or intended outputs of the AI model. The steps taken should be commensurate with the level of risk.

In an AI context, the OAIC offers a number of examples of reasonable steps an AI developer may need to take to ensure accuracy, including:

  • Ensuring the training data, including any historical information, inferences, opinions or other personal information about individuals, is accurate, factual and up-to-date
  • Understanding and documenting the impact that the accuracy of the training data has on AI outputs
  • Clearly communicating any limitations in the accuracy of the AI tool, “including whether the dataset only includes information up to a certain date, and should signal where AI models may require careful consideration and additional safeguards for certain high privacy risk uses, for example use in decisions that will have a legal or similarly significant effect on an individual’s rights” [OAIC, 2024]
  • Updating AI systems if they become aware the information used for training or the outputs of the AI are incorrect or out-of-date
  • Marking content as AI-generated
  • Considering whether other steps are needed to address the risk of inaccuracy “such as fine-tuning, allowing AI systems built on the generative AI model to access and reference knowledge databases when asked to perform tasks to help improve its reliability, restricting user queries, using output filters or implementing accessible reporting mechanisms that enable end-users to provide feedback on any inaccurate information generated by an AI system” [OAIC, 2024] (a simple sketch of output filtering and AI-generated marking follows this list).
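
To make two of those steps more concrete, here is a minimal sketch in Python of marking output as AI-generated and applying a very simple output filter. This is my own illustration, not something from the OAIC guidance; the redaction patterns, field names and the `filter_output` function are assumptions for the example only, and real systems would need far more robust detection and review.

```python
import re

# Illustrative patterns for two common kinds of personal information (assumptions for this sketch).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_PATTERN = re.compile(r"\b0\d(?:[ -]?\d){8}\b")  # rough 10-digit Australian phone numbers

def filter_output(generated_text: str) -> dict:
    """Redact obvious personal information and label the result as AI-generated."""
    redacted = EMAIL_PATTERN.sub("[EMAIL REDACTED]", generated_text)
    redacted = PHONE_PATTERN.sub("[PHONE REDACTED]", redacted)
    return {
        "text": redacted,
        "ai_generated": True,            # explicit marker that the content is AI-generated
        "filter_version": "0.1-sketch",  # hypothetical label so downstream users know what was applied
    }

if __name__ == "__main__":
    sample = "Contact Jo on 0412 345 678 or jo@example.com for tickets."
    print(filter_output(sample))
```

In practice a filter like this would sit alongside, not replace, the other steps the OAIC lists, such as fine-tuning, referencing knowledge databases and accessible reporting mechanisms.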

Obligations when collecting personal information (APP 3)

The OAIC reminds AI developers that just because data is publicly available or otherwise accessible does not mean it can legally be used to train an AI model. Regardless of the source of the data – data scraping, data provided by a third party or a dataset the AI developer (or the organisation they are developing the model for) already holds – where training data includes personal information, AI developers must ensure they comply with their privacy obligations.

Personal information may be included in the training data itself, associated metadata or in any annotations, labels or other descriptions attributed to the data as part of its processing. In some cases, information that would not be personal information on its own may become personal information in combination with other information.

If an AI developer is collecting personal information they must only collect personal information that is reasonably necessary for their functions or activities. Unnecessary personal information should be filtered out of the training data.
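
As a rough illustration of that filtering step, here is a minimal sketch in Python of stripping fields that are not reasonably necessary out of structured training records before they are added to a training dataset. This is my own sketch, not from the OAIC guidance; the field names and the `strip_unnecessary_fields` helper are assumptions for the example, and which fields actually count as unnecessary will depend on the developer’s functions and activities.

```python
from typing import Iterable, Iterator

# Field names assumed, for this example only, to be unnecessary for the training task.
UNNECESSARY_FIELDS = {"email", "phone", "date_of_birth", "home_address"}

def strip_unnecessary_fields(record: dict) -> dict:
    """Return a copy of the record without the fields listed above."""
    return {key: value for key, value in record.items() if key not in UNNECESSARY_FIELDS}

def prepare_training_records(records: Iterable[dict]) -> Iterator[dict]:
    """Apply the filter to every record destined for the training dataset."""
    for record in records:
        yield strip_unnecessary_fields(record)

if __name__ == "__main__":
    raw = [{"review": "A stunning season opener.", "email": "jo@example.com"}]
    print(list(prepare_training_records(raw)))  # [{'review': 'A stunning season opener.'}]
```

Whether a field is ‘reasonably necessary’ is a legal question under APP 3 as much as a technical one, so filtering of this kind supports, rather than substitutes for, that assessment.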

Under the Privacy Act, personal information must be collected directly from individuals unless it is unreasonable or impracticable to do so. AI developers will need to consider whether it was unreasonable or impracticable to collect the data directly rather than using scraped data. If they are using datasets collected by third parties, they will need to consider what steps the third party took to inform individuals that their personal information would be used to train an AI model and whether consent for the collection of sensitive information was validly obtained. The OAIC recommends AI developers seek information or assurances from third parties in relation to the collection of personal information.

When personal information is collected it must be done lawfully and by fair means. It should not be collected through intimidation or deception, by means that are unreasonably intrusive or, depending on the circumstances, covertly collected without the knowledge of the individual. This is also particularly relevant to scraped data. Also, privacy obligations may still arise even where an AI developer intends to de-identify the personal information.


Sensitive information

Extra care should be taken with sensitive information, which is “any biometric information to be used for the purposes of automated biometric verification or biometric identification, biometric templates, health information about an individual, genetic information about an individual or personal information about an individual for certain topics such as racial or ethnic origin, political opinions or sexual orientation” [OAIC, 2024]. Generally, sensitive information requires consent to be collected.

Many photographs or recordings of people contain sensitive information, including AI-generated material, and may not be able to be scraped from the internet or collected from a third party dataset without establishing consent. If sensitive information was collected inadvertently without consent it will generally need to be destroyed or deleted from training data.


Use and disclosure obligations (APP 6)

If an AI developer (or an organisation seeking an AI developer to develop an AI model for them) holds personal information collected through, for example, “operating a service, interactions with an AI system or a dataset they compiled for training an earlier AI model” [OAIC, 2024] and intends to use it for training an AI model, what matters is whether training was the primary purpose for which the personal information was collected.

If it was not, or the AI developer does not have consent for a secondary, AI-related purpose, then, in the absence of another exception applying, they would need to be able to establish that the secondary use would be reasonably expected by the individual, taking into account the person’s expectations at the time of collection, and that it is related (or directly related, for sensitive information) to the primary purpose or purposes. To that end, OAIC says, “Whether a secondary use is within reasonable expectations will always depend on the particular circumstances. However, given the unique characteristics of AI technology, the significant harms that may arise from its use and the level of community concern around the use of AI, in many cases it will be difficult to establish that such a secondary use was within reasonable expectations.” In many cases the AI developer should seek consent for the use and offer individuals a meaningful and informed ability to opt out, including providing an appropriate amount of information and a sufficient period of time in which to opt out.

Because many people may not have a full understanding of generative AI, developers should provide people with meaningful information to help them understand how their personal information will be handled, so they can determine whether to give consent. OAIC suggests this “could include information about the function of the generative AI model, a general description of the types of personal information that will be collected and processed and how personal information is used during the different stages of developing and deploying the model” [OAIC, 2024]. AI developers should also consider what changes may need to be made to their privacy policies and collection notices to comply with notice and transparency obligations.


Notice and transparency obligations (APP 1 and APP 5)

Regardless of where an AI developer’s training data comes from, the developer should have a clearly expressed and up-to-date privacy policy outlining how they manage personal information, including how data will be collected and held, the purposes for which it is being collected and used (e.g. training generative AI models), how people can access the personal information the developer holds about them, and how they can correct that information if needed.

Under APP 5, notice should be given in addition to the privacy policy. This may be difficult where scraped data is being used, as the AI developer likely does not have a direct relationship with, or a way to contact, individuals whose personal information is included in the training data. Where individual notification is not practicable, developers should consider what other notification mechanisms they could use to provide transparency to affected individuals, such as making information publicly available in an accessible manner. This should cover how the personal information was collected, used and disclosed in the circumstances, such as the categories of personal information used, the kinds of websites that were scraped and, if possible, the domain names and URLs of those websites.

For personal information received from third parties, the AI developer should consider what notice was given by that third party to affected individuals and whether that fulfils their privacy obligations.


Other privacy matters

The OAIC cautions that the guidance is not an exhaustive list of all the privacy obligations AI developers may have when developing or fine-tuning an AI model. The guidance focuses on compliance with APPs 1, 3, 5, 6 and 10 specifically, but AI developers may also need to consider:

  • when personal information is disclosed overseas (APP 8),
  • how to keep records of their data sources in a way that enables individuals to assert their rights of access and correction and the consequences of withdrawal of consent (APPs 12 and 13), and
  • appropriate measures to secure training data, as well as decision-making around when personal information should be destroyed or de-identified (APP 11).

Here are some links I recommend that expand on the OAIC privacy guidance for AI developers:

A blog post published by the OAIC that looks at Australian privacy law and how it applies to AI development. It also advances the view that AI developers should care about privacy and consider it early in their AI development process. The blog post mentions that the related guidance published by the OAIC considered similar guidance published by overseas privacy regulators and sought to align Australia’s guidance where possible.

An OAIC spokesperson provides more context about the guidelines.


References

Office of the Australian Information Commissioner (2024) ‘Guidance on privacy and developing and training generative AI models’, Australian Government, https://www.oaic.gov.au/privacy/privacy-guidance-for-organisations-and-government-agencies/guidance-on-privacy-and-developing-and-training-generative-ai-models.

Department of Industry, Science and Resources (2024) ‘Proposals Paper for introducing mandatory guardrails for AI in high-risk settings’, https://consult.industry.gov.au/ai-mandatory-guardrails.

Was this free resource helpful?

If so, I encourage you to please show your support through a small contribution – it all helps me keep creating free arts marketing content.

Disclosure

AI use

This resource was drafted using Google Docs. No part of the text of this resource was generated using AI. The original text was not modified or improved using AI. No text suggested by AI was incorporated. If spelling or grammar corrections were suggested by AI, they were accepted or rejected at my discretion (however, sometimes spelling, grammar and typo corrections may have occurred automatically in Google Docs).

I used Gemini in Google Workspace to summarise the text of this resource; however, the summary (see TL;DR) does not duplicate any of the AI-generated text. Rather, it was used to help me gather my thoughts on the most important parts of the text to include in a summary.


Provenance

This resource was produced by Elliott Bledsoe from Agentry, an arts marketing micro-consultancy. It was first published on 29 Oct 2024. It has not been updated since it was first published. This is version 1.0. Questions, comments and corrections are welcome – get in touch any time.


Reuse

Good ideas shouldn’t be kept to yourself. I believe in the power of open access to information and creativity and a thriving commons of shared knowledge and culture. That’s why this resource is licensed for reuse under a Creative Commons licence.

A bright green version of the Creative Commons brand icon. It is two lowercase letter Cs styled similar to the global symbol for copyright but with a second C. Like the C in the copyright symbol, the two Cs are enclosed in a circle.

Unless otherwise stated or indicated, this resource – A summary of ‘Guidance on privacy and developing and training generative AI models’ (2024) by the Office of the Australian Information Commissioner – is licensed under the terms of a Creative Commons Attribution 4.0 International licence (CC BY 4.0). Please attribute Elliott Bledsoe as the original creator. View the full copyright licensing information for clarification.

Under the licence, you are free to copy, share and adapt this resource, or any modified version you create from it, even commercially, as long as you give credit to Elliott Bledsoe as the original creator of it. So please make use of this resource as you see fit.

