Artificial Intelligence (AI) is increasingly reliant on large volumes of copyrighted text, images, code and structured data to generate content, make predictions, and identify patterns. While AI development offers transformative potential across industries, it also raises legal questions.
AI models require huge volumes of text, code, images and structured data to learn language patterns, styles and facts. Through statistical analysis of datasets, AI models decipher human knowledge as well as patterns, language structures and grammar in order to generate content and predictions. Developers who can lawfully access and process high-quality datasets (e.g. professional journalism, published books and curated databases) are best placed to develop a quality AI model.
Data quality has a profound impact because the model learns directly from the content of the data it is fed. ‘Quality’ in training data is not merely an absence of errors. It includes, amongst other things, editorial quality, authoritative factuality, diversity and coverage, and licensing certainty.
Because editors and publishers invest in verification, curation and fact checking, their material tends to be higher quality and more reliable than other information and data. While all original works attract copyright, professionally produced materials are particularly sensitive - both because of the commercial value attached to them and the higher likelihood of enforcement if they are used without authorisation in AI training datasets.
Copyright in Australia is primarily governed by the Copyright Act 1968 (Cth) (Copyright Act), which grants copyright owners exclusive rights to reproduce, publish, communicate and adapt their works, among other things. Any unauthorised use of a substantial part of a work will generally infringe these rights unless an exception applies, such as fair dealing.
AI and copyright therefore intersect in complex ways because the process of training AI models often involves temporarily reproducing copyrighted works. During training, AI systems ‘ingest’ large amounts of text, images or other creative content to learn patterns and generate outputs. It follows that this act of reproduction (even if transient and computational) may constitute an exercise of a copyright owner’s exclusive rights, and even possibly communication and publication rights if the content is made available.
The question is, does Australia’s copyright regime adequately facilitate the rapid emergence of AI? At present, it appears it may be ill-equipped to do so. It is highly probable that many large AI models have already been trained on copyrighted materials without permission and without compensation, highlighting gaps between existing copyright protections and modern data-driven technologies. Moreover, given that many AI training datasets have already been developed, this raises questions about how the data was collected, where it was sourced from and what was actually collected.
This tension is the core policy problem: how can Australia enable the lawful use of high-quality data for AI system training while protecting creators’ legitimate interests? The Copyright Act was not drafted with machine learning in mind, and this gap between legal rights and technological reality is increasingly difficult to ignore.
The Copyright Act includes exceptions that permit limited use of copyrighted works without permission. However, the application of these exceptions (such as the fair dealing regime) to AI remains unclear.
So, what is the scope of Australia’s fair dealing regime? The fair dealing exception allows for the use of copyright material without permission from the copyright owner, so long as it is used for one of the specified purposes under the Copyright Act and is considered fair.
The specified purposes include research or study,[1] criticism or review,[2] parody or satire[3] and news reporting.[4] Whether a particular use is considered ‘fair’ is then determined with regard to a range of relevant circumstances, for example, the purpose and character of the use (e.g. commercial or not for profit), the nature of the work, its commercial availability, the effect on the market and the quantity and substantiality of the material used.
Given the substantial volumes of data required to train AI systems, it is unlikely that such use would fall within the scope of fair dealing. For instance, under the research or study exception, the reproduction of a literary, dramatic or musical work is limited to no more than 10% of the total pages of a published edition, or a single chapter. Such restrictions are unlikely to be compatible with the requirements of AI training.
For this reason, one policy suggestion that emerged to address this regulatory gap was to introduce a new category into the fair dealing regime to clearly enable lawful text and data mining (TDM) for AI purposes. This was raised by the Productivity Commission in its interim report Harnessing Data and Digital Technology, which sought feedback as to whether introducing a text and data mining exception into the fair dealing regime would be appropriate.[5] Although the final report is expected to be provided in December 2025, this debate took a decisive turn in October 2025 when the Australian Government announced that it will not introduce a TDM exception.[6]
A TDM exception would have allowed the reproduction and analysis of copyrighted works for computational purposes - like AI model training, pattern recognition, or statistical analysis - without infringing copyright, provided certain conditions were met. The goal would have been to distinguish technical uses from copying intended to replace or distribute an original work.
The rationale was straightforward:
At the same time, the counterarguments were equally clear:
Ultimately, the Government came down on the side of creators, ruling out a TDM exception and signalling that copyright owners should retain control over how their works are used in AI development.
So, what comes next for AI and copyright in Australia? While the Government has confirmed it is not considering a TDM exception, active consultation is underway to address this regulatory gap. The Government has convened the Copyright and AI Reference Group (CAIRG) to focus on how to encourage fair, legal avenues for using copyrighted material in AI. One such consideration is whether a collective licensing framework should be introduced under the Copyright Act. No matter the outcome, it is clear that improving certainty and enforcement mechanisms will be front of mind.
This article was written by Ariel Bastian, Senior Associate, and Anna Kosterich, Solicitor, Corporate Commercial.
[1] Copyright Act, s 40.
[2] Copyright Act, s 41.
[3] Copyright Act, s 41A.
[4] Copyright Act, s 42.
[5] Productivity Commission, Harnessing Data and Digital Technology, p 26.
[6] Hon Michelle Rowland MP, Albanese Government to ensure Australia is prepared for future copyright challenges emerging from AI (media release), 26 October 2025.