Defining open AI is more difficult than other open pursuits. Efforts to define it raise two questions: whether an AI system needs to be trained on open data to be deemed open AI, and whether current practices in open data adequately accommodate the needs of all data stakeholders.
TL;DR
Even as the open movement seeks to define open AI, there is no agreed idea of what ‘good’ open AI is. The use of open data as the training data for open AI is only recommended. Mandating open data is likely to limit the development of open AI, but without such a condition open AI becomes a wide spectrum where some, but not necessarily all, elements of an AI system are open.
In the white paper, Alek Tarkowski draws out the need for open data to move beyond a basic objective of ‘as open as possible’ to a more nuanced data governance approach. The classic pursuit of open data fails to accommodate the different needs of other stakeholders related to the data set. To rectify this, more than copyright licensing needs to be considered.
What
A white paper looking at the need to establish shared data governance practices in open data
Who
Author: Alek Tarkowski
Publisher: Open Future
When
Published: Friday 24 January 2025
Where
Online: on the Open Future website
Why
To encourage open advocates to strike a better balance between open data and responsible data governance
How
Suggesting areas where open data practitioners can improve their practices
What else
The paper suggests a need for two paradigm shifts to make open data more responsible and six focus areas where the open movement can take action
Estimated reading time: 10 minutes
WTF is this?!
This is a summary of the Open Future whitepaper titled Data Governance in Open Source AI: Enabling Responsible and Systemic Access by Alek Tarkowski, published by Open Future in partnership with the Open Source Initiative, on Friday 24 January 2025.
Summary
Defining open source AI or open AI is a difficult task given how many elements are involved in developing an AI system. While data is a critical element, it is not the only element. How many elements must be open for a system to be open AI? Is a purist view that sees AI as open only when every element is open even possible? Can open AI be responsible and ethical if it is trained on open data that did not respond to the broad needs of stakeholders related to the data?
These questions sit behind Tarkowski’s white paper, which looks at the need for changed data governance practices in open data communities, particularly through the lens of open AI development.
Open and closed AI
Many of the AI systems that exist are proprietary in nature. We have seen this recently with OpenAI investigating whether DeepSeek used ‘its’ data to train its model. While alternatives exist that could broadly fit under an ‘open’ label, they take a diversity of approaches. (Tarkowski provides a solid overview of the spectrum of ‘openness’ in AI development in the white paper.) There is no agreed definition of what ‘good’ open source AI is.
That challenge has been taken up by a number of initiatives: the Linux Foundation’s Model Openness Framework (MOF), the Digital Public Goods Alliance’s standard for AI as a digital public good, Mozilla’s Convening on openness in Artificial Intelligence, and the Open Source Initiative’s Open Source AI Definition (OSAID), which was released in October 2024, among others. The crux of the OSAID is that for AI to be ‘open’ it should provide other parties with freedoms equivalent to open source software.
Should open data be used for AI to be ‘open’?
Because data is viewed as one of the key components of AI models or AI systems, “… considerations of openness of AI models must address the gnarly question of the openness of data: to what extent and in what way does openness of the training data (or lack thereof) determine the openness of the overall AI system?” While consensus seems to be emerging on data transparency as a requirement for openness, the openness of the datasets themselves remains unresolved.
On one hand, a definition of open AI that makes open training data a ‘nice to have’ seemingly allows for ‘closed’ components within an ‘open’ AI system. If the data in an ‘open’ AI is not available under an open content licence or otherwise publicly available, important aspects of ‘openness’ are compromised. Without reusability of the underlying training data, desirable ‘open’ activities such as auditing, verification and replication of the model are not possible. On the other hand, the complexity of AI systems and the practicalities that limit openness of data may make a definition of open AI that requires open data almost unattainable. And the ever-present risk of open washing persists.
A concurrent issue is the ethics of mass-scale data extraction to train the Big AI platforms. There are many perspectives on web-scraping the open internet to use as training data for AI: it “is often seen as not properly managed: conducted in ways that are perceived in some cases as outright unlawful and in others as at least morally questionable, not in line with research ethics or unjust.” Regardless of your views on scraping the public web to train AI, as Tarkowski points out, if you want to train AI on “fully open and transparent datasets” that are either in the Public Domain or openly licensed, you are at a disadvantage because of the comparatively small volume of open data available. Even within the body of open data, much of it will not be the type of data your AI system needs.
So it seems an aspiration to ‘do the right thing’ while also aspiring to a gold standard of open AI could be a double-edged sword. This quote from the white paper sums up the paradox well:
The importance of data transparency and access to data, even if the latter is contested as a requirement for Open Source AI systems, signals the need for not just more data to be shared, but also for better data governance.
Such data governance also needs to navigate the risk that open data generated by and for communities could be opportunistically exploited by powerful third parties. Hence, many community driven AI data collections end up in a dilemma: Protecting openness and respecting the data rights of marginalized communities will limit the general ability to grow a global pool of Open Source AI. Paradoxically, by trying to avoid the freeriding of some, everyone might end up with less Open Source AI which can be used by anyone, including vulnerable and marginalized communities.
Open data is not exempt from ethical concerns
At the heart of the white paper is the reality that open AI cannot compete with Big AI, or address its adverse concentrations of power, without data that open AI systems can be trained on. The paper also argues that better data governance should be advanced more, and acknowledged as the foundation of good AI governance. Emphasis should not be on the volume of data available to train AI, but on “the quality of the data and specific governance mechanisms that ensure that data is shared in ways that are equitable, sustainable and protected from value extraction”. Mechanisms beyond open licensing should also be considered. As an example, Tarkowski points to non-copyright-based preference signaling, where opt-outs indicate whether data can be used for AI training.
Although the notion of Indigenous Data Sovereignty (IDSov) is not explicitly mentioned in the white paper, Tarkowski’s conception of good open data sharing leaves room for IDSov and First Nations Indigenous Cultural and Intellectual Property (ICIP) principles and practices. In his framing, good data sharing efforts do not simply aim to release as much data as possible, as openly as possible, but take as a starting point “proper data preparation, data governance frameworks and stewardship functions”. Elevating non-copyright mechanisms leaves similar room.
Moving open data to a more ethical approach
To take open AI forward, Tarkowski suggests that two paradigm shifts are needed. The first is the adoption of a ‘data commons’ approach that moves beyond basic open data methods, which fall short of preventing data exploitation, to “robust commons-based governance models.” While it is not totally clear what Tarkowski means, the white paper does say that this would “… result in an acknowledgment of a gradient of data sharing approaches, where open data is the optimum on one side of the spectrum, with other data sharing approaches – suited for cases where open sharing is not desirable or attainable – on the other side.” The paper envisages innovative data licensing models and management approaches such as data trusts and cooperatives.
The second paradigm shift put forward in the white paper suggests that open AI look beyond “… solely meeting AI development needs to a broader view of data sharing that serves the needs and objectives of a broader set of stakeholders.” Recognising and understanding the various needs and goals of everyone with a stake in the data is necessary to successfully share new sources of data.
Focus areas for open AI
In the white paper, Tarkowski also sets out six focus areas for open AI. They are:
- Data preparation and provenance: Establishing robust standards for data collection, classification, anonymization, and metadata to ensure quality and traceability.
- Preference signaling and licensing: Developing mechanisms like opt-out frameworks and social licenses to allow rights holders and communities to control data use.
- Data stewards and custodians: Strengthening roles for data stewardship, including intermediary institutions that facilitate data sharing while ensuring ethical governance.
- Environmental sustainability: Promoting practices that reduce the environmental impact of AI through shared datasets and efficient training methods.
- Reciprocity and compensation: Implementing mechanisms that ensure value generated from shared data is equitably distributed, particularly to marginalized communities.
- Policy interventions: Advocating for public policies that mandate data transparency, incentivize data sharing, and support the creation of open datasets.
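To make the preference signaling focus area a little more concrete, here is a minimal sketch of how an opt-out could be checked before data is collected for AI training. The white paper does not prescribe any particular mechanism. This example assumes, purely for illustration, that a site expresses its opt-out in a robots.txt file and that a crawler identifies itself as `ExampleAIBot` (a hypothetical name, as are `ExampleSearchBot` and the example.org URL). It uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site opts out of one AI crawler
# while leaving all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/dataset/page.html"
    for agent in ("ExampleAIBot", "ExampleSearchBot"):
        print(agent, "may collect:", may_collect(ROBOTS_TXT, agent, page))
```

A signal like this only works if the party collecting the data chooses to honour it, which is one reason the paper pairs preference signaling with stewardship and policy interventions rather than treating it as a complete answer.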
Useful links
Here are some links I recommend related to Alek Tarkowski’s open data and open AI white paper:
Data Governance in Open Source AI: Enabling Responsible and Systemic Access [PDF]
Alek Tarkowski – February 2025
Open Future
Data Governance in Open Source AI: Enabling Responsible and Systemic Access
The publication page on the Open Future website announcing the white paper. It includes an overview of the white paper.
Friday 24 January 2025
Open Future
Disclosure
AI use
This resource was drafted using Google Docs. No part of the text of this resource was generated using AI. The original text was not modified or improved using AI. No text suggested by AI was incorporated. If spelling or grammar corrections were suggested by AI, they were accepted or rejected at my discretion (however, sometimes spelling, grammar and corrections of typos may have occurred automatically in Google Docs).
I used Gemini in Google Workspace to summarise the text of this resource; however, the summary (see TL;DR) does not duplicate any of the AI-generated text. Rather, it was used to help me gather my thoughts on the most important parts of the text to include in a summary.
Provenance
This resource was produced by Elliott Bledsoe from Agentry, an arts marketing micro-consultancy. It was first published on Sunday 9 February 2025. It has not been updated since it was first published. This is version 1.0. Questions, comments and corrections are welcome – get in touch any time.
Reuse
Good ideas shouldn’t be kept to yourself. I believe in the power of open access to information and creativity and a thriving commons of shared knowledge and culture. That’s why this resource is licensed for reuse under a Creative Commons licence.
Unless otherwise stated or indicated, this resource – A summary of ‘Data Governance in Open Source AI: Enabling Responsible and Systemic Access’ – is licensed under the terms of a Creative Commons Attribution 4.0 International licence (CC BY 4.0). Please attribute Elliott Bledsoe as the original creator. View the full copyright licensing information for clarification.
Under the licence, you are free to copy, share and adapt this resource, or any modified version you create from it, even commercially, as long as you give credit to Elliott Bledsoe as the original creator of it. So please make use of this resource as you see fit.