Wikipedia research on rabbit holes and Australian places, growing awareness of Indigenous data and a definition of open source AI.


Read

Funnily enough, most of what I read this week was published the week before, but here’s what I’ve been reading this week:

Open with care

TL;DR
Indigenous Data Sovereignty and open data do not have to be mutually exclusive concepts.

Using the PhD research of Native Hawaiian Leslie “Leke” Hutchins as an illustrative example, this article looks at how open data can be seen as in conflict with Indigenous Data Sovereignty and the CARE Principles, and shows that they can co-exist. Hutchins was researching arthropod diversity on Native Hawaiian coffee plantations. He wanted to know whether the return of native flora and biodiversity on coffee plantations affected arthropod diversity. Hutchins recognised that “the arthropod samples he was collecting count as Indigenous data because they come from Native lands and have cultural significance, just like sacred objects and traditional knowledge. So he requested the farmers’ consent for any data he shared in his paper, and redacted the names and locations of arthropod species and their sequencing data to keep culturally sensitive information from outsiders and reduce unauthorized visits to the farms.” Hutchins’s research aligned with the CARE Principles, developed to ensure the collective benefit, authority, responsibility and ethics of Indigenous data. It is a great article that demonstrates in practical terms that openness in research and data does not mean that Indigenous Data Sovereignty and the CARE Principles cannot also be accommodated. In fact, doing so is part of a process to change the history of unethical use of Indigenous data in research.

https://www.science.org/content/article/not-free-all-indigenous-communities-want-limits-how-their-data-are-shared


TL;DR
The US Copyright Office has decided there will be no change to the DMCA anti-circumvention rules for access for researchers to preserved video games.

The US Copyright Office has denied a push by the Software Preservation Network (SPN) and the Video Game History Foundation (VGHF) to add an exception to the anti-circumvention rules that would have allowed libraries and archives to remotely share copies of out-of-print video games held in their collections with researchers through emulation. Currently they are not able to circumvent any technical protection measure (TPM) on games even where the intended use is not a copyright infringement.

The Copyright Office determined that there was “... greater risk of market harm with removing the video game exemption’s premises limitation, given the market for legacy video games” and that “... proponents did not show that removing a single-user limitation for preserved computer games or permitting off-premises access to video games are likely to be noninfringing.” The ruling also reiterates concerns put forward by the Entertainment Software Association and other video games lobby groups that the proposed change would result in games being used for recreational purposes. Of course, as the article’s author notes, the digital lending by libraries of other types of content, such as books and movies, is also for recreational purposes.

In their statement in response to the Copyright Office’s final rule, the VGHF pointed to their research that shows 87 percent of video games released in the US before 2010 are out of print. The only way to legally access such titles once they are not commercially available is through the second-hand market. Sadly, this situation is reflective of a lot of copyright lobbying. As Frank Cifaldi from the VGHF said on X, “This fails the needs of citizens in favor of a weak sauce argument from the industry, and it's really disappointing.”

Publishers are absolutely terrified “preserved video games would be used for recreational purposes,” so the US copyright office has struck down a major effort for game preservation
“This fails the needs of citizens in favor of a weak sauce argument from the industry, and it’s really disappointing”

Also worth reading on this topic:

Statement on the DMCA 2024 triennial review ruling | Video Game History Foundation
The US Copyright Office announced today that they would not grant a new exemption in support of video game preservation. Our statement.

The Video Game History Foundation’s statement on the US Copyright Office’s ruling.


Going down a Wikipedia rabbit hole? Science says you’re one of these three types

TL;DR
Wikipedia rabbit holes vary depending on a user’s style of curiosity: busybody, hunter and dancer.

Internet rabbit holes are a thing and Wikipedia is responsible for its fair share of curiosity-driven click-on-click timesinks. As the article says, “Part of what made Wikipedia groundbreaking was how it satisfies people’s intrinsic learning needs by inviting navigation from page to page, luring readers into rabbit holes.” A large-scale research project involving more than 480,000 Wikipedia users in 14 languages across 50 countries investigates reading ‘curiosity styles’ – “the different ‘architectural styles of curiosity’ people embody when they navigate” – and “the ‘knowledge networks’ associated with the three main styles of curiosity: busybody, hunter and dancer”:

“The busybody scouts for loose threads of novelty, the hunter pursues specific answers in a projectile path, and the dancer leaps in creative breaks with tradition across typically siloed areas of knowledge.”

The research also suggests there is a spectrum of other curiosity styles beyond the main three.

In the article, the author states that “Studying Wikipedia readers reveals a rich picture of people’s freely expressed, diverse online curiosities” and that “Wikipedia (and sites like it) could better support curiosity-driven exploration” by, for example, “showing readers their own dynamic knowledge network” “rather than suggesting pages based on their popularity or similarity to other pages”. I am interested in all of these ideas.

Going down a Wikipedia rabbit hole? Science says you’re one of these three types
A study of 480,000 Wikipedia users shows how ‘busybodies’, ‘hunters’ and ‘dancers’ follow their curiosity in different ways.

How Australian Places are Represented on Wikipedia: A report of the wikihistories project

TL;DR
wikihistories’ new report shows that Australian places on Wikipedia widely follow a settler-colonial view, with other perspectives sanitised or omitted.

The wikihistories project at UTS has published its second report, this time looking at how well Australian places are covered on Wikipedia, if at all. Through an examination of 35,000 articles about Australian places and interviews with volunteer editors, the report tries to understand how Wikipedia represents Australian places and the editing practices that drive those representations.

I encourage anyone interested in Wikimedia projects and free knowledge to read the report. There are so many insights to take from it. For me, what is apparent through the report is that the situation we find ourselves in – in which “Wikipedia representation of Australian places is anthropocentric and neocolonial” – is the amalgam of a range of intersecting phenomena. As the report notes, these include:

  • the ‘negotiation’ between Wikipedia editors when writing about Australian places,
  • the diversity of editor practices and motivations for editing, especially when writing about Australian places (including, in some cases, avoidance of certain subject-matter to avoid ‘edit wars’),
  • the complexities of reconciling Aboriginal and Torres Strait Islander ways of knowing, being and doing with the settler-colonial status quo – individually, socially and politically,
  • the Western-centric foundational, technological and normative practices (especially in relation to understandings of space and place) on the platform.

What results is a partial or biased view that sanitises how many places are represented or systematically omits certain places altogether. Where places are included on Wikipedia, the text presented does not represent all views connected to a place.

Places further from major cities have fewer articles and receive less attention from editors. And “The cities, towns, and administrative divisions founded by European settlers guide the creation, editing and reading of Wikipedia articles. First Nations, ecological, or cosmopolitan senses of place need to fight or negotiate to find room within this nationalist European structure.” Some editors have built positive momentum to address this reality, but plenty of resistance remains on Wikipedia.

Given how valuable and insightful the report is, I will likely publish an explainer about the report and its findings soon. It is an important piece of work that sits alongside Wikimedia Australia’s commissioned research into the complex relationships Aboriginal and Torres Strait Islander people may experience when reading or contributing to Wikipedia and how to better recognise, respect and reflect Aboriginal and Torres Strait Islander peoples and their knowledge systems within the Wikimedia projects.

How Australian places are represented on Wikipedia – Reports
This year, the wikihistories team set out to understand how well Wikipedia represents Australian places and what kinds of editing practices drive those representations. Examining 35,000 articles about Australian places and interviewing volunteer editors, we found that English Wikipedia reflects an anthropocentric and neo-colonial image of Australia as a place. Download the full report, or scroll down to read an html version suitable for your smaller screen device.

Also worth reading on this topic:

We analysed 35,000 Wikipedia entries about Australian places. Some of them sanitise history
The first project to examine Australian Wikipedia entries finds topics such as Australian history and use of First Nations place names are sparking ‘edit wars’, with some serious omissions.

An article published in the lead-up to the launch of the report that explores the key findings.


The Open Source AI Definition – 1.0

TL;DR
OSI releases the Open Source AI Definition.

The Open Source Initiative has released a stable first version of the Open Source AI Definition, which provides for community-led, open and public evaluations to validate whether an AI system can be defined as Open Source AI. An Open Source AI system grants the freedoms to use, study, modify and share the system for any purpose without restrictions. Ayah Bdeir, who leads AI strategy at Mozilla, said in the accompanying blog post that “The new definition requires Open Source models to provide enough information about their training data so that a ‘skilled person can recreate a substantially equivalent system using the same or similar data,’ which goes further than what many proprietary or ostensibly Open Source models do today.” As Kylie Robison says on The Verge, Meta’s Llama fails to meet the definitional requirements.

The Open Source AI Definition – 1.0
version 1.0 Preamble Why we need Open Source Artificial Intelligence (AI) Open Source has demonstrated that massive benefits accrue to everyone after removing the barriers to learning, using, sharing and…

Also worth reading on this topic:

The Open Source Initiative Announces the Release of the Industry’s First Open Source AI Definition
Open and public co-design process culminates in a stable version of Open Source AI Definition, ensures freedoms to use, study, share and modify AI systems.

Open-source AI must reveal its training data, per new OSI definition
Meta’s Llama contends with the new Open Source Initiative definition of truly “open” AI.

As the WordPress saga continues, CIOs need to figure out what it might mean for all open source

TL;DR
WordPress v WP Engine seems to be spooking the CIO horses!

I read this concerning article suggesting Chief Information Officers (CIOs) reconsider relying on open source software, or try to avoid it altogether, in light of the very public fight between WordPress and WP Engine. It argues that CIOs need to carefully monitor the stability of open source ecosystems, assess their use of open source platforms and consider the risks involved. A number of commentators are quoted in the article offering all kinds of cautions about using open source. One even goes so far as to blame overreliance on an ‘ethos’ and a lack of well-structured contracts in open source projects as a basis for risks. Even though the article also includes comments about why proprietary software also carries risk, it concerns me that this and similar articles may sway enterprises away from using open source.

Mullenweg ‘would love to go back to negotiating table’ with WP Engine
“Now they are distressed and they are losing customers,” Automattic CEO Matt Mullenweg told CIO.com after his company made new filings to defend itself against a lawsuit brought by WP Engine. “I think our negotiating position is getting stronger every day.”

A bit on the side

Other tasty tidbits this week:


Colophon

AI use

No part of the text of this blog post was generated using AI. The original text was not modified or improved using AI.

The banner graphic (i.e. the first image at the top of the blog post) was adapted from vector graphics generated in Adobe Illustrator using Firefly 4 with 'Subject' content type selected and the lowest level of detail set. { Text to Vector Graphic prompt: Seamless pattern, very large simple shapes, 80s retro style, fluid organic elements, morphing, overlapping, blurred gradients, visible layers }

Provenance

This blog post was first published on Sunday 3 November 2024. It has not been updated. This is version 1.0.