Passion for Provenance (data): On collection data, museum systems and open collaboration via wikis

PLW2024 conference branding. Source: Vasariano, Wikimedia Commons. CC0

#PLW2024

The international workshop Provenance Loves Wiki: »Provenance Research + the Wikiverse« brought together art historians, museum and collection management specialists, computer scientists, data enthusiasts, and members of the Wikidata/Wikibase community for two days in January 2024 at the offices of Wikimedia Germany in Berlin. Part barcamp, part meetup of the kuwiki (Working Group Art History + Wikipedia) community, the event kicked off with an evening roundtable of internationally recognized experts in the field of provenance studies. The roundtable set the stage for the later discussions and the core questions of the workshop: what roles do Wikipedia, Wikidata and Wikibase play in provenance research, and what do we actually mean (and do we all mean the same thing) when we talk about the representation of provenances in Wikidata?

TIB’s Open Science Lab was represented at PLW by OSL’s Head Lambert Heller and Postdoc Researcher Lozana Rossenova. We took part not only because of OSL’s long history of engagement with cultural and historical data in wiki projects, as well as previous contributions to kuwiki events, but also because we hoped to better understand how the services we offer via NFDI4Culture, notably Wikibase4Research, could be further developed to better meet the needs of art historical communities engaged in provenance research. To that end, Lambert Heller proposed to host a barcamp session titled: “Wikiverse/Wikibase as a Collection Management Solution for G(L)AMs”. 

Wikibase as a viable museum collection management system?

The session took a decidedly “business administration” perspective, asking in short: what do GLAMs need in order to adopt Wikibase as a fully fledged alternative to their existing collection management systems? The ensuing discussion focused on the following sub-questions:

  1. What are the missing features from the GLAM (or rather, GAM…) collection management POV?
  2. What other aspects of successful adoption are missing (not strictly technical)?
  3. What needs to be done (amount of work, who’s well prepared to contribute to it)?
PLW conference session led by Lambert Heller
Attendees of the PLW session at Wikimedia Germany’s offices on Saturday, 13 January 2024. Source: Vasariano, Wikimedia Commons. CC0

Regarding missing features, the following were of most interest:

  • Easy import/re-import via spreadsheets: The preference was that such capability is built into Wikibase directly as opposed to working with external tools, ideally allowing for a “round trip” of data sets from spreadsheet to Wikibase and vice versa.
  • Fixing holes in the data modelling processes, especially between common standards such as LIDO and Wikidata’s own model: An additional challenge was acknowledged here in that this is an ongoing activity, rather than a singular feature.
    The discussion focused on the dependency on tools like TMS (The Museum System) and other proprietary software, and the notion that Wikibase could be seen on the one hand simply as an open source alternative, and on the other hand as a way to break collection silos and encourage richer data interconnections. In both cases, Wikibase is only one among several open source options, but it was noted that many of the other options do not have much visibility. Wikidata, and by proxy Wikibase, have the benefit of the highly visible Wikimedia ecosystem and the perception of a low-threshold entry point because of the size of the contributing communities. The low barrier can be a great benefit in cases where collection description is done from scratch (e.g. small collections or museums starting digitisation). On the other hand, legacy collection data from large institutions runs into many of these “holes”, which prevent straightforward data exchange between existing standards and the Wikidata model and make a full “switch” to Wikibase as a CMS rather unlikely. Alternative tools were also discussed, and we referenced the tool comparison dataset developed by the LOD WG in NFDI4Culture.
  • Authorization levels for different types of users, i.e. granular control over what is visible to whom, who edits, who decides, etc.: This remains an important aspect that can be a barrier to adopting wiki systems for many museums. The need to discuss some information within a closed community before going public, or simply the need to keep internal administrative information such as loans, necessitates a way of separating what data is entered in Wikibase from what is published – either by publishing static data on another system, or by using a second Wikibase that can remain ‘read-only’.
  • Security against cyber attacks: This was discussed in the context of high-profile institutions such as the British Library or the Natural History Museum in Berlin being targeted in recent months.
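
To make the “round trip” idea from the first point above more concrete, here is a minimal sketch in Python. It is not how Wikibase or any existing import tool works: the item records are a deliberately simplified stand-in for Wikibase’s data structures, the column names and property IDs (P1, P2, P3) are hypothetical placeholders, and the sample rows are made up for illustration. The sketch only shows what a lossless round trip means in practice: exporting what was imported should give back the same spreadsheet.

```python
import csv
import io

# Hypothetical column-to-property mapping; the P-numbers are placeholders,
# not properties of any real Wikibase instance.
COLUMN_TO_PROPERTY = {"title": "P1", "creator": "P2", "inventory_number": "P3"}
PROPERTY_TO_COLUMN = {prop: col for col, prop in COLUMN_TO_PROPERTY.items()}


def rows_to_items(csv_text):
    """Turn spreadsheet rows into simplified item records (not real Wikibase JSON)."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements = {
            COLUMN_TO_PROPERTY[col]: value
            for col, value in row.items()
            if col in COLUMN_TO_PROPERTY and value  # skip empty cells
        }
        items.append({"id": row["id"], "statements": statements})
    return items


def items_to_csv(items):
    """Export the simplified item records back to spreadsheet form."""
    columns = ["id"] + list(COLUMN_TO_PROPERTY)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {"id": item["id"]}
        for prop, value in item["statements"].items():
            row[PROPERTY_TO_COLUMN[prop]] = value
        writer.writerow(row)  # columns without a statement are written as empty cells
    return buffer.getvalue()


# Made-up sample data, for illustration only.
SAMPLE = (
    "id,title,creator,inventory_number\n"
    "Q1,Still Life,Unknown,INV-001\n"
    "Q2,Portrait,,INV-002\n"
)

items = rows_to_items(SAMPLE)
round_trip_ok = items_to_csv(items) == SAMPLE
```

In this toy case `round_trip_ok` is true because every column maps to exactly one property. Real collection data, with qualifiers, references and multiple values per field, is where the modelling “holes” mentioned above would show up.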

In terms of additional aspects needed for wider adoption, the following key points emerged:

  • A “market study” to understand which current expectations of museums (and other GLAMs) are most relevant today and are likely not met by established systems – TMS and similar – that were developed decades ago.
  • Showcases of success stories of museums and collections adopting Wikibase, including 1) detailed information on what the institutions actually contributed during the adoption process, e.g. open/collaborative software development, data modeling, data management workflows; as well as 2) best practices for setting up namespaces, prefixes, IDs, model queries, and all the detailed aspects required for a good system configuration.
  • Avoiding fatigue of ‘form filling’ by supporting workflows that move away from manually structuring data that users prefer to construct in text form, and instead introduce AI-driven components in the workflows.
  • Enforcing standards for LOD ontologies, although, as noted, enforcement might be achieved more effectively through machine-assisted workflows.

When it came to the question of what needs to be done, participants admitted that institutions often opt for proprietary solutions, because outsourcing work on maintenance, updates, security, etc. is perceived as more manageable than handling it all in-house. At the same time, proprietary software is often ill-equipped to meet the custom needs of individual institutions, and costs can be prohibitive for smaller organisations. Initiatives such as the Wikibase Stakeholder Group, of which TIB is a member, help communicate what institutions can learn from open source communities in terms of open practices, provisioning for maintenance and long-term sustainability. Conversely, open source communities have much to learn from institutions when it comes to data privacy, ethics and sensitive handling of cultural heritage (in most cases closely connected to the theme of provenance, handling colonial legacies and historical looting). Balancing FAIR data aspirations with the CARE principles can pose challenges, but is still worth pursuing when developing workflows for the Wikidata/Wikibase ecosystem that handle collection data with complex provenances. The whole event – expertly organised by the kuwiki group – was a great example of how art historians and museum professionals can give much back to the open source and open data communities.

An unedited set of notes from the barcamp session can be freely accessed here

Outlook

Many more relevant discussions took place throughout the two days of the event, and more information on the programme and documentation can be found on the kuwiki Wikipedia pages. The main take-away for the OSL team was the increased expectation for AI-assisted workflow optimisation in data extraction, data entry and harmonisation, as well as data querying. Furthermore, current examples of provenance data in Wikidata only scratch the surface of the complexities of provenance research. At the same time, other collection management tools are equally ill-equipped to rise to the challenge. On the flip side, the need for both research and design of appropriate data modeling patterns, data entry workflows and data exchange across collections presents new opportunities for open source communities and (art) historical researchers to work together. And judging from the closing day of the workshop, plans for follow-up events to #PLW are already in the making.

Acknowledgements: The authors want to thank the active participants in the Wikibase session, in particular Jane Darnell, Rudolf H. Boettcher, Lynn Rother, Laurel Zuckerman, and everyone else who contributed to the discussions and the etherpad. Special thanks also to Waltraud von Pippich for organising the event and inviting us to take part.

Dr Lozana Rossenova is currently based at the Open Science Lab at TIB, and works on the NFDI4Culture project, in the task areas for data enrichment and knowledge graph development.

Librarian. 🤓
Head of Open Science Lab at TIB.
Follow me at https://openbiblio.social/@Lambo