What does the internet know about the development of software?

A post by Helge Holzmann, Wolfram Sperber and Mila Runnwerth

Software is dynamic and subject to continuous development. In fact, the boundaries between the different states and versions of a software program over the course of its development are often blurred. The state of a software program is difficult to pin down; if it can be described at all, then only by its version number. It is not only the software itself that evolves: the way it is presented on the internet, whether on the official website or in discussions and descriptions on external websites, also changes frequently. Web archives enable users to track this development. In the context of the Specialised Information Service Mathematics, a web service was developed (the Tempas TimePortal) that links the temporal development of software websites to the actual software program. In swMATH, the database of software relevant to mathematics, the integration of the TimePortal now enables users to trace, via the website, the state of a software program at the time when a scientific article referring to it was published.

swMATH

The freely accessible database swMATH is an information service for software relevant to mathematics. Information on the software programs used is extracted semi-automatically from publications indexed in zbMATH and then processed.

In addition to metadata such as a description, a URL, the names of the developers and similar software programs, swMATH provides a list of publications that refer to the software program. swMATH distinguishes between reference publications and publications of software documentation. In the former, the software was used to address a research question. In the latter, the software program itself was the object of research. As a result, advantages of zbMATH, such as classification according to the Mathematics Subject Classification 2010, carry over to the software.

Detailed view of SINGULAR in swMATH

The issue

As a freely available information service for mathematics software, swMATH offers an excellent starting point for finding out how software is used in mathematical research and how it is valued as a tool for gaining knowledge. On the basis of the software listed in swMATH, Helge Holzmann et al. presented an analysis at the TPDL conference in September 2016 [1] of how software websites are usually structured and the extent to which they provide information about the actual software program, or can even be regarded as a representation of it. In order to track the temporal development of a software program in this way, the researchers additionally investigated the extent to which older versions of the relevant websites had been archived and could still be retrieved using the Wayback Machine. The analysis concluded that software websites frequently provide sufficient information and additional material (documentation etc.) to gain an understanding of the software. Linking a software version to the corresponding point-in-time representation on the internet may therefore be useful for tracking the state of a software program as used, for example, in scientific publications.

The Tempas TimePortal in swMATH

The Tempas TimePortal was adapted specifically for this application in order to enable the aforementioned link between a web archive and the software used in a scientific publication. Tempas is a temporal search engine for archived websites that was developed at L3S within the ALEXANDRIA project. The TimePortal, which is based on the Wayback Machine, aims to present the results of such a search in a way that lets users compare specific versions of an archived website, rather than displaying the entire archive. In the context of the Specialised Information Service Mathematics, this function was extended so that software websites can be retrieved via a swMATH publication in which the software is used, rather than via the URL. Special features of a software program, such as the documentation linked on its website, are additionally highlighted for the user. In this way, the archived website can be perceived as a temporal representation of the software.

This connection to the Tempas TimePortal, which enables users to view the website of a software program as it was at the time the software was cited or referenced in a publication, has been available in swMATH since mid-April. Here we present the feature:

The summary block of metadata now contains the entry "Versions" with the TimePortal logo:

If a user clicks on the logo, TimePortal icons also appear in the lists of publications: in a bright colour if an archive entry exists for the year of the publication, or in a pale shade of grey if none exists (because, for example, there were no websites before the 1990s, or the website was not archived in that particular year). So far, only the publication year has been taken into consideration. In future, the links are intended to point more precisely to the software version actually used in a publication.
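The year-based matching described above can be sketched in a few lines of Python. This is only an illustration, not the TimePortal's actual implementation: the function names, the sample timestamps and the placeholder site are assumptions, and the link follows the URL pattern of the public Wayback Machine (`https://web.archive.org/web/<timestamp>/<url>`) rather than the TimePortal's own scheme.

```python
def archived_years(timestamps):
    """Return the set of years covered by Wayback-style timestamps (YYYYMMDDhhmmss)."""
    return {int(ts[:4]) for ts in timestamps}


def icon_state(publication_year, years_with_snapshot):
    """'bright' if the site was archived in the publication year, otherwise 'grey'."""
    return "bright" if publication_year in years_with_snapshot else "grey"


def wayback_url(site_url, year):
    """Link to the capture closest to mid-year (public Wayback Machine URL pattern)."""
    return f"https://web.archive.org/web/{year}0701000000/{site_url}"


# Hypothetical capture timestamps for a placeholder software website.
captures = ["20010405123000", "20010920080000", "20160101000000"]
years = archived_years(captures)

for pub_year in (1989, 2001, 2010):
    state = icon_state(pub_year, years)
    # Only publications with a matching archive entry get a link.
    link = wayback_url("http://www.example.org/", pub_year) if state == "bright" else None
    print(pub_year, state, link)
```

Matching on the year alone is deliberately coarse: a website may change several times within one year, which is why a more precise link to the software version would be an improvement.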

If the user then clicks on the icon, the Tempas TimePortal opens and shows the website of the software as archived in the year of the selected publication.

SINGULAR homepage in 2001
SINGULAR homepage today

The source is displayed at the top of the TimePortal: the selected publication from swMATH. Below this is a menu created automatically according to the features identified in the analysis of the website and its subpages. Typical features of a software website are pages for documentation, publications and downloads.
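The post does not say how these features are detected. As a purely hypothetical illustration, a menu of this kind could be derived from the links on a page with a simple keyword heuristic; the keyword lists, function names and sample links below are all assumptions and not the method used by the TimePortal.

```python
# Hypothetical keyword lists for the three typical features named above.
FEATURE_KEYWORDS = {
    "documentation": ("doc", "manual", "tutorial"),
    "publications": ("publication", "paper", "reference"),
    "downloads": ("download", "release", "install"),
}


def classify_link(label, href):
    """Return the first feature whose keywords occur in the link text or target."""
    text = f"{label} {href}".lower()
    for feature, keywords in FEATURE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return feature
    return None


def build_menu(links):
    """Group (label, href) pairs by detected feature; unmatched links are skipped."""
    menu = {}
    for label, href in links:
        feature = classify_link(label, href)
        if feature is not None:
            menu.setdefault(feature, []).append(href)
    return menu


# Made-up links standing in for those found on an archived software homepage.
sample_links = [
    ("Online Manual", "/Manual/latest/"),
    ("Download", "/download.html"),
    ("Team", "/team.html"),
    ("Papers", "/publications/"),
]
print(build_menu(sample_links))
```

A real system would need a more robust analysis, since link labels vary widely between websites and languages.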

Outlook

The aim of this approach is to enable users to trace and reproduce research results that are based on a software program, even when the actual software is not (or is no longer) accessible. To get even closer to this goal, we want to use dynamic and semantic web archives, creating them ourselves if need be. These archives could then be integrated into the research process as independent publications or as reliably citable sources.

The team

This post was created by Helge Holzmann, Wolfram Sperber and Mila Runnwerth.

Helge Holzmann
is a PhD student at the L3S Research Centre, which conducts basic and application-oriented research in the area of web science. His research focuses on web archiving and the permanent availability of all kinds of internet publications.

Wolfram Sperber
is a research assistant at zbMATH, the long-established abstracting and reviewing service for mathematics. He is responsible for the further development of the literature database and the swMATH software database.

Mila Runnwerth
is a subject specialist for Computer Science and Mathematics at TIB. In the FID, she coordinates the Maths Beyond Text section, which is concerned with non-textual materials in the mathematical research process.

Thanks to Gerrit Grenzebach and Anna Kasprzik for their excellent organisation of the FID project and for their critical feedback.

Specialised Information Service Mathematics

The aim of the Fachinformationsdienst (FID) Mathematik (Specialised Information Service Mathematics), a project funded by the German Research Foundation (DFG), is to develop an infrastructure for the supraregional provision of scientific resources, information services and other services that go beyond, and substantially supplement, previous offerings for mathematics research. The FID is a joint project of Göttingen State and University Library and the Technische Informationsbibliothek (TIB) – German National Library of Science and Technology in Hanover. In addition to cooperating with the Mathematisches Forschungsinstitut Oberwolfach (Oberwolfach Research Institute for Mathematics, MFO), the FID works closely with research institutions from the fields of mathematics and information infrastructure, as this project shows.

Footnote:
[1] Holzmann H., Sperber W., Runnwerth M. (2016) Archiving Software Surrogates on the Web for Future Reference. In: Fuhr N., Kovács L., Risse T., Nejdl W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol. 9819. Springer, Cham.