Automatic XML extraction from Word and formatting of e-book formats: Insight into the Open Source Academic Publishing Suite (OS-APS) project

Markus Putnings Friedrich-Alexander-Universität Erlangen-Nürnberg

The OA Diamond Journals Study has compiled a representative overview of Diamond Open Access journal operators in its “Part 1: Findings”. For example, 53% of journals are operated by less than 1 FTE, and 60% of journals rely heavily on volunteers. Due to these resource constraints, most Diamond Open Access journals publish fewer than 25 articles per year, and 75% of journals are not able to provide their content in XML and HTML, offering primarily PDFs instead (Bosman et al., 2021, pp. 7-8).

To keep up with larger commercial publishers and their professionalized content offerings, a high degree of process automation and streamlining is necessary. The Open Source Academic Publishing Suite (OS-APS, https://os-aps.de/en/) project, funded by the German Federal Ministry of Education and Research, aims to achieve this. Smaller and medium-sized publishers usually deliver Word manuscripts. OS-APS automatically extracts the underlying XML from these manuscripts, offers an optimization option, and, most importantly, provides export options in various formats (PDF, HTML, XML, EPUB). The professional corporate design, e.g., of the PDFs, is also handled automatically, either by reusing existing templates or by creating one’s own with the OS-APS Template Development Kit.
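To illustrate what "underlying XML" means here: a .docx manuscript is a ZIP package whose main text is stored in the part `word/document.xml`. The following is a minimal sketch in Python using only the standard library. It is not the OS-APS implementation, and the function names `read_document_xml` and `paragraph_texts` are hypothetical helpers introduced for illustration only.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx):
    """Return the raw XML bytes of the main document part.

    `docx` may be a file path or a binary file-like object.
    (Hypothetical helper, not part of OS-APS.)
    """
    with zipfile.ZipFile(docx) as package:
        return package.read("word/document.xml")


def paragraph_texts(document_xml):
    """Return the plain text of each <w:p> paragraph, in document order.

    (Hypothetical helper, not part of OS-APS.)
    """
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = paragraph.iter(f"{{{W_NS}}}t")
        paragraphs.append("".join(run.text or "" for run in runs))
    return paragraphs
```

Calling `paragraph_texts(read_document_xml("manuscript.docx"))` on a manuscript (the file name is a placeholder) would yield one string per paragraph. A real conversion pipeline would additionally have to map styles, headings, references, figures, and tables to a structured target format, which this sketch does not attempt.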

In addition, OS-APS will connect to scholarly-led and community-driven publishing platforms such as Open Journal Systems (OJS), Open Monograph Press (OMP), and DSpace, so that the software can be integrated into a wide range of publication processes, whether at small, low-resource commercial Open Access publishers or at institutional or Diamond Open Access publishers. To understand the requirements of these heterogeneous publishers, a practical advisory board and a scientific advisory board with representatives from the different publication sectors accompany the OS-APS project. Demo days with feedback opportunities are also held regularly. We will present a demo at PUBMET2022 as well.

The Open Source software could be a significant improvement for smaller, independent Open Access publishers. It offers the possibility to increase the effectiveness and efficiency of their processes and to create new e-book formats (such as HTML, EPUB), thus helping to secure their existence and bibliodiversity in the long term. The project is therefore in line with the recommendations of the OA Diamond Study and its call for cOAlition S Funders and Infrastructures: “Support the development of generic tools to generate structured content in XML and HTML” (Becerril et al., 2021, p. 20).

Markus Putnings
Friedrich-Alexander-Universität Erlangen-Nürnberg
University Library of Erlangen-Nürnberg
Erlangen, Bavaria, Germany
ORCID ID: 0000-0002-6014-9048

Carsten Borchert
SciFlow
Berlin, Germany
ORCID ID: 0000-0002-3981-4517

Frederik Eichler
SciFlow
Berlin, Germany
ORCID ID: 0000-0002-6579-7271
