Draft Discussion Document

Opening the record of science: making scholarly publishing work for science in the digital era

Contents

Executive Summary
1. Why science matters
2. Why scientific publication matters
3. Principles for scientific publishing
4. The current status of scientific publishing
   4.1 The commercial publishing system and its recent evolution
   4.2 The business model
   4.3 Open Access Publishing
   4.4 Learned society publishing
   4.5 Books and monographs
5. Publishing the data of science
   5.1 Binary publication: concept and data
   5.2 Data and peer review
6. Science publishing in a changing world
   6.1 The digital impact on the research cycle
   6.2 Linked digital infrastructures for the research cycle
   6.3 Reinventing the practice of science publishing
7. An emerging business model
   7.1 Monetizing the research cycle
   7.2 Implications for the governance of digital infrastructures
8. Science in a changing world
   8.1 Challenges and opportunities
   8.2 Open Science
   8.3 A critique from the Global South
   8.4 Other contrary voices
9. Motivations, incentives and metrics
10. Conclusions: publishing in the service of science
   10.1 Possible recommendations for change
   10.2 Enabling factors
   10.3 A further basis for action: an economic analysis
11. References

Executive Summary

I. There should be universal open access to the record of science, both for producers and consumers, with no unnecessary barriers to participation, in particular those based on ability to pay, institutional privilege, language or geography.

II. Scientific publications should carry open licenses that support re-use and text and data mining.

III. Rigorous and ongoing peer review must occur at some stage of the publication process.

IV. The data² and evidence on which a published truth claim is based should be concurrently accessible to scrutiny and supported by necessary metadata.

V. The record of science should be maintained in such a way as to ensure open access by future generations.

VI. Publication traditions of different disciplines should be respected, whilst at the same time encouraged to find means of integrating their different contributions in the shared enterprise of knowledge. Publications should be explicit about the quality standards to which they adhere.

VII. Publication systems should continually adapt to new opportunities for beneficial change rather than embedding inflexible systems that inhibit change.
Much modern science involves a research cycle. Its nature varies across the disciplines of science, but tends to proceed through stages of project conception, formulation, funding, research, submission, peer review, publication, and reformulation for further research. In the era of print¹ technologies, the staging points in the cycle tended to be relatively discrete, with publication as a self-contained end point. Digital technologies have changed that: all elements in the cycle are connected, or connectable, and publishable as seamless output with digital interoperability. This has major implications for economic models of publication and for the management of scholarly communication.

¹ Throughout this document, the word science is used to refer to the systematic organization of knowledge that can be rationally explained and reliably applied. It is inclusive of the natural (including physical, mathematical and life) sciences and social (including behavioural and economic) science domains that represent the International Science Council's primary focus, as well as the humanities, medical, health, computer and engineering sciences (from the ISC High Level Strategy [2]). It is recognized that there is no single word or phrase that adequately describes this knowledge community. It is hoped that this shorthand will be accepted in the sense intended.
² We use "data" to refer to digital or text information, images, objects, audio or film resources, all of which can be digitally represented.
A central issue is the extent to which modern publication norms and practices are well adapted to the needs of science, as represented by principles I-VII above, and how they might or should develop in response to digital opportunities. From the 1960s/70s, journals that had hitherto been largely published by not-for-profit learned societies and universities were progressively and selectively acquired by commercial publishers. Increasing global investments in science and the scientific workforce, and the diversification of the scientific effort, have led to the proliferation of new disciplines and sub-disciplines, creating demand for more, and more diverse, publishing outlets. A highly asymmetric and lucrative business model of publishing has developed to exploit that demand, yielding profits in excess of 30%. Researchers are both the providers and the consumers of the published knowledge, with publishers as the intermediaries and the ownership of so-called "high impact" journals as a unique selling point. The consequences have been the privatisation of a large part of the publicly funded record of science, high paywalls that restrict public and researcher access to that record, the inhibition of text and data mining processes that are powerful means of discovering knowledge in the record, and a costly publishing system that hinders the development of science in low-income countries.
Reactions to these trends from the scientific community over the last two decades have led to the rise of "open access publishing", which is now responsible for about 47% of scientific publishing. It is based on the principle of free access to the record of science and licenses that do not inhibit copying or any form of re-use.
Over the same period, the digital revolution has produced an enormous expansion in the data available to science. Making data that supports a published truth claim open to scrutiny remains as important as ever, but more difficult and onerous, as massive data volumes cannot be contained within the bounds of a typical publication. A form of "binary publication" should become standard in such cases, in which data is made available in an accessible, trusted repository concurrently with the publication to which it refers. However, increasingly large and complex datasets, possibly coupled with a desire to withhold data for whatever reason, have created a situation in which data and metadata are not routinely available alongside a published truth claim. This is a serious omission.
Publishing the data is as important as, and sometimes more important than, publishing the written text, and it is a task to which no paywall should be applied.
There are however cases in which open data publication is not appropriate, for example where such access would prejudice privacy, safety or security, or has potential for harmful dual use. In such cases it is important that: a) the data are retained somewhere; b) there are well-managed pathways to access for referees and bona fide researchers; and c) they conform to FAIR criteria.
Digital technologies have also opened the way to linking the elements and infrastructures of the research cycle, enabling not only the creation of important research management, assessment and evaluation statistics, but also the embedding of these elements in the Web as structured Web objects. This would permit scientific papers to be linked to data, experiments, software, and workflows. It could make the value that is currently hidden and inaccessible in the research cycle available for further analysis. It would enable AI systems that could organise scientific processes, re-run experiments, re-analyse results and explore hypotheses in systematic and unbiased ways. It could realise the promise of open science and reproducible research, and open new pathways to serendipitous discovery. It would permit future developments to be built on open-source rather than proprietary systems, thereby avoiding monopolies and stimulating innovation.
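As an illustrative sketch of what such "structured Web objects" might look like, the snippet below expresses a paper linked to its dataset and software using schema.org vocabulary in JSON-LD form. All identifiers, titles and URLs are invented placeholders, not drawn from this report; it is one possible encoding, not a prescribed standard.

```python
import json

# A hedged sketch: one way a paper, its underlying dataset and its software
# could be linked as a single structured Web object, using schema.org
# vocabulary (JSON-LD). Every identifier below is a hypothetical placeholder.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Example study",
    "identifier": "https://doi.org/10.0000/example-article",
    "isBasedOn": [
        {
            "@type": "Dataset",
            "identifier": "https://doi.org/10.0000/example-dataset",
            "license": "https://creativecommons.org/licenses/by/4.0/",
        },
        {
            "@type": "SoftwareSourceCode",
            "codeRepository": "https://example.org/repo",
        },
    ],
}

# Serialised this way, the links between paper, data and code become
# machine-readable, which is what enables the mining and re-analysis
# described above.
print(json.dumps(article, indent=2))
```

Because the record is plain JSON, any crawler or analysis pipeline can follow the dataset and repository links without access to the publisher's platform.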
Such possibilities may be inhibited however by increasing commercial control. The information acquired by major publishing companies in the course of their activities gives them prior and exclusive access to these data, with the consequence that some now seek to monetise the whole of the research cycle by marketing data analytics and research management services to universities and research funders. A fundamental question for these bodies and for the wider science community is whether they are content that a significant part of the governance of the public research process should fall into private hands, whose principal loyalty is to their shareholders and not to the advancement of knowledge. It is also important to understand how public/private partnerships can avoid inhibiting the flow of knowledge from joint activities into the public sphere.
The priorities for scientific publishing must be influenced by changes in the environment in which science is done. Three major such influences are identified: societal demands on science to address major global challenges, such as the Sustainable Development Goals (SDGs); the potential to use the tools of the digital revolution to unlock the complexity that characterizes many of these challenges; and the democratisation of information and knowledge created by the advent of the World Wide Web. These influences, together with an increasingly collaborative discourse within science and scholarship and enabling technological tools, converge in the growing open science movement, in which open access publication and open data (subject to limitations on grounds of safety, security and privacy) are core principles and which increasingly stresses greater openness to and engagement with society. It is important to recognise the imperative for an authentic global knowledge commons that does not merely reinforce the hegemony of historic centres of scientific effort. If the publishing system is to adapt to the needs of science and scholarship in responding to this agenda and to adhere to principles I-VII, it will need to address the motivations, incentives and metrics that influence the behaviour of its stakeholders.
Our overall conclusion is that the current system of publishing is not optimal and needs radical revision. Priorities for action are listed in relation to each of the principles in I-VII. There are a number of important issues that will determine whether and how these principles are realized: acceptance of relevant responsibilities by researchers, universities, scientific unions, associations and academies, libraries, funders and publishers; amendment of the incentives that condition behaviours of these stakeholders; and discussion and possible action on modes of governance surrounding modern publication regimes.
Opening the Record of Science: making scholarly publishing work for science in the digital era

1. Why science matters
Science is an indispensable part of the human endeavour, not a dispensable luxury. It helps us make sense of and navigate the increasingly complex world we live in. We need science to advance our societies and respond to their needs, to inform our education, improve our policies, spur innovation, address global sustainability, safeguard health and wellbeing, and stimulate curiosity, imagination and wonder.
The value of science as a distinct form of knowledge rests on the open scrutiny of concepts that are grounded in evidence and tested against reality, logic and the judgement of peers. It is embedded in the Universal Declaration of Human Rights: "everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits" [1]. The vision of the International Science Council is of science as a global public good [2].

2. Why scientific publication³ matters
Science most effectively serves the public good if the knowledge and understanding that it creates are communicated promptly and comprehensibly into the public sphere. The processes of formal scientific publication are the prime conduits of such communication.
Central to the issue of communication is the record of science: the published record of scientific knowledge and understanding from the earliest days of scientific inquiry to the present. It is contained in books, monographs, scientific journals and the "grey" literatures published in governmental and institutional reports, whether in print or digital formats, and in digital objects. It is continually refreshed, renewed and re-evaluated across the disciplines of science by new experiments, new observations and new theoretical insights. It is a complex amalgam of novel contributions that withstand critical tests of experiment, observation or logic and that are of general or local significance; contributions that fail such tests and lose currency; and others that are disregarded and rarely, if ever, remembered or quoted, though sometimes re-discovered as significant insights. Recognition and quotation by peers and publication in standard texts determine how this amalgam creates the evolving framework of understanding in a discipline or about a phenomenon.
Scientific publications should, in principle, perform a number of vital tasks⁴:

a) make the conclusions, and the evidence (the data) on which a scientific truth claim is based, accessible to scrutiny by peer review and post-publication analysis, so that method and logic can be validated or invalidated, conclusions scrutinized and any observations or experiments replicated. This process has proved a powerful means of identifying error, and is the basis of the so-called "self-correction" of science, the non-negotiable principle that is the bedrock of the public value of science.

b) preserve the record of science so that it is accessible to succeeding generations for reassessment and re-use in further research.

c) enable the global community of scientists to keep perennially abreast of the development of knowledge, and thereby build on the work of earlier generations in contributing to a self-renewing, evolving edifice.

d) form an essential part of the process whereby scientific knowledge is disseminated into wider society, with the potential for innovative use in a myriad of educational, social, economic and cultural settings.
These processes are fundamental to science and to its essential function as a global public good⁵. They provide the context for this report, in which we propose fundamental principles for scientific publication, assess its current status in relation to the roles it should play, consider how the modes of publication are evolving, and scrutinise the changing contemporary environment and its implications for the processes of publication and communication. We then go on to review the extent to which needs and opportunities for science are met by the current operation of the publishing enterprise, and thence to identify any need for change and how it might be implemented.

3. Principles for scientific publishing
It is important to articulate the basic principles that need to be observed if the purposes set out above are to be achieved, and as a basis for judgements about the extent to which the current system of scientific publishing is fit for purpose. They should be principles that will endure independently of changes in technologies or modes of working:

I.
The record of published science is a vital source of ideas, observations, evidence and data that provide feedstock and inspiration for further enquiry, and is a profound part of the edifice of human knowledge. That record, including the back catalogues of publishers, should be regarded as a global public good, freely and perennially available to citizens, other societal stakeholders and to researchers. It is created as an intrinsic part of the research process and not a separate, add-on enterprise. Its costs should be borne as part of that process, by those that fund research, or, where research is unfunded, through other means that do not create access obstacles for authors, readers, or institutions. The process should be driven by the needs of science and not by the pursuit of private gain, and should not lead to the privatisation of knowledge. As new technologies enhance the capacity to interrogate the whole record of science to discover patterns, relationships and solutions that may be hidden within it, access to the resources that could facilitate such discoveries should be open to all, unrestricted by ability to pay [115]. Many of the challenges that confront science are both local and global. The local cumulatively impacts upon the global. The global permeates the local through complex coupling between social and bio-geophysical processes that have re-configured the planetary ecology to produce one that is novel to the Earth. Global solutions require global involvement from the community of science and global access to its publications, both by readers and authors, irrespective of wealth, language, institutional or geographic location. Principle: There should be universal open access to the record of science, both for producers and consumers, with no unnecessary barriers to participation, in particular those based on ability to pay, institutional privilege, language or geography.

⁴ Essential functions of scientific and scholarly communication have been described as: registration; certification; awareness; archiving; and reward [3].
⁵ The vision of the International Science Council is to advance science as a global public good. Scientific knowledge, data and expertise must be universally accessible and its benefits universally shared. The practice of science and the opportunities for scientific education and capacity development must be inclusive and equitable [2]. To economists, public goods have two essential properties: non-rivalrous consumption (the consumption of one individual does not detract from that of another) and non-excludability (it is difficult, if not impossible, to exclude an individual from enjoying the good) [4].

II.
When submitting to journals that do not adhere to the principles of the open access movement [25], authors are required to surrender copyright to publishers. Authors thereby lose control of their work, re-use is inhibited, and the use of powerful text and data mining algorithms to uncover knowledge hidden in the record of science is prevented. Principle: Scientific publications should carry open licenses [Box 3] that support re-use and text and data mining.

III.
Formal peer review processes are essential parts of the curation process, a means of subjecting an author's scholarly work, research or ideas to the scrutiny of independent experts in the same field. The purpose of peer review is to uphold high standards in the relevant field of study, and to ensure that unwarranted claims, fallacious interpretations or idiosyncratic views are not published. Peer review has traditionally preceded publication, but novel post-publication processes of peer review are developing. Peer review is generally effective in identifying obvious errors of fact and logic, though less likely to identify errors that are more deeply embedded in complex analyses or in voluminous data, as recent tests of reproducibility across several disciplines have revealed [5]. These formal processes, though important, should be distinguished from the deeper reviews that take place after publication, when published work becomes available for general scrutiny by the scientific community. A dilemma arises at times of crisis, when rapid responses are sought from scientists. One way of responding is through rapid online publication of preprints that have not yet been subject to peer review. A balance must be struck between the speed of scientific response to crisis and the risk that unreviewed studies make unfounded, false or even dangerous claims. Such urgent circumstances require rapid but rigorous reviews. Principle: Rigorous and ongoing peer review must occur at some stage of the publication process, with the possibility of retraction where serious flaws are uncovered.

IV.
Data and evidence (supported by the metadata that make them useable) are the essential feedstock for most science and for maintenance of the principle of self-correction. The data of science should be accessible as a public good, not privatised or locked behind paywalls. This should apply as far as possible whether scientists are publicly or privately funded, or unfunded. Publicly funded scientists in particular should regard themselves as custodians of data on behalf of the citizens who ultimately fund their collection, rather than as their owners.

The data on which a published scientific paper is based must be available for scrutiny, so that others can test the logic of the data/concept relationship and replicate the experiment or observation, subject to ethical constraints relating to privacy, safety and security. To do otherwise should be regarded as scientific malpractice. Such data should be compiled and curated so as to observe FAIR data principles⁶. Principle: The data and evidence on which a published truth claim is based should be concurrently accessible to scrutiny and supported by necessary metadata.
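As a hedged illustration of what curating data "so as to observe FAIR data principles" could mean in practice, the sketch below checks that a deposited dataset record carries the minimum metadata needed to be findable, accessible, interoperable and reusable. The field names and the example record are hypothetical, not taken from any formal FAIR specification or repository schema:

```python
# A minimal sketch of a FAIR-style metadata check for a deposited dataset.
# Field names are illustrative only, not a formal standard.
REQUIRED_FIELDS = {
    "identifier",   # Findable: a persistent identifier such as a DOI
    "access_url",   # Accessible: where the data can be retrieved
    "format",       # Interoperable: an open, documented format
    "license",      # Reusable: explicit terms of re-use
}

def missing_fair_fields(record: dict) -> set:
    """Return the required metadata fields that are absent or empty."""
    return {f for f in REQUIRED_FIELDS if not record.get(f)}

# Hypothetical record: everything is present except the license,
# so the check flags "license" as missing.
record = {
    "identifier": "https://doi.org/10.0000/example-dataset",
    "access_url": "https://repository.example.org/datasets/42",
    "format": "text/csv",
    "license": "",
}
print(missing_fair_fields(record))  # → {'license'}
```

A trusted repository could run such a check at deposit time, refusing or flagging submissions whose metadata would leave the data unusable by later readers.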
V. The record of science is an essential part of the inheritance of humanity, and should be maintained in such a way as to ensure access by future generations. Prior to the "digital era", the record was preserved and available in indexed books and journals curated in libraries. Scientific publication now occurs in a plethora of novel forms. As most libraries no longer hold a physical stock, but tend to manage access to online resources, there is a potential danger that science will be lost. There is a case for an international virtual library dedicated to the preservation of the scientific record, and without a sunset clause. Principle: The record of science should be maintained in such a way as to ensure open access by future generations.

VI.
The disciplines of science: natural, social, engineering and medical, and the humanities, tend to have their own habits and modes of publication. They are individually valuable means of expressing their contributions to learning and knowledge. It is important, however, that the processes of publication avoid creating silos between disciplines, and instead facilitate means whereby their contributions to the shared enterprise of knowledge can be integrated when addressing problems that demand interdisciplinary collaboration. It would be impractical to insist on common standards for publication across the disciplines of science, but it is important that journals are explicit about their standards and that they adhere to them⁷. Principle: Publication traditions of different disciplines should be respected, while at the same time encouraged to find means of relating their contributions in the shared enterprise of knowledge. Publications should be explicit about the quality standards to which they adhere.

VII.
The digital revolution has created new opportunities to enhance the discovery and dissemination of new knowledge and is enriching debate about its uses. The science publication system must be able continually to adapt to and exploit new opportunities consistent with the principles set out here, rather than remaining inflexible and unresponsive to beneficial and creative change (see sections 6 and 7). Principle: Publication systems should continually adapt to new opportunities for beneficial change rather than embedding inflexible systems that inhibit change.

4. The current status of scientific publishing

4.1 The commercial publishing system and its recent evolution
Even though paper publishing has largely been replaced by digital means, the processes of publication are still implicitly based on the assumptions of a world of print dissemination. From the 1960s/70s, journals that had hitherto been largely published by not-for-profit learned societies were progressively and selectively acquired by commercial publishers. As learned societies were deemed to act in the interests of scientists in their various fields, it seemed natural that scientists should entrust copyright to the journals and freely offer their services to support editorial boards and refereeing processes. As commercial publishers began to penetrate this market at scale, they simply assumed the relationships of trust that had existed between scientists and learned societies were also open to them, even though their principal responsibility is to their shareholders, not to scientists.
At the turn of the millennium, scientific publishing was largely funded by subscriptions to paper-based journals by university libraries and academics, who also had direct access to long series of back copies on library shelves. The increasing number and profitability of these journals have been driven by increasing global investments in science and the scientific workforce, diversification of the scientific effort leading to more and more sub-discipline-specific journals, and the shift of the centre of gravity of much university effort towards research. The consequent efforts by universities to maximise research output, and the pressure on academics to "publish or perish", have led to the use of publication metrics by the former in measuring and managing productivity, and by the latter in demonstrating their prowess as a basis for career advancement. The so-called Journal Impact Factor (JIF) is particularly problematic. It is calculated as the ratio of citations to a journal in a given year to the citable items it published in the previous two years, and tends to be used as a proxy for the importance of the journal and thereby the importance of the articles it contains. These publication metrics have been used to assess:

- the value of an academic's scientific contribution, as a basis for career advancement;
- at an aggregate level, the reputation of an academic institution (and of national university systems), as reflected in university ranking tables; and
- the effectiveness of funders' investments in the individuals or institutions that they support.
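The JIF arithmetic described above can be made concrete. The sketch below computes an impact factor from citation and citable-item totals; the counts are invented purely for illustration:

```python
def journal_impact_factor(citations_to_prev_two_years: int,
                          citable_items_prev_two_years: int) -> float:
    """JIF for year Y: citations received in Y to items the journal
    published in Y-1 and Y-2, divided by the number of citable items
    it published in Y-1 and Y-2."""
    return citations_to_prev_two_years / citable_items_prev_two_years

# Invented example: 300 citations in 2020 to the 100 citable items
# published in 2018-2019 gives a JIF of 3.0.
print(journal_impact_factor(300, 100))  # → 3.0
```

Note what the ratio does not capture: it is a property of the whole journal, driven by a skewed citation distribution in which a few highly cited articles can dominate, which is precisely why it is a poor proxy for the quality of any individual article.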
These metrics have become powerful drivers of demand for more and more publication outlets from academic researchers. The consequence has been a bonanza for publishers, resulting in more than 3 million scientific papers per year and over 42,000 scientific journals [13]. The proportions of journals produced by different sub-sectors of the academic publishing system in 2015 were [14]:

- Commercial publishers (including on behalf of learned societies): 64%
- Learned society publishers: 30%
- University presses: 4%
- Others: 2%

The consequence of the increased demand for publishing outlets, and of the attractiveness and rewards to authors of publishing in high-impact journals, has been two-fold. Firstly, it has allowed commercial publishers that own high-impact journals to charge high prices, commonly generating profits in excess of 20-30% [15], and to increase prices annually at more than twice the rate of inflation. At the other extreme, the level of demand and the profitability of responding to it have encouraged new entrants into the commercial publishing market to create so-called "predatory journals" [116]. Such journals offer academics (who need publications for the reasons outlined above) very rapid online publication, at little cost to the publishers, as there is little, if any, peer review, and authors must format papers themselves (Box 1).
The practices of academic scientists and universities in their relationships with commercial publishers' copyrights are at least questionable. When copyright is given to a commercial publisher, the researcher has not only relinquished control of their intellectual output but has also, of their own volition, been an agent of the privatisation of a public good, at no cost to the recipient. This process has, over recent decades, privatised a great deal of the record of science and placed it behind a paywall. Any researcher is currently not only free to do this, but is encouraged to submit their results to high-impact journals, which often have restrictive and costly paywalls.
These practices are costly to the public purse, and they lock vital data behind paywalls, to the extent that rich data sources with the potential to help solve many complex problems are beyond the effective reach of modern methods of information and data harvesting [13]. This problem was implicitly recognized by major corporate publishers during the COVID-19 pandemic when, under pressure from the science community [14], they released temporary access to their relevant data and publication holdings so that researchers working to understand and respond to the new virus could quickly access the relevant part of the scientific record. Such a stance should be permanent and general.
Box 1: Publishing standards and predatory journals*

High standards of independent peer review and editing are important norms for rigorous scientific work. There are many efforts to define and index the regional and international journals that adhere to such high standards, so that both the scientific community and the wider public are aware of them. They include the Directory of Open Access Journals [6], African Journals Online [7], Latindex [8], PubMed for biomedical literature [15], and Google Scholar [16]. Some indices of journal quality are based on impact factors, which are not necessarily measures of quality or rigour. This issue is also of importance given the rise of so-called "predatory publishers", who operate to exploit the lucrative market created as academics, in particular, are incentivised to find publishing outlets. A diverse group of researchers has recently agreed a definition of predatory publishing [17]: "Predatory journals and publishers are entities that prioritize self-interest at the expense of scholarship and are characterised by false and misleading information, deviation from best editorial and publication practices, a lack of transparency, and/or the use of aggressive and indiscriminate solicitation practices". There is evidence that the processes of digitalisation and open access publishing are being deliberately exploited by predatory publishers to enhance their market penetration and extend their baleful influence [122].

4.2 The business model
The digital revolution has driven down costs in almost all areas where it has had impact, with the exception of scientific publishing where prices have risen, year on year, at a rate greater than that of inflation. The business model for much scientific publishing is one of unique asymmetry. It undermines normal supply/demand relationships that work towards efficient allocation of resources, by confusing who supplies and who demands.
The trump card in the hands of publishers is the journal 'impact factor' or other indices of journal status, which persuade researchers and their institutions (and sometimes national research and evaluation systems) that it is worthwhile paying a premium for publication in a journal with a high impact factor or status. The lure of the impact factor permits publishers with high-JIF journals to charge what they believe the market will bear, a form of vanity publishing. Without the JIF there would be no reason to pay a premium. The impact factor has been roundly criticized by the 2013 San Francisco Declaration on Research Assessment (DORA) [18], with the trenchant general injunction:

Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist's contributions, or in hiring, promotion, or funding decisions.
DORA asserts the need to improve research assessment by using more robust measures that focus on the primary values of insight, impact, reliability and re-usability, rather than on questionable proxies. DORA is addressed to funders, institutions, publishers, the creators of these metrics and researchers, and by early 2020 had been endorsed by 1,954 organizations and 15,943 individuals worldwide. It argues that the practice of using impact factors as an index of scientific excellence creates biases and inaccuracies when appraising scientific research, and that they should not be used as a substitute [19]. In using metrics, it is the quality of scientific outputs that needs to be recognised, not a flawed proxy of journal status. The latter serves to reinforce the brand name, and thus the market power, of the journal rather than the real value of published research [117]. Nonetheless, journal impact factors retain their stranglehold because scientists and their institutions desire to target this proxy measure of excellence, irrespective of how good a proxy it is, and notwithstanding any pressure it may exert on scientists to "overcook" their results to ensure publication in high-impact journals [20]. Breaking this habit would do much to reduce the cost of publication and improve the assessment of research and researchers.
The scientific community provides its work freely, or at its own cost, to publishers; gives up copyright to publishers; staffs publishers' editorial committees; provides peer reviews freely; and then buys back its published work at inflated cost, while in most cases being legally barred from interrogating, through text and data mining, the very published record of science of which it is the source [115]. The public investment that has gone into the production of research results, and the publicly funded work done by researchers in staffing editorial boards and reviewing, are forgotten as a public resource is privatised at little cost to the publishers and their private investors. All of the largest commercial publishers are now based in Europe or North America. Their unique profitability, sustained by annual price increases far in excess of inflation [21], has continued even as the formerly costly print-intensive role of publishers in typesetting, formatting and distribution has disappeared. The historical payment model for scientific journals was based on subscriptions from readers, or from university and other libraries that acquired them for the benefit of their readers. Libraries, however, pose a problem for publishers, as a single hard-copy journal or single digital copy may be read by many, so publishers charge a special fee to cover such usage. As commercial publishers have increased their journal holdings, they have moved from access charges for single journals to large "bundled" deals (Box 2) with libraries, universities or national agencies acting on their behalf. For example, Germany recently agreed to pay the publisher Wiley €26 million a year for three years to publish 9,500 open access articles annually, at roughly €2,750 per article [22]. The transition from print to digital access has also meant that for many journals, access to back copies ceases when a subscription ceases, whereas subscribers to print copies retained access to back copies.
In such cases, the digital price is for a loan, not ownership.

4.3 Open Access Publishing
There have been many protests about costly publishing models, limited public access to the record of science and the surrendering by authors of copyright in their work. These have converged with a developing view that the record of science should be a public good, freely and perennially available to citizens, other societal stakeholders and researchers (principle I). This has led to strong calls for open access to scientific publications, for example in the Berlin Declaration of 2003 [25]: "Open access contributions must satisfy two conditions. The author(s) and right holder(s) of such contributions grant(s) to all users a free, irrevocable, worldwide, right of access to, and a license (Box 3) to copy, use, distribute, transmit and display the work publicly and to make and distribute derivative works, in any digital medium for any responsible purpose, subject to proper attribution of authorship (community standards will continue to provide the mechanism for enforcement of proper attribution and responsible use of the published work, as they do now), as well as the right to make small numbers of printed copies for their personal use."
Box 2: Bundled deals from major commercial publishers Large commercial publishers sell bundled online subscriptions to their entire list of academic journals at prices significantly lower than the sum of the individual prices. The average listed price of for-profit journals has been found to be four times that of non-profit journals when controlled for age, number of citations, number of articles, language and discipline [23]. Bundle prices are negotiated institution by institution, and publishers endeavour to keep them confidential, with many contracts including "non-disclosure" clauses. Commercial publishers are able to act as monopolists, as much of the work that is in high demand by readers is unique. A monopolist's ability to price according to buyers' willingness to pay is normally limited by the ability of those offered high prices to buy instead from others who buy at low prices, a process that stalls if no price information is available. Electronic "site licenses" that only allow access from IP addresses at the purchaser's location also prevent such trading. Where information about pricing is available, prices show no systematic variation, but appear to be the result of local haggling [24]. An efficient market is one where there is equality of information. The commercial publishers' unwillingness to reveal their pricing produces a market that is highly lucrative for the seller but highly inefficient for the buyer.
or in the 2002 Declaration of the Budapest Open Access Initiative [26]: The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.
Many universities have adopted institutional processes that conform to these principles (e.g. [27]), and there has been a partial disruption of the earlier dominance of subscription journals, resulting in the development of "open access" journals that conform to a greater or lesser extent to the criteria of the Berlin Declaration. They have progressively increased in market share and are now estimated to make up about 47% of all scientific journals [28]. Such developments have also been associated with novel ways of using the internet to create a more open publishing environment. For example, it has been conventional for authors to select journals before submitting articles, which, after peer review, editors may select for publication, thereby delaying dissemination of new ideas and adding transaction costs for researchers. These delays have led some to promote a new approach of "publish first, curate second" rather than "curate first, publish second", using online tools for rapid publication followed by post-publication feedback from peers. Several platforms in the natural sciences take this approach through pre-print servers, such as arXiv in physics and bioRxiv in biology, whilst post-prints are archived on institutional servers for Green Open Access. It is an approach that could optimize publishing practices for the digital age, emphasizing transparency, peer-mediated improvement and post-publication appraisal of scientific articles [29].
According to the Directory of Open Access Journals (DOAJ [30]), 70% of open access journals do not charge Article Publishing Charges (APCs). Many commercial publishers also offer open access options, transferring the costs of production through an APC to authors, or to the authors' sponsors, rather than to readers [126]. However, in order to maintain profit levels, many journals set APCs at rates that can still be prohibitively expensive for many [31,123]. This is particularly true for high-impact journals, where APCs are pitched as high as the market will bear, thus inhibiting access by scientists from low- and middle-income countries. Changing existing journals from subscriptions to APCs is unlikely to resolve many of the problems of the current system, and may even entrench commercial power [118].

Box 3: Creative Commons Licences [32]
CC BY Permits non-authors to distribute, remix, adapt, and build upon the original work, even commercially, as long as the original is credited. It is the preferred route for maximum dissemination and use of licensed materials.
CC BY-SA (SHAREALIKE) Permits non-authors to remix, adapt, and build upon the original, even for commercial purposes, provided that the original is credited and the new creation is licensed under identical terms. All new works based on the original will carry the same licence and allow commercial use. This licence is used by Wikipedia, and is recommended for material that would benefit from incorporating content from Wikipedia and similarly licensed projects.
CC BY-ND (NODERIVS) Permits non-authors to reuse for any purpose, including commercially; but the work cannot be shared with others in adapted form, and credit must be given to the original.
Commercial publishers have been reluctant to expose their cost structures in ways that make it possible to assess what a reasonable APC might be. In principle, costs will vary strongly according to the discipline and the level of service provided by the publisher [124]. The following slightly edited text was communicated by Professor Johan Rooryck 8 : it would not be practicable to seek arbitrary limits to APCs, as real costs can vary greatly, as discussed above. In 2018 cOAlition S [34], a group of 22 international organizations, European national research agencies and foundations, launched a scheme called Plan S that will require, by January 2021, all their grantees' work to be published in open access journals, or in non-fully open access journals (or even pure subscription journals) provided that they deposit the version of record (VoR) or the author's accepted manuscript (AAM) in an open access repository without embargo. It has been announced that Plan S rules will not stipulate APC levels, but will rather require publishers to be transparent by providing a breakdown of their prices according to the costs of services such as proofreading, copy editing and the organization of peer review [35].
Plan S has raised concerns about costs, quality (where publishers may be incentivised to publish more papers), and participation (researchers without funding may have difficulties in publishing). Moreover, some fear that Plan S could perpetuate and strengthen the positions of those who already dominate the market and limit competitiveness by preventing or discouraging innovation and the emergence of new players and new models, including non-commercial open access models [36]. Despite Plan S, it is feared that the publication system may remain essentially controlled by commercial service providers for whom profitability remains the central motivation.
Notwithstanding these concerns, there are many not-for-profit, scholar-led initiatives to publish open access journals, and university libraries that organize online editorial processes for their open access journals and journal collections using the Open Journal Systems (OJS-PKP) platform [37]. There are a growing number of cooperative and collaborative initiatives between libraries and academic presses to manage journal collections, such as the Open Library of Humanities [38]. Another approach [42] involves universities, as long-lived and sustainable institutions, in collecting, preserving and providing access to their research outputs in institutional repositories, whilst research communities continue to undertake certification and quality control using traditional peer review. In 2016 COAR launched the Next Generation Repositories initiative, which aims to position research institutions and their repositories as the foundation of a distributed, globally networked infrastructure for scholarly communication, on top of which layers of value-added services are to be offered.
Pubfair [43] is a framework for open access publishing, which enriches a variety of research outputs (including preprints, data and software), managed by repositories or other data providers, with additional services that support quality control, dissemination and discovery. Its objective is to provide publishing services that enable sharing of a wide array of research outputs, support trusted evaluation and assessment processes, and empower research communities, funders, institutions, and scholarly societies to create novel dissemination channels. It allows researchers to move more seamlessly from data collection, storage and analysis to publication, quality assurance and dissemination.
Regional portals such as Redalyc [9] and SciELO [10] have become particularly important for the global south, where knowledge production has had low visibility in traditional indexing services (figure 1). Disciplinary repositories are also present in the global open access landscape, as in the case of the UN information system for agriculture, AGRIS [44], which offers open access to 3 million full texts. At a regional level, in the social sciences, the Latin American Council of Social Sciences (CLACSO) has developed a repository which, together with Redalyc, offers 110,000 open access documents, with more than two million downloads a month [45]. There is a concern, however, that Latin America's longstanding open access ecosystem, with no charge to read and no charge to publish, could be undermined by proposals from the global north [46].

4.4 Learned society publishing
Until the 1960s and 1970s, learned societies were the principal publishers of scientific articles, and although they have been progressively displaced from their earlier dominant position, their journals remain significant contributors to published scientific output. Their publishing models span the spectrum: subscription journals, fully open access journals, and intermediary "hybrid" journals, which are subscription-based but offer an open access option on payment of an article processing charge (APC). Many learned society journals are co-published: they retain the badge and ethos of the society whilst being managed day to day by a commercial partner, are often incorporated within the partner's bundled deals, and benefit from a proportion of sales, though with cost structures that are difficult to disentangle. The ethos of learned society journals has been eloquently expressed: "often society publishers have a small number of very prestigious journals - so a small output of high-quality articles that have gone through exacting and high-quality editorial and production services. There is no scale to the system, the costs are high (for the right reasons) and the publishing output is low. It is a source of great pride to societies that we run the "best" and most reputable journals in our field and it is not a coincidence that we do - we are closer to our communities than other publishers (or we should be). So there is both a business and an emotional connection to society publications for our communities" [48].
The challenge is to maintain this ethos and to manage the transition to open access. The Societies and Open Access Research (SOAR) project [127] has identified more than 1,000 societies publishing journals, and several routes to open access are open to them:  APC-funded open access, which works well in titles where authors are well funded and support such payments.  Open platforms, pioneered by F1000, first adopted by funders and now being embraced by publishers. In this approach authors publish their articles, which are then openly peer reviewed; articles that are judged to be important and impactful can be specially curated and showcased.

4.5 Books and monographs
In many disciplines, particularly in the humanities and social sciences, book-length publications or monographs remain important. However, these long-form publications are often costly to produce and challenging to digitize. Moreover, publishing a print book carries prestige, and authors may be disinclined to choose digital-only publication. Nonetheless, sales of monographs are declining [50], and publishers are currently rethinking their models.

5. Publishing the data of science
Acquiring data that opens insights into reality is a highly creative act. It is often at least as important as the publication that reports the insight. It is a first-class scientific output [60]. It should be published as an essential part of the record of science.
Most, but not all, of the data of science are acquired in the course of scientific inquiry, through experiments, observations or surveys. In many fields, however, such as the social sciences and humanities, not all the data that are used are collected for scientific purposes: they may be derived from government statistical surveys, health systems, commercial activities, social media platforms or other public or private sources. If they are used for scientific purposes, their methods of collection, sampling representativeness and the ethical standards for their access and use need to be rigorously assessed, at which point they become part of the corpus of scientific data.

5.1 Binary publication: concept and data
The association of experimental or observational data with the truth claim in a published article for which they provide the evidence is an essential element of empirically based science. Until recent decades, there were relatively few areas of science where the volume of data was so large that it could not be included as part of the publication to which it belonged, in numerical or diagrammatic form. The digital revolution has so expanded the volumes, fluxes and disciplinary diversity of the data available to and used by researchers that there are now many instances where the data cannot readily be contained within the confines of even a digital-only article. This mostly concerns the natural, medical and engineering sciences, but increasingly also some sectors of the social sciences and humanities (for example, large digital databases of languages, and discourses drawn from online sources such as Twitter, social messaging and Instagram). That fact does not change the fundamental reason why the article and the related data must be associated: scientific rigour. However, the difficulty of doing so with increasingly large and complex datasets, or a desire to withhold data for whatever reason (see Box 5), has created a situation in which data and metadata are not routinely available alongside a published truth claim. Such an omission undermines the capacity to subject the logic of the claim to scrutiny, and makes it impossible to test the replicability or even the honesty of the claim (Box 4). It is an omission that must and can be corrected. Publishing the data is as important as, and sometimes more important than, publishing the written text, and the no-paywall approach should apply to both.

Box 4: Binary publication of article and related data, an example
The Nature journal Scientific Data [61] requires that authors deposit their data in a recommended data repository as part of the manuscript submission process; manuscripts will not otherwise be sent for review. The datasets must be made available to editors and referees at the time of submission, and must be shared with the scientific community as a condition of publication. The publisher does not host data, but asks authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible, or to generalist repositories if no suitable community resource is available. If data have not been deposited in a repository prior to manuscript submission, authors can upload their data to figshare [62] or the Dryad Digital Repository [63] during the submission process. Data may also be deposited in these resources temporarily if the main host repository does not support confidential peer review. The ultimate repositories must meet the journal's requirements for data access, preservation and stability. The journal provides a "date-stamped archive of our recommended repository list", which is available for use under the CC BY licence. Recommended repositories and standards that are indexed by FAIRsharing [64] can also be viewed and filtered via the Scientific Data FAIRsharing collection.
This is a fundamental issue. It is vital to develop a formal "binary publication" system for cases where data and accompanying metadata cannot be accommodated within a publication. In such cases authors should be required to lodge the data with a trusted repository (Box 4), such that the data are effectively published with a persistent digital object identifier (DOI), which works in a fashion analogous to the DOI for a published journal article. One part of the binary publication would be the published paper, monograph or book that uses the data. Its twin would be the concurrently published data in a "trusted" data repository, in the publisher's data repository or in a specialist data journal (ref) that manages data deposition, with the data related to a publication accessible via a reference in the paper. The ideal, of course, is that the publication and the data are both digitally available and that the two inter-operate. Such a system should increasingly be required by publishers as a condition of publication (see principle 3.IV).
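The article-data pairing described here can be pictured as a simple linked metadata record. The sketch below is purely illustrative: the DOIs, field names and the `BinaryPublication` structure are hypothetical rather than any registry's actual schema, although the reciprocal relation labels echo DataCite-style relation types. It shows how an article DOI and a dataset DOI could reference one another, so that either twin can be found from the other.

```python
import re
from dataclasses import dataclass

# Minimal syntactic check for a DOI: the prefix "10.", a registrant code,
# a slash, then a suffix chosen by the publisher or repository.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def is_valid_doi(doi: str) -> bool:
    """Return True if the string is syntactically a DOI."""
    return bool(DOI_PATTERN.match(doi))

@dataclass
class BinaryPublication:
    """One logical publication with two twins: the article and its data."""
    article_doi: str   # DOI of the paper, monograph or book
    dataset_doi: str   # DOI minted by the trusted data repository
    repository: str    # where the data twin is held

    def cross_references(self) -> dict:
        """Reciprocal links, so that each twin resolves to the other."""
        if not (is_valid_doi(self.article_doi) and is_valid_doi(self.dataset_doi)):
            raise ValueError("both twins need a syntactically valid DOI")
        return {
            self.article_doi: {"IsSupplementedBy": self.dataset_doi},
            self.dataset_doi: {"IsSupplementTo": self.article_doi},
        }

# Hypothetical example: an article and its concurrently published dataset.
pub = BinaryPublication(
    article_doi="10.1234/example-article-2020",
    dataset_doi="10.5678/example-dataset-2020",
    repository="(a discipline-specific trusted repository)",
)
links = pub.cross_references()
```

The essential design point is the reciprocity: a reader of the paper can follow the link to the data, and a re-user of the data can trace it back to the truth claim it supports.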
There are cases in which open data publication is not appropriate, for example where access would prejudice privacy, safety or security, or has potential for harmful dual use. Nonetheless, the scientific case for scrutiny by referees and bona fide researchers remains. In such cases it is important that: a) the data are retained somewhere; b) there are pathways of access for referees and bona fide researchers; and c) the data conform to FAIR criteria (see below). An example of strongly safeguarded access is the "safe haven", where data are maintained on a password-protected independent network, access is controlled by a steering committee and closely monitored, and any subsequent publication must not reveal sensitive data [67].
Where the data of a study have been acquired from third parties, as in economics and other areas of the social sciences, or in medicine, researchers may be required to sign confidentiality agreements about data use, or may only have access to aggregated data. In such cases it may not be possible to host the data separately from the source, or to verify the integrity of the data or of an 'aggregation process' [65]. These issues complicate data publication and re-use, and careful thought is needed about the research norms that should apply in such cases.
The scientific arguments for publishing even data that are not logically required by a specific publication have been powerfully made over the last two decades in a series of influential reports [66,67,68]. There are pragmatic benefits to this approach. Firstly, subsequent users may find value in the data that the originator has not seen 9 . Secondly, unless the habit and the means are developed of making scientific data openly and routinely available and interoperable, the opportunity will be lost to collate and integrate data from a variety of disciplinary sources in order to investigate complex systems to which individual datasets contribute only partial perspectives. Having access only to the data that an individual or group has collected would severely restrict such possibilities. The ISC therefore endorses the ringing challenge to the scientific community made by the International Union of Crystallography [69]:

We urge the worldwide community of scientists, whether publicly or privately funded, always to have the starting goal to divulge fully all data collected or generated in experiments.
There should, of course, be limitations on openness. It has been argued that openness should be the default position for data [67], with exceptions to be argued on criteria of safety, security [70] and privacy [71].
If datasets are to be shared, re-used or opened to scrutiny by reviewers, they need to be "intelligently open" [67]. The criteria that enable this have been formalised as FAIR: Findable, Accessible, Interoperable, Reusable [72], as follows:
 Findable: Data are assigned globally unique persistent identifiers, described with rich metadata, and registered or indexed in a searchable resource.
 Accessible: Data are retrievable by their identifier using a standard communication protocol, with metadata remaining accessible even when the data are no longer available.
 Interoperable: Data are described and curated using a formal, accessible, shared and broadly applicable language for knowledge representation, and vocabularies that themselves follow FAIR principles.
 Reusable: Data and metadata are richly described with accurate and relevant attributes, released with a clear and accessible data usage licence, associated with detailed provenance, and meet domain-relevant community standards.
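As a concrete illustration, the four criteria can be read as a checklist against a dataset's metadata record. The sketch below is a deliberately simplified, hypothetical rendering: the field names are ours for illustration, not a formal FAIR metadata schema, and a real assessment would be far more nuanced. It simply flags which of the F-A-I-R conditions a record appears to satisfy.

```python
# Hypothetical metadata record for a deposited dataset; the field names
# are illustrative, not a formal FAIR schema.
record = {
    "identifier": "10.5678/example-dataset-2020",  # globally unique, persistent
    "indexed_in": ["(a searchable disciplinary registry)"],
    "access_protocol": "https",                    # standard, open protocol
    "vocabulary": "(a shared community vocabulary)",
    "licence": "CC BY 4.0",
    "provenance": "collected 2019; processing steps documented",
}

def fair_checklist(rec: dict) -> dict:
    """Map each FAIR criterion to whether the record appears to meet it."""
    return {
        "Findable": bool(rec.get("identifier")) and bool(rec.get("indexed_in")),
        "Accessible": rec.get("access_protocol") in {"https", "ftp"},
        "Interoperable": bool(rec.get("vocabulary")),
        "Reusable": bool(rec.get("licence")) and bool(rec.get("provenance")),
    }

report = fair_checklist(record)
```

A record that lacks, say, a licence or provenance fails the Reusable test even if it is perfectly findable, which is the practical force of treating the four criteria as jointly necessary.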
These are onerous requirements, but necessary if researchers and societal collaborators are to be able to share and reuse data and to combine diverse data series. They are vital if science is to be able to combine data in ways that have the potential to reveal deep structure in the many inherently complex, interdisciplinary problems that it is called on to confront.
Some research cultures have long used such principles, some are beginning to implement part or all of them, and others remain reluctant to follow this path. Particle physicists, for example, tend to share data within the consortia attached to particular experiments. Some areas of the social sciences, particularly those concerned with longitudinal data, have a long history of data repositories with a great deal of re-use [73]. In the disciplines that have successfully implemented FAIR principles, such data have become an essential part of the research infrastructure, widely used by the community in its daily work. Examples include the ESFRI [74] research infrastructures: DARIAH [75] for the arts and humanities, CESSDA [132] for the social sciences, and CLARIN [76], a language resource for the humanities and social sciences.
The nature and pace of change in the availability and utility of data to scientists have by-passed many earlier modes of data use. Some of the problems that need to be addressed are summarised in Box 5.

5.2 Data and peer review
Peer review has important roles in identifying unwarranted claims, fallacious interpretations or lack of originality in work submitted for publication. It is not however an infallible process.
Many, possibly most, examples of major error, retraction or fraud in published papers have been associated with non-submission of relevant data, submission of only those data consistent with an author's thesis, lack of necessary metadata, or invention of data. These are further reasons why it should be mandatory that data and the necessary metadata be submitted concurrently with the article to which they refer, and why data publication and concept publication should be regarded as at least equivalent in status. Charles Darwin made the case: "False facts are highly injurious to the progress of science, for they often endure long; but false views, if supported by some evidence, do little harm, for everyone takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed and the road to truth is often at the same time opened" [83].
The failure of peer review to identify inadequacies in data and their treatment has had, and continues to have, serious consequences. In 1998 a paper submitted to the Lancet [84] claimed to show that the measles, mumps and rubella (MMR) vaccine predisposes children to autistic behaviour. Despite the small sample size (n=12), the uncontrolled design and the speculative nature of the conclusions [85], the paper received wide publicity, and MMR vaccination rates began to drop because parents were concerned about the risk of autism after vaccination. Whilst retraction has deleted the paper from the scientific record, it has not removed it from humanity's wider store of misinformation, and its echoes continue to undermine public health.
In 2020 the Lancet retracted a paper that appeared to show that the drug hydroxychloroquine, then undergoing trials as a treatment for COVID-19, could lead to increased deaths. As a consequence, the trials ceased [133]. The paper was based on flawed data. Its publication nonetheless stopped well-founded trials, and was even 'weaponised' by a senior political leader against scientific advice more broadly [86]. These cases suggest two important issues for debate:  Should data technologies with the capacity to explore the validity of data series be routinely applied to data-intensive work submitted for publication?  Should rapid data-review teams of relevant experts be mobilised at times of crisis, when scientific knowledge is essential, to assess the rigour of published and unpublished data relevant to the issues at stake?

Box 5: Problematic issues for data publication
The great increase in the volumes of data used to support published claims has posed a series of problems that remain to be resolved. Linking a published text with the underpinning data is essential, but there is as yet little general agreement about the principles to be adopted in doing so [77,78]. Journal editors and referees have a vital role in ensuring good practice for the data-publication link. The editor-in-chief of Molecular Brain recently commented [79] that amongst 180 manuscripts submitted since early 2017, 41 needed the authors to provide raw data. Of these, 21 were withdrawn, indicating that requiring raw data drove away more than half of the manuscripts, and 19 of the remaining 20 were rejected because of insufficient raw data. Thus 97% of the 41 manuscripts did not present the raw data supporting their results when requested by an editor, suggesting the possibility that, at least in some of these cases, the raw data never existed. The large number of retractions of papers in high-quality journals threatens to undermine trust in science [80]. There are methods to identify fraud [81]. Should their application by journals be mandatory, and if so, who should pay: authors, repositories or journals? Machine learning algorithms are increasingly used in the analysis of data supporting papers submitted for publication. What should be disclosed in a formal publication when machine learning is used? How should algorithms be reported? Should it be mandatory to make code public, or at least available to referees? Disciplines, professional societies and the scientific unions have important roles to play here, since discipline practices and culture matter, both positively and negatively [82].
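One family of "data technologies" that could be routinely applied is statistical screening of submitted data series. A classic, coarse example is a first-digit (Benford's law) check, long used as a screening heuristic on naturally occurring numerical data that span several orders of magnitude. The sketch below is a minimal illustration under that assumption; the example series are invented, and a real editorial workflow would rely on far more sophisticated tests [81].

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """Leading non-zero digit of a number's absolute value."""
    s = f"{abs(x):.10e}"          # scientific notation, e.g. '3.1415000000e+02'
    return int(s[0])

def benford_deviation(values) -> float:
    """Mean absolute deviation between observed first-digit frequencies
    and those predicted by Benford's law, log10(1 + 1/d)."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    n = sum(counts.values())
    dev = 0.0
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)
        observed = counts.get(d, 0) / n
        dev += abs(observed - expected)
    return dev / 9

# A geometric series spans many orders of magnitude and follows Benford
# closely; a flat, narrow series (all values near 500) does not.
benford_like = [1.07 ** k for k in range(1, 400)]
suspicious   = [500 + k for k in range(100)]
```

A large deviation does not prove fabrication, but it is cheap to compute and could flag a submitted dataset for closer human scrutiny, which is the spirit of the first question above.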

6. Science publishing in a changing world
Scientific practice and the norms of scientific publishing inevitably adapt to the environment in which they operate and to the technologies that they use. The "digital revolution" has had a profound impact on science, partly through its technologies and partly through its impact on the social environment of science. The launch of the World Wide Web thirty years ago ushered in a new world of ubiquitous information, accessible as readers and as authors to all with devices to access it. It has stimulated the creation of new knowledge communities, been a powerful means of disseminating scientific communications, whether peer reviewed or not, and democratised information by by-passing the traditional media and the scientific or governmental gatekeepers that previously filtered it. It has also created a digital divide [87], with approximately 3.5 billion people globally lacking the means of access to what has become a fundamental medium of human communication.
A milestone was passed at the turn of the millennium, when the amount of data preserved in digital form first exceeded that in analogue form. In 2003, the first sequencing of the human genome was announced. It had taken 10 years and cost $4 billion. Today, as a digitally-controlled process, it can take less than a day and cost less than $1,000. The relatively low cost and flexibility of digital processes of data acquisition, storage and transmission have since stimulated a continuously growing digital explosion.
These changes have had powerful implications for science and scientific publishing:
- The replacement of analogue by digital means of preparing articles for publication has dramatically reduced the costs of preparation and dissemination, and shifted the balance of effort from publishers to authors.
- The large and diverse data volumes now available to researchers create problems in observing what has been an historical scientific norm: making the evidence (the data) for a truth claim available for the scrutiny of peers.
- Although the World Wide Web permits ubiquitous dissemination of scientific knowledge, access to scientific articles normally requires payment.
- The global circulation of scientific knowledge has overcome barriers of geography and created the basis of a truly global scientific community, although some of the inhibitors to the effective operation of that community are embedded in current publishing practices.

The digital impact on the research cycle
Publishing can be usefully represented as an indispensable part of a research cycle, as illustrated in figure 2. Its stages are:
- Formulating approaches for discovery, for problems that need solutions, or for an intuitive hypothesis that can be tested by observation or experiment.
- Developing a work plan and seeking funding where necessary.
- Undertaking the research programme and formulating a thesis.
- Submitting the thesis for peer review as part of the publication process.
- Publishing the thesis, including supporting data.
- Open critique by peers.
- Re-formulating a further stage of research.
Research is a challenging enterprise because of the organised scepticism at key points of the research cycle that is designed to identify error: failure to obtain funding, to create a coherent thesis, to survive peer and editorial review, and to survive post-publication peer critique. A thesis embodying a truth claim is the traditional form of published output, but there are now increasing calls, reiterated in section 5, for any data produced during the research that cannot be contained in the published paper to be concurrently published as well. In the era of paper-based technologies, the staging points in the cycle tended to be relatively discrete and well-defined, with, in many cases, publication being a self-contained end point. Documents relating to other parts of the cycle, including the sharing and exchange of data, methods, software, pre-prints, discussion papers, funding proposals, details of collaborations etc., all tended to be lost. The advent of pervasive digital technologies has changed that. All elements of the cycle are connectable, with the possibility of digital interoperability across the cycle. The ease, flexibility and connectivity of digital production compared with its print antecedent create the potential for access to much richer strands of work, thought and creativity. It is a potential that is increasingly exploited, reflecting the injunction of Principle VII in section 3 above. A crucial question therefore is whether the current modes of publication are barriers to the exploitation of these potentials.

Linked digital infrastructures for the research cycle
The journal paper, the monograph and the book have tended to be the end points of much research, and the principal means by which scientific understanding has been widely communicated. However, the process of research is more wide-ranging than the publications that are embedded within it (fig. 2), and much additional value could be released from the many products that the research cycle creates and which are rarely made accessible, except in archives, where the metadata required to maximise their utility is rarely available. There are many strands of current activity within the science community that could presage such a development. Linked digital infrastructures provide information about the research process. They not only support the process but also produce publishable outputs that are of value in managing research and its assessment by researchers, universities and funders, as shown in figure 3, including such matters as data on funding, and productivity by discipline and institution. They are data about science, in contrast to the data of science discussed in section 5. If these were accessible, others could access, connect, analyse and re-organize "research information" to better verify, understand, analyse, use and apply it. High levels of digital interoperability now make it possible to link patterns of publication by discipline, by geography, by citation and through time. These elements of the digital infrastructure are routinely collected and collated by publishers through the submissions made to them and the articles that they publish (see fig. 3). This data is of great value to researchers as a means of tracking publication in their discipline, to universities in managing their research effort and their researchers, and to national bodies in assessing patterns of production and productivity.
In principle, this is a system in which formal publication in a journal, monograph or book no longer needs to be separately costed, but is simply one of the outputs of the research process, albeit, together with data, the most important. Its cost should be borne, as are the other costs of doing research, by the researcher's institution or funder. If well organized, there is no reason why such digital support mechanisms should not be extended to non-funded research through open licensing.

[Figure 3: The science publishing system, linking researchers, universities and funders with public education, science, innovation and policy.]

Reinventing the practice of science publishing
The potential of the World Wide Web as a vehicle for scientific publishing is profoundly underutilised. It is largely used as a means of finding and accessing the content of scientific papers, either through individual browser searches or by interrogating the catalogues of publishers to which an individual researcher may have access rights. A more creative use would be to embed scientific records in the Web as structured Web objects [120]. In such an approach, scientific papers could be linked to structured descriptions of experiments, software, data and workflows. It could make the value that is currently hidden in the research cycle accessible to further analysis. It is an approach illustrated in figure 4. It would enable AI systems to organise scientific processes and re-run experiments, re-analyse results and explore hypotheses in systematic and unbiased ways. It could unlock part of the promise of open science and of reproducible research, and would enhance the potential for serendipitous discovery [e.g. 121].
The necessary tools are accessible and their use is relatively straightforward. The objective is to use the capacity of the Web for semantic linking [130] in the work of a group or an individual, thereby deepening the knowledge that they create, and potentially linking it to the work of others and opening wider horizons of discovery. If the work involved is routinely undertaken as part of the research cycle rather than being left to the end, it need not be an onerous task, and indeed may help to adjust the trajectory of research into the most profitable and creative avenues. There are several fundamental steps: determining the taxonomy that connects research materials as part of the individual's or group's approach to research; identifying the digital objects to be recorded; linking them by allocating URLs in a suitable access portal, which can be managed by the originator or institution; accessing them through a common protocol that applies to other online identifiers (e.g. DOIs, ORCiDs and InChI); and using Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.) to provide environments where applications can query the data and draw inferences. The conventional "published" products of research (papers and data) would be parts of such a linked system, together with the record of evolving versions and external critiques, and with the potential that subsequent "papers" are related to their antecedents and indeed are absorbed into a cumulative corpus of research.
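The steps above can be sketched in miniature. The following is an illustrative, hedged example, not a production Semantic Web system: it uses plain Python (subject, predicate, object) triples in the style of RDF, rather than a full triple store and SPARQL engine, and every identifier in it (the DOIs, ORCID, repository URL, workflow and software names) is a hypothetical example invented for illustration.

```python
# A minimal sketch of semantic linking across the research cycle.
# Each statement is an RDF-style (subject, predicate, object) triple.
# All identifiers below are hypothetical examples.
triples = {
    ("doi:10.9999/paper-42", "cites", "doi:10.9999/paper-7"),
    ("doi:10.9999/paper-42", "hasAuthor", "orcid:0000-0000-0000-0001"),
    ("doi:10.9999/paper-42", "hasDataset", "https://repo.example.org/dataset-a"),
    ("https://repo.example.org/dataset-a", "producedBy", "workflow:align-v2"),
    ("workflow:align-v2", "usesSoftware", "sw:aligner-1.3"),
}

def objects(subject, predicate):
    """Query: all objects linked to a subject by a given predicate."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

def provenance(subject):
    """Follow dataset -> workflow -> software links to trace how a result was made."""
    chain = [subject]
    for pred in ("hasDataset", "producedBy", "usesSoftware"):
        nxt = objects(chain[-1], pred)
        if not nxt:
            break
        chain.append(sorted(nxt)[0])
    return chain

print(provenance("doi:10.9999/paper-42"))
```

A real deployment would use RDF serialisations and a SPARQL endpoint rather than Python sets, but the design point is the same: once paper, data, workflow and software carry stable identifiers and typed links, provenance queries of this kind become mechanical.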
Such a system would be responsive to a wide range of distinctive disciplinary or personal needs and approaches. There is no reason why current open access publishers should not provide for such a system, but it would point to a different direction of development, in both ownership and function, from that in which major commercial publishers are driving their journal-based systems. There is an underlying principle that should be observed in future development of systems for publishing the output of science: they should be built on open-source rather than proprietary systems. It is this principle that has made the World Wide Web such a powerful, democratising agent of communication.

Monetizing the research cycle
Section 6.1 and figure 2 locate publishing activities as part of a spectrum of research rather than as a separate and discrete process. Some of the major commercial publishers (Springer-Nature, Elsevier, Wiley) recognize this reality and are now extending their business models beyond simple content provision in journals, monographs and textbooks to the other infrastructures of the research cycle [119]. Some (e.g. Elsevier, Pearson and Cengage) increasingly see themselves as data companies, and some IT companies are moving into the research data field. Such companies increasingly provide not only research support tools such as bibliographies and research activity syntheses, but also research assessment systems, productivity tools, and online learning analytics and management systems that are derived from the data acquired from their publishing activities, and which are critical to the business of research institutions and those that fund them [88]. This is not a replacement for the already lucrative journal and book publishing business, but an addition to it. The developing business model seeks to monetise the whole of the research cycle, its management and assessment.
Data analytics are seen by some companies as the lucrative new frontier. Recent negotiations between Dutch universities and Elsevier have revealed the trade-off the publisher is prepared to make in exchange for an extensive pilot programme on metadata [125]. Elsevier's strategy has been analysed [89] as being based on four key priorities:
a) To protect its core journal business, and minimise the impact of the Open Access movement on that business, by increasing market share at the high-impact end of the publishing spectrum, thereby retaining the capacity to charge high APCs. The importance of high-impact journals in the journal business model is exemplified by the demonstration that, in disciplines that have been analysed, between 15% and 20% of journals are responsible for 75% of usage [90].
b) To improve the productivity of its journal business through the use of the data that it acquires relating to citation and readership and exploits through its various databases (Scopus, ScienceDirect, Mendeley, SSRN and Bepress). The more such data Elsevier collects, the better it is able to enhance its competitive position through the analysis of research and publication patterns, the quality and reach of collaborative networks, and the identification of researchers likely to become future leaders in their fields, who might then be offered editorial board positions ahead of other publishers.
c) To sell products to universities, funding bodies and governments to assess the productivity of specific research areas and provide metrics to assess careers. In a 2015 investor presentation, Elsevier explicitly indicated its intent to increasingly serve university administrations, funding bodies and governments with tools aimed at estimating and improving the productivity of research and optimizing funding decisions.
d) To sell research insights to the business and investment communities.
Elsevier, like other companies that have required scientists to relinquish copyright as a condition of publication, legally owns a treasure trove of insight, and increasingly of data. NASDAQ believes that about 30% of its market capitalization is derived from the appropriation of academic research capital [89]. Partnering with venture capitalists to exploit this treasure trove could prove to be the most lucrative of its options. The strategies and processes described above are well advanced, such that it is timely for all stakeholders to consider whether to accommodate to them or to work for a different future. We return to this issue in section 10.2.

Implications for the governance of digital infrastructures
There are important roles for the private sector in providing services to public sector research, but the above analysis highlights problematic issues:
a) The emerging commercial strategies do not address the issue of open access to the record of science and the data and evidence it contains (principle I), nor do they address principle II, of avoiding unaffordable costs, particularly in low- and middle-income countries, and the resulting pattern of unequal access within the global scientific community.
b) Is the scientific community content that many essential aspects of the research cycle are increasingly in the hands of commercial companies that have a primary responsibility to their shareholders rather than to science?
c) Are universities and scientists content that the business management metrics that are essential to the management of their research function, or of their responsibilities for their students, may increasingly be in the hands of commercial companies? Are they concerned about the potential exploitation by commercial companies of publicly funded findings that they and their staff have created, through cross-cutting educational ventures and commercial exploitation of the record of science? Is the reality and further potential of commercial companies determining metrics for staff performance and owning vast troves of data about their staff and students a matter of concern to them?
d) Are funders and governments content about activities that represent a massive, cost-free acquisition of a publicly funded resource and, arguably, an inhibition of its potential to contribute to the public good?
e) And finally, should the governance of these systems, which are so essential to the future of science, be in the hands of private companies, or should it be located within the scientific community? The Scholarly Publishing and Academic Resources Coalition (SPARC) has made its view clear.
"The need for academic institutions to act to retain control of infrastructure, data and data analytics is here to stay. It is critical for academic leaders to acknowledge that data and its uses play central roles in the operations and the future of their institutions, and take control of how it is managed as a strategic asset" [91].

Challenges and opportunities
The contemporary challenges faced by science and the opportunities for scientific discovery all influence the working of the research cycle and should be major determinants of priorities for the evolution and design of publication systems, whilst observing the principles in section 3. Emerging business models for publishing discussed in section 7 should be primarily driven by scientific need, not commercial opportunity. We observe the following trends of demand, opportunity and scientific dynamic as essential context for developments in publishing:
a) In recent decades, science has become more deeply embedded as intellectual infrastructure for modern society. It lies at the heart of the largest shared priority for intergovernmental action, the Sustainable Development Goals (SDGs), where demands for its contributions are many and various, but all depend upon international scientific engagement as the reality of an interconnected world and the need for collective action become more apparent. Equitable access to the record of science, both as authors and readers, is a vital imperative.
b) Stephen Hawking predicted, in the year 2000, that 'the next century (the 21st) will be the century of complexity' [92]. The rich and varied data streams now being created, and the accumulated knowledge in the published record of science, coupled with machine learning technologies, enable unprecedented insights into complex systems. The science of complexity is the science of the real world, of the SDGs, of the real universe, of human endeavours, societies and languages, of cities, of demographics, infectious disease, biomolecular medicine and black holes. Realizing its potential requires open and affordable access to the record of science and its data and evidence resources. Digital systems also offer opportunities to communicate and use knowledge in novel and effective ways. They have paved the way for multi-modal systems that can incorporate audio-visual elements and text, for example in video journals that also incorporate peer review.
c) The World Wide Web has created a new global information and communication environment that both enhances the reach of science and gives a platform to lobby groups that seek to undermine scientific consensus on critical issues such as climate change, vaccination, smoking and AIDS, and to complicate the scientific response to emergencies such as the COVID-19 pandemic. Public trust in science and its processes is critical.

Open Science
Openness is not new to science. It has been at the core of scientific enquiry since the publication of the first scientific journals in the 17th century. But the trends summarised above have stimulated new horizons of openness that are embedded in the modern Open Science movement [93], for which open access to the record of science is a fundamental pillar. The movement has built on the opportunities created by the digital revolution and on ongoing changes in the habits of scientific enquiry. These changes have been described as a progressive shift from a system characterised by the hegemony of disciplinary science, with its internal hierarchies driven by the autonomy of scientists and their host institutions (mode 1), to a developing paradigm of knowledge production which is socially distributed, responsive to societal needs, trans-disciplinary and subject to multiple accountabilities (mode 2) [94,95].
There are a variety of views about the rationale and objectives of open science. Some see greater sharing of data and information as a means of increasing the efficiency of scientific inquiry. Some see benefits to interdisciplinary science in having open access to the record of science and to a wide variety of data streams. Some see access to and integration of diverse, multi-dimensional data streams as a means of analysing inherently complex problems. And some see open science as a democratising process, in which openness is socially contextualised [96].
The ISC takes a broad view that encompasses all these motivations, resting on several fundamental pillars: open access to the record of science, to the data and evidence of science, and to the process of science (figure 2). We also insist upon a fourth pillar, of openness to society, in a two-way process of dialogue, in which science engages more deeply with business, policymakers, governments, communities and citizens, and between global north and south, as knowledge partners, in ways that increase effectiveness and socio-political legitimacy. Without fundamental and equitable improvements in access to the record of science, its data and evidence, these objectives are likely to remain beyond our grasp. The extent to which modern publication mechanisms and processes enable or impede the development of open science is a crucial issue.

A critique from the Global South
Not all are proponents of open science, or of the positive view of it expressed above. There is a developing critique from the Global South, particularly from Africa, that the assumptions and processes of open science and open access publishing, as they have developed, serve to reassert neo-colonial values of the kind originally framed in the work of Frantz Fanon [97]. For many schooled in the confident setting of western science, "the claim that open access may be a neo-colonial process seems incomprehensible" [98]; after all, is not science universal? The latter argument must be carefully nuanced. The laws of physics may be universal, but social customs and characteristics of population health are not. Equally, there are epistemological diversities that reflect differing histories and values and that lead to differing priorities and approaches. And there is bias in the record of science as represented by indexes such as Web of Science and Scopus, which, as noted in section 4, are dominated by outputs from the major commercial publishers, all located in Europe or North America, and largely represent science in these regions [99]. The perspective that most scientific advances have been made in the global north, and that northern priorities are global priorities, can lead to the exclusion of, and contempt for, knowledge and priorities in other regions [100]. Such a view implies that African science needs to develop so that it looks more and more like that of the North. It is argued that "these [Northern] partners inevitably guide the problems and the methodological and epistemological choices of African researchers towards the only model they know and value, the one born at the centre of the world-system of science - without questioning whether this model is relevant to Africa and its challenges" [98].
A global science community has become a greater reality in recent years, but it will not have come of age until it replaces a unipolar perspective with an inclusive universalism, open to a wider ecology of knowledge and capable of building an authentic global knowledge commons [101,102]. It is hoped that the development of the African Open Science Platform [114] will not only stimulate increased open content from within Africa, but also, crucially, provide the means to access that content, in addition to bringing science closer to society, promoting fair and sustainable development [100] and creating a more powerful African voice in global science.

Other contrary voices
It is also important to note arguments against open science in the north, which tend to be either conservative or radical [103]. The conservative critique defends the right of the individual against the collective. This argument was trenchantly stated in an editorial published in the New England Journal of Medicine [104], which described the 'emergence of a new class of research parasites' and commented that some of these parasites might seek to examine whether the original study was correct, a complaint that conflicts directly with a fundamental principle of scientific rigour (section 2a). A similar sentiment was expressed some years ago by a Microsoft executive, who referred to open-source computer programmes as a "cancer", although the company has since joined the movement to liberate the world's data through its "Open Data Campaign".
The radical critique [105] argues that the release of vast troves of data, papers or research results, although potentially beneficial to science as an enterprise, simply exacerbates the trend towards the increasing marketization and corporatization of science, which disproportionately benefits large corporations. It is argued [105] that open science opens the door to the capture of publicly funded research value by commercial platforms; to yet more "metrics" of productivity to "incentivize" scholars to work harder; and to a focus on the system-wide progress of science that ignores costs and benefits to individuals, whether scientists or non-scientists. This argument anticipated some of the changes that have emerged and are continuing to emerge (sections 6 and 7). It must be taken seriously and debated within the science community, in particular in relation to issues of governance (section 10.2).

Motivations, incentives and metrics
Issues of concern in sections 5-7 include immediate matters such as open access to the record of science and APCs, and longer-term concerns about the trajectory of evolution of business models for the research cycle, its infrastructures, outputs and governance. Understanding motivations, incentives and metrics amongst key stakeholders is vital in addressing both short-and longer-term issues and to the performance of the publishing system.
At the heart of the system are the scientists who undertake research and produce new knowledge that they seek to communicate through publication. There is little doubt that a primary motivation comes from the urge for discovery and the desire to communicate and promote use of that knowledge for the general good. But this naturally tends to be coupled with incentives for career advancement, which are set by their employers, and which are not necessarily compatible with the goal of discovery. Researchers are both suppliers of goods to the science publication industry and customers for its metrics.
The performance metrics set by universities for their staff are derived in turn from the funders of research, and tend to be metrics of publication history and associated citations as evidence of the productivity of the university. They are used by funders to judge the extent to which published science relates to their priorities, as a basis for judgement of potential in bids for further funding and, in some cases, by national funders as a partial basis for setting levels of core funding to their universities. Universities use those metrics as incentives for staff, as a means of managing staff performance in bringing in research grants, and because of their role in burnishing the reputation of universities, particularly through the use of such metrics in university rankings [106]. Metrics cascade through the system. Although university rankings have been heavily criticised on methodological and conceptual grounds [107], their use as proxy indicators of excellence in university publicity, as a driver of reputation and, in some cases, of funding continues to grow. This domain of data about science is an important tool for the strategic management of research at all levels, and a growing area of academic and commercial concern, as described in section 7.
Such proxy metrics are, however, highly problematic. They suffer from the consequences of 'Goodhart's law' [108,113] that 'when a measure becomes a target, it ceases to be a good measure', primarily because they can be and are 'gamed'. Exactly that has happened, suggesting the need for new approaches, but with the warning that unless carefully conceived, they too are likely to become inappropriate targets. Alternative approaches for evaluating contributions to science throughout the research cycle are essential, as many current metrics are barriers to development. Creative approaches are under way in many different settings worldwide [e.g. 109] and gaining prominence as a way to more fully describe the contribution of researchers to science. It may be timely to reevaluate the extent to which incentives are required at all [131], given the pathologies that current systems can generate in bogus research and predatory journals. We should contemplate the possibility that doing science is its own reward.
The motivations of the large commercial publishers are primarily to maintain high levels of financial return to their shareholders. The current science publishing environment is in a state of flux in which commercial publishers strive to maintain existing revenue levels whilst exploring new, profitable activities. For the larger publishers, this includes increasing efforts to control and monetise all stages of the research cycle, its assessments and indexing.

Conclusions: publishing in the service of science
Our overall conclusion is that the current system of publishing, which forms the bedrock of the public record of science, is not optimal and needs radical revision. Such a revision would be timely, as the structure and operation of the science publishing system is undergoing rapid evolution, partly in adapting to the digital revolution, partly because of dissatisfaction with pre-existing publishing norms, and partly because of the imminent potential for the digital infrastructures of the research cycle to be monetised for private profit rather than governed for the good of science and society.
We therefore set out a series of possible priorities for change in the light of the proposed seven fundamental principles (section 3) and the discussion of the changes needed to implement them. As requested in the covering letter, members are particularly asked to reflect and comment on the following recommendations, and to consider how they would rank their importance, as a basis for discussion later this year and for planning actions we might collectively seek to take to implement agreed priorities.

Possible recommendations
I - Open Access to the Record of Science
a) Access should be free to all; subscriptions should cease.
b) The costs of published outputs (for both conventional publication and the deposition of data in trusted repositories), and the costs of data management and archiving for re-use, should all be transparent and absorbed as a cost of doing research, as an integral part of the research cycle rather than as discrete add-ons.
c) Publishers should open their current and archived holdings for text and data mining as a powerful tool of scientific discovery.
d) A set of business models is required for the above priorities that will avoid excessive profits and the development of monopolies, stimulate beneficial innovation, and be adaptable to varied circumstances.
e) Global dialogue and action are required to ensure equitable global access to the record of science for both authors and readers.
f) Current indexing systems discriminate against the global south. Moves should be made to federate such systems as a global resource for science.
g) Future initiatives for publishing the output of scientific research should be based on open-source rather than proprietary software systems.

II -Open licences
When submitting articles for publication, researchers should not surrender or gift copyright to publishers, but should instead adopt an open licence. An appropriate Creative Commons licence, preferably CC-BY, should be the norm (Box 3).
III - Peer Review
a) Peer review remains fundamental to quality control in scientific and scholarly publishing. Scientific publications should adhere to agreed standards of peer review.
b) It is essential to maintain the possibility of rigorous post-publication peer review by ensuring that the evidence or data on which published concepts are based are contained within the publication or, if the data are too voluminous, that they are concurrently published in a trustworthy repository. Publishers should not accept work for publication that does not meet this latter requirement in the form of IVb, below.
c) It is timely to consider the routine application of data technologies able to interrogate large data series for integrity as part of the peer review process.
d) Times of crisis, such as the COVID-19 pandemic, tend to give rise to an urgent need for new scientific knowledge. At such times, it is essential that systems of rapid-response peer review are available to avoid misleading or even dangerous interventions that have not been independently assessed. Pre-prints may be an essential part of the process.
e) The increasing use of networked online systems for publication processes provides new opportunities for pre- and post-publication reviews, and for publishing different elements of research, and should be encouraged.
f) Commercial publishers are already developing systems to support pre-publication review, and it is crucial that the scientific community is involved in considering how these systems can be most helpful and how unintended consequences can be avoided.
IV - Publishing Data a) The deposition of data in a trustworthy and accessible repository should come to be regarded as a normal and necessary part of the scientific and scholarly publishing process, whether or not the data are integral to a published article. b) A binary approach should apply where data are integral to a published truth claim but, for whatever reason, cannot be included within the publication: such data should be concurrently deposited in a trusted, open access data repository, with an interoperable link from paper to data and vice versa. c) Deposited data and metadata should as far as possible adhere to the FAIR standards. These standards may require details of computer code and even machine characteristics, depending upon the character of the work. d) Publicly funded scientists and scholars should regard themselves not as "owners" of the data that they create, but as custodians of the data on behalf of the public interest, and should make their data openly available in trusted repositories. e) Openness should be the default position for data, but in some cases issues of safety, security or privacy may override the default. In such cases, access may be permitted for bona fide researchers and regulated through controlled access to data in "safe havens".
V - Maintaining the Record of Science Scientific publication now occurs in a plethora of novel forms. As most libraries no longer hold physical stock, but tend instead to manage access to online resources, there is a potential danger that parts of the record of science will be lost. There is a case for an international virtual library dedicated to the preservation of the record of science, without a sunset clause.
VI - Interoperability a) Interoperability between journal papers, monographs and books and the references that they contain is an important priority. There should be movement to a system in which all publications are online, their underpinning data are online, and the two interoperate. b) Global multidisciplinary research, particularly into complexities such as those of the SDGs, would benefit greatly from interoperation between publications and data from different locations and disciplines. c) The diverse vocabularies and ontologies of science and scholarship are frequently incompatible. This is a major inhibitor of the use of data from different disciplines to address complex, multidisciplinary problems. The ISC has made this a major decadal priority, to be led by its Committee on Data (CODATA). It is important, however, to recognise the potential to lose nuance, complexity and meaning in translating data and evidence from one field to another, which underlines the crucial importance of sensitive metadata.
VII - Adaptability and Evolution a) Principles I-VII are offered as enduring principles, although the ways in which they are observed may change as technologies and modes of working evolve. It is important to avoid being locked into inflexible systems that inhibit such evolution. The contemporary system is, in much of its operation, such an inhibitor. A purpose of this document is to seek ways of moving beyond it while avoiding a different, but potentially equally inhibiting, system. b) The technologies of the digital revolution enable fundamental evolution of the infrastructures of scientific and scholarly publication, with the potential to benefit the research process. For example, embedding scientific records in the Web as Web objects, not merely as publications but together with many other linked attributes of the research process, could profoundly influence scientific creativity and could realise the promise of open science. However, the governance of such processes will be crucial in determining whether the principal benefits are delivered to the research process or to private investors, and to research communities worldwide or to the most privileged research institutes and countries. The issue is immediate and urgent.

Enabling factors
There are a number of important issues that will determine take-up of many of the above recommendations; they will be essential parts of the discussions planned for late 2020. They include:

- Responsibilities: The principles that are advocated and the actions that are proposed place significant responsibilities on researchers, on their institutions, on scientific unions, associations and academies, on funders and on publishers.
- Incentives: Section 9 reviews the incentives in science systems that strongly influence the inter-related actions of all the principal players. A deeper analysis is required of the extent to which this cascading system of incentives enables or inhibits the changes argued for in this document.
- Governance: This issue is posed very directly by the changes that are already in train amongst commercial publishers, as discussed in section 7.

A further basis for action: an economic analysis
An economic study is currently being prepared that assesses current economic models for scientific publishing, evaluates their advantages and disadvantages in relation to the fundamental principles advocated in this report, and proposes a range of models that would be compatible with these principles. The study will be available in early August, and will either be circulated as a separate paper or be incorporated in a revised draft of this paper.