Publishing open-access bibliographical data on Ancient Greek and Latin texts: challenges, constraints, progression

We present here both some of our thoughts on methodology in relation to the specific constraints that complexify the ways of structuring and accessing bibliographical data in the Sciences of Antiquity, and the solutions adopted by the IPhiS-CIRIS project for dealing with these constraints. The project began in 2014 in a general scientific environment that was still being standardised and structured, with digital bibliographical resources in this disciplinary field becoming increasingly numerous, although of uneven quality and hard to access and/or private.


INTRODUCTION
The publication of bibliographical data on Classical and Late Antiquity 2 already has a considerable history: it dates back to the early twentieth century, with the publication in 1927 of the two volumes of an important retrospective bibliography entitled Dix Années de Bibliographie Classique (1914-1924) [Ten years of Classical bibliography, 1914-1924], by Jules Marouzeau 3. In keeping with the evolving needs specific to each discipline with regard to Antiquity, a number of specialised bibliographies have flourished with the intention of providing a comprehensive list of works published in their respective fields. Some of these bibliographies have since made use of electronic media for their circulation while continuing to maintain the paper version as their reference; others have been digital from the outset. Each therefore has its own classification scheme and indexing system, and sometimes even its own canonical authority forms. Since most of these resources are proprietary, it is not their natural vocation to be mutually compatible: at best, certain practices tend towards a standardisation of formats and classification schemes. The work required in terms of referencing the data to make it interoperable has not yet been completed, although numerous initiatives demonstrate a desire to use common repositories. The IPhiS-CIRIS project ('Information Philologique - Savoirs Antiques' [philological information - ancient knowledge], Centre Jean Pépin, CNRS-ENS) is participating more particularly in defining a set of reference data of names of authors and titles of ancient works, to be fed into a database of editions of ancient texts. When drawing up these files we come up against a number of difficulties: concurrent forms for the same entry, states of certain works that are missing from the existing repositories, duplicates, homonymic forms of titles or authors, dubious attributions of works to a particular author, controversial attributions to various authors, etc.
Deciding on the forms of these tables and the links between them created a number of technical difficulties from the outset; these are characteristic of Antiquity and have not yet been fully dealt with in the computer modelling.

I STATE OF PLAY

1.1 A blossoming of initiatives
There is today a large quantity of bibliographical resources in the field of Classical and Late Antiquity available, partly or fully, on the Internet. Mention may be made of L'Année Philologique, published annually, and since 2017 by Brepols after having for a long time been distributed by Les Belles-Lettres: not only is this bibliography the only one to attempt to cover the whole group of disciplines encompassing Classical and Late Antiquity, it is also the oldest and largest bibliography listing studies on these authors and editions of texts. A fee is charged for access. The other resources are specialised bibliographies in the form of regularly updated databases. There are many examples 4, but we may mention here:
• Base d'Information Bibliographique en Patristique (free access) 5;
• Bibliographie papyrologique en ligne (free access) 6;
• L'Année épigraphique (paywall) 7;
• Syri.ac: An annotated bibliography of Syriac resources online (free access) 8;
• Répertoire des sources philosophiques antiques (free access) 9;
• Droits Antiques (free access) 10.
New formats of shared bibliographical resources are springing up all the time, using software specifically designed for organising bibliographies. It is now possible, for example, to share collections and libraries in Zotero 11. In addition, we are also seeing the appearance of digital libraries of ancient editions and manuscripts: each library has its own criteria for managing its collections of manuscripts and ancient editions, developing tools in-house that not only allow consultation but also, more often than not, prevent or restrict the downloading of data.
Mention should also be made of the massive but disorganised digitalisation work, of very uneven quality, carried out by projects such as Google Books / Google Scholar, archive.org and HathiTrust (with restrictions on access from outside the US), and of the development of academic networks (such as academia.edu) for following up topics and researchers. These are new approaches to the publication and circulation of bibliographical data that, whatever we think of them, correspond to new uses or new expectations in academic circles. There is thus a host of initiatives in this field, but we are still at an experimental stage in terms of digitalising and structuring bibliographical data. There are a number of problems: firstly the large number of proprietary resources, and secondly the fact that the good-quality open resources are still not well known outside digital humanities circles. The demand for open data in fact involves producing data that meets not only the required technical standards but also scientific demands, while always seeking to deliver data that is accessible to a wide public, whether or not its members are technologically informed. It is therefore necessary to bind together the political and scientific issue of access with the technical issues. This demands that we do not compromise on scientific quality, ergonomics, ease of use, or the good practices that ensure the continued existence of the tools, and it is sometimes difficult to hold this line.

The inexistent common portal for bibliography on Antiquity
In an ideal world, we might imagine that all these databases could be compiled into one vast bibliography of Antiquity, but so far there is very little interoperability among them. The problem is firstly one of quality, because of the absence of convergence between the strictly bibliographical structuring systems, such as the Dublin Core, which is very widely used to describe standardised bibliographical data, and the systems that describe ancient sources, which have specific and sometimes contradictory characteristics. True interoperability among bibliographical databases would require a standardisation not only of the choices applied in describing the bibliographical resources themselves (articles, monographs, editions) but also of their material sources (manuscripts, papyri, inscriptions) and their textual sources (standardisation of titles and authors). It is important to underline the effort made by the Biblissima project 12 to work on the interoperability of a large variety of databases concerning the history of books and libraries in the Middle Ages and the Renaissance, including research tools for ancient authors and texts. We shall see below more specifically why true interoperability remains on a distant horizon with regard to the Sciences of Antiquity, and how we, for our part, have broached the issue.

The problem is also one of quantity. Recent years have seen an inflation in both the bibliographical mass and the number of initiatives to make use of it, an immediate consequence of the bibliometric evaluation systems set up by the academic institutions. This burgeoning of scientific literature, even though it is a sign of the vitality of the research work being carried out, eventually wilts when faced with the defective referencing system that is in use. There is no longer any single bibliographical undertaking capable of coping with the entire quantity of scientific production 13. We should also add that recent years have seen a renewal of scientific approaches in the field of Antiquity, with the emergence of areas of study devoted to cultures other than those of the Greco-Roman world in its strictest sense, and this in turn has led to numerous studies on the transmission and exchange of textual, literary, technical and philosophico-religious productions. However, the bibliographical tools that include these aspects are not yet particularly developed, and researchers are often faced with a fragmentation of resources, with access to each specialised bibliography creating yet another obstacle that needs to be overcome.

"Back to the source"
Rather than bemoaning the fact that we are not able to embrace the galaxy of scientific publications related to Classical Antiquity, we have, with the IPhiS-CIRIS project, opted for developing a tool in phase with these new methods, in terms of both content and ergonomics 14. The first stage therefore involved refining this abundance of material in order to concentrate our efforts on a homogeneous part of the publications. We chose to concentrate first on the editions of texts and then on those studies whose prime object is to identify authors and establish texts and their tradition. In doing so we introduced a new methodology: we chose to consider the text as the starting point for the architecture of our bibliographical database. However, there is nothing self-evident about identifying and referencing texts: there is no fixed form of titles 15 in Antiquity; many texts, in fragmentary form, have come down to us without any title; others have been reworked, and the various versions need to be differentiated; others are no longer attributed with any certainty to a particular author, or were so attributed for a time before scientific progress challenged their attribution; and so on. Our aim, on a small yet ambitious scale, was therefore reformulated at the start of the IPhiS project in 2014 as being:
1) to supply as complete as possible a bibliography of the editions of Greek and Latin texts from the time printing began up to the present day, including the different states in which they are known (in their original language or in translation; in partial or complete form; treated as part of a larger set; etc.);
2) to supply this bibliography in a structured format that would allow the exchange of data with other bibliographical databases, whether specialised in any particular discipline or not;
3) to provide, as far as possible, access to the resource by indicating the link to an open-access digitalisation of the edition or, if it exists, to the manuscripts used as the basis for the edition, to make it easier for our users to find their way around the global landscape of digital resources concerning Antiquity 16.
The project has given rise to a working database, IPhiS, which is not open to the public, and to a web interface for published data once it has been validated by the CIRIS 17 team.

What constitutes an ancient text
Ancient texts, which form the core of our project, are above all shifting realities: between the initial form of a text published as the work of a given author and the form in which it has come down to us there are quantities of intermediate forms, which we group together under the reference title of the text 18. There are any number of possible causes for an early text being deformed into the version of it we know today: the text may have been reworked by its original author, so that we are in possession of several concurrent versions; the author of the text may be uncertain or under debate; the text may not have reached us in its original or complete form, either by choice (quoted by another author, for example) or by accident (a fragmentary, defective or badly copied manuscript), etc. However, all these forms refer to a single ideal conceptual philological object, which contains in it all the forms it has already been able to take and could still take in the future. The task of referencing these objects is ongoing, and no catalogue today can claim to be exhaustive or complete. Developments in the field of research also mean that this referencing cannot, by its very nature, be definitive, even if we seek to establish sustainable data: anyone could be caught out by an unexpected discovery that challenges an attribution or a title, or by the rediscovery of a text for which we thought we had reliable, constant information.

The problem of title
The question of the title of the ancient text is in itself a problem: in our efforts to draw up a list of titles we had to consider a number of realities that correspond to the expression 'title'. Some texts, such as plays and novels, really do have a title. Matters are less evident when we are dealing with scientific, philosophical or religious treatises, since they would be described as '(treatise) on a particular subject', sometimes with a sub-title. This is then an explanatory title, but its form is not fixed and it may evolve, in the copies that are made of it, by abbreviation or by extension. Other texts have no other title than that given by a subsequent publisher according to their genre ('oration', 'speech', 'homily', etc.): these texts are often passed down to us in a corpus, within which they are numbered; this order becomes 'canonical' until new manuscripts are discovered, challenging the order by adding new texts or removing others, and we are then faced with duplicate or triplicate numberings. Most of the poetry that has come down to us from Antiquity is only known in anthologies and collections within which the poems are numbered, but the same poem, since it is included in several collections, is given several identifiers: we then choose to consider the compilation as the reference text, and the poem as a constituent part rather than as a text in its own right. Lastly, an incalculable number of texts have come down to us with no title at all, in isolation, sometimes only as a fragment: if their content is deemed important, they are given a title that quickly becomes the reference title. However, some of these titles are no more than designations of the accident as a result of which the text has come down to us: for example, what is designated as the 'Palatine Anthology' is a collection of poems which has come down to us via a manuscript found in the Palatine Library in Heidelberg 19. We have sometimes chosen to propose a new, more meaningful title for a text which has a relatively meaningless title 20.

It is thus this multiplicity of situations that the field "title" covers. In our database we have made room for 'aliases' that retain the history of the titles when they vary from one publisher to another, or from one piece of manuscript evidence to another, or simply because we give Greek texts a working title in Latin even though we are fully aware that the original title is the Greek one. Since we always provide for each text a cross-reference to other repertories of texts, we often notice that these repertories do not use exactly the same title for the same text. While there is no reason to reject any of these variant titles, we tend to keep as reference the title given in the Thesaurus Linguae Graecae Canon 21 of texts or in the Library of Latin Texts 22 for classical texts, and the title given in the Clavis Clavium 23 for Christian texts.
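The handling of reference titles, aliases and cross-references described above can be sketched as a small data structure. This is a minimal illustration and not the IPhiS schema: the class names, the field names and the placeholder identifier are assumptions introduced here.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class TitleAlias:
    form: str    # the variant title as attested
    source: str  # where the variant comes from (publisher, manuscript, repertory)


@dataclass
class TextRecord:
    reference_title: str  # working title retained as the reference form
    aliases: list[TitleAlias] = field(default_factory=list)
    # repertory name -> identifier of the same text in that repertory
    cross_refs: dict[str, str] = field(default_factory=dict)

    def matches(self, query: str) -> bool:
        """True if the query equals the reference title or any alias (case-insensitive)."""
        wanted = query.casefold()
        forms = [self.reference_title] + [alias.form for alias in self.aliases]
        return any(form.casefold() == wanted for form in forms)


# Illustrative record; the cross-reference identifier is a placeholder, not real data.
anthology = TextRecord(
    reference_title="Anthologia Palatina",
    aliases=[TitleAlias(form="Palatine Anthology", source="modern usage")],
    cross_refs={"example-repertory": "PLACEHOLDER-ID"},
)
```

With such a structure, a search on any historical form of a title (for instance `anthology.matches("palatine anthology")`) leads back to the single reference record, while the alias list preserves the history of the title.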

Linking a text to an author
Another difficulty with referencing is that of creating a link between a text and an author. Two specific situations had to be taken into account in our bibliographical database: texts with no identified author, and texts whose presumed author is or has been disputed. In both cases, we must report the lack of precision or the ambiguity in our knowledge, without hiding anything.

Texts with no author
This first case appears simple, since it is possible to attribute a text with no author to an 'anonymous' author. However, the entry 'anonymous' in the table of authors would then refer to just one person whose name has been lost! Our approach has therefore been to use collective headings in the 'creator author' field, making it possible to classify our texts in categories without concealing the loss of the name of their author, and providing the possibility of finding the text by means of a themed search. The headings and number of these collective entries are left completely up to us, but there is a tendency in academic circles to adopt common practices, and so we have tried to harmonise our headings with those used in other databases, particularly the Pinakes database 24 developed by the Greek section at the IRHT (CNRS - UPR 841) 25. These collective headings thus function as "supra-authors", and it would have been perfectly logical for us to attribute to each text, whether its author was known or not, a collective heading in addition to its possible author, in the same way as one would use markers or labels when organising index cards. Initially, we decided to use these headings only for texts without a known author, that is to say as an alternative generic author name, and for certain large-scale groupings in mainly early editions. But we do not rule out modifying this parameter later and systematically assigning each text one or more of these headings to allow a thematic search in the database.
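The use of a collective heading as an alternative generic author name can be sketched as follows. This is only an illustration under stated assumptions: the function names and the heading and text labels are invented here and are not taken from IPhiS or Pinakes.

```python
from typing import Optional


def creator_field(author: Optional[str], collective_heading: str) -> str:
    """Return the known author, or the collective heading when no author is known."""
    return author if author else collective_heading


def search_by_heading(texts: list, heading: str) -> list:
    """Themed search: titles of all texts filed under a given collective heading."""
    return [t["title"] for t in texts if t["creator"] == heading]


# Hypothetical heading and titles, for illustration only.
HEADING = "Illustrative collective heading"
texts = [
    {"title": "Text 1", "creator": creator_field(None, HEADING)},
    {"title": "Text 2", "creator": creator_field("Author A", HEADING)},
    {"title": "Text 3", "creator": creator_field(None, HEADING)},
]
```

Here `search_by_heading(texts, HEADING)` returns only the texts whose author is unknown; assigning a heading to every text, as envisaged above, would amount to storing the heading in a separate field alongside the author.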

Disputed attribution
The other situation is when texts have come down to us under the name of an author that subsequently proves to be incorrect, either deliberately (for example, the many cases of religious authors whose work was banned but some of whose texts continued to be copied under the name of another, authorised author) or as the result of an error in transmission caused by homonymy or confusion. Modern publishers of ancient texts sometimes took up such problems of attribution a while ago, and in some cases they have come up with a solution. Some texts thus no longer circulate under the wrong author's name, and once the controversy has been deemed closed for several decades, we make no mention of it other than in a comment on the text. But much doubt still remains about many texts, and we felt it was necessary to report on this as a stage in academic research.
We therefore opted for creating the notion of 'disputed attribution', mentioning specifically the bibliography involved in the controversy. It is therefore possible, in our database, to list a text under several authors and to mention which authors are accepted or rejected by modern publishers, with the corresponding bibliography. Apart from the fact that this provides additional access to the bibliographical data concerned, our solution reflects the state of research and acknowledges the uncertainty that sometimes surrounds the texts that have come down to us from Antiquity, by presenting such texts to the user on both the page for the author (Figure 1) and the page for the corresponding text (Figure 2).
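The notion of 'disputed attribution' can be sketched as a link between a text and several authors, each link carrying a status and its own bibliography. This is a minimal sketch and not the IPhiS data model: the class names, the two status values and the placeholder authors and references are assumptions made for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Attribution:
    author: str
    status: Status
    bibliography: list[str] = field(default_factory=list)  # references on the controversy


@dataclass
class Text:
    title: str
    attributions: list[Attribution] = field(default_factory=list)


def author_page(texts: list[Text], author: str) -> list[tuple[str, str]]:
    """List (title, status) for every text linked to the author, whatever the status."""
    return [
        (text.title, attribution.status.value)
        for text in texts
        for attribution in text.attributions
        if attribution.author == author
    ]


# Placeholder data: no real attribution is implied.
corpus = [
    Text("Text X", [
        Attribution("Author A", Status.REJECTED, ["[Placeholder, 1900]"]),
        Attribution("Author B", Status.ACCEPTED, ["[Placeholder, 1950]"]),
    ]),
    Text("Text Y", [Attribution("Author A", Status.ACCEPTED)]),
]
```

Because the status lives on the link rather than on the text or the author, the same text appears on the pages of both authors, each time with the state of the debate made explicit.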

Geographical data
It is usual to designate certain Ancient authors by adding to their name that of a town or country to which they are attached by birth or as the place of their main activity, particularly in cases of homonymy. Wherever possible we have preferred to separate these two items of information, the one prosopographic and the other toponymic. However, this obviously supposes that the geographical data has been checked and cross-referenced against the historical geography repositories. Where this was known, we linked each author to one or more places, possibly specifying the event associated with the place (birth, education, activity, death). Here again we made it possible to weight the reliability of the historical data by adding an indication of whether the data is certain or disputed. This specific data is displayed in an extension of the CIRIS application in a cartographic format 26 that we felt was more pertinent. For the time being, only the geographical data of the philosophical authors are being published; ultimately we hope to be able to provide this data for all the Ancient authors in the database 27.
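The link between an author and one or more places, qualified by an event and by a reliability flag, can be sketched as below. The structure and names are assumptions made for illustration, not the CIRIS implementation; the sample values for Aristotle are commonly accepted biographical data used only as an example.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

EVENTS = {"birth", "education", "activity", "death"}


@dataclass(frozen=True)
class PlaceLink:
    author: str
    place: str
    event: Optional[str] = None  # one of EVENTS, or None when unspecified
    certain: bool = True         # False when the datum is disputed

    def __post_init__(self) -> None:
        if self.event is not None and self.event not in EVENTS:
            raise ValueError(f"unknown event: {self.event}")


def places_for(links: list[PlaceLink], author: str, only_certain: bool = False) -> list[str]:
    """Places linked to an author, optionally restricted to data flagged as certain."""
    return [
        link.place
        for link in links
        if link.author == author and (link.certain or not only_certain)
    ]


links = [
    PlaceLink("Aristotle", "Stagira", "birth"),
    PlaceLink("Aristotle", "Athens", "activity"),
    PlaceLink("Aristotle", "Chalcis", "death"),
    PlaceLink("Author A", "Place P", "birth", certain=False),  # placeholder disputed datum
]
```

Keeping the name and the place in separate fields, rather than in a single string such as "Author of Place", is what allows the toponym to be checked against a geographical repository and plotted on a map.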
26 https://ciris.huma-num.fr/cartographie.php?langue=fr; consulted on 23 April 2021. See also Fig. 12 supra. The specific publication of this geographical data has constituted an appendix to the project, which we present in the report [Giovacchini et al., 2020].
27 We have chosen to display our biographical data on a modern map, with contemporary place names. This choice is a pedagogical one, and is directly focused on the accessibility of our tool for non-specialist users, particularly high school students.

A bibliography in its environment
For the files of both titles and authors, we rely systematically whenever possible on repositories that have already been checked and found reliable, to which we direct users (Pinakes, VIAF, DataBNF, etc.), but where appropriate we also propose our own repository, with its own structuring. Once we had opted for acknowledging the rightful place of uncertain data on bibliographical and philological objects with rather complex contours, presenting a classification meant both respecting the standards without which it is impossible to share data and making decisions reaching well beyond straightforward indexing or alignment; it therefore supposed a degree of scientific risk-taking.

In the history of the IPhiS-CIRIS project, accessibility has played a structuring role. Initially, the project was constructed on the basis of an ambition to offer full open access, in reaction to the idiosyncratic and extremely constricting proprietary model of the Année Philologique 28. The notion of open access was thus directly correlated to the question of the structuring and display of the data. Opening up data is not simply the unequivocal gesture of lifting a barrier: rather it means thinking of the data, from the outset, as an element that may be shared or potentially improved or altered within a complex environment, what is often nowadays called a 'data ecosystem'. This is tantamount to saying that accessibility necessarily means being free of charge, but that this is not its only feature. The project therefore exploits a dimension of structuring that is in fact already historically consubstantial with the Classical sources. Since the time of the very first publishers, Greco-Latin literary texts, because of their often incomplete or fragmentary state and the existence of successive versions that form overlays rather than replacements, have possessed systems of unique identifiers of varying complexity.
Many non-classicists from academia and beyond still express surprise that classicists have been aggressively integrating computerized tools into their field for a generation. The study of Greco-Roman antiquity is, however, a data-intensive enterprise. Classicists have for thousands of years been developing lexica, encyclopedias, commentaries, critical editions, and other elements of scholarly infrastructure that are best suited to an electronic environment. Classicists have placed great emphasis on systematic knowledge management and engineering. The adoption of electronic methods thus reflects a very old impulse within the field of classics. The paper knowledge based on Greco-Roman antiquity is immense and well organized; classicists, for example, established standard, persistent citations schemes for most major authors, thus allowing us to convert nineteenth-century reference tools into useful electronic databases. Classicists are thus well prepared to exploit emerging digital systems. For many classicists, electronic media are interesting not (only) because they are new and exciting but because they allow us to pursue more effectively intellectual avenues than had been feasible with paper 29.
Since the time of the monuments of German erudition in the eighteenth and nineteenth centuries, Classicists have always considered manipulation of the corpora as constituting a consultation, far beyond cursive continuous reading. Consultation supposes transversality and navigation; the corpora are explored and refer back to each other in an inchoative method that can never end. This mode of appropriation is necessary inasmuch as ancient texts are not units that are closed in on themselves but rather open samples that are always subject to revision and rereading and, above all, that serve as sources for other texts. By its very nature, one Greco-Latin literary text refers to other texts in the present state of transmission: one text transmits other texts and is itself transmitted by others. (By 'source' we mean here the material source as well as the historical and literary source, both meanings mingling fairly imperceptibly in the case of texts of the Greco-Latin Antiquity period 30.)

The very first step towards accessibility thus consists, as we have seen, of modelling this network of inter-generating texts, and making the text as a source the pivotal point, or central table, of the database. The second stage supposes adding to this central table a sufficient number of controlled identifiers to allow unambiguous navigation both inside and outside the database. These two gestures suppose a distancing from the usual structuring of a bibliography: departing from the traditional documentary model, which constructs the reference around the modern edition and takes no account of fine granularity (fragment, extract, etc.); considering the modern edition as the culminating point of a long historical process, one state of the text among others, neither the latest nor the 'best'; and focusing attention on the source text, identified as a sort of bibliographical invariant in its ideal form, whose actual occurrences are so many versions. This involves no more and no less than applying the Lachmannian notion of archetype to the bibliography, albeit by shedding any naively realist posture 31. It is supposed that a text was produced in a given period by a given author on a given subject; if the text in its empirical form is no longer accessible today, it has produced a certain number of historical avatars which all have in common a desire to be reproductions or indirect representations of the text, which must then be taken seriously as the focus, raison d'être and true purpose of these avatars. This is truly a matter of accessibility, since it is the only modelling that actually meets the expectations of the users of a bibliographical database; these users are primarily readers, as they read and carry out research on texts, but they are also authors who produce texts themselves. This departure from the usual bibliographical frame is also an interesting way of converging towards other schemes that are less strictly disciplinary and more generalist, with a view to strengthening the navigability of the data.
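The idea of making the source text the central table, with controlled identifiers attached to it and editions treated as so many states of the text, can be sketched with a small in-memory relational schema. The table and column names, and all the sample rows, are assumptions made for illustration; they do not reproduce the IPhiS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Central table: the ideal source text, independent of any edition.
CREATE TABLE source_text (id INTEGER PRIMARY KEY, reference_title TEXT NOT NULL);
-- Controlled identifiers pointing to external repositories.
CREATE TABLE external_id (
    text_id INTEGER REFERENCES source_text(id),
    repository TEXT NOT NULL,
    identifier TEXT NOT NULL
);
CREATE TABLE edition (id INTEGER PRIMARY KEY, editor TEXT, year INTEGER);
-- One row per state of a text in an edition (complete, partial, translated...).
CREATE TABLE edition_state (
    edition_id INTEGER REFERENCES edition(id),
    text_id INTEGER REFERENCES source_text(id),
    state TEXT
);
""")

# Placeholder rows only.
conn.execute("INSERT INTO source_text VALUES (1, 'Placeholder text')")
conn.execute("INSERT INTO external_id VALUES (1, 'example-repository', 'PLACEHOLDER-ID')")
conn.executemany("INSERT INTO edition VALUES (?, ?, ?)",
                 [(1, "Editor A", 1550), (2, "Editor B", 1900)])
conn.executemany("INSERT INTO edition_state VALUES (?, ?, ?)",
                 [(2, 1, "complete, original language"), (1, 1, "partial, translation")])


def states_of(connection: sqlite3.Connection, text_id: int) -> list:
    """All known states of a source text, in chronological order of edition."""
    cursor = connection.execute(
        "SELECT e.year, e.editor, s.state "
        "FROM edition_state s JOIN edition e ON e.id = s.edition_id "
        "WHERE s.text_id = ? ORDER BY e.year",
        (text_id,),
    )
    return cursor.fetchall()
```

In this layout the query starts from the text and reaches the editions, rather than the reverse, and no edition is marked as the 'best' one: each is simply a dated state of the same source text.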

Semanticizing IPhiS
In 2019, the IPhiS database and its public web version (CIRIS) migrated from a local server on the Villejuif site to Huma-Num's Very Large Research Infrastructure (TGIR). The conditions for this migration were excellent, offering guarantees of permanence incomparably better than those allowed by the constraints of local hosting. The operation, although highly beneficial for the project, was dependent on one medium-term condition: it had to be possible for the Isidore 32 search engine to harvest the essential CIRIS data. Since Isidore functions according to the principles of the web of linked data, we had to 'translate' our traditional relational database in such a way as to make it compatible with a number of semantic constraints, which meant converting the main IPhiS data into searchable resources meeting the semantic standards accepted by Isidore 33.

30 This is a problem that justifies the title of our job at the CNRS: the analysis of sources, many examples of which are to be found in our team's research logbook; see for example [Capron, 2016c] and [Grignon, 2016].
31 An expression of this type of realism forms a common theme running through recent experiments carried out on citing Ancient sources; all the models proposed are based on historical breakdowns and undisputed nomenclatures that are deemed historical fact, whereas they are actually no more than reconstructions. Thus the most advanced tool available to date in terms of citation, the CapiTainS environment (presented for example in [Clérice, 2017]), proposes alignments that are technically extremely advanced, constructed on the basis of the CTS (Canonical Text Services) model, which in turn uses a text identification system that does not dispute the pertinence of the titles being aligned. Although in theory this system is indeed able to allow title variants to be taken into account by distinguishing works from versions of works, in fact it only does so marginally, as it excludes for example the possibility of fuzzy matches, variable divisions, and intertextual conversion.
32 https://isidore.science/; consulted on 23 April 2021.
33 There is nothing original in this process of semanticisation on our part. Indeed it is a process that has become banal: items of digital data are adapted to the constraints of the data web so that they will comply with the requirements of search engines, in a frame that is often institutional. We may note, for example, that the tool SKOS Play, which allows the conversion of data into RDF/SKOS from an Excel spreadsheet, was developed with funding from the Luxembourg State in order to produce public data in a semantic format (cf. [Francart, 2017]).

Isidore is a harvester and a tool of enrichment, but to be able to harvest and enrich it first needs to be able to recognise, identify and link. It is therefore necessary to propose data set out in a format it is able to interpret, and for the actual items of data to be described using metadata Isidore recognises. The Isidore documentation offers two options for preparing databases:
• Propose data using an XML flow of standardised metadata using the OAI-PMH protocol associated with metadata in Dublin Core format. This method is suitable for use with documentary databases, corpora, scientific archives and document/data libraries. For example, a tool such as Omeka offers OAI-PMH via a module 34.
• Propose data using an XML sitemap flow pointing to webpages containing RDFa metadata. This method is suitable for research programme websites presenting corpora of documents or data, scientific blogs (but not Hypotheses.org), and webpages in general 35.
We 36 selected the first option, specifically adapted to our case, and decided to do so by making use of another tool proposed by the Huma-Num grid of services: a Nakala 37 depository. This solution supplied us with an OAI-PMH depository generated automatically by our posting in Nakala, without having to incorporate the data directly in a database that was already substantial enough; it also enabled us to select more closely which sets of data were most pertinent for harvesting purposes. This option nevertheless raised other problems, and not all of them have as yet been completely resolved. The principle of a Nakala depository is relatively simple: it is an autonomous uploading area in which items of data are associated, either manually or using an API, with metadata standardised according to the grids of Dublin Core vocabulary, which is currently the most frequently used model for producing bibliographical metadata 38.

34 http://info.omeka.net/build-a-website/manage-plugins/oai-pmh-repository/; consulted on 23 April 2021.
35 https://documentation.huma-num.fr/isidore/; consulted on 23 April 2021.
36 The development and implementation work described in this section was carried out mainly by Bernard Weiss and Julie Giovacchini (ontology and mapping with the DCMI by Julie Giovacchini, scripts for automated supply to the Nakala API by Bernard Weiss); the tests were devised jointly by both these researchers.
37 https://www.nakala.fr/; consulted on 23 April 2021.

The first question that had to be asked was therefore whether the fields in the various IPhiS tables matched those of the Dublin Core vocabulary, with a view to being able to establish an equivalence between the two, this being a prerequisite for interoperability between IPhiS and Nakala. If we compare the IPhiS data model with the Dublin Core vocabulary, we can see straight away that there are considerable convergences, which is perfectly natural and to be expected of a bibliographical database. The most important of these convergences is the adoption in IPhiS of an important distinction, namely differentiating between the creator of a resource and a contributor to a resource. It is a particular feature of ancient texts that their transmission involves a number of participants who, although they are not all authors strictly speaking, have nevertheless made a not inconsiderable contribution to the elaboration of a given text up to its final state: these participants include publishers, printers, translators and commentators. The distinction between creator and contributor was from the outset an elegant solution in the data model for differentiating sufficiently clearly the various stages in the transmission of the text; we preferred this to the more traditional distinction between ancient and modern authors, which is based on a vague and relatively inoperative temporal difference, as we wish to distinguish not between periods but between interventions on the texts. Thus the IPhiS data model allows the entry of an 'ancient' author as a contributor to a text on a par with a humanist or contemporary publisher.

Adapting to Nakala
For the actual uploading in Nakala, we had to overcome a number of relatively substantial technical constraints, which obliged us to delimit very clearly both what we wanted to display and our choice of metadata to accompany the display.
In the first place, Nakala is mainly intended to be used for uploading data in the form of files attached to metadata, and the main working interface is a visual interface that only allows files to be uploaded individually, one after another. But our situation is very different: we want to post large sets of data in an automated way, by supplying the depository directly so that it updates itself at the same time as the database, which is designed to continue to receive data indefinitely. We therefore concentrated firstly on uploading the records of editions in the presentation proposed by CIRIS, as these constitute the most important and complete IPhiS data, leaving aside temporarily the direct display of the thesauruses of Ancient authors and texts. Our Nakala depository is envisaged as a place for displaying edition records which are considered as so many separate files, each comprising an image (PDF or screen capture of the page of the record in CIRIS) and a set of metadata (Figures 3 and 4) 39.
39 The screen captures were generated during the test stage of the procedure described in these pages, as the Nakala depository whose development we describe is not yet public.

Also, in order to automate the process and upload large quantities of data, we had to access Nakala via its API 40. Using a script, we retrieved the necessary files and metadata from IPhiS and then uploaded them through the API by means of a 'post' request. This is the only way the depository can be updated automatically. The script, first developed in PHP, is still experimental, but once it has been stabilised, the next stage of the work will consist of proposing an open version in PHP and Python that can be adapted for use in other projects similar to ours.

The Nakala API requires a particular formalism for the presentation of metadata, which obliged us to adapt our requests so as to convert the fields of the IPhiS database into metadata in JSON format, which is compatible with Nakala. This part of the work was relatively delicate and called for a number of arbitrations that were at times somewhat frustrating. For a very straightforward record, limited for example to very little metadata in addition to the compulsory metadata, we arrive at the following form:

Nakala relies mainly on two controlled vocabularies, Dublin Core and FOAF, and adds five compulsory metadata items (type, license, creator, created, title) in its own namespace. As indicated above, it is theoretically possible to translate all the metadata for our bibliographic records using qualified Dublin Core, including relationships between texts, external sources and references, by means of the notion of relationship. Our initial mapping was organised as follows:

In practice we nevertheless came up against certain consistency issues, connected with the different purposes of our database and of the Nakala environment, and we had to adapt this mapping in part.
Thus the compulsory metadata in Nakala includes the notions of creator and created, the latter referring to the date of creation. But these notions, in Nakala's internal logic as expressed in the API, apply not to the content of the data but to the data itself: the creator is the person who posted the data, and the date is the date of posting. If we maintain this idea in our display, the metadata then proves to be very poor, and harvesting is likely to prove of little pertinence, even though it is always possible to add further layers of metadata subsequently. For our type of bibliographical data, making it possible to search by the name of the person who uploaded an item of data merely creates unnecessary noise and is probably a nuisance for users of both Nakala and Isidore.
We therefore decided to circumvent this constraint and indicate as creator the creator of the intellectual content: as far as we are concerned, still an author, but an ancient author. However, in this case it is impossible to associate a creation date with this creator, because the date of an Ancient work cannot be expressed in the D/M/YYYY format required by Nakala's API. At the time of posting we therefore systematically need to assign a zero value to the compulsory created item of metadata. These constraints are offset by the possibility of incorporating in the metadata a clickable link leading directly to the record in the database. Thus, whether by visiting the depository directly or as a result of harvesting by Isidore, the user will always have rapid access to the full record in its original place of publication, as the last metadata item visible in Figure 5 shows.
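By way of illustration only, the choices described above (ancient author as creator, placeholder value for created, link back to the CIRIS record) might be sketched in Python as follows. This is not the project's script, which was written in PHP. The record field names, the namespace prefix used for the compulsory fields, the `X-API-KEY` header, the example URLs and the JSON layout are all assumptions made for the sake of the example, and the exact placeholder expected by the API for created is left as a parameter because the text does not specify it.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Assumed namespace prefix for Nakala's five compulsory fields (illustrative).
NAKALA_NS = "http://nakala.fr/terms#"
# Dublin Core terms namespace.
DCTERMS_NS = "http://purl.org/dc/terms/"

# Hypothetical edition record; the keys are invented for this sketch.
EXAMPLE_RECORD: Dict[str, Any] = {
    "title": "Example edition title",
    "type": "bibliographic record",
    "license": "CC-BY-4.0",
    "ancient_author": "Example ancient author",
    "contributors": ["Example modern editor", "Example translator"],
    "ciris_url": "https://example.org/ciris/record/1",
}


def build_metadata(record: Dict[str, Any],
                   created_placeholder: Optional[str] = None) -> List[Dict[str, Any]]:
    """Map one edition record to a flat list of property/value pairs."""
    metas: List[Dict[str, Any]] = [
        {"property": NAKALA_NS + "title", "value": record["title"]},
        {"property": NAKALA_NS + "type", "value": record["type"]},
        {"property": NAKALA_NS + "license", "value": record["license"]},
        # The creator is the author of the intellectual content,
        # not the person who posts the data.
        {"property": NAKALA_NS + "creator", "value": record["ancient_author"]},
        # No usable creation date exists for an ancient work, so a
        # placeholder is sent for the compulsory "created" field.
        {"property": NAKALA_NS + "created", "value": created_placeholder},
    ]
    # Editors, translators, commentators, etc. are contributors.
    for name in record.get("contributors", []):
        metas.append({"property": DCTERMS_NS + "contributor", "value": name})
    # Last item: link leading back to the full record in the database.
    metas.append({"property": DCTERMS_NS + "relation", "value": record["ciris_url"]})
    return metas


def to_json_payload(record: Dict[str, Any],
                    created_placeholder: Optional[str] = None) -> str:
    """Serialise the metadata as a JSON document."""
    metas = build_metadata(record, created_placeholder)
    return json.dumps({"metas": metas}, ensure_ascii=False, indent=2)


def build_post_request(api_url: str, api_key: str, payload: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the payload."""
    return urllib.request.Request(
        api_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
        method="POST",
    )


if __name__ == "__main__":
    print(to_json_payload(EXAMPLE_RECORD))
```

The request is only constructed here, not sent; in a real run it would be passed to `urllib.request.urlopen` once the endpoint and credentials are known.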

What we display in Nakala, and why
This obviously raises the question of the necessary redundancy of accesses to an item of digital data. The purpose of conversion is to produce a display which will, we hope, increase the visibility of the database, but which cannot take the place of direct consultation of the database if full information is required. Therein lies all the ambiguity of applying semantic technologies to digital objects that are already natively relational: a relational logic, conceived as navigation or an explicit path from one resource to another, is overlaid by an object logic in which the fine description provides the key but does not set out the entire path. The description thus increases the visibility of the resource in certain search tools, but does not take the place of direct exploration of the resource itself. In other words, full display in Nakala would require the construction of a semantic clone of our database, and that would be pointless.

Nakala is therefore used like a magnifying glass, with the intention of making it possible for Isidore to harvest certain strategic metadata in order to increase in fine the flow of visits not to the Nakala depository but to the database itself. It is in this logic that the date of creation of the data or its creator becomes a useless item of information, which must be replaced by the creator of the intellectual content likely to be of interest to the user.

These choices, summed up in Figure 6 and still being implemented, are the final stage in a lengthy thinking process; before accepting these compromises it was necessary to start by exploring the hypothesis of systematicity. Would we have been able to display the complete set of IPhiS data directly in Nakala if we had wanted to? The answer is no, for the time being. Firstly, if we had wanted to display not only the records for editions but also all the connected data in the IPhiS thesaurus, we would fairly quickly have reached the limit of the vocabularies accepted by the API. Secondly, Nakala does not natively recognise the SKOS vocabulary.

If we compare, for example, the IPhiS data with a set that is close not in terms of quantity but in the type of object listed (the BNF data), we can see fairly well how DataBNF compensates for the shortcomings of qualified Dublin Core and FOAF with, on the one hand, certain elements of SKOS for the data in its thesauruses and, on the other, a specific ontology describing relations and objects specific to its resources and its display 41. Similarly, a full IPhiS semantic display would suppose the creation of a specific ontology to express certain traits which, as far as we know, are not currently taken into account by the standard bibliographical ontologies. This is more particularly the case for incomplete texts, noted as fragments or pieces, and for collections of texts, anthologies or corpora; it is also, and perhaps even more so, the case for describing the complex progression of a text throughout its history, which requires distinguishing between its creator, copyist, translator and publisher.

It is possible to construct this ontology, and we are able to propose an overview of it here, produced using the Protégé software 42 (Figures 7 to 11).

Implementation of this ontology is not immediately desirable, since it would impose a very significant technical constraint: apart from the creation of one or more namespaces, it would require attributing a considerable number of URIs because of the growing size of the database, and managing content negotiation between the HTML and RDF versions of each item of data. This is a very real constraint, and one to which experts in the bibliographical semantic web have been drawing attention for quite some time 43.
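To give a rough idea of what managing content negotiation between the HTML and RDF versions involves for every URI, here is a minimal Python sketch. It is not part of IPhiS: the list of supported media types, the URI pattern and the file suffixes are assumptions chosen for illustration, and falling back to HTML when nothing matches is a simplification (a stricter server could answer with an HTTP 406 status instead).

```python
from typing import List, Sequence, Tuple

# Representations a hypothetical server could offer for each item.
SUPPORTED: Tuple[str, ...] = ("text/html", "text/turtle", "application/rdf+xml")

# Hypothetical suffixes used to build the URL of each representation.
SUFFIXES = {
    "text/html": ".html",
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an HTTP Accept header into (media type, quality) pairs,
    ordered by decreasing quality; ties keep the header's order."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        quality = 1.0
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        prefs.append((media, quality))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept_header: str,
              supported: Sequence[str] = SUPPORTED,
              default: str = "text/html") -> str:
    """Choose the representation to serve for a given Accept header."""
    for media, quality in parse_accept(accept_header):
        if quality <= 0:
            continue
        if media in supported:
            return media
        if media == "*/*":
            return default
        if media.endswith("/*"):
            prefix = media[:-1]  # e.g. "text/"
            for candidate in supported:
                if candidate.startswith(prefix):
                    return candidate
    return default


def representation_url(item_uri: str, media_type: str) -> str:
    """Build the URL of the chosen representation of an item."""
    return item_uri + SUFFIXES[media_type]


if __name__ == "__main__":
    chosen = negotiate("text/turtle;q=0.9, text/html;q=0.5")
    print(representation_url("https://example.org/id/text/42", chosen))
```

Each item of data needs a stable URI plus one URL per representation, which is why the cost grows with the size of the database.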

Conclusion
As we see it, the nature of our data on ancient texts means that it is not pertinent to rely solely on tools such as Nakala to make sure that the data are displayed and harvested. Our philological aims are as varied as they are complex; there is therefore always pressure either to force their harmonisation, so as to avoid drawing up over-heavy sharing schemas, or to simplify them artificially so that they fit into categories that were not designed to accommodate them.
43 "The attribution of URI specific to your set of data for secondary entities is thus an additional burden since you will have to maintain them; however, it is also a security feature in terms of the consistency of your set of data. It all depends on how much you trust the sets of data whose URI you re-use, and the ease of discerning which entities you are manipulating within these external artefacts.

Happily, there are other less ponderous solutions for displaying data such as ours in the best possible way. Some are already virtually in place, as a result of our initial choice to cross our own data with as many external repositories as possible. The alignment work carried out with the VIAF, the Pinakes database authorities, dataBNF, or even, in the case of our cartographic extension, several geographical repositories, may ultimately make it possible to envisage entering IPhiS data via channels that are not strictly philological, either via a general bibliography or by using access paths that cease to have anything to do with the base material and are semantic by nature from the outset, or at least readily convertible into a semantic logic.
What is more, we have long given CIRIS users the possibility of downloading a large part of the data in open format for their personal use. This is the case not only for the thesauruses of texts by ancient authors, which are directly downloadable in *.csv format in CIRIS, but also for the edition records that can be exported in RIS format, thereby making them compatible with most bibliographical management software. In the cartographic extension carried out using the open application uMap 44, all the data for the maps can also be downloaded (Figure 12).

Diversifying the ways of displaying our data does not in any way mean having to renounce full interoperability; quite the contrary, in fact: it means optimising this sharing by selecting, for each aspect of the items of data, the tool best suited to their display. It is important to take care in determining which item of data should be entrusted to which tool. Doing so takes advantage of the specific features of the items: their transmission is complex, they are charged with a scientific uncertainty that it is our duty to make known, and they represent a wealth of variety of types that it would be wrong to attempt to standardise.

An effort still needs to be made regarding the data contained in thesauruses. Here we feel the choice to retain a firm barrier between back- and front-office remains pertinent, since this is checked data that must be subjected to tight editorial control, under an expert eye. That is why, while contributions to the IPhiS database are allowed, in order to enable external users to

Figure 6: Complete process of publication and display.

Figure 8: Hierarchy of properties of IPhiS ontology in Protégé.

Figure 9: IPhiS ontology, extracted from the description in RDF: semantics of Texts.

Figure 10: IPhiS ontology, extracted from the description in RDF: semantics of Editions.

Figure 12: Access to geographical data in CIRIS.
If you create your own URI for secondary entities, you can always link them to the others at a later stage (see A.7)." [trans.] [Bermès, Isaac & Poupeau, 2013, paragraph 49].