I am happy to announce a new and simple project called Atlas de Datos. Atlas de Datos is a catalogue of sources for digital texts in Spanish in any format (web, eBook, PDF, txt, XML-TEI…). It contains more than 40 sources of texts that I have come across in recent years, especially during my work in the CLiGS research project at the University of Würzburg.
In its current format, Atlas de Datos is a CSV table on GitHub. The platform renders this format nicely and makes searches very easy. For example, we can look for the digital sources that have something to do with poetry:
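The same kind of search can also be run outside GitHub once the CSV has been downloaded. The following is only a minimal Python sketch: the column names (`name`, `format`, `genre`) and the three sample rows are hypothetical placeholders, not the actual contents of Atlas de Datos, so they would need to be adapted to the real table.

```python
import csv
import io

# Hypothetical sample: the real table's column names and rows may differ.
SAMPLE_CSV = """name,format,genre
Example Source A,XML-TEI,poetry
Example Source B,PDF,novel
Example Source C,HTML,poetry and theatre
"""


def search_rows(csv_text, keyword):
    """Return the rows in which any cell contains the keyword (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    needle = keyword.lower()
    return [
        row
        for row in reader
        # "value or ''" guards against missing cells in short rows.
        if any(needle in (value or "").lower() for value in row.values())
    ]


matches = search_rows(SAMPLE_CSV, "poetry")
for row in matches:
    print(row["name"], "|", row["format"])
```

To use it with the real file, the text of the downloaded CSV could be passed in place of `SAMPLE_CSV`.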
One question that I often receive is about sources of Spanish texts in XML-TEI. The best answer I can give now is this:
A CSV in a GitHub repository isn’t fancy. I have put more effort into the quality, structure and accessibility of the data than into its default presentation. Once your information is well organised, you can export it to other tools or formats and visualise specific aspects. For example, the following chart, made with RAW, shows the years on the y-axis, the kind of edition (non-professional, opportunistic, philological or critical philological) on the x-axis, the format as colour (red: HTML; blue: PDF; yellow: XML; green: XHTML) and the number of texts (normalised). With this visualisation we can see what a pioneer Project Gutenberg was, which projects have the most texts, and which very detailed editions have appeared in recent years:
Or we can cluster the projects according to whether they are under Creative Commons licences (right) or not (left); the subgroups show whether the projects allow the texts to be downloaded with a single click (right) or not (left). The good news is that the projects with CC licences and full-text downloads (the group on the far right) are almost all yellow: XML-TEI!
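The grouping behind such a chart can be approximated with a few lines of Python. This is a sketch under stated assumptions: the column names `cc_licence` and `single_click_download`, the `yes`/`no` values and the sample rows are all hypothetical and only stand in for the real table.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample: the real column names, values and rows may differ.
SAMPLE_CSV = """name,format,cc_licence,single_click_download
Example Source A,XML,yes,yes
Example Source B,PDF,no,no
Example Source C,XML,yes,yes
Example Source D,HTML,yes,no
"""


def formats_by_group(csv_text):
    """Count formats per (cc_licence, single_click_download) group."""
    groups = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["cc_licence"], row["single_click_download"])
        groups[key][row["format"]] += 1
    return groups


groups = formats_by_group(SAMPLE_CSV)
for key, counts in sorted(groups.items()):
    print(key, dict(counts))
```

Each key is a pair (licence, download), so the group with CC licences and single-click downloads is `("yes", "yes")` in this made-up sample.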
I am very grateful to Carlos Fernández and Antonio Rojas Castro for their help in finding other projects, and I would like to encourage other researchers and users to let me know about new sources of texts, possible mistakes or missing information.