PDF, EPUB, HTML and LaTeX document formats are directly supported by digi-libris, i.e. their metadata, if any, are automatically recognized and used to classify, index and display the document.
digi-libris Reader also reads and writes dMeta packages (data plus XMP metadata; see dMeta), which behave like documents with embedded metadata and can be used with any type of file, including research-relevant data sets.
All other documents (e.g. from word processing, spreadsheet or presentation applications) can, of course, also be entered into the digi-libris listing, and if their extension is associated with a program on your computer, you can even open them with a double click. To classify these documents you may have to enter the relevant metadata manually or import it from an accompanying XMP sidecar file.
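An XMP sidecar file is plain RDF/XML, so the Dublin Core fields it holds can be read with a standard XML parser. The sketch below is illustrative only, not how digi-libris is implemented: `read_dc_from_xmp` is a hypothetical helper name, and it covers only the `dc:` namespace. It handles both array-valued properties (`rdf:Alt`, `rdf:Seq`, `rdf:Bag`) and simple properties written as attributes.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the XMP and Dublin Core specifications.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF = f"{{{NS['rdf']}}}"
DC = f"{{{NS['dc']}}}"


def read_dc_from_xmp(xmp_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: collect Dublin Core properties from an XMP packet."""
    root = ET.fromstring(xmp_text)
    result: dict[str, list[str]] = {}
    for desc in root.iter(RDF + "Description"):
        # Simple properties may be written as attributes of rdf:Description.
        for key, value in desc.attrib.items():
            if key.startswith(DC):
                result.setdefault(key[len(DC):], []).append(value)
        # Other properties are child elements of rdf:Description.
        for prop in desc:
            if not prop.tag.startswith(DC):
                continue
            name = prop.tag[len(DC):]
            # Array-valued properties (rdf:Alt, rdf:Seq, rdf:Bag) hold rdf:li items.
            items = [li.text or "" for li in prop.iter(RDF + "li")]
            if not items and prop.text and prop.text.strip():
                items = [prop.text.strip()]
            result.setdefault(name, []).extend(items)
    return result
```

For a sidecar containing a `dc:title` and two `dc:creator` entries, this returns something like `{"title": [...], "creator": [..., ...]}`, ready to be mapped onto a collection entry.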
PDF
This is the most popular file format for multipage electronic documents. Many applications can produce files in this format, above all Acrobat® and other Adobe® programmes. Acrobat and its PDF writer let you enter the basic metadata (title, author, subject and keywords), which Acrobat also lists as the corresponding DC elements in its advanced metadata viewer. DC terms, however, are not supported, although they are recognized/tolerated if correctly entered with third-party tools such as digi-libris Reader.
A PDF file can also carry its own set of XMP metadata, but Acrobat itself provides no easy way to edit it: all you can do is export/import all advanced metadata to/from an *.XMP sidecar file.
Another issue with PDF is encryption and password protection. While digi-libris can read the four basic tags in most cases, it may not be able to decipher any other metadata the file contains unless that metadata is attached separately, e.g. as an XMP sidecar file (see also the blog entry about dMeta).
Suitable software can extract, modify and update DC elements, DC terms, citation-relevant tags and custom attribute/value pairs in open PDF files, as well as metadata from any other well-defined schema (DICOMED, IPTC, TIFF etc.).
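Inside a PDF, XMP metadata is normally stored as an XMP packet wrapped in `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions. When the metadata stream is neither compressed nor encrypted (PDF encryption can be configured to leave the metadata stream in the clear), a plain byte scan can find it without a PDF library. The sketch below assumes exactly that case. `find_xmp_packets` is a hypothetical name, and compressed or encrypted streams would need a real PDF parser.

```python
import re

# Matches one XMP packet: the begin/end processing instructions and the XML between them.
# Assumes the packet bytes are stored verbatim (no compression, no encryption).
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Hypothetical helper: return the XML bodies of all XMP packets in a byte stream.

    A file may hold several packets (document-level metadata, embedded images,
    incremental updates), so every match is returned in file order.
    """
    return [m.group(1).strip() for m in PACKET_RE.finditer(data)]
```

Each returned body can then be decoded and passed to an XML parser. An empty list simply means no uncompressed packet was found, not that the file has no metadata.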
EPUB
This format is the de facto standard for ebooks, although various flavours now exist; some may be proprietary and some may be encrypted (DRM), which can make reading the embedded metadata impossible.
Some layout programs (e.g. InDesign) can generate it directly, most major ebook readers can handle it, and some schools in California even made it mandatory and distributed free ebook readers.
An EPUB file is essentially a ZIP package containing all elements of the book in (X)HTML or plain-text format. The file META-INF/container.xml inside the package points to a separate package file (usually called content.opf) that holds all the metadata.
Most examples we have seen use the Dublin Core elements and terms correctly, and suitable software can interpret and modify them without problems.
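The lookup described above takes two steps: read META-INF/container.xml to find the package file, then read the `dc:` elements from that file's `<metadata>` block. The sketch below uses only the Python standard library and assumes an unencrypted EPUB. `read_epub_metadata` is a hypothetical name. It also picks up EPUB 3 `<meta property="dcterms:...">` entries.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
DC = "{" + OPF_NS["dc"] + "}"
OPF_META = "{" + OPF_NS["opf"] + "}meta"


def read_epub_metadata(source) -> dict[str, list[str]]:
    """Hypothetical helper: source is a path or a binary file object of an EPUB."""
    with zipfile.ZipFile(source) as zf:
        # Step 1: container.xml names the package (OPF) file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None or rootfile.get("full-path") is None:
            raise ValueError("no rootfile entry in META-INF/container.xml")
        # Step 2: parse the package file (usually content.opf).
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))

    result: dict[str, list[str]] = {}
    metadata = opf.find("opf:metadata", OPF_NS)
    if metadata is None:
        return result
    for el in metadata:
        if not el.text:
            continue
        if el.tag.startswith(DC):
            # dc:title, dc:creator, dc:identifier, ...
            result.setdefault(el.tag[len(DC):], []).append(el.text.strip())
        elif el.tag == OPF_META and el.get("property", "").startswith("dcterms:"):
            # EPUB 3 stores DC terms such as dcterms:modified as <meta property=...>.
            result.setdefault(el.get("property"), []).append(el.text.strip())
    return result
```

Keying the result by element name keeps repeated elements (several `dc:creator` entries, for example) in document order.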
HTML
This is the least controlled format of all in terms of embedded metadata: literally hundreds of applications can create it, each adding its own flavour of syntax and scripts.
While the <meta name=...> tagging in the <head> block is generally respected, we have seen the wildest excesses in what follows the equals sign.
Far from all originators use dc. or dc.terms. names, and those that do often add fancy designations of their own. Recently we have seen custom tags with Facebook and Twitter prefixes (e.g. og: and twitter:), which often contain citation-relevant values.
Suitable software should include a tool for re-mapping unusual meta names to valid dc.elements, dc.terms, citation variables or custom attribute/value pairs, even before these are added to a collection.
Another issue with HTML files is that many are virtual, i.e. created on the fly (e.g. in response to a query), and many are the result of multiple redirections, so they are not necessarily the file the user thought they had clicked. This typically happens on sites built with framesets or master pages with many links.
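A re-mapping tool of this kind can be sketched as a lookup table applied to every `<meta>` tag in the page. In the sketch below, `NAME_MAP`, `MetaCollector` and `remap_meta` are hypothetical names, and the mapping targets are examples only, not digi-libris's actual vocabulary. Names are compared case-insensitively, and both `name=` and `property=` are read, because Open Graph (`og:`) tags use `property=`.

```python
from html.parser import HTMLParser

# Hypothetical re-mapping table: lower-cased meta name -> target name.
# The targets (dc.*, dcterms.*) are illustrative, not an authoritative mapping.
NAME_MAP = {
    "dc.title": "dc.title",
    "dc.creator": "dc.creator",
    "author": "dc.creator",
    "description": "dc.description",
    "keywords": "dc.subject",
    "og:title": "dc.title",
    "twitter:title": "dc.title",
    "citation_author": "dc.creator",
    "citation_publication_date": "dcterms.issued",
}


class MetaCollector(HTMLParser):
    """Collects (name, content) pairs from <meta> tags in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        key = a.get("name") or a.get("property")  # Open Graph uses property=
        if key and a.get("content") is not None:
            self.pairs.append((key.strip().lower(), a["content"].strip()))


def remap_meta(html_text, name_map=NAME_MAP):
    """Split meta values into mapped targets and names the table does not know."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    mapped, unmapped = {}, {}
    for key, value in parser.pairs:
        target = name_map.get(key)
        bucket = mapped if target else unmapped
        bucket.setdefault(target or key, []).append(value)
    return mapped, unmapped
```

Returning the unmapped names separately lets the user review them and extend the table before the document is added to a collection.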
Suitable software should allow collections and tables of contents to include dc.elements and dc.terms of virtual files as well.
As downloading such files can yield zero-length files, it is recommended that all downloads be verified before they are added to a collection.
For this reason digi-libris shows all the files that were opened during a single click on a link.
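A basic verification step can reject zero-length files, and also files whose first bytes do not match the expected format. A server error page saved under a .pdf name is a typical example of the latter. The sketch below is an assumption-based illustration and does not describe the digi-libris check. `verify_download` and the `MAGIC` table are hypothetical. The check requires the signature at the very start of the file, which is stricter than many PDF readers, since they tolerate a few leading bytes.

```python
from pathlib import Path

# Hypothetical table of leading bytes ("magic numbers") per file extension.
MAGIC = {
    ".pdf": b"%PDF-",
    ".epub": b"PK\x03\x04",  # EPUB is a ZIP container
}


def verify_download(path) -> tuple[bool, str]:
    """Return (ok, reason) for a downloaded file before it joins a collection."""
    p = Path(path)
    if not p.is_file():
        return False, "missing"
    if p.stat().st_size == 0:
        return False, "zero-length"
    expected = MAGIC.get(p.suffix.lower())
    if expected is not None:
        with p.open("rb") as fh:
            head = fh.read(len(expected))
        if head != expected:
            # e.g. an HTML error or redirect page saved under a .pdf name
            return False, "unexpected content"
    # Unknown extensions only get the existence and size checks.
    return True, "ok"
```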
LaTeX
This is a typesetting system used primarily in scientific circles for its ability to display complex mathematical formulae. Because publishing is important for researchers, their output should contain, or be accompanied by, relevant metadata that facilitates sharing and the exchange of knowledge and that survives peer review and conversion to PDF for publication.
With digi-libris the metadata embedded in LaTeX files can be read to help classify a document; it can also be edited, supplemented and then exported as an XMP file that can be attached to the submitted document (see also dMeta). Third parties (e.g. publishers) can easily import these XMP files into the final document prior to online publication.
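The read-and-export round trip can be illustrated with a sketch that takes `\title{...}` and `\author{...}` (authors separated by `\and`) from a LaTeX source and writes them into a minimal XMP packet as `dc:title` and `dc:creator`. This is a hypothetical illustration, not the digi-libris exporter. `latex_to_xmp` and `_braced_arg` are invented names. The sketch ignores optional arguments such as `\title[short]{...}`, escaped braces and `\hypersetup{pdftitle=...}`, and it copies LaTeX markup inside the values verbatim.

```python
import re
from xml.sax.saxutils import escape


def _braced_arg(src, command):
    """Return the brace-balanced argument of the first \\command{...}, or None."""
    m = re.search(r"\\" + command + r"\s*\{", src)
    if not m:
        return None
    depth, start = 1, m.end()
    for i in range(start, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[start:i].strip()
    return None  # unbalanced braces


def latex_to_xmp(src):
    """Build a minimal XMP packet holding dc:title and dc:creator from LaTeX source."""
    title = _braced_arg(src, "title")
    author = _braced_arg(src, "author")
    creators = [a.strip() for a in re.split(r"\\and\b", author)] if author else []

    parts = [
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>',
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">',
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
        '<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">',
    ]
    if title:
        # dc:title is a language alternative (rdf:Alt) in XMP.
        parts.append('<dc:title><rdf:Alt><rdf:li xml:lang="x-default">'
                     f"{escape(title)}</rdf:li></rdf:Alt></dc:title>")
    if creators:
        # dc:creator is an ordered array (rdf:Seq) in XMP.
        items = "".join(f"<rdf:li>{escape(c)}</rdf:li>" for c in creators)
        parts.append(f"<dc:creator><rdf:Seq>{items}</rdf:Seq></dc:creator>")
    parts += ["</rdf:Description>", "</rdf:RDF>", "</x:xmpmeta>", '<?xpacket end="w"?>']
    return "\n".join(parts)
```

The returned string can be saved as an .xmp sidecar next to the submitted document.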