A well-behaved document is an electronic document that is both user friendly and search friendly
About
Why?
The Open Access idea is gaining momentum, and the sheer number of scientific, professional and other documents available on the Internet makes keeping an overview a real challenge. To organize the material that accumulates over time and to find information again when it is needed for work, documents must be easy to peruse and easy to classify with little or no manual intervention. You also need suitable software (such as digi-libris Reader) that automatically indexes and alphabetically sorts all newly added documents to help you keep track of it all.
A well-behaved PDF or ePub document that is user friendly and search friendly offers important advantages for document producers, distributors and end users:
- facilitates scholarly communication
- is easier to discover and retrieve
- is easier to find again in one’s personal knowledge base
- gives authors better exposure
- has a better chance of being referenced.
User friendly
means that a document is easy to read and easy to navigate on any reading device, and that reading software for it is readily available. It is in an open format and does not depend on proprietary (paid) software for display, styles and multimedia content. It must be searchable and have bookmarks (in applications that allow for them, such as PDF files in Acrobat or Adobe Reader), an interactive table of contents, i.e. one with “clickable” links to the correct target page, and possibly an interactive index, cross references and links to external resources. Except for copyrighted material, it should not be password protected or encrypted, but must allow users to print it and to copy and paste portions of the text, and possibly to add bookmarks and comments of their own.
This applies not only to scientific papers, monographs and manuals but to all documents that one would consult or refer to rather than read in a continuous stream from cover to cover, as one would a novel or other literary work.
Making documents interactive and embedding metadata does not necessarily require extra work if it is properly planned and some simple rules (such as consistent use of styles) are observed.
An author who has spent a year on a thesis can certainly spend 10 more minutes writing down some keywords and a description. The typesetter, who produces a table of contents anyway, only has to check a single box before exporting to PDF. And the publisher can easily import an XMP file containing metadata into the final document.
Search friendly
is a document that has useful embedded metadata which librarians, digital asset managers and individuals can exploit to classify the document in their own knowledge base with little or no manual intervention.
University and public libraries prefer to keep the metadata of all their documents in separate catalogues or databases for reasons of integrity and maintainability. Since one does not exclude the other, however, embedding the same metadata, or a selection thereof, directly in a digital resource automatically makes this data available to third parties who download or otherwise obtain the resource and who may want to preserve it locally in their own knowledge base or consult it off-line. Notation in attribute/literal pairs is probably adequate for most private or local repositories.
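To make the idea of attribute/literal pairs concrete, here is a minimal Python sketch. The file names, attribute names and values are purely hypothetical, and the indexing approach is only one simple way a private knowledge base might classify documents without manual intervention.

```python
from collections import defaultdict

# Hypothetical records: each document is described by attribute/literal pairs.
# An attribute may repeat (e.g. several subjects), so the pairs are kept as a list.
library = {
    "thesis.pdf": [
        ("title", "Metadata in Open Access Repositories"),
        ("creator", "A. Author"),
        ("subject", "metadata"),
        ("subject", "open access"),
    ],
    "manual.epub": [
        ("title", "Typesetting Handbook"),
        ("creator", "B. Writer"),
        ("subject", "typesetting"),
        ("subject", "metadata"),
    ],
}


def build_index(records):
    """Map each (attribute, lower-cased literal) pair to the sorted files carrying it."""
    index = defaultdict(set)
    for filename, pairs in records.items():
        for attribute, literal in pairs:
            index[(attribute, literal.lower())].add(filename)
    return {key: sorted(files) for key, files in index.items()}


index = build_index(library)
print(index[("subject", "metadata")])  # ['manual.epub', 'thesis.pdf']
```

Because every pair becomes a lookup key, a newly added document is classified as soon as its embedded pairs are read.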
Search-friendly scholarly publications
Search-friendliness, or machine-readability, is increasingly important given the global influence of digitization and open access on the changing publishing and archiving environment. Most scholarly publications are becoming available on the Internet, which makes their processing and systematic archiving a real challenge. To organize the bulk of this Internet content, scholarly papers should be easily classifiable with little or no manual intervention, which requires properly embedded metadata. Explicit metadata facilitate the work of librarians, digital asset managers and non-expert users because
- sources are automatically classified and indexed for searching across a collection of documents
- journal submissions and publications are easier to locate and cite
- interdisciplinary networking and sharing of information is facilitated
- authors get more and better exposure
- there is no need to hunt for citation-relevant metadata on the Internet, which is particularly beneficial for students who lack the patience or the wherewithal to locate relevant repositories
- citations and bibliographic references can be generated off-line, a must for self-published articles, work in progress and editorial content.
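As an illustration of the last point, the following Python sketch builds a citation purely from metadata that travels with the document, so no Internet lookup is needed. The field names follow Dublin Core element names, but the sample values, the identifier and the author-year output format are assumptions made for this example; it is not an implementation of CSL.

```python
def format_citation(meta):
    """Build a simple author-year citation from embedded metadata fields."""
    creators = meta.get("creator", [])
    if not creators:
        authors = "Anonymous"
    elif len(creators) <= 2:
        authors = " and ".join(creators)
    else:
        authors = f"{creators[0]} et al."

    # Keep only the year of an ISO-style date; fall back to "n.d." (no date).
    year = meta.get("date", "n.d.")[:4]

    parts = [f"{authors} ({year}).", f"{meta.get('title', 'Untitled')}."]
    if meta.get("publisher"):
        parts.append(f"{meta['publisher']}.")
    if meta.get("identifier"):
        parts.append(meta["identifier"])
    return " ".join(parts)


# Hypothetical metadata, as it might be read from a document's embedded fields.
paper = {
    "creator": ["A. Author", "B. Writer"],
    "title": "A Sample Paper",
    "date": "2015-03-01",
    "publisher": "Example Press",
    "identifier": "doi:10.0000/example",
}
print(format_citation(paper))
```

A document without embedded creator or date fields still yields a usable, if incomplete, citation, which shows why complete metadata matters.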
Metadata standards
Dozens of metadata standards are currently available, each linked to its own vocabulary. Unfortunately, none of them is universally applicable. A student seeks data to generate citations, while an expert searches a collection of papers using certain technical criteria. Information about book publishers, copyright holders of images or paintings, songwriters, or architects of ancient pyramids is all essential metadata describing attributes of the items. Metadata are processed to classify items in search engines and to share them with the global community. Different users seek different pieces of data. Art critics, veterinary specialists, physicists and lawyers download content from interdisciplinary web domains, and they would prefer to do so without manual intervention, relying on embedded metadata.
Universities and public libraries are challenged to upgrade their services and to contribute more actively to scientific research. Although they prefer to integrate and preserve the metadata of all their documents in separate catalogues or databases, I think that one should not exclude the other. Embedding a descriptive selection of that metadata in a digital resource automatically makes it available to users for off-line consulting and referencing, and it saves them time. A notation in attribute/literal pairs is probably adequate for most private or local repositories. A separate sidecar Extensible Metadata Platform (XMP) file can be linked or sent along if direct embedding is impossible (e.g. because of a checksum).
A pragmatic solution
Documents with embedded metadata are gradually becoming more common in open-access repositories and on publishers’ websites, partly because institutions require metadata to be provided along with documents. New forms of metadata, such as those on HTML pages pointing to Facebook and Twitter, are constantly developing. Citation-specific variables are currently used in conjunction with the Citation Style Language (CSL). And, adding to the jumble, there is a wide range of proprietary namespaces, where each organisation defines metadata specific to different subjects. A document can therefore include hundreds of metadata variables, which may or may not be meaningful for users. The solution to this issue should be universal. I suggest an individually extensible and universally applicable metadata set that builds on the widely used Dublin Core standard (minus refinements) plus an unlimited number of customizable attribute/value pairs for the data. Consider it an alternative Dublin Core application profile (DCAP) for individuals who may or may not have to rely on a single standard issued by a parent institution. For the exchange of metadata with third parties, it relies on Adobe®’s XMP technology.
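What such a set could look like when exchanged as XMP can be sketched in Python with the standard library alone. The `x`, `rdf` and `dc` namespace URIs are the ones commonly used in XMP packets; the `my` namespace, the `build_sidecar` helper and all sample values are hypothetical, and the optional `xpacket` wrapper is left out for brevity.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    # Hypothetical private namespace for the custom attribute/value pairs.
    "my": "http://example.org/ns/custom/1.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the qualified tag name that ElementTree expects."""
    return f"{{{NS[prefix]}}}{name}"


def add_list(parent, prop, container, values):
    """Add a Dublin Core property holding an RDF container (Seq or Bag)."""
    box = ET.SubElement(ET.SubElement(parent, q("dc", prop)), q("rdf", container))
    for value in values:
        ET.SubElement(box, q("rdf", "li")).text = value


def build_sidecar(title, creators, subjects, custom):
    """Serialize Dublin Core elements plus custom pairs as an XMP string."""
    root = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(root, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is stored as a language alternative with an x-default entry.
    alt = ET.SubElement(ET.SubElement(desc, q("dc", "title")), q("rdf", "Alt"))
    ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"}).text = title

    add_list(desc, "creator", "Seq", creators)   # ordered list of authors
    add_list(desc, "subject", "Bag", subjects)   # unordered keywords

    # Custom attribute/value pairs become simple properties in the private namespace.
    for name, value in custom.items():
        ET.SubElement(desc, q("my", name)).text = value

    return ET.tostring(root, encoding="unicode")


xmp = build_sidecar(
    "A Sample Paper",
    ["A. Author", "B. Writer"],
    ["metadata", "open access"],
    {"reviewStatus": "accepted"},
)
print(xmp)
```

The resulting string could be saved next to the document as a sidecar file, for example `paper.xmp`, and later imported by a tool that understands XMP.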
Who should provide and embed metadata?
The ultimate responsibility for the inclusion of useful metadata lies with the publisher. However, all other stakeholders in the development of a document, from author through to distributor, should also contribute by adding metadata to the final versions of their documents because
- authors know their subject best and should propose relevant tags for their papers’ abstracts and citations. Ideally, they should generate a list and submit it along with their manuscripts. XMP sidecar files are probably the best option to ensure the integrity of their metadata
- reviewers may suggest changes to the titles and descriptions in addition to factual adjustments
- editors and translators may include different data and add keywords for optimal searches through search engines
- publishers can adapt metadata and add specifics such as Creative Commons licenses, copyright details, dates of submission and acceptance, and ISSN/DOI identifiers
- libraries and content providers, who gather metadata for their catalogues, should ensure that useful metadata accompanies each document for automatic classification, indexing, and retrievability.
Adding metadata to PDF files
Adobe®’s XMP™ technology is well suited for embedding metadata and is the format implemented in PDF documents. It has placeholders for Dublin Core elements and other standard metadata types such as DICOM for medical applications and IPTC, which is used by the international press community and professional photographers to secure their copyrights. It also allows proprietary sets to be defined with their own namespaces, as well as an unlimited number of custom attribute/value pairs which can be used to describe anything. Viewing, editing and exporting metadata requires suitable (free or low-cost) software. To embed the metadata in a PDF document, Acrobat® or another PDF tool that can import XMP files is used.
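For readers who want to check what a PDF actually carries, the following Python sketch scans a file's bytes for an XMP packet and lists its Dublin Core values. It assumes the packet is stored uncompressed and uses the conventional `x:xmpmeta` element name, which is common but not guaranteed; the `read_dublin_core` helper and the sample bytes are made up for illustration, and a dedicated PDF library would be more robust.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumption: the packet is uncompressed and uses the conventional "x:" prefix.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def read_dublin_core(data: bytes) -> dict:
    """Return the Dublin Core properties found in the first XMP packet, if any."""
    match = XMP_PACKET.search(data)
    if match is None:
        return {}
    root = ET.fromstring(match.group(0))
    found = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        for prop in desc:
            if not prop.tag.startswith(f"{{{DC}}}"):
                continue  # skip properties from other namespaces
            name = prop.tag.split("}", 1)[1]
            # Values usually sit in rdf:li items inside an Alt, Seq or Bag container.
            values = [li.text for li in prop.iter(f"{{{RDF}}}li") if li.text]
            if not values and prop.text and prop.text.strip():
                values = [prop.text.strip()]
            found.setdefault(name, []).extend(values)
    return found


# Synthetic stand-in for the bytes of a PDF file with an embedded XMP packet.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">A Sample Paper</rdf:li></rdf:Alt></dc:title>'
    b"<dc:subject><rdf:Bag><rdf:li>metadata</rdf:li><rdf:li>XMP</rdf:li></rdf:Bag></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b"\nendstream\nendobj\n"
)
print(read_dublin_core(sample))
```

With a real file, the same function could be called on the result of `pathlib.Path("paper.pdf").read_bytes()`, where `paper.pdf` is a placeholder name.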