Uncategorized reinout | 04 Feb 2010 05:07 pm
Bibliographic metadata formats
Dear Lazyweb,
I need to represent some bibliographic information records like this one in a machine-readable format. But I wouldn’t like to invent my own format when there are dozens to choose from. Could anybody point me to a (preferably semantic web-compatible) format suitable for this purpose that’s also easy to generate using Java?
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Netherlands License.






on 04 Feb 2010 at 18:05 1.RubenV said …
BibTeX?
on 04 Feb 2010 at 18:24 2.Rodrigo Moya said …
The best thing I know for bibliographic information is LaTeX, which you can convert to HTML easily. Generating LaTeX from code should be easy too.
Not sure if there are better options, though, or whether that would fit your needs.
on 04 Feb 2010 at 18:30 3.Morten Juhl-Johansen Zölde-Fejér (mjjzf) said …
I played with BibTeX and RIS (http://en.wikipedia.org/wiki/RIS_%28file_format%29) for Aigaion, a web-based bibliography management software:
http://www.aigaion.nl/
Aigaion is nice for this, because you can paste a block to import it.
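A minimal sketch of what generating a RIS record from Java could look like, assuming only the basic line layout (two-letter tag, two spaces, hyphen, space, value), with TY opening the record and ER closing it. The class name, method names, and sample values are hypothetical, and the plain "\n" line ending is a choice made here, not something taken from the thread.

```java
import java.util.ArrayList;
import java.util.List;

public class RisSketch {
    // Formats one RIS line: two-letter tag, two spaces, hyphen, space, value.
    static String line(String tag, String value) {
        return tag + "  - " + value + "\n";
    }

    // Builds one record. "fields" holds {tag, value} pairs so that
    // tags such as AU (author) can appear more than once.
    public static String toRis(String type, List<String[]> fields) {
        StringBuilder sb = new StringBuilder(line("TY", type)); // TY opens the record
        for (String[] f : fields) {
            sb.append(line(f[0], f[1]));
        }
        sb.append(line("ER", "")); // ER closes the record
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<String[]> fields = new ArrayList<>();
        fields.add(new String[] {"AU", "Doe, Jane"});
        fields.add(new String[] {"TI", "An Example Title"});
        fields.add(new String[] {"PY", "2010"});
        System.out.print(toRis("BOOK", fields));
    }
}
```

The sketch does no validation of tags or values; a real exporter would want to check both against whatever the importing tool accepts.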
on 04 Feb 2010 at 18:58 4.kurmucis said …
Hi,
just take a look:
http://www.loc.gov/marc/
… just a glimpse: my wife said it’s powerful and popular (she has some 25 years’ background as a librarian).
You could try to use the most suitable sub-version.
Sorry, I don’t know about the possibilities for working with it from Java; maybe MARC’s forums could be your friend.
Good luck!
on 04 Feb 2010 at 20:05 5.Mikkel Kamstrup Erlandsen said …
It’s a question I’d like an answer for as well. But my experience as a developer in the library world is that “pure” semantic web is not something that’s easy inside the library world because the general rules that govern the metadata are extremely complex. – And people tend to be highly polarized in their preference and interpretations of the standards.
This is probably why we have been struggling with FRBR for years and years, but never seem to really get there.
If I were to name-drop a few emerging standards that may be of interest to you, it would probably be AtomPub and Jangle, but I don’t know if they are exactly what you are looking for…
on 04 Feb 2010 at 22:12 6.Frej Soya said …
Just remember that the important part is semantics (i.e. ontologies), and Dublin Core gets you quite far with a very basic set of 15 elements… From there you can always expand with your own schema/ontology (works fine for internal use) or whatever else is out there. Also, RDF is a very nice and simple abstraction for structuring metadata.
The ‘oldtech’ way is just to use BibTeX/RIS or similar. Avoid MARC: it’s old and comes with a lot of cruft (even the XML version).
But it’s been some years since I worked at a university library. One thing to remember about RDF is that XML serialization is just one possibility (and pretty much only handy for machines).
But for Java it shouldn’t be a problem: there are several RDF parsers out there that work with several formats.
on 04 Feb 2010 at 22:56 7.Shaun McCance said …
Do the bibliography formats of DocBook or TEI suit your needs?
http://www.docbook.org/tdg/en/html/bibliography.html
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#COBI
on 04 Feb 2010 at 22:59 8.Stuart said …
MARCXML is the Library of Congress update to MARC 21 for modern electronic storage of bibliographic data.
http://www.loc.gov/standards/marcxml/
on 05 Feb 2010 at 1:00 9.Wouter Bolsterlee said …
Perhaps the Dublin Core vocabulary [1] expressed in RDF (possibly represented as RDF/XML) [2] satisfies your needs… optionally you can extend this with your own terms in additional RDF schemas.
[1] http://dublincore.org/documents/dcmi-terms/
[2] http://dublincore.org/documents/dc-rdf/
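A minimal sketch of the Dublin Core idea from Java, assuming the 15-element namespace http://purl.org/dc/elements/1.1/ and N-Triples output instead of RDF/XML, because N-Triples can be written with plain string handling. The class name, the example.org subject IRI, and the sample values are hypothetical; a real project would more likely use one of the existing Java RDF libraries than hand-written serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DublinCoreSketch {
    // Namespace of the classic 15 Dublin Core elements.
    static final String DC = "http://purl.org/dc/elements/1.1/";

    // Escapes the characters that must not appear raw in an N-Triples literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    // Emits one triple per element: <subject> <dc:element> "value" .
    public static String toNTriples(String subjectIri, Map<String, String> elements) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : elements.entrySet()) {
            sb.append('<').append(subjectIri).append("> <")
              .append(DC).append(e.getKey()).append("> \"")
              .append(escape(e.getValue())).append("\" .\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample record; the subject IRI and values are made up for illustration.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("title", "An Example Title");
        record.put("creator", "Jane Doe");
        record.put("date", "2010");
        System.out.print(toNTriples("http://example.org/record/1", record));
    }
}
```

The sketch treats every value as a plain literal and does not validate the subject IRI or the element names.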
on 05 Feb 2010 at 9:53 10.Niko said …
Try this one:
http://www.ncd.matf.bg.ac.yu/casopis/14/NCD14043.pdf
see chapter 5:
5. Unified E-Book Format
You can use any bibliographic format inside that format.
on 05 Feb 2010 at 13:27 11.xurfa said …
Today I wouldn’t consider anything that is not Unicode-positive. I would try e.g. the UNIMARC standard used in libraries.
on 05 Feb 2010 at 15:00 12.pel said …
I’ve never come across anything in wide use that isn’t Dublin Core or can’t produce it on demand; that goes especially for semantic web stuff.
Well, there are other formats in wide use, but I would refrain from calling them “bibliographic”.
on 06 Feb 2010 at 8:47 13.chris said …
OpenURL COinS: A Convention to Embed Bibliographic Metadata in HTML: http://www.ocoins.info/
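A rough sketch of producing a COinS span from Java, assuming the convention is an empty span with class "Z3988" whose title attribute carries a URL-encoded OpenURL ContextObject. The key names used here (ctx_ver, rft_val_fmt, and rft.* keys such as rft.btitle and rft.au for a book) are my understanding of the convention and should be checked against the ocoins.info documentation; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class CoinsSketch {
    // URL-encodes one value as UTF-8 (spaces become "+").
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Builds the span; "rft" maps referent keys (without the "rft." prefix) to values.
    public static String toCoinsSpan(Map<String, String> rft) {
        StringBuilder kev = new StringBuilder("ctx_ver=Z39.88-2004");
        // Assumed metadata format identifier for books.
        kev.append("&amp;rft_val_fmt=").append(enc("info:ofi/fmt:kev:mtx:book"));
        for (Map.Entry<String, String> e : rft.entrySet()) {
            // "&" is written as "&amp;" because the string sits inside an HTML attribute.
            kev.append("&amp;rft.").append(e.getKey()).append('=').append(enc(e.getValue()));
        }
        return "<span class=\"Z3988\" title=\"" + kev + "\"></span>";
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        Map<String, String> rft = new LinkedHashMap<>();
        rft.put("btitle", "An Example Title");
        rft.put("au", "Doe, Jane");
        rft.put("date", "2010");
        System.out.println(toCoinsSpan(rft));
    }
}
```

Only the values are encoded; the keys are assumed to be safe ASCII as written.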
on 06 Feb 2010 at 23:28 14.Jakob said …
It all depends on what you want to do with the data: “machine-readable” covers almost everything to some degree. If your requirement is to be semantic-web compatible, use The Bibliographic Ontology. If you want to use the data for bibliographies, use BibTeX. If you want to exchange data with libraries (and only with libraries), use some of their obscure formats like MARC. OK, MODS is worth a try: it’s XML and compatible with MARC.
on 08 Feb 2010 at 13:50 15.mee said …
BibTeX is good. If you don’t like its format, there is an application called “referencer” which uses an XML-like file to manage BibTeX files. Since XML might be more “machine-readable” than BibTeX, you could take a look at “referencer”.
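Since several comments suggest BibTeX, here is a minimal sketch of writing one entry from Java. The class name, cite key, and sample field values are hypothetical, and the sketch assumes the values contain no unbalanced braces or characters that need LaTeX escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BibTexSketch {
    // Renders a single entry of the form @type{key, field = {value}, ...}.
    public static String toBibTex(String entryType, String citeKey, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append('@').append(entryType).append('{').append(citeKey);
        for (Map.Entry<String, String> f : fields.entrySet()) {
            // Values are wrapped in braces; no escaping is attempted here.
            sb.append(",\n  ").append(f.getKey()).append(" = {").append(f.getValue()).append('}');
        }
        sb.append("\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("author", "Doe, Jane");
        fields.put("title", "An Example Title");
        fields.put("year", "2010");
        System.out.print(toBibTex("misc", "doe2010", fields));
    }
}
```

A LinkedHashMap keeps the fields in insertion order, which makes the generated file easier to read and to diff.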