12-31-2009, 05:12 PM | #1 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
(X)HTML Metadata
What metadata does Calibre support for (X)HTML input? Is there a setting somewhere? I couldn't find anything in the User Manual, but I may have missed it. My experiment with importing XHTML wasn't too effective.
It seems to be able to output such data well, though. We're having a discussion of Sigil's support for Dublin Core metadata in the <head> of (X)HTML docs used as input sources here. Near the bottom is a list of possible DCTERMS that cover most useful metadata as they relate to book collections, and some outliers. There is a specific list of what Sigil currently supports here. Zipped, single-file XHTML is my storage format of choice, as it should convert easily to pretty much anything. I also edit a lot of files into decently marked-up XHTML. It'd be nice to have Calibre recognize the info. Thanks! m a r |
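For anyone wondering what reading that kind of metadata might look like, here is a minimal sketch. It is not calibre's or Sigil's actual reader. It assumes the XHTML is well-formed, uses the standard XHTML namespace, and follows the common convention of `<meta name="DC.xxx">` / `<meta name="DCTERMS.xxx">` elements in the `<head>`. The function name `read_dc_metadata` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by well-formed XHTML documents.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def read_dc_metadata(xhtml_text):
    """Collect DC./DCTERMS. <meta> name/content pairs from the <head>.

    Hypothetical helper: returns a dict mapping the lowercased term
    (e.g. "title") to a list of content values, since a term may repeat.
    """
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    found = {}
    if head is None:
        return found
    for meta in head.findall(XHTML_NS + "meta"):
        name = meta.get("name", "")
        content = meta.get("content")
        prefix, _, term = name.partition(".")
        # Keep only names like "DC.title" or "DCTERMS.issued".
        if prefix.lower() in ("dc", "dcterms") and term and content is not None:
            found.setdefault(term.lower(), []).append(content)
    return found


# Made-up sample document, not taken from any real book file.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>The Man of Bronze</title>
<meta name="generator" content="hand-edited" />
<meta name="DC.title" content="The Man of Bronze" />
<meta name="DC.creator" content="Kenneth Robeson" />
<meta name="DC.language" content="en" />
</head>
<body><p>...</p></body>
</html>"""

if __name__ == "__main__":
    for term, values in read_dc_metadata(SAMPLE).items():
        print(term, "=", "; ".join(values))
```

A strict XML parser like this only works on clean XHTML; files with HTML-only entities or unclosed tags would need a more forgiving parser.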
12-31-2009, 05:37 PM | #2 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
IIRC the HTML metadata reader is optimized for the output of the ereader2html script as that is the most common use case.
|
12-31-2009, 06:57 PM | #3 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Really?
I just imported 180 or so Doc Savage books from Blackmask, using the "Add books from directories, including sub-directories" option of Calibre. Did a fine job, but the title could only be grabbed from the filename. I deleted them, then wrote a bash script to rename all the files from the folder names (which had the full titles) and re-imported. Then a bulk metadata edit took care of most of the rest. So, not terrible. And these books don't have the best choices of metadata or particularly stringent values, e.g.: Code:
<TITLE>THE MAN OF BRONZE</TITLE>
<META NAME="Author" CONTENT="A Doc Savage Adventure by Kenneth Robeson">
<META NAME="Description" CONTENT="Mystery, Suspense, History, Gothic, Literature, Books, Arts">
Sort of a chicken-and-egg thing. Thanks, m a r |
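As a rough illustration of how tags like the ones above could be pulled out of loosely marked-up HTML, here is a sketch using Python's standard `html.parser`. This is not how calibre does it. The class name `LegacyMetaParser` and the `guess_author` heuristic (keep whatever follows the last " by ") are assumptions made up for this example.

```python
from html.parser import HTMLParser


class LegacyMetaParser(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta name/content> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <TITLE> and
        # <META NAME=...> arrive here as "title" and "meta"/"name".
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name")
            content = attr.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def guess_author(raw):
    """Heuristic only: keep the text after the last ' by ', else the whole string."""
    return raw.rpartition(" by ")[2].strip()


# The three tags quoted in the post above.
SAMPLE = """<TITLE>THE MAN OF BRONZE</TITLE>
<META NAME="Author" CONTENT="A Doc Savage Adventure by Kenneth Robeson">
<META NAME="Description" CONTENT="Mystery, Suspense, History, Gothic, Literature, Books, Arts">"""

if __name__ == "__main__":
    parser = LegacyMetaParser()
    parser.feed(SAMPLE)
    print("title: ", parser.title.strip())
    print("author:", guess_author(parser.meta.get("author", "")))
    print("tags:  ", [t.strip() for t in parser.meta.get("description", "").split(",")])
```

The author heuristic would misfire on any value that legitimately contains " by ", so treat the result as a starting point for a bulk metadata edit rather than a final value.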
12-31-2009, 09:24 PM | #4 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hmmm, maybe HTML metadata reading is broken. Open a ticket and I'll take a look at it when I have some time.
|