Old 05-20-2013, 05:11 AM   #1
simobk
Junior Member
Posts: 3
Karma: 10
Join Date: May 2013
Device: Asus TF201; Asus Nexus 7; HTC Droid DNA
Filename to Metadata

Hi all,

OK, I've been using PDF as my ebook format until now. I read a bit and decided to start using ePub from now on.

I name ALL my ebooks : Author - Title (year).pdf

Is there any way to generate the metadata from the filename?

For example, I had written this little script to do this with my PDF's :
Code:
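// Acrobat JavaScript: derive metadata from "Author - Title (year).pdf"
var fullName = this.documentFileName;
// Locate the separators once; lastIndexOf tolerates parentheses inside the title
var dashPos = fullName.indexOf(" - ");
var yearPos = fullName.lastIndexOf(" (");
// Extract author, title and year
var author = fullName.slice(0, dashPos);
var title = fullName.slice(dashPos + 3, yearPos);
var year = fullName.slice(yearPos + 2, fullName.lastIndexOf(")"));
// Insert metadata (the year is extracted but not written anywhere yet)
this.info.Author = author;
this.info.Title = title;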
In case there is no such tool: I played a bit with the ePub files and realized they are actually ZIP files. After extracting one, I found the content.opf file, which is actually an XML file.
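The filename split from the Acrobat script can also be sketched in plain JavaScript (Node.js), outside Acrobat. This is only a minimal sketch: the function name parseBookFilename is made up for illustration, and it assumes the year is always four digits in parentheses right before the extension.

```javascript
// Hypothetical helper: split "Author - Title (year).ext" into its parts.
function parseBookFilename(fileName) {
  // The lazy author group stops at the FIRST " - ", so a title may itself
  // contain " - ". The year must be four digits just before the extension.
  const match = /^(.+?) - (.+) \((\d{4})\)\.[^.]+$/.exec(fileName);
  if (match === null) {
    return null; // the filename does not follow the naming convention
  }
  const [, author, title, year] = match;
  return { author, title, year };
}

console.log(parseBookFilename("Arthur Conan Doyle - A Study in Scarlet (1887).pdf"));
console.log(parseBookFilename("notes.pdf")); // null
```

Returning null for a non-matching name makes it easy to skip files that need manual attention instead of writing wrong metadata into them.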

I should be able to write a little script that changes the metadata for me. I just want confirmation from the more experienced users about the following:
  1. Is content.opf the only file to edit?
  2. Is all the metadata contained in the <metadata> tag?
  3. I am under the impression that the only "standard" tags are the ones starting with <dc:...> and everything else is editor specific. Please confirm?
  4. Is there always a cover.jpg inside the files?

Thanks for any and all help!

Simo

Last edited by simobk; 05-20-2013 at 05:17 AM.
Old 05-20-2013, 05:52 AM   #2
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Ad 1: yes
Ad 2: yes
Ad 3: not by default. There are more. The Dublin Core is used. Check that for full specs. You can also check the IDPF site for the official ePUB specs. I would advise the ePUB2 version, as that one is generally used.
Ad 4: no, that is not required. Only title, identifier and language are strictly required.

Be aware that it is a special zip file. Packing must be done according to certain rules, or the result will no longer be an ePUB file.
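
The main packing rule, as I understand the OCF container spec, is that the first entry in the archive must be a file named mimetype, stored uncompressed and without an extra field, containing exactly application/epub+zip. A minimal sketch of a check for that in Node.js follows; the function names are made up for illustration, and the check only looks at the first ZIP local file header, nothing else.

```javascript
// Hypothetical check: does the start of the file look like an ePub container?
function looksLikeEpubContainer(buf) {
  const MIMETYPE = "application/epub+zip";
  if (buf.length < 38 + MIMETYPE.length) return false;
  return (
    buf.readUInt32LE(0) === 0x04034b50 && // ZIP local file header signature "PK\x03\x04"
    buf.readUInt16LE(8) === 0 &&          // compression method 0 = stored
    buf.readUInt16LE(26) === 8 &&         // file name length: "mimetype"
    buf.readUInt16LE(28) === 0 &&         // no extra field
    buf.toString("latin1", 30, 38) === "mimetype" &&
    buf.toString("latin1", 38, 38 + MIMETYPE.length) === MIMETYPE
  );
}

// Build a fake first entry for demonstration (not a complete ZIP file).
function fakeFirstEntry(compressionMethod) {
  const header = Buffer.alloc(30); // zero-filled local file header
  header.writeUInt32LE(0x04034b50, 0);
  header.writeUInt16LE(compressionMethod, 8);
  header.writeUInt16LE(8, 26);
  return Buffer.concat([header, Buffer.from("mimetypeapplication/epub+zip", "latin1")]);
}

console.log(looksLikeEpubContainer(fakeFirstEntry(0))); // true
console.log(looksLikeEpubContainer(fakeFirstEntry(8))); // false (deflated)
```

On a real file this would be called as looksLikeEpubContainer(require("fs").readFileSync("book.epub")). Re-zipping an extracted ePub with a general-purpose zip tool tends to fail this check unless the tool is told to add mimetype first and without compression.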
Old 05-20-2013, 06:25 AM   #3
simobk
Hi Toxaris and thanks for the answer.

First of all, I guess the fact that you're answering these questions means that there is no automated tool yet.

I googled a bit and read a bit more. This is what I came up with as "correct" metadata containing the fields I am interested in:
Code:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:opf="http://www.idpf.org/2007/opf">
  <dc:creator opf:role="aut" opf:file-as="Arthur Conan Doyle">Arthur Conan Doyle</dc:creator>
  <dc:title opf:file-as="A Study in Scarlet">A Study in Scarlet</dc:title>
  <dc:date>1887</dc:date>
  <dc:subject>Detective, Crime, Mystery, Novel</dc:subject>
  <dc:description>A Study in Scarlet is a detective mystery novel written by Sir Arthur Conan Doyle, introducing his new character of Sherlock Holmes, who later became one of the most famous literary detective characters.</dc:description>
  <dc:language>en</dc:language>
  <dc:identifier id="BookId">urn:uuid:9ef8ecb0-c134-11e2-8b8b-0800200c9a66</dc:identifier>
</metadata>
Does that look like correct metadata to you? I'm especially wondering whether I am putting the right stuff in subject and description. I also couldn't find a "standard" separator for the subjects.

I will take a look later at the ePub 2.0.1 specs. I'll try to find a more "to the point" source though!
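
A minimal sketch of how such a metadata block could be generated from already-parsed fields, in plain JavaScript. The function names and the shape of the input object are assumptions made for illustration. It emits one dc:subject element per keyword, since the OPF 2.0.1 spec allows dc:subject to be repeated, which sidesteps the separator question.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: author, title and year would come from the filename;
// language, uuid and subjects have to be supplied by the caller.
function buildMetadata({ author, title, year, language, uuid, subjects = [] }) {
  return [
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">',
    `  <dc:title>${escapeXml(title)}</dc:title>`,
    `  <dc:creator opf:role="aut">${escapeXml(author)}</dc:creator>`,
    `  <dc:date>${escapeXml(year)}</dc:date>`,
    // One element per keyword instead of a separator-delimited string.
    ...subjects.map((s) => `  <dc:subject>${escapeXml(s)}</dc:subject>`),
    `  <dc:language>${escapeXml(language)}</dc:language>`,
    `  <dc:identifier id="BookId">urn:uuid:${escapeXml(uuid)}</dc:identifier>`,
    "</metadata>",
  ].join("\n");
}

console.log(
  buildMetadata({
    author: "Arthur Conan Doyle",
    title: "A Study in Scarlet",
    year: "1887",
    language: "en",
    uuid: "9ef8ecb0-c134-11e2-8b8b-0800200c9a66",
    subjects: ["Detective", "Crime", "Mystery", "Novel"],
  })
);
```

The escaping step matters for titles or author names containing & or <, which would otherwise produce an invalid content.opf.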

Quote:
Originally Posted by Toxaris View Post
Ad 4: no, that is not required. Only title, identifier and language are strictly required.
I guess that means I'll have to use conditional statements, if there's a "cover.jpg", use it as a cover, otherwise, use the first image that appears as the cover. I'll read more about it later as I am under the impression there needs to be an associated xhtml.
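
The conditional described above could be sketched like this in plain JavaScript. It assumes the list of entry names has already been read from the archive by some ZIP library (not shown), and the function names are made up for illustration. Note that "first image" here means first in archive order, which is not necessarily the first image in reading order.

```javascript
const IMAGE_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif"];

// True if the path ends in one of the known image extensions.
function isImage(path) {
  const lower = path.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Hypothetical helper: entryNames are paths inside the archive, in archive order.
function pickCoverImage(entryNames) {
  // Prefer a file literally named cover.jpg, in any folder.
  const named = entryNames.find(
    (p) => p.toLowerCase().split("/").pop() === "cover.jpg"
  );
  if (named !== undefined) return named;
  // Otherwise fall back to the first image entry, if there is one.
  const firstImage = entryNames.find(isImage);
  return firstImage === undefined ? null : firstImage;
}

console.log(
  pickCoverImage(["OEBPS/ch1.xhtml", "OEBPS/images/fig1.png", "OEBPS/images/cover.jpg"])
); // OEBPS/images/cover.jpg
console.log(pickCoverImage(["OEBPS/ch1.xhtml", "OEBPS/images/fig1.png"])); // OEBPS/images/fig1.png
```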

Quote:
Originally Posted by Toxaris View Post
Packing must be done according to certain rules, or the result will no longer be an ePUB file.
Care to develop?

There goes my hope for a "quick" solution!
Old 05-20-2013, 07:36 AM   #4
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,507
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by simobk View Post
Hi Toxaris and thanks for the answer.

First of all, I guess the fact that you're answering these questions means that there is no automated tool yet
I'm not quite certain what it is you want to do. If your current books are PDFs, converting them to ePubs is going to be error-prone. The metadata will be the least of your worries!

However, I believe that it might be possible to configure calibre to extract metadata from imported file names.

But if you want to create your own ePubs using a tool that you write, you're going to need to delve into the specifics of the ePub format, and the best place is the idpf web site, since they originated the specifications for ePub.
Old 05-20-2013, 09:00 AM   #5
Toxaris
You can always load your ePUB in Sigil. There is a simple metadata editor there and it is also possible to set the cover right. If no cover is specified, a lot of readers will take the first page as cover.

Pdurrant is right, there is no good tool to convert from PDF to ePUB, only a lot of mediocre ones. Depending on the PDF, expect a lot of clean-up work afterwards.
Old 05-20-2013, 09:07 AM   #6
simobk
I do understand the difference between the formats, so no, I am not converting PDFs to ePubs. I am re-downloading them little by little (most are books over 100 years old that are no longer under copyright).

Thank you for your answers... I played with some files, and I realize it is way too complicated for a script as not all the metadata can be in the filename. I guess I will end up doing it manually over a few weeks.
Tags
batch file rename, epub, filename, metadata

