MobileRead Forums - View Single Post

Matsendrasana · 11-20-2012, 03:51 AM

Quote:

Originally Posted by kovidgoyal

And you need to learn how to use tools to parse text output. Install cygwin and use grep, cut, and tr tools to get ids.

In fact, text parsing and processing is probably my favorite thing to do with a computer, and I have been using Cygwin since 1999.

This is why I found it hard to parse calibredb's output:

Code:

id title                                                     authors
3  Exchangeable image file format for digital still cameras: Unbekannt
   Exif Version 2.2
6  R4 Bedienungsanleitung                                    PDFCreator
7  R9 Bedienungsanleitung                                    Unbekannt
9  E-1 Bedienungsanleitung                                   asanuma
10 E-5 Bedienungsanleitung (DE)                              bartdr
15 Financial Applications Using Excel Add-in Development in  Steve Dalton
   C/C++
16 Auszüge aus den Bramahnas und Upanishaden                 Alfred Hillebrandt
17 Markandeya Purana                                         Jens Grünewald
22 The CRAY-2 Computer System, 1985                          Unbekannt
25 Devi-Mahatmya                                             Klaus Mailahn

Names are separated by blanks. It is hard to tell where the book title ends and the author begins, because the fields id, title and authors are not tab-separated. They are separated by blanks, just like the words inside the title and authors fields. Furthermore, some titles are wrapped onto a second line (ids 3 and 15). Also, title and authors do not seem to have fixed columns: in this output they currently start at the odd positions of column 3 and column 61. What happens if id > 99?
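For what it's worth, here is a rough sketch of how the listing above could still be parsed. It rests on an assumption I cannot confirm: that the words "title" and "authors" in the header line always start exactly where their columns start, even when the id column gets wider. The function name parse_calibredb_list and the sample rows are only for illustration. Wrapped lines (empty id field) are glued back onto the previous book.

```python
def parse_calibredb_list(text):
    """Parse a column-aligned id/title/authors listing into a list of dicts.

    Assumption (not confirmed by calibre's documentation): the header words
    'title' and 'authors' mark the first character of their columns.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    title_col = header.index("title")      # column offset taken from the header
    authors_col = header.index("authors")  # instead of a hard-coded 3 and 61

    books = []
    for line in lines[1:]:
        id_part = line[:title_col].strip()
        title_part = line[title_col:authors_col].strip()
        authors_part = line[authors_col:].strip()
        if id_part:
            # A non-empty id field starts a new record.
            books.append({"id": int(id_part),
                          "title": title_part,
                          "authors": authors_part})
        elif books:
            # An empty id field means a wrapped line: append to the last record.
            if title_part:
                books[-1]["title"] += " " + title_part
            if authors_part:
                books[-1]["authors"] += " " + authors_part
    return books


if __name__ == "__main__":
    # Sample rows rebuilt with the same layout as the listing above.
    sample = "\n".join([
        f"{'id':<3}{'title':<58}authors",
        f"{'3':<3}{'Exchangeable image file format for digital still cameras:':<58}Unbekannt",
        f"{'':<3}Exif Version 2.2",
        f"{'25':<3}{'Devi-Mahatmya':<58}Klaus Mailahn",
    ])
    for book in parse_calibredb_list(sample):
        print(book)
```

This only works as long as the header assumption holds. If a title is ever long enough to run into the authors column, the slicing would split it in the wrong place, so a tab-separated or otherwise machine-readable output would still be the more robust solution.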

Finally, how do I get the title of my own files? exiftool -title extracts the title as a one-liner for PDFs but not for EPUBs. ebook-meta --title can only set the title, not read it. Correct?
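As a possible workaround for EPUBs, a script could read the title directly from the file. This is only a sketch and assumes a standard EPUB layout: a ZIP archive whose META-INF/container.xml points to the OPF package file, which in turn holds a dc:title element. The function name epub_title is my own, and DRM-protected or malformed files are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and by Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def epub_title(path_or_file):
    """Return the first dc:title of an EPUB as a single line, or None."""
    with zipfile.ZipFile(path_or_file) as zf:
        # container.xml names the location of the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            return None
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
        title = opf.find(f".//{DC_TITLE}")
        if title is None or title.text is None:
            return None
        # Collapse line breaks and repeated blanks into a one-liner.
        return " ".join(title.text.split())
```

Calling epub_title("book.epub") (the file name is a placeholder) would give a one-line title that can then be compared against the titles already in the library.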

calibredb refuses to import an EPUB when its title is already in use. If that is the case, isn't it easier and less error-prone to leave the title comparison to calibredb? A script should only need to parse titles to decide whether the --duplicate option must be applied. Along with that, another option would be handy: one that replaces the existing book instead of creating a new instance. Both are legitimate use cases when working with the CLI tools (which are really great!).