Quote:
Originally Posted by kovidgoyal
And you need to learn how to use tools to parse text output. Install cygwin and use grep, cut, and tr tools to get ids.
In fact, text parsing and processing is probably my favorite thing to do with a computer, and I have been using Cygwin since 1999.
That experience is why I say calibredb's output is hard to parse:
Code:
id title authors
3 Exchangeable image file format for digital still cameras: Unbekannt
Exif Version 2.2
6 R4 Bedienungsanleitung PDFCreator
7 R9 Bedienungsanleitung Unbekannt
9 E-1 Bedienungsanleitung asanuma
10 E-5 Bedienungsanleitung (DE) bartdr
15 Financial Applications Using Excel Add-in Development in Steve Dalton
C/C++
16 Auszüge aus den Bramahnas und Upanishaden Alfred Hillebrandt
17 Markandeya Purana Jens Grünewald
22 The CRAY-2 Computer System, 1985 Unbekannt
25 Devi-Mahatmya Klaus Mailahn
The fields are separated by blanks. It is hard to tell where the book title ends and the author begins, because the fields id, title and author are not tab-separated; they are separated by blanks, just like the words inside the title and author fields. Furthermore, some titles are wrapped onto a second line (ids 3 and 15). The title and author fields also do not seem to have fixed columns: in this output they currently start at the odd positions of column 3 and column 61, respectively. What happens if id > 99?
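For what it's worth, here is a minimal Python sketch of how I would try to cope with this. It assumes that the real output is padded with blanks so that every field starts at the same offset as its name in the header line, and that a wrapped line leaves the id column empty. Both are assumptions drawn from the listing above, not documented behaviour, and the function name parse_calibredb_list and the shortened sample are made up for illustration.

```python
def parse_calibredb_list(text):
    """Parse blank-padded, column-aligned listing text into a list of dicts.

    Assumption: the first non-empty line is the header "id title authors",
    and each field starts at the offset of its name in that header.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    # Take the column offsets from the header instead of hard-coding 3 and 61.
    title_col = header.index("title")
    authors_col = header.index("authors")

    books = []
    for line in lines[1:]:
        id_part = line[:title_col].strip()
        title_part = line[title_col:authors_col].strip()
        authors_part = line[authors_col:].strip()
        if id_part.isdigit():
            # A new record starts whenever the id column holds a number.
            books.append({"id": int(id_part),
                          "title": title_part,
                          "authors": authors_part})
        elif books:
            # Empty id column: treat the line as a wrapped continuation.
            if title_part:
                books[-1]["title"] += " " + title_part
            if authors_part:
                books[-1]["authors"] = (books[-1]["authors"] + " " + authors_part).strip()
    return books


# Shortened, made-up sample in the same shape as the listing above.
sample = (
    "id title                authors\n"
    "3  Exchangeable image   Unbekannt\n"
    "   file format\n"
    "15 Financial Apps in    Steve Dalton\n"
    "   C/C++\n"
)

for book in parse_calibredb_list(sample):
    print(book["id"], "|", book["title"], "|", book["authors"])
```

Reading the offsets from the header at least avoids hard-coding columns 3 and 61, but it still breaks as soon as a title is long enough to run into the author column, which is exactly why a real field separator would be nicer.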
Finally, how do I get the title of my own files?
exiftool -title will extract the title as a one-liner for PDFs, but not for EPUBs.
ebook-meta --title can only set the title, but not read it. Correct?
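Failing a command-line tool, one fallback I can think of is to read the title straight out of the EPUB, since an EPUB is a zip archive whose META-INF/container.xml points to an OPF file containing a dc:title element. A rough Python sketch using only the standard library follows; the function name epub_title is made up, and it assumes an unencrypted EPUB and simply takes the first dc:title it finds.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the EPUB container file and by Dublin Core metadata.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def epub_title(path_or_file):
    """Return the first dc:title of an EPUB, or None if it has none."""
    with zipfile.ZipFile(path_or_file) as zf:
        # container.xml names the OPF package file inside the archive.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    title = opf.find(".//" + DC_TITLE)
    if title is None or not title.text:
        return None
    return title.text.strip()
```

Calling epub_title("book.epub") (a placeholder file name) would then give the title as a single string that a script can compare against the library.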
calibredb refuses to import an EPUB when its title is already in use. If that is the case, isn't it easier and less error-prone to leave the title comparison to calibredb? A script should only need to parse titles to decide whether the --duplicates option must be applied. Along with that, another option would be handy: one that replaces the existing book instead of creating a new instance. Both are legitimate use cases when working with the CLI tools (which are really great!).