Old 03-22-2011, 03:58 PM   #1
Lightsource
Junior Member
Posts: 8
Karma: 216
Join Date: Mar 2011
Location: Houston, TX - USA
Device: Kindle 3
ebook-convert issues

I'm getting thousands of lines like:
Property: Unknown Property name. [1411:5: panose-1]
Property: Unknown Property name. [1416:5: panose-1]
Property: Unknown Property name. [1421:5: panose-1]
...when converting from epub to mobi format.

I'm using a Powershell script to batch process files, using the command line tool ebook-convert.exe

Any chance of updating the conversion subroutine to understand the Panose character recognition standard?
Old 03-22-2011, 04:36 PM   #2
kovidgoyal
creator of calibre
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
panose character recognition is not valid CSS 2.1, which is what calibre is quite correctly telling you.

You almost certainly got the input documents from a Microsoft product, which produces this invalid junk. For example, in Word use "Save as Web Page, Filtered" to produce HTML without this junk.
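
For files that already contain the Word-generated declarations, one option is to strip them from the stylesheet before converting. What follows is only a minimal sketch in Python, assuming the offending lines are plain `panose-1: ...;` declarations like the ones in the warnings above. The name `strip_panose` is hypothetical and is not part of calibre.

```python
import re

# Matches one "panose-1: <values>;" declaration, as found in Word-generated
# @font-face blocks. Assumption: the values contain no ';' or '}' characters.
PANOSE_RE = re.compile(r"\s*panose-1\s*:[^;}]*;?", re.IGNORECASE)


def strip_panose(css: str) -> str:
    """Return the CSS text with every panose-1 declaration removed."""
    return PANOSE_RE.sub("", css)
```

Running this over each stylesheet (or `<style>` block) and then converting should leave the rest of the CSS untouched, since only the one property is matched.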
Old 03-22-2011, 05:52 PM   #3
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Moderator Notice
Moved to the appropriate subforum.
Old 03-23-2011, 12:34 AM   #4
Lightsource
Junior Member
Powershell example script (ebook batch conversion)

I originally found this forum when looking for a solution to batch processing files (converting) using Calibre's command line interface. It seems there are a number of folks having the issue, and a number have suggested running the batch file from Powershell - didn't work for me.

Neither did prefacing each invocation of ebook-convert.exe with any combination of start, wait, call or c/|!@#$%^&* variations. It will run once, then the command shell will simply not run a second instance of ebook-convert.exe. It doesn't start, it doesn't fail, there are no errors; it simply does nothing on any subsequent attempt to execute, unless you close the command prompt and open a new one.

I wanted to do this because I had about 960 epub books that my sister asked me to convert for her new Kindle (and since I didn't want to load all of her romance novels into my Calibre database, I didn't want to do this from the GUI).

I've seen a dozen posts around the internet grousing about this behavior, and a number of them just skulked off and said something like, well, I guess I'll just go do it in Powershell. This was totally unhelpful to me, as no one actually DID IT and shared.

So, I've done it and I'm sharing. Attached is a Powershell script that will run multiple instances of ebook-convert.exe sequentially. It will allow you to batch convert ebooks using Calibre's command line conversion utility.

This script is hard coded to run a conversion from epub to mobi; however, it is well commented, and even if you don't know how to code in Powershell, you should be able to read it and modify it to do any supported conversion.

The only caveat is, in order to run Powershell scripts, you have to install Powershell. It's free, and runs on just about any version of Windows.

Powershell is available from Microsoft here:
http://support.microsoft.com/kb/968929
(the link is at the bottom of the page - there's a separate installer for each platform)

You can also just google the following phrase and find it:
"download Powershell site:microsoft.com"

Since PS is such a powerful tool (you can access machines anywhere in your network and remote-execute scripts on them), it is disabled out of the box - once you install it, you have to enable script execution - there's a very clear and concise article about it here:
http://www.tech-recipes.com/rx/2513/...cript_support/

Once you have PS installed and enabled, you can just run this script - It will ask you for the directory of your ebooks (the ones you want to convert), then will grab all of the epub files and process them one by one. Since each conversion takes about a minute (on my system, anyway - dual proc 32-bit XP), if you have 60 files, it will take about an hour; 120 files would take roughly two hours, and so on.

Download the script, read the comments. If you want to convert say, LIT to EPUB or PDB to HTML, you can just modify the script - it's commented so that you should be able to figure out exactly what to change and where to do what you need, without being a scriptmaster.

I have a number of revisions in mind for this, but it's a good starting point, and it works as is - hope this helps those of us that are frustrated by the lack of a batch conversion tool.
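
For readers without Powershell, the same idea (find every epub in a folder and run ebook-convert once per file, waiting for each run to finish before starting the next) can be sketched in Python. This is not the attached script, only an illustration under stated assumptions: `ebook-convert` is on the PATH, and the names `build_command` and `convert_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, out_ext: str = ".mobi") -> list:
    """Build the argument list for one ebook-convert call.

    Assumption: ebook-convert is on PATH; it picks the output format
    from the extension of the second argument.
    """
    return ["ebook-convert", str(src), str(src.with_suffix(out_ext))]


def convert_all(folder: str, in_ext: str = ".epub", out_ext: str = ".mobi",
                runner=subprocess.run) -> list:
    """Convert every in_ext file in folder, one at a time.

    Returns the source files whose conversion exited with a non-zero code.
    """
    failed = []
    for src in sorted(Path(folder).glob("*" + in_ext)):
        # runner blocks until the converter exits, so runs never overlap.
        result = runner(build_command(src, out_ext))
        if result.returncode != 0:
            failed.append(src)
    return failed
```

Calling `convert_all(r"C:\books")` would walk the folder once and return the list of books that failed, which is handy for spotting corrupt input files. Changing `in_ext` and `out_ext` covers other format pairs.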
Attached Files
File Type: zip CalibreConvert_EpubToMobi.zip (1.7 KB, 323 views)

Last edited by Lightsource; 03-23-2011 at 12:14 PM. Reason: Updating Attachment
Old 03-23-2011, 01:03 AM   #5
kovidgoyal
creator of calibre
You know you could just have set up a new library in calibre, imported the books into it and converted in the GUI, but thanks for the script.
Old 03-23-2011, 01:06 AM   #6
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Seems like overkill. In WinXP, using a simple DOS cmd shell/Command Prompt, I can convert via many instances of ebook-convert by prefixing EACH line with "start /w " where it is invoked.

I've successfully used this within my GuteBook conversion program's rebuild DOS batch file, namely:
Code:
rem Convert .htm to Sony .epub
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.epub" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg" --chapter "//*[name()='h2']" --output-profile=sony

rem Convert .htm to Sony .lrf
start /w ebook-convert "28700-h\28700-h.htm" "Paul Creswick - Robin Hood.lrf" --title "Robin Hood" --authors "Paul Creswick" --publisher="Project Gutenberg"
Or when converting older PalmDoc .pdb to .epub, en masse, using batch files and "for ... do" calls :
Code:
start /w ebook-convert "Austen, Jane - Emma".pdb "Austen, Jane - Emma".epub --authors "Austen, Jane" --title "Emma" --no-default-epub-cover --chapter "//*[(name()='p' and re:test(., '^chapter |^book |^section |^part \S+', 'i')) or name()='h1' or name()='h2']" --chapter-mark=none --output-profile=sony 

start /w ebook-convert "Austen, Jane - Pride & Prejudice".pdb "Austen, Jane - Pride & Prejudice".epub --authors "Austen, Jane" --title "Pride & Prejudice" --no-default-epub-cover --chapter "//*[(name()='p' and re:test(., '^chapter |^book |^section |^part \S+', 'i')) or name()='h1' or name()='h2']" --chapter-mark=none --output-profile=sony
It's worked without issue ever since it became broken in calibre's Windows version. Obviously, your OS may be hampering this, but it DOES work when prefixed with "start /w "!!!
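
The per-book lines in the PalmDoc example above follow one pattern, so they can be generated from the file names instead of typed by hand. A minimal sketch in Python, assuming every file is named "Author - Title.ext" as in those examples; `command_for` is a hypothetical helper, and only the `--authors` and `--title` options already shown above are used.

```python
from pathlib import Path


def command_for(src: str, out_ext: str = ".epub") -> list:
    """Build an ebook-convert argument list from an 'Author - Title' file name."""
    path = Path(src)
    # Split on the first " - " only, so titles may contain further dashes.
    author, sep, title = path.stem.partition(" - ")
    if not sep:
        raise ValueError("expected 'Author - Title' in " + path.name)
    return ["ebook-convert", str(path), str(path.with_suffix(out_ext)),
            "--authors", author, "--title", title]
```

Because the result is an argument list rather than a single command string, names containing `&` or commas need no extra quoting when the list is handed to something like `subprocess.run`.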

Last edited by nrapallo; 03-23-2011 at 01:08 AM.
Old 03-23-2011, 02:01 AM   #7
Lightsource
Junior Member
Nick:

While I did try that (and all the other "preface your lines with this" suggestions that I found), it is quite possible that I did it in a command shell that had already been killed by the process. In other words, what I did was run my original batch, note the behavior, research on the web, find the suggested fixes, modify the batch file, then attempt to re-run it in the same command prompt shell. (What was I thinking?)

I probably never killed the shell and restarted between each attempt to run the batch file. However, even though it _is_ admittedly overkill, the Powershell script works reliably and the only issues that I have at this point are those caused by corrupt input books - as Kovid pointed out above, garbage in, garbage out.

-light
Old 03-23-2011, 04:42 AM   #8
Lightsource
Junior Member
Quote:
Originally Posted by kovidgoyal View Post
You know you could just have set up a new library in calibre, imported the books into it and converted in the GUI, but thanks for the script.
Hmmm... justification for reading the manual - I didn't know that I could maintain multiple libraries. I'll have to check and see if I can set the data storage to two separate locations as well. I'm afraid that Jim Butcher and Stephen R. Green would gang up on Jim Patterson and thrash him soundly if they shared the same directory tree.

Next week, I'm starting on re-inventing the wheel, 'cause I have got that kind of time

I've had Powershell installed and quietly waiting for several months now - I just needed the motivation of something useful to DO with it. This fit the bill; I'll update the script as time permits.

-light
Old 03-23-2011, 06:07 AM   #9
Manichean
Wizard
Quote:
Originally Posted by Lightsource View Post
I'll have to check and see if I can set the data storage to two separate locations as well.
You actually have to, otherwise, you'll overwrite your database. If you create a new library, you should always use a completely empty folder for it.
Old 03-23-2011, 12:01 PM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Lightsource View Post
I didn't know that I could maintain multiple libraries. I'll have to check and see if I can set the data storage to two separate locations as well.
Two libraries, by definition, means two different data storage locations. What is shared is the configuration directory.
Old 03-27-2011, 12:31 PM   #11
Lightsource
Junior Member
Question:
based on the man page here:
http://calibre-ebook.com/user_manual...t-8.html#id150
(the section is titled "PDF Input to HTML Output"), I'm trying to clean up some files using ebook-convert.exe.

Here's the command line:
ebook-convert.exe "MyFile.pdf" "MyFile.html" --new-pdf-engine --mobi-ignore-margins --no-inline-toc --smarten-punctuation --line-height=14 --base-font-size=12 --margin-top=5 --margin-left=5 --margin-right=5 --margin-bottom=5

What I get is:
ValueError: No plugin to handle output format: html

Is there something else that I can download to make this work?

Thanks,
-light

ps - this is low priority - I do know that I can batch convert in Acrobat Pro - just trying for a simple scripted solution

Last edited by Lightsource; 03-27-2011 at 01:21 PM. Reason: adding additional information
Old 03-27-2011, 12:44 PM   #12
kovidgoyal
creator of calibre
MyFile.zip not MyFile.html
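
With the output named MyFile.zip, the HTML arrives inside a zip archive, so a scripted workflow needs one more step to get at it. A small sketch using Python's standard `zipfile` module, assuming the conversion has already written the archive; `unpack_html` is a hypothetical name, and because the layout inside the archive is not described in this thread, the code simply searches for HTML files after extracting.

```python
import zipfile
from pathlib import Path


def unpack_html(archive: str, dest: str) -> list:
    """Extract archive into dest and return the HTML files found there."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Assumption: the layout inside the archive is unknown, so search recursively.
    return sorted(p for p in Path(dest).rglob("*")
                  if p.suffix.lower() in (".htm", ".html"))
```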
Old 04-10-2011, 03:26 PM   #13
Lightsource
Junior Member
I also have a series of files in txt format (Windows 1252: Western European) that lose all the quotation marks, colons, ticks, em dashes, etc. (they become squares) on conversion to mobi. I've tried changing the encoding to a variety of different things, to no avail. Is this a job for "recipes" and some regex scripts? I'm using

--mobi-ignore-margins
--no-inline-toc
--smarten-punctuation
--output-profile=kindle

I also see options like --asciiize and the regex search and replace options - does anyone have suggestions about how to go about troubleshooting issues like this? What I mean is, when the conversion is done and it's a mobi, I can't "see" what the actual character is (it's a square). When I'm viewing it in a text editor before conversion, it's ok to me (whether it's a quote or an emdash, whatever) because I am using a smart editor (editpad, mostly) that displays the character.

I'm going to take a shot in a few with converting them all to html and doing some f/r, then converting to mobi - I'll update with the outcome.
Old 04-10-2011, 04:32 PM   #14
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
This definitely sounds like a character-encoding type problem, so it probably means that you have not hit on the correct one. You could try running the conversion in debug mode, and look at the HTML file prior to its conversion to MOBI. That might give a clue?
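
One way to "see" what the squares really are is to decode the text file with the encoding you believe it uses and list every non-ASCII character by its Unicode name. The following is only a sketch, assuming the files really are Windows-1252 (`cp1252` in Python); the function names are hypothetical.

```python
import unicodedata
from collections import Counter


def non_ascii_report(raw: bytes, encoding: str = "cp1252") -> dict:
    """Decode raw bytes and count each non-ASCII character by Unicode name."""
    text = raw.decode(encoding)
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return {unicodedata.name(ch, "U+%04X" % ord(ch)): n for ch, n in counts.items()}


def to_utf8(raw: bytes, encoding: str = "cp1252") -> bytes:
    """Re-encode the text as UTF-8 so the encoding is unambiguous."""
    return raw.decode(encoding).encode("utf-8")
```

If the report shows names such as LEFT DOUBLE QUOTATION MARK and EM DASH, the guess about the encoding is probably right, and writing out the `to_utf8` result gives a file whose encoding is no longer in doubt. A `UnicodeDecodeError` instead suggests the file is not Windows-1252 after all.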
Old 04-10-2011, 04:45 PM   #15
nrapallo
GuteBook/Mobi2IMP Creator
Quote:
Originally Posted by Lightsource View Post
When I'm viewing it in a text editor before conversion, it's ok to me (whether it's a quote or an emdash, whatever) because I am using a smart editor (editpad, mostly) that displays the character.
I see the same thing dealing with older DOS text files. My "smart text editor" (TextPad) has a "dumb down" text mode which it calls "Convert to DOS". It usually helps with quotes and hyphens, but may not do so well with UTF-8 "characters". In that case try the "conversion to .html" method you outlined below.

Quote:
I'm going to take a shot in a few with converting them all to html and doing some f/r, then converting to mobi - I'll update with the outcome.
When the text is converted to html (I've used the freeware utility called text2html), I would then also pass it through Tidy to change the literal characters to HTML codes. Those you could then convert using regexes to their equivalent more popular character encodings, if you like.
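
The "literal characters to HTML codes" step can also be approximated without Tidy. This sketch swaps in Python's built-in `xmlcharrefreplace` error handler, which produces numeric character references (not named entities, and not necessarily the same output Tidy would give); `to_char_refs` is a hypothetical name.

```python
def to_char_refs(html: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # 'xmlcharrefreplace' turns e.g. U+2014 into "&#8212;" and leaves ASCII alone.
    return html.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The result is pure ASCII, so it survives whatever encoding guess the converter makes, and the references are easy targets for a later find/replace pass.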