Old 01-03-2012, 06:08 AM   #1
adrian_loetscher
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2011
Device: Archos 70B
Continue conversion with debug stage?

Hi. Is it possible to enter the conversion process at a specific debug stage (for example, the input stage)?

I would like to do this because I need more than three search & replace criteria to extract the whole table of contents from the PDF.

I thought I would use the following steps:
  1. Convert the PDF file to EPUB using only the DEBUG_PIPELINE option.
  2. Go to the input directory in the debug folder and manually edit the generated HTML file, using sed for example.
  3. Continue the conversion with the files in the input directory, without running the input plugin again, entering the conversion process at the pre-processing stage.

But I can't find an option for step 3. Perhaps it isn't possible? My reasons for wanting to enter at the input stage are the following:
  • I would need more than three replace criteria to convert the PDF completely without manual editing.
  • I could do some fine-tuning on the converted e-book in Sigil, but creating a table of contents is difficult in Sigil and would be simpler with more than three search & replace criteria in Calibre.
  • I could create a zip file of the input directory and add it as a format in the "Edit Metadata" dialog in Calibre. But then I would have to do the pre-processing stage manually, for example to format the paragraphs, because otherwise every line of the original PDF becomes its own paragraph after converting the zip to EPUB. Perhaps I am doing something wrong, because the calibre documentation suggests this approach:
    Quote:
    If you want to edit the input document a little before having calibre convert it, the best thing to do is edit the files in the input sub-directory, then zip it up, and use the zip file as the input format for subsequent conversions. To do this use the Edit meta information dialog to add the zip file as a format for the book and then, in the top left corner of the conversion dialog, select ZIP as the input format.
  • Or I could edit the HTML file in the parsed directory, which is already pre-processed, then zip that directory and import it into Calibre to convert from zip to EPUB. But pre-processing discards some information that would be useful for manipulating the HTML file with sed.
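The zip step from the quoted documentation can be scripted. The following is only a minimal sketch in Python, not something calibre provides: the function name `zip_input_dir` and the paths are hypothetical, and the zip file is assumed to be written outside the directory being zipped.

```python
import zipfile
from pathlib import Path


def zip_input_dir(input_dir: str, zip_path: str) -> list[str]:
    """Zip every file under input_dir, storing paths relative to input_dir.

    zip_path must lie outside input_dir, otherwise the archive would try to
    include itself. Returns the archive member names in sorted order.
    """
    root = Path(input_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Use forward slashes so the archive looks the same on Windows.
                arcname = path.relative_to(root).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

For example, `zip_input_dir("demo-debug/input", "demo-input.zip")` would produce a zip that can then be added as a format in the "Edit Metadata" dialog, as the documentation describes.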

I hope I was able to explain my problem. Thank you for any response.

Adrian.

Last edited by adrian_loetscher; 01-03-2012 at 06:38 AM. Reason: Add an information from the calibre-documentation
Old 01-03-2012, 06:58 AM   #2
kovidgoyal
creator of calibre
Posts: 26,433
Karma: 5383257
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Simply convert the opf file in the input subdirectory.

ebook-convert whatever.opf output.epub
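If the command is to be run from a script, a minimal Python sketch could look like this. It assumes `ebook-convert` is on the PATH; the helper names and the example paths are hypothetical. As noted later in the thread, this restarts the conversion from the beginning using the edited files as input.

```python
import subprocess


def build_convert_command(opf_path: str, output_path: str) -> list[str]:
    """Build the argument list for: ebook-convert <opf> <output>."""
    return ["ebook-convert", opf_path, output_path]


def convert(opf_path: str, output_path: str) -> None:
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(opf_path, output_path), check=True)
```

Calling `convert("demo-debug/input/metadata.opf", "output.epub")` would then run the same command as above. Passing an absolute output path avoids depending on the current working directory.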
Old 01-03-2012, 09:00 AM   #3
adrian_loetscher
Thank you, just one more question

Thank you very much for the response and for your great software! It's really great to be able to convert the .opf file directly.

I still have the problem with the paragraphs, though: it seems that the conversion restarts from the beginning, using the HTML input plugin. The following code:

Quote:
<hr>
<A name=6></a><b>Introduction</b><br>
Some_text_on_the_first_line_of_pdf<br>
Some_text_on_the_second_line_of_pdf<br>
is converted as follows (in the parsed stage):

Quote:
<p><b>Introduction</b></p>
<p>Some_text_on_the_first_line_of_pdf</p>
<p>Some_text_on_the_second_line_of_pdf</p>
The output of ebook-convert is the following:
Quote:
1% Converting input to HTML...
InputFormatPlugin: HTML Input running
on D:\Temp\html-demo\demo-debug\input\metadata.opf
Parsing all content...
Generating default TOC from spine...
34% Running transforms on e-book...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
67% Creating EPUB Output
Looking for large trees in index.html...
No large trees found
Split into 21 parts
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to C:\Program Files (x86)\Calibre2\metadata.epub
Output saved to C:\Program Files (x86)\Calibre2\metadata.epub
Each line gets its own paragraph, so I would have to redo your great pre-processing manually. Do you suggest using the parsed directory instead of the input directory, and introducing some "marker" during the first debug conversion to detect the beginning of a new page, since this information is lost after pre-processing?
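To illustrate what doing that step by hand would involve, here is a deliberately naive Python sketch that joins consecutive plain-text lines ending in `<br>` into one paragraph. It is not calibre's pre-processing algorithm; the function name `unwrap_br_lines` and the rule "any line containing other tags ends the paragraph" are assumptions made for this example only.

```python
import re

# A plain-text line (no tags) terminated by <br> or <br />.
BR_LINE = re.compile(r"^(?P<text>[^<>]+)<br\s*/?>\s*$", re.IGNORECASE)


def unwrap_br_lines(html: str) -> str:
    """Merge runs of '<text><br>' lines into single <p> elements.

    Lines that do not match (for example <hr> or an anchor line) are passed
    through unchanged and end the current paragraph.
    """
    out, buffer = [], []

    def flush():
        if buffer:
            out.append("<p>" + " ".join(buffer) + "</p>")
            buffer.clear()

    for line in html.splitlines():
        match = BR_LINE.match(line)
        if match:
            buffer.append(match.group("text").strip())
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)
```

Applied to the snippet quoted above, the `<hr>` and anchor lines stay as they are and the two text lines end up in a single `<p>`. A real document would need extra rules, for instance to decide where one paragraph ends and the next begins.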

Sorry for my bad English. I hope you can understand it. Thank you very much for an answer.

Adrian.
Old 01-03-2012, 09:05 AM   #4
kovidgoyal
When you convert the opf, the conversion runs from the beginning; there is no way to resume a conversion from after the input stage.
Old 01-03-2012, 09:28 AM   #5
adrian_loetscher
Thank you for your response. Then I will try to manipulate the parsed subdirectory, as the pre-processing of the PDF file gives a good result and recognizes the paragraphs correctly.

I thought about the following approach:
  1. First conversion (pdf->epub): with the option -d DEBUG_PIPELINE, substituting each page break with the marker "*****"
  2. Manual processing: manipulate the generated HTML file in the parsed subdirectory with sed, using the introduced marker to still detect the page breaks, which is important in my case for detecting headings
  3. Second conversion (opf->epub): convert the opf file in the parsed subdirectory to EPUB
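Step 2 could also be done in Python instead of sed. The sketch below rests on an assumption about how the marker ends up in the parsed HTML, namely as its own paragraph `<p>*****</p>` directly followed by a bold-only paragraph holding the heading text; the real output may differ. The function names are hypothetical.

```python
import re

MARKER = "*****"

# A marker paragraph immediately followed by a paragraph that is only bold text.
HEADING_AFTER_MARKER = re.compile(
    r"<p>\s*" + re.escape(MARKER) + r"\s*</p>\s*"
    r"<p>\s*<b>(?P<title>[^<]+)</b>\s*</p>"
)

# Any marker paragraph left over after headings have been promoted.
LEFTOVER_MARKER = re.compile(r"<p>\s*" + re.escape(MARKER) + r"\s*</p>\n?")


def promote_headings(html: str) -> str:
    """Turn 'marker + bold paragraph' into an <h2> heading."""
    return HEADING_AFTER_MARKER.sub(
        lambda m: "<h2>" + m.group("title").strip() + "</h2>", html
    )


def strip_markers(html: str) -> str:
    """Remove marker paragraphs that did not precede a heading."""
    return LEFTOVER_MARKER.sub("", html)
```

Running `strip_markers(promote_headings(html))` on the HTML file in the parsed subdirectory and saving the result would complete step 2, after which the opf file can be converted as in step 3.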
Old 01-03-2012, 12:07 PM   #6
theducks
Grand Sorcerer
Posts: 15,249
Karma: 6020307
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by adrian_loetscher View Post
Code:
<hr>
<A name=6></a><b>Introduction</b><br>
Some_text_on_the_first_line_of_pdf<br>
Some_text_on_the_second_line_of_pdf<br>
Adrian.
Those code examples are horrible.
Tag names should be kept in a consistent case (mixed case may be allowed in some markup, but it is best avoided).
Breaks should self-close: <br />

Line 1 is a naked anchor, which is not the same as a heading tag (commonly used to identify chapter starts).


I still catch myself coding keywords in MixedCase (the language I was using permitted it, and it made reading non-color-styled code easy).