Old 12-13-2015, 10:51 AM   #1
chikamichi
Member
Posts: 17
Karma: 10
Join Date: Dec 2015
Device: none
A description of a PDF's structure

Hi!

I'd like to know about a generated PDF's HTML structure. By wild guessing I found that the inner content of a page is wrapped in a tag with class .page, but that's pretty much it.

Is there a resource showing the big picture of a generated PDF's structure?

As a matter of fact, what I'm trying to do is, through CSS, assign a background to the full extent of my PDF's pages (much like what happens for a cover page). I tried to assign a background-color to body, but it doesn't work. I was thinking maybe Calibre wraps the entire page inside a tag with a specific classname I could leverage.

Thank you!
Old 12-13-2015, 10:53 AM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
calibre uses pdftohtml from poppler to generate HTML from PDF files. As far as I know, the only structure present in such files is an empty anchor with the page number at the start of every page.
Old 12-13-2015, 11:09 AM   #3
chikamichi
Thank you, that's a good lead, but my process actually goes the other way around: I have an HTML input, and that HTML is converted into a PDF by ebook-convert. I can't share the HTML for now, because it's generated on the fly as a temporary file right before being passed down to ebook-convert; I need to find a way to snapshot it before it gets removed from the hard drive.

I have access to the raw HTML templates eventually combined into one fully-fledged document fed to ebook-convert, though, so that might help. Within those templates, I have no ".page" tag, yet that CSS selector is being recognized by the convert tool eventually, for I can apply styles to that classname and they get properly rendered in the generated PDF document. I can't get styles assigned to the CSS selector "body" to be rendered, though. That's why I'm a bit confused about the final structure used by ebook-convert: it doesn't 100% match the HTML templates. Any insights?

Last edited by chikamichi; 12-13-2015 at 11:15 AM.
Old 12-13-2015, 11:16 AM   #4
kovidgoyal
http://manual.calibre-ebook.com/conv...l#introduction
Old 12-13-2015, 11:46 AM   #5
chikamichi
Thank you. Actually, it's because I read on the very page you linked to that "It is important to remember that all the transforms act on the XHTML output by the Input Plugin, not on the input file itself" that I decided to ask my question here.

I would like to know what this XHTML (the one the Input Plugin generates) looks like. Just the big picture, really: I guess there is an overall, default structure that is used as the layout for a transform. I have not been able to find that piece of information so far. With the process I'm currently bound to, ebook-convert is used internally, so I am unable to use Calibre's debug mode to inspect the output from the Input Plugin itself while the transform is happening.

Last edited by chikamichi; 12-13-2015 at 12:01 PM.
Old 12-13-2015, 12:03 PM   #6
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Did you try using the debug option to inspect the state of the XHTML at each stage of the conversion?
Old 12-13-2015, 12:08 PM   #7
chikamichi
Quote:
Originally Posted by eschwartz
Did you try using the debug option to inspect the state of the XHTML at each stage of the conversion?
Hi. As I said, "With the process I'm currently bound to, ebook-convert is used internally, so I am unable to use Calibre's debug mode to inspect the output from the Input Plugin itself while the transform is happening." What I meant by "internally" is that I have no control whatsoever over ebook-convert: I can't tweak the options to activate a debug/verbose mode, for instance.

I was hoping there would be some kind of "default template" from a "default recipe" used by the Input Plugin to do the conversion, which I could peek at to get a sense of what the HTML is likely to look like at this stage of the process.

Last edited by chikamichi; 12-13-2015 at 12:12 PM.
Old 12-13-2015, 12:15 PM   #8
eschwartz
Huh.

Why can't you control the options to ebook-convert? Are you using some sort of precompiled binary with an arbitrarily-canned cmdline which you don't have the source to?

Well, go ahead and write a batch/bash script that comes earlier in the PATH than ebook-convert, and your current process will find it instead. Make it add the debug flag to the cmdline and pass it on to the real ebook-convert.
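
A wrapper along those lines could look roughly like the sketch below, written in Python rather than batch/bash. It assumes the calling tool resolves ebook-convert through PATH. The names find_real_binary, add_debug_flag and DEBUG_DIR are illustrative only and are not part of calibre; the --debug-pipeline flag is the one named later in this thread.

```python
import os
import sys

# Assumed location for the debug dump; any writable directory would do.
DEBUG_DIR = "/tmp/calibre-debug"


def find_real_binary(name, shim_dir, search_path):
    """Return the first executable called `name` on search_path that is
    not located in shim_dir (the directory holding this wrapper)."""
    shim_dir = os.path.realpath(shim_dir)
    for entry in search_path.split(os.pathsep):
        if not entry or os.path.realpath(entry) == shim_dir:
            continue
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def add_debug_flag(args, debug_dir):
    """Append --debug-pipeline unless the caller already passed it."""
    for arg in args:
        if arg == "--debug-pipeline" or arg.startswith("--debug-pipeline="):
            return list(args)
    return list(args) + ["--debug-pipeline", debug_dir]


def main(argv):
    """Locate the real ebook-convert and replace this process with it."""
    shim_dir = os.path.dirname(os.path.abspath(argv[0]))
    real = find_real_binary("ebook-convert", shim_dir, os.environ.get("PATH", ""))
    if real is None:
        sys.exit("real ebook-convert not found on PATH")
    os.execv(real, [real] + add_debug_flag(argv[1:], DEBUG_DIR))
```

To act as the wrapper, the file would be saved as an executable named ebook-convert in a directory listed before calibre's in PATH, with a final line calling main(sys.argv). This only helps if the calling tool does not hardcode the binary location.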

Last edited by eschwartz; 12-13-2015 at 12:18 PM.
Old 12-13-2015, 12:18 PM   #9
chikamichi
Quote:
Why can't you control the options to ebook-convert?
I have no ownership of the tool leveraging ebook-convert; I simply know it uses it to transform an HTML document into a PDF one.

I like your idea about trumping ebook-convert, thx.
Old 12-13-2015, 12:23 PM   #10
eschwartz
Ah, so precompiled binary with an arbitrarily canned cmdline then. You're welcome.

Let's just hope they don't hardcode the binary location!
Old 12-13-2015, 12:38 PM   #11
chikamichi
Alright, after further inspection, I'm pretty sure the Input Plugin being leveraged is https://github.com/kovidgoyal/calibr.../html_input.py

I also know the HTML document fed to that plugin is valid.

And I was also able to activate some debugging from ebook-convert!

Quote:
debug: InputFormatPlugin: HTML Input running
on /tmp/tmp-82696wkvncw/SUMMARY.html
debug: Creator not specified
debug: Building file list...
debug: Normalizing filename cases
debug: Rewriting HTML links
debug: 34% Running transforms on e-book…
debug: Merging user specified metadata...
debug: Detecting structure...
debug: Detected chapter: Introduction
debug: Detected chapter: Quick-start
debug: Auto generated TOC with 2 entries.
debug: Flattening CSS and remapping font sizes...
debug: Source base font size is 25.92000pt
debug: Removing fake margins...
debug: Cleaning up manifest...
Trimming unused files from manifest...
debug: Creating PDF Output...
debug: 67% Running PDF Output plugin
debug: libpng warning: iCCP: Not recognizing known sRGB profile that has been edited
debug: Bottom margin is too small for footer, increasing it to 18.0pts
debug: 78% Rendered SUMMARY.html
debug: 89% Rendered index.html
debug: 100% Rendered quick-start.html
debug: Rendered PDF in 0.571188 seconds:
debug: PDF output written to /tmp/tmp-82696wkvncw/index.pdf
debug: Output saved to /tmp/tmp-82696wkvncw/index.pdf
Too bad I never get a chance to look at /tmp/tmp-82696wkvncw/SUMMARY.html before the conversion process ends, because it gets destroyed; otherwise I would be able to see what is actually being fed to the Input Plugin.
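
One way to keep a copy of that temporary file, sketched in Python under the assumption that some script gets to see ebook-convert's arguments before the conversion runs: copy the first existing file named on the command line to a directory that will not be cleaned up. The function name snapshot_input and the destination directory are hypothetical, and files linked from the HTML would need copying separately.

```python
import os
import shutil


def snapshot_input(args, dest_dir):
    """Copy the first existing file named in args into dest_dir.

    Assumes the input document is the first non-option argument that
    exists on disk. Returns the path of the copy, or None if no such
    file was found.
    """
    for arg in args:
        if not arg.startswith("-") and os.path.isfile(arg):
            os.makedirs(dest_dir, exist_ok=True)
            return shutil.copy2(arg, dest_dir)
    return None
```

Called as snapshot_input(sys.argv[1:], "/tmp/calibre-snapshots") before handing over to the real converter, this would leave a copy of SUMMARY.html behind after the temporary directory is removed.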

What I was also unable to discover, from inspecting Calibre's codebase, is how this HTML document gets transformed into Calibre's internal XHTML for further processing/conversion.
Old 12-13-2015, 12:41 PM   #12
eschwartz
Looking at the actual code which does the work is indeed another way to figure out what it is doing.

http://manual.calibre-ebook.com/deve...ml#code-layout

And take a look at:
src/calibre/ebooks/conversion/plumber.py
src/calibre/ebooks/oeb/transforms/*

Last edited by eschwartz; 12-13-2015 at 12:45 PM.
Old 12-13-2015, 12:44 PM   #13
chikamichi
Thank you, that is what I'm already doing right now. I might indeed find a hint there; it's worth trying.
Old 12-13-2015, 12:48 PM   #14
chikamichi
By the way, I am now using the "--log=debug --debug" flags with ebook-convert. I thought the "--debug" flag would be responsible for enabling the debug output, creating input/, parsed/, structure/ and processed/ directories, but I have not been able to find them. I may simply not be looking at the correct location. Is there a way to enforce the output location from the command line as well?
Old 12-13-2015, 12:58 PM   #15
eschwartz
--debug-pipeline /path/to/output-directory/

I assume --log would save the log produced by the status messages during the conversion.
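
For anyone driving the conversion from a script, the flag can be passed like any other option. The Python sketch below builds such a command line and then reports which of the stage directories mentioned earlier in the thread (input/, parsed/, structure/, processed/) actually exist afterwards. The function names and the file paths in the usage note are placeholders, and whether every stage directory gets written may depend on the calibre version.

```python
import os
import subprocess

# Stage directory names as quoted earlier in the thread.
STAGES = ("input", "parsed", "structure", "processed")


def build_command(src, dst, debug_dir):
    """Return the ebook-convert argument list with the debug dump enabled."""
    return ["ebook-convert", src, dst, "--debug-pipeline", debug_dir]


def present_stages(debug_dir):
    """List the stage directories that exist under debug_dir."""
    return [s for s in STAGES if os.path.isdir(os.path.join(debug_dir, s))]


def convert_with_debug(src, dst, debug_dir):
    """Run the conversion, then report which stage dumps were written."""
    subprocess.run(build_command(src, dst, debug_dir), check=True)
    return present_stages(debug_dir)
```

For example, convert_with_debug("SUMMARY.html", "index.pdf", "/tmp/calibre-debug") would run the conversion and return the names of the stage directories found under /tmp/calibre-debug.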
Tags
css, html, pdf, structure


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.