#1
Junior Member
Posts: 4
Karma: 2452
Join Date: Oct 2010
Device: nook
PDF to EPUB conversion grief: keeping indentation
Converting PDF to EPUB for my Nook with Calibre is easy, except that I can't find a way to keep the proper spacing (indentation) when the PDF contains code examples, such as Python, that require correct block indentation.
How can I convert to EPUB so that the output file displays the Python indentation properly? Instead of the resulting output:
Code:
```python
def _find_note(self, note_id):
'''Locate the note with the given id.'''
for note in self.notes:
if str(note.id) == str(note_id):
return note
return None
```
I want:
Code:
```python
def _find_note(self, note_id):
    '''Locate the note with the given id.'''
    for note in self.notes:
        if str(note.id) == str(note_id):
            return note
    return None
```
I also tried this CSS:
Code:
```css
p { white-space: pre; }
```
Is there anything else I can do, or is this the final state of affairs in the conversion world?
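Why the flattened output is unusable rather than just ugly: Python uses indentation to delimit blocks, so the unindented text does not even compile. A minimal check using the snippet from this post (the helper name `is_valid_python` is hypothetical):

```python
# The flattened text from the conversion vs. the properly indented original.
FLATTENED = """\
def _find_note(self, note_id):
'''Locate the note with the given id.'''
for note in self.notes:
if str(note.id) == str(note_id):
return note
return None
"""

INDENTED = """\
def _find_note(self, note_id):
    '''Locate the note with the given id.'''
    for note in self.notes:
        if str(note.id) == str(note_id):
            return note
    return None
"""


def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles, False on a syntax/indentation error."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(is_valid_python(FLATTENED))  # False: body of the def is not indented
print(is_valid_python(INDENTED))   # True
```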
#2
creator of calibre
Posts: 45,223
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, the whitespace is already clobbered at the PDF input stage itself.
#3
Linux User
Posts: 2,282
Karma: 6123806
Join Date: Sep 2010
Location: Heidelberg, Germany
Device: none
In general, a PDF carries very little knowledge about a document's structure and formatting; instead it is a set of drawing instructions such as "draw a line from point A to point B" or "place the letter X at size Y at coordinates Z". So you're lucky to get even simple things such as paragraphs, chapters, or headings out of a PDF file. The indentation is obvious to you as a human, but that information is not readily available in the file, because a PDF is closer to an image of the page layout than to a description of the formatting rules that produced it.
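To make the "place letter X at coordinates Z" point concrete, below is a hand-written, uncompressed content stream. It is hypothetical: real PDFs usually compress their streams and may position text with other operators such as `Tm` or `TJ`. Each code line is placed with `Td`, which moves the text position relative to the start of the previous line, and painted with `Tj`. The only trace of indentation is the x offset, which this sketch recovers by summing the moves (`absolute_positions` is an illustrative helper, not a library function):

```python
import re

# Hypothetical PDF content stream for four lines of code. Nothing here says
# "indent": each line is just moved by (tx, ty) and then painted.
CONTENT_STREAM = """
BT
/F1 10 Tf
72 700 Td (for note in notes:) Tj
24 -12 Td (if note.ok:) Tj
24 -12 Td (return note) Tj
-48 -12 Td (return None) Tj
ET
"""

# Matches "tx ty Td (string) Tj"; does not handle escaped parentheses in strings.
TD_TJ = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s*\(([^)]*)\)\s*Tj")


def absolute_positions(stream: str) -> list[tuple[float, float, str]]:
    """Accumulate the relative Td offsets into absolute (x, y, text) triples."""
    x = y = 0.0
    placed = []
    for tx, ty, text in TD_TJ.findall(stream):
        x += float(tx)
        y += float(ty)
        placed.append((x, y, text))
    return placed


for x, y, text in absolute_positions(CONTENT_STREAM):
    print(f"x={x:5.1f} y={y:5.1f} {text}")
```

The x values come out as 72, 96, 120 and 72 points: a human reads that as nesting, but the file only records positions.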
It's not impossible to convert it, but you'd have to write a custom script to do it. The script would have to be smart enough to recognize the Python snippets and deduce the indentation from where the text is positioned on the page. That is one of the two jobs OCR has to do: the first is recognizing the characters, which you can skip with most (but not all) PDFs; the second is recognizing the layout. If there aren't too many snippets in the book, it would probably be faster to reindent them manually.
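The "deduce the indentation from the positions" step can be sketched in a few lines, assuming you have already isolated a code snippet and know each line's left x-coordinate. A PDF library such as pdfminer.six can report a text line's left edge (`x0`), but the coordinates below are made up for illustration and `reindent` is a hypothetical helper. It ranks the distinct left edges from left to right and turns each rank into one four-space level:

```python
def reindent(lines: list[tuple[float, str]], indent: str = "    ",
             tolerance: float = 2.0) -> str:
    """Rebuild indentation from each line's left x-coordinate.

    `lines` holds (x0, text) pairs in reading order. Distinct x0 values
    closer than `tolerance` points are merged; the remaining values are
    ranked left to right and each rank becomes one indentation level.
    """
    levels: list[float] = []  # one representative x0 per level, ascending
    for x0 in sorted(x for x, _ in lines):
        if not levels or x0 - levels[-1] > tolerance:
            levels.append(x0)

    out = []
    for x0, text in lines:
        # Pick the level whose representative x0 is closest to this line.
        level = min(range(len(levels)), key=lambda i: abs(levels[i] - x0))
        out.append(indent * level + text.strip())
    return "\n".join(out)


# Hypothetical extracted lines: (left edge in points, text).
extracted = [
    (72.0, "def _find_note(self, note_id):"),
    (96.0, "'''Locate the note with the given id.'''"),
    (96.0, "for note in self.notes:"),
    (120.0, "if str(note.id) == str(note_id):"),
    (144.0, "return note"),
    (96.2, "return None"),
]
print(reindent(extracted))
```

Ranking the left edges, rather than dividing by a fixed character width, means the font size need not be known; the trade-off is that wrapped continuation lines or hanging indents are read as extra levels, so the result still needs a human check.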
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Doc to Epub convertion problems | johnbajer | Calibre | 5 | 06-04-2010 05:30 PM |
| Cover pictures after convertion from ePub to Mobi | paulpeer | Calibre | 8 | 03-23-2010 09:23 AM |
| Best PDF Convertion Tool | Nathan Campos | Workshop | 5 | 12-27-2009 10:47 AM |
| Epub and negative indentation | Nate the great | ePub | 6 | 04-27-2009 11:48 AM |
| PDF conversion & indentation | Shiren | Calibre | 5 | 12-11-2008 02:09 PM |