02-27-2009, 01:20 PM   #1
tlc
Zealot
Posts: 140
Karma: 50288
Join Date: Feb 2009
Device: KK 3G, iPad
PDF->MOBI questions

I've used Calibre to convert a bunch of PDFs and I'm seeing some issues. (I know, PDF is a bad source format for this, but that's what I have.) I've found answers for the first couple of questions, but I'll ask anyway in case there are newer, better options. I'm on OS X, if it matters.

1) I have a PDF source with headers and footers I want to remove.
Should I find and install pdftohtml and edit the HTML myself?
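
To make the "edit it myself" idea concrete, here is a rough sketch of the cleanup I have in mind, working on plain text rather than HTML. It assumes the text is already split into pages (each page a list of lines) and that headers and footers are the first and last non-blank line of a page. The function names and the 0.6 cutoff are my own placeholders, not anything from Calibre or pdftohtml.

```python
import re
from collections import Counter


def _key(line):
    # Mask digits so "Page 3" and "Page 4" count as the same line.
    return re.sub(r"\d+", "#", line.strip())


def strip_headers_footers(pages, min_fraction=0.6):
    """Drop first/last lines that repeat across pages (hypothetical helper).

    pages: list of pages, each page a list of text lines.
    """
    counts = Counter()
    for page in pages:
        nonblank = [l for l in page if l.strip()]
        if nonblank:
            # Count each candidate at most once per page.
            counts.update({_key(nonblank[0]), _key(nonblank[-1])})

    threshold = max(2, int(len(pages) * min_fraction))
    repeated = {k for k, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        idx = [i for i, l in enumerate(page) if l.strip()]
        drop = set()
        if idx:
            for i in (idx[0], idx[-1]):
                if _key(page[i]) in repeated:
                    drop.add(i)
        cleaned.append([l for i, l in enumerate(page) if i not in drop])
    return cleaned
```

This would miss two-line headers and would wrongly drop a body line that happens to open most pages, so I'd still check the result by eye.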

2) I have some PDFs where the MOBI output doesn't wrap well. I get a full line, then a line with a single word, and so on. (I think this is about "flow" and "hard-coded line breaks".) Sometimes this is very consistent -- a break at every line break in the original PDF. Sometimes it is not consistent.
Use pdfreflow?

3) Also paragraph breaks are usually missed.
Ideas?
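
For questions 2 and 3 together, this is the kind of heuristic I imagine a reflow tool applying; I don't know what pdfreflow or Calibre actually do internally. The sketch joins hard-wrapped lines and guesses that a paragraph ends at a blank line, or at a line that ends a sentence and is clearly shorter than the longest line. The name `unwrap` and the 0.7 ratio are assumptions of mine.

```python
def unwrap(text, short_ratio=0.7):
    """Join hard-wrapped lines into paragraphs (heuristic sketch)."""
    lines = [l.rstrip() for l in text.splitlines()]
    lengths = [len(l) for l in lines if l.strip()]
    if not lengths:
        return ""
    full = max(lengths)  # stand-in for the page's full line width

    paragraphs, current = [], []
    for line in lines:
        if not line.strip():
            # A blank line always closes the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line.strip())
        ends_sentence = line.endswith((".", "!", "?", '"'))
        # A short line that ends a sentence is probably a paragraph end.
        if ends_sentence and len(line) < full * short_ratio:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

It will guess wrong when a paragraph's last line happens to be nearly full width, which may be why the results I'm seeing are inconsistent.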

4) In PDF->MOBI, where do TOCs come from? Does there have to be a clickable TOC in the PDF? Can one be generated from (what appear to me to be) headers? If so, how can I do this?

5) One PDF has a table like this:

year: event
year: event

where 'year' is centered vertically against the event's 1 to 3 lines. In the MOBI, the year is embedded in the event text, as if the PDF were read straight across without regard to the table layout. Actually, I don't know whether the PDF is the original source, so the PDF itself may already be that way.

Any ideas? Convert to HTML and add table markup by hand? What HTML is allowed (or not ignored) when doing HTML->MOBI?
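
If hand markup is the way to go, I'd probably script it rather than type every row. A minimal sketch, assuming I can first get the text into clean "year: event" lines (which is the hard part) and that plain `<table>`, `<tr>` and `<td>` survive the HTML->MOBI step, which is exactly what I'm asking. `rows_to_table` is just a name I made up.

```python
from html import escape


def rows_to_table(lines):
    """Turn 'year: event' lines into a bare HTML table (sketch)."""
    rows = []
    for line in lines:
        if ":" not in line:
            continue  # skip anything that doesn't look like a row
        # Split on the first colon only, so colons in the event text survive.
        year, event = line.split(":", 1)
        rows.append(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                escape(year.strip()), escape(event.strip())
            )
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

I left out any alignment attributes on purpose, since I don't know which ones the converter keeps.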


Thanks!
tlc