MobileRead Forums > E-Book Readers > More E-Book Readers > iRex
01-12-2008, 07:13 AM   #1
johnnytruant
Member
 
Posts: 23
Karma: 149
Join Date: Jan 2008
Device: iRex iLiad, Aura HD
manipulating/converting pdfs under linux

First, apologies if I've just failed to find the answer to this in these forums already - I did try some searches, but without much luck.

So, situation: I have some ebook pdfs (duh), which are mostly US Letter and A4. I would like them to be repaginated onto 12x15 pages so I can read them on my shiny new iLiad without messing about with zoom etc.

How best to do that? I've tried 'Print Setup --> 12x15 --> Print to file' in evince. That resizes the pages, but it doesn't repaginate (i.e. a 500-page A4 pdf becomes a 500-page iLiad-size pdf, with correspondingly small text).
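A rough way to see why the text ends up so small: printing to a smaller paper size scales each whole page down uniformly so that it fits. The Python sketch below does that arithmetic. It assumes the 12x15 size is in centimetres and that the page is scaled to fit without cropping, which is typical print-to-fit behaviour but not something confirmed in this thread.

```python
# Sketch: how much a whole page shrinks when it is scaled to fit a smaller
# sheet without repaginating. Assumes sizes in centimetres and uniform scaling.

def fit_scale(src_w: float, src_h: float, dst_w: float, dst_h: float) -> float:
    """Largest uniform scale factor that fits a src page entirely inside dst."""
    return min(dst_w / src_w, dst_h / src_h)


PAPER = {
    "A4": (21.0, 29.7),
    "US Letter": (21.59, 27.94),  # 8.5 x 11 in
}
TARGET = (12.0, 15.0)  # assumption: the "12x15" custom size is in cm

if __name__ == "__main__":
    for name, (w, h) in PAPER.items():
        scale = fit_scale(w, h, *TARGET)
        print(f"{name}: printed at {scale:.0%} of original size")
```

For A4 the height is the limiting side (15/29.7 ≈ 0.51), so 10 pt body text comes out at roughly 5 pt. Bigger text means producing more, smaller pages, which a paper-size change alone won't do.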

If the starting format isn't pdf, OpenOffice exports a lovely pdf from rtf, but it can't open PDFs in the first place. I've tried converting the pdfs to PostScript, but then they won't open in OO or Scribus (Scribus creates millions, and I mean millions, of Document-xxxxxx-.dat files on PS import).

I'm totally stumped at the moment. The best I've done so far is convert a PDF into a PostScript file which I then can't open in anything (except evince, which still won't let me repaginate, as described above). PDFedit looks great if you want to change text in a PDF, but I can't find anything in it that lets me change the paper size. pdftk again looks great for doing everything except the one thing I want to do.

Any help or advice would be much appreciated.
01-12-2008, 08:48 AM   #2
daudi
Addict
 
Posts: 281
Karma: 904
Join Date: Oct 2007
Location: Kent, UK
Device: iRex iLiad, Psion 5MX, nokia n800
The issue is that for the most part PDF is not a format that is designed to reflow (unlike HTML for example). In fact, it is designed to make sure that the layout remains constant.
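For illustration, this is roughly what reflow means once the words are available as plain text: the line breaks fixed on the page are discarded and the text is re-wrapped to whatever width the screen allows. A minimal Python sketch, assuming paragraphs are separated by blank lines (real text extracted from a PDF is often messier than that):

```python
import re
import textwrap


def reflow(text: str, width: int) -> str:
    """Re-wrap hard-wrapped text to a new line width.

    Assumes paragraphs are separated by blank lines; the lines inside each
    paragraph are joined and then broken again at `width` characters.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n\n".join(textwrap.fill(" ".join(p.split()), width) for p in paragraphs)


page_text = (
    "PDF keeps every line exactly where the author put it, which is\n"
    "great on paper but awkward on a small screen.\n"
    "\n"
    "A reflowable format lets the reader choose the line length."
)

print(reflow(page_text, 30))
```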

That being said, some people have had some success with the latest version of Acrobat, which does reflow. Try searching this forum, and perhaps the iRex forum, for "reflow" to find threads with more info.

You might get somewhere with pdftotext, but that will kill formatting (e.g. italics, bold, titles) and images. There is also a pdfimages tool for extracting images. I've tried these and other tools on Linux but never really got anything I was happy with. These days I'm happy to either zoom or rotate for single-column docs, and use the column-wise version of ipdf for multi-column docs.
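Why rotating helps for single-column docs, in numbers: with fit-to-width zoom the scale is set by the screen's width, so turning the screen sideways fits the page width to the long side instead. A rough Python sketch, assuming a 12 x 15 cm screen (the 12x15 size from the first post, taken as cm) and an A4 page:

```python
# Sketch: zoom needed to fit a page's width to the screen, portrait vs rotated.
# Assumptions: screen is 12 x 15 cm and the viewer fits the page width,
# scrolling vertically through the rest of the page.

def fit_width_scale(page_w: float, screen_w: float) -> float:
    """Scale factor when the page width is fitted to the screen width."""
    return screen_w / page_w


A4_WIDTH = 21.0        # cm
SCREEN = (12.0, 15.0)  # (short side, long side) in cm, assumed

portrait = fit_width_scale(A4_WIDTH, SCREEN[0])
landscape = fit_width_scale(A4_WIDTH, SCREEN[1])  # rotated: long side becomes the width
print(f"portrait {portrait:.2f}x, rotated {landscape:.2f}x")
```

With these assumptions the rotated view is 15/12 = 1.25 times larger, in exchange for more scrolling per page.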
01-12-2008, 09:45 AM   #3
johnnytruant
Member
 
Posts: 23
Karma: 149
Join Date: Jan 2008
Device: iRex iLiad, Aura HD
Acrobat, eh? Bah. Adobe don't seem to have 64-bit targets for any of their products, and don't seem to care (the Flash plugin still doesn't support 64-bit systems, several years after they finally did v9 for *nix), which rules it out for me. I'm not downgrading my entire system to 32-bit for one piece of software.

pdftotext (and pdftohtml and pdftoabw and pdftops) all produce unacceptable results at the moment. Formatting I can live without, but lots of non-text crap spread through the output isn't cool. I have a feeling this is something to do with whoever created the pdfs I'm testing on. I keep seeing file:///pathtosometextfile at the top and bottom of every page. I'd do a search/replace on it, but it's not consistent.
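Since the stray lines vary, matching on the `file:///` prefix instead of the exact string may be enough. A minimal Python sketch, assuming each stray URL sits on its own line and that pages are separated by form feed characters, which `pdftotext` inserts between pages unless run with `-nopgbrk`:

```python
from typing import List


def clean_pages(raw: str) -> List[str]:
    """Split pdftotext-style output into pages and drop file:/// header/footer lines.

    Assumes pages are separated by form feed characters and that each stray
    URL occupies its own line; any other text on that line is dropped too.
    """
    pages = []
    for page in raw.split("\f"):
        kept = [line for line in page.splitlines()
                if not line.lstrip().startswith("file:///")]
        text = "\n".join(kept).strip()
        if text:  # skip pages that were nothing but junk (or empty)
            pages.append(text)
    return pages


# Hypothetical sample: the URL differs from page to page, as described above.
sample = ("file:///home/me/book.txt 1\nChapter One\nIt was a dark night.\n"
          "file:///home/me/book.txt\n\f  file:///tmp/other.txt (2)\nPage two.\n\f")

for page in clean_pages(sample):
    print(page)
    print("----")
```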

I've never liked pdfs. Unless I was printing something, of course. In which case they're great.
02-02-2009, 02:57 PM   #4
Man Eating Duck
Addict
 
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by johnnytruant View Post
I've never liked pdfs. Unless I was printing something, of course. In which case they're great.
Yes, that's why PDF is not really a good format for ebooks, but is a de facto standard for commercial printers.

To expand a bit on Acrobat: I see you don't like it. While I tend to agree with you (even on Windows their Update Manager is among the worst pieces of crap I've seen), you could try running it in a VM. I don't think Wine'll work too well.

It has a tacked-on "Save to HTML 3.2" option which has still given me reasonably good results in Acrobat 8. This will normally give a reflowable HTML file with a different font tag for each level (title, subtitle, quote, body text), but not thousands of different <span> tags. I can read it directly on my reader (I'm using an iLiad), and also reformat it or extract the text with OpenOffice or grep/sed if I want other font sizes or styles.
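As an example of that kind of post-processing, here is a small Python sketch that shifts every absolute `<FONT SIZE=n>` by a few steps, clamped to HTML 3.2's 1 to 7 range. That Acrobat writes absolute sizes in this form is an assumption, so check your own output first; relative sizes such as `SIZE="+1"` are left alone.

```python
import re

# Matches an absolute SIZE value (1-7) inside a FONT start tag, quoted or not.
# Relative values such as SIZE="+1" deliberately don't match.
FONT_SIZE = re.compile(
    r"""(<font\b[^>]*?\bsize\s*=\s*["']?)([1-7])(?!\d)(["']?)""",
    re.IGNORECASE,
)


def bump_font_sizes(html: str, delta: int) -> str:
    """Shift every absolute <FONT SIZE=n> by `delta`, clamped to 1..7."""
    def repl(match: re.Match) -> str:
        new_size = max(1, min(7, int(match.group(2)) + delta))
        return f"{match.group(1)}{new_size}{match.group(3)}"
    return FONT_SIZE.sub(repl, html)


if __name__ == "__main__":
    # Hypothetical input; real Acrobat output may use different attributes.
    sample = '<FONT SIZE="3" FACE="Times">Body text</FONT>'
    print(bump_font_sizes(sample, 2))
```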

This assumes that your PDF is text, not scanned pages, and also that it is single-column. Images will sometimes end up in the proper position, sometimes not. Depending on your source PDF you might get working links; DTP programs usually include an option to create a linked TOC/endnotes when generating a PDF.

If your PDF has a fancier layout with multiple columns, images outside the text flow and whatnot, the postprocessing needed may be doable but is a bit of work. The above solution still gives the best quick results I've seen.

Since you're using evince, I guess you're on some kind of Unix (Linux?), as I am. I solve that problem by remoting into my work machine when I need to use Windows software. I don't need to do that very often anymore.

Anyway, good luck! If you find alternative solutions that work, please post them here!

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Converting PDFs | macrotor | PDF | 62 | 08-14-2011 07:10 PM
Converting PDFs | JoshLessard | Amazon Kindle | 12 | 10-07-2010 06:40 AM
Converting Layered? PDFs | kerrware | Calibre | 2 | 06-30-2010 03:31 PM
reader for PDFs without converting? | kuck | Which one should I buy? | 24 | 06-30-2010 02:55 AM
Converting PDFs to images (Linux only) | kylecronan | Kindle Developer's Corner | 1 | 02-28-2009 02:37 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.