Old 11-21-2010, 10:43 PM   #1
Seanette
Addict
Posts: 254
Karma: 834
Join Date: Oct 2010
Location: Sacramento, CA
Device: Samsung Galaxy s3 (Android 4.4.2), iPad 2, Win10 laptop
From PDF to ePub: how best to do this?

I know, this has probably been beaten to death, but I'm still pretty new here and, while a competent end user, I'm not a real expert on file formats and such.

I'm hoping to move PDFs that have two-column pages over to ePub and actually get the text formatted sanely in the process. At this point, I'm leaning toward extracting the text from said files, then running the result (after manual cleanup) through Calibre to ePub (I've tried direct conversion with disappointing results), but have yet to figure out what's going to be my best intermediate format to get decent output that'll be readable on my iTouch.
Old 11-22-2010, 07:40 AM   #2
Nexutix
Reading and reading
Posts: 582
Karma: 8250144
Join Date: Oct 2010
Device: Infibeam Pi, iPod Touch 4G, iPad Air 2, iPad mini 2, Oneplus One
Get the PDF you want, add it to calibre, and click Convert. In the settings, set the unwrap factor to 0.05 and convert. You are done! It renders nice output for me. I have also converted some CHM help guides to epubs. You can also make the font bold, or use a different font supported by your device, by adding extra CSS in the space provided for it. (If you want, google it; you will find the code easily.)

You can also convert a book without adding it to calibre: use "ebook-convert.exe" from the command line. (The calibre manual explains how, if you don't know.)
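For anyone who prefers a script to typing the command by hand, here is a minimal sketch of how such an ebook-convert call might be assembled in Python. The function name `build_convert_command` and the file names are made up for illustration; `--unwrap-factor` and `--extra-css` are the command-line counterparts of the settings mentioned above, but check `ebook-convert --help` on your own install before relying on them.

```python
import shlex


def build_convert_command(pdf_path, epub_path, unwrap_factor=0.45, extra_css=None):
    """Return the argument list for one ebook-convert run (nothing is executed)."""
    if not 0 <= unwrap_factor <= 1:
        raise ValueError("unwrap factor must be between 0 and 1")
    cmd = ["ebook-convert", pdf_path, epub_path,
           "--unwrap-factor", str(unwrap_factor)]
    if extra_css:
        # Raw CSS passed straight through, e.g. to make the body text bold.
        cmd += ["--extra-css", extra_css]
    return cmd


# Hypothetical file names; 0.05 is the value suggested in this post.
cmd = build_convert_command("book.pdf", "book.epub", unwrap_factor=0.05,
                            extra_css="body { font-weight: bold }")
print(shlex.join(cmd))
```

To actually run the conversion, pass the list to `subprocess.run(cmd, check=True)`. Keeping the command as a list avoids quoting problems with spaces in file names.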
Old 11-22-2010, 08:00 AM   #3
Seanette
Sorry, produced a jumbled mess with text from the two columns scrambled together.

It might help if I understood what the unwrap factor is, but this didn't do any good. In fact, I got worse results on my test file with .05 than with .45 (and .5, which I tried via an accidental typo, was no better).

As I said, at this point I may be stuck with extracting the text to a file to put through Calibre, and I'm trying to figure out my best intermediate format (.txt, .doc, .odt, ????), unless someone has a better idea. Some of the files I want to work with are pretty big, so while I can put up with manual editing if necessary, it's not exactly the preferred option.

To give you an idea of my ability level in this area, I barely know what a CSS *is*, let alone how to tweak one.
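If the extract-then-clean-up route with plain .txt is the one taken, part of the manual cleanup can probably be scripted. The sketch below is only a rough illustration, assuming the extracted text separates paragraphs with blank lines; the function name `unwrap_extracted_text` is made up, and this is not what calibre does internally.

```python
import re


def unwrap_extracted_text(text):
    """Rejoin hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                # Word hyphenated across a line break: glue the halves together.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        paragraphs.append(joined)
    return "\n\n".join(paragraphs)


sample = "The quick brown fox jum-\nped over the\nlazy dog.\n\nSecond para-\ngraph here."
print(unwrap_extracted_text(sample))
```

One known weakness: a genuine hyphen that happens to fall at a line end (say, "two-" followed by "column") gets glued into one word, so the output still needs a read-through before it goes to Calibre.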
Old 11-22-2010, 06:05 PM   #4
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Personally, I would just use BRISS to cut each PDF page into readable-sized chunks. You could do four, six or even more chunks per page.



I'd just leave the results in PDF format; so long as the chunks are readable, that's what really matters, and there's far less of a chance of losing formatting that way.
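To make the "chunks per page" idea concrete, here is a small sketch of the arithmetic behind cutting a page into a grid of crop boxes in reading order. It is not how BRISS works internally (there you draw the rectangles by eye); the function name `chunk_boxes` is made up, and it assumes evenly sized columns with no margins or gutter, which real pages rarely have.

```python
def chunk_boxes(page_width, page_height, columns=2, rows=2):
    """Return crop boxes as (left, bottom, right, top) in reading order.

    Coordinates follow the PDF convention: origin at the bottom-left corner,
    units in points. Reading order for a multi-column page is column by
    column, each column from top to bottom.
    """
    col_w = page_width / columns
    row_h = page_height / rows
    boxes = []
    for c in range(columns):
        for r in range(rows):
            top = page_height - r * row_h
            boxes.append((c * col_w, top - row_h, (c + 1) * col_w, top))
    return boxes


# US Letter page (612 x 792 points), two columns, each cut in half: four chunks.
for box in chunk_boxes(612, 792, columns=2, rows=2):
    print(box)
```

The order matters more than the exact numbers: left column top half, left column bottom half, then the right column, so the chunks read in sequence.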

If you're really intent on extracting the text, I'd use HTML as your intermediate format. Something like Acrobat might be able to recognize the columns and extract the text appropriately. If you need a free solution, pdftohtml (used, e.g., alongside http://sourceforge.net/projects/pdfreflow/pdfreflow) claims to be able to do this (and their webpage gives an example of such), but I've had mixed results from trying it. In all honesty, though, you'd still probably be best off using BRISS to separate the columns before attempting to extract the text. (Since these usually aren't true crops but just redefined bounding boxes, though, it's still not entirely clear this will work.)
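As a rough sketch of the free route, this is how a pdftohtml call might be put together in Python. It assumes the Poppler build of pdftohtml, where `-noframes` writes a single HTML file, `-xml` writes XML instead, and `-i` skips images; the function name and file names are made up. That pdfreflow wants the XML output is my assumption here, so check its own documentation.

```python
def build_pdftohtml_command(pdf_path, output_base, as_xml=False, skip_images=True):
    """Return the argument list for one pdftohtml run (nothing is executed)."""
    cmd = ["pdftohtml"]
    if as_xml:
        # XML output, assumed here to be the form a reflow tool would take.
        cmd.append("-xml")
    else:
        # One plain HTML file instead of a frameset.
        cmd.append("-noframes")
    if skip_images:
        cmd.append("-i")
    cmd += [pdf_path, output_base]
    return cmd


# Hypothetical file names.
print(build_pdftohtml_command("book.pdf", "book"))
print(build_pdftohtml_command("book.pdf", "book", as_xml=True))
```

Run either list with `subprocess.run(cmd, check=True)`. Whether the columns come out in the right order depends on the PDF, which matches the mixed results described above.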
Old 11-22-2010, 09:44 PM   #5
frabjous
$70 for something you can do (if imperfectly--though I doubt yours is better--) with free software? You've got to be kidding.

Take your SPAM elsewhere, please.
Old 11-22-2010, 11:22 PM   #6
Seanette
Quote:
Originally Posted by johnson23
(snip promo)
Obviously, spammers don't read very well. Where did I say anything about having any access to a Mac? I do, but did not state that previously, since it belongs to my DH and he's not happy about letting me install stuff on it.
Old 11-22-2010, 11:23 PM   #7
Seanette
Quote:
Originally Posted by frabjous
Personally, I would just use BRISS to cut each PDF page into readable sized chunks. You could do four, six or even more chunks per page.
The issue for me is text size on a small screen.
Old 11-24-2010, 01:34 PM   #8
Nexutix
Quote:
Originally Posted by Seanette
The issue for me is text size on a small screen.

I had that issue too, and so did my mother.

I've got a process here, but seriously, it sucks for the time and crazy processing it takes. (It's insane that we don't have any good all-in-one software for this.)

1. Use Briss to cut the pages of the PDF so that no words are repeated across the cut pages. (That's where it gets frustrating, because not all documents are formatted so consistently that every page has the same line at the same pixel position.) Cutting so that each column becomes its own page works well for the next step.
2. Then feed the resulting file to calibre to convert it to epub, and you are done.

Yes, I agree this is crazy... too crazy. But you could try this really insane thing.
Old 11-27-2010, 07:07 PM   #9
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Using BRISS to convert to single column could be a useful step. I suggest that the format you use for storing and reworking the data be ePUB. Sigil is a good ePUB editor, and you can keep images, metadata, and text all in one ePUB file, which makes management easier. HTML is easy to convert into ePUB, so it is a good choice for the initial conversion from PDF, but any text format that supports formatting would also be OK for getting to ePUB.
Old 12-21-2010, 02:45 AM   #10
doreenjoy
01000100 01001010
Posts: 1,889
Karma: 2400000
Join Date: Mar 2009
Device: Polyamorous
I had a number of PDFs to convert, and after much frustration I broke down and purchased ABBYY PDF Transformer to OCR the text (it did a great job BTW). Then I converted the RTF to EPUB.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.