04-29-2009, 05:35 AM   #1
elegant
Member
Posts: 12
Karma: 10
Join Date: Apr 2009
Device: sony reader
Reformatting books for Sony Reader with OpenOffice.org

Okay, so I have an ebook in PDF format which I would like to reformat for the Sony Reader. I'll try to be specific about my problems and what I'd like to do.

Opening up the file in Nitro PDF allows me to copy and paste plaintext. Tools > Select Text; Edit > Select All.

I copy into OpenOffice.org Writer. I think I lose bold and italics, which would be nice to retain but that's no big deal.

There seems to be an indicator for paragraphs, but there are line breaks inside the paragraphs which cause a problem when I want to reformat the text. I'd like each paragraph to be a single block of continuous text.

The next problem is that there is some sort of paragraph symbol (Hex: A00A), then a line break, {page number} {author name}, line break. This occurs every so often and interrupts the text. I'd like to remove it altogether.

It would be handy to have some options for the formatting of new paragraphs. E.g. single line break; line break and tab; double line break, etc.
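To make the idea concrete, here is a minimal Python sketch of the unwrapping step, run on the pasted plain text rather than as an OOo macro. It assumes paragraphs are separated by blank lines, which may not hold for this particular PDF; the function name and the three style names are made up for the example.

```python
import re

# Hypothetical names for the paragraph styles listed above.
PARAGRAPH_STYLES = {
    "single": "\n",    # single line break
    "indent": "\n\t",  # line break and tab
    "double": "\n\n",  # double line break
}


def unwrap_paragraphs(text, style="double"):
    """Join hard-wrapped lines into one block per paragraph.

    Assumption: a blank line marks the end of a paragraph.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Collapse the line breaks inside a paragraph into single spaces.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return PARAGRAPH_STYLES[style].join(paragraphs)


if __name__ == "__main__":
    sample = "one\ntwo\n\nthree\nfour"
    print(unwrap_paragraphs(sample, "indent"))
```

The separator is the only thing that changes between the styles, so adding another one is just a new entry in the dictionary.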

I'm a bit of a newbie with all this. Perhaps there are some OOo macros that can help me out. I already know how to adjust the size via Format > Page.

I don't want to use MSWord as it gives me some obscure error when I attempt to create a new macro. I think I prefer OOo anyway; it has native PDF output which is convenient.

TIA.

Last edited by elegant; 04-29-2009 at 05:42 AM.

04-29-2009, 06:27 AM   #2
Ea
Wizard
Posts: 3,490
Karma: 5239563
Join Date: Jan 2008
Location: Denmark
Device: Kindle 3|iPad air|iPhone 4S
Hmm, okay, I don't know about OpenOffice, and this is a little roundabout, but you could use Mobipocket Creator to create an HTML version of the PDF and then copy and paste that from a browser. Then you'd at least have better source material in OO, as you won't have all the problems with line breaks. Mobipocket Creator can also, to some degree, handle footnotes and header texts.

If you're not too picky (and for my own taste I find it works okay), you can just use the HTML and convert it to e.g. LRF in calibre. There might be the odd error in the text, but I don't find it hard to ignore. A lot of footnotes can be a problem though.

04-29-2009, 06:52 AM   #3
BlackVoid
Evangelist
Posts: 415
Karma: 510423
Join Date: Nov 2006
Device: Sony PRS-505
Get BookDesigner - with it you can remove / clean up anything. It is also the best tool for converting PDF to ebook formats.
The author - book title is often in the footer/header in the PDF, you can remove this in BD with search & replace or you can use PDF Cropper or Adobe professional to crop the margins. Unfortunately margin cropping is not recognized by some conversion tools (BD included) so after cropping export it to some other format (DOC, RTF, LIT).

Another option is to get ABBYY FineReader and export the PDF to LIT, then convert it to your desired format with another tool. This is the best way for books with pictures if you want to do it quickly and automatically.

04-29-2009, 08:37 AM   #4
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
BlackVoid is correct in saying that BD is probably the best tool going for PDF conversion, but you've got to accept that any PDF conversion is probably going to require pretty extensive "cleaning" by hand. PDF was never designed to be an "eBook" format, and just doesn't contain the "structural" information that allows "clean" conversion to other formats.

04-29-2009, 09:23 AM   #5
kacir
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
I have had the best results by running the PDF through an OCR program, Readiris Pro 11 (special HP version), that was bundled with some cheap scanner/printer combo from Hewlett-Packard.

04-29-2009, 09:52 AM   #6
elegant
Thanks for the help and recommendations guys. Does anyone have a direct download link for BookDesigner?

04-29-2009, 09:56 AM   #7
JSWolf
Resident Curmudgeon
Posts: 74,356
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by elegant
Thanks for the help and recommendations guys. Does anyone have a direct download link for BookDesigner?
Here is the thread on Book Designer. https://www.mobileread.com/forums/showthread.php?t=11786 The first message contains the attachments. Read the entire thread so you'll know any of the gotchas when installing.

Also, you'll need the Book Cleaner files as well. https://www.mobileread.com/forums/showthread.php?t=11649

So install Book Designer and then once that's working, install the Book Cleaner files. It's very important to not forget the Book Cleaner files.

04-29-2009, 04:39 PM   #8
elegant
Okay so I opened up the PDF with BookDesigner. It seems to deal with the margins/line breaks much better.

However the "{page number} {author name}" and "{title} {page number}" problem remains, and its insertion into the text area seems to be more random. Sometimes the page number is used as a header.

In this sense I think it may be easier to reformat the text with an OOo macro.

I'm going to download PDFCropper now and see how that goes.

Last edited by elegant; 04-29-2009 at 04:43 PM.

04-29-2009, 05:37 PM   #9
elegant
Actually I'm thinking maybe PDFCropper is not the best program to use because I am still working through an inflexible format: PDF.

I think the best option for me would be to remove the {page number} {author name} and {title} {page number} insertions in the text.
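As a rough sketch of what that removal could look like in Python rather than an OOo macro: it assumes each insertion sits on a line of its own, that the page number is plain digits, and that the author name and title are known in advance. The function name and the sample values are invented for the example.

```python
import re


def strip_running_headers(text, author, title):
    """Drop lines that are only '{page number} {author}' or '{title} {page number}'.

    Assumption: the header or footer occupies a whole line by itself.
    """
    a = re.escape(author)
    t = re.escape(title)
    # Anchored at both ends so ordinary sentences mentioning the author survive.
    header = re.compile(rf"^\s*(?:\d+\s+{a}|{t}\s+\d+)\s*$")
    kept = [line for line in text.splitlines() if not header.match(line)]
    return "\n".join(kept)


if __name__ == "__main__":
    page = "end of one page\n12 Jane Doe\nstart of the next\nSome Title 13\nmore text"
    print(strip_running_headers(page, "Jane Doe", "Some Title"))
```

Pages where the number is used alone as a header would need a separate pattern, and a bare number on its own line is risky to delete blindly.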

Quote:
The author - book title is often in the footer/header in the PDF, you can remove this in BD with search & replace or you can use PDF Cropper or Adobe professional to crop the margins.
Perhaps you could run through how to do this in PDFCropper for me?

Last edited by elegant; 04-29-2009 at 05:40 PM.

04-30-2009, 12:50 AM   #10
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Quote:
Originally Posted by elegant
In this sense I think it may be easier to reformat the text with an OOo macro.
I am an Ubuntu user.
I use OpenOffice to work pretty quickly with the text of Gutenberg txt books.

If this can be of any help, I use a macro made out of the following points:

1 - Search and replace "$" with one space
2 - Search and replace ".*$" with "&\n"
3 - Select all and copy
4 - Open a 9x12cm model
5 - Paste

I found that OpenOffice can't work with page breaks (this does not matter for Gutenberg texts, which do not use page breaks, but it does for other books), and I currently have no solution for that.

There is an extension with an alternative search and replace menu but I find it a bit too complicated.

Last edited by roger64; 04-30-2009 at 12:54 AM.

04-30-2009, 05:12 AM   #11
elegant
Do I have to put a term in place of the dollar symbol?

07-03-2009, 09:26 PM   #12
elegant
Quote:
1 - Search and replace "$" with one space
2 - Search and replace ".*$" with "&\n"
Could someone explain how the above works?

I want to remove progressive page numbers.

07-04-2009, 08:26 AM   #13
kacir
Quote:
Originally Posted by elegant
Could someone explain how the above works?
Quote:
1 - Search and replace "$" with one space
2 - Search and replace ".*$" with "&\n"
I want to remove progressive page numbers.
The above doesn't work.

For several reasons:

1. You have to switch the "Regular expressions" option ON for this to work.

2. Step one searches for all "end of paragraph" marks, which OpenOffice represents with the regular expression $, and replaces each of them with a space. Step 2 searches for any run of characters (".*") ending at an "end of paragraph" mark. But there are NO "end of paragraph" marks left, because you replaced all of them in step one.
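A small Python sketch of the same effect, for anyone who wants to see it happen. Python's regular expressions are not the same as the ones in OpenOffice, so a newline is used here as a stand-in for the "end of paragraph" mark; the sample text and names are made up.

```python
import re

# Stand-in for what "$" matches in Writer's Find & Replace (an assumption
# made for this illustration; Python's own "$" behaves differently).
PARAGRAPH_END = r"\n"

sample = "First wrapped line\nsecond wrapped line.\nNext paragraph starts here.\n"


def count_paragraph_ends(text):
    """Count how many paragraph-end marks are left in the text."""
    return len(re.findall(PARAGRAPH_END, text))


before = count_paragraph_ends(sample)

# Step one: every paragraph end becomes a space.
after_step_one = re.sub(PARAGRAPH_END, " ", sample)

# Step two needs a paragraph end to anchor on, but none survive step one.
remaining = count_paragraph_ends(after_step_one)

if __name__ == "__main__":
    print(before, remaining)
```

Whatever step two was meant to protect has to be marked or handled before the blanket replacement in step one, not after it.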

Post an example of how the text you need to process looks and I will try to construct a regular expression that will delete all page numbers (and *only* page numbers ;-) )

Last edited by kacir; 07-04-2009 at 09:31 AM.

07-04-2009, 09:18 PM   #14
Thrasymachus
Enthusiast
Posts: 26
Karma: 222
Join Date: Jun 2009
Device: Sony Reader PRS-505, Astak EZ Reader
Nitro PDF + Sony Reader

Hi elegant,

In your first post you say you can view the PDF file in Nitro PDF -- but Nitro PDF does a nice job of cropping pages on its own. I've found that by cropping the page so as to eliminate almost all margins, headers and footers, the pages of most ebooks become readable on the Sony Reader. This probably won't work if the original page size is letter or A4, but ebooks with original page sizes of 5" by 7" up to maybe 6.5" by 9" seem to work pretty well, especially if you're willing to read in landscape mode. Of course it depends on the original typeface and other factors, but so far I've tried it on about half a dozen books with good results.

Even if you don't want to do this, if you use Nitro to crop the headers and footers, and to remove the cover and other irrelevant pages, the editing/formatting process would be easier and quicker, I think.