Old 08-20-2010, 06:10 AM   #16
DoctorOhh
Let me say up front that I might not understand the problem, but I'm guessing one of my suggestions will work for you.

Quote:
Originally Posted by Wintersdark
I tried converting to text and back, but the way it's formatted I basically get each paragraph followed by a pair of CR/LF's. So, converting directly back to epub doesn't help.
According to this thread, calibre looks for two consecutive CR/LFs to identify a paragraph. When converting back to ePub, you might have to change the settings in the text input area while converting.
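
To make the "two consecutive CR/LFs" rule concrete, here is a minimal Python sketch of that kind of paragraph detection. It is only an illustration of the idea, not calibre's actual code, and the function name `split_paragraphs` is made up for this example.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs wherever a blank line (two line breaks in a row) occurs."""
    # Normalize Windows (CR/LF) and bare CR line endings to LF first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A line break, optional whitespace, then another line break ends a paragraph.
    chunks = re.split(r"\n\s*\n", normalized)
    # Drop empty chunks and trim stray whitespace around each paragraph.
    return [chunk.strip() for chunk in chunks if chunk.strip()]


sample = "First line\r\nstill first paragraph.\r\n\r\nSecond paragraph.\r\n"
print(split_paragraphs(sample))
```

A single CR/LF stays inside its paragraph here; only the doubled one starts a new paragraph.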

Quote:
Originally Posted by Wintersdark
However, as it's not every book, I'm just addressing it on a case by case basis with Notepad++ as I go. If I were still running linux, I'd mass convert them all to text and figure out how to script applying the regex replace to them, but I have no idea of how to go about that in windows.
When I have text that I need to put back into properly word-wrapped paragraphs, so that I get a clean epub on conversion, I use OpenOffice.org's Writer with the My Text Cleaner extension installed. I select the whole document and run the extension. It does a good job of reassembling the paragraphs.
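
For the mass-conversion idea in the quote, a script is also an option on Windows, since Python runs there as well. The following is a rough sketch, assuming the books are already plain .txt files in one folder and that real paragraphs are separated by blank lines. It is not what My Text Cleaner does internally, and the names `unwrap_paragraphs`, `unwrap_folder` and the `unwrapped` output folder are hypothetical.

```python
import re
from pathlib import Path


def unwrap_paragraphs(text):
    """Join hard-wrapped lines inside each blank-line-separated paragraph."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    joined = []
    for paragraph in paragraphs:
        # Collapse the wrapped lines of one paragraph into a single line.
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        if lines:
            joined.append(" ".join(lines))
    # Keep one blank line between paragraphs so they are still detectable.
    return "\n\n".join(joined) + "\n"


def unwrap_folder(folder, encoding="utf-8"):
    """Write an unwrapped copy of every .txt file in folder; return the file count."""
    source = Path(folder)
    out_dir = source / "unwrapped"  # originals are left untouched
    out_dir.mkdir(exist_ok=True)
    count = 0
    for path in sorted(source.glob("*.txt")):
        cleaned = unwrap_paragraphs(path.read_text(encoding=encoding))
        (out_dir / path.name).write_text(cleaned, encoding=encoding)
        count += 1
    return count
```

Calling `unwrap_folder` with the folder that holds the text files writes the cleaned copies into an `unwrapped` subfolder, so a bad result can simply be thrown away. The encoding default is a guess; files from other sources may need a different one.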

Alternatively, I can often use Sigil's find and replace to fix ePubs, as long as something unique in the output marks the paragraph breaks. Often I find a
Quote:
<p class=calibre9></p>
or similar marking between what should be each paragraph. If this exists then it is only 2 or 3 steps to clean it up.
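
The same kind of cleanup can be sketched as three regex replacements in Python. This assumes every wrapped line sits in its own `<p>` element and that the empty `calibre9` paragraph is the only thing marking a real paragraph break; the class name will differ from book to book. The function name and the `@@PARA_BREAK@@` placeholder are made up for the example, and these are not necessarily the exact steps I use in Sigil.

```python
import re

PLACEHOLDER = "@@PARA_BREAK@@"  # hypothetical token; must not occur in the book text


def merge_split_paragraphs(html, marker_class="calibre9"):
    """Merge one-line <p> elements, using empty marker paragraphs as the real breaks."""
    # Step 1: swap each empty marker paragraph for a placeholder token.
    empty_marker = r'<p class="?{}"?>\s*</p>'.format(re.escape(marker_class))
    html = re.sub(empty_marker, PLACEHOLDER, html)
    # Step 2: join neighbouring paragraphs by replacing "</p><p ...>" with a space.
    html = re.sub(r"</p>\s*<p\b[^>]*>", " ", html)
    # Step 3: drop the placeholder, leaving a real paragraph boundary in its place.
    html = re.sub(r"\s*{}\s*".format(re.escape(PLACEHOLDER)), "\n", html)
    return html


sample = (
    '<p class="calibre1">The quick</p>\n'
    '<p class="calibre1">brown fox.</p>\n'
    '<p class="calibre9"></p>\n'
    '<p class="calibre1">Next</p>\n'
    '<p class="calibre1">paragraph.</p>'
)
print(merge_split_paragraphs(sample))
```

The placeholder is what keeps step 2 from gluing two real paragraphs together. Try it on a copy of the book first.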

Good Luck.

Last edited by DoctorOhh; 08-20-2010 at 06:35 AM.
Old 08-27-2010, 11:57 PM   #17
ldolse
Calibre 0.7.16 enables the preprocessing option for lit, txt, rtf, and html input types. It will attempt to unwrap lines and also mark chapters in books where chapter headings previously weren't marked.

Select "preprocess input to possibly improve structure detection" under the Structure Detection options in conversion for a bad book to have Calibre attempt to fix the markup. If you're dealing with a troublesome text file then you should also choose "Treat each line as a paragraph" under text input.

It seems to work OK on the books I've tested, but I know there is a large variety of badly formatted books coming from OCR sources, etc. Let me know if there are books that fail to unwrap. I'm also interested in books that have text formatting that spans lines. I think that should be OK, but I didn't have any test cases.
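
To give a feel for what "unwrapping" means when every line has become its own paragraph, here is a deliberately crude Python sketch. It is not the preprocessing code in Calibre, which is more involved; it simply assumes that a line ending in sentence punctuation ends a paragraph. The name `unwrap_by_punctuation` is made up for the example.

```python
import re

# A line "ends a sentence" if it finishes with . ! or ?, optionally followed by
# closing quotes or a closing parenthesis.
SENTENCE_END = re.compile(r"""[.!?]["')]*$""")


def unwrap_by_punctuation(text):
    """Rebuild paragraphs from one-line fragments using end-of-sentence punctuation."""
    paragraphs = []
    current = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines carry no information in this kind of file
        current.append(line)
        if SENTENCE_END.search(line):
            # Assume the paragraph ends here (wrong whenever a line merely
            # happens to end mid-paragraph on a full stop).
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


broken = "It was a dark\n\nand stormy night.\n\nThe rain fell\n\nin torrents.\n"
print(unwrap_by_punctuation(broken))
```

A heuristic like this over-splits paragraphs that contain several sentences, which is exactly the sort of case where test books are useful.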
Old 09-04-2010, 04:34 AM   #18
medved13
texthandler.com can help you remove unnecessary line breaks from text.
All times are GMT -4.

