Old 02-26-2018, 08:00 PM   #1
G2B
Member
G2B began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Question Help with Calibre editor

ebooks that are converted from .txt or .pdf files have way too many line breaks, which often make it very unpleasant to read those files.

Removing ALL linebreaks causes everything to be bunched up without any visual separations. That makes an ebook imo difficult to read.

In large files, removing line breaks selectively to have a visually pleasing separation of paragraphs takes forever.

Is there any way that the Calibre editor can get back to the original layout by removing the extra linebreaks, without also removing the ones between the original paragraphs?

Q1: Can this be done during the conversion from TXT or PDF to EPUB?
If yes, please explain how, or refer me to an article that explains it.


Q2: I have a number of files that I converted to EPUB, and I have since deleted the source TXT files. If the answer to Q1 is yes, I guess I could convert those back to TXT and then to EPUB again with the fix.

If that is not possible, can this be fixed in the Calibre EPUB editor without having to waste days of my time on manually fixing a single book?

TIA
Old 02-26-2018, 08:17 PM   #2
sjfan
Addict
sjfan ought to be getting tired of karma fortunes by now.
 
Posts: 244
Karma: 7568340
Join Date: Sep 2017
Location: Bethesda, MD, USA
Device: Kobo Aura H20
Quote:
Originally Posted by G2B View Post
ebooks that are converted from .txt or .pdf files have way too many line breaks, which often make it very unpleasant to read those files.
...
Is there any way that the Calibre editor can get back to the original layout by removing the extra linebreaks, without also removing the ones between the original paragraphs?

Q1: Can this be done during the conversion from TXT or PDF to EPUB?
If yes, please explain how, or refer me to an article that explains it.
It's not easy. You can often get 90% of the way there by using heuristics (when you convert from PDF or text, click "Heuristic Processing" on the left, then check the "Enable heuristic processing" box and play with the options there). But in general you're going to have to proofread the whole book to get things 100% right.
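For batch work, the same heuristics can also be switched on from calibre's `ebook-convert` command-line tool. Below is a rough Python sketch that only builds the command. The helper name `build_convert_command` and the file names are made up for illustration, and the unwrap factor value is just a starting point to experiment with, not a recommendation.

```python
def build_convert_command(src, dst, unwrap_factor=None):
    """Return an ebook-convert argument list with heuristic processing enabled."""
    # --enable-heuristics is the command-line counterpart of the
    # "Enable heuristic processing" checkbox in the conversion dialog.
    cmd = ["ebook-convert", src, dst, "--enable-heuristics"]
    if unwrap_factor is not None:
        # Line-unwrap factor: a decimal between 0 and 1 (calibre defaults to 0.4).
        cmd += ["--html-unwrap-factor", str(unwrap_factor)]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(" ".join(build_convert_command("book.pdf", "book.epub", unwrap_factor=0.45)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would run the conversion, assuming calibre's command-line tools are installed and on the PATH.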
Old 02-27-2018, 02:35 AM   #3
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
 
Posts: 9,598
Karma: 13662888
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by G2B View Post
ebooks that are converted from .txt or .pdf files have way too many line breaks, which often make it very unpleasant to read those files.
When converting from a text file, leaving the paragraph style or formatting style (in the text input settings) set to auto doesn't always work. The text input settings for conversion are described in the manual. If you select the right ones, it is rare for a text file not to fit one of the settings, and it will usually convert correctly without any heuristics selected at all.
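To show why the paragraph style matters, here is a minimal Python sketch of unwrapping text in which blank lines separate paragraphs (what the text input settings call the "block" style). This is not calibre's actual implementation, and the function name `unwrap_block_paragraphs` is invented for this example.

```python
import re


def unwrap_block_paragraphs(text):
    """Join hard-wrapped lines, keeping blank lines as paragraph breaks."""
    # One or more blank (or whitespace-only) lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, collapse every run of whitespace,
    # including single newlines, into one space.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps over\nthe lazy dog.\n\nA second\nparagraph.\n"
    print(unwrap_block_paragraphs(sample))
```

A file whose paragraphs are marked by indentation rather than blank lines would need a different rule, which is why picking the matching setting by hand can beat auto-detection.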
Old 02-28-2018, 05:51 AM   #4
Divingduck
Wizard
Divingduck ought to be getting tired of karma fortunes by now.
 
Posts: 1,058
Karma: 1293081
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
In addition for PDF:
https://www.mobileread.com/forums/sh...3&postcount=17
Old 03-02-2018, 11:37 AM   #5
G2B
Member
G2B began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Quote:
Originally Posted by sjfan View Post
It's not easy. You can often get 90% of the way there by using heuristics (when you convert from PDF or text, click "Heuristic Processing" on the left, then check the "Enable heuristic processing" box and play with the options there). But in general you're going to have to proofread the whole book to get things 100% right.

That is pretty much my experience. I did figure out the heuristic processing, and I am experimenting with the "search/replace" function during conversion.

Is there a way to use regular expressions in the editor as well? I have tried, but it doesn't seem to work.
Old 03-02-2018, 12:58 PM   #6
Terisa de morgan
Wizard
Terisa de morgan ought to be getting tired of karma fortunes by now.
 
Posts: 4,886
Karma: 5438866
Join Date: Jun 2009
Location: Madrid, Spain
Device: Kobo Clara,Kobo Aura One, XiaoMI 5, iPad, Huawei MediaPad, YotaPhone 2
Yes, in the search panel select the Regex mode instead of Normal or Function.
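As an illustration of the kind of expression that helps with stray line breaks, here is a small Python sketch. It assumes the conversion wrapped every line in its own `<p>` element, and it joins a paragraph that ends in a lowercase letter, comma, or semicolon with a following paragraph that starts with a lowercase letter. The pattern and the helper name `join_broken_paragraphs` are only an example, not a recipe from the manual, so test it on a copy of the book first.

```python
import re

# A paragraph ending mid-sentence, followed by one starting in lowercase.
# <p[^>]*> also matches paragraphs with attributes such as class="calibre1".
BROKEN_PARAGRAPH = re.compile(r"([a-z,;])</p>\s*<p[^>]*>([a-z])")


def join_broken_paragraphs(html):
    """Merge paragraphs that were split in the middle of a sentence."""
    # Keep the two captured characters and put a single space between them.
    return BROKEN_PARAGRAPH.sub(r"\1 \2", html)


if __name__ == "__main__":
    page = (
        "<p>It was a dark and stormy</p>\n"
        '<p class="calibre1">night, and the rain fell.</p>\n'
        "<p>The next paragraph.</p>"
    )
    print(join_broken_paragraphs(page))
```

The same search pattern with `\1 \2` as the replacement should carry over to the editor's Regex mode, since calibre's regular expressions follow Python syntax. Sentences that end without punctuation, or that continue with a capitalized word, still need checking by eye.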
Old 03-08-2018, 03:55 PM   #7
G2B
Member
G2B began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Quote:
Originally Posted by Terisa de morgan View Post
Yes, in the search panel select the Regex mode instead of Normal or Function.
Thanks, I'll try that.
Old 03-24-2018, 09:22 AM   #8
Phssthpok
Addict
Phssthpok knows how to set a laser printer to stun.
 
Posts: 359
Karma: 95229
Join Date: Nov 2014
Device: Horrible Kindle
Quote:
Originally Posted by G2B View Post
ebooks that are converted from .txt or .pdf files have way too many line breaks, which often make it very unpleasant to read those files.
Try Aiseesoft's PDF-to-ePUB converter, which in my experience does a better job than anything else I've come across. It recognises italics and para breaks and removes page headers/footers almost perfectly, apart from anything else. You sometimes have to join or split paras across PDF page boundaries, but it gets it right a lot of the time.

It's commercial-ware but fairly cheap, and there's a time-limited trial version you can download from their website. (NB: I have no connection to Aiseesoft, but I was impressed enough that I bought a copy after playing around with the trial version. I'm still impressed.)

Last edited by Phssthpok; 03-24-2018 at 09:26 AM.
Old 03-30-2018, 10:32 AM   #9
G2B
Member
G2B began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Thank you. I'll look into that.
Old 04-02-2018, 05:45 PM   #10
Ecuaman
Junior Member
Ecuaman began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Apr 2018
Location: Quito, Ecuador
Device: none
Quote:
Originally Posted by G2B View Post
Thank you. I'll look into that.
Hi, I'm also struggling with this. I signed up to this forum today, and the first post I noticed was yours, G2B. Funny, I have the same problem.
Right now I'm going backward and forward through the "Heuristic Processing" options...
I'm trying to get the conversion to notice chapters, centered headlines, etc.
Did you test the "PDF-to-ePUB converter"?
Old 04-02-2018, 07:02 PM   #11
Ecuaman
Junior Member
Ecuaman began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Apr 2018
Location: Quito, Ecuador
Device: none
Quote:
Originally Posted by Phssthpok View Post
Try Aiseesoft's PDF-to-ePUB converter, which in my experience does a better job than anything else I've come across. It recognises italics and para breaks and removes page headers/footers almost perfectly, apart from anything else. You sometimes have to join or split paras across PDF page boundaries, but it gets it right a lot of the time.

It's commercial-ware but fairly cheap, and there's a time-limited trial version you can download from their website. (NB: I have no connection to Aiseesoft, but I was impressed enough that I bought a copy after playing around with the trial version. I'm still impressed.)
Thanks, Phssthpok. I bought the "PDF-to-ePUB" software... Very good advice. At first I just converted in the easy, automatic preset mode, and the result was almost perfect.
Now I'm trying with advanced settings...
Old 04-06-2018, 09:39 AM   #12
G2B
Member
G2B began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Feb 2018
Device: PC / iPad
Quote:
Originally Posted by Ecuaman View Post
Did you test the "PDF-to-ePUB converter"?
No, I did not. I downloaded it, but have not needed it so far. The majority of what I do is cleaning up files that have already been converted from PDF without concern for readability, meaning I'm starting with the EPUB format.
I do use the heuristic processing first; the rest is mostly working in the editor with regular expressions to filter out the remaining garbage.