Old 11-28-2020, 05:57 AM   #1
michaelbr
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
Newbie text to epub conversion

I'm new to calibre and to conversion. I'm trying to gather some pages from a website and put them together into an EPUB ebook. I tried converting with calibre's default settings, but unfortunately it didn't work. I searched the internet and YouTube for tutorials and found very few on this topic. Can someone please give me a hint on where to start learning?

The trouble I'm having in converting is this,
original text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Mauris gravida urna ac vulputate efficitur. Duis ultrices

nisl id tempor ultricies. Ut feugiat metus a ornare aliquam.

Integer vel accumsan elit, in facilisis libero.


Vestibulum mollis justo ut dictum tempor. Maecenas euismod

dui sed feugiat auctor. Aenean at accumsan mauris.

after conversion:
<p class="calibre2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. </p>

<p class="calibre2">Mauris gravida urna ac vulputate efficitur. Duis ultrices </p>

<p class="calibre2">nisl id tempor ultricies. Ut feugiat metus a ornare aliquam. </p>

<p class="calibre2">Integer vel accumsan elit, in facilisis libero.</p>


<p class="calibre2">Vestibulum mollis justo ut dictum tempor. Maecenas euismod </p>

<p class="calibre2">dui sed feugiat auctor. Aenean at accumsan mauris. </p>


It seems that, for some reason, the web page generated a line break after every line instead of only at the end of each paragraph, so each line ends up in its own <p>. It doesn't seem too complicated to solve, but I have no clue how.

Last edited by michaelbr; 11-28-2020 at 07:24 AM.
Old 11-28-2020, 06:39 AM   #2
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
You probably want to enable the heuristics processing option for conversions that have that problem.
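For anyone converting from the command line instead of the GUI, the same option can be passed to calibre's ebook-convert tool as --enable-heuristics. A minimal Python sketch that only builds the command; the helper name build_convert_command and the file names are placeholders for illustration, and it assumes ebook-convert is on the PATH:

```python
def build_convert_command(src, dst, heuristics=True):
    """Build an ebook-convert command line (hypothetical helper)."""
    cmd = ["ebook-convert", src, dst]
    if heuristics:
        # Command-line counterpart of enabling heuristic processing in the GUI.
        cmd.append("--enable-heuristics")
    return cmd


cmd = build_convert_command("book.txt", "book.epub")  # placeholder file names
print(" ".join(cmd))
# To actually run the conversion:
#   import subprocess; subprocess.run(cmd, check=True)
```

Whether heuristics helps still depends on the input, so it is worth checking the result in the viewer after each try.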
Old 11-28-2020, 06:52 AM   #3
Doitsu
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by michaelbr
I'm new to calibre and conversion, I'm trying to gather some pages/information from website and put them together into an "epub" ebook.
You also might want to check out dotepub.
Old 11-28-2020, 07:21 AM   #4
michaelbr
Quote:
Originally Posted by Doitsu
You also might want to check out dotepub.
Thanks for this tip. I'm not sure dotepub will solve my problem, though: nowadays a lot of sites have embedded frames with ads, so I assume all those frames and ads would be converted too?
Old 11-28-2020, 07:24 AM   #5
michaelbr
Quote:
Originally Posted by itimpi
You probably want to enable the heuristics processing option for conversions that have that problem.
Thanks for the tip. Is there a tutorial anywhere that explains how to tweak the conversion settings? I've found only one on YouTube, and it's very short, using a single regex to solve one specific problem.
Old 11-28-2020, 12:08 PM   #6
retiredbiker
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
When you go to convert a text file, a "TXT Input" settings panel becomes available on the left side of the convert screen.

If your input text has indentation at each paragraph, or if there is a blank line between paragraphs, you can use the "Paragraph style" dropdown. "Block" will use blank lines to determine actual paragraphs, and "Print" will use indentations. (Look at the tool tips.)

If your input text is just a long list of short lines, with no indication of where actual paragraphs should be, then heuristic processing is your next best bet. It uses line lengths to try to guess the paragraph boundaries. It may give pretty good results, but it will not be perfect.

How well any of this works depends on the consistency of the input text. If you have inconsistent indentations or blank lines, or if there are, by chance, many paragraphs that end in long lines rather than shorter ones, you should edit the input text first, to get good results.

If you are doing a copy/paste to gather text from a web page, your best bet is to paste it into Word or Writer first, fix it up there, and then convert the word processor doc to epub.
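The difference between the "Block" and "Print" paragraph styles can be pictured with a small Python sketch. This is only a simplified illustration of the two ideas, not calibre's actual code, and the function names are made up:

```python
import re


def split_block(text):
    """'Block' idea: one or more blank lines separate paragraphs."""
    chunks = re.split(r"\n\s*\n", text.strip())
    # Join the wrapped lines inside each paragraph with single spaces.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def split_print(text):
    """'Print' idea: an indented line starts a new paragraph."""
    paragraphs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines entirely
        if line[0] in " \t" or not paragraphs:
            paragraphs.append(line.strip())
        else:
            # Unindented line: continuation of the current paragraph.
            paragraphs[-1] += " " + line.strip()
    return paragraphs


block_text = "First line of one\nparagraph.\n\nSecond paragraph."
print_text = "  First line of one\nparagraph.\n  Second paragraph."
print(split_block(block_text))
print(split_print(print_text))
```

Note that if every single line is followed by a blank line, split_block returns one paragraph per line, which looks a lot like the result shown in the first post.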
Old 11-29-2020, 02:58 AM   #7
michaelbr
Quote:
Originally Posted by itimpi
You probably want to enable the heuristics processing option for conversions that have that problem.
Thanks for this tip, itimpi. Unfortunately it did not work; maybe the line breaks were somehow hardcoded into the page when it was generated.
Old 11-29-2020, 03:02 AM   #8
michaelbr
Quote:
Originally Posted by retiredbiker
When you go to convert a text file, a "TXT Input" settings panel becomes available on the left side of the convert screen.

If your input text has indentation at each paragraph, or if there is a blank line between paragraphs, you can use the "Paragraph style" dropdown. "Block" will use blank lines to determine actual paragraphs, and "Print" will use indentations. (Look at the tool tips.)
Thanks so much for your detailed explanation; it's much appreciated. I'll give it a try.
Old 11-29-2020, 10:17 AM   #9
deback
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
Quote:
Originally Posted by michaelbr
Thanks for this tip, itimpi. Unfortunately it did not work; maybe the line breaks were somehow hardcoded into the page when it was generated.
Yes, the conversion creates a separate paragraph for every line in the text file that has a CR at the end (carriage return, the same as hitting the Enter key at the end of a line or paragraph). If you remove the CRs that are in the wrong places from your text file, the calibre conversion will turn out the way you want it.
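Before removing anything, it can help to check which line breaks the file really contains. A minimal Python sketch (the function name is made up for illustration) that normalises the line endings and counts runs of consecutive line breaks, so you can see whether paragraph ends look different from ordinary line ends:

```python
import re
from collections import Counter


def newline_run_lengths(text):
    """Count how often runs of 1, 2, 3... consecutive line breaks occur."""
    # Normalise Windows (CRLF) and bare CR endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return Counter(len(run) for run in re.findall(r"\n+", text))


sample = "Line one.\r\n\r\nLine two.\r\n\r\n\r\nNew paragraph.\r\n"
print(newline_run_lengths(sample))
```

If nearly all runs have one length and a smaller number are longer, the longer runs are probably the real paragraph breaks.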
Old 11-30-2020, 02:35 AM   #10
michaelbr
Quote:
Originally Posted by deback
Yes, the conversion creates a separate paragraph for every line in the text file that has a CR at the end (carriage return, the same as hitting the Enter key at the end of a line or paragraph). If you remove the CRs that are in the wrong places from your text file, the calibre conversion will turn out the way you want it.
Thanks, deback, but the trouble is removing the CRs. I don't think they can be removed automatically, since there is a CR at the end of every line as well as at the end of every paragraph. Is there any way other than removing them manually?
Old 11-30-2020, 03:03 AM   #11
Doitsu
@michaelbr you really might want to give dotepub a try. It's much easier to remove a couple of ads than to deal with every single line.
If you're a Firefox user, also try enabling the Reader View option, which will remove a lot of clutter. (It's also available as a Chrome extension.)
Old 11-30-2020, 08:25 AM   #12
deback
Quote:
Originally Posted by michaelbr
Thanks, deback, but the trouble is removing the CRs. I don't think they can be removed automatically, since there is a CR at the end of every line as well as at the end of every paragraph. Is there any way other than removing them manually?
Yes, remove them manually in a text editor, but I would try this first: find \n\n and replace it with \n. Hopefully, the blank line that should follow each paragraph is at \n\n\n.
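A small Python sketch of that replacement, assuming (as in the sample from the first post) that ordinary line ends are stored as \n\n and paragraph ends as \n\n\n. The replacement runs left to right without overlapping, so each \n\n\n is left as \n\n, i.e. one blank line:

```python
sample = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n\n"
    "Mauris gravida urna ac vulputate efficitur. Duis ultrices\n\n"
    "nisl id tempor ultricies. Ut feugiat metus a ornare aliquam.\n\n"
    "Integer vel accumsan elit, in facilisis libero.\n\n\n"
    "Vestibulum mollis justo ut dictum tempor. Maecenas euismod\n\n"
    "dui sed feugiat auctor. Aenean at accumsan mauris.\n"
)

# Each doubled line break becomes a single one; a triple break
# (line end plus two blank lines) is left as one blank line.
collapsed = sample.replace("\n\n", "\n")
paragraphs = collapsed.strip().split("\n\n")
print(len(paragraphs))  # prints 2
```

The result has a blank line only between paragraphs, which is the layout the "Block" paragraph style expects. If the paragraphs in a file are separated by just \n\n, the same replacement removes the paragraph breaks too, so check the file first.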
Old 11-30-2020, 11:12 AM   #13
retiredbiker
Quote:
Originally Posted by deback
Yes, remove them manually in a text editor, but I would try this first: find \n\n and replace it with \n. Hopefully, the blank line that should follow each paragraph is at \n\n\n.
Just replacing \n\n with \n in a text editor would remove the blank lines, making it worse. If I have a text file with "blank" lines, what I do is basically:

Replace \n\n with some unused symbol, like #
Replace \n with a space
Replace the # with \n
Replace two spaces with one space a few times, to get rid of multiple spaces.

It's almost always messier than just that, but that is the basic process.
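The four steps above can be sketched in Python as follows. This assumes a file where wrapped lines end in a single \n and paragraphs are separated by \n\n; the placeholder character and the function name are my own choices for illustration:

```python
import re

PLACEHOLDER = "\x00"  # assumed not to occur in the text; "#" works too if unused


def unwrap(text):
    text = text.replace("\n\n", PLACEHOLDER)  # 1. protect paragraph breaks
    text = text.replace("\n", " ")            # 2. join wrapped lines
    text = text.replace(PLACEHOLDER, "\n")    # 3. restore paragraph breaks
    return re.sub(r" {2,}", " ", text)        # 4. collapse repeated spaces


sample = "First line of a\nwrapped  paragraph.\n\nSecond\nparagraph."
print(unwrap(sample))
```

The regex in the last step does in one pass what repeating "two spaces to one space" does in a text editor.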
Old 11-30-2020, 11:20 AM   #14
theducks
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I have 5 'join type' regex search-and-replace (S&R) expressions that I use.
I only ever run 2 of them in Replace All mode. The other 3 I step through, doing a Find (i.e. skip) whenever the visual shows the replacement should not be applied.
It takes me 5-10 minutes to do a whole book. There are ALWAYS a couple of cases where the keyboard Del or Enter key is needed.
Take some time and learn regex (there are a couple of tutorials here at MR). It will pay off and make fixing so much easier than doing multiple conversions while trying to fine-tune the heuristics settings.
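To give an idea of what a 'join type' expression can look like, here is a hypothetical example in Python (not one of the five expressions mentioned above). It assumes that a line break in the middle of a sentence is one that does not follow closing punctuation and is followed by a lowercase letter:

```python
import re

# Hypothetical join pattern: one or more line breaks that are NOT preceded by
# sentence-ending punctuation (or another line break) and ARE followed by a
# lowercase letter.
JOIN_MID_SENTENCE = re.compile(r'(?<![.!?:"\n])\n+(?=[a-z])')


def join_broken_lines(text):
    """Replace mid-sentence line breaks with a single space."""
    return JOIN_MID_SENTENCE.sub(" ", text)


sample = "Duis ultrices\n\nnisl id tempor ultricies.\n\nInteger vel accumsan elit."
print(join_broken_lines(sample))
```

A pattern like this will miss some cases and wrongly join others (for example a heading followed by a lowercase line), which is why stepping through the matches one at a time is safer than Replace All.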
Old 12-02-2020, 11:43 AM   #15
michaelbr
Quote:
Originally Posted by theducks
I have 5 'join type' regex search-and-replace (S&R) expressions that I use.
I only ever run 2 of them in Replace All mode. The other 3 I step through, doing a Find (i.e. skip) whenever the visual shows the replacement should not be applied.
It takes me 5-10 minutes to do a whole book. There are ALWAYS a couple of cases where the keyboard Del or Enter key is needed.
Take some time and learn regex (there are a couple of tutorials here at MR). It will pay off and make fixing so much easier than doing multiple conversions while trying to fine-tune the heuristics settings.
Thanks for this tip, theducks. I'm starting to have the same feeling: there are too many settings to tweak to get the desired result. If I learn regex, it will solve my problems not only on this topic but also in other places and apps where regex is required.

Tags
linebreak, regex, txt to epub conversion




All times are GMT -4.

