#1 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
|
Newbie text to epub conversion
I'm new to calibre and conversion. I'm trying to gather some pages/information from a website and put them together into an "epub" ebook. I tried converting with calibre's default settings, but unfortunately it didn't work. I searched the internet and YouTube for tutorials and found very few on this topic. Can someone please give me a hint about where to start learning?

The trouble I'm having in converting is this.

Original text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris gravida urna ac vulputate efficitur. Duis ultrices nisl id tempor ultricies. Ut feugiat metus a ornare aliquam. Integer vel accumsan elit, in facilisis libero. Vestibulum mollis justo ut dictum tempor. Maecenas euismod dui sed feugiat auctor. Aenean at accumsan mauris.

After conversion:
<p class="calibre2">Lorem ipsum dolor sit amet, consectetur adipiscing elit. </p>
<p class="calibre2">Mauris gravida urna ac vulputate efficitur. Duis ultrices </p>
<p class="calibre2">nisl id tempor ultricies. Ut feugiat metus a ornare aliquam. </p>
<p class="calibre2">Integer vel accumsan elit, in facilisis libero.</p>
<p class="calibre2">Vestibulum mollis justo ut dictum tempor. Maecenas euismod </p>
<p class="calibre2">dui sed feugiat auctor. Aenean at accumsan mauris. </p>

It seems that, for some reason, the web page generated a line break after each line instead of at the end of each paragraph. It doesn't seem too complicated to solve, but I have no clue how.

Last edited by michaelbr; 11-28-2020 at 07:24 AM.
#2 |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
You probably want to enable the heuristics processing option for conversions that have that problem.
#5 |
Connoisseur
Posts: 81
Karma: 10
Join Date: Aug 2010
Location: Murcia/Spain
Device: Android 12
|
Thanks for the tips. Is there a tutorial anywhere that explains tweaks for conversion? I've found only one on YouTube, and it's a very short one that uses one regex to solve a specific problem.
#6 |
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
|
When you go to convert a text file, a "TXT Input" settings panel becomes available on the left side of the convert screen.

If your input text has indentation at each paragraph, or if there is a blank line between paragraphs, you can use the "Paragraph style" dropdown. "Block" will use blank lines to determine actual paragraphs, and "Print" will use indentations. (Look at the tool tips.)

If your input text is just a long list of short lines, with no indication of where actual paragraphs should be, then the heuristic processing is your next best bet. It will use the length of short lines to try to guess the paragraph boundaries. It may give pretty good results, but it will not be perfect.

How well any of this works depends on the consistency of the input text. If you have inconsistent indentations or blank lines, or if there are, by chance, many paragraphs that end in long lines rather than shorter ones, you should edit the input text first to get good results.

If you are doing a copy/paste to gather text from a web page, your best bet is to paste it into Word or Writer first, fix it up there, and then convert the word processor doc to epub.
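To make the difference between the two paragraph styles concrete, here is a rough Python sketch of the idea. It is not calibre's actual code; the function names `block_paragraphs` and `print_paragraphs` are made up for illustration, and real input is usually messier than this assumes.

```python
import re


def block_paragraphs(text: str) -> list[str]:
    """'Block' idea: a blank line separates paragraphs."""
    blocks = re.split(r"\n\s*\n", text.strip())
    # Join the short lines inside each block into one paragraph.
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()
    ]


def print_paragraphs(text: str) -> list[str]:
    """'Print' idea: an indented line starts a new paragraph."""
    paragraphs: list[str] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines in this style
        if line[:1] in (" ", "\t") or not paragraphs:
            paragraphs.append(line.strip())
        else:
            # Unindented line: continuation of the current paragraph.
            paragraphs[-1] += " " + line.strip()
    return paragraphs
```

Either function only works if the input is consistent about its blank lines or indentation, which is the same caveat as above.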
#9 |
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
|
Yes, conversion will create a separate line for each line or paragraph in the text file that has a CR at the end (carriage return, same as hitting the Enter key at the end of a sentence or line or paragraph). If you remove those CRs in your text file and prepare the text without the CRs (when they are in the wrong places), the Calibre conversion will turn out the way you want it.
#11 |
Grand Sorcerer
Posts: 5,727
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
@michaelbr you really might want to give dotepub a try. It's much easier to remove a couple of ads than having to deal with each single line.
If you're a Firefox user, also try enabling the Reader View option, which will remove a lot of clutter. (It's also available as a Chrome extension.)
#12 |
Book E d i t o r
Posts: 432
Karma: 288184
Join Date: May 2015
Device: Laptop
|
Yes, remove manually in a text editor, but I would try this first: Find \n\n and replace it with \n. Hopefully, the line space that should be there after a paragraph would be at \n\n\n.
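If the text editor at hand doesn't handle \n in search and replace, the same first pass can be sketched in a few lines of Python. This is only an illustration of the suggestion above, assuming the text really is double-spaced throughout; the function name `halve_double_newlines` is made up.

```python
def halve_double_newlines(text: str) -> str:
    """Turn every "\\n\\n" into a single "\\n" in one left-to-right pass."""
    # Normalise Windows line endings first so "\r\n\r\n" is handled too.
    text = text.replace("\r\n", "\n")
    # str.replace does not rescan its own output, so a run of three
    # newlines becomes two: one pair collapses, one newline is left over.
    return text.replace("\n\n", "\n")


sample = "line one\n\nline two\n\n\nNew paragraph"
print(repr(halve_double_newlines(sample)))
```

With that sample, the double-spaced lines end up separated by a single newline and the real paragraph break survives as a blank line.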
#13 | |
Evangelist
Posts: 450
Karma: 3886916
Join Date: May 2013
Location: Ontario, Canada
Device: Kindle KB, Oasis, Pop_Os!, Kobo Forma
|
Replace \n\n with some unused symbol, like #
Replace \n with a space
Replace the # with \n
Replace space space with space a few times to get rid of multiple spaces.
It's almost always messier than just that, but that is the basic process.
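For anyone who would rather script it than do it by hand in an editor, the same basic process could look roughly like this in Python. It is a minimal sketch, assuming real paragraphs are separated by exactly one blank line; `unwrap_paragraphs` and its `marker` parameter are names invented for the example.

```python
import re


def unwrap_paragraphs(text: str, marker: str = "#") -> str:
    """Join hard-wrapped lines, keeping blank-line paragraph breaks."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings
    if marker in text:
        # The placeholder must not already occur in the text.
        raise ValueError(f"marker {marker!r} already appears in the text")
    text = text.replace("\n\n", marker)  # 1. protect real paragraph breaks
    text = text.replace("\n", " ")       # 2. join the wrapped lines
    text = text.replace(marker, "\n")    # 3. restore the paragraph breaks
    return re.sub(r" {2,}", " ", text)   # 4. collapse runs of spaces


wrapped = "Lorem ipsum \ndolor sit\n\nNew para\nhere"
print(unwrap_paragraphs(wrapped))
```

The regex in the last step does in one go what repeating the "space space" replacement a few times does by hand. Extra blank lines, headings, and lists would still need cleaning up separately.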
#14 |
Well trained by Cats
Posts: 31,047
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I have 5 'Join type' REGEX S&R that I use.
Only 2 do I ever run in ALL mode. The other 3 I step through, and I do a Find (aka skip) if the visual shows it should not be applied. It takes me 5-10 minutes to do the whole book. There are ALWAYS a couple of cases.
Take a few and learn REGEX (there are a couple of tutorials here at MR). It will pay off and make fixing so much easier than doing multiple conversions while trying to fine-tune the Heuristics settings.
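The actual five expressions aren't given in this thread, so the following is only a guess at what one "join type" search and replace might look like, written as a Python sketch against the `calibre2` paragraphs from the first post. The pattern, `JOIN_LOWERCASE`, and both function names are hypothetical.

```python
import re

# Hypothetical join rule: one paragraph ends in a lowercase letter, comma or
# semicolon and the next starts with a lowercase letter, so the break is
# probably a wrapped line rather than a real paragraph end.
JOIN_LOWERCASE = re.compile(
    r'([a-z,;])\s*</p>\s*<p class="calibre2">\s*([a-z])'
)


def list_join_candidates(html: str) -> list[str]:
    """Return each matched break so it can be reviewed one at a time."""
    return [m.group(0) for m in JOIN_LOWERCASE.finditer(html)]


def join_broken_paragraphs(html: str) -> str:
    """Apply the rule everywhere, like running the S&R in ALL mode."""
    return JOIN_LOWERCASE.sub(r"\1 \2", html)


html = (
    '<p class="calibre2">Duis ultrices </p> '
    '<p class="calibre2">nisl id tempor ultricies. </p> '
    '<p class="calibre2">Integer vel accumsan elit.</p>'
)
print(join_broken_paragraphs(html))
```

A rule this strict leaves breaks after a full stop alone. Those are the ambiguous cases that are better stepped through by eye instead of replaced all at once.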
Tags: linebreak, regex, txt to epub conversion