MobileRead Forums > E-Book Software > Calibre > Conversion
Old 11-29-2016, 08:57 AM   #1
citac
Fanatic
Posts: 550
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
.txt to .epub conversion with option to remove extra paragraph breaks

I haven't used calibre in a long time, but recently I found myself with some spare time to try and convert my ancient .txt files, something I've been putting off forever. They are mostly fics saved years ago from various websites, LJs, forums etc, and many have unnecessary paragraph breaks in the middle of a line.

I received advice a long time ago on how to do this without regexp, and I did that for a while, but it was time-consuming and I set it aside. I am now hoping that someone can guide me through setting up calibre so that it does the conversion while adding these fics to my library. Would this be in the Conversion settings, under Input options, TXT options, or in Common options, perhaps Heuristic processing, or will I need to go to Search & Replace and use regexp?

Thank you in advance!
Old 11-29-2016, 09:46 AM   #2
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
(NB: I prefer the REGEX way. No trial-and-error fuss, just a bit of tedium, and I get to look at each change as it happens.)

Preferences > Common options > Heuristic processing (with Enable ticked) > Unwrap lines: the default is 0.40; adjust it smaller in TINY amounts (<= 0.05).

Remember that Preferences only affects the FIRST TIME a book is converted. After that, use the conversion (start) dialog to make adjustments (or to clear/reset the previously used values).
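
For anyone curious what the regex route looks like outside calibre, here is a minimal Python sketch of the idea. It is not calibre's own heuristic, and the helper name `unwrap_lines` is made up for illustration. It assumes a break is unwanted when the text before it does not end a sentence and the text after it starts with a lowercase letter, which will misfire on some files.

```python
import re

# Hypothetical helper, not part of calibre.
# A break is treated as "mid-sentence" when the character before it is not
# sentence-ending punctuation or a closing quote, and the next line starts
# with a lowercase letter.
_MID_SENTENCE_BREAK = re.compile(
    r'(?<=[^\s.!?:;"\u201d])[ \t]*\n+[ \t]*(?=[a-z])'
)

def unwrap_lines(text: str) -> str:
    """Join lines that were broken in the middle of a sentence."""
    text = text.replace("\r\n", "\n")  # normalise Windows line endings first
    return _MID_SENTENCE_BREAK.sub(" ", text)

sample = "She opened the\n\ndoor and looked\noutside.\n\nNothing moved."
print(unwrap_lines(sample))
```

Real paragraph breaks (after a full stop, or before a capital letter) are left alone, so it is worth checking a few files by eye before trusting it on a batch.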
Old 11-29-2016, 02:06 PM   #3
citac
Fanatic
Thank you! I will try that with single files before trying more of them at once.

I'm also open to learning how to use regex for this; I admit that seeing the changes as they happen also sounds very appealing to me. I read the stickies on using the search & replace function and the intro to regex, but it's still a bit over my head.

I did try loading a few of the .txt files to my tablet, and I noticed some of them also have a random character (a question mark in a diamond) instead of quote marks and apostrophes, so I need to figure out how to fix that too. Some of the files are corrupt beyond help - those I will either have to find and redownload, or give up on as a lot of the original sites are long gone.
Old 11-29-2016, 02:33 PM   #4
dwig
Wizard
Posts: 1,613
Karma: 6718479
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
Quote:
Originally Posted by citac
...

I did try loading a few of the .txt files to my tablet, and I noticed some of them also have a random character (a question mark in a diamond) instead of quote marks and apostrophes, so I need to figure out how to fix that too. ...
This is very likely a character encoding fault. It will likely require a Trial-and-Terror approach to converting the files using various guesses as to the encoding.
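
To illustrate where the question-mark-in-a-diamond comes from, here is a small Python sketch. It assumes the old file was saved as Windows-1252 ("ANSI") and is being read as UTF-8; the actual encoding of the files in question is unknown, so treat that pairing as an example only.

```python
# A curly apostrophe as an old Windows-1252 ("ANSI") text file would store it.
raw = "don\u2019t".encode("cp1252")

# A reader that assumes UTF-8 cannot make sense of that byte and substitutes
# U+FFFD, the replacement character usually drawn as a question mark in a diamond.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the file was actually written in recovers the text.
right = raw.decode("cp1252")

print(repr(raw), repr(wrong), repr(right))
```

Plain letters survive either way, which is why only the quote marks and apostrophes look broken.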

Last edited by dwig; 11-29-2016 at 02:36 PM.
Old 11-29-2016, 04:19 PM   #5
citac
Fanatic
Quote:
Originally Posted by dwig
This is very likely a character encoding fault. It will likely require a Trial-and-Terror approach to converting the files using various guesses as to the encoding.
Even if it's just a copy/paste from the storage folder to a folder on the tablet? I haven't done any conversion yet, nor sent the file to the tablet via calibre; I just wanted to see what a raw .txt file would look like when read on the tablet (Lenovo Tab 2 A7) with Moon+ Reader. Should I be looking into the settings of the reader app?
Old 11-29-2016, 05:58 PM   #6
dwig
Wizard
Quote:
Originally Posted by citac
Even if it's just a copy/paste from the storage folder to a folder on the tablet? I haven't done any conversion yet, nor sent the file to the tablet via calibre; I just wanted to see what a raw .txt file would look like when read on the tablet (Lenovo Tab 2 A7) with Moon+ Reader. Should I be looking into the settings of the reader app?
Yes, but it seems the issue is that the reading software you've chosen for viewing the TXT file is assuming one encoding and the file was created with another.

The solution may well be opening the antique TXT file in a more flexible text editor (e.g. TextWrangler for Mac, Notepad++ for Windows, ...) and using its option to (re)open the file with a specific encoding (trial-and-terror time) until you find the one that works, and then resaving the file as UTF-8, which is the common choice today for web pages and ebooks.
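
The same trial-and-terror loop can be sketched in Python for anyone who prefers a script to an editor. The candidate list and the function name `decode_with_guesses` are assumptions for illustration, not a recommendation for any particular file.

```python
# Assumed guesses, tried in order. latin-1 accepts every byte, so it goes last
# as a catch-all and can "succeed" with the wrong characters.
CANDIDATES = ("utf-8", "cp1252", "cp1250", "latin-1")

def decode_with_guesses(raw: bytes, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this guess did not fit; try the next one
    raise ValueError("none of the candidate encodings could decode the data")

text, used = decode_with_guesses(b"don\x92t")
print(used, text)
```

A clean decode only means the bytes were legal in that encoding, not that the guess was right, so the result still needs a look before it is resaved as UTF-8.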
Old 11-29-2016, 06:50 PM   #7
BetterRed
null operator (he/him)
Posts: 20,457
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
On Windows you may not need to install a 'more flexible' text editor: Notepad has a couple of encoding options, including UTF-8; the default is ANSI.

[Attached image: Clipboard01.png]

That said, having a decent text editor such as those dwig suggests is almost mandatory these days.

The editor I use when I'm bored with NP++ offers an encoding of Modern Greek in IBM EBCDIC.

BR
Old 12-01-2016, 04:26 AM   #8
citac
Fanatic
Thank you all for your suggestions! Does Notepad++ offer an option to save these files in the UTF-8 encoding in bulk? Otherwise I'll just have to devote half an hour here and there and do them one by one - which would be ok except then I want to start re-reading them and half an hour turns into three...
Old 12-01-2016, 06:20 AM   #9
BetterRed
null operator (he/him)
Quote:
Originally Posted by citac
Does Notepad++ offer an option to save these files in the UTF-8 encoding in bulk?
Yes.

But rather than wasting your time waiting for someone in the Calibre Forum to provide detailed instructions on how to use Notepad++, why not search for something like 'notepad++ batch convert encoding', or ask at the Notepad++ Forum?
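
If a script is acceptable instead of Notepad++, a bulk resave can also be sketched in Python. This is a different tool from the one asked about. The function name `resave_folder_as_utf8` is hypothetical, and it assumes every file in the folder uses the same source encoding (cp1252 here as a placeholder), which may not hold for files collected from many sites.

```python
from pathlib import Path

def resave_folder_as_utf8(src_dir, dst_dir, source_encoding="cp1252"):
    """Copy every .txt file from src_dir to dst_dir, re-encoded as UTF-8."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob("*.txt")):
        # Decode with the assumed source encoding; a file that does not fit
        # raises UnicodeDecodeError rather than being silently mangled.
        text = path.read_bytes().decode(source_encoding)
        # Write bytes directly so the original line endings are preserved.
        (dst / path.name).write_bytes(text.encode("utf-8"))
        converted.append(path.name)
    return converted
```

Writing to a separate output folder keeps the originals untouched, so a wrong encoding guess costs nothing.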

BR
Old 12-01-2016, 09:00 AM   #10
citac
Fanatic
I was planning to this weekend, thank you. Right now I'm checking this thread during short breaks at work so I wanted to clarify this one option.