11-29-2016, 08:57 AM | #1 |
Fanatic
Posts: 556
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
.txt to .epub conversion with option to remove extra paragraph breaks
I haven't used calibre in a long time, but recently I found myself with some spare time to try and convert my ancient .txt files, something I've been putting off forever. They are mostly fics saved years ago from various websites, LJs, forums etc, and many have unnecessary paragraph breaks in the middle of a line.
I received advice a long time ago on how to do this without regexp, and I did that for a while, but it was time-consuming and I set it aside. I am now hoping someone can guide me through setting up calibre so that it does the conversion while these fics are being added to my library. Would this be in the Conversion settings, Input options, TXT options, or in Common options, perhaps Heuristic processing, or will I need to go to Search & Replace and use regexp? Thank you in advance! |
11-29-2016, 09:46 AM | #2 |
Well trained by Cats
Posts: 30,365
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
(NB: I prefer the regex way. No trial-and-error fuss, just a bit of tedium, and I get to look at each change as it happens.)
Preferences: Common options: Heuristic processing: tick Enable, then adjust Unwrap lines. The default is 0.40; adjust it downward in TINY amounts (<= 0.05). Remember that Preferences only affects the FIRST TIME a book is converted. After that, use the conversion (start) dialog to make adjustments (or to clear/reset the previously used values). |
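For anyone curious what the regex route might look like outside calibre, here is a rough Python sketch. The rule it uses is an assumption, not calibre's heuristic: a break is treated as spurious when the text before it ends in a letter, digit, or comma and the text after it starts with a lowercase letter. The names `SPURIOUS_BREAK` and `unwrap` are made up for illustration.

```python
import re

# Assumed rule (not calibre's own): a line or paragraph break is spurious when
# it follows a word character or comma and is followed by a lowercase letter.
SPURIOUS_BREAK = re.compile(r"(?<=[\w,])[ \t]*\n\s*(?=[a-z])")

def unwrap(text: str) -> str:
    """Replace each spurious break with a single space."""
    return SPURIOUS_BREAK.sub(" ", text)

sample = "She opened the\n\ndoor, and\nleft.\n\nThen it rained."
print(unwrap(sample))
```

A rule like this is deliberately conservative: a real paragraph break after a full stop is left alone, but a spurious break that happens to fall before a capitalised name will also be left alone, so the result still needs a read-through.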
11-29-2016, 02:06 PM | #3 |
Fanatic
Posts: 556
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Thank you! I will try that with single files before trying more of them at once
I'm also open to learning how to use regex for this; I admit seeing the changes as they happen also sounds very appealing to me. I read the stickies on using the search & replace function and the intro to regex, but it's still a bit over my head. I did try loading a few of the .txt files to my tablet, and I noticed some of them also have a random character (a question mark in a diamond) instead of quote marks and apostrophes, so I need to figure out how to fix that too. Some of the files are corrupt beyond help - those I will either have to find and redownload, or give up on, as a lot of the original sites are long gone. |
11-29-2016, 02:33 PM | #4 |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
This is very likely a character encoding fault. It will likely require a Trial-and-Terror approach to converting the files using various guesses as to the encoding.
Last edited by dwig; 11-29-2016 at 02:36 PM. |
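To illustrate how the question-mark-in-a-diamond can arise, here is a small Python sketch. It assumes, purely for the example, that the text was originally saved in Windows-1252 (`cp1252`) and is being read as UTF-8; the actual encoding of the files in this thread is unknown.

```python
# Assumption for illustration: the file was written as Windows-1252 (cp1252).
original = "“Don’t,” she said."
raw = original.encode("cp1252")  # curly quotes become the single bytes 0x93, 0x92, 0x94

# Reading those bytes as UTF-8 fails on each curly-quote byte; with
# errors="replace" every bad byte turns into U+FFFD, the "diamond" character.
wrong = raw.decode("utf-8", errors="replace")

# Reading them with the encoding they were written in recovers the text.
right = raw.decode("cp1252")

print(wrong)
print(right)
```

Plain letters survive either way because they are encoded identically in both encodings, which is why only the quote marks and apostrophes look broken.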
11-29-2016, 04:19 PM | #5 |
Fanatic
Posts: 556
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Even if it's just a copy/paste from storage folder to a folder on the tablet? I haven't done any conversion yet, nor sent the file to tablet via calibre, just wanted to see what a raw .txt file would look like on the tablet (Lenovo Tab 2 A7) with Moon+ Reader. Should I be looking into the settings of the reader app?
|
11-29-2016, 05:58 PM | #6 | |
Wizard
Posts: 1,613
Karma: 6718541
Join Date: Dec 2004
Location: Paradise (Key West, FL)
Device: Current:Surface Go & Kindle 3 - Retired: DellV8p, Clie UX50, ...
|
Quote:
The solution may well be to open the antique TXT file in a more flexible text editor (e.g. TextWrangler for Mac, Notepad++ for Windows, ...) and use its option to (re)open the file with a specific encoding (trial-and-terror time) until you find the one that works, then resave the file as UTF-8, which is the common choice today for web pages and ebooks. |
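The same trial-and-error can be sketched in Python for those who prefer a script to an editor. The candidate list and the names `CANDIDATES`, `decode_guess`, and `resave_as_utf8` are assumptions for illustration; the right candidates depend on where the files came from.

```python
from pathlib import Path

# Assumed candidates, tried in order. "latin-1" accepts any byte sequence,
# so it goes last as a catch-all.
CANDIDATES = ("utf-8-sig", "cp1252", "latin-1")

def decode_guess(raw: bytes, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")

def resave_as_utf8(src: Path, dst: Path) -> str:
    """Write a UTF-8 copy of src to dst and report which encoding was guessed."""
    text, enc = decode_guess(src.read_bytes())
    dst.write_bytes(text.encode("utf-8"))  # bytes, so line endings stay as they were
    return enc
```

A clean decode only means the bytes were valid in that encoding, not that the guess was right, so it is still worth opening the result and checking the quote marks by eye.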
|
11-29-2016, 06:50 PM | #7 |
null operator (he/him)
Posts: 20,932
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
On Windows you may not need to install a 'more flexible' text editor: Notepad has a couple of Encoding options, including UTF-8; the default is ANSI.
That said, having a decent text editor such as those dwig suggests is almost mandatory these days. The editor I use when I'm bored with NP++ offers an encoding of Modern Greek in IBM EBCDIC BR |
12-01-2016, 04:26 AM | #8 |
Fanatic
Posts: 556
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
Thank you all for your suggestions! Does Notepad++ offer an option to save these files in the UTF-8 encoding in bulk? Otherwise I'll just have to devote half an hour here and there and do them one by one - which would be ok except then I want to start re-reading them and half an hour turns into three...
|
12-01-2016, 06:20 AM | #9 | |
null operator (he/him)
Posts: 20,932
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
But rather than wasting your time waiting for someone in the Calibre Forum to provide detailed instructions on how to use Notepad++, why not search for something like 'notepad++ batch convert encoding', or ask at the Notepad++ Forum. BR |
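This is not a Notepad++ answer, but a batch conversion can also be sketched in a few lines of Python. It assumes every file in the folder shares one known source encoding (`cp1252` here is only a placeholder guess), and the names `to_utf8_bytes` and `convert_folder` are hypothetical.

```python
from pathlib import Path

def to_utf8_bytes(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

def convert_folder(src_dir: Path, dst_dir: Path, source_encoding: str = "cp1252"):
    """Write UTF-8 copies of every .txt in src_dir into dst_dir.

    Originals are left untouched. Returns the names of files that could not
    be decoded with the assumed encoding, so they can be handled by hand.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(src_dir.glob("*.txt")):
        try:
            converted = to_utf8_bytes(src.read_bytes(), source_encoding)
        except UnicodeDecodeError:
            failed.append(src.name)
            continue
        (dst_dir / src.name).write_bytes(converted)
    return failed
```

Writing to a separate output folder keeps the originals intact, which matters when the encoding guess turns out to be wrong for some of the files.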
|
12-01-2016, 09:00 AM | #10 |
Fanatic
Posts: 556
Karma: 1020204
Join Date: Sep 2008
Location: Bosnia and Herzegovina
Device: Lenovo Yoga Tab 2 (Android)
|
I was planning to this weekend, thank you. Right now I'm checking this thread during short breaks at work so I wanted to clarify this one option.
|
Tags |
conversion, conversion problems, txt to epub conversion |