12-28-2009, 03:17 PM | #1 |
Enthusiast
Posts: 26
Karma: 10
Join Date: Dec 2009
Location: Toronto
Device: Kobo Forma, Kobo Aura H2O Edition 2
|
Beginner: Converting lit to epub-linefeed h*ll
I'm a bit overwhelmed at the discussions here, and I've not been able to find a "how-to" guide that starts from a fundamental point, so I'll try a thread here and hope I'm not too far off topic.
I got a Sony PRS-300 for Christmas. I have a series of books that are in .lit format. I added these files to Calibre and let it do whatever it does to make them work on my Reader. I find that there is a full space after each line, hard hyphenations, and occasional blank pages. The TOC is empty, and the metadata appears weird, i.e. it's sorting by the author's first name. I tried opening the file in Sigil, and it apparently doesn't like it when you "select all" on my iMac 2.16GHz with Snow Leopard. I installed eCub and noticed that every line has its own entry in the XHTML files, so I tried to remove all the
Code:
<p class="MsoPlainText">
What I'm trying to do is see what the book would look like without the linefeeds, except perhaps at the paragraph. I don't care too much about justification at this time. I would like to build the TOC, or find it in the original lit and migrate it, but I can't seem to open the lit file with anything that runs on Mac. I know that Sony only sees the metadata if it's added in a certain way; do I need to jiggle the settings in Calibre's converter? Or do I need to go to Windows to fix this in the lit file beforehand? |
12-29-2009, 09:02 AM | #2 |
Member
Posts: 13
Karma: 64
Join Date: Nov 2009
Location: S. Ontario, Canada
Device: Jetbook, Sony PRS-505
|
I have a Windows VB app I wrote to remove extra breaks in Epubs. It creates paragraphs at a user set interval - not as good as the proper paragraphs but at least it is readable.
If you are interested I can send it to you. |
12-29-2009, 10:58 AM | #3 |
Grand Sorcerer
Posts: 6,199
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Hi PhyrePhox,
I may be wrong but it looks as if your source LIT file is poor. It looks as if it's been created from a plain text file in MSWord which has not had its "hard line breaks" removed before conversion to LIT. My advice would be to try to clean up the source rather than editing the EPUB. This is the approach I would take.
If you need more help I would be happy to take a look at your LIT file and give more specific help based on what I see. |
12-31-2009, 09:54 AM | #4 | |
Enthusiast
Posts: 26
Karma: 10
Join Date: Dec 2009
Location: Toronto
Device: Kobo Forma, Kobo Aura H2O Edition 2
|
Thank you for this. If I give up on editing on the Mac I will contact you for this script.
Quote:
I now have a single html file, the "title" is set to the first chapter name, with very short lines (probably page width from a Word doc), with "</p>" and a hard return on each line. Not fun, but at least I have a place to work from now. Is there a summary somewhere here of what html tags are meaningful for ebooks? Also, how can I feed the resulting html back into Calibre to convert to epub? |
|
12-31-2009, 05:24 PM | #5 | ||
Grand Sorcerer
Posts: 6,199
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Hi PhyrePhox, Me again ...
Anyway, that's enough from me for the time being. I don't know how much is relevant for your circumstances but feel free to ask if you think I could help. Happy New Year |
||
01-01-2010, 10:50 AM | #6 |
Addict
Posts: 227
Karma: 2530
Join Date: Dec 2009
Device: PRS-505, iPad
|
Jackie_w, that is some good info. I'll just add a few of my usual tricks for fixing messed-up formatting. Even a good .lit file needs some massaging to make a good epub. I find it's worth a little effort to fix up a text before reading it, and while I'm reading one book I can be working on another. Once you get better at it, you can fix even the most messed-up text in about 30 minutes.
Do a search for " " or whatever is used for the spoken text. This will find lines with two different people talking (the end of the first person and the beginning of the second). Then break the lines up so the story flows better. I hate when two people are talking on the same line :0
Another big one is attaching broken paragraphs. The obvious sign is a hard return followed by a lowercase letter (and vice versa). Using a regex in Word, you can find these and either mass change the results or just find and fix them one at a time. Be careful with the mass change: things like lyrics or special messages would get caught in this find, but you wouldn't want to attach those lines.
For example, a regex search for a hard return, ^13, and then a lowercase letter, [a-z], would be ^13([a-z]). The parentheses let you reuse what was found in the replacement. The replacement would be ' \1' without the quotes, i.e. a space and then \1. This replaces the return with a space and puts back the lowercase letter it found, so the lines join up. Doing the search the other way, ([a-z])^13, helps find broken lines as well as missing endings like periods. Its replacement would be '\1 ' without the quotes.
Then clean up the extra spacing by searching for ^p^p (2 returns, or as many as you are looking for) and replacing with ^p. You can then select the whole text and set the margins and line spacing to something you like.
In Word I find it better to save a copy as html (filtered). Even with the crappy MS additions, Calibre will build a very accurate result in epub. You can also copy and paste it into Open Office and save the result as .html, and it will have even less baggage, but I've not seen any benefit in the resulting epub. Or you can use something like notepad++ and, with some experience, wipe out all the extraneous html tagging. I usually leave it as the MS Word filtered html unless I want a standard .html file.
Be sure to look at the Calibre options for removing spaces between paragraphs. Even with your html page looking right, this can help stop extra spaces from creeping in. Good luck! |
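For anyone without Word, the hard-return search-and-replace described above can be sketched in Python. This is only a rough equivalent, assuming the book text is already in a plain string; the function name `join_broken_paragraphs` is made up for this example, and the second rule is deliberately limited to a single return so that blank lines between paragraphs are not swallowed.

```python
import re

def join_broken_paragraphs(text: str) -> str:
    """Rejoin lines that were split mid-sentence by hard returns."""
    # Normalise Windows and old Mac line endings to "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Hard return followed by a lowercase letter: swap the return for a space
    # (the Word search ^13([a-z]) replaced with " \1").
    text = re.sub(r"\n([a-z])", r" \1", text)
    # Lowercase letter followed by one hard return: the line was cut short
    # (the Word search ([a-z])^13 replaced with "\1 "). The lookahead is an
    # assumption added here so that blank-line paragraph gaps are left alone.
    text = re.sub(r"([a-z])\n(?!\n)", r"\1 ", text)
    # Collapse runs of two or more returns into one (^p^p replaced with ^p).
    text = re.sub(r"\n{2,}", "\n", text)
    return text

sample = "The quick brown\nfox jumped over the\nLazy dog.\n\n\nNext paragraph here."
print(join_broken_paragraphs(sample))
```

As with the mass change in Word, this will also join lyrics or other text that is meant to stay on separate lines, so check the result rather than trusting it blindly.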
01-02-2010, 04:49 PM | #7 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
|
04-07-2010, 04:03 AM | #8 |
Junior Member
Posts: 3
Karma: 10
Join Date: Apr 2010
Device: iPad
|
Hi All -
I'm also a beginner, and trying to get my head around this. I have a problem almost exactly the opposite of the original poster - no line breaks seem to be translated over when converting from LIT to ePub with one of my files. Following the advice to save debug files, I opened the .htm file in the Input directory with Dreamweaver and saw that while the text appeared correctly, the entire body of the text was encapsulated in a single <pre> tag. I assume this can't possibly be right. How would you suggest I go about correcting this? Thanks in advance! Alex |
04-07-2010, 11:18 AM | #9 | ||||
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
You can get halfway there if you just want to stick with Calibre. Set the metadata correctly for the lit file before performing the conversion, and Calibre will use the corrected values for the ePub. Tick the checkbox I referred to earlier and tick the 'Preprocess input file...' box on the structure detection page. If it's still not picking up the chapters properly then you'll have to look at the debug output and work out an XPath expression that will catch them. |
||||
04-07-2010, 11:31 AM | #10 | |
Wizard
Posts: 1,196
Karma: 1281258
Join Date: Sep 2009
Device: PRS-505
|
Quote:
The only way to fix this is to search through the text and try to find what sort of marker was being used to indicate paragraphs instead of marking the text properly. This depends on the species of monkey who prepared the original. Old-World monkeys may do this by inserting a number of spaces at the start of a new paragraph to simulate a text indent (the use of a <pre> tag suggests this might be what you're dealing with). New-World monkeys may use <br> tags to indicate paragraphs. Once you've discovered what sort of ridiculous scheme they used, you'll need to search for these elements and replace them with </p><p> pairs in order to reconstruct a proper markup (obviously you'll need to check the start and end of text blocks as well and add the proper opening or closing tags). Luckily, publishers have generally phased out the use of monkeys to encode their ebooks, but many still try to save money by employing chimpanzees and gorillas. A few are grudgingly starting to employ actual human beings to do this. |
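As a rough illustration of the search-and-replace idea above, here is a minimal Python sketch for the "leading spaces" case only. It assumes you have already copied out the raw contents of the <pre> element as a string and that every new paragraph starts with at least two spaces; the function name `pre_text_to_paragraphs` and the `min_indent` parameter are invented for this example. A file that uses <br> tags or some other marker would need a different split pattern.

```python
import re

def pre_text_to_paragraphs(pre_text: str, min_indent: int = 2) -> str:
    """Wrap indent-marked paragraphs from a <pre> block in <p> tags."""
    # Split wherever a line begins with at least `min_indent` spaces,
    # i.e. the fake text indent that marks a new paragraph.
    chunks = re.split(r"(?m)^ {%d,}" % min_indent, pre_text)
    paragraphs = []
    for chunk in chunks:
        # Join the hard-wrapped lines of one paragraph with single spaces.
        joined = " ".join(chunk.split())
        if joined:  # skip the empty chunk before the first indent
            paragraphs.append("<p>" + joined + "</p>")
    return "\n".join(paragraphs)

raw = "  First paragraph line one\nline two.\n  Second paragraph.\n"
print(pre_text_to_paragraphs(raw))
```

The text is passed through unchanged apart from whitespace, so any entities already present in the source stay as they are.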
|
04-07-2010, 03:14 PM | #11 | |
Enthusiast
Posts: 25
Karma: 10
Join Date: Sep 2009
Location: Tennessee
Device: Kobo Aura HD
|
Quote:
This finds paragraphs that end in anything other than a period, a quote mark, an exclamation point, or a question mark. Of course, it has to be modified if the text uses curly quote marks, single quotes, or some other tag (like </span>, for instance) between the end of the text and the </p> |
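A pattern of the kind described above might look like the following in Python. This is a sketch and not the expression from the post: it assumes straight double quotes and that </p> follows the text directly, so it needs the same adjustments for curly quotes, single quotes, or a closing tag such as </span>. The names `SUSPECT_END` and `suspect_paragraphs` are made up for the example.

```python
import re

# A closing </p> whose preceding character is not . ! ? or a straight quote.
SUSPECT_END = re.compile(r'[^.!?"]</p>')

def suspect_paragraphs(xhtml: str) -> list[str]:
    """Return the <p> elements that do not end in terminal punctuation."""
    # Non-greedy match of each <p ...>...</p>; re.S lets "." span line breaks.
    paragraphs = re.findall(r"<p(?:\s[^>]*)?>.*?</p>", xhtml, flags=re.S)
    return [p for p in paragraphs if SUSPECT_END.search(p)]

page = '<p>He stopped.</p>\n<p class="x">and then went on</p>\n<p>"Why?"</p>'
for p in suspect_paragraphs(page):
    print(p)
```

Each hit is only a candidate for a broken paragraph; headings or verse set in <p> tags will show up too and have to be judged by eye.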
|