07-21-2006, 07:19 AM | #1 |
Fulfilled but not by iRex
Posts: 932
Karma: 286846
Join Date: May 2006
Location: London
Device: Far too many
|
RTF conversion.
Because it bugs me...
Anyone know of a tool which will strip font and size tags from an RTF file but leave the bold and italic tags in place? It would aid me in converting my RTF files to HTML (the font and size tags bloat the end file if you convert straight: for example, I stripped 1.4MB of unnecessary crud from one I was playing with yesterday and dropped the file size from 2.5MB to 1.1MB). I know I saw such a tool when I was searching for an RTF/HTML conversion tool, but unfortunately I didn't grab it at the time, and now cannot find it. |
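For the specific request above, here is a minimal Python sketch (not the tool the poster remembers). It walks the RTF token by token and drops the \fN (font change) and \fsN (font size) control words outside the font table group, leaving \b, \i and everything else alone. The name strip_font_tags is made up for illustration, and the sketch assumes a plain RTF without embedded \bin binary data; other font-ish words such as colour (\cfN) are not touched.

```python
import re

# One RTF token: a control word (lowercase letters, optional signed number,
# optional delimiting space), a control symbol such as \{ or \', a brace,
# or a run of plain text.
TOKEN = re.compile(r"\\([a-z]+)(-?\d+)?( ?)|\\.|[{}]|[^\\{}]+", re.S)


def strip_font_tags(rtf: str) -> str:
    r"""Drop \fN (font) and \fsN (size) control words outside the font table.

    Sketch only: assumes no \bin binary data; \b, \i and all other
    control words are copied through unchanged.
    """
    out = []
    depth = 0
    fonttbl_depth = None  # brace depth of the {\fonttbl ...} group while inside it
    bare_word = False     # last emitted token was a control word with no trailing space
    for m in TOKEN.finditer(rtf):
        tok, word, number, space = m.group(0), m.group(1), m.group(2), m.group(3)
        if tok == "{":
            depth += 1
        elif tok == "}":
            if depth == fonttbl_depth:
                fonttbl_depth = None  # leaving the font table
            depth -= 1
        elif word == "fonttbl":
            fonttbl_depth = depth
        elif word in ("f", "fs") and number is not None and fonttbl_depth is None:
            # Dropping "\f1 " from "\b\f1 Hello" must keep one space,
            # otherwise "\b" would run into "Hello".
            if bare_word and space:
                out.append(" ")
                bare_word = False
            continue
        out.append(tok)
        bare_word = word is not None and not space
    return "".join(out)
```

Reading and writing the file with encoding="latin-1" round-trips every byte unchanged, so the cleaned text can be saved back the same way before handing it to an RTF-to-HTML converter.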
07-21-2006, 07:24 AM | #2 |
iLiad freak
Posts: 339
Karma: 243
Join Date: Apr 2006
Location: Mallorca, Spain
Device: iRex iLiad
|
There's a tool called... Tidy something-or-other (TidyHTML? TidyUI? I have it at home), which is really cool for cleaning up code and also works great for cleaning up general MS Word crud. It's free, so you can check it out. Just google it; that's how I found it.
|
|
07-21-2006, 07:38 AM | #3 |
Fulfilled but not by iRex
Posts: 932
Karma: 286846
Join Date: May 2006
Location: London
Device: Far too many
|
Doh! Why didn't I think of that? I was so focused on finding something to "fix" the input file that I didn't think of "fixing" the output (even though I was trying to do that manually).
That looks perfect, I will have a play and tell you all how it does. |
07-21-2006, 07:55 AM | #4 |
iLiad freak
Posts: 339
Karma: 243
Join Date: Apr 2006
Location: Mallorca, Spain
Device: iRex iLiad
|
Hehehe... you're welcome, I found it easy to use.
|
07-24-2006, 04:37 PM | #5 |
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2006
Device: TH55
|
Where can I find this Tidy?? program?
Thanks |
|
07-24-2006, 05:07 PM | #6 |
Connoisseur
Posts: 93
Karma: 549
Join Date: Jul 2006
Location: Amsterdam
Device: Palm Zire
|
It's the first link in Google. Do you know what Google is?
|
07-25-2006, 08:27 AM | #7 |
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2006
Device: TH55
|
By the sarcasm I assume it is either tidyhtml http://www.tucows.com/preview/206197 or tidy ui http://www.forums.devnetwork.net/vie...e6b53b800f978c
|
07-25-2006, 12:59 PM | #8 |
Connoisseur
Posts: 93
Karma: 549
Join Date: Jul 2006
Location: Amsterdam
Device: Palm Zire
|
Sarcasm?
|
08-01-2006, 10:37 AM | #9 |
Pac-Man caught my iLiad.
Posts: 807
Karma: 3595
Join Date: Apr 2006
Location: Germany; next to Baltic Sea
Device: Boox Max Lumi, iRex iLiad (RIP)
|
If you are a fan of the almighty LaTeX, give rtf2LaTeX a try. It works fine. http://sourceforge.net/projects/rtf2latex2e
|
08-01-2006, 10:47 AM | #10 | |
Wizard
Posts: 1,018
Karma: 67827
Join Date: Jan 2005
Device: PocketBook Era
|
Quote:
1. I convert the RTF into an HTML file.
2. Reload the HTML file back into OpenOffice.
3. I use the source view to do a Find/Replace on all the offending tags.
4. Then I convert the HTML into a regular OpenOffice file to save it.
5. And finally, I export to PDF to put it on my iLiad. |
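Step 3 is essentially a scripted find/replace. As a rough stand-in (not what the poster actually ran, and not HTML Tidy), this Python sketch removes font tags and inline style attributes from the exported HTML while keeping b/i tags and the text itself. strip_font_markup is a hypothetical name, and regular expressions are not an HTML parser: this assumes fairly simple markup, and a literal style="..." appearing in the prose would also be removed.

```python
import re

# Opening or closing <font ...> tags; the text between them is kept.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# Inline style="..." or style='...' attributes, including the leading whitespace.
STYLE_ATTR = re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_font_markup(html: str) -> str:
    """Drop <font> tags and inline styles but keep <b>, <i> and the text."""
    html = STYLE_ATTR.sub("", html)
    return FONT_TAG.sub("", html)
```

Running it before reloading the file into OpenOffice would replace most of the manual Find/Replace pass; anything odder than font tags and inline styles still needs a human look.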
|
08-02-2006, 05:13 PM | #11 |
Member
Posts: 12
Karma: 10
Join Date: Jul 2006
|
I just got rtf2latex2e compiled for OS X and used it to convert a Baen RTF. It helped some, but I have to say it's simply not very good. I had to deal with a great number of unbalanced environment tags (italics started but never ended) and a large section of boldface which was not visible in the RTF. This may be because the RTF file included some formatting badness, but I think the above suggestion to use OpenOffice to convert to xhtml is better.
Unfortunately, the XHTML generated by OpenOffice uses CSS heavily, so it's not always obvious what markup to substitute. Italics are not done with an i tag; they're a p tag with a CSS class. Still, it puts the text in a format that's at least workable. The final issue is replacing straight double and single quotes with proper text quotes, for which I'm working on a heuristic script (you can't just count on there being left-right quote pairs, since when a quotation runs over several paragraphs, each paragraph traditionally starts with an opening quote but only the last one ends with a closing quote). |
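One way to script the italics substitution, sketched in Python under loud assumptions: read the style block, collect class names whose rule contains font-style: italic, then wrap the content of p or span elements carrying exactly that class in i tags. The function names, the ".P3 { ... }" selector layout and the single-class class="..." form are assumptions for illustration, not a description of OpenOffice's real output, and an element nested inside another of the same tag will defeat the non-greedy match.

```python
import re

# Assumed layout of the style rules, e.g. ".P3 { font-style:italic; }"
# or "p.P3 {...}"; any leading element name is ignored.
RULE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")
STYLE_BLOCK = re.compile(r"<style[^>]*>(.*?)</style>", re.S | re.I)


def italic_classes(css: str) -> set:
    """Class names whose rule contains font-style: italic."""
    return {name for name, body in RULE.findall(css)
            if re.search(r"font-style\s*:\s*italic", body, re.I)}


def css_italics_to_i(html: str) -> str:
    """Wrap the content of <p>/<span> elements with an italic class in <i>.

    Assumes class="NAME" holds exactly one class and that the element does
    not contain another element of the same tag (the match is non-greedy).
    """
    css = " ".join(STYLE_BLOCK.findall(html))
    for cls in italic_classes(css):
        pattern = re.compile(r'<(p|span) class="%s">(.*?)</\1>' % re.escape(cls), re.S)
        html = pattern.sub(
            lambda m: "<%s><i>%s</i></%s>" % (m.group(1), m.group(2), m.group(1)), html)
    return html
```

Classes that are not italic (bold, alignment, and so on) are left untouched, so the same idea could be repeated for font-weight: bold with b tags.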
08-02-2006, 05:33 PM | #12 | |
Gizmologist
Posts: 11,615
Karma: 929550
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Pocketbook Touch HD3
|
Quote:
Why is this such a big deal? It doesn't bother me at all if it's a "" instead of “” -- either way I get that it's a quote.... Is it just a matter of preference, or am I missing something here? As a suggestion to address this, wouldn't it be a “ if it has a non-whitespace character after it, and a ” otherwise? Maybe that helps with the find/replace. I think I'd try searching for "<whitespace> and replacing all those with ”<whitespace>, and then searching for all the remaining " and replacing them with “.
Last edited by NatCh; 08-02-2006 at 05:40 PM. |
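NatCh's two-step find/replace can be written as a short Python sketch (naive_smart_quotes is a hypothetical name): a straight double quote followed by whitespace or the end of the text becomes a closing quote, and every straight quote left over becomes an opening quote.

```python
import re


def naive_smart_quotes(text: str) -> str:
    # Step 1: a straight quote followed by whitespace (or the end) closes a quote.
    text = re.sub(r'"(?=\s|$)', "\u201d", text)
    # Step 2: every straight quote still left is treated as an opening quote.
    return text.replace('"', "\u201c")
```

A quote followed by a dash, as in "Stop!"---he said, is not followed by whitespace, so this rule turns it into an opening quote. That is the kind of case it gets wrong.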
|
08-03-2006, 05:51 AM | #13 |
Fulfilled but not by iRex
Posts: 932
Karma: 286846
Join Date: May 2006
Location: London
Device: Far too many
|
JSC: I would suspect the original file. I had similar problems with size, justification, etc. when converting them.
NatCh: If you are missing something, then so am I. I find "zzz", ''zzz'' and “zzz” almost indistinguishable, so it's a matter of personal preference IMO. |
08-03-2006, 11:48 AM | #14 | |
Gizmologist
Posts: 11,615
Karma: 929550
Join Date: Jan 2006
Location: Republic of Texas Embassy at Jackson, TN
Device: Pocketbook Touch HD3
|
Quote:
|
|
08-08-2006, 11:28 AM | #15 |
Member
Posts: 12
Karma: 10
Join Date: Jul 2006
|
NatCh, your suggestion about spaces before or after is correct, and I've been doing that, but there are situations where it does not apply. This is especially true of the first author I've been converting, who uses a lot of m-dashes to interject comments within speeches: you get a lot of ---"text and text"---, and you cannot just assume whether the text is quotation or commentary.
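A variation on NatCh's rule that refuses to guess in the dash case: decide from the neighbouring characters (treating the start and end of the text as whitespace), and when there is non-space text on both sides, leave the straight quote in place and record its position for manual review. Because it never tries to pair quotes, the multi-paragraph convention (an opening quote on each paragraph, a closing one only at the end) comes out right. smart_quotes and the OPENERS/CLOSERS character sets are assumptions for illustration, not anything the posters wrote.

```python
OPEN, CLOSE = "\u201c", "\u201d"
OPENERS = "([{"            # a quote right after one of these is treated as opening
CLOSERS = ")]}.,;:!?"      # a quote right before one of these is treated as closing


def smart_quotes(text: str):
    """Return (converted_text, positions_of_quotes_left_straight)."""
    out, ambiguous = [], []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "             # start of text counts as whitespace
        nxt = text[i + 1] if i + 1 < len(text) else " "  # so does the end of text
        if prev.isspace() or prev in OPENERS:
            out.append(OPEN)
        elif nxt.isspace() or nxt in CLOSERS:
            out.append(CLOSE)
        else:
            # Non-space on both sides, e.g. ---"aside"---: could open or close.
            out.append(ch)
            ambiguous.append(i)
    return "".join(out), ambiguous
```

The returned positions give a short list to check by hand instead of proofreading every quote in the book.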
Why bother? Well, I brought up text quotes specifically in relation to rtf2latex2e. If anyone is going to bother using LaTeX, they are more likely to care about the niggling details of fine typography, making ebooks look like books and not just text files. Text quotes are just one such detail, along with the proper use of hyphens, n-dashes and m-dashes, ligatures, proper spacing after sentences but not after abbreviations, non-breaking spaces, widows and orphans, etc. Thankfully, LaTeX takes care of most of those things automatically, but not the quotes. That's useful only if one cares. I'm not a type-fascist myself, but the iLiad screen is so nice that I thought I might expend the effort on at least a few books, just so I have something worthy of the screen. That said, I find the plain PDF output from OO reads just fine as well, and the manybooks.net output for iLiad looks really very good. |
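If the cleaned text is headed into LaTeX, the curly quotes then have to become LaTeX's own input conventions: `` and '' for double quotes, ` and ' for single quotes (including apostrophes). A minimal Python sketch, with to_latex_quotes as a hypothetical helper built on str.translate:

```python
def to_latex_quotes(text: str) -> str:
    """Map curly quotes to LaTeX's ``...'' and `...' input conventions."""
    table = str.maketrans({
        "\u201c": "``",  # left double quote
        "\u201d": "''",  # right double quote
        "\u2018": "`",   # left single quote
        "\u2019": "'",   # right single quote / apostrophe
    })
    return text.translate(table)
```

Any straight quotes left undecided by an earlier pass pass through unchanged, so they still stand out when proofreading the LaTeX source.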
|