MobileRead Forums > E-Book Formats > Workshop

08-29-2009, 01:42 AM   #1
ahi
Wizard
Posts: 1,790
Karma: 507333
Join Date: May 2009
Device: none
pacify.py (Text reformatter / RTF extractor)

Updated download as of 2009-08-29 12:27 EST
Updated download as of 2009-08-29 23:20 EST

Very much work in progress, and rather unpolished... but it's already turning out to be incredibly helpful to me in some eBook preparation, so I thought I would share it here.

Suggested use:

Produce HTML from text:
pacify.py -i input.txt -pcq

Produce LaTeX from text:
pacify.py -i input.txt -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

Produce HTML from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcq

Produce LaTeX from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

The RTF extraction isn't very sophisticated yet... but should work fine with simple, straightforward RTF files.

- Ahi
Attached Files
File Type: zip pacify.zip (7.0 KB, 390 views)

Last edited by ahi; 08-31-2009 at 12:06 PM.
08-29-2009, 07:55 AM   #2
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 T1 NSTG iLiad_v2 NC Asus_TF Next1 WPDN
Thanks for this code. It looks like it does a very nice job of an arduous task.

I ran the python script with just -r and it worked, but with -pqr it crashed yielding this error message:
Code:
E:\ebooks\Coding_Mobi2IMP_PDFRead\pacify>pacify.py -i 157.txt -pqr
pacify v0.2 - Copyright 2009 Pax Librorum (www.PaxLibrorum.com)

Right-trimming lines...

Parsing...

  line break: nr (2643)
  paragraph break: nrnr (823)
  ...:  (46)

Removing intraparagraph linebreaks...

Fixing erroneous paragraph breaks...

Traceback (most recent call last):
  File "E:\ebooks\Coding_Mobi2IMP_PDFRead\pacify\pacify.py", line 802, in <module>
    main()
  File "E:\ebooks\Coding_Mobi2IMP_PDFRead\pacify\pacify.py", line 111, in main
    theTome = parfixTome(theTome)
  File "E:\ebooks\Coding_Mobi2IMP_PDFRead\pacify\pacify.py", line 440, in parfixTome
    if strLowerAlpha.find(theTome[idx+2][0:1]) > -1:
IndexError: list index out of range
Could you also create a Windows executable, pacify.exe, for those here who don't do Python? I did create one myself using py2exe and this setup.py code (used in PDFRead):
Code:
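# setup.py: build pacify.exe with "python setup.py py2exe"
from distutils.core import setup
import py2exe  # importing py2exe adds the "py2exe" command to distutils

setup(
    name = 'Pacify',
    description = 'Text reformatter / RTF extractor - Copyright 2009 Pax Librorum (www.PaxLibrorum.com)',
    version = '0.2',
    author = 'ahi',
    author_email = 'http://www.paxlibrorum.com/contact/',  # a contact page URL, not an e-mail address
    console = ['pacify.py'],  # build a console (command-line) executable
    # COM type library option carried over from PDFRead's setup.py
    options = {"py2exe": {"typelibs": [('{1103EA00-3A0C-11D3-A6F6-00104B2947FB}',0,1,0)]}},
)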
Attached Files
File Type: txt 157.txt (235.0 KB, 323 views)
File Type: txt output.txt (233.2 KB, 579 views)
File Type: txt output-smartquotes.txt (233.7 KB, 330 views)

Last edited by nrapallo; 08-29-2009 at 08:07 AM. Reason: added smart quotes version of output.txt
08-29-2009, 11:28 AM   #3
ahi
Wizard
Thanks, nrapallo!

I'll take a look at your crash report and see what's going on. I'm getting semi-regular crashes (always on certain files) myself--crashes that I suspect could be fixed by a bit more thorough/clever preprocessing of files.

And yes, I'll make an .exe of it for the next upload.

This script is a step short of my crazy (and formerly aired) idea of turning text files into databases with walkable nodes representing all words, sentences, paragraphs, et al.

The next things I am going to try to get working are 1) part/chapter/section title detection and 2) poetry/quotation detection.

I see both of those working either in an overzealous automatic mode (assumes anything that might be a title or a quotation *is* one, and the user/bookmaker will restore formatting if it isn't) or in an interactive one, where Python informs the user of the match it thinks it found and lets the user instruct it how to handle said potential match.

e.g.:

Code:
Potential title match:

-2:        and so she left.
-1:        
 0:        III. On the way to Istanbul
 1:        
 2:        The friar did not hesitate to purchase a ticket on the next ship, perhaps because

   Encode line 0 as [P]art/H1, [C]hapter/H2, [S]ection/H3 or [I]gnore?
   Enter  choice: _
I'm also hoping I'll get around to tidying up the code a bit at some point...
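The two modes described above could share one detection pass and differ only in who makes the decision. A minimal sketch, assuming headings look like "III. Title" or "Chapter N" and sit between blank lines; TITLE_RE, find_title_candidates, encode_titles and the h1/h2/h3 mapping are hypothetical, not taken from pacify.py:

```python
import re
from typing import Callable, List

# Assumed heading shapes: "III. Some title", "Chapter 12", "PART TWO", ...
# Roman numerals are matched case-sensitively so words like "Civil." don't hit.
TITLE_RE = re.compile(
    r"^\s*(?:(?i:chapter|part|book|section)\s+\w+|[IVXLC]+\.)(?:\s+\S.*)?$"
)

# Prompt answers mapped to HTML heading tags; None means "ignore".
LEVELS = {"p": "h1", "c": "h2", "s": "h3", "i": None}


def find_title_candidates(lines: List[str]) -> List[int]:
    """Indices of lines that match TITLE_RE and sit between blank lines."""
    hits = []
    for i, line in enumerate(lines):
        blank_before = i == 0 or not lines[i - 1].strip()
        blank_after = i == len(lines) - 1 or not lines[i + 1].strip()
        if blank_before and blank_after and TITLE_RE.match(line):
            hits.append(i)
    return hits


def encode_titles(lines: List[str],
                  choose: Callable[[List[str], int], str]) -> List[str]:
    """Wrap each candidate in the heading tag picked by `choose`."""
    out = list(lines)
    for i in find_title_candidates(lines):
        tag = LEVELS.get(choose(lines, i).strip().lower()[:1])
        if tag:
            out[i] = f"<{tag}>{lines[i].strip()}</{tag}>"
    return out


def overzealous(lines: List[str], i: int) -> str:
    """Automatic mode: every candidate becomes a chapter heading."""
    return "c"


def interactive(lines: List[str], i: int) -> str:
    """Interactive mode: show two lines of context either side, then ask."""
    print("Potential title match:\n")
    for j in range(max(0, i - 2), min(len(lines), i + 3)):
        print(f"{j - i:>2}:        {lines[j]}")
    return input("\n   Encode line 0 as [P]art/H1, [C]hapter/H2, "
                 "[S]ection/H3 or [I]gnore? ")
```

Passing overzealous gives the automatic mode and interactive gives the prompt shown above; keeping the decision in its own function means new heading patterns only touch TITLE_RE.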

- Ahi
08-29-2009, 11:41 AM   #4
ahi
Wizard
I will figure out the cause of the error... but I should perhaps note, nrapallo (in case it is unclear to you or others), that the -p option corrects erroneous paragraph breaking, not systematic paragraph breaking.

The program, regardless of whether the -p option is used, fixes systematic paragraph breaking like:

Code:
Here I am!  I travelled yesterday for four hours in a train.  It's a

funny sensation, isn't it?  I never rode in one before.



College is the biggest, most bewildering place--I get lost whenever I

leave my room.  I will write you a description later when I'm feeling

less muddled; also I will tell you about my lessons.  Classes don't

begin until Monday morning, and this is Saturday night.  But I wanted

to write a letter first just to get acquainted.
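One way to undo this kind of systematic breaking, sketched under the assumption that the most frequent run of newlines is the wrap inside a paragraph and anything longer is a real paragraph break (unwrap is a hypothetical name, and pacify.py may detect breaks differently):

```python
import re
from collections import Counter
from typing import List


def unwrap(text: str) -> List[str]:
    """Rejoin hard-wrapped lines into paragraphs.

    Heuristic: the most common run of newlines is the wrap inside a
    paragraph; any longer run is a paragraph break.
    """
    # Right-trim every line so whitespace-only lines count as empty.
    text = "\n".join(line.rstrip() for line in text.splitlines()).strip()
    if not text:
        return []
    parts = re.split(r"(\n+)", text)   # text, newlines, text, newlines, ...
    runs = Counter(len(sep) for sep in parts[1::2])
    if len(runs) < 2:
        # Only one kind of separator: treat every separator as a paragraph break.
        line_break = 0
    else:
        line_break = runs.most_common(1)[0][0]
    paragraphs, current = [], [parts[0].strip()]
    for sep, chunk in zip(parts[1::2], parts[2::2]):
        if len(sep) > line_break:
            paragraphs.append(" ".join(current))
            current = []
        current.append(chunk.strip())
    paragraphs.append(" ".join(current))
    return paragraphs
```

On a file like the second sample that follows, the three-newline gap after "I'm feeling" is longer than the usual two-newline wrap, so this rule would read it as a paragraph break; that is exactly the case -p exists to repair.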
What -p would fix would be if the same lines were thus:

Code:
Here I am!  I travelled yesterday for four hours in a train.  It's a

funny sensation, isn't it?  I never rode in one before.



College is the biggest, most bewildering place--I get lost whenever I

leave my room.  I will write you a description later when I'm feeling


less muddled; also I will tell you about my lessons.  Classes don't

begin until Monday morning, and this is Saturday night.  But I wanted

to write a letter first just to get acquainted.
The -p option would detect that the "paragraph" ending with "... I'm feeling" and the following one beginning with "less muddled; also ..." are almost certainly supposed to be a single paragraph.
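The detection described above can be approximated with a simple join rule. The check in the traceback from post #2 (strLowerAlpha.find(theTome[idx+2][0:1])) suggests pacify.py looks at whether the following paragraph starts with a lowercase letter; the end-punctuation test, the TERMINAL set and fix_paragraph_breaks are assumptions for illustration:

```python
from typing import List, Tuple

# Characters that plausibly end a paragraph (straight and curly quotes included).
TERMINAL = (".", "!", "?", ":", ")", '"', "'", "\u201d", "\u2019")


def fix_paragraph_breaks(paras: List[str]) -> Tuple[List[str], List[str]]:
    """Join a paragraph to the previous one when the previous one doesn't end
    in terminal punctuation and this one starts with a lowercase letter.
    Returns the fixed paragraphs plus a report of each join."""
    fixed: List[str] = []
    report: List[str] = []
    for para in paras:
        prev = fixed[-1].rstrip() if fixed else ""
        # Slicing ([:1]) is safe on empty strings, unlike indexing ([0]).
        if prev and not prev.endswith(TERMINAL) and para.lstrip()[:1].islower():
            report.append(f"joined ...{prev[-25:]!r} + {para[:25]!r}...")
            fixed[-1] = prev + " " + para.lstrip()
        else:
            fixed.append(para)
    return fixed, report
```

Returning a report alongside the result mirrors the way pacify.py tells the user what it changed, so false joins are easy to spot.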

While the -p option is good to use (once it works reliably) on all files "just in case" (and since it reports to the user what it changes, you'll know if it corrects something in error)... a file that has no such erroneous paragraph breaks could be nicely processed with:

pacify.py -i input.txt -cq

Doing so with 157.txt yields the attached. At first look, it seems to work rather nicely, smartening up all single quotes without interfering with or being confused by apostrophes... though if and when you find it messed up somewhere in this file, nrapallo, do let me know. It almost certainly gets it wrong if there is a word like 'tis that begins with a single quote--though since there are not many such words, it's not unreasonable for my program to keep a list of those so it knows to treat them correctly.
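A sketch of the approach described (treat a quote between letters as an apostrophe, and keep a short list of words like 'tis that start with one). The word list, the four-step order and smarten_single_quotes are assumptions, not pacify.py's implementation:

```python
import re

# Assumed (incomplete) list of words that really do start with an apostrophe.
LEADING_APOSTROPHE = {"'tis", "'twas", "'twere", "'twill", "'em", "'til"}

LSQUO, RSQUO = "\u2018", "\u2019"


def smarten_single_quotes(text: str) -> str:
    # 1. Listed elisions ('tis, 'em, ...) take an apostrophe, i.e. a right quote.
    def elision(m):
        word = m.group(0)
        return RSQUO + word[1:] if word.lower() in LEADING_APOSTROPHE else word

    text = re.sub(r"(?<!\w)'\w+", elision, text)
    # 2. A quote between two word characters is an apostrophe: don't, it's.
    text = re.sub(r"(?<=\w)'(?=\w)", RSQUO, text)
    # 3. A quote at line start or after whitespace, an opening bracket or a
    #    double quote opens a quotation.
    text = re.sub(r"(?:^|(?<=[\s(\[\"\u201c]))'", LSQUO, text, flags=re.M)
    # 4. Anything left closes a quotation (or is a trailing possessive: boys').
    return text.replace("'", RSQUO)
```

Abbreviated years such as '90s would still come out with an opening quote, so the exception list would need to grow with the texts it meets.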

- Ahi
Attached Files
File Type: txt output.txt (230.9 KB, 357 views)
08-29-2009, 12:27 PM   #5
ahi
Wizard
Corrected file uploaded...

... it was actually cutting off final lines, and the error nrapallo found related to that.

Seems ok now.
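The fix itself isn't posted, so as an illustration only: the failing expression in the traceback reads two items ahead without checking the list length. A guarded version might look like this; the alternating paragraph/break layout of tome is an assumption inferred from the idx+2 step, and both helpers are hypothetical:

```python
import string
from typing import List


def peek_first_char(tome: List[str], idx: int) -> str:
    """First character of tome[idx], or "" when idx is out of range."""
    if 0 <= idx < len(tome):
        return tome[idx][:1]
    return ""


def next_starts_lowercase(tome: List[str], idx: int) -> bool:
    """Guarded version of the check in the traceback:
    strLowerAlpha.find(theTome[idx+2][0:1]) > -1
    """
    ch = peek_first_char(tome, idx + 2)
    # str.find("") returns 0, so an empty string would pass a bare "> -1"
    # test; require an actual character.
    return ch != "" and ch in string.ascii_lowercase
```

The second comment points at a related trap: with the original find() test, an empty item would count as "starts with a lowercase letter".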

- Ahi

Last edited by ahi; 08-29-2009 at 11:20 PM.
08-29-2009, 11:34 PM   #6
ahi
Wizard
Updated pacify.py ... see attached or first post.

---

Now pacify.py by default produces HTML... of sorts. Don't worry, I'll add back in the functionality to output UTF-8 plaintext when I get around to it.

The caveat: no <p> tags are produced, although if you look at the source, paragraphs are separated by line breaks, so you can add the <p> tags easily enough with some clever search and replace.
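The search and replace could look like this minimal sketch, assuming each non-empty line of the output is one paragraph and that headings or other block-level tags already sit on their own lines (add_paragraph_tags is a hypothetical name):

```python
import re

# Lines already starting with one of these block-level tags are left as-is.
BLOCK_TAG = re.compile(r"<(?:h[1-6]|p|div|blockquote|ol|ul|li)\b", re.IGNORECASE)


def add_paragraph_tags(body: str) -> str:
    """Wrap every non-empty line in <p>...</p>.

    Inline markup such as <i>...</i> is kept, not escaped.
    """
    out = []
    for line in body.splitlines():
        text = line.strip()
        if not text:
            continue
        out.append(text if BLOCK_TAG.match(text) else f"<p>{text}</p>")
    return "\n".join(out)
```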

Also, footnotes extracted from RTF files are enclosed in <footnote>some text here</footnote> for the sake of simplicity--this will be fixed.
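How the footnotes will be fixed isn't stated. One common target, assumed here, is a numbered superscript link in the text plus an ordered list of notes at the end; convert_footnotes and the ref/fn id scheme are hypothetical:

```python
import re


def convert_footnotes(html_body: str) -> str:
    """Turn inline <footnote>...</footnote> into numbered links plus a notes list."""
    notes = []

    def to_ref(m):
        notes.append(m.group(1).strip())
        n = len(notes)
        return f'<sup><a id="ref{n}" href="#fn{n}">{n}</a></sup>'

    body = re.sub(r"<footnote>(.*?)</footnote>", to_ref, html_body, flags=re.S)
    if not notes:
        return body
    # Each note links back to its reference mark (&#8617; is a return arrow).
    items = "\n".join(
        f'<li id="fn{n}">{text} <a href="#ref{n}">&#8617;</a></li>'
        for n, text in enumerate(notes, start=1)
    )
    return f'{body}\n<ol class="footnotes">\n{items}\n</ol>'
```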

Oh, and at present only formatting from RTF is picked up, so plain-text _emphasized phrase_ style formatting is not recognized.
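Recognizing plain-text _emphasis_ could be a single regular-expression pass. A sketch, assuming underscores only mark emphasis when they sit at word boundaries (so identifiers like snake_case_name are left alone); EMPH_RE and underscores_to_em are hypothetical:

```python
import re

# Opening "_" not preceded by a word character, content that neither starts
# nor ends with whitespace, closing "_" not followed by a word character.
EMPH_RE = re.compile(r"(?<!\w)_(?!\s)([^_\n]+?)(?<!\s)_(?!\w)")


def underscores_to_em(text: str) -> str:
    return EMPH_RE.sub(r"<em>\1</em>", text)
```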

---

The input is autodetected as either .txt or .rtf based on the file extension. Many RTFs work well, but I regularly encounter ones that prove problematic. Since RTF seems to be a rather large and unwieldy specification, I am not sure how likely I am to be able to guarantee accurate conversion.

If anybody has advice on how I can make my RTF parser cleverly ignore stuff that it doesn't care about, I'd be grateful. It does all right so far... but since I do not yet understand how I could opt to process only visible text (as opposed to metadata), I am filtering out metadata one RTF command at a time... doubtless the wrong way to do it, I know.
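A common answer to the question above is to skip whole destination groups rather than individual commands: RTF marks optional destinations with \*, and metadata such as the font table, colour table, stylesheet and \info block each live in their own {...} group, so tracking brace depth lets a parser drop everything inside them. A minimal sketch with extension-based input detection; the destination list is a partial, illustrative subset, \uN Unicode escapes and formatting such as \i and \b are not handled, and all function names are hypothetical:

```python
import os
import re
from typing import List

# Destinations whose contents are metadata rather than visible text. This is
# an illustrative subset, not the complete list from the RTF specification.
SKIP_DESTINATIONS = {
    "fonttbl", "colortbl", "stylesheet", "info", "pict",
    "header", "footer", "listtable", "listoverridetable",
}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word with optional numeric parameter
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \* \{ \} \\ \~
    r"|([{}])"                   # group start / end
    r"|[\r\n]+"                  # raw newlines carry no text in RTF
    r"|([^\\{}\r\n]+)"           # run of plain text
)


def detect_input_type(path: str) -> str:
    """'rtf' for *.rtf files, otherwise 'txt' (extension-based)."""
    return "rtf" if os.path.splitext(path)[1].lower() == ".rtf" else "txt"


def rtf_to_text(rtf: str) -> str:
    out: List[str] = []
    stack: List[bool] = []   # saved "skipping" state for each open group
    skipping = False
    for m in TOKEN.finditer(rtf):
        word, _param, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            skipping = stack.pop() if stack else False
        elif skipping:
            continue
        elif symbol == "*" or word in SKIP_DESTINATIONS:
            skipping = True   # drop everything up to this group's closing brace
        elif word in ("par", "line"):
            out.append("\n")
        elif word == "tab":
            out.append("\t")
        elif hexcode:
            # Assumes the common \ansicpg1252 code page.
            out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)
        elif symbol == "~":
            out.append("\u00a0")   # non-breaking space
        elif text:
            out.append(text)
    return "".join(out)
```

Footnotes are also destinations (\footnote), so they are deliberately not in the skip set; a real extractor would route them to separate output instead of inlining them as this sketch does.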

The output defaults to HTML unless the -l switch (LaTeX) is used. The LaTeX switch now requires an argument... currently only supports -l gppro though.

Also, you should provide the title (-T "..."), author (-A "lastname, firstname"), and optionally a subtitle (-S "...") so that the generated LaTeX document has a nice title page. Optionally you can also specify your name (-I "...") for an "Ex Libris ..." inscription at the bottom of the title page.
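The options used throughout this thread can be collected into one parser. This is a hypothetical reconstruction with Python's argparse, not necessarily how pacify.py reads its options; the thread doesn't spell out what -c, -q and -r each do, so they are left as undocumented switches:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="pacify.py",
                                description="Text reformatter / RTF extractor")
    p.add_argument("-i", dest="input", required=True,
                   help="input file; .txt or .rtf, detected by extension")
    p.add_argument("-p", action="store_true",
                   help="fix erroneous paragraph breaks")
    # Meanings of -c, -q and -r are not documented in the thread.
    p.add_argument("-c", action="store_true")
    p.add_argument("-q", action="store_true")
    p.add_argument("-r", action="store_true")
    p.add_argument("-l", dest="latex", choices=["gppro"],
                   help="emit LaTeX using the named template (default: HTML)")
    p.add_argument("-T", dest="title", help="book title (LaTeX title page)")
    p.add_argument("-A", dest="author", help='author as "Lastname, Firstname"')
    p.add_argument("-S", dest="subtitle", help="optional subtitle")
    p.add_argument("-I", dest="ex_libris", help='name for the "Ex Libris" line')
    return p
```

argparse accepts bundled short flags such as -pcql gppro, as long as only the last flag in the bundle takes a value.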

Some parts of the program are a bit more robust now... so you are less likely to encounter errors, but they will almost certainly still happen if the file is very messy (or, I suppose, just very different from the ones I have tested with).

Comments, reports, suggestions are appreciated.

---

Suggested use:

Produce HTML from text:
pacify.py -i input.txt -pcq

Produce LaTeX from text:
pacify.py -i input.txt -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

Produce HTML from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcq

Produce LaTeX from RTF (preserving italic/bold formatting + footnotes):
pacify.py -i input.rtf -pcql gppro -T "Title of Book" -A "Lastname, Firstname" -S "a jolly good tale" -I "Ahi"

- Ahi
Attached Files
File Type: zip pacify.zip (7.0 KB, 305 views)
08-31-2009, 02:51 AM   #7
sherman
Guru
Posts: 850
Karma: 2641698
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
I haven't tried this script yet, but I will at some point when I have some text files that need working on.

Just an idea for the distant future - and I don't know if this is even feasible - but a lot of text files do not have any markup whatsoever. It would be such a timesaver if a script/program could be written to automate the task of adding italics/emphasis to text. Stuff like internal dialogue (I doubt this is possible he mused) and telepathic-type conversations. Maybe even ship/aircraft/<insert vehicle> names.

Yeah, I know. I'm probably dreaming...
08-31-2009, 11:36 AM   #8
ahi
Wizard
Sherman, assuming there are reasonably consistent rules that define what you want in italics, this might not even be hard.

The example you gave though is far too vague.

Quote:
I doubt this is possible he mused
to

Quote:
_I doubt this is possible_ he mused
Nothing indicates that "he mused" is not just a natural part of the sentence... as in: "She always admired the way he mused." Not the most sensible statement... but certainly you wouldn't want any part of that italicized as internal dialogue.

The other issue is that "mused" doesn't necessarily indicate internal dialogue. He could have been musing aloud to somebody else.

Having said that... if you wanted to italicize all sentences that end with ", s/he thought", ", s/he thought to him/herself" or ", s/he wondered.", that's doable. The problem is the potential for considerable variety.

The only way I see this being doable is via a method where, on a first pass, the program produces a list of sentences that it believes (based on whatever sort of pattern matching) to be candidates for italicizing as internal dialogue.

The user would then go over this list, and take out all false positives, and rerun the program for the italicization to take place based on the previously produced and now corrected list file.

Not sure how reliable this would be for the specific thing you are proposing it for... but the general principle might work for other similar tasks.
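The two-pass workflow might be sketched like this. The tag pattern (", he/she thought." and a few variants), the naive sentence splitter and the function names are all assumptions; the candidate list is passed around as a string here, so saving it to a file for hand-editing is left to the caller:

```python
import re
from typing import List

# Hypothetical pattern: "<thought>, he/she thought." and close variants.
TAG = re.compile(
    r"^(?P<inner>[^\"\u201c\u201d]+?),\s+(?:he|she)\s+"
    r"(?:thought|wondered)(?:\s+to\s+(?:him|her)self)?\.$"
)


def split_sentences(text: str) -> List[str]:
    """Very naive sentence splitter: break after . ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pass_one(text: str) -> str:
    """First pass: the candidate list, one sentence per line, for the user
    to save, prune of false positives, and hand back."""
    hits = [s for s in split_sentences(text) if TAG.match(s)]
    return "".join(s + "\n" for s in hits)


def pass_two(text: str, reviewed_list: str) -> str:
    """Second pass: italicize the thought (not the tag) in each kept sentence."""
    for sentence in filter(None, (ln.strip() for ln in reviewed_list.splitlines())):
        m = TAG.match(sentence)
        if m:
            italic = f"<i>{m.group('inner')}</i>{sentence[m.end('inner'):]}"
            text = text.replace(sentence, italic)
    return text
```

Deleting a line from the list between the passes is all it takes to reject a false positive; anything the pattern never matched still needs to be caught by hand.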

- Ahi
08-31-2009, 09:01 PM   #9
sherman
Guru
Hmm, when there's a sentence (or more than one) that includes "I" without quotes of any type, this might not be so hard. The program would have to check it against the entire paragraph, though, to ensure it is not part of a conversation.

Also, if a sentence ends in something like "...s/he/<character name> thought.", and again that sentence is not detected within quotes, chances are it could be internal dialogue.
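These two rules could serve as the candidate filter for a first pass like the one ahi describes. A loose sketch: looks_like_internal_dialogue, the names parameter and the treatment of quotes (double quotes only, since single quotes are usually apostrophes) are assumptions:

```python
import re
from typing import Sequence

# Double quotes only; a single quote is too often an apostrophe.
QUOTE_CHARS = ('"', "\u201c", "\u201d")


def looks_like_internal_dialogue(sentence: str, paragraph: str,
                                 names: Sequence[str] = ()) -> bool:
    """Loose first-pass filter: the paragraph has no quotation marks, and the
    sentence either contains the word "I" or ends with "<he|she|name> thought."."""
    if any(q in paragraph for q in QUOTE_CHARS):
        return False  # quotation marks suggest the paragraph is a conversation
    subjects = "|".join(["he", "she", *(re.escape(n) for n in names)])
    ends_with_thought = re.search(rf"\b(?:{subjects})\s+thought\.$", sentence, re.I)
    has_first_person = re.search(r"\bI\b", sentence)
    return bool(ends_with_thought or has_first_person)
```

Run against the examples in this thread, it would flag Jellby's first-person sentence from the following posts (a false positive) and miss the servant line below (a false negative), which is why the human review step still matters.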

A harder one to catch is this sort of situation: "...That's the sixth servant he's sent screaming so far today.", where that was the end of some internal dialogue. There may or may not be preceding internal dialogue with it, but I doubt it would be so simple for an automatic script or program to catch.
08-31-2009, 09:22 PM   #10
ahi
Wizard
If you post a 4-5 paragraph sample text, I can tell you how readily scriptable it would be.

- Ahi
09-01-2009, 04:55 AM   #11
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I seriously doubt this can be automated in any way. Recognizing sentences and parsing their meaning goes far beyond simple scripting and into artificial intelligence.

Take, for instance, a text written in first person:

It was a dark night, I could hardly see her face, and I wondered what she thought.

It has no quotes, it includes "I" and it ends with "she thought"...
09-01-2009, 07:51 AM   #12
ahi
Wizard
Quote:
Originally Posted by Jellby View Post
I seriously doubt this can be automated in any way. Recognizing sentences and parsing their meaning goes far beyond simple scripting and into artificial intelligence.

Take, for instance, a text written in first person:

It was a dark night, I could hardly see her face, and I wondered what she thought.

It has no quotes, it includes "I" and it ends with "she thought"...
Oh, there's no chance of being able to automate this universally. But if somebody has a specific book in mind that has a particular MO for internal dialogue, it could be gotten right for that book.

Not that I'm even sure what part of the sentence you quoted ought to be italicized as "internal dialogue"... just the last third? All of it? None of it (it being more narration than internal dialogue)?

- Ahi
09-01-2009, 07:58 AM   #13
Jellby
frumious Bandersnatch
Quote:
Originally Posted by ahi View Post
Not that I'm even sure what part of the sentence you quoted ought to be italicized as "internal dialogue"... just the last third? All of it? None of it (it being more narration than internal dialogue)?
None of it, of course.
09-01-2009, 08:54 AM   #14
ahi
Wizard
Quote:
Originally Posted by Jellby View Post
None of it, of course.
Well then, Old Chap... it wasn't a very good example of internal dialogue that should be detected to be italicized, but probably wouldn't be.

- Ahi
09-01-2009, 09:23 AM   #15
Jellby
frumious Bandersnatch
No, it was an example of a likely false positive with the rules above.

Granted, you can eliminate false positives with your two-pass method, but there could be literally hundreds of them, often many more than real "internal dialogue" phrases.

As for false negatives, I often find dialogues (internal or not) that just omit the "he said", "she thought", etc. words. One should also look for "he said to himself" or "he wondered", or "he secretly admitted", etc.

An automated tool can be of some help, but the danger is letting the user rely solely on the tool, which can be worse than just leaving the "internal dialogues" unformatted. Similarly, when I see curly quotes wrongly oriented I would prefer they had been left as straight quotes instead.
All times are GMT -4.