03-04-2010, 11:03 AM (#1)
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Utility for Project Gutenberg Text Files
Hi Everybody:
I stumbled on a great little application written by Duncan Jauncey that removes the embedded line returns in a Project Gutenberg text file: http://www.duncanjauncey.com/gutenberg.html
Update (Thursday, March 18, 2010): I ran across some other issues with a Project Gutenberg text file and decided to work up my own expanded script to handle embedded line returns, leading tabs, inserting and removing blank lines, etc. The "GrannyFix" page is 100% HTML and JavaScript, so it runs on your own computer. I've pasted in the entire text of "Pride & Prejudice" and the textarea handles it OK. Because I'm using regular expressions to comb through the file, it's a lot faster than the Jauncey script. Future development plans are to (1) tighten up the regex and (2) add more options as I work with other Gutenberg files.
Link: www.rocketgranny.com/freebies/GrannyFix.html
PS: Right-click the page to view the source, then copy and paste it to your own system.
Thanks.
Last edited by rocketgranny; 03-18-2010 at 04:50 PM.
03-05-2010, 02:09 AM (#2)
C L J
Posts: 2,912
Karma: 21115458
Join Date: Dec 2008
Location: Birmingham UK
Device: Sony e-reader 505, Kindle PW2, Kindle PW3, Kobo Libra2
Does this maintain paragraphing? I hate text files with hard line returns; there are breaks in all the wrong places. Yuck! Thanks, Granny.
03-12-2010, 12:34 PM (#3)
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Hi Bookcat:
Paragraphing is maintained IF there is a blank line between paragraphs. The logic is that it replaces every line return with a blank space UNLESS it finds two consecutive line returns, in which case it removes one and leaves the other. I guess you get what you pay for, insofar as free eBooks are concerned. I'm having to spend a lot of time cleaning up the ones I've decided are "keepers" because, like you, I hate sentences that break off mid-line, etc.
RocketGranny
Last edited by rocketgranny; 03-12-2010 at 12:37 PM.
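A minimal sketch of the blank-line rule described in the post above, written in JavaScript because GrannyFix is an HTML/JavaScript page. This is not the GrannyFix source: the function name `unwrapParagraphs` is hypothetical, and normalizing CRLF endings, stripping trailing spaces, and collapsing runs of three or more line returns are assumptions added for illustration.

```javascript
// Hypothetical helper sketching the blank-line rule; not GrannyFix's actual code.
function unwrapParagraphs(text) {
  return text
    // Normalize Windows (CRLF) and old Mac (CR) line endings to plain LF.
    .replace(/\r\n?/g, "\n")
    // Drop trailing spaces/tabs so whitespace-only lines count as blank lines.
    .replace(/[ \t]+\n/g, "\n")
    // A single line return (not followed by another) becomes a blank space.
    .replace(/([^\n])\n(?!\n)/g, "$1 ")
    // Two consecutive line returns: remove one and keep the other.
    // Assumption: longer runs of blank lines are collapsed the same way.
    .replace(/\n{2,}/g, "\n");
}

// Usage: two hard-wrapped paragraphs separated by a blank line.
const sample = "It is a truth universally\nacknowledged, that a single man\n\nHowever little known";
console.log(unwrapParagraphs(sample));
```

As the post says, this only works when paragraphs are separated by a blank line; a file without blank lines would be merged into one long line.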
03-12-2010, 12:49 PM (#4)
Member
Posts: 10
Karma: 10
Join Date: Dec 2007
Device: Kindle 2
Thanks for the heads-up! This can come in very handy for Project Gutenberg and other sources that use line breaks like crazy. It's more common than it should be.
03-12-2010, 01:11 PM (#5)
Guru
Posts: 742
Karma: 2825929
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1; iPad Air; iPhone 7; Kobo Libra; Kindle Oasis 3
We already have something on-site that will take care of the problem. Do a search for Stingo's Macro.
Jim
03-12-2010, 04:22 PM (#6)
Reading is sexy
Posts: 1,303
Karma: 544517
Join Date: Apr 2009
Device: none
Thanks for this!
03-18-2010, 04:45 PM (#7)
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
03-20-2010, 02:44 AM (#8)
Zealot
Posts: 139
Karma: 1057240
Join Date: Mar 2007
Location: Brighton, England
Device: Sony PRS-T1, Kindle 3G, Kindle DX
I haven't checked your script, but some of the checks I did the last time I cleaned a Gutenberg text included:
- If the next character after a line break isn't a lowercase character, assume it's a real line break.
- If the character before the line break is a full stop, assume it's a real line break.
It seemed to work for me. I'm not sure I still have the regexes around; I'll go have a look.
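The two checks above can be sketched as a single JavaScript regex replacement. The poster's own regexes were not shared, so the function name `joinSoftBreaks` and the exact patterns below are assumptions; "lowercase character" is read here as any Unicode lowercase letter (`\p{Ll}`).

```javascript
// Hypothetical helper sketching the two heuristics above; the post's own
// regexes were not shared, so these patterns are an assumption.
function joinSoftBreaks(text) {
  return text
    // Normalize CRLF/CR line endings to LF.
    .replace(/\r\n?/g, "\n")
    // Drop trailing spaces/tabs so joined lines get exactly one space.
    .replace(/[ \t]+\n/g, "\n")
    // Join a line break only when the character before it is not a full stop
    // and the character after it is a lowercase letter (\p{Ll}, any script).
    // Every other line break is assumed to be a real one and is kept.
    .replace(/([^.\n])\n(?=\p{Ll})/gu, "$1 ");
}

const sample = "The story\nbegins here.\nAnother sentence\nContinues with a capital.";
console.log(joinSoftBreaks(sample));
```

Because a break before a capitalized word is always kept, a sentence that happens to wrap just before a proper noun or "I" will stay broken; that is the trade-off of this heuristic compared with the blank-line rule, which needs blank lines between paragraphs instead.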
Tags:
line endings, lines, project gutenberg, reformat, text |
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
How are the mobi and epub files at Project Gutenberg? | ficbot | General Discussions | 2 | 04-16-2010 06:57 PM |
Text tool for formatting Gutenberg text files | bob_ninja | Workshop | 5 | 11-13-2007 12:28 PM |
turning project gutenberg txt files to pdfs | kamyar22 | Sony Reader | 12 | 01-27-2007 08:33 AM |
Utility for converting Gutenberg books. | FangornUK | Sony Reader | 2 | 11-02-2006 10:16 AM |
Python Gutenberg E-text Project: PyGE | ignatz | Deals and Resources (No Self-Promotion or Affiliate Links) | 2 | 09-17-2004 01:18 PM |