#1
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Utility for Project Gutenberg Text Files
Hi Everybody:
I stumbled on a great little application, written by Duncan Jauncey, that removes the embedded line returns in a Project Gutenberg text file: http://www.duncanjauncey.com/gutenberg.html

Update (Thursday, March 18, 2010): I ran across some other issues with a Project Gutenberg text file ... and decided to work up my own expanded script to handle embedded line returns, leading tabs, inserting and removing blank lines, etc. The "GrannyFix" page is 100% HTML and JavaScript, so it runs on your own computer. I've pasted in the entire text of "Pride & Prejudice" and the textarea handles it OK. Because I'm using regular expressions to comb through the file, it's a lot faster than the Jauncey script. Future development plans are to (1) tighten up the regex and (2) add more options as I work with other Gutenberg files.

Link: www.rocketgranny.com/freebies/GrannyFix.html

PS: Right-click the page to view the source, then copy and paste it to your own system. Thanks.

Last edited by rocketgranny; 03-18-2010 at 04:50 PM.
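GrannyFix's actual regular expressions live in the page source and aren't quoted here. As a rough illustration of the kind of regex clean-up options the post lists, here is a minimal Python sketch of three of them: stripping leading tabs, removing blank lines, and inserting blank lines. The function names are hypothetical, and the sketch assumes the text already uses `\n` line endings.

```python
import re

# Hypothetical helpers illustrating regex-based clean-up options;
# they are not GrannyFix's actual regexes. Input is assumed to use "\n".

def strip_leading_tabs(text: str) -> str:
    # With re.MULTILINE, ^ matches at the start of every line,
    # so this deletes any run of tabs that begins a line.
    return re.sub(r"^\t+", "", text, flags=re.MULTILINE)

def remove_blank_lines(text: str) -> str:
    # Delete lines that are empty or contain only spaces/tabs
    # (including their line break).
    return re.sub(r"^[ \t]*\n", "", text, flags=re.MULTILINE)

def insert_blank_lines(text: str) -> str:
    # Double every line break, leaving a blank line between lines.
    return re.sub(r"\n", "\n\n", text)

print(insert_blank_lines(remove_blank_lines(strip_leading_tabs("\tOne\n\n\t\tTwo\n"))))
```

The options are kept as separate functions so they can be applied in whatever order a particular file needs; for example, removing blank lines before inserting them gives uniform double spacing.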
#2
C L J
Posts: 2,911
Karma: 21115458
Join Date: Dec 2008
Location: Birmingham UK
Device: Sony e-reader 505, Kindle PW2, Kindle PW3, Kobo Libra2
Does this maintain paragraphing? I hate text files with line returns; there are breaks in all the wrong places. Yuck! Thanks, Granny.
#3
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Hi Bookcat:
Paragraphing is maintained IF there is a blank line between paragraphs. The logic is that it replaces every line return with a blank space UNLESS it finds two consecutive line returns, in which case it removes one and leaves the other.

I guess you get what you pay for where free eBooks are concerned. I'm having to spend a lot of time cleaning up the ones I've decided are "keepers" because, like you, I hate sentences that break off mid-line, etc.

RocketGranny

Last edited by rocketgranny; 03-12-2010 at 12:37 PM.
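The rule described in the post (two consecutive line returns become one; any other line return becomes a space) can be sketched with a single regular expression. This is a minimal Python illustration, not GrannyFix's actual code: the `unwrap` name is hypothetical, and normalizing Windows `\r\n` endings to `\n` first is an added assumption.

```python
import re

def unwrap(text: str) -> str:
    """Join wrapped lines while keeping paragraph breaks (hypothetical helper)."""
    # Assumption: normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # The alternation tries "\n\n" first, so a pair of line returns is caught
    # as a paragraph break and reduced to one; a lone "\n" becomes a space.
    return re.sub(r"\n\n|\n", lambda m: "\n" if m.group(0) == "\n\n" else " ", text)

sample = "It is a truth\nuniversally acknowledged.\r\n\r\nHowever little known"
print(unwrap(sample))
```

Taken literally, the rule treats a run of three line returns as one pair plus one single return, leaving a line break followed by a space; a more careful version might collapse longer runs first.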
#4
Member
Posts: 10
Karma: 10
Join Date: Dec 2007
Device: Kindle 2
Thanks for the heads up!
#5
Guru
Posts: 765
Karma: 2825929
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1; iPad Air; iPhone 7; Kobo Libra; Kindle Oasis 3
We already have something on-site that will take care of the problem. Do a search for Stingo's Macro.
Jim
#6
Reading is sexy
Posts: 1,303
Karma: 544517
Join Date: Apr 2009
Device: none
Thanks for this!
#7
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
#8
Zealot
Posts: 139
Karma: 1057240
Join Date: Mar 2007
Location: Brighton, England
Device: Sony PRS-T1, Kindle 3G, Kindle DX
I haven't checked your script, but some of the checks I did the last time I cleaned a Gutenberg text included:
- If the next character after a line break isn't a lowercase character, assume it's a real line break.
- If the character before the line break is a full stop, assume it's a real line break.

It seemed to work for me. I'm not sure I still have the regexes around; I'll go have a look.
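The regexes themselves weren't posted, so here is a minimal Python sketch of the two checks combined into one substitution: a line break is joined (replaced with a space) only when it is not preceded by a full stop and is followed by a lowercase letter. The `join_wrapped_lines` name is hypothetical; the sketch assumes `\n` line endings and treats only ASCII `a`-`z` as lowercase, so a line starting with an accented lowercase letter would stay broken.

```python
import re

def join_wrapped_lines(text: str) -> str:
    """Hypothetical helper applying the two checks from the post."""
    # Replace a line break with a space only when:
    #   (?<!\.)   the character before it is not a full stop, and
    #   (?=[a-z]) the next character is a lowercase ASCII letter.
    # Every other line break is assumed to be a real one and kept.
    return re.sub(r"(?<!\.)\n(?=[a-z])", " ", text)

print(join_wrapped_lines("It was a dark\nand stormy night.\nThe end."))
```

Because the lookbehind and lookahead don't consume characters, only the `\n` itself is replaced. One side effect of the full-stop check: an abbreviation such as "Mr." at the end of a line keeps its break even when the next line starts in lowercase.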
Tags
line endings, lines, project gutenberg, reformat, text |
Thread | Thread Starter | Forum | Replies | Last Post |
How are the mobi and epub files at Project Gutenberg? | ficbot | General Discussions | 2 | 04-16-2010 06:57 PM |
Text tool for formatting Gutenberg text files | bob_ninja | Workshop | 5 | 11-13-2007 12:28 PM |
turning project gutenberg txt files to pdfs | kamyar22 | Sony Reader | 12 | 01-27-2007 08:33 AM |
Utility for converting Gutenberg books. | FangornUK | Sony Reader | 2 | 11-02-2006 10:16 AM |
Python Gutenberg E-text Project: PyGE | ignatz | Deals and Resources (No Self-Promotion or Affiliate Links) | 2 | 09-17-2004 01:18 PM |