MobileRead Forums > E-Book General > Deals and Resources (No Self-Promotion or Affiliate Links)

03-04-2010, 11:03 AM   #1
rocketgranny
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Utility for Project Gutenberg Text Files

Hi Everybody:

I stumbled on a great little application written by Duncan Jauncey that removes the embedded line returns from a Project Gutenberg text file.

http://www.duncanjauncey.com/gutenberg.html

Update: Thursday, March 18, 2010
I ran across some other issues with a Project Gutenberg text file ... and decided to work up my own, expanded script to handle embedded line returns, leading tabs, inserting and removing blank lines, etc. The "GrannyFix" page is 100% HTML and JavaScript, so it runs locally on your own computer. I've pasted in the entire "Pride & Prejudice" text and the textarea handles it OK.

Because I'm using Regular Expressions to comb through the file, it's a lot faster than the Jauncey script. Future development plans are to (1) tighten up the regex and (2) add more options as I work with other Gutenberg files.
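For anyone curious what regex-based options like these might look like, here is a minimal Python sketch. It is not the GrannyFix code (which is JavaScript), the function names are hypothetical, and the exact behaviour of each option is an assumption based only on the feature list above.

```python
import re


def strip_leading_tabs(text: str) -> str:
    # re.MULTILINE makes ^ match at the start of every line, not just the text.
    return re.sub(r"^\t+", "", text, flags=re.MULTILINE)


def insert_blank_lines(text: str) -> str:
    # Replace every run of consecutive line returns with exactly two,
    # so each line is followed by a single blank line.
    return re.sub(r"\n+", "\n\n", text)


def remove_blank_lines(text: str) -> str:
    # Drop each empty or whitespace-only line: match its preceding line
    # return plus any spaces/tabs, provided another line return follows
    # (the lookahead leaves that following line return in place).
    return re.sub(r"\n[ \t]*(?=\n)", "", text)


if __name__ == "__main__":
    sample = "\tIt is a truth\n\n\n\tHowever little known"
    print(remove_blank_lines(strip_leading_tabs(sample)))
```

Because each function takes and returns a plain string, the options can be chained in whatever order a particular file needs.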

Link: www.rocketgranny.com/freebies/GrannyFix.html

PS: To keep a copy, right-click the page, choose View Source, and copy and paste it to your own system. Thanks.

Last edited by rocketgranny; 03-18-2010 at 04:50 PM.
03-05-2010, 02:09 AM   #2
BookCat
C L J
Posts: 2,912
Karma: 21115458
Join Date: Dec 2008
Location: Birmingham UK
Device: Sony e-reader 505, Kindle PW2, Kindle PW3, Kobo Libra2
Does this maintain paragraphing? I hate text files with line returns; there are breaks in all the wrong places. Yuck! Thanks, granny.
03-12-2010, 12:34 PM   #3
rocketgranny
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Hi BookCat:

Paragraphing is maintained IF there is a blank line between paragraphs.

The logic is that it replaces all line returns with a blank space UNLESS it finds 2 consecutive line returns, in which case it removes one and leaves the other.
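The rule described above fits in a single regular expression. Here is a minimal Python sketch of that stated logic; it is an illustration, not the GrannyFix source, the function name `unwrap` is hypothetical, and converting Windows (CRLF) line endings first is an added assumption.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Assumption: normalise Windows (CRLF) line endings to plain "\n" first.
    text = text.replace("\r\n", "\n")
    # At each position, try two line returns first, then one:
    #   two in a row (a blank line) -> keep one line return (paragraph break)
    #   a single line return        -> replace it with a space (join lines)
    return re.sub(r"\n\n|\n", lambda m: "\n" if m.group() == "\n\n" else " ", text)


if __name__ == "__main__":
    sample = "It is a truth\nuniversally acknowledged.\n\nHowever little\nknown."
    print(unwrap(sample))
```

The alternation lists `\n\n` before `\n`, so the paragraph break is tried first at each position; otherwise both line returns of a blank line would be turned into spaces. Runs of three or more line returns aren't covered by the rule as described and will leave a stray space at the start of the next paragraph.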

I guess you get what you pay for, insofar as free eBooks are concerned. I'm having to spend a lot of time cleaning up the ones I've decided are "keepers" because, like you, I hate sentences that break off mid-line and so on.

RocketGranny.

Last edited by rocketgranny; 03-12-2010 at 12:37 PM.
03-12-2010, 12:49 PM   #4
raulf
Member
Posts: 10
Karma: 10
Join Date: Dec 2007
Device: Kindle 2
Thanks for the heads-up! This can come in very handy for Project Gutenberg and others that use line breaks like crazy. It's more common than it should be.
03-12-2010, 01:11 PM   #5
Strether
Guru
Posts: 742
Karma: 2825929
Join Date: Feb 2007
Location: Fresno
Device: Kindle 1; iPad Air; iPhone 7; Kobo Libra; Kindle Oasis 3
We already have something on-site that will take care of the problem. Do a search for Stingo's Macro.

Jim
03-12-2010, 04:22 PM   #6
queentess
Reading is sexy
Posts: 1,303
Karma: 544517
Join Date: Apr 2009
Device: none
Thanks for this!
03-18-2010, 04:45 PM   #7
rocketgranny
Member
Posts: 19
Karma: 54
Join Date: Feb 2010
Location: San Francisco, CA
Device: Nook
Hi Strether:

The Stingo Macro is a fine tool ... if you want to use MS Word.

RocketGranny
03-20-2010, 02:44 AM   #8
NightGeometry
Zealot
Posts: 139
Karma: 1057240
Join Date: Mar 2007
Location: Brighton, England
Device: Sony PRS-T1, Kindle 3G, Kindle DX
I haven't checked your script, but the last time I cleaned up a Gutenberg text, my checks included:
- if the character after a line break isn't a lowercase letter, assume it's a real line break.
- if the character before the line break is a full stop, assume it's a real line break.
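The two heuristics above translate fairly directly into one regular expression. The original regexes weren't posted, so this is a minimal Python reconstruction of the described rules, assuming ASCII text and treating only `.` as a full stop; `unwrap_heuristic` is a hypothetical name.

```python
import re

# Join a line break with a space only when BOTH heuristics say it is a
# hard wrap: the previous character is not a full stop (or another line
# return), and the next character is a lowercase ASCII letter.
_SOFT_BREAK = re.compile(r"(?<=[^.\n])\n(?=[a-z])")


def unwrap_heuristic(text: str) -> str:
    """Remove hard wraps, keeping line breaks that look like real ones."""
    return _SOFT_BREAK.sub(" ", text)


if __name__ == "__main__":
    sample = "It was the best\nof times.\nIt was the worst\nof times."
    print(unwrap_heuristic(sample))
```

One weakness of these rules: a line that happens to wrap just before a capitalized word, such as a proper name or "I", is left broken.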

It seemed to work for me. I'm not sure I still have the regexes around; I'll go have a look.

Tags
line endings, lines, project gutenberg, reformat, text

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
How are the mobi and epub files at Project Gutenberg? | ficbot | General Discussions | 2 | 04-16-2010 06:57 PM
Text tool for formatting Gutenberg text files | bob_ninja | Workshop | 5 | 11-13-2007 12:28 PM
turning project gutenberg txt files to pdfs | kamyar22 | Sony Reader | 12 | 01-27-2007 08:33 AM
Utility for converting Gutenberg books. | FangornUK | Sony Reader | 2 | 11-02-2006 10:16 AM
Python Gutenberg E-text Project: PyGE | ignatz | Deals and Resources (No Self-Promotion or Affiliate Links) | 2 | 09-17-2004 01:18 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.