Old 02-04-2011, 06:53 AM   #1
JeremyR
Guru
Posts: 973
Karma: 2458402
Join Date: Aug 2010
Location: St. Louis
Device: Kindle Keyboard, Nook HD+
Text to HTML (or any e-book format, really) program that detects chapters?

I have many old text files that are broken up into chapters by simple labels: Chapter 1 (or Chapter I), and so forth.

I've been just loading them into Sigil, but then I have to go through it and mark each chapter by hand.

While this isn't that big a deal (it only takes about 10 minutes each), it does get kind of old. It seems like there must be a way to automate it and then save the result as an HTML file with the chapters labeled, so I can import that into Sigil.

Any ideas? (Sadly, ten years ago I could probably write a program to do this, but I don't even have a programming language installed on my computer anymore...)
Old 02-04-2011, 08:08 AM   #2
Paulinafrica
Zealot
Posts: 118
Karma: 36978
Join Date: Sep 2010
Location: Johannesburg, South Africa
Device: Kindle Android, Kindle 3 Wi-Fi
Hello Jeremy, Would it be possible for you to use Calibre to do this for you?
Old 02-04-2011, 09:17 AM   #3
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Regular expression find and replace?

It would depend on the particular system/editor. In Vim I would search for:

^\s*Chapter\s*[0-9XVI]*\s*$

That breaks down like this:
^ = start of line
\s* = any amount (including none) of whitespace (spaces, tabs, etc.)
Chapter = the word Chapter
\s* = any amount of whitespace again
[0-9XVI]* = a string of any length consisting only of 0-9 X and V and I
\s* = any amount of whitespace again
$ = the end of the line
And replace with

<h1>&<\/h1>

which is the matched text (the & stands for whatever the search pattern matched) with <h1> HTML tags around it.

Not sure if Sigil supports something like that (I don't use it), but it might. If not, you can use any advanced Text editor.
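
For anyone without Vim, the same search-and-replace can be sketched in Python. This is only an illustration under a few assumptions: the function name tag_chapter_lines is made up for the example, the text is handled one line at a time so that \s never runs across a line break, and the whitespace around the heading is dropped rather than kept inside the tags as Vim's & would keep it.

```python
import re

# The search pattern from the post above, applied to one line at a time.
CHAPTER_LINE = re.compile(r"^\s*Chapter\s*[0-9XVI]*\s*$")


def tag_chapter_lines(text: str) -> str:
    """Wrap every line that is only a chapter label in <h1> tags.

    Hypothetical helper for illustration; not part of Vim, Sigil or calibre.
    """
    out = []
    for line in text.splitlines():
        if CHAPTER_LINE.match(line):
            # Assumption: surrounding whitespace is not wanted in the heading.
            out.append(f"<h1>{line.strip()}</h1>")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Chapter 1\nIt was a dark night.\n  Chapter IV  \nChapter one"
    print(tag_chapter_lines(sample))
```

Note that the character class only covers I, V and X, so a heading such as "Chapter XL" would be left untagged; adding L and C inside the brackets would be the obvious extension.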
Old 02-04-2011, 08:50 PM   #4
JeremyR
Guru
Quote:
Originally Posted by Paulinafrica View Post
Hello Jeremy, Would it be possible for you to use Calibre to do this for you?
I actually thought Calibre did this at first (it's what gave me the idea). But it apparently only detects chapters that have h1/h2 tags around them.
Old 02-04-2011, 08:51 PM   #5
JeremyR
Guru
Quote:
Originally Posted by frabjous View Post
Regular expression find and replace?

It would depend on the particular system/editor. In Vim I would search for:

^\s*Chapter\s*[0-9XVI]*\s*$

That breaks down like this:
^ = start of line
\s* = any amount (including none) of whitespace (spaces, tabs, etc.)
Chapter = the word Chapter
\s* = any amount of whitespace again
[0-9XVI]* = a string of any length consisting only of 0-9 X and V and I
\s* = any amount of whitespace again
$ = the end of the line
And replace with

<h1>&<\/h1>

which is the search pattern but with <h1> HTML tags around it.

Not sure if Sigil supports something like that (I don't use it), but it might. If not, you can use any advanced Text editor.
Thanks! I will probably end up having to try that. I was hoping for a simpler (and more automated) answer, but this is probably as good as it gets.
Old 02-05-2011, 12:58 PM   #6
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by JeremyR View Post
I actually thought Calibre did this at first (it's what gave me the idea). But it apparently only detects chapters that have the h1/h2 tags around it
Calibre lets you define your own search criteria.

Dale
Old 02-05-2011, 10:15 PM   #7
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
calibre has an entire heuristic processing section. One of the heuristics is for detecting and applying the appropriate style to chapter headings. TXT input defaults will try to auto detect the paragraph type and the formatting used. If Textile or Markdown are not detected the default is to enable the majority of the heuristic processing options.
Old 02-06-2011, 11:31 AM   #8
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by JeremyR View Post
Any ideas? (Sadly, ten years ago I could probably write a program to do this, but I don't even have a programming language installed on my computer anymore...)
You might want to install Perl (a.k.a. Practical Extraction and Report Language). It's designed for manipulating text files and performing programmed edits on the content. You use regular expressions to locate what you want to change and take actions based on the results.

It's cross-platform, free, and open source. The best Windows version is probably ActiveState's: http://www.activestate.com/activeperl/downloads

They sell supported commercial versions, but a free Community edition is available, with source.

You can also look at Text2HTML, an open source conversion program built on Perl.
_______
Dennis
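
The "locate with a regular expression, then act on the result" approach described above might look roughly like this for a whole file. The sketch is in Python rather than Perl, and everything in it is an assumption made for illustration: the function name text_to_html, the choice of h2 for headings, and above all the premise that paragraphs and chapter labels are separated by blank lines in the source text.

```python
import html
import re

# A block counts as a heading if it is just "Chapter" plus a number or a
# Roman numeral. Case is ignored, which is an assumption about the files.
HEADING = re.compile(r"Chapter\s+(?:[0-9]+|[IVXLC]+)", re.IGNORECASE)


def text_to_html(text: str) -> str:
    """Turn blank-line-separated plain text into simple HTML markup.

    Hypothetical helper for illustration; it is not how Text2HTML,
    Sigil or calibre work internally.
    """
    parts = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join hard-wrapped lines and escape &, < and > for HTML.
        content = html.escape(" ".join(block.split()))
        if not content:
            continue
        tag = "h2" if HEADING.fullmatch(content) else "p"
        parts.append(f"<{tag}>{content}</{tag}>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "Chapter I\n\nIt was a dark\nand stormy night.\n\nTom & Jerry."
    print(text_to_html(sample))
```

If the files put a chapter label directly above its first paragraph with no blank line in between, the label would end up inside a p element, so the blank-line assumption is worth checking on a sample file first.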
Old 02-10-2011, 06:05 PM   #9
JeremyR
Guru
Quote:
Originally Posted by DaleDe View Post
Calibre lets you define your own search criteria.

Dale
Unless I am reading it wrong (and that is quite possible), it only searches things that have been tagged. But my problem is, as it's plain text, it hasn't been tagged with anything.

Quote:
Originally Posted by user_none View Post
calibre has an entire heuristic processing section. One of the heuristics is for detecting and applying the appropriate style to chapter headings. TXT input defaults will try to auto detect the paragraph type and the formatting used. If Textile or Markdown are not detected the default is to enable the majority of the heuristic processing options.
As impressive as Calibre is as a whole, all I can say is, it just doesn't work for the ones I've tried it on.

Indeed, not only does it not detect chapters, it runs together most of the paragraphs making the end result unreadable.

So really, Calibre doesn't enter into this process. I am just looking for something to apply before loading the file into Sigil (which detects the paragraphs just fine).
Old 02-10-2011, 07:04 PM   #10
JeremyR
Guru
Quote:
Originally Posted by DMcCunney View Post
You might want to install Perl (a/k/a Practical Extraction and Report Language.) It's designed for manipulating text files and performing programmed edits on the content. You use regular expressions to locate what you want to change and take actions based on the results.
Thanks for the suggestion, but I think I am well past learning any new programming languages (since I was never great at it to begin with). I think I will just look for my old Visual Basic CDs and see what I can do with that...
Old 02-10-2011, 08:16 PM   #11
DaleDe
Grand Sorcerer
Quote:
Originally Posted by JeremyR View Post
Unless I am reading it wrong (and that is quite possible), it only searches things that have been tagged. But my problem is, as it's plain text, it hasn't been tagged with anything.



As impressive as Calibre is as a whole, all I can say is, it just doesn't work for the ones I've tried it on.

Indeed, not only does it not detect chapters, it runs together most of the paragraphs making the end result unreadable.

So really, Calibre doesn't enter the picture into this process. I am just looking for something to apply before loading the file into Sigil (which detects the paragraphs just fine).

I think Calibre could search for something like the word "Chapter", if all your chapters start with that word.

Have you just tried pasting the text onto an empty Sigil page?

Dale
Old 02-10-2011, 09:29 PM   #12
JeremyR
Guru
Thanks for the help, everyone (esp. frabjous). I've got it now.

Sigil will do it, sort of. It mostly supports regular expressions.

So I searched for

CHAPTER [0-9XVI]+

and then replaced with

<h3>\0</h3>

or if you want it to add chapter break marks

<hr class="sigilChapterBreak" /><h3>\0</h3>
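
For comparison, the same replacement can be sketched outside Sigil in Python. A few assumptions are baked in: the function name mark_chapters and its add_breaks flag are made up for the example, the match is case-sensitive (so only upper-case CHAPTER labels are caught), and Python's re module spells the whole-match reference \g<0> where the replacement above uses \0.

```python
import re

# Same search pattern as used in Sigil above.
CHAPTER = re.compile(r"CHAPTER [0-9XVI]+")

# Marker element copied from the replacement string above.
BREAK = '<hr class="sigilChapterBreak" />'


def mark_chapters(text: str, add_breaks: bool = False) -> str:
    """Wrap each CHAPTER label in <h3>, optionally preceded by a break marker.

    Hypothetical helper for illustration; not part of Sigil.
    """
    prefix = BREAK if add_breaks else ""
    # \g<0> inserts the whole matched text.
    return CHAPTER.sub(prefix + r"<h3>\g<0></h3>", text)


if __name__ == "__main__":
    print(mark_chapters("CHAPTER IX\nSome text."))
    print(mark_chapters("CHAPTER 12", add_breaks=True))
```

Unlike the earlier line-anchored pattern, this one has no ^ or $, so it would also tag the words "CHAPTER 3" if they appeared in the middle of a sentence.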