Old 02-25-2011, 03:43 AM   #1
Sylver
Addict
Sylver began at the beginning.
 
Posts: 234
Karma: 40
Join Date: Apr 2010
Device: The Nook, iPad
Help! txt to epub conversion markup: linked notes

Hi, I have a huge txt file containing hundreds of short stories, each with its own notes section, in a format like the example below. Could someone kindly explain how I can easily add markup for the notes in a text editor (ideally something that can be done with global search and replace), so that when the file is converted to EPUB in Calibre, each note number and its corresponding note are linked to each other? The note numbers restart at [1] under each story, so how can I make sure that note no. 1 for the word "queen" doesn't get mixed up with the words "shoe" and "alone", which are also note no. 1 (but in other stories)? Thank you so much! The following is a watered-down version of the text format (the real stories are much, much longer and usually have about 100 notes each):

## Story One:

"My God," said the Queen[1], "I'm pregnant[2]. I wonder who the father is."

Notes:

[1]queen: a female monarch.
[2]pregnant: with child or young as a woman or female mammal.

## Story Two:

For sale: baby shoes[1], never worn[2].

Notes:
[1]shoe: an external covering for the human foot.
[2]worn: past participle of wear - to carry or have (a garment, etc).

## Story Three:

The last man on Earth sat alone[1] in a room. There was a knock[2] on the door.

Notes:
[1]alone: apart, or isolated from others.
[2]knock: the sound of knocking, especially a rap, as at a door.
Old 02-25-2011, 07:27 AM   #2
Perkin
Guru
Perkin calls his or her ebook reader Vera.
Posts: 651
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD
It's going to be quite a bit of work.

A few questions:
What text editor do you use? Can it search in selection only?

How many individual short stories are there?

Does it have to be in markdown?
Does any of your text have any other markup apart from headers (###), i.e. bold/italics?

The reason I ask is if we convert it to Textile, it will be easier to do the footnotes.

Do the footnotes need to be able to link back to the place they linked from?
Some readers don't have a back button, and need a link back.

I can see how I'd do it in Textile (my preferred txt format), using two different search and replaces: first each story's text, then its footnotes,
then the next story's text, and then its footnotes.
Old 02-25-2011, 07:57 AM   #3
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Interesting problem. Are you familiar with regular expressions? If so, you could use the search & replace feature. I could see something along the lines of
Code:
(?s)(?P<number>\[\d+\])(?P<word>\w+:)(?P<text>.*?)(?P=number)(?P=word)
working as the search expression. You could then replace the match with
Code:
\g<number>\g<word>\g<text>\g<number>\g<word>
plus whatever linking markup you need inserted around the first number/word pair.

The caveat here is that this is just off the top of my head. This may work, or it may fail catastrophically.
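For readers unfamiliar with named groups, here is a minimal Python sketch of how the search expression and replacement above behave (Calibre's search and replace uses Python regular expression syntax). The `demo` and `sample` strings are made up for illustration. Note that in the sample from the first post the in-text marker is written `Queen[1]` rather than `[1]queen:`, so the pattern as posted would need adapting before it matches that layout.

```python
import re

# Search expression from the post above, unchanged.
PATTERN = re.compile(
    r"(?s)(?P<number>\[\d+\])(?P<word>\w+:)(?P<text>.*?)(?P=number)(?P=word)"
)
# Replacement from the post above: it rebuilds the matched text unchanged.
TEMPLATE = r"\g<number>\g<word>\g<text>\g<number>\g<word>"

# Hypothetical input in which the same "[n]word:" tag occurs twice.
demo = "[1]queen: first use ... [1]queen: a female monarch."

groups = PATTERN.search(demo).groupdict()
rebuilt = PATTERN.sub(TEMPLATE, demo)
print(groups)   # the three named captures
print(rebuilt)  # same text as demo

# Shortened version of the thread's sample: the marker is "Queen[1]",
# so there is no second "[1]queen:" for the backreferences to find.
sample = '"My God," said the Queen[1].\n\nNotes:\n\n[1]queen: a female monarch.\n'
sample_match = PATTERN.search(sample)
print(sample_match)
```

Any linking markup would be added to `TEMPLATE` around the first `\g<number>\g<word>` pair, as the post suggests.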
Old 02-25-2011, 09:03 AM   #4
Perkin
Guru
The BIG problem is that all the footnotes are re-using the same numbers.

Manichean, one of the problems is that Markdown can't do id attributes.
(I suppose we could add HTML for those tags, then it would be a reasonably simple regex)
Old 02-25-2011, 09:39 AM   #5
Perkin
Guru
Right, I've done it with the small example given.
It takes three search-and-replace passes: two on the original Markdown text file, then, once it is converted to EPUB, one S&R done in Sigil.

Also, the footnotes have to be separated by blank lines, instead of being on consecutive lines as in the example.

Open the original markdown txt file
Search
Code:
^\[(\d+)\](\S+)
Replace
Code:
<sup>\1</sup><a id="fn\1" href="#fnr\1">\2<a>
Edit: replacing all occurrences

Then
Search
Code:
(\S+)\[(\d+)\]
Replace
Code:
\1<sup><a id="#fnr\2" href="#fn\2">\2</a></sup>
Edit: replacing all occurrences

Save the file and convert it to EPUB with Calibre, using pagebreak in Structure Detection with the default XPath expression, so that each story ends up in its own file.

Then open in Sigil,
Search
Code:
../Text/index_split_000.html
Replace with nothing.

Edit: replace all occurrences in Code View, and replace in all files.

Save, and you should then have a correctly linked file.

Last edited by Perkin; 02-25-2011 at 09:41 AM. Reason: Added info
Old 02-25-2011, 12:04 PM   #6
Manichean
Wizard
Quote:
Originally Posted by Perkin View Post
The BIG problem is that all the footnotes are re-using the same numbers.

Manichean, one of the problems is that Markdown can't do id attributes.
(I suppose we could add HTML for those tags, then it would be a reasonably simple regex)
Huh? First off, I'm suggesting a regex in the search and replace field using named backreferences; I don't understand your Markdown comment... Second, I use the number and the word together to identify the footnote, so reusing numbers shouldn't be a problem. I think that, with what I suggested, you could do in one S&R pass what you proposed doing with multiple passes plus Sigil.
Old 02-25-2011, 12:43 PM   #7
Sylver
Addict
Thanks, Perkin and Manichean.
I am using UltraEdit version 15. I tried Perkin's replacements, but they do not work: all the note numbers link to the first story's notes, and I do need the notes to be able to link back to the original text. The original text has about 500 stories, each varying from a few sentences to a few pages in length.
I am still trying to understand Manichean's suggestion, as I am not very good at regular expressions. I will give it a try once I understand it better.
Old 02-25-2011, 12:45 PM   #8
Perkin
Guru
Sorry, just the way I worded it. (I'm not very clear when trying to explain things)

What I was trying to say is that the BIG problem is that the OP's file reuses the same numbers in every story, so any link written by the replace would point to the first matching footnote in the converted file. Hence the need for Sigil and the replacement of '../Text/index_split_000.html': that way each link only points to targets in its own file, rather than all of them jumping to the first file.

Since Markdown has no markup for giving an element an id that a link can reference (as in the HTML tag <a href="#idname">), I made the replacements insert the HTML that should appear in the final output instead of Markdown.

Going by the OP's sample, using the two S&Rs seemed the easier solution, although your single regex would work just as well. The fiddly bit was working out what would work in the final output; that's why I broke it into two and worked each one out separately.

Hope it's clearer (note I said clearer and not clear)
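The numbering clash described above could also be avoided by making the ids unique per story, so that no Sigil pass is needed. This is not what the thread does; it is a minimal sketch that assumes every story starts with a `## ` heading as in the sample, and the `s1-fn1` style id scheme and the `link_notes` name are invented for illustration. It has not been tested against a Calibre conversion.

```python
import re

NOTE = re.compile(r"^\[(\d+)\](\S+)", re.MULTILINE)  # "[1]queen:" at line start
REF = re.compile(r"(\S+)\[(\d+)\]")                  # "Queen[1]" in the story text
# Zero-width split point just before each "## " heading (needs Python 3.7+).
STORY_START = re.compile(r"^(?=## )", re.MULTILINE)


def link_notes(text: str) -> str:
    """Add HTML footnote links whose ids carry a per-story prefix."""
    out = []
    for k, story in enumerate(STORY_START.split(text)):
        prefix = f"s{k}-"  # hypothetical id scheme: s1-fn1, s1-fnr1, ...
        # Note lines first, then in-text markers, as in the two-pass S&R.
        story = NOTE.sub(
            rf'<sup>\1</sup><a id="{prefix}fn\1" href="#{prefix}fnr\1">\2</a>',
            story,
        )
        story = REF.sub(
            rf'\1<sup><a id="{prefix}fnr\2" href="#{prefix}fn\2">\2</a></sup>',
            story,
        )
        out.append(story)
    return "".join(out)


sample = (
    "## Story One:\n\nsaid the Queen[1].\n\nNotes:\n\n[1]queen: a female monarch.\n\n"
    "## Story Two:\n\nbaby shoes[1].\n\nNotes:\n\n[1]shoe: a covering for the foot.\n"
)
linked = link_notes(sample)
print(linked)
```

Because the text before the first heading becomes chunk 0, the first story gets the prefix `s1-`, the second `s2-`, and so on.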
Old 02-25-2011, 12:51 PM   #9
Perkin
Guru
Quote:
Originally Posted by Sylver View Post
all the note numbers link to the first story's notes, and I do need the notes to be able to link back to the original text. The original text has about 500 stories, each varying from a few sentences to a few pages in length.
That's why I said to do the S&R in Sigil on the converted EPUB. If you set 'split on chapters' when converting in Calibre, each story gets its own file in the EPUB, BUT all the links, since they all reference the same numbers, would jump to the first file.
The S&R in Sigil then rewrites each link's target: removing the '../Text/index_split_000.html' prefix makes the links point only to the footnotes in their own file (story).

Edit:
I tried it on the three-story sample and it did work: afterwards the EPUB's footnote links pointed only to their correct places, and so did the back links.
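The effect of the Sigil step can be sketched in a few lines of Python. Only the path prefix comes from the thread; the `before` link and the `make_links_local` helper are hypothetical, shown to illustrate why a bare `#fn1` target stays inside the current file.

```python
# Path prefix named in the thread; everything else here is illustrative.
PREFIX = "../Text/index_split_000.html"


def make_links_local(xhtml: str, prefix: str = PREFIX) -> str:
    """Drop the file part of each href so it targets the current file."""
    return xhtml.replace(prefix, "")


# Hypothetical link inside a later story's file, before the Sigil S&R.
before = '<a id="fnr1" href="../Text/index_split_000.html#fn1">1</a>'
after = make_links_local(before)
print(after)
```

With the file name gone, the reader resolves `#fn1` against whichever file contains the link, which is the story's own file.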

Last edited by Perkin; 02-25-2011 at 12:55 PM.
Old 02-25-2011, 01:07 PM   #10
Manichean
Wizard
Perkin, now I understand what you mean.

Please note that if you decide to go with my method, you'd still need to adapt the replace text to include whatever Markdown uses for links. I don't know Markdown, so I only wrote down what ought to reproduce the input.
Old 02-25-2011, 01:14 PM   #11
Perkin
Guru
Quote:
Originally Posted by Manichean View Post
Perkin, now I understand what you mean.


Quote:
Please note that if you decide to go with my method, you'd still need to adapt the replace text to include whatever Markdown uses for links. I don't know Markdown, so I only wrote down what ought to reproduce the input.
The 'extra' stuff would just need adapting from the s&r's given earlier.

Last edited by Perkin; 02-25-2011 at 01:17 PM.
Old 02-25-2011, 01:25 PM   #12
Sylver
Addict
Thanks, Perkin, I see what you mean and am getting closer. But how do you force a split on chapters in Calibre? I guess that's what I am missing in the conversion.
Old 02-25-2011, 01:34 PM   #13
Perkin
Guru
For the conversions I did, in the 'Conversion - Structure Detection' section:
Detect chapters (default)
Quote:
//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|prologue|epilogue|glossary|section|part\s+', 'i')) or @class = 'chapter']
chapter mark : pagebreak

For 'Insert page breaks before', I've got
Quote:
//*[name()='h1' or name()='h2' or name()='h3']
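As a rough illustration of what the 'Insert page breaks before' expression selects, the sketch below lists every h1, h2 and h3 element in a small document using only Python's standard library. The `html` string is a hypothetical stand-in for what the Markdown sample might become after conversion (a `## ` heading becomes an h2); it is not Calibre's actual output, and Calibre evaluates the real XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical HTML for two stories from the Markdown sample.
html = (
    "<body>"
    "<h2>Story One:</h2><p>text</p>"
    "<h2>Story Two:</h2><p>text</p>"
    "</body>"
)

# Same element names as in the XPath expression above.
BREAK_TAGS = {"h1", "h2", "h3"}

root = ET.fromstring(html)
# Every matching heading is a point where a page break would be inserted.
breaks = [el.text for el in root.iter() if el.tag in BREAK_TAGS]
print(breaks)
```

Each story heading is matched, which is why every story ends up starting a new page and, after splitting, its own file.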
Old 02-25-2011, 02:09 PM   #14
Sylver
Addict
Thanks, Perkin. It only works one way (from the main story to the notes) but not for the back link. I guess your first S&R has a syntax error: why does it end with <a> rather than </a>? And I am guessing the second S&R has an issue too: why are there two # characters there? Shouldn't one of them be without a #? I am not familiar with regular expressions, but it looks fishy. I will edit and reconvert to see how it works. Many thanks for the help.

Last edited by Sylver; 02-25-2011 at 02:21 PM.
Old 02-25-2011, 02:32 PM   #15
Perkin
Guru
Sorry

You're right on both counts: add the / to the final <a> in the first replace statement, and remove the first # (in the id attribute) in the second replace.

Next time I'll copy and paste them rather than type them out.

Apologies.
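With both corrections applied, the two text-file replacements can be tried out in Python before running them on the full file. This is a minimal sketch: the patterns are the ones from post #5 with the two fixes above, the `story` text is shortened from the sample, and `re.MULTILINE` is assumed to play the role of the editor's line-start matching for `^`.

```python
import re

# First S&R, with the closing tag fixed (</a> instead of <a>).
NOTE_SEARCH = re.compile(r"^\[(\d+)\](\S+)", re.MULTILINE)
NOTE_REPLACE = r'<sup>\1</sup><a id="fn\1" href="#fnr\1">\2</a>'

# Second S&R, with the stray # removed from the id attribute.
REF_SEARCH = re.compile(r"(\S+)\[(\d+)\]")
REF_REPLACE = r'\1<sup><a id="fnr\2" href="#fn\2">\2</a></sup>'

# One story, with the notes separated by blank lines as post #5 requires.
story = (
    "For sale: baby shoes[1], never worn[2].\n"
    "\n"
    "Notes:\n"
    "\n"
    "[1]shoe: an external covering for the human foot.\n"
    "\n"
    "[2]worn: past participle of wear.\n"
)

# Same order as in the thread: note lines first, then in-text markers.
step1 = NOTE_SEARCH.sub(NOTE_REPLACE, story)
step2 = REF_SEARCH.sub(REF_REPLACE, step1)
print(step2)
```

The order matters: once the note lines have been rewritten they no longer contain `[n]`, so the second pattern only touches the markers in the story text.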
All times are GMT -4.