Old 09-14-2010, 11:22 AM   #16
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by dwanthny
Maybe I'm looking in the wrong spot but the one under structure detection has an editable area.
The checkbox you need to check is "Preprocess input file to possibly improve structure detection". Below it there is an editable box, the "Insert page breaks before" XPath, but the XPath and the preprocessing have nothing to do with one another; the GUI just needs a better layout here.


Quote:
Originally Posted by dwanthny View Post
Just like with every other format if garbage was the source garbage is what they ended up with.
I did find some crappy lrf files after further investigation, in every case so far they were sourced from ugly lit files. Beyond that they were broken into individual flows in random places based on file size, just like Calibre does when it can't find good split points. This makes them even less desirable to use as a source for conversion.
Old 09-14-2010, 11:32 AM   #17
kovidgoyal
creator of calibre
Posts: 25,909
Karma: 5035037
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Chapter detection, by its very nature, needs to match tags, not random regexps. In fact, for links to chapters to work, the tags must be block tags like heading, div or p. Or anchors.

That is why the options use XPath.
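
To illustrate the point about matching tags rather than text, here is a minimal sketch in Python. It uses the standard library's ElementTree, which supports only a small subset of XPath, so it is not calibre's own code and the expression is far simpler than what the conversion options accept. The sample markup, the `chapter` class name and the function name are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real input would come from the book being converted.
XHTML = """<html><body>
<h2 class="chapter">Chapter 1</h2>
<p>It was a dark and stormy night.</p>
<h2 class="chapter">Chapter 2</h2>
<p>The rain fell in torrents.</p>
</body></html>"""


def find_chapter_elements(markup, path=".//h2[@class='chapter']"):
    """Return whole elements selected by an XPath-style path.

    Each result is a complete element, so a link target or a page
    break can be attached to it without cutting through any markup.
    """
    root = ET.fromstring(markup)
    return root.findall(path)


chapters = find_chapter_elements(XHTML)
titles = [el.text for el in chapters]
print(titles)
```

Because every hit is an element and not an arbitrary run of characters, there is always a well-defined place to hang a chapter link or a page break.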

If you are asking for a generic regexp pre-processing option, that can be done, though really I would suggest using the conversion debug option, then using whatever regexp tool you like (I prefer vim) on the files in the input directory. Zip up the input directory and have calibre convert that.
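
For anyone who would rather script the second stage than do it in an editor, here is a rough sketch in Python. It assumes the debug output has already been written somewhere (on the command line this is the `--debug-pipeline` option of `ebook-convert`), that the HTML files in the input directory are UTF-8, and that untagged headings sit alone in their own paragraph as "Chapter N ...". The pattern, the `chapter` class and both function names are made up for illustration; adjust them to the book at hand.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical pattern: a whole <p> element whose only content is "Chapter N ...".
# Matching the complete element keeps the replacement well-formed.
CHAPTER_P = re.compile(r"<p\b[^>]*>\s*(Chapter\s+\d+[^<]*?)\s*</p>", re.IGNORECASE)


def tag_chapter_headings(markup):
    """Turn paragraphs that look like chapter headings into <h2> elements."""
    return CHAPTER_P.sub(r'<h2 class="chapter">\1</h2>', markup)


def retag_and_zip(input_dir, zip_path):
    """Rewrite the HTML files under input_dir in place, then zip the directory.

    zip_path should live outside input_dir so the archive does not
    end up containing itself.
    """
    input_dir = Path(input_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(input_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.suffix.lower() in {".html", ".htm", ".xhtml"}:
                text = path.read_text(encoding="utf-8")  # assumed encoding
                path.write_text(tag_chapter_headings(text), encoding="utf-8")
            # Store paths relative to input_dir so the files sit at the zip root.
            zf.write(path, path.relative_to(input_dir).as_posix())


print(tag_chapter_headings("<p>Chapter 1</p><p>Some text.</p>"))
```

The resulting zip can then be added to calibre and converted, at which point the headings are real tags that chapter detection can match.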
Old 09-14-2010, 11:41 AM   #18
tonyx3
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2010
Device: Nexus One
Quote:
Originally Posted by kovidgoyal
Chapter detection, by its very nature, needs to match tags, not random regexps. In fact, for links to chapters to work, the tags must be block tags like heading, div or p. Or anchors.

That is why the options use XPath.

If you are asking for a generic regexp pre-processing option, that can be done, though really I would suggest using the conversion debug option, then using whatever regexp tool you like (I prefer vim) on the files in the input directory. Zip up the input directory and have calibre convert that.
Thanks, Kovid. Appreciate your work, and your input.



Yeah, it should be with tags.

But when working with a poorly formatted source file, in which the chapter headings aren't tagged properly, that doesn't always work.


I guess I'm just looking for a way to not have to convert a file to rtf (or some other format) and work on it in another program, and then bring it back to calibre.

When I'm working with such a file, it's a bit frustrating to be able to write a simple regex that can match the chapter headings (I can test it in the header removal preview tool, since chapters don't have a test feature), in spite of their untagged format, but then not be able to use it to simplify my conversion.

As for links to chapters working, shouldn't it be able to use my simple regex to match the chapter headings, and then tag them with the appropriate header tags in my output file?
Old 09-14-2010, 11:46 AM   #19
kovidgoyal
creator of calibre
Posts: 25,909
Karma: 5035037
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by tonyx3 View Post
As for links to chapters working, shouldn't it be able to use my simple regex to match the chapter headings, and then tag them with the appropriate header tags in my output file?
No, you can't just insert random tags into XHTML and have it still be valid. A regex can match anything at all. It could match a fragment something like:

Code:
<span class="ONE">Bad, bad bad<a>
Where only the fragment in bold is matched. How do you suggest calibre insert tags in this case?
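
The problem can be reproduced with a small Python sketch. The bold highlighting in the fragment above did not survive, so the second pattern below is only a guess at the kind of match meant: one that starts inside an attribute value and ends in the middle of the text. The sample markup and the helper names are assumptions for the example, and the check is for XML well-formedness only, not full XHTML validity.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed starting point.
FRAGMENT = '<p><span class="ONE">Bad, bad bad</span></p>'


def wrap_matches(markup, pattern, tag="h2"):
    """Naively wrap every regex match in <tag>...</tag>."""
    return re.sub(pattern, lambda m: f"<{tag}>{m.group(0)}</{tag}>", markup)


def is_well_formed(markup):
    """Return True if the markup still parses as XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A match that stays inside the text node survives the wrapping.
inside_text = wrap_matches(FRAGMENT, r"Bad, bad bad")

# A match that starts inside the attribute value does not: the opening
# <h2> lands in the middle of class="...".
across_markup = wrap_matches(FRAGMENT, r'ONE">Bad, bad')

print(is_well_formed(inside_text), is_well_formed(across_markup))
```

Nothing stops a regex from matching the second way, which is why inserting tags around arbitrary matches cannot be done safely in general.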
Old 09-14-2010, 11:50 AM   #20
tonyx3
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2010
Device: Nexus One
Quote:
Originally Posted by kovidgoyal
No, you can't just insert random tags into XHTML and have it still be valid. A regex can match anything at all. It could match a fragment something like:

Code:
<span class="ONE">Bad, bad bad<a>
Ok, that's true.

The potential to completely mess up the file would be there, for sure.


What suggestion would you give for working on my original example, without using a third format and manually editing it?

Or is there no way at all?
Old 09-14-2010, 01:01 PM   #21
kovidgoyal
creator of calibre
Posts: 25,909
Karma: 5035037
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I would say use a two stage process, like I described earlier.
Old 09-14-2010, 09:30 PM   #22
DoctorOhh
US Navy, Retired
Posts: 8,838
Karma: 12535517
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by ldolse
The checkbox you need to check is "Preprocess input file to possibly improve structure detection". Below it there is an editable box, the "Insert page breaks before" XPath, but the XPath and the preprocessing have nothing to do with one another; the GUI just needs a better layout here.
That explains my confusion, thanks for the clarification.