01-25-2012, 03:52 AM   #1
Longmatys
Junior Member
Longmatys began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jan 2012
Device: Kindle 3
Search & Replace problems with PDF conversion

Hi All,

I'm having some difficulty with search & replace. I'm quite familiar with regex, but I'm not sure I understand correctly how it works in calibre.

I'm trying to convert a PDF to MOBI. Everything is mostly fine; the last remaining issue is that the page headers and page numbers need to be removed. The page numbers were quite easy, but I'm having trouble with the page headers. I tried using debug mode, but it didn't help either.

My question concerns the debug output. The debugging process gave me four directories:
  • input
  • parsed
  • processed
  • structure

From what I've read, the regexes are supposed to run against the files in the parsed directory, but I really doubt that.

I use two regex patterns:
  1. \n[0-9]+ <br>
  2. 0[^\n]*?0\s*<br>

The first one removes the page numbers and works fine. The second one is meant to remove lines that begin and end with 0, with any text between them (that text is the chapter name, so it keeps changing).
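To check the two patterns outside calibre, here is a minimal Python sketch that applies them in order to a made-up snippet. The sample text and the function name `strip_page_furniture` are hypothetical, and calibre's own regex handling may differ in details:

```python
import re

# The two patterns from the post, copied verbatim.
PAGE_NUMBER = re.compile(r"\n[0-9]+ <br>")
RUNNING_HEADER = re.compile(r"0[^\n]*?0\s*<br>")

def strip_page_furniture(html: str) -> str:
    """Remove page numbers, then running headers, like two rules applied in order."""
    html = PAGE_NUMBER.sub("", html)
    return RUNNING_HEADER.sub("", html)

# Made-up snippet: a page-number line followed by a header line.
sample = "end of previous page\n12 <br>\n0  Doba jedová 0 <br>\ndostávají velmi obtížně"
print(repr(strip_page_furniture(sample)))
# 'end of previous page\n\ndostávají velmi obtížně'
```

Printing with `repr()` makes the leftover blank line visible: pattern 2 removes the header text but not the line break after it.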

I'm confused that the first one works at all: the XHTML files in the parsed directory contain no
Code:
<br>
tags.
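One way to test which stage the patterns can actually match is to count hits in each debug folder. A minimal Python sketch, assuming the debug output is a folder with `input`, `parsed`, `processed` and `structure` subfolders holding (X)HTML files in UTF-8; the function name `matches_per_stage` is hypothetical:

```python
import re
from pathlib import Path

# Stage folders as listed in the post; this layout is an assumption.
STAGES = ("input", "parsed", "processed", "structure")
HTML_SUFFIXES = {".html", ".htm", ".xhtml"}

def matches_per_stage(debug_dir: str, pattern: str) -> dict[str, int]:
    """Count regex matches in the (X)HTML files found under each stage folder."""
    regex = re.compile(pattern)
    counts = {}
    for stage in STAGES:
        total = 0
        stage_dir = Path(debug_dir) / stage
        if stage_dir.is_dir():
            for path in sorted(stage_dir.rglob("*")):
                if path.is_file() and path.suffix.lower() in HTML_SUFFIXES:
                    # newline="" keeps CR/LF exactly as stored, so \n in the
                    # pattern sees the real line endings.
                    with open(path, encoding="utf-8", errors="replace", newline="") as f:
                        total += len(regex.findall(f.read()))
        counts[stage] = total
    return counts
```

For example, `matches_per_stage("/path/to/debug", r"\n[0-9]+ <br>")`. If pattern 1 only scores hits in `input`, that would be consistent with the regexes running on the input-stage HTML rather than on the parsed XHTML.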

Is there a way to step into the process, do some manual search/replace, and then continue? Maybe via HTML export/import?
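If export, edit and re-import turns out to be the way, the manual edit step could look like this minimal Python sketch. It assumes the book has already been exported to HTML files on disk in UTF-8; the function names and the `.bak` backup convention are hypothetical, not part of calibre:

```python
import re

def apply_rules(text: str, rules: list[tuple[str, str]]) -> tuple[str, list[int]]:
    """Apply (pattern, replacement) rules in order; return new text and hits per rule."""
    counts = []
    for pattern, replacement in rules:
        text, n = re.subn(pattern, replacement, text)
        counts.append(n)
    return text, counts

def edit_file_in_place(path: str, rules: list[tuple[str, str]]) -> list[int]:
    """Rewrite one exported HTML file, saving the original as <path>.bak first."""
    # newline="" leaves existing line endings (LF or CRLF) untouched.
    with open(path, encoding="utf-8", newline="") as f:
        original = f.read()
    with open(path + ".bak", "w", encoding="utf-8", newline="") as f:
        f.write(original)
    new_text, counts = apply_rules(original, rules)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(new_text)
    return counts
```

For example, `edit_file_in_place("index.html", [(r"\n[0-9]+ <br>", ""), (r"0[^\n]*?0\s*<br>", "")])` returns the number of replacements per rule, which makes a pattern that matched nothing easy to spot.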


Input sample:
Code:
<hr>
<A name=8></a>0&nbsp;&nbsp;Doba&nbsp;jedová&nbsp;0&nbsp;<br>
dostávají&nbsp;velmi&nbsp;obtížně&nbsp;a&nbsp;pomalu,&nbsp;protože&nbsp;znamenají&nbsp;velký&nbsp;zásah&nbsp;<br>
Parsed sample:
Code:
na  veřejnost 0  Doba jedová 0 </p>
<p>dostávají velmi obtížně a pomal
Processed sample:
Code:
veřejnost 0  Doba jedová 0 </p>
<p class="calibre1">dostávají velmi obtížně a pomalu
Structure sample:
Code:
veřejnost 0  Doba jedová 0 </p>
<p>dostávají velmi
Regex builder view:
Code:
<hr>
<A name=8></a><IMG src="index-8_1.jpg"><br>
0  Doba jedová 0 <br>
dostávají velmi obtížně a pomalu
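Comparing the input sample with the regex builder view, the header's spaces appear as literal `&nbsp;` entities in one and as plain-looking spaces in the other. A short Python check of what that means for pattern 2, assuming the regex builder's spaces are U+00A0 characters (not confirmed); `HEADER_NBSP` is a hypothetical variant, not calibre behavior:

```python
import re

# Pattern 2 from the post, unchanged.
HEADER = re.compile(r"0[^\n]*?0\s*<br>")
# Hypothetical variant: allows literal "&nbsp;" entities (as well as real
# whitespace) between the closing 0 and the <br>; IGNORECASE also accepts <BR>.
HEADER_NBSP = re.compile(r"0[^\n]*?0(?:\s|&nbsp;)*<br>", re.IGNORECASE)

# Line from the input sample, with entities still written out.
raw_line = "<A name=8></a>0&nbsp;&nbsp;Doba&nbsp;jedová&nbsp;0&nbsp;<br>\n"
# Regex builder line, ASSUMING its spaces are U+00A0 (non-breaking space).
builder_line = "0  Doba jedová 0 <br>\n".replace(" ", "\xa0")

print(HEADER.search(raw_line))                  # None: \s* cannot match "&nbsp;"
print(HEADER_NBSP.sub("", raw_line))            # only the <A name=8></a> anchor is left
print(HEADER.search(builder_line) is not None)  # True: Python 3's \s matches U+00A0
```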
Does anyone know how calibre handles line endings? Can I match them in regexes? (I did, but is it wise?)
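On the line-ending question, here is a minimal Python sketch using Python 3 `re` semantics; I can't confirm which line endings calibre passes to the regex. It shows that `\n` only matches LF, that anchoring with `^`/`$` under `re.MULTILINE` restricts pattern 2 to whole lines, and that a CRLF ending needs an explicit `\r?` before `$`. The anchored patterns and the sample text are hypothetical:

```python
import re

# Pattern 2 as posted: not tied to line boundaries.
UNANCHORED = re.compile(r"0[^\n]*?0\s*<br>")
# Hypothetical variants: with re.MULTILINE, ^ matches at the start and after
# every \n, and $ matches before every \n, so the header must fill a whole line.
LF_ONLY = re.compile(r"^0[^\n]*?0\s*<br>$", re.MULTILINE)
CRLF_OK = re.compile(r"^0[^\n]*?0\s*<br>\r?$", re.MULTILINE)

# Made-up text: ordinary line, header line ending in CRLF, ordinary line.
text = "in 2010 <br>\n0  Doba jedová 0 <br>\r\nnext line <br>\n"

print(UNANCHORED.findall(text))  # also hits "010 <br>" inside "in 2010 <br>"
print(LF_ONLY.findall(text))     # [] because "\r" sits between <br> and $
print(CRLF_OK.sub("", text))     # removes the header line (and its \r) only
```

The first line of output is the practical risk: without anchors, pattern 2 can eat the tail of ordinary text that happens to contain two zeros before a `<br>`.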