01-25-2012, 04:52 AM   #1
Longmatys
Junior Member
Longmatys began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jan 2012
Device: Kindle 3
Search & Replace problems with PDF conversion

Hi All,

I'm having some difficulty with search & replace. I'm quite familiar with regex, but I'm not sure I correctly understand how it works in calibre.

I'm trying to convert a PDF to MOBI. Everything is mostly fine; the last remaining task is removing the page headers and page numbers. The page numbers were quite easy, but I'm having trouble with the page headers. I tried debug mode, but it didn't help either.

From the debugging process I got four directories:
  • input
  • parsed
  • processed
  • structure

From what I've read, the regexes are supposed to run against the files in the parsed directory, but I really doubt it.

I use two regex patterns:
  1. \n[0-9]+ <br>
  2. 0[^\n]*?0\s*<br>

The first one removes page numbers and works fine. The second is meant to remove lines that begin and end with 0, with any text between them (that text is the chapter name, so it keeps changing).
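To make the two patterns concrete, here is a minimal sketch using Python's standard `re` module. This is an assumption for illustration only: calibre's own regex engine and the exact text it feeds to the patterns may differ. The sample string is hypothetical, loosely modelled on the regex builder excerpt further down. Note that pattern 1 only matches when a literal newline precedes the digits and exactly one ordinary space precedes `<br>`; a non-breaking space in that position would stop it matching.

```python
import re

# The two patterns from the post, exactly as written.
PAGE_NUMBER = re.compile(r"\n[0-9]+ <br>")
RUNNING_HEADER = re.compile(r"0[^\n]*?0\s*<br>")

# Hypothetical sample: a page-number line ("42 <br>") followed by a
# running-header line that starts and ends with 0.
sample = (
    "velký zásah <br>\n"
    "42 <br>\n"
    "0  Doba jedová 0 <br>\n"
    "dostávají velmi obtížně"
)

step1 = PAGE_NUMBER.sub("", sample)      # drops "\n42 <br>"
step2 = RUNNING_HEADER.sub("", step1)    # drops "0  Doba jedová 0 <br>"
print(repr(step2))
```

The header pattern leaves its trailing newline behind, so an empty line remains where the header was.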

What confuses me is that the first one works at all: the xhtml files in the parsed directory contain no
Code:
<br>
tags.

Is there a way to step into the process, do some manual search/replace, and then continue? Maybe via HTML export/import?
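One possible way to do the manual step outside calibre is to get the book as plain HTML, clean it with a small script, and then convert the cleaned HTML to MOBI. The sketch below assumes you already have an extracted, UTF-8 encoded HTML file; the helper `apply_rules` and the `RULES` list are hypothetical names for illustration, not part of calibre.

```python
import re
from pathlib import Path

# Hypothetical clean-up rules: (pattern, replacement) pairs, applied in order.
# These are the two patterns from the post; adjust them as needed.
RULES = [
    (r"\n[0-9]+ <br>", ""),       # page-number lines
    (r"0[^\n]*?0\s*<br>", ""),    # running headers "0 ... 0"
]


def apply_rules(path, rules=RULES):
    """Apply each regex rule, in order, to an HTML file in place.

    Returns a list with the number of substitutions each rule made.
    """
    p = Path(path)
    # Text mode with universal newlines: "\r\n" is read as "\n".
    text = p.read_text(encoding="utf-8")
    counts = []
    for pattern, replacement in rules:
        text, n = re.subn(pattern, replacement, text)
        counts.append(n)
    p.write_text(text, encoding="utf-8")
    return counts
```

For example, `apply_rules("book.html")` (a hypothetical file name) returns one substitution count per rule, which is a quick sanity check that each pattern actually hit something. Because `Path.read_text` reads with universal newlines, Windows `\r\n` endings arrive as `\n`, so the `\n`-based patterns still apply.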


Input sample:
Code:
<hr>
<A name=8></a>0&nbsp;&nbsp;Doba&nbsp;jedová&nbsp;0&nbsp;<br>
dostávají&nbsp;velmi&nbsp;obtížně&nbsp;a&nbsp;pomalu,&nbsp;protože&nbsp;znamenají&nbsp;velký&nbsp;zásah&nbsp;<br>
Parsed sample:
Code:
na  veřejnost 0  Doba jedová 0 </p>
<p>dostávají velmi obtížně a pomal
Processed sample:
Code:
veřejnost 0  Doba jedová 0 </p>
<p class="calibre1">dostávají velmi obtížně a pomalu
Structure sample:
Code:
veřejnost 0  Doba jedová 0 </p>
<p>dostávají velmi
Regex builder view:
Code:
<hr>
<A name=8></a><IMG src="index-8_1.jpg"><br>
0  Doba jedová 0 <br>
dostávají velmi obtížně a pomalu
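A quick check with Python's `re` module (again only an approximation of what calibre does internally) shows why the second pattern can match the regex builder text but could never match the parsed output: in the parsed file the header line ends in `</p>`, not `<br>`. The two strings are copied from the parsed and regex builder samples, with the forum's spaces kept as ordinary spaces (if the real text has non-breaking spaces, `\s` in Python 3 matches those too).

```python
import re

RUNNING_HEADER = re.compile(r"0[^\n]*?0\s*<br>")

# Excerpts from the post.
regex_builder_text = "0  Doba jedová 0 <br>\ndostávají velmi obtížně a pomalu"
parsed_text = "na  veřejnost 0  Doba jedová 0 </p>\n<p>dostávají velmi obtížně a pomal"

print(RUNNING_HEADER.search(regex_builder_text) is not None)  # True
print(RUNNING_HEADER.search(parsed_text) is not None)         # False: no <br>
```

This is consistent with the first pattern working even though the parsed files contain no `<br>` tags: the patterns appear to be applied to text that still looks like the regex builder view.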
Does anyone know how calibre handles line endings? Can I use \n in regexes? (I did, but is that wise?)
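On the line-ending question: the working first pattern shows that `\n` does match, but the second pattern is unanchored, so it can also match inside ordinary lines. For instance, in `2010 <br>` it matches `010 <br>`. Below is a sketch of a stricter alternative in Python's `re`, assuming the header always sits on its own line, starts with `0` plus whitespace and ends with whitespace plus `0`. The inline `(?m)` flag makes `^` and `$` match at every line boundary and keeps the pattern string self-contained; whether calibre's regex box accepts it the same way is an assumption. The sample text is hypothetical.

```python
import re

ORIGINAL = re.compile(r"0[^\n]*?0\s*<br>")
# Stricter sketch: (?m) lets ^/$ match at each line; the header must start
# the line with "0" + whitespace and end with whitespace + "0" + "<br>".
# The lookahead (?=\r?$) accepts "\r\n" endings without consuming the "\r".
STRICT = re.compile(r"(?m)^0\s[^\n]*?\s0\s*<br>(?=\r?$)")

# Hypothetical text: an ordinary line containing a year, then a header.
text = "v roce 2010 <br>\n0  Doba jedová 0 <br>\nzásah <br>"

print(repr(ORIGINAL.sub("", text)))  # also eats "010 <br>" from "2010"
print(repr(STRICT.sub("", text)))    # removes only the header line
```

The trade-off is that the stricter pattern will miss headers that do not follow the assumed layout, so it is worth checking the match count on a real book.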
01-25-2012, 05:59 AM   #2
itimpi
Wizard
itimpi ought to be getting tired of karma fortunes by now.
 
Posts: 4,110
Karma: 780247
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
If you want to see what the regex will run against, use the Wizard, which is opened by the button next to the regex boxes.

Tags
calibre, conversion, pdf, replace, search


Similar Threads
Thread Thread Starter Forum Replies Last Post
Automatic entity conversion screwing up search and replace ldolse Sigil 10 01-05-2012 02:46 AM
Help with a search & replace mmholt Library Management 11 10-21-2011 07:49 PM
Problems with 'search-and-replace' conversion in versions 0.8.21 and/or 0.8.22? GMRabelink Calibre 0 10-14-2011 04:38 PM
Search & Replace :help: krussell Calibre 3 08-02-2011 05:45 PM
Search & Replace Pat Nickholds Sigil 2 10-22-2010 12:18 AM




MobileRead.com is a privately owned, operated and funded community.