Old 10-31-2011, 07:17 AM   #1
pureambient
on twitter: @pureambient
 
Posts: 13
Karma: 10
Join Date: May 2011
Location: central scotland
Device: Kindle
PDF to MOBI Conversion Questions

Hello Forum

I've spent many, many hours so far trying to learn how to convert PDFs to MOBI effectively.

I've been using Microsoft Access since 1997, so I understand in principle what a regular expression is, and how it works within the FIND & REPLACE window of calibre.

I have learned some of the shortcuts and am learning some of the syntax needed to write effective expressions.


But - I am struggling.

I sat down to write a comprehensive process for myself, which starts with Metadata clean up and then moves onto Conversion.


I took a PDF with only a small number of irritants and, applying all the knowledge I've gained over the past weeks, wrote three effective expressions that actually produced a PERFECT book. Completely clean, all rubbish removed. That was an EASY book, however.


Then I got a difficult book, one with horrific advertising and "click here to buy" and logos and tons and tons of absolute rubbish STREWN through the PDF file.

I began the same process, trying to identify target strings, and run conversions. And this is where the trouble begins. I have yet to succeed with this book, for a number of reasons.

1) What are you supposed to do if you cannot "fix" all of a book's problems with JUST THREE expressions?

I ASSUMED that what you would do is load the first 3 expressions and convert to MOBI.


Then, for the NEXT 3 expressions, you would select the converted MOBI (so you are STARTING with the book that you have PARTIALLY fixed, NOT the original PDF) and run those against it.


But I ran into problems: after set three (expressions 7, 8, 9) I noticed that the rubbish removed by expressions 1, 2 and 3 was BACK, so somehow I was NOT converting the converted MOBI, but maybe the PDF?


So the question is: What do I do, what is the EXACT PROCESS, when I want to run more than 3 expressions against a PDF?




The other question is:

2) Can you "save" a set of expressions to run against other books?

The reason I want to do that is I want to work out what my 12 expressions are against one book, then find all books with similar problems, then BULK convert ALL of them using this "one set" of 12 master expressions, if you see what I mean.

Once developed (and so far, I have failed, but I will get there) I wish I could store that set forever, because I might run across the same advertisements or whatever in future.

My workaround so far is to store all expressions in a text document of known good expressions (and known bad ones, too, to learn from - what NOT to do).
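A text file of known good expressions can also be applied outside calibre with a few lines of Python. The following is only a sketch under stated assumptions: the one-rule-per-line "pattern, TAB, replacement" layout, the names `SAVED_RULES`, `load_rules` and `apply_rules`, and the sample junk phrase are all hypothetical, and the script works on plain text or HTML you have already extracted, not on a PDF or MOBI file directly.

```python
import re

# Hypothetical saved rule list: one "pattern<TAB>replacement" pair per line.
# An empty replacement means "just delete the match".
SAVED_RULES = "\n".join([
    "# junk phrase -> nothing",
    "Click here to buy\t",
    "\\s{2,}\t ",
])

def load_rules(text):
    """Parse the saved rules into (compiled regex, replacement) pairs."""
    rules = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, _, replacement = line.partition("\t")
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(rules, content):
    """Run every rule in order; there is no limit of three."""
    for regex, replacement in rules:
        content = regex.sub(replacement, content)
    return content

rules = load_rules(SAVED_RULES)
result = apply_rules(rules, "Intro Click here to buy   text")
print(result)
```

Because the rules run one after another on the same text, later rules see the output of earlier ones, which is the behaviour the "convert the converted MOBI" approach was trying to get.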


Please let me know what to do when you need MORE than the 3 expressions.


Thanks!

dave
Old 10-31-2011, 07:45 AM   #2
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by pureambient View Post
1) What are you supposed to do if you cannot "fix" all of a book's problems with JUST THREE expressions?
You use an actual editor designed for such intensive changes. I recommend converting to EPUB, editing with Sigil (obviously), then converting to your preferred format.

Quote:
Originally Posted by pureambient View Post
2) Can you "save" a set of expressions to run against other books?
Does the little around next to the entry not give you a list of previous expressions you've used to choose from?
Old 10-31-2011, 09:25 AM   #3
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by user_none View Post
Does the little around next to the entry not give you a list of previous expressions you've used to choose from?
I don't mean this to sound smug, but is "the little around" the same as the dropdown arrow? We have so many folks from all over the world it is not out of the realm of possibilities that the statement is a use of words I'm not familiar with.
Old 10-31-2011, 10:39 AM   #4
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by dwanthny View Post
I don't mean this to sound smug, but is "the little around" the same as the dropdown arrow? We have so many folks from all over the world it is not out of the realm of possibilities that the statement is a use of words I'm not familiar with.
Auto correct strikes again. Around was supposed to be arrow.
Old 10-31-2011, 12:09 PM   #5
han_32
Member
Posts: 20
Karma: 2139376
Join Date: Aug 2011
Device: Kindle 3
Usually the little drop-down box only shows the expressions you last typed there. If you work out an expression in the regex builder, the expression you type in the builder will not show up in the drop-down menu. I work around that with copy and paste.

As for 3 expressions not being enough to clean everything out: if you are leaving the replace field empty (just deleting), you can use OR (|) in the same regex field to combine several different expressions. For example:

(example1|example2|example3) - it can get pretty long

If the junk sits around a page break <hr> tag, use OR at will before or after it, or try to make the regex more general. You can group different expressions with OR and it will match any of them. Keep in mind, though, that it tries each alternative in order, left to right, so it can be useful to put the narrowest first and the most general last (sorry if this makes no sense, I am no good at explaining this in English).
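To make the ordering point concrete, here is a minimal sketch using Python's `re` module (calibre's expressions use Python regex syntax). The junk phrases and sample text are made up for illustration; they are not from any real book.

```python
import re

# Hypothetical junk strings standing in for real advertising text.
patterns = [
    r"Click here to buy now!",  # narrowest (longest) alternative first
    r"Click here to buy",
    r"Click here",              # most general alternative last
]

# Join the alternatives with | inside one group, as in (a|b|c).
combined = re.compile("(?:" + "|".join(patterns) + ")")

text = "Chapter 1 Click here to buy now! It was a dark night. Click here"

# Replace with nothing, i.e. the "replace field left empty" case.
cleaned = combined.sub("", text)
print(repr(cleaned))

# With the general alternative first it wins at the same position,
# so only the short prefix is removed and the rest of the ad stays.
general_first = re.compile("(?:Click here|Click here to buy now!)")
leftover = general_first.sub("", text)
print(repr(leftover))
```

The second result still contains " to buy now!", which is why the narrow alternatives go first.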


As for converting a converted MOBI: when you click convert again, it gives you a choice of converting from PDF or from MOBI. If you pick PDF, then of course the first changes are discarded. If you pick MOBI, well, it's already different code, so an expression that would match the PDF-to-HTML output might not match the MOBI.
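A small sketch of why the same expression can stop matching. Both markup snippets below are invented for illustration and are not real calibre output; the point is only that a pattern written against one intermediate markup may miss once the tags carry different attributes.

```python
import re

# Expression written while looking at the (hypothetical) PDF-derived HTML.
strict = re.compile(r"<p>Visit our store</p>")

# Invented examples of the "same" junk line in two different markups.
pdf_html = "<p>Real text.</p><p>Visit our store</p>"
mobi_html = ('<p height="1em" width="0pt">Real text.</p>'
             '<p height="1em" width="0pt">Visit our store</p>')

strict_on_pdf = bool(strict.search(pdf_html))
strict_on_mobi = bool(strict.search(mobi_html))
print(strict_on_pdf, strict_on_mobi)

# A more general pattern that tolerates attributes on the tag.
tolerant = re.compile(r"<p[^>]*>Visit our store</p>")
tolerant_on_mobi = bool(tolerant.search(mobi_html))
print(tolerant_on_mobi)
```

So if you do chain conversions, each set of expressions has to be written against the markup of the format you are converting from at that step.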
Tags
conversion, expressions, mobi, pdf
All times are GMT -4.