10-31-2011, 07:17 AM | #1 |
on twitter: @pureambient
Posts: 13
Karma: 10
Join Date: May 2011
Location: central scotland
Device: Kindle
|
PDF to MOBI Conversion Questions
Hello Forum
I've spent many, many hours so far trying to learn how to convert PDFs to MOBI effectively. I've been using Microsoft Access since 1997, so I understand in principle what a regular expression is and how it works within the FIND & REPLACE window of calibre. I have learned some of the shortcuts and am learning some of the syntax needed to write effective expressions. But I am struggling.
I sat down to write a comprehensive process for myself, which starts with metadata clean-up and then moves on to conversion. I took a PDF with only a few irritants and, applying all the knowledge I've gained over the past weeks, wrote three effective expressions that actually produced a PERFECT book. Completely clean, all rubbish removed.
That was an EASY book, however. Then I got a difficult book, one with horrific advertising and "click here to buy" and logos and tons and tons of absolute rubbish STREWN through the PDF file. I began the same process, trying to identify target strings and run conversions. And this is where the trouble begins. I have yet to succeed with this book, for a number of reasons.
1) What are you supposed to do if you cannot "fix" all of a book's problems with JUST THREE expressions? I ASSUMED that you would load the first 3 expressions and convert to MOBI. Then, for the NEXT 3 expressions, you would select the converted MOBI (so you are STARTING with the book that you have PARTIALLY fixed, NOT the original PDF) and run the next 3 against that. But I ran into problems: after set three (expressions 7, 8, 9) I noticed that the problems fixed by expressions 1, 2, 3 were BACK, so somehow I was NOT converting the converted MOBI, but maybe the PDF? So the question is: what is the EXACT PROCESS when I want to run more than 3 expressions against a PDF?
2) Can you "save" a set of expressions to run against other books? The reason I want to do that is that I want to work out what my 12 expressions are against one book, then find all books with similar problems, then BULK convert ALL of them using this "one set" of 12 master expressions, if you see what I mean. Once developed (and so far I have failed, but I will get there), I wish I could store that set forever, because I might run across the same advertisements or whatever in future. My workaround so far is to keep a text document of known good expressions (and known bad ones too, to learn what NOT to do).
Please let me know what to do when you need MORE than the 3 expressions. Thanks! dave |
10-31-2011, 07:45 AM | #2 | |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
Does the little around next to the entry not give you a list of previous expressions you've used to choose from? |
|
10-31-2011, 09:25 AM | #3 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
I don't mean this to sound smug, but is "the little around" the same as the dropdown arrow? We have so many folks from all over the world it is not out of the realm of possibilities that the statement is a use of words I'm not familiar with.
|
10-31-2011, 10:39 AM | #4 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Auto correct strikes again. Around was supposed to be arrow.
|
10-31-2011, 12:09 PM | #5 |
Member
Posts: 20
Karma: 2139376
Join Date: Aug 2011
Device: Kindle 3
|
Usually the little drop-down box only shows the expressions you last typed into the field itself. If you work out an expression in the regex builder, what you type there will not show up in the drop-down menu. I work around that with copy and paste.
About 3 expressions not being enough to clean everything out: if you are cleaning with the replace field left empty (just deleting), you can use OR within a single regex field to combine several different expressions, for example (example1|example2|example3). It can get pretty long. If the rubbish sits around a page break <hr> tag, use OR at will before or after it, or try to make the regex more general. You can group different expressions with OR and the regex will match any of them, though keep in mind that it tries each one in order, left to right, so it can be useful to put the narrowest first and the more general last (sorry if this makes no sense, I am no good at explaining this in English).
About converting a converted MOBI: when you click convert again, it will give you a choice of whether to convert from PDF or from MOBI. If you pick PDF, then of course the first changes are discarded. If you pick MOBI, well, it's already different code, so the regular expression that matched the HTML generated from the PDF might not match the MOBI. |
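The OR trick described above can be tried outside calibre first. Here is a minimal sketch in plain Python, on the assumption that calibre's search and replace follows Python's `re` syntax. The junk strings, the sample HTML and the helper name `combine` are made up for illustration and are not taken from any real book; substitute the strings you actually find.

```python
import re

# Hypothetical junk strings, for illustration only.
# Narrowest (most specific) pattern first, most general last.
junk_patterns = [
    r"Click here to buy now!",
    r"Click here to buy",
    r"Visit www\.example\.com[^<]*",
]


def combine(patterns):
    """Join several expressions into one with OR (|), keeping their order."""
    return re.compile("(?:" + "|".join(patterns) + ")")


combined = combine(junk_patterns)

# Made-up sample of the kind of HTML a PDF conversion might produce.
html = (
    "<p>Chapter 1</p>"
    "<p>Click here to buy now!</p>"
    "<p>Text. Visit www.example.com today</p>"
)

# An empty replacement simply deletes every match.
cleaned = combined.sub("", html)
print(cleaned)

# Order matters: with the general pattern first, it wins at the same
# position and the tail of the longer string is left behind.
general_first = combine(["Click here to buy", "Click here to buy now!"])
leftover = general_first.sub("", "Click here to buy now!")
print(repr(leftover))
```

The combined pattern string (`combined.pattern`) is what you would paste into a single search field with the replace field left empty. The second half shows why the narrowest expression should come first: alternation stops at the first alternative that matches, not the longest one.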
Tags |
conversion, expressions, mobi, pdf |
|