
2kmaro
Think

join:2000-07-11
Oklahoma City, OK

2 recommendations


OmniPage 15 - .pdf to source converter: a Review

Well, more like first impressions than a full review.

For a long time I've been looking for a good way to take Acrobat .pdf files and convert them back to their source file type, such as Word or Excel. Most attempts to do this result in an awful mess that doesn't come close to the layout or retain the formatting of the .pdf document, especially if the document has many graphics or tables in it.

Recently I was offered the opportunity to get a product called OmniPage 15 at roughly half-price, and after some soul searching decided to risk the $80 (yes, full retail is $150) and give it a try. You may recognize some of the company's other products: PaperPort, Dragon NaturallySpeaking, and RealSpeak among others.

OmniPage comes in a Standard and a Professional edition. I got the Standard version. The Pro version also contains a couple of their other products and exports to a few additional file types, but costs $499.99 instead of 'only' $149.99.

So how goes it with this thing?
First, it's a large download and install file: 404MB.
Second, it requires a key and has to be activated, like Microsoft products - and, just as with an MS product, activation can be done over an internet connection.
Third, Registration: They ask you to fill out a LOT of information (looks like marketing demographic info to me) if you elect to Register it (separate from activating). I chose to leave most of the optional info (age, reading habits, hobbies etc) blank and made sure to click the "don't spam me" checkbox in the process.

How does it work?
Pretty well! I gave it a try on two .pdf documents that I had available and ended up with good results after stumbling around a little.

The first document was an 8-page .pdf file that had been created from an Excel workbook. I'd call it 90-95% successful. There were a few cells in some columns that ended up as merged cells on a couple of output pages while ending up as separate rows of info on others (they were multi-line word-wrapped entries originally, I think). And in a few places/columns it looked like the positioning of some merged cells had thrown them a small curve. But all in all it was a usable result that I could easily extract the real information from without further change, and could have turned into a very presentable Excel workbook with not too much editing.

The next test was a kind of stress test. I have a 760-page .pdf book file (the electronic copy that came with a real book I purchased). This one took me a couple of tries. During the first attempt the final output seems to have gotten corrupted. That may have been my fault - while OmniPage was trying to build the .doc file I was also trying to use Word to open up another huge .xml file and dealing with some emails. In any event, the end product wasn't re-loadable in Word.

I decided to give OmniPage the benefit of the doubt, deleted the offending .doc file and started over. The second time things seem to have been VERY successful. I now have a 760-page .doc file that, with minor exceptions, is laid out and looks very much like the source .pdf file!

I think the minor exceptions are my fault - operator unfamiliarity with all the OmniPage settings. For example, headers and footers weren't dealt with properly - they ended up inside of the actual pages of the .doc file, and I remember seeing an option somewhere for "look for headers and footers" which I probably didn't use.

User Interface - There's more here than I could possibly describe in a short review. You can really get down inside of a source document and tweak things. But for those, like me, who just want to get a job done quick and dirty they have a "1..2..3" step fast track function:
1 - Load or Scan your source file/document (yes it is also an OCR processor)
2 - Process for output (tell it what type of document you want to spit out)
3 - Save the output to file (or send in email or save to clipboard)
During the save you're given a whole host of possible file formats, including every graphic type you've probably ever seen, along with .xml, .doc, .xls, .pdf and more.

Drawbacks - I installed this on my laptop computer which happens to have 1GB RAM and is powered with an AMD 64 3700+ - a quick little system. I noticed that sometimes the "feedback" that an operation had started seemed slow to be given, causing me to try to do something twice when I'd already started it once. So a little patience after clicking a button is advised.
The processing can take some time - not unreasonable, but noticeable - and at least they do give you a progress bar indicator so you'll know something is going on.
The output files were HUGE!!
That source 760 page .pdf document was 14,978 KB in size.
The .xml file turned out to be 169,199 KB, and the Word .doc file weighed in at a very hefty 241,268 KB!

I used a little macro magic in Word to create my very own 760 page document with a large graphic on each page along with text and it only came out to 39,217 KB, so OmniPage is having to add a lot of filler in there somewhere.
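
For a sense of scale, the sizes quoted above can be compared with a few lines of Python. This is only a quick sketch: the numbers are the ones reported in this post, and the variable names are made up for illustration.

```python
# File sizes in KB, as reported in the post; names are illustrative only.
PAGES = 760
sizes_kb = {
    "source .pdf": 14_978,
    "OmniPage .xml": 169_199,
    "OmniPage .doc": 241_268,
    "hand-built .doc": 39_217,
}

# Size of each file relative to the source .pdf.
source = sizes_kb["source .pdf"]
ratios = {name: size / source for name, size in sizes_kb.items()}

for name, size in sizes_kb.items():
    print(f"{name:16} {size:>8,} KB  {ratios[name]:5.1f}x source  {size / PAGES:6.1f} KB/page")

# How much larger the OmniPage .doc is than the macro-built Word document.
doc_vs_handbuilt = sizes_kb["OmniPage .doc"] / sizes_kb["hand-built .doc"]
print(f"OmniPage .doc vs hand-built .doc: {doc_vs_handbuilt:.1f}x")
```

Run as written, it puts the .doc output at roughly 16 times the size of the source .pdf and about 6 times the size of the hand-built Word document.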

I opened up the 241MB file and selected everything and pasted to a new document and it dropped down to 121MB - still large, but still saved some disk space. I'm thinking that the graphics in the created file are not optimized. Remember that this product pretty much uses OCR type processing to take the source document and get it ready for output.

But in the final Word document, tables are tables and graphics are graphics that you can work with (edit, copy, etc) just like normal. You can cut and paste and edit the text. Same for the Excel file - it works just like an Excel file should.

There are two big uses I could see for most people with this software.
The first is the way I've used it: to take an available .pdf file and turn it into another document type so that data can be extracted from it and so it can be searched easily in a familiar tool.
The second is to take your paper documents and scan them very accurately into electronic editions of those documents. OmniPage even includes a text editor so that you can spell check and correct the 'scanned' document before saving it out to disk.

I think the price tag on this probably puts it out of the realm of a tool you'd get if you only had to use it a few times over a long period of time. But if you do these kinds of things routinely or if the ability to get data out of .pdf files easily and work with it is important, then it's probably well worth the money.

To give you an idea of the work that can be saved, consider the Excel data I was working with. The .pdf file had been created with Excel, I know that for a fact. But when I tried copying the data using Acrobat and pasting it into a new Excel document, the data did not go back into cells properly: all information on one row was simply placed into column A of a row in the Excel copy. With OmniPage, the data separation into cells was retained. That alone will save me literally hours of work rebuilding the data into usable format.
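
To show what "rebuilding the data" by hand can look like, here is a rough Python sketch of splitting pasted lines back into cells. It assumes the fields in each pasted line are separated by runs of two or more spaces, which may well not hold for a real Acrobat paste; the sample rows and the function name are hypothetical, not taken from my workbook.

```python
import re

def split_pasted_row(line):
    """Split one pasted line into cells on runs of two or more whitespace characters."""
    cells = re.split(r"\s{2,}", line.strip())
    return [cell.strip() for cell in cells if cell.strip()]

# Hypothetical pasted rows, each of which landed in column A as a single string.
pasted = [
    "Widget A   12   4.50",
    "Widget B, large   3   19.99",
]

rows = [split_pasted_row(line) for line in pasted]
for row in rows:
    print(row)
```

A split like this falls apart as soon as a cell is empty or a value contains a wide gap, which is part of why having the cell boundaries preserved by the converter saves so much time.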

And now that I've got it - you folks know where you can come (very infrequently) to request conversion of some .pdf file into something you can use more easily. And no, I won't do much with obviously copyrighted material like that electronic copy of my purchased book.

lawguru
join:2001-12-12
Atlanta, GA


Member

I use this program at work quite frequently, and it does take some time to read a large PDF file. However, if you scan to a TIFF file instead, the conversion is much quicker.