removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40

"Going Paperless"

One of my clients wants to "go paperless" - I put quotation marks around this because this line of business (insurance/legal service) deals with quite a bit of paperwork. This client currently has a warehouse full of bankers boxes - probably 300+ of them - all of which contain closed cases/jobs/projects. Scanning all of the old documents is simply not going to happen because of the sheer volume involved, so the most sensible option would be to start scanning new files while phasing out the boxes as their "expiry" time runs out.

The only "must have" I'm aware of so far is OCR so that a file can be pulled from the archives and easily searched when needed.

My concerns as of now:

1. Scanning. This customer has two copiers that support scanning to PDF. The documents would be scanned by clerical staff at the customer's location. Would a specialized piece of equipment work better or can these be used?

2. Software. Is there any software that would be best suited for a project like this? The "OCR Software that works?" thread mentions several OCR packages; OmniPage looks like it might do exactly what this customer needs.

3. Storage. This customer deals with the occasional 4-6GB video and some 1-2GB photo collections, but these are few and far between compared to 200+ page documents. I'm going to eventually scan a banker's box of documents to get a rough idea of how much disk space the average box requires, but I'm thinking that a NAS with RAID will definitely be needed here. I've priced some Dell NAS setups with Windows Storage Server so far. What kind of hardware are you guys using for storage at the moment?

4. Backups. This isn't as big a deal since the files in question are archived/closed files and not active projects, but these will still need to be backed up on a somewhat regular basis.

I realize that this is a bit broad, but I'm quite keen to see how some of you guys have put together similar projects. I'll be sure to update this thread as I come up with the best option for my client. Thanks!
--
irc.removed.us - #dslr



Rob
In Deo speramus.
Premium
join:2001-08-25
Kendall, FL
kudos:2
Reviews:
·Comcast

My organization uses OnBase (»www.hyland.com/onbase-and-ecm/onbase-12.aspx) with a (I believe) Epson scanner. Given the sheer volume of paper we scan, there are two dedicated scanning stations with two full time employees who scan all documents.
--
CheckSite.us | YourIP.us | Reverseip.us


JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
reply to removed

The company I work for does, among other things, paper surveys. You know, those school #2 pencil type forms? In fact most of our clients are school systems or other government departments such as recreation or after school care. We also do online HTML and PDF surveys sometimes as well.

We use a piece of software called Teleform for this that handles the document creation, reading in, and any manual verification needed (as good as OCR is, people's handwriting can be atrocious, I know mine is lol). For reading in, the forms are scanned to multi-page TIFF or PDF in batches and then imported to the software. How those batches are defined varies by the project really, and you/they will likely have their own batch definition that works best. Orientation marks and a unique code in one corner help to align it in the software and attach it to the right project.

I realize Teleform is probably not what you need but the parent company (Cardiff, Verity, Autonomy, whatever they are calling themselves now) also makes a product called LiquidOffice that from what I remember in their sales sheet is designed for "going paperless". The main reason I've discovered it is they moved the HTML/PDF eForm module from Teleform to LiquidOffice. If all you want is to make OCR'd PDFs then yeah you could use something like OmniPage or Acrobat I'm sure, but there are software packages out there specifically for "going paperless" including from Microsoft (Office InfoPath is designed for this).

For scanning, if you have that many bankers boxes you will want a REAL scanner, not a copier with an ADF. Those will work for supplementing or as a last resort, but even our scanning needs, small by comparison, pretty much require a real scanner. Are these forms double sided (some of ours are)? Most copiers have to do two passes of the same document to handle double sided forms; a real scanner can have two scan heads and scan both sides in one pass.

We have two copiers and one scanner. Our original scanner from before I joined was a Fujitsu, I forget the model but all of our new stuff is Lanier/Ricoh. Scanner is a Lanier IS760D and the copiers are Ricoh MP C3000 and MP 2000 (the smaller one was at a client location for remote scanning and it came back to us when the project was done). The scanner is SCSI based (it has USB but I couldn't even get the computer to acknowledge it was hooked up), but it also has an optional network module that provides ethernet or optional PCMCIA based wifi (no idea what card it uses, mine didn't come with one as I used ethernet) so you don't need a computer to control it.

Now, on to file storage. I located one of our recently scanned batches. A 130 image PDF with letter size images is 27.7 MB. The same batch scanned to TIFF is 9.74 MB. Both documents are 300 DPI. The PDFs are fairly linear in terms of size of batch. That 130 image batch was originally 13 10-image files at about 2.15 MB each and they all totaled 27.9 MB. So whether you do an entire box in one large PDF or do individual PDFs for each file, the size will be roughly the same.

Our file storage is handled by a virtualized Server 2008 machine on Hyper-V but it doesn't have to be anything special. The scanner's network module is SMB aware (it does NOT like spaces in the share name though), and IIRC also supports NFS and possibly even FTP. The VHD I gave the server for the central file share and "scan to" folders is 500 GB with 284 GB currently in use. Some completed projects have been moved to a hidden share/drive for only certain people to access. That VHD is a mere 80 GB and only 16 GB have been used. Let me put it this way: the VHDs for my file server, backup server and Teleform server are all on a 2250 GB array consisting of four 750 GB drives in RAID 5, and IIRC I still have 900 GB or so free on the array (only about 200 GB if all the VHDs fully expand).
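
The array figure above matches the usual RAID 5 rule of thumb that one drive's worth of space goes to parity. A minimal sketch of that arithmetic, assuming equal-size drives and ignoring filesystem overhead (the function name is made up for illustration):

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Approximate usable capacity of a RAID 5 array of equal-size drives.

    One drive's worth of space is consumed by parity, so usable space is
    (n - 1) * size. Filesystem and formatting overhead are ignored.
    """
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


if __name__ == "__main__":
    # Four 750 GB drives, as in the array described above.
    print(raid5_usable_gb(4, 750))  # 2250
```

The same rule applies to any four-bay unit: usable space is three drives' worth, whatever the drive size.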


PrntRhd
Premium
join:2004-11-03
Fairfield, CA
Reviews:
·Comcast

reply to removed

The copiers will do scanning just fine, they are faster and more durable than small scanners.
You have to choose the type of scanning. Typically they can do scan to folder (SMB) or FTP or SMTP via email. The first option is greatly preferred over email due to email attachment size limitations and magnified issues with email server performance and backing up the email server.
You can scan to folders by month/year or by category or they can get a copier with integrated OCR scan capability (some Canon copiers have this as an option) so the files produced will be searchable by title and words in the content. Note this may be somewhat slow for perusing a large group of files. You can also get options that allow scan to "Compact PDF" which cuts the file size dramatically with a small reduction in quality.
Most if not all copiers have the scan options as a one time charge and don't charge per scan page.
The equipment must be good quality/durable to prevent double feeds of the originals when scanning.
You may want security options to securely erase HDD temp files and another to SSL the data transmissions between copier and server.
Higher end copier auto feeders do 2 sided scans in one pass.
See Canon iRADVC5051 or iRADV6055.

Like JoelC707 says, there are dedicated scanner devices as well, just have to get a good one.
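
The scan-to-folder layout by month/year mentioned above could also be applied after the fact. A minimal sketch, assuming the copier drops PDFs into a single inbox share and that a file's modification time is a good enough stand-in for the scan date; the function name and folder layout are made up for illustration:

```python
import shutil
from datetime import datetime
from pathlib import Path


def file_by_month(inbox: Path, archive: Path) -> list[Path]:
    """Move each PDF in inbox to archive/YYYY/MM based on its modification time."""
    moved = []
    for pdf in sorted(inbox.glob("*.pdf")):
        stamp = datetime.fromtimestamp(pdf.stat().st_mtime)
        target_dir = archive / f"{stamp:%Y}" / f"{stamp:%m}"
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / pdf.name
        if target.exists():
            # Never overwrite an archived scan; leave the file for manual review.
            continue
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Filing by category instead would need something the file itself does not carry, such as a naming convention agreed with the clerical staff.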



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40
reply to JoelC707

said by JoelC707:

Now, on to file storage. I located one of our recently scanned batches. A 130 image PDF with letter size images is 27.7 MB.

Thanks - this is exactly what I was looking for as far as storage goes. Your example comes out to ~21KB per page, which I'll round up to 35KB just to prepare for the worst case scenario. I'm told that the biggest boxes they have contain "5 or 6" reams of paper - so we'll go with 6 reams for a total of 3000 sheets. If they were to scan 300 boxes with 3000 sheets of paper each, the total will come out to just a hair over 30GB.

Assuming that (again, worst case scenario), each box has 8GB of data on CDs/DVDs/etc. to be archived, we're looking at 2400GB of data on top of the 30GB of scanned documents. I'm beginning to see that my expectation of having to build out a 12TB+ storage system won't be happening now.

I should add that they won't be scanning in any of their existing boxes unless their plans change. The idea here is to start scanning their new archives while slowly destroying boxes that have exceeded the necessary retention period. 5-10 years from now ... no more boxes!

Storage, however, is still a concern. Their current bulk storage system - an older ReadyNAS device - has 4x 250GB drives in a RAID5 array. Speed leaves a bit to be desired as well. I could always upgrade their existing platform with some 2TB or 3TB drives, but I'm also debating just scrapping the old NAS and going with something new.

Thanks again for your recommendations. I'll continue posting updates as I make progress here.
--
irc.removed.us - #dslr


Wily_One
Premium
join:2002-11-24
San Jose, CA
Reviews:
·AT&T U-Verse

If this is a law office, I hope you're charging them out the butt. ;)

If it's any help, a couple of years ago I researched scanners for somewhat similar requirements, namely:
•  TWAIN support (driver)
•  Automatic Document Feeder
•  Automatic Duplexing
•  Support legal size paper
•  Output to PDF

I did a simple low/med/high breakdown for their consideration:

Make/Model     Duplex Spd     Input Cap
Ricoh IS760D      122 ipm     200 sheets
Canon DR-4010C     84 ipm     100 sheets
HP Scanjet N6010   36 ipm      50 sheets

Those models are likely dated by now, but the current iterations of each may be comparable.

(Storage was not a concern since they were going to leverage their CMS.)


JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
reply to removed

Hey, I think you missed a digit in the per-page calculation (or possibly you divided by 1300 not 130). The 130 image PDF at 27.7MB comes out to 28364.8KB, divided back out by 130 comes to 218.19KB. The 10 image PDF at 2.15MB is 2201.6KB, and that divided by 10 is 220.16KB.

Let's round that up to 350KB (to keep with your example) and see what I come up with. Assuming 3000 sheets per box that's 1,050,000KB, 1025MB or right at 1GB per box. At 300 boxes, that's still only 300GB for all the boxes. You still won't exactly need that 12TB storage system for it but still.
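
For anyone who wants to rerun these numbers with their own scan sizes, here is a minimal sketch of the same arithmetic. It assumes 1024-based units, as in the figures above, and the function name is made up for illustration:

```python
KB_PER_MB = 1024
MB_PER_GB = 1024


def archive_size_gb(kb_per_page: float, pages_per_box: int, boxes: int) -> float:
    """Estimated total size in GB for scanning every page in every box."""
    total_kb = kb_per_page * pages_per_box * boxes
    return total_kb / KB_PER_MB / MB_PER_GB


if __name__ == "__main__":
    # Measured figure from the thread: a 27.7 MB PDF holding 130 page images.
    measured_kb_per_page = 27.7 * KB_PER_MB / 130
    print(f"measured: {measured_kb_per_page:.2f} KB per page")

    # Padded worst-case figure used above: 350 KB per page, 3000 pages per box.
    print(f"one box:   {archive_size_gb(350, 3000, 1):.2f} GB")
    print(f"300 boxes: {archive_size_gb(350, 3000, 300):.0f} GB")
```

With the padded 350 KB figure this lands at roughly 1 GB per box and roughly 300 GB for 300 boxes, the same as the hand calculation.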

Honestly this calculation is going to depend on two things: DPI of the scan and whether it is B&W or color. I've got a single page PDF that was scanned in color (it's a greyscale document so I did color so it didn't turn all the shading to pure black), not sure of the DPI (300 I think) but this single letter size PDF is 1.31MB by itself. This was scanned on my computer though, not any of the scanners at the office but I can't imagine that makes much difference.
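
To see why DPI and color mode move the numbers so much, it helps to look at the raw pixel data before any compression. A rough sketch, assuming a letter-size page and ignoring file headers and row padding; real PDF and TIFF files are far smaller because they compress the image, so treat these as relative sizes only (the function name is made up for illustration):

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


if __name__ == "__main__":
    # Letter-size page at 300 DPI in three common scan modes.
    for label, bits in (("B&W (1-bit)", 1), ("greyscale (8-bit)", 8), ("color (24-bit)", 24)):
        size_mb = raw_page_bytes(8.5, 11, 300, bits) / (1024 * 1024)
        print(f"{label:18s} {size_mb:6.2f} MB uncompressed")
```

Before compression, greyscale carries 8 times the data of pure B&W and 24-bit color carries 24 times, and doubling the DPI quadruples the pixel count.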

Being a law type office I would assume most of their printed papers are text only. In that case you should have no trouble leaving it at B&W only, or greyscale if your scanner supports that. Oh and I doubt you will need it, but the Lanier/Ricoh IS760D (and our Ricoh copiers) support 11x17 tabloid feeding/scanning as well, since some of what we do is done on that size paper instead (folded, it looks like a 4-page booklet of 8.5x11). I would suspect most high-end scanners support this size and everything in between, but chances are all you need is letter/legal, which nearly any scanner supports.



workablob

join:2004-06-09
Houston, TX
kudos:3
Reviews:
·Comcast
reply to removed

Give these guys a look see.

»www.westbrooktech.com/software_s···tis.html

SQL back-end.

Very robust.

Dave
--
I may have been born yesterday. But it wasn't at night.



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40
reply to JoelC707

said by JoelC707:

Hey, I think you missed a digit in the per-page calculation (or possibly you divided by 1300 not 130). The 130 image PDF at 27.7MB comes out to 28364.8KB, divided back out by 130 comes to 218.19KB. The 10 image PDF at 2.15MB is 2201.6KB, and that divided by 10 is 220.16KB.

Good catch! Doing math at 11PM (or any time of the day in my case) is never a good idea.

Met with the client today and we've agreed to reuse my customer's existing NAS system and upgrade the disks to 4x 2TB in a RAID5 array. I've also done some research on scanners and have picked out »www.newegg.com/Product/Product.a···Scanners based on the great reviews and the manufacturer's claim that a searchable PDF document can be created with just one button push.

I'm going to order the scanner this week, put a file box through it, and see if I can get something to go horribly wrong. If not, they'll be good to go as soon as the NAS partitions have been resized.

Thanks again for the help! More details to come as I start playing with the scanner...
--
irc.removed.us - #dslr

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5

That's a good looking scanner. Chances are they don't need something REALLY big anyway and the speed should be good on it. It's a stated 20 ppm on color, not sure on B&W. The IS760D I have is 122 as shown in a post above. I can confirm it can do that fast of scanning (speed will depend on DPI as well). The rubberish foam bump stop on the output of the ADF feed path has a bunch of gouges in it from paper edges smacking it at high speed lol.



EUS
Kill cancer
Premium
join:2002-09-10
canada
Reviews:
·voip.ms
reply to removed

We as well have been in the process of "going paperless" for the last two years running.
OCR has stymied us to the point of giving up on it, as our docs have a mixture of handwriting, regular, and irregular text.
I look forward to reading up on some more OCR solutions.
--
~ Project Hope ~



JRW2
R.I.P. Mom, Brian, Ziggy, Max and Zen.
Premium
join:2004-12-20
La La Land
kudos:5
Reviews:
·Optimum Online
reply to removed

I don't do this type of work, so I am only playing devil's advocate here.

I saw a program several years ago when they first started doing paperless offices, they went over several pros and cons to it...
One thing you have to consider is that if you do OCR of the documents, you should still retain an image of the original document, to prevent OCR errors from causing issues.
So you will need an image of each page, preferably in a high quality, and the OCR'd output of that document.

Storage and backups are a BIG consideration if they are going to be destroying the originals after they convert them...

My $0.02....
--
Politics is a disease, we need a cure!
In constant search for intelligent life on Earth!



Serbtastic
You Know How Many People I Have Buried?
Premium
join:2002-02-24
Stoney Creek, ON
reply to removed

Look into EMC ApplicationXtender for document management (»www.emc.com/enterprise-content-m···nder.htm). I supported it for years and it is very good. It also has add-ons to aid in automatic indexing of documents. It will also OCR for you, keeping the image as well.



Rob
In Deo speramus.
Premium
join:2001-08-25
Kendall, FL
kudos:2
Reviews:
·Comcast
reply to removed

Also, you mentioned insurance/legal service. I know that some law firms are required to back up all their data to a WORM device (Write Once, Read Many) for compliance purposes.
--
CheckSite.us | YourIP.us | Reverseip.us



jester121
Premium
join:2003-08-09
Lake Zurich, IL
reply to removed

Our Konica Minolta guy demo'ed a commercial grade document management/OCR system when we got our new Bizhub MFP copiers in. We use PDF scan to SMB shares a fair amount, but no OCR. My point is, if you already have hardware in place, check with that vendor to see who they've already interfaced with.



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40
reply to removed

Roadblock #1: I want to generate a backup of their data before proceeding with the NAS disk upgrade. I have a Genie Backup Manager Pro 8.0 license on one of the workstations and have tried to generate a backup, but the application crashes about 60% into the process. GBM's error log doesn't show any useful information as to what happened. Do you guys know of any free/reasonably priced (preferably free) tools that can be used to copy all of the data from the network share to my external USB disk and (this is important) verify that it's all complete?
--
irc.removed.us - #dslr


tomdlgns
Premium
join:2003-03-21
Chicago, IL
kudos:1

said by removed:

Roadblock #1: I want to generate a backup of their data before proceeding with the NAS disk upgrade. I have a Genie Backup Manager Pro 8.0 license on one of the workstations and have tried to generate a backup, but the application crashes about 60% into the process. GBM's error log doesn't show any useful information as to what happened. Do you guys know of any free/reasonably priced (preferably free) tools that can be used to copy all of the data from the network share to my external USB disk and (this is important) verify that it's all complete?

ms sync toy

»www.microsoft.com/en-us/download···id=15155

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5

+1

Once you use SyncToy, you'll never go back to anything else for massive file copies like that. It's great for updates too (manual backups for example) as it can be told to only copy new/changed items instead of everything (or having to sit through a bunch of "file exists" dialog boxes in Explorer).
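
For the "verify that it's all complete" part of the request, a small script is another option. This is a minimal sketch and not a replacement for a real backup tool: it assumes the network share is reachable as an ordinary path, copies every file, then compares SHA-256 hashes of source and copy. The function names are made up for illustration.

```python
import hashlib
import shutil
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> list[Path]:
    """Copy every file under source to destination; return files that failed verification."""
    failures = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dst_file = destination / src_file.relative_to(source)
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        # Re-read both sides and compare hashes to confirm the copy is intact.
        if sha256_of(src_file) != sha256_of(dst_file):
            failures.append(src_file)
    return failures


if __name__ == "__main__":
    # Usage: python copy_verify.py <source_dir> <destination_dir>
    if len(sys.argv) == 3:
        bad = copy_and_verify(Path(sys.argv[1]), Path(sys.argv[2]))
        print("all files verified" if not bad else f"{len(bad)} file(s) failed verification")
    else:
        print("usage: copy_verify.py <source_dir> <destination_dir>")
```

Note that empty directories are not recreated, and every file is read twice over the network (once to copy, once to hash), so expect it to be slower than a plain copy.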



AVD
Respice, Adspice, Prospice
Premium
join:2003-02-06
Onion, NJ
kudos:1
reply to removed

look at alfresco, open source document management.

www.alfresco.com



JB
Stay Gold
Premium
join:2009-05-14
kudos:1
Reviews:
·Cogeco Cable
reply to removed

Use Kofax/Xerox to scan em in.

EMC Documentum for a solid document management system at a good price.

Can't believe no one has mentioned Sharepoint
--
I know you are the only one
A little taste of heaven
You know I am The only one
Your bitter taste of hell



drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6


said by JB:

Can't believe no one has mentioned Sharepoint

Please never open your mouth again.


Nightfall
My Goal Is To Deny Yours
Premium,MVM
join:2001-08-03
Grand Rapids, MI
Reviews:
·ooma
·Comcast
·Callcentric
·Site5.com
reply to removed

Something else I will mention.

We are currently using a product called Docushare for our document management and it is going well. The biggest delay is making the process workflows and getting everyone on the same page. The workflows are used when we look at a process that people are doing currently, changing it to a digital process, and then documenting and implementing it.

In order to implement any paperless or document management solution, you are going to need to set time aside to get everyone on the same page when it comes to going paperless. Look at every process that uses paper and how you can move it digitally.

These processes really are half of the battle after you implement document management.
--
My domain - Nightfall.net



drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

I would also note that digital signatures are a must...



JB
Stay Gold
Premium
join:2009-05-14
kudos:1
reply to drew

WTFO



drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

SharePoint is never a solution to any need.



nokken

join:2001-02-07
Memphis, TN

Regarding SharePoint: TFS Team Rooms



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40
reply to removed

Any recommendations on solid NAS devices? I'm looking for something with 4 SATA bays, RAID5 support, ability to join a domain - ideally in a 1U enclosure. The NAS I'm working with is not cooperating. I'm working with their support staff but need to have a "Plan B" in place.

Thanks!
--
irc.removed.us - #dslr