
JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
reply to removed

Re: "Going Paperless"

The company I work for does, among other things, paper surveys. You know, those school #2 pencil type forms? In fact most of our clients are school systems or other government departments such as recreation or after school care. We sometimes do online HTML and PDF surveys as well.

We use a piece of software called Teleform for this that handles the document creation, reading in, and any manual verification needed (as good as OCR is, people's handwriting can be atrocious, I know mine is lol). For reading in, the forms are scanned to multi-page TIFF or PDF in batches and then imported into the software. How those batches are defined really varies by project, and you/they will likely settle on a batch definition that works best for them. Orientation marks and a unique code in one corner help the software align each page and attach it to the right project.

I realize Teleform is probably not what you need, but the parent company (Cardiff, Verity, Autonomy, whatever they are calling themselves now) also makes a product called LiquidOffice that, from what I remember of their sales sheet, is designed for "going paperless". The main reason I came across it is that they moved the HTML/PDF eForm module from Teleform to LiquidOffice. If all you want is to make OCR'd PDFs then yeah, you could use something like OmniPage or Acrobat I'm sure, but there are software packages out there specifically for "going paperless", including from Microsoft (Office InfoPath is designed for this).

For scanning, if you have that many bankers boxes you will want a REAL scanner, not a copier with an ADF. Those will work as a supplement or a last resort, but even our scanning needs, small by comparison, pretty much require a real scanner. Are these forms double sided (some of ours are)? Most copiers have to make two passes over the same document to handle double sided forms, while a real scanner can have two scan heads and scan both sides in one pass.

We have two copiers and one scanner. Our original scanner from before I joined was a Fujitsu, I forget the model but all of our new stuff is Lanier/Ricoh. Scanner is a Lanier IS760D and the copiers are Ricoh MP C3000 and MP 2000 (the smaller one was at a client location for remote scanning and it came back to us when the project was done). The scanner is SCSI based (it has USB but I couldn't even get the computer to acknowledge it was hooked up), but it also has an optional network module that provides ethernet or optional PCMCIA based wifi (no idea what card it uses, mine didn't come with one as I used ethernet) so you don't need a computer to control it.

Now, on to file storage. I located one of our recently scanned batches. A 130 image PDF with letter size images is 27.7 MB. The same batch scanned to TIFF is 9.74 MB. Both documents are 300 DPI. The PDFs are fairly linear in terms of size of batch. That 130 image batch was originally 13 10-image files at about 2.15 MB each and they all totaled 27.9 MB. So whether you do an entire box in one large PDF or do individual PDFs for each file, the size will be roughly the same.
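
To make the per-image numbers explicit, here is a rough Python sketch that works them out from the batch sizes quoted above. It assumes 1 MB = 1024 KB; the variable and function names are just for illustration.

```python
# Batch figures quoted in the post above (sizes in MB, counts in images).
BIG_PDF_MB, BIG_PDF_IMAGES = 27.7, 130
SMALL_PDF_MB, SMALL_PDF_IMAGES = 2.15, 10
TIFF_MB, TIFF_IMAGES = 9.74, 130


def kb_per_image(size_mb: float, images: int) -> float:
    """Average size per image in KB, assuming 1 MB = 1024 KB."""
    return size_mb * 1024 / images


big_pdf = kb_per_image(BIG_PDF_MB, BIG_PDF_IMAGES)
small_pdf = kb_per_image(SMALL_PDF_MB, SMALL_PDF_IMAGES)
tiff = kb_per_image(TIFF_MB, TIFF_IMAGES)

print(f"130-image PDF: {big_pdf:.2f} KB per image")
print(f"10-image PDF:  {small_pdf:.2f} KB per image")
print(f"130-image TIFF: {tiff:.2f} KB per image")
```

The two PDF figures land within a couple of KB of each other, which is what "fairly linear" means here: splitting a batch into smaller files barely changes the total.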

Our file storage is handled by a virtualized Server 2008 machine on Hyper-V but it doesn't have to be anything special. The scanner's network module is SMB aware (it does NOT like spaces in the share name though), and IIRC also supports NFS and possibly even FTP. The VHD I gave the server for the central file share and "scan to" folders is 500 GB with 284 GB currently in use. Some completed projects have been moved to a hidden share/drive for only certain people to access. That VHD is a mere 80 GB and only 16 GB have been used. Let me put it this way: the VHDs for my file server, backup server and Teleform server are all on a 2250 GB array consisting of four 750 GB drives in RAID 5, and IIRC I still have 900 GB or so free on the array (only about 200 GB if all the VHDs fully expand).
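
For anyone sizing their own array, the 2250 GB figure follows from the usual RAID 5 rule of thumb that one drive's worth of space goes to parity. A minimal sketch (the function name is made up for illustration, and it ignores filesystem overhead and the decimal vs. binary GB difference):

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable capacity of a RAID 5 array: one drive's worth is lost to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


# Four 750 GB drives, as described in the post above.
print(raid5_usable_gb(4, 750))
```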



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40

said by JoelC707:

Now, on to file storage. I located one of our recently scanned batches. A 130 image PDF with letter size images is 27.7 MB.

Thanks - this is exactly what I was looking for as far as storage goes. Your example comes out to ~21KB per page, which I'll round up to 35KB just to prepare for the worst case scenario. I'm told that the biggest boxes they have contain "5 or 6" reams of paper - so we'll go with 6 reams for a total of 3000 sheets. If they were to scan 300 boxes with 3000 sheets of paper each, the total will come out to just a hair over 30GB.

Assuming that (again, worst case scenario), each box has 8GB of data on CDs/DVDs/etc. to be archived, we're looking at 2400GB of data on top of the 30GB of scanned documents. I'm beginning to see that my expectation of having to build out a 12TB+ storage system won't be happening now.

I should add that they won't be scanning in any of their existing boxes unless their plans change. The idea here is to start scanning their new archives while slowly destroying boxes that have exceeded the necessary retention period. 5-10 years from now ... no more boxes!

Storage, however, is still a concern. Their current bulk storage system - an older ReadyNAS device - has 4x 250GB drives in a RAID5 array. Speed leaves a bit to be desired as well. I could always upgrade their existing platform with some 2TB or 3TB drives, but I'm also debating just scrapping the old NAS and going with something new.

Thanks again for your recommendations, guys. I'll continue posting updates as I make progress here.
--
irc.removed.us - #dslr


Wily_One
Premium
join:2002-11-24
San Jose, CA
Reviews:
·AT&T U-Verse

If this is a law office, I hope you're charging them out the butt. ;)

If it's any help, a couple of years ago I researched scanners for somewhat similar requirements, namely:
•  TWAIN support (driver)
•  Automatic Document Feeder
•  Automatic Duplexing
•  Support legal size paper
•  Output to PDF

I did a simple low/med/high breakdown for their consideration:

Make/Model         Duplex speed   Input capacity
Ricoh IS760D       122 ipm        200 sheets
Canon DR-4010C      84 ipm        100 sheets
HP Scanjet N6010    36 ipm         50 sheets

Those models are likely dated by now, but the current iterations of each may be comparable.

(Storage was not a concern since they were going to leverage their CMS.)
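
To put those speeds in perspective, here is a rough sketch of how long one box might take on each model. It assumes the 3000-sheet worst-case box mentioned earlier in the thread, that every sheet is scanned duplex, and that "ipm" means images per minute at the rated speed. It ignores reload time, jams and OCR, so treat the numbers as a lower bound.

```python
import math

# Rated duplex speed (images per minute) and input capacity from the table above.
SCANNERS = {
    "Ricoh IS760D": (122, 200),
    "Canon DR-4010C": (84, 100),
    "HP Scanjet N6010": (36, 50),
}

SHEETS_PER_BOX = 3000  # assumption: six-ream worst case from earlier in the thread


def box_scan_estimate(ipm: int, capacity: int, sheets: int = SHEETS_PER_BOX):
    """Return (minutes of pure scanning, number of feeder loads) for one box."""
    images = sheets * 2  # duplex: two images per sheet
    minutes = images / ipm
    loads = math.ceil(sheets / capacity)
    return minutes, loads


for name, (ipm, capacity) in SCANNERS.items():
    minutes, loads = box_scan_estimate(ipm, capacity)
    print(f"{name}: about {minutes:.0f} min of scanning, {loads} feeder loads")
```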


JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5
reply to removed

Hey, I think you missed a digit in the per-page calculation (or possibly you divided by 1300 not 130). The 130 image PDF at 27.7MB comes out to 28364.8KB, divided back out by 130 comes to 218.19KB. The 10 image PDF at 2.15MB is 2201.6KB, and that divided by 10 is 220.16KB.

Let's round that up to 350KB (to keep with your example) and see what I come up with. Assuming 3000 sheets per box that's 1,050,000KB, 1025MB or right at 1GB per box. At 300 boxes, that's still only 300GB for all the boxes. That's ten times your estimate, but you still won't need that 12TB storage system for it.
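
If you want to play with the inputs, the same arithmetic can be sketched in a few lines of Python. The 350KB, 3000 sheets and 300 boxes come from this thread, as does the 8GB-per-box worst case for CDs/DVDs; the names are only illustrative, and it assumes one image per sheet (double the scan figure if everything is double sided).

```python
KB_PER_PAGE = 350       # padded per-page figure used above
SHEETS_PER_BOX = 3000   # six-ream worst case
BOXES = 300
MEDIA_GB_PER_BOX = 8    # worst-case CD/DVD data per box, from earlier in the thread


def scanned_gb(kb_per_page: float, sheets_per_box: int, boxes: int) -> float:
    """Total scanned-document size in GB, using 1024 KB per MB and 1024 MB per GB."""
    return kb_per_page * sheets_per_box * boxes / 1024 / 1024


scans = scanned_gb(KB_PER_PAGE, SHEETS_PER_BOX, BOXES)
media = MEDIA_GB_PER_BOX * BOXES

print(f"Scanned documents: {scans:.1f} GB")
print(f"Archived media:    {media} GB")
print(f"Combined:          {scans + media:.1f} GB")
```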

Honestly this calculation is going to depend on two things: DPI of the scan and whether it is B&W or color. I've got a single page PDF that was scanned in color (it's a greyscale document so I did color so it didn't turn all the shading to pure black), not sure of the DPI (300 I think) but this single letter size PDF is 1.31MB by itself. This was scanned on my computer though, not any of the scanners at the office but I can't imagine that makes much difference.
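
To show why DPI and color mode matter so much, here is a small sketch of the raw, uncompressed size of a letter page at different bit depths. These are upper bounds only: real PDFs and TIFFs are compressed, which is why the files above are far smaller. It ignores any row padding the file format may add, and the function name is made up for illustration.

```python
def raw_page_bytes(dpi: int, bits_per_pixel: int,
                   width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed bitmap size of one page, ignoring row padding and headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# 1-bit B&W, 8-bit greyscale, 24-bit color at 300 DPI.
for label, bits in (("B&W", 1), ("greyscale", 8), ("color", 24)):
    size_mb = raw_page_bytes(300, bits) / 1024 / 1024
    print(f"300 DPI {label}: {size_mb:.1f} MB uncompressed")
```

Going from B&W to color multiplies the raw data by 24, and doubling the DPI multiplies it by four, so both settings are worth pinning down before estimating storage.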

Being a law type office, I would assume most of their printed papers are text only. In that case you should have no trouble leaving it at B&W only, or greyscale if your scanner supports that. Oh, and I doubt you will need it, but the Lanier/Ricoh IS760D (and our Ricoh copiers) also support 11x17 tabloid feeding/scanning, since some of what we do is printed on that size paper (folded, it looks like a 4-page booklet of 8.5x11). I would suspect most high-end scanners support this size and everything in between, but chances are all you need is letter/legal, which nearly any scanner supports.



removed
Premium,VIP
join:2002-02-08
Houston, TX
kudos:40

said by JoelC707:

Hey, I think you missed a digit in the per-page calculation (or possibly you divided by 1300 not 130). The 130 image PDF at 27.7MB comes out to 28364.8KB, divided back out by 130 comes to 218.19KB. The 10 image PDF at 2.15MB is 2201.6KB, and that divided by 10 is 220.16KB.

Good catch! Doing math at 11PM (or any time of the day in my case) is never a good idea.

Met with the client today and we've agreed to reuse my customer's existing NAS system and upgrade the disks to 4x 2TB in a RAID5 array. I've also done some research on scanners and have picked out »www.newegg.com/Product/Product.a···Scanners based on the great reviews and the manufacturer's claim that a searchable PDF document can be created with just one button push.

I'm going to order the scanner this week, put a file box through it, and see if I can get something to go horribly wrong. If not, they'll be good to go as soon as the NAS partitions have been resized.

Thanks again for the help! More details to come as I start playing with the scanner...
--
irc.removed.us - #dslr

JoelC707
Premium
join:2002-07-09
Lanett, AL
kudos:5

That's a good looking scanner. Chances are they don't need something REALLY big anyway, and the speed should be good on it. It's a stated 20 ppm in color, not sure about B&W. The IS760D I have is rated at 122 ipm, as shown in a post above, and I can confirm it really does scan that fast (speed will depend on DPI as well). The rubberish foam bump stop on the output of the ADF feed path has a bunch of gouges in it from paper edges smacking it at high speed lol.



Rob
In Deo speramus.
Premium
join:2001-08-25
Kendall, FL
kudos:3
reply to removed

Also, you mentioned insurance/legal service. I know that some law firms are required to back up all their data to a WORM (Write Once, Read Many) device for compliance purposes.
--
CheckSite.us | YourIP.us | Reverseip.us