The company I work for does, among other things, paper surveys. You know, those school #2-pencil-type forms? In fact, most of our clients are school systems or other government departments such as recreation or after-school care. We sometimes do online HTML and PDF surveys as well.
We use a piece of software called Teleform for this, which handles document creation, reading in, and any manual verification needed (as good as OCR is, people's handwriting can be atrocious; I know mine is lol). For reading in, the forms are scanned to multi-page TIFF or PDF in batches and then imported into the software. How those batches are defined really varies by project, and you (or whoever sets it up) will likely settle on a batch definition that works best for your material. Orientation marks and a unique code in one corner help the software align each page and attach it to the right project.
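Teleform handles batching itself, so this is only a generic illustration of one way to define batches, not how Teleform does it. It's a minimal Python sketch, assuming the scanned pages arrive as an ordered list of image files (the `page_0001.tif` names and the `batch_pages` helper are hypothetical). It batches by whole forms rather than raw pages, so a double-sided form never gets split across two batches.

```python
def batch_pages(pages: list[str], pages_per_form: int, forms_per_batch: int) -> list[list[str]]:
    """Group an ordered list of scanned page files into batches of whole forms.

    Hypothetical helper: pages_per_form is 2 for a double-sided single-sheet form,
    forms_per_batch is how many forms you want in each imported file.
    """
    if pages_per_form < 1 or forms_per_batch < 1:
        raise ValueError("pages_per_form and forms_per_batch must be at least 1")
    if len(pages) % pages_per_form:
        # A leftover page can point to a misfeed or two sheets pulled through together.
        raise ValueError("page count is not a whole number of forms")
    pages_per_batch = pages_per_form * forms_per_batch
    return [pages[i:i + pages_per_batch] for i in range(0, len(pages), pages_per_batch)]


# Hypothetical example: 130 single-sided pages imported 10 per batch.
pages = [f"page_{n:04d}.tif" for n in range(1, 131)]
batches = batch_pages(pages, pages_per_form=1, forms_per_batch=10)
print(len(batches), len(batches[0]), batches[-1][-1])  # prints: 13 10 page_0130.tif
```

For double-sided forms you'd call it with `pages_per_form=2`, and the page-count check catches a stack that doesn't divide evenly before anything gets imported.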
I realize Teleform is probably not what you need, but the parent company (Cardiff, Verity, Autonomy, whatever they are calling themselves now) also makes a product called LiquidOffice that, from what I remember of their sales sheet, is designed for "going paperless". The main reason I discovered it is that they moved the HTML/PDF eForm module from Teleform to LiquidOffice. If all you want is to make OCR'd PDFs, then yeah, you could use something like OmniPage or Acrobat I'm sure, but there are software packages out there specifically for "going paperless", including from Microsoft (Office InfoPath is designed for this).
For scanning, if you have that many bankers' boxes you will want a REAL scanner, not a copier with an ADF. Copiers will work for supplementing or as a last resort, but even our scanning needs, small by comparison, pretty much require a real scanner. Are these forms double-sided (some of ours are)? Most copiers have to make two passes over the same document to handle double-sided forms, whereas a real scanner can have two scan heads and capture both sides in one pass.
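To put the two-pass point in numbers, here's a rough Python sketch. It assumes both devices feed at the same hypothetical 50 sheets per minute (not a spec for any machine mentioned here) and that a device without duplex heads must feed each double-sided sheet once per side. It ignores reloading, jams and rescans, so treat it as a lower bound.

```python
def scan_minutes(sheets: int, sheets_per_minute: float, double_sided: bool, duplex_heads: bool) -> float:
    """Rough feed time for a stack of sheets, ignoring reloads and jams.

    Assumes a device without duplex heads must feed each double-sided sheet
    through once per side, while a duplex scanner reads both sides in one pass.
    """
    passes_per_sheet = 2 if (double_sided and not duplex_heads) else 1
    return sheets * passes_per_sheet / sheets_per_minute


# Hypothetical 50 sheets/minute rating for both devices, 1,000 double-sided forms.
print(scan_minutes(1000, 50, double_sided=True, duplex_heads=False))  # 40.0
print(scan_minutes(1000, 50, double_sided=True, duplex_heads=True))   # 20.0
```

Over a whole stack of bankers' boxes that doubling adds up fast, which is the real argument for a dedicated duplex scanner.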
We have two copiers and one scanner. Our original scanner from before I joined was a Fujitsu (I forget the model), but all of our new equipment is Lanier/Ricoh. The scanner is a Lanier IS760D and the copiers are a Ricoh MP C3000 and an MP 2000 (the smaller one was at a client location for remote scanning and came back to us when the project was done). The scanner is SCSI-based (it has USB, but I couldn't even get the computer to acknowledge it was hooked up). However, it also has an optional network module that provides Ethernet or, optionally, PCMCIA-based Wi-Fi (no idea what card it uses; mine didn't come with one since I used Ethernet), so you don't need a computer to control it.
Now, on to file storage. I located one of our recently scanned batches. A 130-image PDF of letter-size pages is 27.7 MB; the same batch scanned to TIFF is 9.74 MB. Both were scanned at 300 DPI. PDF size scales fairly linearly with the number of pages in the batch: that 130-image batch was originally thirteen 10-image files at about 2.15 MB each, totaling 27.9 MB. So whether you put an entire box in one large PDF or make individual PDFs for each file, the total size will be roughly the same.
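Since size grows roughly linearly with page count, you can project storage from the per-page figures above. This is a minimal Python sketch under that linearity assumption. It uses binary units (1 GB = 1024 MB, the way Windows Explorer displays sizes), and the 100,000-page archive is a hypothetical figure; plug in your own page count per box.

```python
# Per-page sizes derived from the 130-page, 300 DPI, letter-size batch above.
PDF_MB_PER_PAGE = 27.7 / 130   # about 0.21 MB per page
TIFF_MB_PER_PAGE = 9.74 / 130  # about 0.075 MB per page


def estimate_gb(pages: int, mb_per_page: float) -> float:
    """Project storage for a page count, assuming size grows linearly with pages.

    Uses binary units: 1 GB = 1024 MB.
    """
    return pages * mb_per_page / 1024


# Linearity check: thirteen 10-page files at about 2.15 MB each.
print(round(13 * 2.15, 2))  # 27.95, in line with the 27.7 and 27.9 MB totals

# Hypothetical archive of 100,000 pages.
print(round(estimate_gb(100_000, PDF_MB_PER_PAGE), 1))   # 20.8
print(round(estimate_gb(100_000, TIFF_MB_PER_PAGE), 1))  # 7.3
```

Your own per-page numbers will shift with DPI, color vs. black-and-white, and compression settings, so it's worth measuring a sample batch before sizing storage.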
Our file storage is handled by a virtualized Server 2008 machine on Hyper-V, but it doesn't have to be anything special. The scanner's network module is SMB-aware (it does NOT like spaces in the share name, though) and, IIRC, also supports NFS and possibly even FTP. The VHD I gave the server for the central file share and "scan to" folders is 500 GB, with 284 GB currently in use. Some completed projects have been moved to a hidden share/drive that only certain people can access; that VHD is a mere 80 GB, and only 16 GB of it has been used. Let me put it this way: the VHDs for my file server, backup server and Teleform server all sit on a 2250 GB array made of four 750 GB drives in RAID 5, and IIRC I still have 900 GB or so free on the array (only about 200 GB if all the VHDs fully expand).
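The 2250 GB figure comes from the usual RAID 5 rule: one drive's worth of capacity goes to parity, so usable space is (drives - 1) × drive size. Here's a quick Python sketch of that arithmetic. The second helper is an added aside: it converts the drive-label size (decimal GB) to the binary GB Windows reports, which is why the array may show up smaller in Disk Management than the nominal number.

```python
def raid5_usable_gb(drive_count: int, drive_gb: float) -> float:
    """Usable RAID 5 capacity: one drive's worth of space is consumed by parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_gb


def nominal_gb_as_windows_gb(nominal_gb: float) -> float:
    """Convert drive-label GB (10**9 bytes) to what Windows shows (2**30 bytes)."""
    return nominal_gb * 10**9 / 2**30


usable = raid5_usable_gb(4, 750)
print(usable)                                   # 2250
print(round(nominal_gb_as_windows_gb(usable)))  # 2095
```

The same formula shows why adding a fifth 750 GB drive would buy a full 750 GB of usable space: parity overhead stays at one drive regardless of array size.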