

inGearX
3.1415 9265

join:2000-06-11
New York

Wikipedia and Wikitravel offline...

Why can't there just be a simple HTML dump of Wikipedia or Wikitravel to the filesystem? The folder with all the HTML files and subfolders could then be placed on a wide range of devices,

much like the way the CIA World Factbook does it: »www.cia.gov/library/publications/download/

instead for Wikitravel you have:

»wikitravel.org/en/Wikitravel:Off···pedition

and for Wikipedia it is even more complex:

»en.wikipedia.org/wiki/Wikipedia:···download

please advise...
thank you...


cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7
Because a Wikipedia entry isn't just a flat HTML file that can be zipped up. It's essentially a dynamically generated CMS that also maintains years of revisions. It's like asking why Amazon or eBay can't just have an HTML dump.


darcilicious
Cyber Librarian
Premium
join:2001-01-02
Forest Grove, OR
kudos:4
Actually the CIA site uses Plone (a CMS) but the external/public side is basically a static HTML copy of the site that is managed in Plone internally.

Just because a site is dynamically generated does not preclude a static version of the same site from existing / being accessible.


usa2k
Blessed
Premium,MVM
join:2003-01-26
Redford, MI
kudos:3
Thanks for that comparison - never heard of Plone
said by »plone.org/ :

Plus, all of this Python & NoSQL goodness is wrapped in a sparkling new theme that is beautiful, accessible and easy to customize.

So Plone is *probably* a flat file system though.
Plone uses ZODB, an object-oriented database.

I bet Wikipedia would be a huge file! And many things would quickly become out of date.

In an ever connected world, why need an offline version?


darcilicious
Cyber Librarian
Premium
join:2001-01-02
Forest Grove, OR
kudos:4
said by usa2k:

In an ever connected world, why need an offline version?

I travel in many places in my state alone that don't have cell service or public Internet service. Where is this ever-connected world you live in precisely?

Hard to believe but there is not 100% Internet coverage in the world, or even in the US.


cdru
Go Colts
Premium,MVM
join:2003-05-14
Fort Wayne, IN
kudos:7
reply to darcilicious
said by darcilicious:

Just because a site is dynamically generated does not preclude a static version of the same site from existing / being accessible.

Yes, I know. The first CMS I ever dealt with "published" the public side of the site as static files based on dynamic data. It wasn't a horrible design, but it got quite inefficient the larger the site grew.
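
To make the "publish static files from dynamic data" idea concrete, here is a minimal Python sketch. It is not how any particular CMS does it: the `PAGES` dictionary, the `render` template and the `publish` function are all hypothetical stand-ins for a real content store and templating layer. It also shows why the approach slows down as a site grows, since every publish run touches every page.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical in-memory "CMS": page slug -> (title, body text).
PAGES = {
    "index": ("Home", "Welcome to the example site."),
    "about": ("About", "This page was generated from dynamic data."),
}


def render(title, body):
    """Render one page from stored data into a complete HTML document."""
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body><h1>{html.escape(title)}</h1>"
        f"<p>{html.escape(body)}</p></body></html>\n"
    )


def publish(pages, out_dir):
    """Write every page as a static .html file.

    The loop visits every page on each run, so the cost of a full
    publish grows with the size of the site.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, (title, body) in sorted(pages.items()):
        path = out_dir / f"{slug}.html"
        path.write_text(render(title, body), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Publish into a throwaway directory and list what was written.
    with tempfile.TemporaryDirectory() as tmp:
        for path in publish(PAGES, tmp):
            print(path.name)
```

The resulting folder can be zipped up and copied anywhere, which is the kind of offline bundle the original question asks about.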

The CIA World Factbook has ~196 countries plus a relatively limited number of supporting pages, and comes in at 272 MB compressed. That's easy to manage and publish as an offline version. Since the scope of the site is limited and the content is likely created entirely in house, copyright issues aren't a concern.

Compare that to Wikipedia. It has nearly 4 million content pages and 800k media files, and grows by 30k articles a month, with an edit every half a second. The compressed dump is over 7 GB; uncompressed, it's over 31 GB. Copyright is also an issue for images, depending on the context in which an image is used: Wikipedia may have a legal fair-use claim, but other uses may not be permitted. And that doesn't even consider the actual HTML generation requirements. Content for a page isn't stored as HTML (e.g. here's BBR's page data). There is a not-insignificant effort required to construct each page from the content's markup to create the navigation, formatting, etc. Plus, for cross-referencing, you'd also have to check whether the other wiki entries actually exist.
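
As a rough illustration of that last step, here is a small Python sketch that turns `[[Target]]` and `[[Target|label]]` wiki links into HTML anchors and flags links whose target does not exist. It is only a sketch under stated assumptions: the `EXISTING` title set, the `render_links` function, the `.html` file naming and the `missing` CSS class are all made up for the example, and a real MediaWiki renderer handles far more (templates, namespaces, title normalisation, and so on).

```python
import html
import re

# Hypothetical set of article titles present in the offline dump.
EXISTING = {"Broadband", "Internet"}

# Matches [[Target]] and [[Target|label]] style wiki links.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_links(wikitext, existing):
    """Convert wiki links to HTML anchors, marking missing targets."""
    out = []
    pos = 0
    for m in LINK.finditer(wikitext):
        # Escape the plain text between the previous link and this one.
        out.append(html.escape(wikitext[pos:m.start()]))
        target = m.group(1).strip()
        label = (m.group(2) or target).strip()
        # Assumed naming scheme: spaces become underscores, plus ".html".
        href = html.escape(target.replace(" ", "_") + ".html")
        # The cross-reference check: does the linked entry exist?
        css = "" if target in existing else ' class="missing"'
        out.append(f'<a href="{href}"{css}>{html.escape(label)}</a>')
        pos = m.end()
    out.append(html.escape(wikitext[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(render_links("See [[Broadband]] and [[Dial-up|dial-up access]].", EXISTING))
```

Every link on every page needs that lookup, so a static export of millions of articles has to consult the full title index while rendering.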

It's all possible; it just isn't done anymore because there are better methods to generate the pages.