CiteSeerX instance #123
Comments
@mekarpeles Is this something @ArchiveLabs could do?
For the public access stuff, I believe so.
Sorry for keeping you waiting so long -- CSX looks pretty heavy. Can you set it up on pollux, or on another separate host? How much storage do you think that'll need?
@lgierth I think the database is ~4TB compressed, plus whatever extra overhead CSX requires. @cleegiles Help? :)
It's a bit larger, more like 5T. But most of this is the compressed PDFs - 6.8M. The database, XML, and extracted text are much smaller - compressed, they are 20G, 30G, and 100G respectively.
It would be best to take this up here:
On 12/6/15 3:02 AM, David A Roberts wrote:
@davidar do you still wanna work on this? We could just get you a host with a big disk where you're root.
Apparently, the best way forward for mirroring CSX's PDF collection is to set up our own CSX instance, and then mirror that to IPFS. They'll give us a copy of their data to get started, but we'll be responsible for handling DMCA takedowns, etc.
@lgierth Thoughts?
Cc: @jbenet