forked from liheyuan/OZoneStore
-
Notifications
You must be signed in to change notification settings - Fork 0
Not a Key-Value Database, but a simple store solution for KeSeek.com
lkunxyz/OZoneStore
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Why OZoneStore? =============== Yes, there are already key-value databases. But those hardly meet demands at KeSeek.com. KeSeek.com is an distributed search engine that built upon many low-end-virtual-machines. We DO need a store solution to fulfill the following demands: (1) Use RAM as less as it can be Because the cost, we need to support put/get 100,000 documents on a machine that RAM is under 32MB. It sounds funny, but very important for KeSeek.com. Nearly no no-sql may achieve this. Instead, many nosql dbs use lots of Cache to speed up lookup, which is not very important for an search engine, see DONT'S (1). (2) Less Disk I/O Operation Our virtual machines are usually shared with lots of users. We try to build our system over different Host-Machins. But the Disk I/O is still expensive for we don't have any RAM for cache like other nosql databses. Also, thanks to the architecture of KeSeek.com, we seperate crawler(put doc) and serach(get doc) in totally different stage, which means we don't need to update key/value while read docs. So we use an static sorted array to keep all keys. (3) It can suppourt at least Python and C & C++ We decide to develop OZoneStore in C & C++. But KeSeek.com is major used Python. So, thrift is used to solve cross-language problem. (4) An unique key An unique DocID for every document perferd to be pre-generated before index. Only a small frantion of nosql suppout incremental ID. We implement this in OZoneStore. (5) Fast && Memory efficient Traverse During the indexing process, we need to get documents one by one. As the original key file is appending according to value file's offset, we need not to sort it, but load it into memory. The traverse of value file from begining to the end is the most efficient way. Also, we DON'T need some feature as other nosql provided. (1) Cache to speed up key->value lookup Search engine need only to get value from key at the stage of display result & snippet to users. The docs to get is rather random and usually 10 at once, which is not so performance demand. Pure one Disk I/O per key & value is enough to fulfill this demand. We'd like to save some RAM and bring our Search Engine cost down. (2) Realtime 7*24 online service For our search engine, the index && crawler is an offline process, which means we crawl web pages, index it and push it online. All data won't change forever. This is true for most of the small searh engine that don't update index in realtime. Why it is called OZoneStore? =========================== I'd like to name each my open source project an molecule in air for all KeSeek.com sub-project. OZone is O3 that bring fresh. I'd like use ozone to name it for indicated swift, fresh and samll.
About
Not a Key-Value Database, but a simple store solution for KeSeek.com
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published