Cuckoo fails to store in standard MongoDB with certain reports #358
Comments
Additional Note: Getting this error with
Yes, that happens if the report is too big to be stored inside Mongo. We tried to minimize that by splitting the behavioral section, but sometimes other sections might be too big as well.
I believe it's the Memory Analysis section in this case. Might want to split that as well?
Is it possible for you to identify the section that is "too big", or to share the sample with us? Is it bba4e9627554fef3476b1bea0d52763c442e75e50c9584bcfb012aabf203f05a on malwr?
It should be that one on malwr, since that's where I got it. It triggers when Volatility is turned on with default settings. I also have YARA rules in place, but I doubt those are tipping it over.
I can include any config files you're interested in, or a listing of any directories.
Yes, that makes sense. Some sections of the Volatility report can be massive sometimes. I encountered a similar issue before.
I retried analyzing the mentioned sample - everything worked like a charm, including Mongo reporting with Volatility processing (memory analysis) enabled. I suspect the original failure was related to even bigger behavior logs for this file, with full networking/endpoints at the time. Indeed, the report is quite huge right now, but for me at least it fits in Mongo. Overall the statement is: while we know about the possibility of too-big reports, there's not much we can do about it right now without changing the semantics of our database storage scheme. My long-term suggestion would be to store Volatility results in a separate collection, linking back to the analysis report. That should improve the situation for most samples. However, this breaks backwards compatibility and thus won't make it into this release.
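For what it's worth, a minimal sketch of that "separate collection" idea, assuming pymongo and illustrative collection/field names (this is not the actual Cuckoo schema):

```python
# Sketch only: store the potentially huge memory/Volatility section in its own
# collection and keep just a reference in the main analysis document, so the
# main document stays well under the 16 MB BSON limit.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["cuckoo"]

def save_report(report):
    # "memory" is assumed here to be the Volatility/memory analysis section.
    memory = report.pop("memory", None)
    if memory is not None:
        memory_id = db.memory_analysis.insert_one({"results": memory}).inserted_id
        # Keep only a small back-reference instead of the full data.
        report["memory_id"] = memory_id
    db.analysis.insert_one(report)
```

This only helps when the memory section is the oversized part; any single section that itself exceeds 16 MB would still fail to insert.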
I have encountered this without Volatility enabled. Why can't we just store reports in MongoDB using GridFS?
@begosch Because it would be useless; file system storage would be better at that point.
FYI, I bumped into the same issue with 74678a11c3d3fe69718289fbb95ec3fe734347e5ec2a8f0c9ecf1b9a179cd89c. I still have the analysis output on my disk, which is 458M in total. Please let me know if it is useful to you.
The above two errors are basically unhandleable in Cuckoo's current state; it really requires a rewrite of some of the processing modules. I wrote up some code to help debug these (and optionally delete overly large keys). I suggested to upstream in IRC that it be implemented to help debug which processing modules are at fault, especially for custom installs/modules. I specifically observed this with both the behavior and volatility processing modules. I personally use the deletion feature because I prefer some data over no data. :) Feel free to reference / use / rip:
Again, that means losing some data, as opposed to losing all data for an analysis. In Cuckoo's current state it's one or the other. The best solution in my opinion would be to modify the processing modules to not allocate more than 16 MB per key. No need to migrate to another data store for a problem that happens relatively rarely with most modern malware samples.
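A rough sketch in the spirit of that debug helper (not the original script, just an illustration of the approach under the assumption that the report is a plain dict of sections):

```python
# Walk the report, print any top-level section whose BSON encoding exceeds a
# threshold, and optionally delete it so the rest of the report still fits
# into MongoDB.
import bson  # ships with pymongo 2.x/3.x

MAX_SECTION_SIZE = 8 * 1024 * 1024  # arbitrary per-section threshold (8 MB)

def find_oversized_sections(report, delete=False):
    oversized = []
    for key in list(report.keys()):
        # Encode each section on its own to measure its BSON size.
        size = len(bson.BSON.encode({key: report[key]}))
        if size > MAX_SECTION_SIZE:
            oversized.append((key, size))
            print("Section %r is %d bytes%s" % (key, size, ", deleting" if delete else ""))
            if delete:
                del report[key]
    return oversized
```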
This issue has been resolved, right? Or should we also split up other parts of the report a bit more (just like we split the behavioral logs into pieces)?
Haven't run into this issue for quite a while, so going to assume it's fixed for now.
I've got this error today: "BSON document too large (18064026 bytes) - the connected server supports BSON document sizes up to 16777216 bytes."
I got it today with 2.0-dev. Any possible solution? I checked this: http://stackoverflow.com/a/25553887
@doomedraven Could you share a
@jbremer For sure, here you have it, a 1 GB JSON O_o: https://www.dropbox.com/s/q4l5zidzfhlmd50/report.json.zip?dl=1
@jbremer I certainly don't want to flood this issue, but I am seeing this more frequently with ransomware (specifically Locky), likely due to the massive amount of file I/O. Here is the JSON report:
If you google it you will see that you can't change that limit. I have posted my mongodb.py reporting module with the fix many times; do a search in the issues.
@doomedraven
No, that isn't related to Cuckoo as such. Well, Cuckoo does generate a lot of output sometimes, depending on the sample, but that is easy to fix. I have posted the fix so many times that I don't know why it still hasn't been merged when so many people complain, but you can easily fix it yourself.
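For illustration, a sketch of one common workaround along these lines (not necessarily the exact fix referenced above): catch the oversize error in the MongoDB reporting module and retry after swapping the largest section for a placeholder, so a partial report is stored instead of none at all.

```python
# Assumes the pymongo 2.x/3.x API, matching the traceback in this issue.
import bson
from bson.errors import InvalidDocument  # "BSON document too large" raises this

def save_with_fallback(collection, report, max_attempts=10):
    for _ in range(max_attempts):
        try:
            return collection.save(report)
        except InvalidDocument:
            # Find the largest top-level section and replace it with a note.
            biggest = max(report, key=lambda k: len(bson.BSON.encode({k: report[k]})))
            report[biggest] = "Section removed: exceeded the BSON size limit."
    raise RuntimeError("Report still too large after trimming.")
```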
Gotcha. I have moved over to a PostgreSQL db for now... so yeah, I can go into the file and make the change.
You can't use PostgreSQL for the webgui.
Wait, PostgreSQL is necessary if you want to have multiple VMs, correct? Then the webgui is not properly hooked up to the new db? This is sort of an infrastructure flaw if that's the case. Correct me if I am not understanding the Cuckoo infrastructure.
No, did you check the manual? The *SQL part is only used to manage tasks; Mongo is for the webgui.
@doomedraven I totally brainfarted. Oops, the tasks are on a different db, and MongoDB is for the web stuff.
Read the manual.
I did. Tasks are on SQLite by default and the webgui is on MongoDB. The issue lies with MongoDB and the upload size. I think I get it. I upgraded the SQLite db to PostgreSQL. Thanks.
No, just don't use SQLite. I use MySQL with 15 machines in the pool and it works fine.
Lolz.
SparkyNZL, Lol indeed. I migrated off the default SQLite db to a PostgreSQL db.
Hmm, are you talking about the upload of large files to Cuckoo over the webgui? There are a number of places which might trip you up. In the conf files there are file size limits; make sure these are correct. If the logging and output from the files is too big, it will not be able to be ingested into MongoDB, causing a 404 error in the webgui. Most of the time it is the (process) memory dumps which are blowing these BSON sizes out, and if you are submitting a big file and doing a memory dump of it, it will fail to import this information into MongoDB. You should see an error if you are running the processing in debug mode; read the errors, they are actually really helpful!
Cheers, and I hope this is helpful.
Yeah, it does its job if you only have one VM, which is really good if you have an IR box running it.
SparkyNZL, I definitely run with the debug component on. What you mentioned makes sense: look for bottlenecks caused by misconfiguration.
@wroersma We have a couple of major Cuckoo Core changes upcoming that will, among many other things, mitigate this issue. It hasn't been properly addressed yet because it takes months of hard work with a non-backwards-compatible result. Please remember this is an open source project and we do our best with the resources that we have ;-)
I'd be happy to help write whatever code is needed, as long as it's something that will be accepted into the project. It's just hard when we don't know what you are working on, or even whether you will take the PR.
I am having major trouble with these MongoDB issues. When will it be solved?
Can you elaborate on your problem?
Just ran into this with Cuckoo 2.0.6. The processing module crashed with the same error message and I have no idea what caused it. I like the idea given by @KillerInstinct: get an incomplete report into Mongo rather than no report at all. The only problem is that his fix is not patchable onto the latest version of Cuckoo.
@doomedraven Well, Cuckoo only logs the error that the report is larger than 16 MB. As I can't access the report through the webgui, how can I tell what caused the report size to be larger than 16 MB? Most of the time the reports get written to Mongo fine, so why this particular report is too large is what I meant by
Access the server. But do a search in the issues; I just recently posted a fixed mongodb.py file which fixes that.
It would be great if you could link the fixed file here so that maybe we can test it. I did search the issues, and that is how I landed on this thread. In another one you say YARA may be to blame, but I am not using any additional YARA rules. On the server, the report.json is a whopping 370 MB. My limited understanding (and mucking around in a text editor with that large file) tells me either the process count or the file count is too large, but I can't confirm this.
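As an aside, a quick illustrative way to see which top-level section dominates a report.json that is already on disk (not an official Cuckoo tool):

```python
# Print the JSON-serialized size of each top-level section of a report.json,
# largest first, to spot which one blows past the 16 MB BSON limit.
import json
import sys

with open(sys.argv[1], "rb") as handle:
    report = json.load(handle)

sizes = {key: len(json.dumps(value)) for key, value in report.items()}
for key, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
    print("%-20s %15d bytes (JSON-serialized)" % (key, size))
```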
I doubt that it will be merged, so you need to replace that file yourself: #2570
Can I just ask something about this, please? I get the limitation, but I'm testing my setup using files from the Cuckoo CERT site (cuckoo.cert.ee). I download samples from there and upload them to my instance. How is it that this works on that Cuckoo, yet mine gives this MongoDB error? I clearly have something configured differently. Can someone help, please? Almost every file I upload errors out, making my instance unusable.
2014-07-28 10:23:16,041 [lib.cuckoo.core.plugins] ERROR: Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/home/cuckoo/cuckoo/lib/cuckoo/core/plugins.py", line 499, in process
    current.run(self.results)
  File "/home/cuckoo/cuckoo/modules/reporting/mongodb.py", line 195, in run
    self.db.analysis.save(report)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 228, in save
    return self.insert(to_save, manipulate, safe, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 306, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 732, in _send_message
    (request_id, data) = self.__check_bson_size(message)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 709, in __check_bson_size
    (max_doc_size, self.__max_bson_size))
InvalidDocument: BSON document too large (17837322 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.
Suggested Fix:
It might be possible to fix this using GridFS (maybe? I just saw it in a Stack Exchange post) or some other limiter, or by recompiling MongoDB with a patch to increase the limit.
Either way, a check should be put in place to prevent this error from occurring.
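For reference, a minimal sketch of the GridFS idea, with illustrative collection and field names. Note the objection later in the thread: once the report is a GridFS blob you lose the ability to query individual report fields directly, so plain file system storage may serve just as well.

```python
# Store the whole JSON report as a GridFS file, which is chunked and therefore
# not subject to the 16 MB single-document limit, and keep only a small
# pointer document in a regular collection.
import json

import gridfs
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
db = client["cuckoo"]
fs = gridfs.GridFS(db)

def store_report(report):
    task_id = report.get("info", {}).get("id")
    blob = json.dumps(report).encode("utf-8")
    file_id = fs.put(blob, filename="report-%s.json" % task_id)
    db.analysis.insert_one({"task_id": task_id, "report_file": file_id})
    return file_id
```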