This repository has been archived by the owner on Apr 26, 2021. It is now read-only.

Cuckoo fails to store in standard MongoDB with certain reports #358

Open
SwissKid opened this issue Jul 28, 2014 · 66 comments


@SwissKid

2014-07-28 10:23:16,041 [lib.cuckoo.core.plugins] ERROR: Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/home/cuckoo/cuckoo/lib/cuckoo/core/plugins.py", line 499, in process
    current.run(self.results)
  File "/home/cuckoo/cuckoo/modules/reporting/mongodb.py", line 195, in run
    self.db.analysis.save(report)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 228, in save
    return self.insert(to_save, manipulate, safe, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 306, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 732, in _send_message
    (request_id, data) = self.__check_bson_size(message)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 709, in __check_bson_size
    (max_doc_size, self.__max_bson_size))
InvalidDocument: BSON document too large (17837322 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

Suggested fix:
It might be possible to fix this using GridFS (maybe? I just saw it in a Stack Exchange post) or some other limiter, or by recompiling MongoDB with a patch to increase the limit.

Either way, a check should be put in place to prevent this error from occurring.
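A minimal sketch of what such a check could look like (illustrative only, not Cuckoo's actual code; it assumes the bson package that ships with pymongo 2.x/3.x and the era's Collection.save API):

```python
# Hypothetical pre-save guard: measure the encoded BSON size before
# handing the report to pymongo, so an oversized report is skipped
# gracefully instead of raising InvalidDocument mid-save.
import logging

import bson  # ships with pymongo; bson.BSON.encode is the 2.x/3.x API

log = logging.getLogger(__name__)

MAX_BSON_SIZE = 16 * 1024 * 1024  # MongoDB's hard 16MB document limit

def safe_save(collection, report):
    size = len(bson.BSON.encode(report))
    if size > MAX_BSON_SIZE:
        log.warning("Report is %d bytes, over the %d byte BSON limit; "
                    "not storing it in MongoDB", size, MAX_BSON_SIZE)
        return None
    return collection.save(report)
```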

@SwissKid
Author

Additional note: I'm getting this error with
bba4e9627554fef3476b1bea0d52763c442e75e50c9584bcfb012aabf203f05a
from malwr.com, so it could theoretically be reproduced on any system with that same sample.

@botherder
Member

Yes, that happens if the report is too big to be stored inside Mongo. We tried to minimize that by splitting the behavioral section, but sometimes other sections can be too big as well.

@SwissKid
Author

I believe it's the memory analysis section with this sample. Might want to split that as well?

@jekil
Member

jekil commented Jul 29, 2014

Is it possible for you to identify the section that is "too big", or to share the sample with us? Is it bba4e9627554fef3476b1bea0d52763c442e75e50c9584bcfb012aabf203f05a on malwr?

@SwissKid
Author

It should be that one on malwr, since that's where I got it. It triggers when volatility is turned on with default settings. I also have yara rules in place, but I doubt those are tipping it over.

@SwissKid
Author

I can include any config files you're interested in, or a listing of any directories.

@botherder
Member

Yes, that makes sense. Some sections of the Volatility report can sometimes be massive. I've encountered a similar issue before.

@jekil jekil self-assigned this Aug 15, 2014
@rep
Member

rep commented Feb 23, 2015

I retried analyzing the mentioned sample - everything worked like a charm, including Mongo reporting with Volatility processing (memory analysis) enabled.

I suspect that this was related to even bigger behavior logs for this file back when it had full networking / live endpoints. The report is still quite huge right now, but for me at least it fits in Mongo.

Overall the statement is: while we know reports can be too big, there's not much we can do about it right now without changing the semantics of our database storage scheme.

My long-term suggestion would be to store Volatility results in a separate collection, linking back to the analysis report. That should improve the situation for most samples. However, this breaks backwards compatibility and thus won't make it into this release.
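In code, that suggestion might look roughly like this (a sketch with hypothetical collection and field names, using the modern pymongo API rather than the deprecated save):

```python
# Sketch: pull the volatility results out of the main analysis
# document, store them in their own collection, and keep only an
# ObjectId back-reference, so each document stays under 16MB.
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["cuckoo"]

def save_report(report):
    memory = report.pop("memory", None)  # the volatility section
    if memory is not None:
        memory_id = db.memory.insert_one({"results": memory}).inserted_id
        report["memory_id"] = memory_id  # link back from the analysis
    db.analysis.insert_one(report)
```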

@rep rep added the Task label Feb 23, 2015
@rep rep modified the milestones: 1.3, 1.2 Feb 23, 2015
@begosch

begosch commented Mar 2, 2015

I have encountered this without volatility enabled. Why can't we just store reports in mongodb using GridFS?
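For reference, the GridFS approach would look something like this (an illustrative sketch, not how Cuckoo stores reports): GridFS chunks a blob across many small documents, sidestepping the 16MB per-document limit, at the cost of the report becoming an opaque blob you can no longer query field by field.

```python
import json

import gridfs
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["cuckoo"]
fs = gridfs.GridFS(db)

def store_report_gridfs(report):
    # Serialize the whole report and let GridFS split it into chunks.
    oid = fs.put(json.dumps(report).encode("utf-8"), filename="report.json")
    return oid  # retrieve later with fs.get(oid).read()
```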

@jekil
Member

jekil commented Mar 2, 2015

@begosch Because it would be useless; at that point file-system storage would be better.
Can you please share the sample (even privately)?

@botherder botherder removed the Task label Apr 1, 2015
@kcchu

kcchu commented Jun 7, 2015

FYI, I bumped into the same issue with 74678a11c3d3fe69718289fbb95ec3fe734347e5ec2a8f0c9ecf1b9a179cd89c

I still have the analysis output on my disk, which is 458M in total. Please let me know if it is useful to you.

2015-05-31 00:22:05,855 [lib.cuckoo.core.plugins] ERROR: Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/var/local/cuckoo/cuckoo/lib/cuckoo/core/plugins.py", line 505, in process
    current.run(self.results)
  File "/var/local/cuckoo/cuckoo/modules/reporting/mongodb.py", line 215, in run
    self.db.analysis.save(report)
  File "/var/local/cuckoo/py/local/lib/python2.7/site-packages/pymongo/collection.py", line 285, in save
    return self.insert(to_save, manipulate, safe, check_keys, **kwargs)
  File "/var/local/cuckoo/py/local/lib/python2.7/site-packages/pymongo/collection.py", line 409, in insert
    gen(), check_keys, self.uuid_subtype, client)
DocumentTooLarge: BSON document too large (17321912 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

@KillerInstinct
Contributor

The two errors above are basically unhandleable in Cuckoo's current state; it really requires a rewrite of some of the processing modules. I wrote up some code to help debug these (and optionally delete overly large keys) -- I suggested in IRC that upstream implement it to help identify which processing modules are at fault, especially for custom installs/modules. I specifically observed this with both the behavior and volatility processing modules. I personally use the deletion feature because I prefer some data over no data. :)

Feel free to reference / use / rip:
KillerInstinct/cuckoo-modified@ac8ecf8
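The linked commit aside, the general shape of that debug/delete approach is roughly this (a sketch, not the actual patch; assumes the bson package bundled with pymongo):

```python
# Sketch: report which top-level report keys are oversized, and
# optionally drop them so the rest of the analysis still reaches Mongo.
import bson

BSON_LIMIT = 16 * 1024 * 1024

def trim_oversized_keys(report, delete=False):
    for key in list(report.keys()):
        size = len(bson.BSON.encode({key: report[key]}))
        if size > BSON_LIMIT:
            print("key %r alone encodes to %d bytes" % (key, size))
            if delete:
                del report[key]  # some data is better than no data
    return report
```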

@KillerInstinct
Contributor

Again: losing some data, as opposed to losing all data for an analysis - in Cuckoo's current state it's one or the other. The best solution in my opinion would be to modify the processing modules so they never allocate more than 16MB per key. There's no need to migrate to another data store for a problem that occurs relatively rarely with most modern malware samples.

@jbremer
Member

jbremer commented Aug 24, 2015

This issue has been resolved, right? Or should we also split up other parts of the report a bit more (just like we split behavioral logs into pieces)?

@botherder botherder removed this from the 1.3 milestone Oct 5, 2015
@jbremer
Member

jbremer commented Nov 22, 2015

I haven't run into this issue for quite a while, so I'm going to assume it's fixed for now.

@jbremer jbremer closed this as completed Nov 22, 2015
@GelosSnake

I got this error today: "BSON document too large (18064026 bytes) - the connected server supports BSON document sizes up to 16777216 bytes."
Seems like the same issue.
Related sample sha256 hash: 1ccc286d33d3fec1853e8f4c17eb7faea390725a8cfe03d23944eedc5bf8d58c
https://malwr.com/submission/status/N2I4ZThmOWRlODZlNDAyNmIwNjNhYjkzYWI3NjQ0ZTI/
https://malwr.com/submission/status/ZmFkN2YyMzE2OGZjNDZkNTk5MGIyYjVmMjAxYjZiNTU/

@doomedraven
Contributor

I got it today with 2.0-dev.

Any possible solution? I checked this: http://stackoverflow.com/a/25553887

ERROR:lib.cuckoo.core.plugins:Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/opt/cuckoo/utils/../lib/cuckoo/core/plugins.py", line 506, in process
    current.run(self.results)
  File "/opt/cuckoo/utils/../modules/reporting/mongodb.py", line 227, in run
    self.db.analysis.save(report)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 2182, in save
    check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 530, in _insert
    check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 512, in _insert_one
    check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 218, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 346, in _raise_connection_failure
    raise error
DocumentTooLarge: BSON document too large (18575204 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.
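Note the slightly different ceiling in this traceback (16793598 bytes): pymongo reports the limit the connected server advertises rather than a hard-coded constant, and you can read it yourself (a small sketch; assumes a reachable local mongod):

```python
from pymongo import MongoClient

# max_bson_size is a MongoClient property filled in from the server's
# handshake; accessing it requires the server to be reachable.
client = MongoClient("localhost", 27017)
print("server max BSON document size: %d bytes" % client.max_bson_size)
```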

@jbremer
Member

jbremer commented Apr 21, 2016

@doomedraven Could you share a report.json for this analysis?

@doomedraven
Contributor

@jbremer Sure, here you have it - a 1GB JSON O_o

https://www.dropbox.com/s/q4l5zidzfhlmd50/report.json.zip?dl=1

@kholbrook1303
Contributor

@jbremer I certainly don't want to flood this issue, but I am seeing this more frequently with ransomware (specifically Locky), likely due to the massive amount of file I/O.

Here is the JSON report:
https://www.dropbox.com/s/agv77vutwnon4ro/report.json?dl=0

@doomedraven
Contributor

If you google it you will see that you can't change that limit. I have posted my mongodb.py reporting module with the fix many times; do a search in the issues.

@netpacket

@doomedraven
I did not realize this was a Mongo issue rather than a Cuckoo Sandbox infrastructure thing. Thanks.

@doomedraven
Contributor

No, that isn't related to Cuckoo as such - well, Cuckoo does generate a lot of output, depending on the sample. But it is easy to fix, and I've posted the fix so many times that I don't know why it still hasn't been merged when so many people complain. You can easily fix it yourself.

@netpacket

Gotcha. I have moved over to a PostgreSQL DB for now... so yeah, I can go into the file and make the change.

@doomedraven
Contributor

You can't use PostgreSQL for the web GUI.

@netpacket

Wait, PostgreSQL is necessary if you want to have multiple VMs, correct? Then the web GUI is not properly hooked up to the new DB? That is sort of an infrastructure flaw if this is the case. Correct me if I am misunderstanding the Cuckoo infrastructure.

@doomedraven
Contributor

No. Did you check the manual? The *SQL part is only for managing tasks; Mongo is for the web GUI.

@netpacket

@doomedraven I totally brainfarted. Oops - the tasks live in a different DB, and MongoDB is for the web stuff.

@doomedraven
Contributor

read the manual

@netpacket

I did. Tasks are on SQLite by default and the web GUI is on MongoDB. The issue lies in MongoDB's document size limit. I think I get it. I upgraded the SQLite DB to PostgreSQL. Thanks.

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@netpacket

SparkyNZL, lol, indeed. I migrated off the default SQL DB to a PostgreSQL DB.

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@netpacket

SparkyNZL, I definitely run with the debug component on. What you mentioned about looking for bottlenecks from misconfiguration makes sense.

@jbremer
Member

jbremer commented Feb 4, 2018

@wroersma We have a couple of major Cuckoo Core changes upcoming that will, among many other things, mitigate this issue. It hasn't been properly addressed yet because it takes months of hard work with a non-backwards compatible result. Please remember this is an open source project and we do our best with the resources that we have ;-)
If no other questions arise I'll be closing this issue - in the end we're aware of it.

@wroersma
Contributor

I'd be happy to help write whatever code is needed as long as it's something that will be accepted into the project. It's just hard when we don't know what you guys are working on and even if you will take the PR.

@nahaye

nahaye commented Jun 6, 2018

I am having major troubles with MongoDB issues. When will it be solved?

@doomedraven
Contributor

doomedraven commented Jun 6, 2018

Can you elaborate on your problem?

@redbaron4

Just ran into this with Cuckoo 2.0.6. The processing module crashed with the same error message and I have no idea what caused it.

I like the idea given by @KillerInstinct - get an incomplete report into Mongo rather than no report at all. The only problem is that his fix no longer applies cleanly to the latest version of Cuckoo.

@doomedraven
Contributor

"...error message and I have no idea what caused this" - if you don't post it, how do you expect someone to help you? ;)

@redbaron4

@doomedraven Well, Cuckoo only logs the error that the report is larger than 16MB. As I can't access the report through the web GUI, how can I tell what made the report larger than 16MB? Most of the time reports get written to Mongo fine, so why this particular report is too large is what I meant by "I have no idea what caused this".

@doomedraven
Contributor

Access it on the server. But do search the issues - I just recently posted a fixed mongodb.py file which fixes exactly that.

@redbaron4

It would be great if you could link the fixed file here so that maybe we can test it.

I did search the issues, and that is how I landed on this thread. In another one you say Yara may be to blame, but I am not using any additional Yara rules.

On the server, the report.json is a whopping 370MB. My limited understanding (and mucking around in a text editor with that large file) tells me that either the process or the file count is too large, but I can't confirm this.
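One quick way to confirm which section dominates such a report (a hypothetical helper; note that json.load will hold the whole 370MB report in memory):

```python
import json

with open("report.json") as fh:
    report = json.load(fh)

# Print the serialized size of each top-level section, largest first.
sizes = {key: len(json.dumps(value)) for key, value in report.items()}
for key, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print("%-16s %12d bytes" % (key, size))
```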

@doomedraven
Contributor

I doubt that it will be merged, so you need to replace that file yourself: #2570

@k41zen

k41zen commented Jan 23, 2019

Can I just ask something about this, please? I get the limitation, but I'm testing my setup using files from the Cuckoo CERT site (cuckoo.cert.ee). I download samples from there and upload them to my own instance. How is it that this works on their Cuckoo, yet mine gives this MongoDB error? I clearly have something configured differently.

Can someone help, please? Almost every file I upload errors out, making my instance unusable.
