This repository has been archived by the owner on Apr 26, 2021. It is now read-only.

Cuckoo fails to store in standard MongoDB with certain reports #358

Open
SwissKid opened this issue Jul 28, 2014 · 66 comments


@SwissKid

2014-07-28 10:23:16,041 [lib.cuckoo.core.plugins] ERROR: Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/home/cuckoo/cuckoo/lib/cuckoo/core/plugins.py", line 499, in process
    current.run(self.results)
  File "/home/cuckoo/cuckoo/modules/reporting/mongodb.py", line 195, in run
    self.db.analysis.save(report)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 228, in save
    return self.insert(to_save, manipulate, safe, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 306, in insert
    continue_on_error, self.__uuid_subtype), safe)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 732, in _send_message
    (request_id, data) = self.__check_bson_size(message)
  File "/usr/lib/python2.7/dist-packages/pymongo/connection.py", line 709, in __check_bson_size
    (max_doc_size, self.__max_bson_size))
InvalidDocument: BSON document too large (17837322 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

Suggested fix:
It might be possible to fix this using GridFS (maybe? I just saw it in a Stack Exchange post) or some other limiter, or by recompiling MongoDB with a patch to increase the limit.

Either way, a check should be put in place to prevent this error from occurring.
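A minimal sketch of what such a check could look like (illustrative only, not Cuckoo's actual code; it assumes the bson package that ships with pymongo 2.x/3.x and the era's Collection.save API):

```python
# Hypothetical pre-save guard: measure the encoded BSON size before
# handing the report to pymongo, so an oversized report is skipped
# gracefully instead of raising InvalidDocument mid-save.
import logging

import bson  # ships with pymongo; bson.BSON.encode is the 2.x/3.x API

log = logging.getLogger(__name__)

MAX_BSON_SIZE = 16 * 1024 * 1024  # MongoDB's hard 16MB document limit

def safe_save(collection, report):
    size = len(bson.BSON.encode(report))
    if size > MAX_BSON_SIZE:
        log.warning("Report is %d bytes, over the %d byte BSON limit; "
                    "not storing it in MongoDB", size, MAX_BSON_SIZE)
        return None
    return collection.save(report)
```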

@SwissKid
Author

Additional note: I'm getting this error with
bba4e9627554fef3476b1bea0d52763c442e75e50c9584bcfb012aabf203f05a
from malwr.com, so it could theoretically be reproduced on any system with that same sample.

@botherder
Member

Yes, that happens if the report is too big to be stored inside Mongo. We tried to minimize that by splitting the behavioral section, but sometimes other sections can be too big as well.

@SwissKid
Author

I believe it's the memory analysis section with this sample. Might want to split that as well?

@jekil
Member

jekil commented Jul 29, 2014

Is it possible for you to identify the section that is "too big", or to share the sample with us? Is it bba4e9627554fef3476b1bea0d52763c442e75e50c9584bcfb012aabf203f05a on malwr?

@SwissKid
Author

It should be that one on malwr, since that's where I got it. It triggers when volatility is turned on with default settings. I also have yara rules in place, but I doubt those are tipping it over.

@SwissKid
Author

I can include any config files you're interested in, or a listing of any directories.

@botherder
Member

Yes, that makes sense. Some sections of the Volatility report can sometimes be massive. I've encountered a similar issue before.

@jekil jekil self-assigned this Aug 15, 2014
@rep
Member

rep commented Feb 23, 2015

I retried analyzing the mentioned sample - everything worked like a charm, including Mongo reporting with Volatility processing (memory analysis) enabled.

I suspect that this was related to even bigger behavior logs for this file back when it had full networking / live endpoints. The report is still quite huge right now, but for me at least it fits in Mongo.

Overall the statement is: while we know reports can be too big, there's not much we can do about it right now without changing the semantics of our database storage scheme.

My long-term suggestion would be to store Volatility results in a separate collection, linking back to the analysis report. That should improve the situation for most samples. However, this breaks backwards compatibility and thus won't make it into this release.
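In code, that suggestion might look roughly like this (a sketch with hypothetical collection and field names, using the modern pymongo API rather than the deprecated save):

```python
# Sketch: pull the volatility results out of the main analysis
# document, store them in their own collection, and keep only an
# ObjectId back-reference, so each document stays under 16MB.
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["cuckoo"]

def save_report(report):
    memory = report.pop("memory", None)  # the volatility section
    if memory is not None:
        memory_id = db.memory.insert_one({"results": memory}).inserted_id
        report["memory_id"] = memory_id  # link back from the analysis
    db.analysis.insert_one(report)
```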

@rep rep added the Task label Feb 23, 2015
@rep rep modified the milestones: 1.3, 1.2 Feb 23, 2015
@begosch

begosch commented Mar 2, 2015

I have encountered this without volatility enabled. Why can't we just store reports in mongodb using GridFS?
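For reference, the GridFS approach would look something like this (an illustrative sketch, not how Cuckoo stores reports): GridFS chunks a blob across many small documents, sidestepping the 16MB per-document limit, at the cost of the report becoming an opaque blob you can no longer query field by field.

```python
import json

import gridfs
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["cuckoo"]
fs = gridfs.GridFS(db)

def store_report_gridfs(report):
    # Serialize the whole report and let GridFS split it into chunks.
    oid = fs.put(json.dumps(report).encode("utf-8"), filename="report.json")
    return oid  # retrieve later with fs.get(oid).read()
```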

@jekil
Member

jekil commented Mar 2, 2015

@begosch Because it would be useless; at that point file-system storage would be better.
Can you please share the sample (even privately)?

@botherder botherder removed the Task label Apr 1, 2015
@kcchu

kcchu commented Jun 7, 2015

FYI, I bumped into the same issue with 74678a11c3d3fe69718289fbb95ec3fe734347e5ec2a8f0c9ecf1b9a179cd89c

I still have the analysis output on my disk, which is 458M in total. Please let me know if it is useful to you.

2015-05-31 00:22:05,855 [lib.cuckoo.core.plugins] ERROR: Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/var/local/cuckoo/cuckoo/lib/cuckoo/core/plugins.py", line 505, in process
    current.run(self.results)
  File "/var/local/cuckoo/cuckoo/modules/reporting/mongodb.py", line 215, in run
    self.db.analysis.save(report)
  File "/var/local/cuckoo/py/local/lib/python2.7/site-packages/pymongo/collection.py", line 285, in save
    return self.insert(to_save, manipulate, safe, check_keys, **kwargs)
  File "/var/local/cuckoo/py/local/lib/python2.7/site-packages/pymongo/collection.py", line 409, in insert
    gen(), check_keys, self.uuid_subtype, client)
DocumentTooLarge: BSON document too large (17321912 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.

@KillerInstinct
Contributor

The two errors above are basically unhandleable in Cuckoo's current state; it really requires a rewrite of some of the processing modules. I wrote up some code to help debug these (and optionally delete overly large keys) -- I suggested in IRC that upstream implement it to help identify which processing modules are at fault, especially for custom installs/modules. I specifically observed this with both the behavior and volatility processing modules. I personally use the deletion feature because I prefer some data over no data. :)

Feel free to reference / use / rip:
KillerInstinct/cuckoo-modified@ac8ecf8
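The linked commit aside, the general shape of that debug/delete approach is roughly this (a sketch, not the actual patch; assumes the bson package bundled with pymongo):

```python
# Sketch: report which top-level report keys are oversized, and
# optionally drop them so the rest of the analysis still reaches Mongo.
import bson

BSON_LIMIT = 16 * 1024 * 1024

def trim_oversized_keys(report, delete=False):
    for key in list(report.keys()):
        size = len(bson.BSON.encode({key: report[key]}))
        if size > BSON_LIMIT:
            print("key %r alone encodes to %d bytes" % (key, size))
            if delete:
                del report[key]  # some data is better than no data
    return report
```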

@KillerInstinct
Contributor

Again: losing some data, as opposed to losing all data for an analysis - in Cuckoo's current state it's one or the other. The best solution in my opinion would be to modify the processing modules so they never allocate more than 16MB per key. There's no need to migrate to another data store for a problem that occurs relatively rarely with most modern malware samples.

@jbremer
Member

jbremer commented Aug 24, 2015

This issue has been resolved, right? Or should we also split up other parts of the report a bit more (just like we split behavioral logs into pieces)?

@botherder botherder removed this from the 1.3 milestone Oct 5, 2015
@jbremer
Member

jbremer commented Nov 22, 2015

I haven't run into this issue for quite a while, so I'm going to assume it's fixed for now.

@jbremer jbremer closed this as completed Nov 22, 2015
@GelosSnake

I got this error today: "BSON document too large (18064026 bytes) - the connected server supports BSON document sizes up to 16777216 bytes."
Seems like the same issue.
Related sample sha256 hash: 1ccc286d33d3fec1853e8f4c17eb7faea390725a8cfe03d23944eedc5bf8d58c
https://malwr.com/submission/status/N2I4ZThmOWRlODZlNDAyNmIwNjNhYjkzYWI3NjQ0ZTI/
https://malwr.com/submission/status/ZmFkN2YyMzE2OGZjNDZkNTk5MGIyYjVmMjAxYjZiNTU/

@doomedraven
Contributor

I got it today with 2.0-dev.

Any possible solution? I checked this: http://stackoverflow.com/a/25553887

ERROR:lib.cuckoo.core.plugins:Failed to run the reporting module "MongoDB":
Traceback (most recent call last):
  File "/opt/cuckoo/utils/../lib/cuckoo/core/plugins.py", line 506, in process
    current.run(self.results)
  File "/opt/cuckoo/utils/../modules/reporting/mongodb.py", line 227, in run
    self.db.analysis.save(report)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 2182, in save
    check_keys, manipulate, write_concern)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 530, in _insert
    check_keys, manipulate, write_concern, op_id, bypass_doc_val)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 512, in _insert_one
    check_keys=check_keys)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 218, in command
    self._raise_connection_failure(error)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/pool.py", line 346, in _raise_connection_failure
    raise error
DocumentTooLarge: BSON document too large (18575204 bytes) - the connected server supports BSON document sizes up to 16793598 bytes.
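Note the slightly different ceiling in this traceback (16793598 bytes): pymongo reports the limit the connected server advertises rather than a hard-coded constant, and you can read it yourself (a small sketch; assumes a reachable local mongod):

```python
from pymongo import MongoClient

# max_bson_size is a MongoClient property filled in from the server's
# handshake; accessing it requires the server to be reachable.
client = MongoClient("localhost", 27017)
print("server max BSON document size: %d bytes" % client.max_bson_size)
```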

@jbremer
Member

jbremer commented Apr 21, 2016

@doomedraven Could you share a report.json for this analysis?

@doomedraven
Contributor

@jbremer Sure, here you have it - a 1GB JSON O_o

https://www.dropbox.com/s/q4l5zidzfhlmd50/report.json.zip?dl=1

@kholbrook1303
Contributor

@jbremer I certainly don't want to flood this issue, but I am seeing this more frequently with ransomware (specifically Locky), likely due to the massive amount of file I/O.

Here is the JSON report:
https://www.dropbox.com/s/agv77vutwnon4ro/report.json?dl=0

@doomedraven
Contributor

If you google it you will see that you can't change that limit. I have posted my mongodb.py reporting module with the fix many times; do a search in the issues.

@netpacket

@doomedraven
I did not realize this was a Mongo issue rather than a Cuckoo Sandbox infrastructure thing. Thanks.

@doomedraven
Contributor

No, that isn't related to Cuckoo as such - well, Cuckoo does generate a lot of output, depending on the sample. But it is easy to fix, and I've posted the fix so many times that I don't know why it still hasn't been merged when so many people complain. You can easily fix it yourself.

@netpacket

Gotcha. I have moved over to a PostgreSQL DB for now... so yeah, I can go into the file and make the change.

@doomedraven
Contributor

You can't use PostgreSQL for the web GUI.

@netpacket

Wait, PostgreSQL is necessary if you want to have multiple VMs, correct? Then the web GUI is not properly hooked up to the new DB? That is sort of an infrastructure flaw if this is the case. Correct me if I am misunderstanding the Cuckoo infrastructure.

@doomedraven
Contributor

No. Did you check the manual? The *SQL part is only for managing tasks; Mongo is for the web GUI.

@netpacket

@doomedraven I totally brainfarted. Oops - the tasks live in a different DB, and MongoDB is for the web stuff.

@doomedraven
Contributor

read the manual

@netpacket

I did. Tasks are on SQLite by default and the web GUI is on MongoDB. The issue lies in MongoDB's document size limit. I think I get it. I upgraded the SQLite DB to PostgreSQL. Thanks.

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@netpacket

SparkyNZL, lol, indeed. I migrated off the default SQL DB to a PostgreSQL DB.

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@SparkyNZL

SparkyNZL commented Jan 30, 2018 via email

@netpacket

SparkyNZL, I definitely run with the debug component on. What you mentioned about looking for bottlenecks from misconfiguration makes sense.

@jbremer
Member

jbremer commented Feb 4, 2018

@wroersma We have a couple of major Cuckoo Core changes upcoming that will, among many other things, mitigate this issue. It hasn't been properly addressed yet because it takes months of hard work with a non-backwards compatible result. Please remember this is an open source project and we do our best with the resources that we have ;-)
If no other questions arise I'll be closing this issue - in the end we're aware of it.

@wroersma
Contributor

I'd be happy to help write whatever code is needed as long as it's something that will be accepted into the project. It's just hard when we don't know what you guys are working on and even if you will take the PR.

@nahaye

nahaye commented Jun 6, 2018

I am having major troubles with MongoDB issues. When will it be solved?

@doomedraven
Contributor

doomedraven commented Jun 6, 2018

Can you elaborate on your problem?

@redbaron4

Just ran into this with Cuckoo 2.0.6. The processing module crashed with the same error message and I have no idea what caused it.

I like the idea given by @KillerInstinct - get an incomplete report into Mongo rather than no report at all. The only problem is that his fix no longer applies cleanly to the latest version of Cuckoo.

@doomedraven
Contributor

"...error message and I have no idea what caused this" - if you don't post it, how do you expect someone to help you? ;)

@redbaron4

@doomedraven Well, Cuckoo only logs the error that the report is larger than 16MB. As I can't access the report through the web GUI, how can I tell what made the report larger than 16MB? Most of the time reports get written to Mongo fine, so why this particular report is too large is what I meant by "I have no idea what caused this".

@doomedraven
Contributor

Access it on the server. But do search the issues - I just recently posted a fixed mongodb.py file which fixes exactly that.

@redbaron4

It would be great if you could link the fixed file here so that maybe we can test it.

I did search the issues, and that is how I landed on this thread. In another one you say Yara may be to blame, but I am not using any additional Yara rules.

On the server, the report.json is a whopping 370MB. My limited understanding (and mucking around in a text editor with that large file) tells me that either the process or the file count is too large, but I can't confirm this.
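One quick way to confirm which section dominates such a report (a hypothetical helper; note that json.load will hold the whole 370MB report in memory):

```python
import json

with open("report.json") as fh:
    report = json.load(fh)

# Print the serialized size of each top-level section, largest first.
sizes = {key: len(json.dumps(value)) for key, value in report.items()}
for key, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print("%-16s %12d bytes" % (key, size))
```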

@doomedraven
Contributor

I doubt that it will be merged, so you need to replace that file yourself: #2570

@k41zen

k41zen commented Jan 23, 2019

Can I just ask something about this, please? I get the limitation, but I'm testing my setup using files from the Cuckoo CERT site (cuckoo.cert.ee). I download samples from there and upload them to my own instance. How is it that this works on their Cuckoo, yet mine gives this MongoDB error? I clearly have something configured differently.

Can someone help, please? Almost every file I upload errors out, making my instance unusable.
