- State “DONE” from “RPG” [2015-07-13 Mon 16:13] Yes, this has been cleared.
- Assuming it’s the same as the submitted-for-review page count, but it doesn’t say anywhere.
- I emailed Scott.
- State “DONE” from “WAIT” [2015-07-13 Mon 16:13]
- I emailed Scott.
- I think we email to Scott again, yes?
- No, we push a new “version” to EasyChair.
- At link https://trac.prelude-ids.org/wiki/PreludeCorrelator
- Yep, it’s gone. SANS has a whitepaper about Prelude with a paragraph about the correlator, http://www.sans.org/reading-room/whitepapers/awareness/prelude-hybrid-ids-framework-33048
- Oh, I think this might be where it lives now: https://www.prelude-siem.org/projects/prelude/wiki/PreludeCorrelator
- Robert, I think the last one is right, but I don’t see that we
stored a copy of the old one. I’ve gone ahead and updated the
reference. Please have a look, and either set to DONE or add
comments.
- Hm, there’s no attribution on the new page to Yoann Vandoorselaere. Does this look like the same page? I totally don’t remember. For the time being at least, I’ve made a separate citation for this new page, citing it instead of the old one.
- Ugh, @misc citation looks terrible without an author. Will come back to this later.
- Aha! This page clears it up, https://www.prelude-siem.org/projects/prelude/news . It is the same stuff; they just changed the URLs. Back to using the old reference, and the URL there changed.
- State “WONTFIX” from “TODO” [2015-07-14 Tue 15:59]
- At least on my PDF viewer, the URL for Google Scholar doesn’t copy out well. I can’t think of anything we can do about that, but any ideas?
- As I said elsewhere, even when I did copy out the Google Scholar URL, it didn’t work.
- State “DONE” from “RPG” [2015-07-13 Mon 16:30]
- The Schwartz article is still there, but they’re rebranded themselves, and the URL we give redirects. Robert, I’ve updated our reference with the new URL; please either mark DONE or comment.
- I had to redo this: the article seems to have moved again.
- State “DONE” from “JM” [2015-07-14 Tue 09:23]
- One place we have “Query:” before the URL; other places just a bare URL. “InformationWeek web site,” preceeds the URL while “Prelude Correlator online documentation.” follows it (and note comma vs. period separating URL from overall artifact title).
- State “WONTFIX” from “TODO” [2015-07-14 Tue 15:58]
- The period comes with the URL too easily. I’m tempted to put a nonbreaking space (or half space) in between the URL and period.
- I wonder if we could do something with the
\url
macro when it appears in the context of a citation, so that it doesn’t get slurped like that. It would actually be nice if Bibtex had a URL field.
- State “WONTFIX” from “TODO” [2015-07-14 Tue 15:59]
- Is this a AAAI style thing, or is it a decision of ours to use monospaced instead of proportionate sans-serif?
- I think it’s the URL style file. Probably not worth fixing.
- State “DONE” from “TODO” [2015-07-14 Tue 09:13] This complaint not sufficiently detailed to be actionable. I (rpg) made a quick pass over the bib file and fixed some capitalization. Other than that, calling it good.
- State “DONE” from “RPG” [2015-07-14 Tue 12:02]
- “I wonder how the underlying qualitative models were obtained - were they learned or made by domain expert? From the paper, I guess they were hand made because the learning part is never mentioned.”
- JM added some text to {Experimental Design} to make it even clearer that we’re generating these models.
- So we did write strongly about learning models not being realistic here, but it’s in the conclusion. Should we move/repeat in the introduction as well?
- I think this is fixed together with the following in IDS fusion paragraph
- State “DONE” from “TODO” [2015-07-14 Tue 12:02]
- “If this is correct [that the underlying qualitative models were made by a domain expert], the results of the paper are not surprising - the very rare events that the models capture successfully are obviously included in the model because domain experts anticipated such behavior; maybe not all but at least some rare events were anticipated.”
- Moot, probably, but address when we discuss generated vs. learned models?
- See IDS fusion paragraph
- State “DONE” from “TODO” [2015-07-14 Tue 09:31]
- “During clustering and assessment are there ever edges between hypothesis nodes indicating sub-events of particular kinds of attacks? I imagine that this would be what the ‘high-level events’ or ‘ontology of attacks’ are, but it is unclear.”
- Clarify.
- Added a new sub-figure to the set of Bayes networks generated by MIFD with component events.
- State “DONE” from “RPG” [2015-07-14 Tue 10:13]
- “At the end of the intrusion detection system, the authors use Figure 1 to help us understand the scale of the false positive problem. It would help the reader if the time frame of the reports was included as well. Is this over a minute? A day? A week?”
- Robert, your experience more than mine applies here. Maybe add to the last sentence before {Related Work} where we discuss this figure, “from 16,000 raw reports in XXX hours to 0 noteworthy events flagged over the same period”?
- State “DONE” from “RPG” [2015-07-14 Tue 10:09]
- “I find strange that there is no discussion of the plausibility of the experiments although they are admittedly artificial.”
- Robert, what do you think of appending to the 2nd paragraph of
{Experimental Design}, after “in principle inaccessible”:
The models we generate have the same structure as models used in deployments of Scyllarus for real scenarios. What we observe here is the change in performance as the generated models diverge from best-case assumptions behind the use of qualitative probability in the same way as the models of real scenarios diverge from these assumptions.
- I liked this, so tweaked the wording a little and put it in.
- State “DONE” from “JM” [2015-07-14 Tue 11:01]
- “The fact that there are several types of IDSes (from rule-based to hybrid) is irrelevant to this work just because their integration is assumed to be part of the clustering problem?”
- I added a sentence to the {Intrusion detection fusion} section to clarify.
- I (rpg) extended this a little to beat it into the reader’s consciousness.
- State “DONE” from “JM” [2015-07-14 Tue 11:41]
- “Networks can be articulated in sub-networks of different sensitivity to attacks (perhaps due to the type of users or the type of information distributed there), this is not discussed. Is it assumed that a network is somehow homogeneous from the attack viewpoint?”
- Hm, something to discuss in the conclusion?
- I added this sentence to the conclusions: “In future work, we would like
to examine more complex inference patterns that arise from “knotty”
Bayes nets, and the accuracy of the assessment process in models that take
into account how attacks spread through the topology of the underlying
computer network.”
- Feel free to mark this as “DONE”
- State “DONE” from “RPG” [2015-07-14 Tue 13:52]
- “The authors do not discuss the limitation to three classes (likely, plausible and unlikely). Can the technique manage more classes? Would it still behave nicely?”
- Added clarification to sentence in {The Scyllarus and MIFD systems} after paragraph with “…ranks hypotheses as either likely, plausible or unlikely”. Robert, please mark DONE or discuss.
- I just searched scyllarus-and-mifd.tex and I don’t find “ranks” anywhere in there. So I’m not sure what you mean by having added a clarification to this sentence.
- It’s in this commit: https://bitbucket.org/rpgoldman/qr-2015/commits/0dcf034fc46afbee08956d735b94b35ef9969b79. Seems to still be in zplus.tex.
- I added a discussion to zplus.tex. I think this is done, but wanted you to have a chance to check.
- State “DONE” from “RPG” [2015-07-14 Tue 11:58]
- “The reference to an ‘ontology of attacks’ (end of Related Work) is not picked up in the rest of the paper.”
- Picked up the reference a few paragraphs later, below \textbf{Clustering}. Robert, please mark DONE or discuss.
- They mean
$e$ , but fixed.
- State “DONE” from “RPG” [2015-07-14 Tue 13:47]
Robert, please double-check that the one that’s there is correct.
- State “DONE” from “TODO” [2015-07-14 Tue 16:00]
- State “DONE” from “TODO” [2015-07-14 Tue 16:00]
- Via EasyChair
- State “DONE” from “TODO” [2014-08-22 Fri 12:50]
- And add x-axis label
- State “DONE” from “TODO” [2014-08-22 Fri 12:49]
[please review my response here. Need to thrash out the precise nature of the contribution, then we can press forward.]
We wrote a nice argument for Reviewer 4 to justify these experiments as a validation of Z+ itself as an approximation of actual quantitative probability, but as written this isn’t a stated main thrust of the paper: in the intro it doesn’t come up until the third paragraph, it gets just one sentence in the conclusion, and isn’t well-promoted in the abstract.
RPG: I don’t think we can make an argument that we are experimenting to validate Z+ as an approximation of standard probability theory. That can be demonstrated analytically. What we are doing is experimenting to validate that this approximation can usefully be applied to information fusion (with particular application to IDS, which has pathological features that make it especially suited to qualitative abstraction (notably that not only are we unable to get these probabilities in a particular case, there is no imaginable case where we could get them…).
Does this make sense?
- JM: Yes.
So we should think about strengthening these higher-level claims in the introduction. Some ideas:
- State “CANCELED” from “TODO” [2014-08-22 Fri 14:33]
Not exactly this.
And then introduce IDS as a particular example.
So we can say something like: Of the more than 250 papers citing the original Z+ paper\footnote{As enumerated in July 2014 in a Google Scholar search, <URL>.}, there are ample explorations of the system’s properties (notably CITE, CITE and CITE), proposals — but not post-deployment reports — of applications of Z+ to real-world systems (CITE, CITE and CITE), OTHER CATEGORY (CITE, CITE and CITE), and OTHER CATEGORY (CITE, CITE and CITE). However, this seems to be the first research which …
And then move on to (possibly a less aggressive version of) the text we have in the rebuttal.
See list of citations. These were the only applications papers I cauld find. I don’t think that we can put in all the citations for things that are not applications papers, because of the page limit (I’m pretty sure a AAAI paper must be shorter than UAI).
Secure copies of the ZCitations and put in paper opp subdir
- State “DONE” from “JM” [2014-08-15 Fri 13:26] Did this some time ago.
- In progress: one found on Google search, others requested from HCLib.
- State “DONE” from “JM” [2014-07-25 Fri 13:32] Converted to a footnote.
RPG: Agreed. Garbage collect.
Or make it a footnote/remark that acknowledges its necessity, but doesn’t go into details.
- Reviewer 11: [R]emoving old event hypotheses and sensor reports
to free up resources does not seem to be a right approach for
intrusion detection if these events can still be useful. There
should be a justification as to why removing these will not
affect the system performance. Where to store them is not so
important.
- Rebuttal: We’d intended this remark as something of an aside, since (as the paper states) we are not actually doing this cleanup in the system under experimentation. The volume of data which IDSs encounter necessitates some sort of garbage collection. You are correct that the timing of such clean up must be carefully considered. In the full system, hypotheses are only purged after an interval since receipt of the last related IDS report. All reports and hypotheses are archived in a database. It would probably be best to simply cut this minor aside from the paper, since garbage-collection plays no role in these experiments.
The reviewers objected to the unavailability of the code, and we mentioned in the rebuttal that we could make a limited license for research available. Should we address this ahead of time? How would this impact review anonymity?
Are we still able to pull together a full distribution of a version of the code that generated these results?
Can we distribute a version of the MIFD code that does not contain other bits of STRATUS? I think it would be a lot easier to do the distribution if we don’t have to give away other STRATUS stuff, especially mission models.
- State “DONE” from “JM” [2014-07-25 Fri 13:40]
Well…really just two, and I’d copied all of the info from the UAI list to here. But sure — links added.
There are a bunch of places where you have “From UAI’14 task list.” It would be nicer if we had a hyperlink there (put a target in the corresponding UAI task?). As it is, it seems like linear search to find these previous tasks.
- From UAI’14 task list.
- If space permits.
- From UAI’14 task list.
- Ours is a little stale. If you have time: take a quick peek to see if anyone’s citing the related work we cite, and see if we should refer to them.
- Reviewer 11: The related work section could be expanded and
presented more clearly, to separate methods such as FSMs or Bayes
networks from implementations like ARIS or Arcsight.
- Rebuttal was the general “thanks, we’ll address that.”
- Revier 1: However, the conclusions and experiments would highly
benefit from the comparison to other methods […].
- Rebuttal: While this would be desirable, existing IDS fusion techniques do not share an input format or output definition. Most simply cluster and do not categorize hypotheses by posterior likelihood. However, it would be simple for other researchers to adapt their systems to our data sets: our synthetic reports would be simple to translate into other input formats.
Is there a reasonable way to address this?
- Reviewer 1: It would have been good if the authors could demonstrate there results on a complete data set and not only sample generated experiments.
- RPG: Wait I don’t even… What does reviewer 1 even mean by “a complete data set”? I think we are going to have to ignore this. I’d love to have a sentence in the paper, though, to explain why this is impossible.
- JM: I’m not even sure where to bring this up. Maybe just at the end of Sec. 4.1? Or at the end of the pargraph starting “Having generated the sensor reports”?
- State “CANCELED” from “TODO” [2014-07-27 Sun 11:16]
One reviewer referred to the impact over time on recall and precision. I’m not sure what to make of that.
- Reviewer 1: However, the conclusions and experiments would highly benefit from the comparison to other methods as well as a better demonstration of the impact over time on the recall and precision.
I don’t even know what this means.
[JM] OK, let’s just kill it - marking as cancelled.
- [JM] My first thought was some sort of 3-D histogram, but these don’t exist in gnuplot. Maybe a multiplot, three across, with the same axes setup for each? The opaque 3D surfaces might be useful as well.
- [JM July 27] Robert, I’m committed a rough version of graph breaking up the three plots, and rendering them as opaque surfaces. I didn’t spend time fine-tuning; want to get your take on the basic layout before doing any further tweaking.
- <2014-08-24 Sun> [JM] Re-did these in gnuplot, in part for the lower axis captioning but also for control of spacing, and the x-axes are all labelled. Do you think we should also label the y-axes? They’d all be clear from the caption.
- <2014-08-24 Sun> Re-did the splot that was very hard to read. This one isn’t perfect, but I think it is easier to see the point. What do you think?
These sections copied from UAI’14 submission section
When the vary-kappa with 2 reports finishes I’ll set up another run which holds the kappa-translations constant, but which varies the fp-kappa value actually used in the coinflip, to make it more likely that we get FPs.
- Rebuttal: [Reviewer 11] noted in the review that we “used three sensors for possible fusion.” That’s not exactly right — although we do use a ratio of three sensors detecting each attack in most experiments, we vary that ratio in the first experiment.
- State “DONE” from “JM” [2014-07-27 Sun 11:42]
- Reviewer 11: A justification should be provided for the choice of
parameter settings on page 6.
- Rebuttal: In all cases, the initial settings for the configuration parameters are both typical values for an IDS, and consistent with Z+’s assumptions. The experiments vary these values in ways which we anticipate would be worse for MIFD’s performance, to examine our expected performance both under typical conditions, and under less typical conditions approaching worst-case.
- Reviewer 11: System Z+, Bayesian networks, clustering methods (for sensor fusion), etc. should be introduced more clearly. What are benign event hypotheses and what are attack event hypotheses? What are complex events? Even though they are not the focus of the paper, they should at least be introduced to give a reader an idea of the difference between all these different types of events.
- Reviewer 1: The notion of Z+ ranking and Bayesian networks seems
to be sound, however, these experiments make it difficult to
understand how it compares to other state of the art systems.
- Rebuttals to both were the general “thanks, we’ll address that.”
- State “DONE” from “JM” [2014-07-27 Sun 12:55]
- Reviewer 11: [T]he notations in the table on page 5 are not
properly introduced. What is ω? A possible world or else? I
took that φ is a formula (a hypothesis). What does φ ∈
ω mean?
- Rebuttal was the general “thanks, we’ll address that.”
- We can define the symbols. All of these are standard for probability theory, so Reviewer 11 is being an idiot.
- Reviewer 11: I am not able to establish how outputs from
different sensors are related (or correlated) to generate a
single output (referred to as clustering by the authors in
several places). The description on page 5 right column is
confusing.
- Rebuttal was the general “thanks, we’ll address that.”
- Give an example DAG to show the effect of clustering? Review and edit the description.
- No, what he’s missing is that clustering isn’t what gives the final “single output” (I also think he has pages 3 and 5 confused; clustering is explained on 3, not 5.). I made the output of the Clustering process more explicit and earlier.
- Reviewer 11: It seems to me that boosting could be used as an
alternative approach. Should the paper compare this approach to
boosting or at least mention it in Related Work?
- Rebuttal: Boosting does provide an alternative method for fusion, but is not appropriate to IDS fusion. Boosting is based on learning, but IDS fusion problems do not admit use of ML: we do not get labeled data of attacks, nor do we get reference data showing systems operating without attacks. We will add a citation to papers about the failure of ML for IDS. These challenges (see [Sommer&Paxson,2010;Gates&Taylor,2006] on the challenges of ML in IDS, full cites available upon request) account for our adoption of a qualitative, non-adaptive framework. Also note that different IDSes do not share a single “ontology” of events; part of the function of MIFD’s clustering is to align reports with different frames of reference.
Another point made in our rebuttal to Reviewer 4 to make more explicit in the text. Also applicable to diagnosis. I name-checked this in the introduction. Might need some more work, and make sure we hit it in the conclusions as well.
Maybe search forward from EMERALD, etc.?
- [ ] I think the introduction oversimplifies and leaves too much out. Yes, we do evaluate what happens as the qualitative approximation breaks down, but that’s not all we do. For example, we worry about base rate issues, and we look at what happens when we require different numbers of sensors for corroboration. Probably look at the “Objectives” section and use that as a guide: that’s a more comprehensive statement. Note that the statement is good, but it lacks motivation, which should flow all the way back to the introduction.
- [x] We use the terms “precision” and “recall” before they are defined (in the “Metrics” paragraph.).
- State “DONE” from “TODO” [2014-03-19 Wed 18:21]
Write this last
- State “DONE” from “TODO” [2014-03-19 Wed 17:49]
- State “DONE” from “TODO” [2014-03-19 Wed 17:49]
- State “CANCELED” from “TODO” [2014-07-16 Wed 11:01]
Copied to AAAI’15 tasks, and this task marked as cancelled.
If space permits.
- State “DONE” from “TODO” [2014-03-19 Wed 18:01]
- State “DONE” from “WAIT” [2014-03-19 Wed 17:49]
- State “DONE” from “RPG” [2014-03-18 Tue 22:45]
Do this to check before John goes
- In progress.
- Committed this morning.
- State “CANCELED” from “WAIT” [2014-07-16 Wed 10:58] Copied to AAAI’15 tasks, and this task marked as cancelled.
- State “CANCELED” from “CANCELED” [2014-07-16 Wed 10:59]
Copied to AAAI’15 tasks, and this task marked as cancelled. - State “CANCELED” from “WAIT” [2014-07-16 Wed 10:58]
- State “CANCELED” from “WAIT” [2014-07-16 Wed 10:59]
Copied to AAAI’15 tasks, and this task marked as cancelled.
- Category legend with plot
- Each plot in separate figure for ease of cross-reference
- State “DONE” from “RPG” [2014-03-19 Wed 17:16]
Only if time permits.
- State “DONE” from “TODO” [2014-03-19 Wed 17:35]
- State “CANCELED” from “TODO” [2014-07-16 Wed 11:00]
Copied to AAAI’15 tasks, and this task marked as cancelled.
- State “DONE” from “TODO” [2014-03-19 Wed 18:25]
Since locations don’t have anything special about them, we had confusing discussion about the fungibility of locations versus runs. I (rpg) cut that discussion, but I think it would be fine to cut references to locations altogether.
- State “DONE” from “TODO” [2014-03-19 Wed 18:19]
There’s nothing in the paper to tell the reader how many samples were in our tets.
- State “DONE” from “TODO” [2014-07-16 Wed 11:01]
- State “DONE” from “TODO” [2014-07-16 Wed 11:01]
- State “DONE” from “TODO” [2014-07-16 Wed 11:01]
- State “DONE” from “TODO” [2014-07-16 Wed 11:01]
I believe that these papers are relevant to our claim that no one else is applying System Z+
- Z-log: Applying System-Z
- Modeling Drug Mechanism Knowledge Using Evidence and Truth
Maintenance
- This one doesn’t actually seem to say much about it, and doesn’t have experimental results.
- An Investigation of Potency of eWOM Messages with a Focus on Subjective Rank Expressions
- Intelligent systems using Web-pages as knowledge base for statistical decision making
- There’s a ton of “scyllarus” in it, and not much MIFD. Suggests we say “MIFD, and its predecessor, Scyllarus…” and then don’t use the name “Scyllarus” again.
- Generally too much throat-clearing: it’s at line 11 that we first say what we are doing here. Before that, we say what has been done before. This should be flipped. Say up front what we are doing here, and then give the setting.
- There’s nothing in the abstract that says why the reader should care about these results.
- I’m very concerned about saying “it is difficult to draw \emph{general} conclusions about an intrusion detection approach based on field testing.” Unless the reader understands why this is the case, which requires a lot of explanation, this seems very sketchy.
- There’s a mismatch between the abstract (Scyllarus, Scyllarus, Scyllarus) and the topic paragraph, which carefully avoids using the name. If we think it’s ok to use Scyllarus, we should use it once in the topic paragraph, instead of the somewhat tortured “In previous work…” and “Our original system.”
- Make the claim that we provide the only empirical analysis of the System Z+ scheme in application.
- Discuss the generality of the scheme – see my remarks about information fusion and diagnostic reasoning.
- For a reader not so familiar with this field, probably need to define IDS fusion.
- We could postpone abstract revision until the rest of the paper is stitched up and make sure we hit all the high points.
- See above about, “it’s difficult to draw general conclusions.” It’s ok to say this here, but to allay the reader’s suspicions, we should promise to explain why later on.
- Also, we need to brag about what Scyllarus and MIFD have achieved.
- I have tried to make the above modifications, but left some FIXME’s around.
- The paragraph starting “More generally…” doesn’t seem to knit in with the structure of the argument. I suggest it be rewritten in a more standard “callous dismissal of others’ work” form. In particular, the key sentence, is relegated to a footnote, and does not clearly say “we have found no study of system Z+ in which it is evaluated in an application content.” “Few seem to detail…”, “typically”, etc. is too hedged. So something more like “Another area of contribution is the study of System Z+. There have been no experimental analyses of applications based on System Z+. There have been some abstract studies of the calculus by X, Y, and Z, but their evaluations were limited to…” This will also help clearly call out the contribution.
- Possibly we should revise to clearly state the contributions of this paper, rather than leaving the reader to winkle them out of our prose. We can keep that in mind going forward.
Also, I think they should mostly be easier to read in black and white.
- The exception is Figure 6.
- I was left wondering why there’s both red and green there. I suspect that the green is supposed to show we’re looking at the underside.... Maybe try redrawing in black adn white and see if that helps?
- The second of the three figures has no legend at all, and the third has only one axis labeled. Shouldn’t all three axes be labeled on all three figures?