Watch is posting events that aren't specified in a configuration #1079

Closed
andreleblanc11 opened this issue May 30, 2024 · 4 comments
Labels
bug Something isn't working Discussion_Needed developers should discuss this issue. Priority 3 - Important worrisome impact... should study carefully v3only Only affects v3 branches.

Comments

@andreleblanc11
Member

Scenario

  • Start up a basic watch that has fileEvents create in the configuration.
  • Do a sr3 show watch/my-config | grep fileEvents and it should return
sarra@my-server~/.cache/sr3/log$ sr3 show watch/watch-dir_f03 | grep fileEvents
'fileEvents': {'create'}, 
  • Start up the watch and dump a large data set with multiple subdirectories inside the path specified in the watch configuration
# From the watches' `path`
cp -r /path/to/my/data ./
  • When you check the logs of the watch, you will see mkdir and modify events being posted downstream, even though these weren't specified in the configuration.
# From the logs of the watch
2024-05-30 17:00:38,702 [DEBUG] sarracenia.flowcb.gather.file on_add on_add mkdir /home/sarra/sarra_devdocroot/bulletins_to_post/rootdir None
2024-05-30 17:00:38,702 [DEBUG] sarracenia.flowcb.gather.file on_add on_add mkdir /home/sarra/sarra_devdocroot/bulletins_to_post/rootdir/dir1 None
2024-05-30 17:00:38,702 [DEBUG] sarracenia.flowcb.gather.file on_add on_add mkdir /home/sarra/sarra_devdocroot/bulletins_to_post/rootdir/dir1/subdir1 None
...
2024-05-30 17:00:43,170 [INFO] sarracenia.flowcb.log after_post posted to exchange: my-exchange topic: v03.post a directory with baseUrl: file://sarra@localhost/ relPath: bulletins rename: /
2024-05-30 17:00:43,170 [INFO] sarracenia.flowcb.log after_post posted to exchange: my-exchange topic: v03.post.rootdir a directory with baseUrl: file://sarra@localhost/ relPath: bulletins/dir1 rename: /dir1
2024-05-30 17:00:43,170 [INFO] sarracenia.flowcb.log after_post posted to exchange: my-exchange topic: v03.post.rootdir.dir1 a directory with baseUrl: file://sarra@localhost/ relPath: bulletins/dir1/subdir1 rename: /dir1/subdir1

Analysis

  • I searched the watch code in sarracenia/flowcb/gather/file.py and noticed a method called on_created, which logs when a mkdir event occurs. However, I wasn't able to figure out where it is called from in the code.
  • My best guess is that the walk method goes through the entire tree recursively, and when it gets to post1file we don't check fileEvents when determining whether the path is a directory or not. FYI, this is probably wrong.

# path is a file
elif os.path.isfile(path) or os.path.isdir(path):
    messages.extend(self.post_file(path, lstat))

@andreleblanc11 andreleblanc11 added bug Something isn't working Priority 3 - Important worrisome impact... should study carefully v3only Only affects v3 branches. Discussion_Needed developers should discuss this issue. labels May 30, 2024
@andreleblanc11
Member Author

Adding the Discussion_Needed label in case this isn't addressed before the dev meeting next week.

@petersilva
Contributor

petersilva commented May 31, 2024

flowcb/gather/file.py Algorithm

1st call to gather():

  • gather() is the main entry point.
  • the first time gather() is called, it does something called priming, which involves traversing the tree once to set watches on each configured directory. The routine for that is watch_dir()
  • watch_dir() is for the root of each tree to be watched. We create an Observer() (from the watchdog python package)
  • the watchdog.Observer class will generate events, and that's what the on_whatever routines are for. We add new_events to a queue with those routines.
  • then we add directories to the observer with calls to walk_priming()
  • Once the watches are set, events will be generated in the background asynchronously.
  • after that first pass, the instance marks itself as primed ...

subsequent calls to gather()

  • the on_whatever routines have been called in the background accumulating self.new_events.
  • After priming, subsequent calls to gather result in calling wakeup()
  • The watchdog.Observer has an internal queue, but it has limited capacity, and if you don't pick up the events quickly enough, it can lose them. That's why the on_whatever handlers put them on a python self.new_events OrderedDict (which should not drop anything.)
  • in wakeup(), we go through all the new events that have shown up since the last call. For each event, we call process_event()
  • process_event() interprets a watchdog event, and triggers creation of sarracenia messages to reflect them.

So what do I hope happens with a mkdir?

  • observer notices the activity and calls on_created() with an event that has is_directory set.
  • on_created() should then add an 'mkdir' event to the new_events queue.
  • next time gather() is called, it calls wakeup()
  • wakeup() should call process_event()
  • it should (around line 476) in that routine then call post1file ... with is_directory=True... ONLY if 'mkdir' is in self.o.fileEvents. which it isn't so it should not be called.

oh... that's stupid... I get it...


        if event == 'mkdir' and 'mkdir' in self.o.fileEvents:
            return (True, self.post1file(src, lstat, is_directory=True))
        elif self.o.create_modify:
            return (True, self.post1file(src, lstat))
        return (True, [])

With the code above, when 'mkdir' is not in fileEvents the first condition fails and the mkdir event falls through to the elif, so it calls the file version of post1file instead of ignoring the event.
So it should really be:

        if event == 'mkdir':
            if 'mkdir' in self.o.fileEvents:
                return (True, self.post1file(src, lstat, is_directory=True))
            return (True, [])   # ignore mkdir if not in fileEvents
        elif self.o.create_modify:
            return (True, self.post1file(src, lstat))
        return (True, [])

Can you try making that change, and see if it helps?
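
For anyone who wants to see the fall-through in isolation, here is a rough stand-alone sketch that compares the two branch structures. Everything around the branches is a stub: Options, the simplified post1file, and the two dispatch_* function names are hypothetical, and setting create_modify to True for a config with fileEvents create is an assumption about how that option is derived.

```python
class Options:
    """Hypothetical stub for self.o; only the two attributes used here."""

    def __init__(self, fileEvents, create_modify):
        self.fileEvents = fileEvents
        self.create_modify = create_modify


def post1file(src, is_directory=False):
    # Stub: the real method builds sarracenia messages.
    return [('directory' if is_directory else 'file', src)]


def dispatch_original(o, event, src):
    # Original structure: a mkdir that is not in fileEvents falls to the elif.
    if event == 'mkdir' and 'mkdir' in o.fileEvents:
        return (True, post1file(src, is_directory=True))
    elif o.create_modify:
        return (True, post1file(src))
    return (True, [])


def dispatch_fixed(o, event, src):
    # Proposed structure: mkdir is handled (or ignored) before the file branch.
    if event == 'mkdir':
        if 'mkdir' in o.fileEvents:
            return (True, post1file(src, is_directory=True))
        return (True, [])  # ignore mkdir if not in fileEvents
    elif o.create_modify:
        return (True, post1file(src))
    return (True, [])


o = Options(fileEvents={'create'}, create_modify=True)  # create_modify value is assumed
print(dispatch_original(o, 'mkdir', '/tmp/dir1'))  # (True, [('file', '/tmp/dir1')])
print(dispatch_fixed(o, 'mkdir', '/tmp/dir1'))     # (True, [])
```

With the stubbed options, the original structure posts the new directory through the file path, while the proposed structure returns an empty message list for it.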

@andreleblanc11
Member Author

That looks to have fixed it! mkdir events are still noticed in the logs by the watchdog. However, the new directories don't get posted downstream anymore 😄

@andreleblanc11 andreleblanc11 changed the title Watch is noticing events that aren't specified in a configuration Watch is posting events that aren't specified in a configuration May 31, 2024
@petersilva
Contributor

OK make a PR?
