Only global tasks are started and necessary ES indexes are missing #585
I had a similar issue with sirmordred, whereby it would complete one cycle of collection and enrichment but then not start another; it would only perform the sortinghat tasks in a loop. I would be interested in what the solution is here.
I figured out the problem, basically a synchronization issue in sirmordred. debug log
The interesting parts are:
At first, the `__execute_initial_load` function tries to collect the projects from the configured projects file.
I added an extra log there. In `task_manager.py` the loop only runs while the stopper is not set, as you can see from the logs.
There's a 3-second delay between the two log entries. In a limited environment, as we created in ECS, this time-based synchronization will not work. My quick and dirty solution was to change the loop in `task_manager.py`.

So from this:

```python
while not self.stopper.is_set():
    # we give 1 extra second to the stopper, so this loop does
    # not finish before it is set.
    time.sleep(1)

    for task in self.tasks:
        logger.debug('[%s] Tasks started: %s', self.backend_section, task)
        try:
            task.execute()
        except Exception as ex:
            logger.error("[%s] Exception in Task Manager %s", self.backend_section, ex, exc_info=True)
            TasksManager.COMM_QUEUE.put(sys.exc_info())
            raise
        logger.debug('[%s] Tasks finished: %s', self.backend_section, task)

    timer = self.__get_timer(self.backend_section)
    if timer > 0 and self.config.get_conf()['general']['update']:
        logger.info("[%s] sleeping for %s seconds ", self.backend_section, timer)
        time.sleep(timer)
```

To this:

```python
while True:
    # we give 1 extra second to the stopper, so this loop does
    # not finish before it is set.
    time.sleep(1)

    for task in self.tasks:
        logger.debug('[%s] Tasks started: %s', self.backend_section, task)
        try:
            task.execute()
        except Exception as ex:
            logger.error("[%s] Exception in Task Manager %s", self.backend_section, ex, exc_info=True)
            TasksManager.COMM_QUEUE.put(sys.exc_info())
            raise
        logger.debug('[%s] Tasks finished: %s', self.backend_section, task)

    timer = self.__get_timer(self.backend_section)
    if timer > 0 and self.config.get_conf()['general']['update']:
        logger.info("[%s] sleeping for %s seconds ", self.backend_section, timer)
        time.sleep(timer)

    if self.stopper.is_set():
        break
```

Now all the backend tasks are started too because the projects are collected correctly.
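A minimal, self-contained sketch (illustrative names, not sirmordred code) of why the original loop shape can skip the tasks entirely. If the stopper is set before the loop's first check, here forced by setting it up front, the `while not stopper.is_set()` variant never executes its body, while the `while True` variant always runs at least once:

```python
import threading

def run_tasks_original(stopper, executed):
    # Original shape: the stopper is checked *before* the first iteration,
    # so if it is already set, the task body never runs.
    while not stopper.is_set():
        executed.append("tasks ran")

def run_tasks_fixed(stopper, executed):
    # Fixed shape: the body always runs at least once; the check moves to the end.
    while True:
        executed.append("tasks ran")
        if stopper.is_set():
            break

for target in (run_tasks_original, run_tasks_fixed):
    stopper = threading.Event()
    executed = []
    stopper.set()  # force the worst case: stopper set before the loop starts
    worker = threading.Thread(target=target, args=(stopper, executed))
    worker.start()
    worker.join()
    print(target.__name__, executed)
# Output:
#   run_tasks_original []             <- tasks skipped entirely
#   run_tasks_fixed ['tasks ran']     <- tasks ran at least once
```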
I never liked how this was implemented. It's a bit messy the way threads and tasks are handled, so it's prone to bugs like the one you are mentioning. @ncsibra-lab49 I think your code fixes this problem, but I'm not sure if it could lead to other race conditions. Any insight about it?
I don't think it leads to a race condition, because the shared objects (backend_tasks, global_tasks, stopper, etc.) are not modified in the child or parent thread. The real solution is to avoid time-based synchronization. The other option is to control the number of runs in the parent process, so the loop is removed from the task manager entirely.
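For what it's worth, a rough sketch of that second option (hypothetical structure and names, not sirmordred's actual API): the child runs exactly one round of tasks per explicit request, and the parent decides how many rounds happen, so no sleep-based handshake is involved:

```python
import threading

class TaskWorker(threading.Thread):
    """Runs one round of tasks per request; the parent controls the number of runs."""

    def __init__(self, tasks):
        super().__init__()
        self.tasks = tasks
        self.run_once = threading.Event()    # parent -> child: "do one round"
        self.round_done = threading.Event()  # child -> parent: "round finished"
        self.stopper = threading.Event()

    def run(self):
        while not self.stopper.is_set():
            # Wait briefly for a request instead of sleeping a fixed time.
            if not self.run_once.wait(timeout=0.1):
                continue
            self.run_once.clear()
            for task in self.tasks:
                task()
            self.round_done.set()

worker = TaskWorker([lambda: print("task executed")])
worker.start()
for _ in range(3):               # the parent decides: exactly three rounds
    worker.round_done.clear()
    worker.run_once.set()
    worker.round_done.wait()     # block until the child reports completion
worker.stopper.set()
worker.join()
```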
When mordred starts, it loads the dashboards to Kibana and next, it reads the list of projects to analyze. Due to a race condition, the thread that reads the list of projects could not even start, so data backends won't have any input to work with. This commit fixes the problem by adding isolated code that will run the initial tasks. Once these tasks are finished, the main loop will start the task manager.

Fixes chaoss/grimoirelab#585

Signed-off-by: Santiago Dueñas <[email protected]>
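In outline, the approach the commit describes might look like this (a sketch with made-up names, not the actual sirmordred code): the initial tasks run synchronously and in isolation, and only afterwards are the per-backend task-manager threads started:

```python
import threading
import time

class Task:
    def __init__(self, name):
        self.name = name

    def execute(self):
        print(f"executing: {self.name}")

def run_backend_loop(task, stopper):
    # The regular task-manager loop, started only after the initial load.
    while True:
        task.execute()
        if stopper.wait(timeout=1):  # sleep between rounds, wake early on stop
            break

# 1. Isolated initial load: runs synchronously, so the list of projects is
#    guaranteed to exist before any backend thread starts.
for initial_task in (Task("load panels"), Task("read projects.json")):
    initial_task.execute()

# 2. Only now start the backend task managers.
stopper = threading.Event()
threads = [threading.Thread(target=run_backend_loop, args=(Task(name), stopper))
           for name in ("git", "github")]
for thread in threads:
    thread.start()

time.sleep(2)  # let the backends run a couple of rounds
stopper.set()
for thread in threads:
    thread.join()
```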
This issue is fixed now. The new release of GrimoireLab will include the fix.
We deployed GrimoireLab components to AWS ECS, mostly based on the docker-compose example using the latest image versions.
Sortinghat, sortinghat-worker, sirmordred and nginx are separate services.
They're using managed AWS resources: MariaDB 10.6.5, Elasticsearch 6.8, and Redis 7.0.
Config files (`${...}` parts are replaced from env at container start): setup.cfg
projects.json structure
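For reference, a minimal sketch of the usual GrimoireLab projects.json layout (illustrative project and repository names, not the actual file; the real file maps each project to per-backend lists of repositories):

```json
{
    "grimoirelab": {
        "git": [
            "https://github.com/chaoss/grimoirelab.git"
        ],
        "github": [
            "https://github.com/chaoss/grimoirelab"
        ]
    }
}
```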
The real projects.json contains around 30,000 lines, roughly ~4,000 repos in every category.

At first, it worked correctly: it collected 8,533,696 documents in Elasticsearch and created these indexes/aliases:
indices
aliases
The first problem is that not all the indexes defined by the setup.cfg are created, e.g. github_enriched, github-pull_enriched, github_event_enriched, etc. This could be because it wasn't able to finish the whole enrichment process.
The bigger issue is that now it does not start any of the git/github tasks, only the global tasks:
all.log
We enabled debug logging too, but it does not show any kind of error or other reason why the git/github etc. tasks are not running.
It looks like it knows that some backend tasks should be started, but it only starts the global tasks.
debug log snippet
We tried to restart the container multiple times, but it does not help.
Any idea what the issue is here?