Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove files after build #5680

Merged
merged 31 commits into from
Jun 7, 2019
Merged

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented May 9, 2019

Needs #5678

I need to put this under a feature flag and test locally. Tests are passing for now :)

Closes #5646

@stsewd stsewd added the PR: work in progress Pull request is not ready for full review label May 9, 2019
humitos
humitos previously requested changes May 9, 2019
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

We need to decide how to handle this on Corporate before merging and deploying this PR, though.

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
readthedocs/core/utils/__init__.py Outdated Show resolved Hide resolved
@stsewd
Copy link
Member Author

stsewd commented May 9, 2019

Notes:

But I think we can just don't cache anything since we are going to remove it

  • We face race conditions when executing more than one build in the same version.

This happens now, but isn't noticeable because we don't remove the whole checkout, now we do (and
all the build servers, not just the current one).

  • Independently if the build success or fails, I'm removing the files anyway.

I'm removing the files from all the build servers, if we don't want to be so aggressive, we can just remove the files from the current one.

@stsewd
Copy link
Member Author

stsewd commented May 9, 2019

Also, a general solution for the race conditions I think is having a way to query the build queue of the project, that way we can limit to build one version at the time (if the version is the same) and get rid of the lock when cloning a repo.

@humitos
Copy link
Member

humitos commented May 13, 2019

I'm removing the files from all the build servers, if we don't want to be so aggressive, we can just remove the files from the current one.

If two builds (same version) are triggered almost at the same time on different builders, the first one that ends will remove the files from the other one which is still building. Won't this be a problem?

@stsewd
Copy link
Member Author

stsewd commented May 13, 2019

Removing only from the same builder will have the same problems with race conditions #5680 (comment)

for dir_ in ('checkouts', 'envs', 'conda')
]
for del_dir in del_dirs:
broadcast(type='build', task=remove_dirs, args=[(del_dir,)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definitely shouldn't be a broadcast. We only do locking per-server, so doing this will definitely break things.

@stsewd
Copy link
Member Author

stsewd commented May 13, 2019

We decided to extend the lock, right now we release the lock after the clone step. Also, we can rely on the db for this. I'll see if this can fix in a separated PR, because it's useful for the current code too :)

@stsewd
Copy link
Member Author

stsewd commented May 14, 2019

Just a note for myself: We need to remove the checkout after we do the versions sync when a new branch is created. This is another task, separated from update_docs.

@stsewd
Copy link
Member Author

stsewd commented May 14, 2019

I'm not adding the clean_build task in the old webhooks because they are going to be removed #5479. I could add them if you want.

os.path.join(version.project.doc_path, dir_, version.slug)
for dir_ in ('checkouts', 'envs', 'conda')
]
# TODO: the max_lock_age can be lower than the default
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, default is fine. Because a build can take till x miinutes to timeout. If this task is triggered, it would start to delete everything and the build will fail, when maybe it only needed some more minutes.

Artifacts are copied to rtd_builds after build
@stsewd
Copy link
Member Author

stsewd commented May 14, 2019

So, I was able to reduce the percent of race conditions. The trick is keep using locks: one lock per task that modify files (build, clean build, sync versions).

Right now the biggest problem is that build_docs acquire 3 locks in the whole task. That lead to spaces in time where another task could take the lock and start modifying files that the other task expects to be as they were before releasing the lock.

I'm reducing the number of locks in build_docs to 2 in #5695. We can have only one after #5672

Till now, not changes are required in the corporate site.

@stsewd
Copy link
Member Author

stsewd commented May 14, 2019

Note for myself: I still need to add a lock in the syn versions step, but it would be more easy after #5695

@stsewd stsewd requested a review from a team May 14, 2019 23:43
args=(version.pk,),
immutable=True,
),
clean_build_task.signature(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a doubt here.

I understand that when we receive a webhook this task is called to sync the versions. So, if the default_version is triggered for a build as well, it could happen that this clean_build_task removes the files while the build is still building.

In other words, since calling sync_versions may trigger a build for stable version, I think that there is one case where this could break. Consider this scenario:

  1. the project default_version is set as stable
  2. we receive a webhook that calls sync_versions and it triggers a build for stable
  3. the build for the stable version starts in the same builder
  4. clean_build_task is called with stable version because it was the default_version of the project
  5. files for stable version are removed from the builder while it was building

I understand that if we merge #5695 as well (lock the whole build), step number 5) won't happen because clean_build_task tries to acquire the lock at this line: https://github.com/rtfd/readthedocs.org/pull/5680/files/0a6445edb9621256f28f673e1a9ae8d4a3831704..b613bbb98f99fb507fbe84826d8fa90de2db4358#diff-b9399e1d3499066c5564f98a620e8881R1434

Please, double check this and confirm if I'm right or wrong. Although, it seems to be a very improbable case :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we should only be cleaning up the files for the specific version we're building?

This is another vote for single build per builder in my mind. We're wasting a lot of time worrying about this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm acquiring a lock for the clean build task in

https://github.com/rtfd/readthedocs.org/pull/5680/files#diff-b9399e1d3499066c5564f98a620e8881R1434

And skipping the deletion if there is another task with a lock. That avoids the problem of deleting stuff if we have another task expecting files to be there.

I only need to update the code acquire a lock in the sync_versions task after #5695, that way clean_build will skip the deletion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we should only be cleaning up the files for the specific version we're building?

We are doing that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And skipping the deletion if there is another task with a lock. That avoids the problem of deleting stuff if we have another task expecting files to be there.

Since we are not locking the whole build, this can still happen in between the locks, right?

@stsewd
Copy link
Member Author

stsewd commented May 22, 2019

  1. we are confident that clean_build_task is going to be executed in the same server where the sync_repository_task/update_docs_task was executed

Good question, honestly I'm not sure. I should take a look the rest of tasks how they handle that or see the celery docs.

  1. we are aware that because we are not locking the whole build, we clean_build_task could delete the files in between the build steps (while outside the lock)

Yes, but only in one case (between the clone step and the build step there is a release of the block for some seconds).

That is blocked because of #5672

We used to have like 3 lock releases here :)

  1. we are triggering an extra task (clean_build_task) on all the builds which will be a no-op task sometimes (this will be always true for corporate site) and I'm not sure if that will have any impact

I don't understand this one. We are using a feature flag, so only projects with the flag will trigger the task.

step = UpdateDocsTaskStep(task=self)
return step.run(project_id, *args, **kwargs)
finally:
clean_build_task(kwargs.get('version_pk'))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to make sure we always have a version_pk we need #5765

@stsewd
Copy link
Member Author

stsewd commented Jun 4, 2019

So, I re implemented this, so the process is executed in the same server, and divide it in small PRs (#5765, #5748).

Now I'm using a try/finally block to clean the build files.

I need to test this more locally.

@ericholscher
Copy link
Member

@stsewd think merging the other PR for locks caused some conflicts here. Sorry about that.

@stsewd
Copy link
Member Author

stsewd commented Jun 5, 2019

@ericholscher that was expected actually :)

@stsewd
Copy link
Member Author

stsewd commented Jun 5, 2019

Fix for tests on master is in #5162

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks way simpler that what we had before :)

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much better implementation. ❤️ I'm 👍 to ship it, especially since it's behind a feature flag.

readthedocs/projects/tasks.py Outdated Show resolved Hide resolved
@stsewd stsewd merged commit e67937e into readthedocs:master Jun 7, 2019
@stsewd stsewd deleted the remove-files-after-build branch June 7, 2019 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove build artifacts and repo storage after build
3 participants