
Conversation

@MisterDA (Contributor)

The first step is to remove the dup of the log file descriptor used for log tailing, so that when/if we want to move the log file there are no open file descriptors to it. This can be done by switching to `Readonly when the log is finished.
The second step is to let the store decide the location of the log file. There's a bit of a chicken-and-egg problem here: the store was responsible for moving the log file, but the Db_store module creates it and sets its location, and the store doesn't have access to that location. By letting the store set the log file location, it can place the log either somewhere independent of the result directories, or in the original location and move it later if that causes no problem.

cc @patricoferris

- don't duplicate the log file descriptor; instead, switch to reading
  the file when the log finishes;
- replace the `Finished state with a `Readonly state.
Checking the result for the Docker backend requires asynchronous calls
to Docker.
For the currently available stores, the original location is retained.
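
Roughly, the state change looks like this (a minimal sketch; the constructor payloads are assumptions based on the diff below, not the exact Build_log definition):

```ocaml
type state = [
  | `Open of string * Lwt_unix.file_descr * unit Lwt_condition.t
    (* log path, write fd, and a condition to wake tailing readers *)
  | `Readonly of string  (* finished log: only the path is kept, so no
                            open descriptor pins the file in place *)
  | `Empty               (* no log data *)
]
```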
@talex5 (Contributor) left a comment

I'm not quite sure how this works. I would expect that simply reopening the same log path (but read-only) would have the same problem.

I would expect get_build to:

  1. Create a new Open log file.
  2. Perform the build.
  3. Close the file and change the log state to Finalising.
  4. Snapshot the result.
  5. Set the log state to Readonly with the final location, and wake any readers waiting for the end of the Finalising state.

Or possibly the Finalising state could simply be Busy. That would also work for creating the log, where we have a similar problem (currently solved messily by using set_log to resolve an Lwt promise).
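
A sketch of that sequence, with every helper name hypothetical; the point is the ordering (close the fd before the snapshot, publish the final path only afterwards):

```ocaml
open Lwt.Infix

let get_build t ~id ~build ~snapshot =
  open_log t id >>= fun log ->                (* 1. create an `Open log    *)
  build log >>= fun result ->                 (* 2. perform the build      *)
  let ready, set_ready = Lwt.wait () in
  close_log log ~next:(`Finalising ready) >>= fun () ->
                                              (* 3. close fd, `Finalising  *)
  snapshot result >>= fun final_log_path ->   (* 4. snapshot the result    *)
  set_log_state log (`Readonly final_log_path);
                                              (* 5. publish final location *)
  Lwt.wakeup set_ready ();                    (*    and wake any readers   *)
  Lwt.return result
```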

dst (Bytes.sub_string buf 0 n);
aux (i + avail)
| `Readonly path -> readonly_tail path buf i
| _ -> Lwt_result.return ()
Contributor:

Suggested change:
- | _ -> Lwt_result.return ()
+ | `Empty -> Lwt_result.return ()

- | `Open (fd, cond) ->
-   t.state <- `Finished;
+ | `Open (path, fd, cond) ->
+   t.state <- `Readonly path;
Contributor:

I think this has the same problem: a reader will open the path in the temporary clone, preventing it from being removed (unless a read-only FD is OK somehow?).

Instead, I think this needs to transition to a new Finalising of unit Lwt.t state or similar, where the promise resolves once the log is ready for reading in its new location.
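
Something like the following, as a hypothetical sketch of the reader side (readonly_tail is from the diff above; live_tail stands in for the existing `Open handling):

```ocaml
open Lwt.Infix

type state = [
  | `Open of Lwt_unix.file_descr * unit Lwt_condition.t
  | `Finalising of unit Lwt.t  (* resolves once the log is in place *)
  | `Readonly of string        (* final location, safe to reopen    *)
  | `Empty
]

let rec tail t buf i =
  match t.state with
  | `Finalising ready ->
    ready >>= fun () ->
    tail t buf i                 (* state is now `Readonly; retry     *)
  | `Readonly path -> readonly_tail path buf i
  | `Open (fd, cond) -> live_tail fd cond buf i  (* unchanged *)
  | `Empty -> Lwt_result.return ()
```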

  (** [delete t id] removes [id] from the store, if present. *)

- val result : t -> id -> string option
+ val result : t -> id -> string option Lwt.t
Contributor:

Why does this need to be async? It seems useful to have an atomic way of finding out whether something exists in the store. Otherwise, how do we know the result is still valid by the time it has returned?

@MisterDA (Author):

With the Docker backend I'm making a call to Docker to check whether the result exists as a Docker image. I use functions from Os, which call Lwt_process.exec and are asynchronous. I could call Unix.create_process instead and wait for it to terminate, to fix the TOCTOU.
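
For reference, the check is roughly of this shape (illustrative only, not the repo's actual Os wrapper): an asynchronous docker image inspect, which is what forces result to return a promise:

```ocaml
open Lwt.Infix

let docker_image_exists id =
  (* Exit status 0 from `docker image inspect` means the image exists. *)
  Lwt_process.exec ~stdout:`Dev_null ~stderr:`Dev_null
    ("docker", [| "docker"; "image"; "inspect"; id |])
  >|= function
  | Unix.WEXITED 0 -> true
  | _ -> false
```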

Contributor:

Let's move that to a separate PR then. It doesn't have anything to do with the build log problem.

val result : t -> id -> string option Lwt.t
(** [result t id] is the path of the build result for [id], if present. *)

val log_file : t -> id -> string Lwt.t
Contributor:

Why do we want a different log location for each store?

@patricoferris (Contributor) commented Nov 24, 2021

Thanks @MisterDA :))

I'm not sure if this fixes the log problem I was describing before (the moving-log-files bit is useful for patricoferris#5, however). Having the log files outside of the build directory should in theory solve the problem; I'm just not sure whether there's a reason the logs are currently saved in the build directory.

The problem I was seeing is that Db_store.get_build calls Raw.build, which in this case was Zfs. If the build fails, it then calls

Zfs.destroy t ds `Only >>= fun () ->

but as far as I can tell the log file is still open at that point, so trying to destroy the dataset fails with a "resource busy" error (only on macOS...).

@MisterDA MisterDA marked this pull request as draft November 30, 2021 14:27
@MisterDA MisterDA closed this Jan 5, 2022
@MisterDA MisterDA deleted the build-log branch January 5, 2022 14:46
@MisterDA MisterDA mentioned this pull request Jan 5, 2022