I'd be interested in this as well, because DVC itself doesn't really have a good solution for it, and in my opinion it's the main user-experience problem with DVC (as opposed to a technical implementation or design concern).
Hi, @RomeoV and @gwerbin! The main challenge with diffs in Dud is that producing useful diffs is extremely context-dependent, especially for binary data. For example, diffing images requires a very different approach than diffing JSON files, Parquet files, GIS data, etc. Add directories of binary files to the mix and you have another variant to contend with: Which files in this directory were added? Which were changed (and how)? Which were deleted? Git can provide diffs because diffing text files can use a (mostly) universal algorithm.

FWIW, I've documented a diff feature in #111, but it would use a very basic algorithm, and it would only compare the currently checked-out artifacts (files/directories) with their checksums in their stage file(s).

Another challenge is that Dud relies entirely on Git (or another SCM tool) for retrieving historical versions. (To get a past version of a stage, you first check out a past version of the stage file from Git, then check out the stage with Dud.) So for Dud to provide a high-level diff command, it would have to call out to Git, and I'm reluctant to do that. Dud is purposefully decoupled from SCM tools to keep things simple, both in implementation and in user experience. Once we start tangling Dud and Git, the simplicity of Dud's UX suffers greatly.

All that said, I'd be happy to hear any ideas for how Dud could provide a diff feature that would simplify your workflows! Please use #111 for further discussion.
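To illustrate the directory case above, here is a minimal sketch of the "which files were added, changed, or deleted" part of a directory diff. It assumes you can already obtain, for each version of a directory, a map from relative file path to checksum; `diff_manifests` is a hypothetical name, not part of Dud, and this is not how #111 proposes to implement it. Note that it only answers *which* files differ, not *how*: that part stays format-specific, as described above.

```python
from typing import Dict, List, Tuple


def diff_manifests(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two {relative_path: checksum} maps of the same directory.

    Hypothetical helper. Returns (added, changed, deleted), each sorted by path.
    """
    # Present only in the new version.
    added = sorted(p for p in new if p not in old)
    # Present only in the old version.
    deleted = sorted(p for p in old if p not in new)
    # Present in both versions, but with different checksums.
    changed = sorted(p for p in new if p in old and old[p] != new[p])
    return added, changed, deleted


old = {"a.png": "111", "b.png": "222", "c.png": "333"}
new = {"a.png": "111", "b.png": "999", "d.png": "444"}
print(diff_manifests(old, new))
# → (['d.png'], ['b.png'], ['c.png'])
```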
Hello, thanks for this tool!
Is there a way to load multiple versions of the same file for some kind of comparison? Say I want to compute a statistic for two versions of the data and plot the comparison. Right now, I guess the way to do this would be to check out one "revision", compute and store the statistics, then check out another "revision", and so on.
Do you have any thoughts on whether "diffing" even makes sense in this context, and whether it belongs in the scope of this project?
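The check-out-then-compute loop described above could be scripted roughly as follows. This is a sketch under assumptions: `checkout_revision` follows the steps described in the reply (restore a past stage file from Git, then check out the stage with Dud), and it assumes `dud checkout <stage_file>` is the right invocation for your project. Both function names are hypothetical. The checkout step is passed in as a callable so the comparison logic can be exercised without Git or Dud.

```python
import subprocess
from typing import Callable, Dict, List


def checkout_revision(rev: str, stage_file: str) -> None:
    """Hypothetical helper: restore `stage_file` as of Git revision `rev`,
    then ask Dud to materialize the data it points to (assumed command)."""
    subprocess.run(["git", "checkout", rev, "--", stage_file], check=True)
    subprocess.run(["dud", "checkout", stage_file], check=True)


def compare_revisions(
    revs: List[str],
    checkout: Callable[[str], None],
    compute: Callable[[], float],
) -> Dict[str, float]:
    """Check out each revision in turn and record one statistic per revision."""
    results: Dict[str, float] = {}
    for rev in revs:
        checkout(rev)
        # Compute (and keep) the statistic before the next checkout
        # overwrites the working copy of the data.
        results[rev] = compute()
    return results
```

For example, `compare_revisions(["v1", "v2"], lambda r: checkout_revision(r, "data.yaml"), my_stat)` would return one value per revision, ready to plot; `data.yaml` and `my_stat` are placeholders. Afterwards, you would likely want to restore the current stage file with Git and check it out again with Dud, since the loop leaves the last revision's data in place.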