Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate files #158

Closed
jyn514 opened this issue Feb 13, 2020 · 4 comments
Closed

Deduplicate files #158

jyn514 opened this issue Feb 13, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@jyn514
Copy link
Collaborator

jyn514 commented Feb 13, 2020

Currently, calling Files.add(name, file) twice in a row will duplicate the entire file, wasting memory. It would be nice to deduplicate by only storing the file once. This could be done by storing a HashMap<name, FileID> which returns the id if the file already exists.

This would mean that users can no longer add two different files with the same name, but I think that's fine as long as it's documented.

@jyn514 jyn514 added the enhancement New feature or request label Feb 13, 2020
@brendanzab
Copy link
Owner

I think the original intention was for users to keep track of the FileId themselves, and use that to refer to and update files, but I understand that might be a bit confusing for people. Yet another reason why I think that maintaining our own file database is a bad idea… I think we should support single file reporting easily out of the box, and have an example of how to extend this to work with multiple files, but that's it. That way people can customise things to their use case 🤔

@jyn514
Copy link
Collaborator Author

jyn514 commented Feb 19, 2020

Well, I could be including the same file more than once, if this wasn't part of codespan I'd have to implement it myself. e.g.

main.c:

#include <stdio.h>
#include <limits.h>

stdio.h: #include <limits.h>

@icsaszar
Copy link

Now that the Files trait exists as of #164 users should be implement their own solutions and decide if they need deduplication or not, but I agree that supporting single file use cases would be very nice (especially for codespan-lsp since there a simple Vec for a file database wouldn't really cut it).

A bit unrelated, but I think it would be better if it was in the codespan crate, so if I only use codespan and codespan-lsp I don't need codespan-reporting. I'm working on a language server with tree-sitter and I need a way to convert points to byte positions for Tree::edit().

@brendanzab
Copy link
Owner

I'm going to say that this should be handled in a custom implementation of codespan_reporting::Files. I've tried to make implementing these relatively painless, but if you think I can do better, let me know!

For users of codespan, you can use the Files::update function to update files in-place. You will have to keep a hold of the FileId though - the file name is used purely for display purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants