Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Mostly) Bug Fixes for writing Strings #35

Merged
merged 6 commits into from
Jun 18, 2024
Merged

(Mostly) Bug Fixes for writing Strings #35

merged 6 commits into from
Jun 18, 2024

Conversation

joshday
Copy link
Collaborator

@joshday joshday commented Jun 14, 2024

The main thing here is fixing some string write bugs (places where length should've been ncodeunits).

There's a few other minor things such as:

  • Support for writing Char.
  • Strings are now padded by '\0' instead of ' '. I think this is the "correct" way but now I can't actually find the resource I used that said this. I'm fine to either keep or drop this change.
  • Changed tests to not depend on a.dbf file in the test directory. Instead we write a DataFrame to dbf so that we know right away what the data is supposed to look like.
  • DBFTables.write(path::String, table) now returns the path instead of bytes written.

@visr
Copy link
Collaborator

visr commented Jun 18, 2024

Strings are now padded by '\0' instead of ' '. I think this is the "correct" way but now I can't actually find the resource I used that said this.

I checked the references in the README: https://github.com/JuliaData/DBFTables.jl?tab=readme-ov-file#format-description-resources.
In https://www.independent-software.com/dbase-dbf-dbt-file-format.html it mentions that data is padded with spaces, but field names are padded with NULL bytes. So I think the current padding is correct. Did you run into any related issues with this?

I'm in favor of throwing an error instead of warning and truncating strings that are too long, but technically this is a breaking change. If you think it's harmless, we should at least clearly communicate the behavior change in this PR.

Otherwise, this looks good to me, nice fixes and test improvements.

@joshday
Copy link
Collaborator Author

joshday commented Jun 18, 2024

In https://www.independent-software.com/dbase-dbf-dbt-file-format.html it mentions that data is padded with spaces, but field names are padded with NULL bytes. So I think the current padding is correct. Did you run into any related issues with this?

No issues, but I may have been skimming over resources too quickly and confused the field name padding with the string padding. I did run into some data in the wild that was padded with '\0', but that's easy to handle on read. I've reverted this.

I'm in favor of throwing an error instead of warning and truncating strings that are too long, but technically this is a breaking change. If you think it's harmless, we should at least clearly communicate the behavior change in this PR.

I think it's harmless/the most correct thing to do, but I'm going to add a "DBF Quirks" section to the README since this isn't the only weird thing users might want to be aware of.

@joshday joshday merged commit e5d4c37 into main Jun 18, 2024
3 checks passed
@joshday joshday deleted the jd/stringfixes branch June 18, 2024 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants