Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Characters "\N" stripped from table data exported "Tab Delimited" #251

Closed
A9G-Data-Droid opened this issue Jul 15, 2021 · 7 comments
Closed

Comments

@A9G-Data-Droid
Copy link
Contributor

I have been using table export "Tab Delimited" to keep track of some front end settings I like to ship with my front ends.

I noticed in the most recent test with version 3.4.13 that my data got mangled. Specifically I have a field value that contains a path which includes "\N". These two characters were stripped from the restored table data.

It looks like this is part of ExportTableDataAsTDF
MultiReplace(Nz(fld.Value), "\", "\\", vbCrLf, "\r\n", vbCr, "\r", vbLf, "\n", vbTab, "\t")

and ImportTableDataTDF
MultiReplace(CStr(varLine(intCol)), "\\", "\", "\r\n", vbCrLf, "\r", vbCr, "\n", vbLf, "\t", vbTab)

EXAMPLE:
This
\\PathThatContains\Nwillbetruncated\example.txt

Would become this
\\PathThatContainswillbetruncated\example.txt

@joyfullservice
Copy link
Owner

Hmm... That's a tricky problem to solve. The \n code is very likely to be included in legitimate content, as you found in your settings. Of course you could export to XML instead, but it doesn't have the simple readability of a CSV. Any thoughts based on this discussion?

@A9G-Data-Droid
Copy link
Contributor Author

The solution involves escaping each symbol that is going to be modified by the replace, before export, and then de-escaping after import. I know that this replace code is itself an attempt to escape tab and EOL markers. Now you have to escape your escaped output because what you are outputting is valid text that could exist in the value string.

So if the field value contains "\r\n", "\r", "\n", or "\t" those need to be escaped. To avoid an endless loop of escaping we need to use an invalid character as the escape character so that a user can't accidentally use an escape character in their table values.

According to Microsoft Access Data Types

"CHAR, LONGVARCHAR, and VARCHAR | A character string literal can contain any ANSI character (1-255 decimal)"

The only thing I can think of is to use a UNICODE character as the escape char but we know that could get us in to trouble with internationalization.

@hecon5
Copy link
Contributor

hecon5 commented Jul 15, 2021

So, what if we use an emoji for fire exit? 👨‍🚒 or give up a d use the white flag of surrender to give in to escapes?🏳

@A9G-Data-Droid
Copy link
Contributor Author

Or we could choose an unlikely candidate like an unprintable character. If someone is putting unprintable characters in to their database I would wonder why.

@joyfullservice
Copy link
Owner

joyfullservice commented Jul 15, 2021

We could also wrap them in curly braces like this: My text {\n} on the next line. It would be pretty obvious in the code, and much less likely to collide with legitimate content. The drawback is that it is non-standard...

joyfullservice added a commit that referenced this issue Jul 15, 2021
Using curly braces helps to avoid issues where legitimate data may include `\n`, such as in the case of a path name. See #251
@joyfullservice
Copy link
Owner

Unless anyone else has a better suggestion, the curly braces should solve the issue of avoiding the data collisions while still being readable in the source file.

@joyfullservice joyfullservice added the pending resolved Possibly resolved, needs testing or confirmation label Jul 15, 2021
joyfullservice added a commit that referenced this issue Jul 16, 2021
Using interim substitution character (Chr(26)) to ensure that escaped codes do not collide with existing data. Fixes #251
@joyfullservice
Copy link
Owner

Thinking about this some more this morning, it occurred to me that we can just use an interim substitute character for the backslash in the sequence of replacements to completely resolve this issue. A slash in the original content will be replaced with a double-slash, and when restoring, the double-slash will be restored back to a single slash. Because we are using the interim replacement of Chr$(26) (ASCII substitution character), it will no longer collide with existing data. (Unless someone decided to use the substitution character in the content, which seems unlikely.)

    FormatStringForTDF = MultiReplace(strValue, _
        "\", Chr$(26), _
        vbCrLf, "\r\n", _
        vbCr, "\r", _
        vbLf, "\n", _
        vbTab, "\t", _
        Chr$(26), "\\")

    FormatStringFromTDF = MultiReplace(strTDFValue, _
        "\\", Chr$(26), _
        "\r\n", vbCrLf, _
        "\r", vbCr, _
        "\n", vbLf, _
        "\t", vbTab, _
        Chr$(26), "\")

This achieves the ideal solution of human-readable standard, recognizable codes in the text, but not causing problems if you happen to use a path like c:\my\new\folder where \n is already embedded in the string. It will look like this in TDF: c:\\my\\new\\folder and restore to the original value when loading back into the table.

I am pretty confident that this resolves the issue, but feel free to reopen if you encounter any problems!

@joyfullservice joyfullservice removed the pending resolved Possibly resolved, needs testing or confirmation label Jul 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants