Test ascii #187

A9G-Data-Droid · 2021-03-03T23:28:01Z

I fleshed out the basExtendedChars.bas based on discussion in issue #186

This adds all the ASCII characters to the test DB to detect export failures Built and exported with the latest version 3.3.1

- Closes joyfullservice#186

joyfullservice · 2021-03-04T21:50:52Z

Could you adjust this PR to target the dev branch? That will give us a change to test and tweak things before rolling it over to master. Thanks!

joyfullservice · 2021-03-04T22:06:38Z

Nice! It allowed me to change the base branch on the PR... On to some testing...

joyfullservice · 2021-03-04T22:20:00Z

Good work! One note is that we need to include the UTF-8 BOM anytime the content of a file contains Unicode characters. For example, in the testing database, frmMain.bas includes Unicode checkmark characters, and needs the BOM to display correctly in some VCS programs (i.e. SourceTree)

A9G-Data-Droid · 2021-03-04T22:29:36Z

Good news. I am always writing the BOM now. I would rather have 1 format instead of guessing on both export and import. This is the default when you tell a stream to be UTF-8 and so let's keep that convention.

A9G-Data-Droid · 2021-03-04T23:00:35Z

I hate calling external DLLs from VBA because it looks obtuse. I have replaced the WideChar calls with a native function that is much smaller and easier to read. I also added a unit test to show that they work.

A9G-Data-Droid · 2021-03-05T00:05:25Z

So I did an export of my main project with the new version and I'm seeing almost 5 seconds off total export time. That's anecdotal and not a tightly controlled test but it's promising.

Version 3.3.1

Done. (30.41 seconds)

Version 3.3.3

Done. (25.03 seconds)

joyfullservice · 2021-03-05T14:43:34Z

Good news. I am always writing the BOM now. I would rather have 1 format instead of guessing on both export and import. This is the default when you tell a stream to be UTF-8 and so let's keep that convention.

You have some very valid considerations here... Up to this point I was taking the approach that the UTF-8 BOM should only be used where it was actually needed, where the file contained Unicode or Extended characters. While technically not required, the BOM does help the display in certain programs such as diffing tools.

My reasoning for selectively using the BOM was:

It is technically not required, per the Unicode spec.
It adds additional characters to the beginning of the file that may be confusing to the user, especially in the case of simple text files with otherwise understandable content.
My research seemed to point to avoiding the use of the BOM except for situations where it was truly needed.

However, based on your suggestion, and other conversations in #154, I can see some benefits to always using the UTF-8 BOM (as a signature) in the files used in this project.

It gives a single, consistent output format for all source files, as discussed in Inconsistent character encoding in exported files #154.
It is the default output when using the stream object to save UTF-8 content, keeping things simple.
It eliminates the complexity of needing to check for UTF-8 characters when saving files.
It helps in situations where Unicode content may be inserted into an existing source file (i.e. resolving source file conflicts) that might not otherwise have the BOM.
The vast majority of software used with version control is already designed to handle the BOM.

Based on this, and the feedback I have seen from others using the add-in, I am willing to go ahead and make this change to always include the BOM in output files. I don't do this lightly, knowing that this has the impact of changing a large number of export files in every project going forward, but it does seem like this is the best long-term direction to take with the UTF-8 standardization in this project.

hecon5 · 2021-03-05T15:09:40Z

Testing/Testing.accdb.src/reports/rptNavigationPaneGroups.bas

@@ -74,7 +74,7 @@ Begin Report
        Begin FormHeader
            KeepTogether = NotDefault
            Height =960
-            BackColor =15849926
+            BackColor =15064278


@A9G-Data-Droid, @joyfullservice: This is what I was experiencing on #183, have you noticed this? Any clues as to what the heck is happening?

Does this have to do with exporting in different versions of Access? I notice this when comparing exports from Access 2010 and newer versions... I haven't actually dug into comparing the color values.

I'm not sure! I noticed it when me and another dev were exporting the same file (export-import-export) on the same machine will sometimes do this, from their dev machine export - mine import - mine export will almost always cause this.
We verified same OS and Office Version, as well; so this makes it more fun.

Note that we use themes; I'm wondering if this might have something to do with it; but I notice it also when I export the Version Control as well, too.

This happens to me every time. Themes might be a clue. Most colors are not defined as a concrete number. Instead they are references to the system colors. Things like "Text Dark", "Text Light", "Background Dark Header" and "Background Light Header" come from Windows system settings and do not correspond to the same RGB value on every computer.

Oooh, I didn't ask the other dev if they're using the same Windows Theme; I'll check with them. I could also export, change my theme, import, export.

A9G-Data-Droid · 2021-03-05T16:34:47Z

Testing/Testing.accdb.src/vbe-project.json

  },
  "Items": {
    "Name": "VCS Testing",
    "Description": "For automated testing of Version Control",
    "FileName": "rel:Testing.accdb",
-    "HelpFile": "",
+    "HelpFile": "99672516",


@hecon5 this is even more mysterious than the colors changing. I get random numbers in this HelpFile property every time I export. I never change this value manually. It should be a path. I'm not sure where this number is being introduced.

You can see it from the VBE by going to the Project Properties:

SAME! It has repeatedly boggled me; I've simply reverted it and moved on with my life, but yeah; no idea why it does this. I'll even occasionally get kanji characters there, but cannot for the life of me figure it out.

- Don't read the full file just to look at the first 3 characters.

- remove unused code

hecon5 · 2021-03-05T18:57:52Z

Version Control.accda.src/modules/modEncoding.bas

    Perf.OperationStart "Unicode Check"
-    Set reg = New VBScript_RegExp_55.RegExp
-    With reg
+    With New VBScript_RegExp_55.RegExp


Because the regex is already included as a reference, you can also call this from With New RegExp; though this does guarantee you get the flavor you wanted, it may make it harder to move to the next one (lol, VBA's never getting updated) if it ever is.

Agreed. I've never seen this explicit version call out. I always use just RegExp.

There are some functionality differences between versions 1.0 and 5.5 (such as lazy quantifiers) but since this is the only one referenced in the project, I would think we are probably safe to just use RegExp.

- Don't manually add a BOM because it can result in two

A9G-Data-Droid · 2021-03-05T19:15:16Z

I apologize for changing all the files in the test database in this PR. I know it gets ugly. I had created a bug and then fixed it. This proves that the bug is fixed because we can export and import against this set of test DB files and it works.

A9G-Data-Droid · 2021-03-05T19:17:29Z

The bug was that if you manually insert the BOM during WriteBinaryFile you can end up with JSON files that have 2 BOMs. Then the JSON detection fails because it's looking for a { bracket and it sees the 2nd BOM and goes BOOM.

A9G-Data-Droid · 2021-03-05T19:21:56Z

I think I'm finally done messing with this PR. I performed a Full Export on my main project and the index.json shows that all the ExportDate changed and the Hash all stayed the same. This says to me this current working version is capable of exporting all my stuff the same way that it did before! All in 25 seconds.

Testing/Testing.accdb.src/modules/basExtendedChars.bas

joyfullservice · 2021-03-05T19:39:07Z

I think I'm finally done messing with this PR. I performed a Full Export on my main project and the index.json shows that all the ExportDate changed and the Hash all stayed the same. This says to me this current working version is capable of exporting all my stuff the same way that it did before! All in 25 seconds.

Excellent! Let me run it on a couple of my projects, including an ADP project, to make sure it is working fine on my end too. I am also curious to compare the performance logs after the change.

We need to find another source for sample extended characters.

joyfullservice · 2021-03-05T20:45:24Z

I did some performance testing to compare this PR with the dev branch using a couple large projects. Exports are pretty similar in performance, but building from source took a pretty drastic performance hit with Unicode file conversion going from less than a second to over 36 seconds on a large project.

My hunch is that this is due to the way we are reading and writing the files, but I am going to do some testing to see if we can bring this back down to close to what we had before.

A9G-Data-Droid · 2021-03-05T21:01:40Z

Those blocks labeled "Unicode Conversion" are areas I think that we could switch to happening line at a time instead of reading it all in to memory, performing some operation on the block, and then writing it out. It could all be done automatically by the stream like I do for the ReEncoder. I know that you have seen good performance improvement with the 128k blocks but I think that was the cause of the original corruption.

…s-vcs-integration into testASCII

A9G-Data-Droid · 2021-03-05T21:33:08Z

Ok, that turned out to be a simple switch.

My time is reduced further:
Done. (23.64 seconds)

A9G-Data-Droid · 2021-03-05T21:34:55Z

If nothing else, you now have a single place to optimize and it will affect 4 conversions that used to be separate. (This is what I get for saying I was done)

joyfullservice · 2021-03-05T21:38:04Z

If nothing else, you now have a single place to optimize and it will affect 4 conversions that used to be separate. (This is what I get for saying I was done)

Sounds good. I am just about finished with the optimizing... I think I will probably need to resolve some source conflicts when I get through, based on your latest commits... 😄 But hey, that's what this project is for, to allow us to resolve conflicts at the source level, right! 😄

joyfullservice · 2021-03-05T21:50:48Z

Nailed it! Getting performance similar or even slightly faster than the original version. Doing a few more cleanup tweaks...

This allows you to easily rename log files for comparison testing.

Since we are no longer doing anything fancy that requires these to be in `modEncoding`, I have returned them to `modFileAccess` where they fit better with the purpose of the module.

Adjustments to encoding functions to utilize stream approach.

Functions *only* used for testing should be in the unit testing module, not alongside production code.

This is used in several places.

Logging a few more functions, and expanding performance reports table.

joyfullservice · 2021-03-05T22:55:51Z

Okay, @A9G-Data-Droid - Can you try this out on your system? Let me know if you encounter any issues with the chunked reading this time.

A9G-Data-Droid

These used to be called elsewhere. If they are no longer used it should all be removed.

joyfullservice · 2021-03-05T23:10:28Z

These used to be called elsewhere. If they are no longer used it should all be removed.

Are you referring to the test functions I moved in 51f0b59? If so, I am happy to remove them. I just didn't want to make the assumption and delete your work. 😄

A9G-Data-Droid · 2021-03-05T23:11:32Z

Yes, I have verified that they aren't being used now that we have centralized all the encoding.

A9G-Data-Droid · 2021-03-05T23:13:29Z

Why is ConvertUcs2Utf8 the odd one out that doesn't match it's complimentary ConvertUtf8Ucs2 ?

Verified the output from the testing database, updating a few files that have changed. (Other than color values.)

joyfullservice · 2021-03-05T23:20:33Z

Why is ConvertUcs2Utf8 the odd one out that doesn't match it's complimentary ConvertUtf8Ucs2 ?

That one involves some special handling for mixed UTF-16 content where the extended/Unicode characters are encoded, but the rest of the file isn't. (I see this with ADP projects.) If there is a way to do that with the other ReEncode function, we can simplify this function as well. (But this can be an additional update after merging this PR.)

A9G-Data-Droid and others added 5 commits March 3, 2021 15:25

Add all ASCII characters

3fb96f7

This adds all the ASCII characters to the test DB to detect export failures Built and exported with the latest version 3.3.1

Reproduced joyfullservice#186 with a longer line

dece32e

New StringHasUnicode closes joyfullservice#186

13cfd5e

Export with BOM always false

aa1afe3

New Encoding method

9aea695

- Closes joyfullservice#186

joyfullservice changed the base branch from master to dev March 4, 2021 22:05

Replace DLL calls with native functions

4e8b977

Replace DLL calls with native functions

fa42593

hecon5 reviewed Mar 5, 2021

View reviewed changes

A9G-Data-Droid commented Mar 5, 2021

View reviewed changes

A9G-Data-Droid added 2 commits March 5, 2021 09:32

Always write UTF8 with BOM

b7f73d1

- Don't read the full file just to look at the first 3 characters.

Make FileHasBom look more like the original

7c53791

- remove unused code

hecon5 reviewed Mar 5, 2021

View reviewed changes

A9G-Data-Droid added 2 commits March 5, 2021 11:10

Use ReadFile and GetFileBytes

a473eb5

- Don't manually add a BOM because it can result in two

Export using latest version

d5f4c19

joyfullservice reviewed Mar 5, 2021

View reviewed changes

Testing/Testing.accdb.src/modules/basExtendedChars.bas Outdated Show resolved Hide resolved

Remove copyrighted content

043cc68

We need to find another source for sample extended characters.

A9G-Data-Droid added 3 commits March 5, 2021 13:21

ReUse existing converters

a3b8a28

Oops, use Unicode instead of autodetect

1bd7460

Merge branch 'testASCII' of https://github.com/A9G-Data-Droid/msacces…

9c8b5ac

…s-vcs-integration into testASCII

joyfullservice added 6 commits March 5, 2021 16:41

Ignore all log files

09e610e

This allows you to easily rename log files for comparison testing.

Move file reading and writing functions to correct module

247f3b4

Since we are no longer doing anything fancy that requires these to be in `modEncoding`, I have returned them to `modFileAccess` where they fit better with the purpose of the module.

Retool encoding functions

abc6c7c

Adjustments to encoding functions to utilize stream approach.

Remove testing code

51f0b59

Functions *only* used for testing should be in the unit testing module, not alongside production code.

Add constant for chunk size

75ced35

This is used in several places.

Improve performance logging

2c98da1

Logging a few more functions, and expanding performance reports table.

Increment build number

f98ce48

A9G-Data-Droid commented Mar 5, 2021

View reviewed changes

Update testing database

d16abbb

Verified the output from the testing database, updating a few files that have changed. (Other than color values.)

joyfullservice merged commit 801f39e into joyfullservice:dev Mar 5, 2021

joyfullservice mentioned this pull request May 4, 2021

Non-ASCII characters get distorted after ex- and import #217

Closed

joyfullservice mentioned this pull request Nov 2, 2021

Hashing Irregularities #279

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test ascii #187

Test ascii #187

A9G-Data-Droid commented Mar 3, 2021

joyfullservice commented Mar 4, 2021

joyfullservice commented Mar 4, 2021

joyfullservice commented Mar 4, 2021

A9G-Data-Droid commented Mar 4, 2021 •

edited

Loading

A9G-Data-Droid commented Mar 4, 2021 •

edited

Loading

A9G-Data-Droid commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

hecon5 Mar 5, 2021

joyfullservice Mar 5, 2021

hecon5 Mar 5, 2021 •

edited

Loading

A9G-Data-Droid Mar 5, 2021

hecon5 Mar 5, 2021

A9G-Data-Droid Mar 5, 2021

hecon5 Mar 5, 2021

hecon5 Mar 5, 2021

A9G-Data-Droid Mar 5, 2021

joyfullservice Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021 •

edited

Loading

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021 •

edited

Loading

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

A9G-Data-Droid left a comment

joyfullservice commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

Test ascii #187

Test ascii #187

Conversation

A9G-Data-Droid commented Mar 3, 2021

joyfullservice commented Mar 4, 2021

joyfullservice commented Mar 4, 2021

joyfullservice commented Mar 4, 2021

A9G-Data-Droid commented Mar 4, 2021 • edited Loading

A9G-Data-Droid commented Mar 4, 2021 • edited Loading

A9G-Data-Droid commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hecon5 Mar 5, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021 • edited Loading

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021 • edited Loading

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

A9G-Data-Droid left a comment

Choose a reason for hiding this comment

joyfullservice commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

A9G-Data-Droid commented Mar 5, 2021

joyfullservice commented Mar 5, 2021

A9G-Data-Droid commented Mar 4, 2021 •

edited

Loading

A9G-Data-Droid commented Mar 4, 2021 •

edited

Loading

hecon5 Mar 5, 2021 •

edited

Loading

A9G-Data-Droid commented Mar 5, 2021 •

edited

Loading

A9G-Data-Droid commented Mar 5, 2021 •

edited

Loading