-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test ascii #187
Test ascii #187
Conversation
This adds all the ASCII characters to the test DB to detect export failures Built and exported with the latest version 3.3.1
- Closes joyfullservice#186
Could you adjust this PR to target the |
Nice! It allowed me to change the base branch on the PR... On to some testing... |
Good work! One note is that we need to include the UTF-8 BOM anytime the content of a file contains Unicode characters. For example, in the testing database, |
Good news. I am always writing the BOM now. I would rather have 1 format instead of guessing on both export and import. This is the default when you tell a stream to be UTF-8 and so let's keep that convention. |
I hate calling external DLLs from VBA because it looks obtuse. I have replaced the WideChar calls with a native function that is much smaller and easier to read. I also added a unit test to show that they work. |
So I did an export of my main project with the new version and I'm seeing almost 5 seconds off total export time. That's anecdotal and not a tightly controlled test but it's promising. Version 3.3.1
Version 3.3.3
|
You have some very valid considerations here... Up to this point I was taking the approach that the UTF-8 BOM should only be used where it was actually needed, where the file contained Unicode or Extended characters. While technically not required, the BOM does help the display in certain programs such as diffing tools. My reasoning for selectively using the BOM was:
However, based on your suggestion, and other conversations in #154, I can see some benefits to always using the UTF-8 BOM (as a signature) in the files used in this project.
Based on this, and the feedback I have seen from others using the add-in, I am willing to go ahead and make this change to always include the BOM in output files. I don't do this lightly, knowing that this has the impact of changing a large number of export files in every project going forward, but it does seem like this is the best long-term direction to take with the UTF-8 standardization in this project. |
@@ -74,7 +74,7 @@ Begin Report | |||
Begin FormHeader | |||
KeepTogether = NotDefault | |||
Height =960 | |||
BackColor =15849926 | |||
BackColor =15064278 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@A9G-Data-Droid, @joyfullservice: This is what I was experiencing on #183, have you noticed this? Any clues as to what the heck is happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this have to do with exporting in different versions of Access? I notice this when comparing exports from Access 2010 and newer versions... I haven't actually dug into comparing the color values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure! I noticed it when me and another dev were exporting the same file (export-import-export) on the same machine will sometimes do this, from their dev machine export - mine import - mine export will almost always cause this.
We verified same OS and Office Version, as well; so this makes it more fun.
Note that we use themes; I'm wondering if this might have something to do with it; but I notice it also when I export the Version Control as well, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happens to me every time. Themes might be a clue. Most colors are not defined as a concrete number. Instead they are references to the system colors. Things like "Text Dark", "Text Light", "Background Dark Header" and "Background Light Header" come from Windows system settings and do not correspond to the same RGB value on every computer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, I didn't ask the other dev if they're using the same Windows Theme; I'll check with them. I could also export, change my theme, import, export.
}, | ||
"Items": { | ||
"Name": "VCS Testing", | ||
"Description": "For automated testing of Version Control", | ||
"FileName": "rel:Testing.accdb", | ||
"HelpFile": "", | ||
"HelpFile": "99672516", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hecon5 this is even more mysterious than the colors changing. I get random numbers in this HelpFile property every time I export. I never change this value manually. It should be a path. I'm not sure where this number is being introduced.
You can see it from the VBE by going to the Project Properties:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SAME! It has repeatedly boggled me; I've simply reverted it and moved on with my life, but yeah; no idea why it does this. I'll even occasionally get kanji characters there, but cannot for the life of me figure it out.
- Don't read the full file just to look at the first 3 characters.
- remove unused code
Perf.OperationStart "Unicode Check" | ||
Set reg = New VBScript_RegExp_55.RegExp | ||
With reg | ||
With New VBScript_RegExp_55.RegExp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the regex is already included as a reference, you can also call this from With New RegExp
; though this does guarantee you get the flavor you wanted, it may make it harder to move to the next one (lol, VBA's never getting updated) if it ever is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I've never seen this explicit version call out. I always use just RegExp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some functionality differences between versions 1.0 and 5.5 (such as lazy quantifiers) but since this is the only one referenced in the project, I would think we are probably safe to just use RegExp
.
- Don't manually add a BOM because it can result in two
I apologize for changing all the files in the test database in this PR. I know it gets ugly. I had created a bug and then fixed it. This proves that the bug is fixed because we can export and import against this set of test DB files and it works. |
The bug was that if you manually insert the BOM during |
I think I'm finally done messing with this PR. I performed a Full Export on my main project and the index.json shows that all the |
Excellent! Let me run it on a couple of my projects, including an ADP project, to make sure it is working fine on my end too. I am also curious to compare the performance logs after the change. |
We need to find another source for sample extended characters.
I did some performance testing to compare this PR with the My hunch is that this is due to the way we are reading and writing the files, but I am going to do some testing to see if we can bring this back down to close to what we had before. |
Those blocks labeled "Unicode Conversion" are areas I think that we could switch to happening line at a time instead of reading it all in to memory, performing some operation on the block, and then writing it out. It could all be done automatically by the stream like I do for the ReEncoder. I know that you have seen good performance improvement with the 128k blocks but I think that was the cause of the original corruption. |
Ok, that turned out to be a simple switch. My time is reduced further: |
If nothing else, you now have a single place to optimize and it will affect 4 conversions that used to be separate. (This is what I get for saying I was done) |
Sounds good. I am just about finished with the optimizing... I think I will probably need to resolve some source conflicts when I get through, based on your latest commits... 😄 But hey, that's what this project is for, to allow us to resolve conflicts at the source level, right! 😄 |
Nailed it! Getting performance similar or even slightly faster than the original version. Doing a few more cleanup tweaks... |
This allows you to easily rename log files for comparison testing.
Since we are no longer doing anything fancy that requires these to be in `modEncoding`, I have returned them to `modFileAccess` where they fit better with the purpose of the module.
Adjustments to encoding functions to utilize stream approach.
Functions *only* used for testing should be in the unit testing module, not alongside production code.
This is used in several places.
Logging a few more functions, and expanding performance reports table.
Okay, @A9G-Data-Droid - Can you try this out on your system? Let me know if you encounter any issues with the chunked reading this time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These used to be called elsewhere. If they are no longer used it should all be removed.
Are you referring to the test functions I moved in 51f0b59? If so, I am happy to remove them. I just didn't want to make the assumption and delete your work. 😄 |
Yes, I have verified that they aren't being used now that we have centralized all the encoding. |
Why is |
Verified the output from the testing database, updating a few files that have changed. (Other than color values.)
That one involves some special handling for mixed UTF-16 content where the extended/Unicode characters are encoded, but the rest of the file isn't. (I see this with ADP projects.) If there is a way to do that with the other ReEncode function, we can simplify this function as well. (But this can be an additional update after merging this PR.) |
I fleshed out the
basExtendedChars.bas
based on discussion in issue #186