Double escaping text in XML files? #352
Comments
This seems to be a bug introduced by #314. While Access ExportXML may incorrectly encode the ampersand character in a TableDef index name, it DOES correctly encode table data. The end result is that the output of ExportXML for table data is correct, but then calling SanitizeXML on the output file results in double encoding. It appears the application of SanitizeXML must be restricted to only those files (or even parts of files) that require it. |
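To make the double-encoding mechanism concrete, here is a minimal Python sketch (not the add-in's VBA code). It uses the standard library's `xml.sax.saxutils.escape` as a stand-in for a sanitizing pass. The helper `escape_bare_ampersands` is a hypothetical name, shown only as one possible way to make such a pass safe to run on text that is already encoded.

```python
import re
from xml.sax.saxutils import escape

# Matches an ampersand that does NOT already begin a predefined XML entity
# (&amp; &lt; &gt; &apos; &quot;) or a numeric character reference.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|apos|quot|#\d+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text: str) -> str:
    """Hypothetical helper: escape only ampersands that are not part of an entity."""
    return _BARE_AMP.sub("&amp;", text)


raw = "Fish & Chips"
once = escape(raw)    # correctly encoded, as ExportXML does for table data
twice = escape(once)  # a second blanket pass encodes the "&" of "&amp;" again

# The guarded version leaves already-encoded text alone,
# but still fixes a bare ampersand (the index-name case).
safe = escape_bare_ampersands(once)
fixed_index_name = escape_bare_ampersands("R&D idx")
```

A guard like this cannot tell an already-encoded `&amp;` from data that literally contains the five characters `&amp;`, which is one reason restricting the pass to the files that need it may be the safer option.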
For a local fix I'm just removing the call to |
Another thought... It appears MS have made a change to ExportXML sometime in the last few months. That same file, exported with my local fix, now produces this diff:
which is what I would expect, but does mean ExportXML is now encoding characters that previously were not encoded. It might be necessary to evaluate if SanitizeXML is still needed. |
Thanks for the research and testing on this! We can certainly limit the sanitizing by version if different versions of Access behave differently on the character encoding. I will try to do a little testing on Access 2010 when I have a chance... |
Apart from comment text, this can break things if you have escaped characters in validation rules. For example, I found that when I had a ValidationRule on a field of Microsoft® Access® for Microsoft 365 MSO (Version 2209 Build 16.0.15629.20152) 32-bit |
My current workaround is scripting an edit of the exported tbldef XML files changing |
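A scripted edit of this kind could look roughly like the Python sketch below. It assumes the damage is an extra `&amp;` in front of an otherwise valid entity, and that the exported files are UTF-8. The function names and the `ValidationRule` sample are illustrative only, and the substitution the commenter actually used may differ.

```python
import re
from pathlib import Path

# "&amp;" immediately followed by the tail of a valid entity, e.g. "&amp;lt;".
_DOUBLE = re.compile(r"&amp;((?:amp|lt|gt|apos|quot|#\d+|#x[0-9A-Fa-f]+);)")


def collapse_double_escapes(xml_text: str) -> str:
    """Remove one level of escaping from double-escaped entities."""
    return _DOUBLE.sub(r"&\1", xml_text)


def fix_file(path: Path) -> bool:
    """Rewrite the file in place; return True if anything changed.

    Assumes UTF-8 encoding, which is an assumption about the export format.
    """
    original = path.read_text(encoding="utf-8")
    fixed = collapse_double_escapes(original)
    if fixed != original:
        path.write_text(fixed, encoding="utf-8")
        return True
    return False


# Hypothetical sample of a double-escaped validation rule.
sample = "<ValidationRule>&amp;lt;100</ValidationRule>"
repaired = collapse_double_escapes(sample)
```

This would also rewrite any value that is legitimately stored as the literal text `&lt;`, so it is only suitable for files known to have been sanitized twice.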
I just came across this issue recently when I tried to migrate my few local tables from Tab Delimited to XML. Particularly, it botched the Before (in this custom ribbon, I only hide the Help toolbar): After exporting and rebuilding from source: For the time being, I will keep the exported data as Tab Delimited for this table. But I do think this issue should be fixed. @joyfullservice did you look into it? Do you need help? |
Does it have this problem in the latest build on the Let me know if you still see this issue after building the |
Ah, I didn't think of checking the Maybe you can add this information as a "known issue" of the 3.x branch. As a side note, the export time is roughly 30% longer when going from the 3.x to the 4.x branch, mainly in the tables category (x5) and the Read File operation (x10). I'm not really sure why; I haven't looked into it closely. |
One concern is whether it's due to the changes introduced in #388. A quick test would be to compare against the version from this commit, which is the last one before #388 was merged. I do expect some slowdown due to the new table connection checks introduced in the PR, but would like to confirm whether it's actually 30% slower, which seems surprising. |
Please note that I used the latest released build, which is the version 4.0.9. I'll try again with the 4.0.10 once available and share the results. |
@Indigo744 - Attached is a fresh build from the current |
I did a fair bit of testing on this with one of my most complex databases, comparing between the 4.04 and the current The chart below compares the two versions by the seconds spent running some of the bigger operations: Here are the actual performance reports from version 4.0.10 on this database during a full export:
@joyfullservice @bclothier
@joyfullservice How much work is still needed before a 4.0.x release? Is it PROD ready yet, or should I wait for more stability? |
One thing I can't help noticing is that in 3.4.26, the I do not know enough about the performance measurement implementation to be sure whether that affects categories. For example, 3.4.26 reports 3.38 seconds whereas 4.0.10 reports 6.10 seconds, but is that because it is now measuring the time more accurately than previously? That does have the unfortunate side effect of snowing out where the actual slowdown is. I'm glad it's only 10% slower given the other changes that were added. |
Thanks for posting the performance reports! That is really helpful, especially on a very large, complex database. One thing that has me a bit mystified is why we see such a performance difference in the Read File function... I have verified in the source code, and the function itself is identical between these versions. 🤔 If you take out this difference, version 4.0.10 is actually a few seconds faster overall, which would make sense to me, given some of the additional optimizations in the newer version. (In my testing I was finding the newer version generally slightly faster.) I did notice something interesting with the Read File function on my computer yesterday. I noticed that the read times seemed higher than I was expecting, and my computer was doing a lot more with memory, CPU and disk IO. Windows Explorer seemed to be using quite a bit of CPU, so I restarted the process. Subsequent exports went much faster, and the Read File function was back in the expected range. This might have been a fluke thing with my computer, but it was interesting to note. Regarding the performance tracking, the newer version is going to be more accurate, especially in regard to the Other Operations. I got that cleaned up a bit more in the newer version to ensure we were tracking more operations that were slipping through the cracks in earlier versions. |
Great question! I have been using it in production, and it is working great for me. The main things remaining before release are to finish working through the last few remaining objects to add merge support, then finish out the merge build functionality. (This will allow you to merge a few changed source files into an existing database without needing to build the entire thing from scratch.) The merge build will be a game-changer in a multi-developer context because it allows you to quickly and easily merge in another developer's changes without having to stop and build everything from source. The other significant change I am planning to implement before the general 4.0 rollout is splitting out the VBA code from form and report exports. This is discussed in more detail in #378, and would mean that a form has two source files: one with the object definition, and a corresponding class file with the VBA code. I am pretty close to finishing a way to make this split while still preserving the git history for those using git as their VCS back end. I am pretty comfortable with v4 at this point, and don't really anticipate any other major breaking changes in this version as we head towards the general release. There is a little fine-tuning left on the conflict detection (particularly in relation to orphaned files), but that is all new functionality anyway. |
I have re-tested with v4.0.34 rather than 3.4.23, and the export and import of validation rules with '<' characters in them now works, as do table field comments with ', ", <, > and & characters. So I think this issue can be closed. |
When exporting table contents, the latest version (3.4.23) seems to be double-escaping text in XML files.
Here's one sample diff from within a file that hasn't been exported since before I upgraded to 3.4.23
If you were just escaping I'd expect Observation Req'd, but it seems the escaping has been applied twice. I haven't tried building from source but I suspect this is not correct.
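For anyone wanting to check their own exported files for this symptom, the following Python sketch flags entities that appear to have been escaped twice. The function name and the `<Status>` sample line are assumptions for illustration; only the value "Observation Req'd" comes from the report above.

```python
import re

# An entity whose leading "&" has itself been escaped, e.g. "&amp;apos;".
_DOUBLE_ESCAPED = re.compile(r"&amp;(?:amp|lt|gt|apos|quot|#\d+|#x[0-9A-Fa-f]+);")


def find_double_escapes(text: str) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs for suspected double-escaped entities."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _DOUBLE_ESCAPED.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical export fragment: line 1 is escaped once, line 2 twice.
exported = (
    "<Status>Observation Req&apos;d</Status>\n"
    "<Status>Observation Req&amp;apos;d</Status>\n"
)
suspects = find_double_escapes(exported)
```

A hit is only a hint, since a field could legitimately contain the literal text `&apos;`, but a file full of such matches right after an upgrade points to the sanitizing pass.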