Non-ASCII characters get distorted after ex- and import #217
Comments
Hi Christian, Thank you for reporting this! Can you provide some more details on your environment and the versions that you are using? That will help in better understanding the problem.
Also, would you be able to attach a small database example that demonstrates the problem? That will help me to reproduce the issue on my system. Thank you!
Hi Adam, thank you for the reply!
Of course, I can provide a demo database. But I will need a bit of time for this... I think, the best idea would be to do no conversion at all. Just output the chars as provided by MSAccess. |
Hi Christian, I am also using Microsoft Access 2010 (32-bit), so that will make it easier for me to test. I do not have Windows 7, but we can probably still test the issue on my Windows 10 computer. The sample database does not have to be very complicated. Just create a blank database file and add a form that includes some extended characters that demonstrate the issue. Export to source, then build from source to see if the characters are different from the original.
Regarding the UTF-8 conversion, we have had some pretty involved discussions on this, and the clear consensus of the users was that UTF-8 is the universal standard for storing and representing multi-lingual content, and that it works best across different version control systems and development tools. (Pretty much everything supports UTF-8.) All source files are now saved as UTF-8 with a BOM. See #180, #187, and #154 for additional reading on this.
The original database should be reconstructed using the original encoding. If it isn't, then this is something we should probably fix.
Well, the problem definitely occurs during export already, since the resulting text files replace every special character with the same byte code.
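To illustrate why every special character can end up as the same byte code, here is a minimal Python sketch (not the add-in's VBA code). It assumes the exported bytes are windows-1252 and that they are then read as UTF-8 with a "replace" error policy. The sample string and variable names are made up for illustration.

```python
# Bytes as a legacy export would contain them, assuming windows-1252.
original = "Größe für Übung"
exported_bytes = original.encode("cp1252")

# Misreading those bytes as UTF-8: each non-ASCII byte is invalid on its own,
# so the decoder substitutes U+FFFD (the replacement character) for each one.
misread = exported_bytes.decode("utf-8", errors="replace")

# Saving the misread text as UTF-8 writes the same three bytes (EF BF BD)
# for every U+FFFD, so the original characters can no longer be told apart.
resaved = misread.encode("utf-8")

# Reading with the code page that was actually used keeps the text intact.
correct = exported_bytes.decode("cp1252")

print(misread)
print(correct)
```

Once the replacement character has been written to disk, the information about which character it stood for is gone, which matches the observation that the damage cannot be undone afterwards.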
Do you happen to know which code page your machine uses? https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers
The current codepage is windows-1252 (which is default for nearly all languages except for Arabic and Asian ones). |
Thanks to your sample file, I think I have found the issue... The Sanitize routine expects the source file to be exported from Microsoft Access in UTF-8 format. The sample MDB file was using file version 4.0, and its exports are encoded with the system codepage. If I create a new blank database and import the table and form, they export just fine. (The database version for that file is 14.0.)
If I read the export file using the system encoding, it exports and converts the file just fine. This is correctly converted to UTF-8.
I think the solution here is to use the system encoding when reading the file if the file version is below a certain point. Obviously version 4.0 needs this, but I am not sure where exactly the cutoff is at which the export changes to Unicode... I will see if I can find something on this...
When exporting source from a database in Microsoft Access 2000 format (version 4.0), objects are saved using the system codepage, and must be read as such when opening the file to sanitize it. Newer database versions export as UTF-8 automatically, so specifying the codepage is not necessary when reading the file. Fixes #217
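The approach described in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the add-in's implementation: the function names are hypothetical, the cutoff of 4 is taken from the thread (the exact version where the export format changes was still an open question), and `cp1252` stands in for whatever the system code page is.

```python
import codecs

# Assumption: database versions up to and including 4.x export in the system code page.
LEGACY_MAX_VERSION = 4


def pick_source_encoding(db_major_version: int, system_encoding: str = "cp1252") -> str:
    """Return the encoding to use when reading an exported object file."""
    if db_major_version <= LEGACY_MAX_VERSION:
        return system_encoding
    # "utf-8-sig" accepts UTF-8 input with or without a leading BOM.
    return "utf-8-sig"


def to_utf8_bom(raw: bytes, db_major_version: int, system_encoding: str = "cp1252") -> bytes:
    """Decode an exported file with the right encoding and re-encode it as UTF-8 with BOM."""
    text = raw.decode(pick_source_encoding(db_major_version, system_encoding))
    return text.encode("utf-8-sig")


legacy_export = "Straße".encode("cp1252")   # as exported from a version 4.0 file
modern_export = "Straße".encode("utf-8")    # as exported from a version 14.0 file

legacy_result = to_utf8_bom(legacy_export, 4)
modern_result = to_utf8_bom(modern_export, 14)
print(legacy_result == modern_result)
```

Both inputs end up as the same UTF-8 bytes with a BOM, which is the property the sanitize step needs regardless of the source database version.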
I have made an update that I think may resolve this issue when exporting source from older file formats. Can you try the attached version and see if it works correctly in your environment? |
Thank you very much for addressing this issue! Can I do anything to help you with debugging this issue? PS: I found out that there is not only the "Access 2000" format, but also "Access 2002-2003", "Access 2007" and, I think, "Access 97". Do they have to be treated separately?
@cwuensch - The nice thing about this being a Microsoft Access add-in is that you can actually debug it live. After opening the add-in for the first time in that instance of Microsoft Access, you can see the project loaded in the VBA editor. If you open the I would also be curious to know what the GetSystemEncoding function returns on your system. You can test this by pressing Ctrl+G to jump to the immediate window, and type On my system it returns
?GetSystemEncoding returns "windows-1252" on my system as well. (As I said, this is the default setting in Windows for nearly all languages.) But the problem is that CurrentDb.Version is a String, not a number! The comparison CurrentDb.Version <= 4 returns False. Why? Here are some examples: The problem is that number <-> String conversions depend on the locale settings. In the German locale, the decimal separator is a comma instead of a full stop. That's why "4.0" is converted into the number 4 on your system, but into 40 on mine.
Nice catch! |
Oh no! This way it should work:
The database version is stored as a string, but should be converted to a number before performing arithmetic. However, some locales use a period the way English numbers use a comma. See #217 for details.
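The locale pitfall described above can be reproduced outside of VBA. The Python sketch below is not the fix that went into the add-in. `parse_localized_number` is a hypothetical helper that imitates a locale-aware conversion by taking the separators as parameters, and `major_version` shows one locale-independent way to read the major version from a string such as "4.0".

```python
def parse_localized_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Imitate a locale-aware string-to-number conversion."""
    # Grouping separators are dropped; the decimal separator becomes a period.
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


def major_version(version: str) -> int:
    """Locale-independent: take the digits before the first period."""
    return int(version.split(".", 1)[0])


# English conventions: "." is the decimal separator, "," groups thousands.
english = parse_localized_number("4.0", decimal_sep=".", group_sep=",")

# German conventions: "," is the decimal separator, "." groups thousands,
# so the period in "4.0" is discarded and the digits run together.
german = parse_localized_number("4.0", decimal_sep=",", group_sep=".")

print(english, german, major_version("4.0"))
```

With the German separators the version comes out as 40, so a check such as "version <= 4" fails, while splitting the string on the period gives 4 under any locale.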
@cwuensch - Give this version a try... |
Thank you very much! I can confirm that v3.3.33 now exports all elements into working UTF-8. May I ask a further question?
@cwuensch, when Access loads an add-in (such as this one) and you edit the add-in within the "opening" Access file, the changes will not be saved. This is a double-edged sword: it allows easy debugging and lets you try things out that might otherwise ruin files. The downside is that once you close the session, it will discard any settings. This is a really easy way to load "extras" for users and ensure they don't break things for everyone else. Anyway, to actually answer your question, I suggest the following:
These should probably be made into a wiki page, but they're here for now. |
Updated the wiki; it's here: Editing and Contributing |
Thank you @hecon5 for your detailed instructions.
@cwuensch - To make changes to the add-in, you simply close the install form by pressing escape or clicking the X in the top right corner as @hecon5 shows in the screenshot. Just remember that the add-in that you are editing is not the same as the add-in that runs when you click the menu item. After making changes to the add-in, you will want to install it by running the Autoexec macro, or by reopening the database. After reinstalling, the new changes will be available in the installed add-in, where you can test it with other databases.
What I will often do is test small changes in the loaded add-in, just like I described in the debugging process. When I am satisfied that it is working the way I want it to, then I copy those changes over to the working copy of the add-in.
Also note that if the application-level add-in is loaded, you will not be able to install the update. The application-level add-in (the installed version) opens when it is first called from the menu item, and stays loaded until that instance of Microsoft Access is closed. The add-in will warn you that it cannot install if the installed add-in is already loaded. This probably sounds confusing, but it makes a lot of sense once you understand what is going on. 😄
As to your question on whether the files need to be converted back to the system encoding before import, this would depend on which file version was created when the database is built from source. Right now, I think the add-in just uses the current default database version, so this would work fine if the default is set to 2007 or newer. If the new database was created in a legacy format, you might have problems importing the UTF-8 encoded characters. Here is what the option looks like in Microsoft Access 2010: In the If we wanted the rebuilt database version to match the original, we should probably have the code specify the version to match the exported database.
This hasn't come up as a need yet, but if someone would find this helpful or important, we could create a new issue to add this functionality. |
Thank you for all your explanations! 1.) I think, creating a blank database with the user's "Default file format" seems reasonable. Especially, converting an existing database to a new version might be a reasonable use-case for the add-in. So I would not change that. 2.) But, IF the import has to be different, depending on the version of the newly created database, maybe the user's configuration should be considered? |
@cwuensch, is this working? If so, we should probably update the released version to include this fix. |
When building from source, the add-in now attempts to use the same file format from the original database. This is important if you are maintaining databases in legacy versions. Also added error handling to give the user some additional clues on how to change the collation order if the database file creation fails. See #217
I have just pushed some additional changes to the Although not directly related to the original issue, this was a side issue I was able to work through for the benefit of those that are working with files in older formats. Attached is a copy of the latest |
Hi Adam, But could you please do me the favor and compile the previous version v3.3.35 for me? |
@cwuensch, I took the liberty of making a new issue for that, as it's an interesting idea, and I'm not sure it would be out of scope, but I think it could use some discussion before we do anything with it. |
Thank you @hecon5! |
@cwuensch - Did the version 3.3.36 have problems building your database? If so, I would love to be able to resolve the issue for the benefit of everyone. If you need to convert a database from one version to another, you can use the Save & Publish menu item, and select which version to convert the database to. |
I don't think so. (Could not yet try 3.3.36, but will do!) |
Hi there,
thank you for your great add-in! It is much more user-friendly than the original version. :)
But there is still a bug with UTF-8 conversion in your add-in (which is not present in the original version):
When I export all objects from a database, and import them into a new DB, all non-ASCII characters get distorted.
Even worse: They are all mapped to the same byte-code, which makes it impossible to restore them afterwards!!
The exported files in
queries/.bas
forms/.bas
already contain the damaged chars.
Files in modules/*.bas seem okay.
The (original) branch in https://github.com/msaccess-vcs-integration/msaccess-vcs-integration does preserve special chars correctly.
In the attached screenshots, you can see a comparison of an exported form, left is original repo with correct chars (ANSI, not UTF-8) - right is the distorted export from this fork.
Can you help with this, please?