Non-ASCII characters get distorted after ex- and import #217

cwuensch · 2021-05-04T12:00:29Z

Hi there,
thank you for your great add-in! It is extremely more user-friendly than the original version. :)

But there is still a bug with UTF-8 conversion in your add-in (which is not present in the original version):

When I export all objects from a database, and import them into a new DB, all non-ASCII characters get distorted.
Even worse: They are all mapped to the same byte-code, which makes it impossible to restore them afterwards!!

The exported files in
queries/.bas
forms/.bas
already contain the damaged chars.

Files in modules/*.bas seem okay.

The (original) branch in https://github.com/msaccess-vcs-integration/msaccess-vcs-integration does preserve special chars correctly.

In the attached screenshots, you can see a comparison of an exported form, left is original repo with correct chars (ANSI, not UTF-8) - right is the distorted export from this fork.
Can you help with this, please?

joyfullservice · 2021-05-04T12:28:47Z

Hi Christian,

Thank you for reporting this! Can you provide some more details on your environment and the versions that you are using? That will help in better understanding the problem.

Which version of Microsoft Access are you using?
Is it 64-bit or 32-bit?
Which version of the add-in are you using?
What are the local language settings on your computer?
Are you using the Windows 10 Beta System UTF-8 option described in Subscript out of range - (UTF-8 System Encoding) #180?

Also, would you be able to attach a small database example that demonstrates the problem? That will help me to reproduce the issue on my system.

Thank you!

Adam

cwuensch · 2021-05-04T12:37:22Z

Hi Adam,

thank you for the reply!
I will shortly answer your questions:

Microsoft Access 2010 (32 bit)
I tried a lot of versions, including the latest
Local language is German (what exactly do you need?)
No, I did not find this option. And I am using Windows 7.

Of course, I can provide a demo database. But I will need a bit of time for this...

I think, the best idea would be to do no conversion at all. Just output the chars as provided by MSAccess.
Also, I would rather prefer to omit the Unicode BOM in the resulting files.

joyfullservice · 2021-05-04T13:03:43Z

Hi Christian,

I am also using Microsoft Access 2010 (32-bit) so that will make it easier for me to test. I do not have Windows 7, but we can probably still test the issue on my Windows 10 computer.

The sample database does not have to be very complicated. Just create a blank database file, and add a form that includes some extended characters that demonstrate the issue. Export to source, then build from source to see if the characters are different from the original.

Regarding the UTF-8 conversion, we have had some pretty involved discussions on this and the clear consensus of the users was that UTF-8 is the universal standard for storing and representing multi-lingual content, and worked best across different version control systems and development tools. (Pretty much everything supports UTF-8) All source files are now saved in UTF-8 BOM format. See #180, #187, #154, #187 for additional reading on this.

The original database should be reconstructed using the original encoding. If it isn't, then this is something we should probably fix.

cwuensch · 2021-05-04T13:32:56Z

Well, the problem definitely occurs already during export, since the resulting textfiles do replace all special characters with the same byte code.
Attached is a little demo database, which shows the problem on my system.
Demo.zip

hecon5 · 2021-05-04T13:39:22Z

Do you happen to have the code page you use on your machine? https://docs.microsoft.com/en-us/windows/win32/intl/code-page-identifiers

cwuensch · 2021-05-04T13:46:51Z

The current codepage is windows-1252 (which is default for nearly all languages except for Arabic and Asian ones).
It is nearly identical with iso-8859-15 resp. iso-8859-1 (latin-1)

joyfullservice · 2021-05-04T14:09:33Z

Thank you for the sample database. I can reproduce the issue on my system.

joyfullservice · 2021-05-04T14:30:00Z

Thanks to your sample file, I think I have found the issue... The Sanitize routine expects the source file to be exported from Microsoft Access in UTF-8 format. The sample MDB file was using file version 4.0, which is encoded with the system codepage. If I create a new blank database, and import the table and form, they export just fine. (The database version for that file is 14.0)

If I read the export file using the system encoding, it exports and converts the file just fine.

This is correctly converted to UTF-8

I think the solution here is to use the system encoding when reading the file if the file version is below a certain point. Obviously version 4.0 needs this, but I am not sure technically where the cutoff happens where the export changes to Unicode... I will see if I can find something on this...

When exporting source from a database in Microsoft Access 2000 format (version 4.0), objects are saved using the system codepage, and must be read as such when opening the file to sanitize it. Newer database versions export as UTF-8 automatically, so specifying the codepage is not necessary when reading the file. Fixes #217

joyfullservice · 2021-05-04T14:58:06Z

I have made an update that I think may resolve this issue when exporting source from older file formats. Can you try the attached version and see if it works correctly in your environment?
Version_Control_v3.3.32.zip

cwuensch · 2021-05-04T17:41:14Z

Thank you very much for addressing this issue!
I installed and tested v3.3.32. (The GUI also displayed the new version number)
But the bug persists with no difference, here.

Can I do anything to help you with debugging this issue?

PS: Found out, there is not only the "Access 2000" format, but also "Access 2002-2003", "Access 2007" and I think, also "Access 97". Do they have to be treted separately?

joyfullservice · 2021-05-04T18:07:58Z

@cwuensch - The nice thing about this being a Microsoft Access add-in is that you can actually debug it live. After opening the add-in for the first time in that instance of Microsoft Access, you can see the project loaded in the VBA editor. If you open the modSanitize module, you can add a breakpoint in the SanitizeFile function. This function reads the natively exported form object source file, and where the codepage is determined. If you step through the code (Shift+F8) it should reach the highlighted line. If you then press F8, it should step into the GetSystemEncoding function which is what determines how to read the file contents.

I would also be curious to know what the GetSystemEncoding function returns on your system. You can test this by pressing ctl+G to jump to the immediate window, and type ?GetSystemEncoding and hit enter.

On my system it returns windows-1252 which reads the file correctly in the ReadFile function.

cwuensch · 2021-05-04T19:59:38Z

?GetSystemEncoding returns "windows-1252" on my system as well. (As I said, this is the default setting in Windows for nearly all languages)

But... The problem here is: CurrentDb.Version is a String, not a number! The comparison CurrentDb.Version <= 4 returns False.
It should be replaced by (Left$(CurrentDb.Version,1) <= 4).

Why? Here are some examples:
?CDbl("4.0")
40
?CDbl("4,0")
4
?(CurrentDb.Version <= 4.0)
Falsch (=false)
?(CurrentDb.Version <= 4)
Falsch
?(CDbl(CurrentDb.Version)<= 4.0)
Falsch
?(CDbl(CurrentDb.Version)<= 4)
Falsch
?(CurrentDb.Version = "4.0")
Wahr (=true)
?(Left$(CurrentDb.Version,1) <= 4)
Wahr

Problem is here, that the number <-> String conversins depend on local settings. And in German locale, a decimal point is a comma, instead of a full stop. That's why "4.0" is converted into the number "4" on your system, but into "40" on mine.

joyfullservice · 2021-05-04T20:12:40Z

Nice catch!

cwuensch · 2021-05-04T20:18:10Z

Oh no!
Sorry, this was a bad idea...
My suggestion would now ALWAYS read with system encoding, because "12.0" would be converted to 1...

This way it should work:

Public Function MainVer(ver As String) As Integer
  pos = InStr(ver, ".")
  If pos > 0 Then
    MainVer = CInt(Left$(ver, pos - 1))
  Else
    MainVer = CInt(ver)
  End If
End Function

The database version is stored as a string, but should be converted to a number before performing arithmetic. However, some locales use a period the way English numbers use a comma. See #217 for details.

joyfullservice · 2021-05-05T02:28:47Z

@cwuensch - Give this version a try...
Version_Control_v3.3.33.zip

cwuensch · 2021-05-05T09:42:50Z

Thank you very much! I can confirm, that v3.3.33 now exports all elements into working UTF-8.
Is it necessary to consider the different handling of Unicode in Access DB version 4.0 vs. 12.0 regarding the import as well?
(I think you changed this only for export. But maybe this is sufficient?)

May I ask a further question?
After you told me, how to debug the Add-In, I am quite curious: Can I also make changes to the code?
When I do what you told me, the changes will not be saved after closing MS Access. When I open the add-in-file, it just starts an automatic installation process and closes itself afterwards.

hecon5 · 2021-05-05T12:16:28Z

@cwuensch, when Access loads an Add-in (such as this one), if you edit the addin within the "opening" Access file, changes will not be saved. This is a double edged sword: it allows easy debugging, and to try things out that might otherwise ruin files. Downside is that once you close the session, it will discard any settings.

This is a really easy way to load "extras" for users and ensure they don't break things for everyone else.

Anyway, to actually answer your question, I suggest the following;

Fork this MS Access Repository into your GitHub account repos.
Clone your fork to a local repository alongside your other Access dev repos on your machine.
I put mine alongside some other Access repositories I utilize.
Switch to "Dev" Branch: I actually suggest renaming YOUR "dev" branch to something local to you, especially if you still use some of the legacyVCS tools. I named my local fork of this branch to "dev-addin".
Connect a second remote to the joyfullservice/msaccess-vcs-integration repository. This way you can track progress on the upstream ( joyfullservice/msaccess-vcs-integration) repository. There may be changes you don't want to pull into yours, or you may want customization not present on the upstream (in my environment, we have some particular changes required to ensure proper integration in our security environment and configurations that shouldn't be the default for everyone).
Go into your new local repo, and launch the Addin directly.
Make changes, use the add-in to export the add-in code, and commit/push/pull request just like any other repository.

These should probably be made into a wiki page, but they're here for now.

hecon5 · 2021-05-05T12:35:01Z

Updated the wiki; it's here: Editing and Contributing

cwuensch · 2021-05-05T12:50:00Z

Thank you @hecon5 for your detailled instructions.
But in fact, my question was slightly different:
How can I - just on my computer - edit (and save!) the MS Access Add-In file?
If I open the Add-In file, it just runs the installation and closes itself. How can I make changes to the file and save them?

hecon5 · 2021-05-05T13:02:28Z

Aha!

joyfullservice · 2021-05-05T14:05:58Z

@cwuensch - To make changes to the add-in, you simply close the install form by pressing escape or clicking the X in the top right corner as @hecon5 shows in the screenshot. Just remember that the add-in that you are editing is not the same as the add-in that runs when you click the menu item. After making changes to the add-in, you will want to install it by running the Autoexec macro, or by reopening the database. After reinstalling, the new changes will be available in the installed add-in, where you can test it with other databases.

What I will often do is test small changes in the loaded add-in, just like I described in the debugging process. When I am satisfied that it is working the way I want it to, then I copy those changes over to the working copy of the add-in. Also note that if the application-level add-in is loaded, you will not be able to install the update. The application-level add-in (the installed version) opens when it is first called from the menu item, and stays loaded until that instance of Microsoft Access is closed. The add-in will warn you that it cannot install if the installed add-in is already loaded. This probably sounds confusing, but it makes a lot of sense once you understand what is going on. 😄

As to your question on whether the files need to be converted back to the system encoding before import, this would depend on which file version was created when the database is built from source. Right now, I think the add-in just uses the current default database version, so this would work fine if the default is set to 2007 or newer. If the new database was created in a legacy format, you might have problems importing the UTF-8 encoded characters.

Here is what the option looks like in Microsoft Access 2010:

In the Build function, this is where you would specify the database format when creating a new database:

If we wanted the rebuilt database version to match the original, we should probably have the code specify the version to match the exported database. This hasn't come up as a need yet, but if someone would find this helpful or important, we could create a new issue to add this functionality.

cwuensch · 2021-05-05T14:20:19Z

Thank you for all your explanations!

1.) I think, creating a blank database with the user's "Default file format" seems reasonable. Especially, converting an existing database to a new version might be a reasonable use-case for the add-in. So I would not change that.

2.) But, IF the import has to be different, depending on the version of the newly created database, maybe the user's configuration should be considered?
(In my case, for example, the default file format is configured as "Access 2000", because there are customers with legacy software. But in a first quick test, the imported seemed to work nevertheless... that feels strange to me)

hecon5 · 2021-05-17T14:26:35Z

@cwuensch, is this working? If so, we should probably update the released version to include this fix.

When building from source, the add-in now attempts to use the same file format from the original database. This is important if you are maintaining databases in legacy versions. Also added error handling to give the user some additional clues on how to change the collation order if the database file creation fails. See #217

joyfullservice · 2021-05-17T19:50:04Z

I have just pushed some additional changes to the dev branch relating to the file format of the rebuilt database. The FileFormat of the old database is now stored in project.json and the new database is created using this same version. In other words, if you use Access 2010 to export source from a database in the Access 2000 file format, you can now build that database from source and it will be built in the legacy Access 2000 file format, just like the original database.

Although not directly related to the original issue, this was a side issue I was able to work through for the benefit of those that are working with files in older formats. Attached is a copy of the latest dev version in case you are not familiar with building it from source.
Version_Control_v3.3.36.zip

cwuensch · 2021-05-18T11:25:31Z

Hi Adam,
to my mind, it still makes more sense, if a newly created database would use the default format configured in MS Access.
This way, you could convert an "old" *.mdb-file into new *.accdb format. Also, you could stick with legacy formats, when you configure Access respectively.
But this is your choice, of course.

But could you please do me the favor and compile the previous version v3.3.35 for me?
(I did not manage to build this Access add-in file unfortunately...)

hecon5 · 2021-05-18T13:17:50Z

@cwuensch, I took the liberty of making a new issue for that, as it's an interesting idea, and I'm not sure it would be out of scope, but I think it could use some discussion before we do anything with it.

cwuensch · 2021-05-18T13:22:41Z

Thank you @hecon5!
But I think this would be a very low priority.
For me, personally, it would be perfectly fine, if someone could just provide a compiled 3.3.35 add-in version?

joyfullservice · 2021-05-18T13:23:24Z

@cwuensch - Did the version 3.3.36 have problems building your database? If so, I would love to be able to resolve the issue for the benefit of everyone.

If you need to convert a database from one version to another, you can use the Save & Publish menu item, and select which version to convert the database to.

cwuensch · 2021-05-18T13:24:59Z

I don't think so. (Could not yet try 3.3.36, but will do!)
But I would just personally prefer the old logic: That a database will be created in Access' default format.

joyfullservice added the need more details Further information needed to understand issue label May 4, 2021

joyfullservice added bug and removed need more details Further information needed to understand issue labels May 4, 2021

hecon5 mentioned this issue May 17, 2021

add BOM check before encoding module files #223

Closed

joyfullservice closed this as completed May 17, 2021

hecon5 mentioned this issue May 18, 2021

Feature: Select Build Version #224

Closed

hecon5 mentioned this issue May 28, 2021

Release 3.4.0 Steps #234

Closed

mwolfe02 mentioned this issue Oct 28, 2024

Non-ASCII characters import incorrectly from XML table data exports #560

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-ASCII characters get distorted after ex- and import #217

Non-ASCII characters get distorted after ex- and import #217

cwuensch commented May 4, 2021 •

edited

Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021

hecon5 commented May 4, 2021

cwuensch commented May 4, 2021 •

edited

Loading

joyfullservice commented May 4, 2021

joyfullservice commented May 4, 2021

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 •

edited

Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 •

edited

Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 •

edited

Loading

joyfullservice commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 5, 2021

hecon5 commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 5, 2021

joyfullservice commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 17, 2021

joyfullservice commented May 17, 2021

cwuensch commented May 18, 2021

hecon5 commented May 18, 2021

cwuensch commented May 18, 2021

joyfullservice commented May 18, 2021

cwuensch commented May 18, 2021

Non-ASCII characters get distorted after ex- and import #217

Non-ASCII characters get distorted after ex- and import #217

Comments

cwuensch commented May 4, 2021 • edited Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021

hecon5 commented May 4, 2021

cwuensch commented May 4, 2021 • edited Loading

joyfullservice commented May 4, 2021

joyfullservice commented May 4, 2021

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 • edited Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 • edited Loading

joyfullservice commented May 4, 2021

cwuensch commented May 4, 2021 • edited Loading

joyfullservice commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 5, 2021

hecon5 commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 5, 2021

joyfullservice commented May 5, 2021

cwuensch commented May 5, 2021

hecon5 commented May 17, 2021

joyfullservice commented May 17, 2021

cwuensch commented May 18, 2021

hecon5 commented May 18, 2021

cwuensch commented May 18, 2021

joyfullservice commented May 18, 2021

cwuensch commented May 18, 2021

cwuensch commented May 4, 2021 •

edited

Loading

cwuensch commented May 4, 2021 •

edited

Loading

cwuensch commented May 4, 2021 •

edited

Loading

cwuensch commented May 4, 2021 •

edited

Loading

cwuensch commented May 4, 2021 •

edited

Loading