Blueprint Chokes with Large global-state.json #12708

Closed
Tracked by #13192
tsmaeder opened this issue Jul 13, 2023 · 12 comments
Labels
contributor experience issues related to the contributor experience json issues related to the json language performance issues related to performance

Comments

@tsmaeder
Contributor

Bug Description:

All of a sudden, Theia refused to provide things like content assist in TypeScript files. Opening files and syntax coloring still worked fine. I noticed that the global state file at ~/.theia-blueprint/plugin-storage/global-state.json was rather large (218KB). Opening the file in an IDE proved difficult (VS Code could only do it in "restricted" mode without language smarts). The file seemed to contain many entries from the gitlens extension. Uninstalling gitlens did not change the behavior. However, deleting the file restored correct function of Theia.

Additional Information

  • Operating System: Windows 11
  • Theia Version: Blueprint 1.39.0
@msujew
Member

msujew commented Jul 13, 2023

I've actually noticed that as well. I got a 180MB global-state.json by accident (a buggy vscode extension was at fault) and Theia exhibited very weird behavior. Most notably, it got disconnected during startup as the file transmission choked the whole bandwidth.

@tsmaeder
Contributor Author

@msujew good to hear this corroborated. My immediate suspicion is that we're parsing the file in a non-scalable way. VS Code (and Visual Studio) have trouble (aka stop responding) when I open the file in an editor.

@tsmaeder tsmaeder added performance issues related to performance json issues related to the json language contributor experience issues related to the contributor experience labels Jul 13, 2023
@msujew
Member

msujew commented Jul 14, 2023

FYI, @jonah-iden also accidentally ran into this issue. It seems to be reproducible by using the python+Jupyter notebook extension, which stores massive data entries in the global-state.json. I assume vscode handles that somewhat differently and overwrites entries, while we somehow always append more and more data to the json.

@msujew
Member

msujew commented Jul 14, 2023

I've restarted the application with the python+Jupyter extension installed and get some reproducible behavior. The global-state.json file looks like this:

global-state.json
{
  "vadimcn.vscode-lldb": {
    "lastLaunchedVersion": "1.9.2"
  },
  "ms-python.python": {
    "PYTHON_GLOBAL_STORAGE_KEYS": [
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      },
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      },
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      },
      {
        "key": "isRemoteGlobalSettingCopiedKey",
        "defaultValue": false
      },
      {
        "key": "remoteWorkspaceFolderKeysForWhichTheCopyIsDone_Key",
        "defaultValue": []
      },
      {
        "key": "PYTHON_WAS_DISCOVERY_TRIGGERED_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA",
        "defaultValue": false
      },
      {
        "key": "PYTHON_ENV_INFO_CACHE",
        "defaultValue": []
      },
      {
        "key": "PYTHON_EXTENSION_GLOBAL_STORAGE_KEYS",
        "defaultValue": []
      },
      {
        "key": "PYTHON_ENV_INFO_CACHE",
        "defaultValue": []
      },
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      },
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      },
      {
        "key": "PYTHON_ENV_INFO_CACHE",
        "defaultValue": []
      },
      {
        "key": "PYTHON_ENV_INFO_CACHE",
        "defaultValue": []
      },
      {
        "key": "PYTHON_ENV_INFO_CACHE",
        "defaultValue": []
      },
      {
        "key": "VSCode.ABExp.FeatureData",
        "defaultValue": {
          "features": []
        }
      },
      {
        "key": "PythonTensorBoardWebviewPreferredViewGroup",
        "defaultValue": -1
      },
      {
        "key": "WORKSPACE_FOLDER_INTERPRETER_PATH_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
      },
      {
        "key": "preferredGlobalPyInterpreter"
      }
    ],
    "PYTHON_ENV_INFO_CACHE": [
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python36\\python.exe",
          "sysPrefix": "C:\\Python36",
          "ctime": 1601845804962,
          "mtime": 1545603568000
        },
        "display": "Python 3.6.8 64-bit",
        "version": {
          "major": 3,
          "minor": 6,
          "micro": 8,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.6 (64-bit)"
        },
        "source": [
          "path env var",
          "windows registry"
        ],
        "id": "C:\\PYTHON36\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.6.8 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python38\\python.exe",
          "sysPrefix": "C:\\Python38",
          "ctime": 1601845317994,
          "mtime": 1600869606000
        },
        "display": "Python 3.8.6 64-bit",
        "version": {
          "major": 3,
          "minor": 8,
          "micro": 6,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.8 (64-bit)"
        },
        "source": [
          "windows registry"
        ],
        "id": "C:\\PYTHON38\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.8.6 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python36\\python.exe",
          "sysPrefix": "C:\\Python36",
          "ctime": 1601845804962,
          "mtime": 1545603568000
        },
        "display": "Python 3.6.8 64-bit",
        "version": {
          "major": 3,
          "minor": 6,
          "micro": 8,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.6 (64-bit)"
        },
        "source": [
          "path env var",
          "windows registry"
        ],
        "id": "C:\\PYTHON36\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.6.8 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python38\\python.exe",
          "sysPrefix": "C:\\Python38",
          "ctime": 1601845317994,
          "mtime": 1600869606000
        },
        "display": "Python 3.8.6 64-bit",
        "version": {
          "major": 3,
          "minor": 8,
          "micro": 6,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.8 (64-bit)"
        },
        "source": [
          "windows registry"
        ],
        "id": "C:\\PYTHON38\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.8.6 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python36\\python.exe",
          "sysPrefix": "C:\\Python36",
          "ctime": 1601845804962,
          "mtime": 1545603568000
        },
        "display": "Python 3.6.8 64-bit",
        "version": {
          "major": 3,
          "minor": 6,
          "micro": 8,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.6 (64-bit)"
        },
        "source": [
          "path env var",
          "windows registry"
        ],
        "id": "C:\\PYTHON36\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.6.8 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python38\\python.exe",
          "sysPrefix": "C:\\Python38",
          "ctime": 1601845317994,
          "mtime": 1600869606000
        },
        "display": "Python 3.8.6 64-bit",
        "version": {
          "major": 3,
          "minor": 8,
          "micro": 6,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.8 (64-bit)"
        },
        "source": [
          "windows registry"
        ],
        "id": "C:\\PYTHON38\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.8.6 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python36\\python.exe",
          "sysPrefix": "C:\\Python36",
          "ctime": 1601845804962,
          "mtime": 1545603568000
        },
        "display": "Python 3.6.8 64-bit",
        "version": {
          "major": 3,
          "minor": 6,
          "micro": 8,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.6 (64-bit)"
        },
        "source": [
          "path env var",
          "windows registry"
        ],
        "id": "C:\\PYTHON36\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.6.8 64-bit"
      },
      {
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python38\\python.exe",
          "sysPrefix": "C:\\Python38",
          "ctime": 1601845317994,
          "mtime": 1600869606000
        },
        "display": "Python 3.8.6 64-bit",
        "version": {
          "major": 3,
          "minor": 8,
          "micro": 6,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.8 (64-bit)"
        },
        "source": [
          "windows registry"
        ],
        "id": "C:\\PYTHON38\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.8.6 64-bit"
      }
    ],
    "pylanceDefaultPromptMemento": true,
    "PYTHON_WAS_DISCOVERY_TRIGGERED_C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA": true,
    "remoteWorkspaceFolderKeysForWhichTheCopyIsDone_Key": [
      "C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA",
      "C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA",
      "C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA",
      "C:\\USERS\\MARK\\SOURCE\\REPOS\\THEIA"
    ],
    "isRemoteGlobalSettingCopiedKey": true
  },
  "ms-toolsai.jupyter": {
    "JupyterDetectionTelemetrySentMementoKey": true,
    "REGISTRATION_ID_EXTENSION_OWNER_MEMENTO_KEY": [
      {
        "extensionId": "_builtin.jupyterServerUrlProvider",
        "providerId": "ms-toolsai.jupyter"
      },
      {
        "extensionId": "ms-toolsai.jupyter",
        "providerId": "_builtin.jupyterServerUrlProvider"
      },
      {
        "extensionId": "_builtin.jupyterServerUrlProvider",
        "providerId": "ms-toolsai.jupyter"
      },
      {
        "extensionId": "ms-toolsai.jupyter",
        "providerId": "_builtin.jupyterServerUrlProvider"
      },
      {
        "extensionId": "_builtin.jupyterServerUrlProvider",
        "providerId": "ms-toolsai.jupyter"
      },
      {
        "extensionId": "_builtin.jupyterServerUrlProvider",
        "providerId": "ms-toolsai.jupyter"
      },
      {
        "extensionId": "ms-toolsai.jupyter",
        "providerId": "_builtin.jupyterServerUrlProvider"
      },
      {
        "extensionId": "ms-toolsai.jupyter",
        "providerId": "_builtin.jupyterServerUrlProvider"
      },
      {
        "extensionId": "_builtin.jupyterServerUrlProvider",
        "providerId": "ms-toolsai.jupyter"
      }
    ],
    "INTERPRETER_PATH_WAS_SELECTED_FOR_JUPYTER_SERVER": true
  }
}

As you can see, the extensions register another object each time they start, even though they should overwrite the existing objects.

@tsmaeder
Contributor Author

As you can see, the extensions register another object each time they start

I'm not seeing that. What text in the file indicates to you that they add a new entry on each start?

@msujew
Member

msujew commented Jul 14, 2023

I'm not seeing that. What text in the file indicates to you that they add a new entry on each start?

The 8 entries in PYTHON_ENV_INFO_CACHE: the same two interpreters, each listed four times. The Python 3.6 one looks exactly like this:

{
        "name": "",
        "location": "",
        "kind": "global-other",
        "executable": {
          "filename": "C:\\Python36\\python.exe",
          "sysPrefix": "C:\\Python36",
          "ctime": 1601845804962,
          "mtime": 1545603568000
        },
        "display": "Python 3.6.8 64-bit",
        "version": {
          "major": 3,
          "minor": 6,
          "micro": 8,
          "release": {
            "level": "final",
            "serial": 0
          },
          "sysVersion": "3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]"
        },
        "arch": 3,
        "distro": {
          "org": "PythonCore",
          "defaultDisplayName": "Python 3.6 (64-bit)"
        },
        "source": [
          "path env var",
          "windows registry"
        ],
        "id": "C:\\PYTHON36\\PYTHON.EXE",
        "detailedDisplayName": "Python 3.6.8 64-bit"
      },

I get a new entry in there every time the extension starts.
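
A quick way to confirm this kind of growth is to count identical elements in an array from the state file. The following is only a minimal sketch: `countDuplicates` and the trimmed `envInfoCache` sample are hypothetical names made up for illustration, and the comparison assumes the repeated copies serialize with the same key order.

```typescript
// Count how often each distinct element occurs in an array.
// Elements are compared by their JSON serialization, which is enough for
// verbatim copies but would miss copies with a different key order.
function countDuplicates(items: unknown[]): Map<string, number> {
    const counts = new Map<string, number>();
    for (const item of items) {
        const key = JSON.stringify(item);
        counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    return counts;
}

// Hypothetical, heavily trimmed stand-in for the PYTHON_ENV_INFO_CACHE array.
const envInfoCache = [
    { id: 'C:\\PYTHON36\\PYTHON.EXE' },
    { id: 'C:\\PYTHON38\\PYTHON.EXE' },
    { id: 'C:\\PYTHON36\\PYTHON.EXE' },
    { id: 'C:\\PYTHON38\\PYTHON.EXE' },
];

const duplicateCounts = countDuplicates(envInfoCache);
duplicateCounts.forEach((count, entry) => {
    if (count > 1) {
        console.log(`${count}x ${entry}`);
    }
});
```

If the count for an entry goes up by one after every restart, the array is being appended to rather than replaced.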

@msujew
Member

msujew commented Jul 14, 2023

I was able to successfully reproduce the issue and filed an issue (and potential fix) with Microsoft, see microsoft/vscode-python#21635.

@tsmaeder
Contributor Author

I don't think this is a problem in the extension: IMO, the problem lies here: https://github.com/eclipse-theia/theia/blob/master/packages/plugin-ext/src/main/node/plugins-key-value-storage.ts#L117

It reads the current version of global-state.json from disk and then uses deepmerge to merge the in-memory version into the on-disk version before writing the file back. But if you look at the default array merge function in deepmerge, here's what it does:

function defaultArrayMerge(target, source, options) {
	return target.concat(source).map(function(element) {
		return cloneUnlessOtherwiseSpecified(element, options)
	})
}

Basically, a concat of the old array with the new array. So we end up with the concatenation of the old array and the new array in the resulting state file, IMO. @paul-marechal could you chime in why we need a merge and not just let the last write on the file win?
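
To illustrate the suspected effect, here is a small self-contained sketch. It does not use deepmerge or the Theia code: `concatMerge` is a simplified, hypothetical stand-in that only mimics the concat strategy for arrays, and the plugin id and state are invented.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json): value is JsonObject {
    return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Simplified stand-in for a deep merge whose array strategy is "concat",
// like the defaultArrayMerge quoted above. This is not deepmerge's real code.
function concatMerge(target: Json, source: Json): Json {
    if (Array.isArray(target) && Array.isArray(source)) {
        return target.concat(source);
    }
    if (isPlainObject(target) && isPlainObject(source)) {
        const result: JsonObject = { ...target };
        for (const key of Object.keys(source)) {
            result[key] = key in target ? concatMerge(target[key], source[key]) : source[key];
        }
        return result;
    }
    // Primitives and mismatched types: the source value wins.
    return source;
}

// Hypothetical plugin state: the plugin only ever holds ONE cache entry in memory.
const inMemory: Json = { 'some.plugin': { cache: ['python36'] } };

// Simulate three save cycles: read "disk", merge memory into it, write back.
let onDisk: Json = {};
for (let i = 0; i < 3; i++) {
    onDisk = concatMerge(onDisk, inMemory);
}

const cacheLength = ((onDisk as any)['some.plugin'].cache as Json[]).length;
console.log(cacheLength); // 3
```

The in-memory array never grows, but the persisted one gains a copy on every save, which would match the duplicated entries in the file above.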

@msujew
Member

msujew commented Jul 17, 2023

@tsmaeder I see, thanks for investigating 👍

@tsmaeder
Contributor Author

FWIW, VS Code is using an SQLite DB to store a JSON value per plugin. So my "aren't we recreating a DB" take wasn't that far-fetched. Maybe something we should consider, as well.

@jonah-iden jonah-iden mentioned this issue Jul 17, 2023
2 tasks
@paul-marechal
Member

why we need a merge and not just let the last write on the file win?

The goal was to avoid race conditions as much as possible, but it is now apparent that:

  1. Using deepmerge is indeed bogus with arrays
  2. The PluginsKeyValueStorage is a backend singleton shared by all plugin host processes, so concurrent access to the key/value stores should be safe

With that in mind, I opened #12717.
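
For reference, one possible shape of a non-concatenating merge is sketched below. This is an illustration only and not necessarily what #12717 does: `overwriteArraysMerge` and the sample state are hypothetical. Objects are still merged key by key, but arrays and primitives from the in-memory state simply replace whatever was on disk.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json): value is JsonObject {
    return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Deep merge in which arrays are treated like primitives: last write wins.
function overwriteArraysMerge(target: Json, source: Json): Json {
    if (isPlainObject(target) && isPlainObject(source)) {
        const result: JsonObject = { ...target };
        for (const key of Object.keys(source)) {
            result[key] = key in target
                ? overwriteArraysMerge(target[key], source[key])
                : source[key];
        }
        return result;
    }
    // Arrays, primitives and mismatched types: the in-memory value replaces the old one.
    return source;
}

// Hypothetical state: one plugin already on disk, another one being saved repeatedly.
const inMemoryState: Json = { 'some.plugin': { cache: ['python36'], flag: true } };
let persisted: Json = { 'other.plugin': { lastLaunchedVersion: '1.9.2' } };

for (let i = 0; i < 3; i++) {
    persisted = overwriteArraysMerge(persisted, inMemoryState);
}

const persistedCache = (persisted as any)['some.plugin'].cache as Json[];
console.log(persistedCache.length); // 1
```

Entries of other plugins survive the merge, while repeated saves leave the array at its in-memory size. Dropping the merge entirely and writing the in-memory state as-is would be the simpler last-write-wins variant.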

FWIW, VS Code is using an SQLite DB to store a JSON value per plugin. So my "aren't we recreating a DB" take wasn't that far-fetched. Maybe something we should consider, as well.

I think the expectation was that using a SQL-based DB would be overkill, but if the performance issues persist then we might have to start using sqlite too.

@JonasHelming
Contributor

@tsmaeder Is this actually solved?

@tsmaeder tsmaeder closed this as completed Aug 8, 2023
@JonasHelming JonasHelming mentioned this issue Dec 19, 2023
63 tasks