This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse workers can not connect to master after upgrading to 1.85.0 from 1.84.0 #15744

Closed
fredriklindberg opened this issue Jun 7, 2023 · 2 comments · Fixed by #15746
Labels
O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release

Comments

@fredriklindberg

Description

After upgrading from 1.84.0 to 1.85.0, worker processes are unable to contact the main process.
Downgrading to 1.84.0 with the same configuration resolves the connection problems.

The configuration is using a shared instance_map block with a main section.

It could be that something is wrong with the existing configuration and that it only worked by chance.

Looking at the changelog for 1.85.0, pull request #15578 seems relevant.

Steps to reproduce

Upgrade from 1.84.0 to 1.85.0 using the attached configuration.

Homeserver

private server

Synapse Version

1.85.0

Installation Method

Docker (matrixdotorg/synapse)

Database

PostgreSQL, single server. Not restored from backup.

Workers

Multiple workers

Platform

Master process and workers are running as containers in a k8s cluster.

Configuration

Relevant parts of shared homeserver.yaml

pid_file: /var/run/homeserver.pid
log_config: "/config/log.config"
tls_private_key_path: "/etc/ssl/cluster-certificate/tls.key"
tls_certificate_path: "/etc/ssl/cluster-certificate/tls.crt"

listeners:
  - port: 8008
    tls: true
    bind_address: '0.0.0.0'
    type: http
    x_forwarded: true
    resources:
      - names: [client]
        compress: false
      - names: [federation]
        compress: false
  - port: 9093
    bind_address: '0.0.0.0'
    tls: true
    type: http
    resources:
     - names: [replication]

instance_map:
  main:
    host: synapse.default.svc
    port: 9093
    tls: true
  synapse_worker_0:
    host: synapse-worker-0.synapse-worker-headless.default.svc
    port: 9093
    tls: false
  synapse_worker_1:
    host: synapse-worker-1.synapse-worker-headless.default.svc
    port: 9093
    tls: false

stream_writers:
  events:
    - synapse_worker_0
    - synapse_worker_1

worker_replication_secret: "secret"
redis:
  enabled: true
  host: "redis.default.svc"
  port: 6379
  password: "redis-password"

worker.yaml

worker_name: synapse_worker_0
worker_app: synapse.app.generic_worker
worker_log_config: "/config/log.config"

worker_listeners:
  - port: 8008
    tls: true
    bind_addresses: ['0.0.0.0']
    type: http
    x_forwarded: true
    resources:
      - names: [client]
        compress: false
      - names: [federation]
        compress: false
  - port: 9093
    bind_address: '0.0.0.0'
    tls: false
    type: http
    resources:
     - names: [replication]

Relevant log output

The following is observed on the worker nodes:

2023-06-07 05:26:49,967 - synapse.http.client - 933 - INFO - GET-53- Error sending request to  POST synapse-replication://master/_synapse/replication/presence_set_state/<redacted>: AttributeError 'str' object has no attribute 'decode'
2023-06-07 05:26:50,395 - synapse.federation.federation_server - 1488 - INFO - PUT-55- Failed to handle edu 'm.presence': SynapseError('502: Failed to talk to master process')

Anything else that would be useful to know?

No response

@erikjohnston
Member

This looks to be a regression when using TLS for replication.

# The 'port' argument below isn't actually used by the function
self.context_factory.creatorForNetloc(
    self.instance_map[worker_name].host,
    self.instance_map[worker_name].port,
),

Those are strings, but creatorForNetloc is expecting bytes. I don't know why mypy didn't pick that up.
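
A minimal sketch of that failure mode, assuming the callee decodes its hostname argument. `creator_for_netloc` is a hypothetical stand-in for illustration, not Synapse's or Twisted's actual code, and encoding the host before the call is an assumed remedy rather than a description of the eventual fix. The host and port come from the instance_map above.

```python
def creator_for_netloc(hostname, port):
    """Hypothetical stand-in for a TLS policy's creatorForNetloc.

    It expects the hostname as bytes and decodes it; the port is
    accepted but unused, mirroring the comment in the snippet above.
    """
    return hostname.decode("ascii")


host = "synapse.default.svc"  # instance_map values are str, not bytes
port = 9093

# Passing the str straight through fails the same way the worker log shows.
try:
    creator_for_netloc(host, port)
    error = None
except AttributeError as exc:
    error = str(exc)

# Encoding the host first avoids the error (assumed remedy, see lead-in).
fixed = creator_for_netloc(host.encode("utf-8"), port)
```

This would also explain why only TLS replication is affected: in the configuration above, only the `main` entry sets `tls: true`, so only requests to the main process go through the TLS policy.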

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release O-Uncommon Most users are unlikely to come across this or unexpected workflow labels Jun 8, 2023
erikjohnston added a commit that referenced this issue Jun 8, 2023
@erikjohnston
Member

> I don't know why mypy didn't pick that up.

... because context_factory is an IPolicyForHTTPS, which doesn't have type hints.
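
A small sketch of why missing hints hide this kind of bug: mypy treats unannotated parameters as `Any`, so a `str` argument is accepted without complaint. Both classes below are hypothetical stand-ins, not the real Twisted interface.

```python
from typing import get_type_hints


class UntypedPolicy:
    # Hypothetical stand-in for an interface declared without annotations.
    # A type checker has nothing to compare the arguments against.
    def creatorForNetloc(self, hostname, port):
        return hostname.decode("ascii")


class TypedPolicy:
    # The same method with annotations; passing a str for `hostname`
    # would now be reportable by a type checker.
    def creatorForNetloc(self, hostname: bytes, port: int) -> str:
        return hostname.decode("ascii")


# Inspect what annotations are available at runtime for each method.
untyped_hints = get_type_hints(UntypedPolicy.creatorForNetloc)
typed_hints = get_type_hints(TypedPolicy.creatorForNetloc)
```

Here `untyped_hints` is empty, while `typed_hints` records `bytes` for `hostname`.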
