Seemingly random 502 Server Errors keep killing my quantum computations and a suggested workaround #291

@cgogolin

From time to time I see the following error when using the IBM backend. I would like to know whether you are experiencing this too, and whether you have an idea what the root cause is:

While running a circuit that normally executes just fine, I get the following exception log:

../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:304: in flush
    self.receive([Command(self, FlushGate(), ([WeakQubitRef(self, -1)],))])
../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:266: in receive
    self.send(command_list)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <projectq.cengines._main.MainEngine object at 0x7fbfb2ee2ac8>, command_list = [<projectq.ops._command.Command object at 0x7fbfb2ee2630>]

    def send(self, command_list):
        """
            Forward the list of commands to the next engine in the pipeline.
    
            It also shortens exception stack traces if self.verbose is False.
            """
        try:
            self.next_engine.receive(command_list)
        except:
            if self.verbose:
                raise
            else:
                exc_type, exc_value, exc_traceback = sys.exc_info()
                # try:
                last_line = traceback.format_exc().splitlines()
                compact_exception = exc_type(str(exc_value) +
                                             '\n raised in:\n' +
                                             repr(last_line[-3]) +
                                             "\n" + repr(last_line[-2]))
                compact_exception.__cause__ = None
>               raise compact_exception  # use verbose=True for more info
E               Exception: Failed to run the circuit. Aborting.
E                raised in:
E               '  File "/home/cgogolin/.local/lib/python3.5/site-packages/projectq/backends/_ibm/_ibm.py", line 295, in _run'
E               '    raise Exception("Failed to run the circuit. Aborting.")'

../../.local/lib/python3.5/site-packages/projectq/cengines/_main.py:288: Exception

and on the console I then see:

- There was an error running your code:
502 Server Error: Bad Gateway for url: https://quantumexperience.ng.bluemix.net/api/users/login

The frequency of this error seems to be independent of the type of circuit I run. It also occurs regardless of which internet connection I use, so I can rule out simple connection problems on my end.

Running with verbose=True reveals that the error originates in _run(self), around line 260 of _ibm.py, namely:

>           counts = res['data']['counts']
E           TypeError: 'NoneType' object is not subscriptable

i.e., res = send(...) returned None instead of actual results.

A straightforward workaround for me is to simply call send(...) until it returns a non-None result, e.g., as follows:

    if self._retrieve_execution is None:
        res = None
        retries = 10
        while res is None and retries > 0:
            retries -= 1
            res = send(info, device=self.device,
                       user=self._user, password=self._password,
                       shots=self._num_runs, verbose=self._verbose)

In practice I virtually never need more than a second attempt to get a result. This makes me believe that the problem is not caused by my sending too many queries or by some other rate-limiting mechanism.

I have found other people reporting similar spurious 502 errors on Bluemix. Their applications are (probably) not quantum related at all, so maybe we are just suffering from some classical middleware misconfiguration?

Could/Should ProjectQ handle such errors more gracefully?
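One possible form of more graceful handling, sketched under assumptions: check for a missing result before indexing into it, so that the user sees a descriptive error instead of the TypeError shown above. The name extract_counts and the error message are hypothetical and not taken from ProjectQ.

```python
def extract_counts(res):
    """Return res['data']['counts'] or raise a descriptive error.

    Hypothetical helper; `res` stands for the value returned by send().
    """
    if res is None:
        raise RuntimeError(
            "Backend returned no result (res is None); the request may "
            "have failed on the server side, e.g. with a 502.")
    return res["data"]["counts"]


# Without the guard, indexing None raises the TypeError from the log.
res = None
try:
    res["data"]["counts"]
except TypeError as err:
    unguarded_error = str(err)

# With the guard, the failure is reported explicitly.
try:
    extract_counts(res)
except RuntimeError as err:
    guarded_error = str(err)

print(unguarded_error)
print(guarded_error)
```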

Digging a little deeper, I see that send() calls _get_result(), which already has a retry mechanism built in. The only way I can see in which _get_result(), and then send(), can return None without raising an Exception is if the JSON returned by requests.get() contains the element r_json['qasms'][0]['result'] and this element is None. I can thus also fix the problem by adding the condition and qasm['result'] is not None to the second-to-last line of the following code in _get_result():

    for retries in range(num_retries):
        r = requests.get(urljoin(_api_url, suffix),
                         params={"access_token": access_token})
        r.raise_for_status()

        r_json = r.json()
        if 'qasms' in r_json:
            qasm = r_json['qasms'][0]
            if 'result' in qasm and qasm['result'] is not None:
                return qasm['result']

On a related note: aren't the default values num_retries=3000 and interval=1 of _get_result() a bit long? Together they give a total waiting time of nearly one hour (3000 × 1 s = 50 min, not counting the time spent in the requests themselves) before the timeout. Wouldn't it be nice to make them user-customizable?
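A minimal sketch of what a polling loop with user-adjustable limits could look like, combining it with the None check from the fix above. Everything here is an assumption for illustration: poll_result, fetch_json, and the injectable sleep argument are hypothetical and not the actual _get_result() signature, and the HTTP request is replaced by a callable so the example runs without network access.

```python
import time


def poll_result(fetch_json, num_retries=3000, interval=1.0,
                sleep=time.sleep):
    """Poll fetch_json() until a non-None result shows up.

    fetch_json is a hypothetical callable returning the decoded JSON of
    one status request. Raises TimeoutError after num_retries attempts.
    """
    for _ in range(num_retries):
        r_json = fetch_json()
        qasms = r_json.get("qasms") or []
        if qasms:
            result = qasms[0].get("result")
            if result is not None:  # treat a None result as "not ready"
                return result
        sleep(interval)
    raise TimeoutError(
        "No result after {} attempts ({:.0f} s)".format(
            num_retries, num_retries * interval))


# Fake responses: missing result, None result, then a real one.
responses = iter([
    {"qasms": [{}]},
    {"qasms": [{"result": None}]},
    {"qasms": [{"result": {"data": {"counts": {"0": 1024}}}}]},
])
waits = []  # record the requested pauses instead of actually sleeping
result = poll_result(lambda: next(responses), num_retries=5,
                     interval=0.5, sleep=waits.append)
print(result, waits)
```

Exposing num_retries and interval lets a user trade waiting time against the risk of giving up too early, and raising on timeout avoids silently returning None to the caller.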
