travis-ci build fix using custom test runner #180

Closed

Stypox wants to merge 16 commits into TeamNewPipe:dev from Stypox:cl-build-fix

Conversation

@Stypox
Member

@Stypox Stypox commented Jul 31, 2019

Briefly

This pull request introduces a custom test runner for service tests in NewPipeExtractor. Service tests connect to and fetch data from servers, which can cause problems such as too-many-requests errors and ReCaptchaExceptions. See #177

Features

The NewPipeTestRunner works the same way as the default JUnit runner: @Test (with optional parameters timeout and expected), @Ignore (with optional reason), @Before, @After, @BeforeClass and @AfterClass are supported.
The only differences are these:

  • When a test function throws a ReCaptchaException, that function is marked as ignored.
  • When the failedTests/totalNotIgnoredTests ratio is above failRatioTriggeringIgnore (customizable), all failing test functions are marked as ignored. The servers probably returned something wrong due to overload, so all test methods relying on the server response cannot work.
  • The time each test takes is displayed as 0ms: results are first collected from every test function, then the failedTests/totalNotIgnoredTests ratio is calculated from those results, and only then is the RunNotifier notified (checking the results takes less than 1ms). This is a side effect of the implementation, but not a big deal.

In order to customize failRatioTriggeringIgnore I introduced the NewPipeTestRunnerOptions annotation, which is mandatory on tests run with NewPipeTestRunner. The annotation allows choosing a value for failRatioTriggeringIgnore. See this file for more information about it.

Every ignored/failed test generates a message explaining the reason and containing the stack trace.

Example

@RunWith(NewPipeTestRunner.class)
@NewPipeTestRunnerOptions(failRatioTriggeringIgnore = 0.6f)
public class MyServiceExtractorTest { /* ... */ }
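
For reference, here is a minimal sketch of what such an options annotation could look like (this is an assumption for illustration, not necessarily the exact annotation in this PR):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical sketch of the options annotation; the real NewPipeTestRunnerOptions may differ.
@Retention(RetentionPolicy.RUNTIME) // must be retained so the runner can read it via reflection
@Target(ElementType.TYPE)           // meant to be placed on test classes
public @interface NewPipeTestRunnerOptions {
    // Ratio of failed tests (over the non-ignored ones) above which all failures are ignored.
    float failRatioTriggeringIgnore();
}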

Feedback requested

travis-ci servers are very fast, so test requests hit the services in rapid succession. The custom runner ignores failed tests when the service being tested returned an invalid response, but this could lead to many ignored tests. There are three ways to mitigate this: add some delay between tests to reduce the load, retry a test if it failed, or a combination of both. Let me know your opinion.
Another thing I'd like to get feedback on is how the value for failRatioTriggeringIgnore should be chosen. Is 0.6 = 60% ok?

Other improvements to tests

  • Add a base class for YouTube stream extractor tests
  • Assert that the stream URLs' response code is not an error (e.g. not 404); see the sketch below
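
As an illustration of the second point, this is a minimal sketch of how such an assertion could look (the helper name and the use of a plain HttpURLConnection HEAD request are assumptions, not necessarily what the PR does):

import java.net.HttpURLConnection;
import java.net.URL;

import static org.junit.Assert.assertTrue;

// Hypothetical helper, just to demonstrate the idea of checking stream urls.
public final class UrlAssertions {
    public static void assertUrlIsNotAnError(String url) throws Exception {
        final HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("HEAD"); // avoid downloading the whole stream
        final int responseCode = connection.getResponseCode();
        connection.disconnect();
        // 4xx and 5xx response codes (e.g. 404) mean the stream url is not usable
        assertTrue("Got error response code " + responseCode + " for " + url,
                responseCode < 400);
    }
}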

TODO

  • Add NewPipeTestRunner to YouTube tests
  • Add NewPipeTestRunner to SoundCloud tests
  • Add NewPipeTestRunner to MediaCCC tests

@Stypox Stypox changed the title Cl build fix tarvis-ci build fix Aug 1, 2019
@Stypox Stypox changed the title tarvis-ci build fix tarvis-ci build fix using custom test runner Aug 1, 2019
@Stypox
Member Author

Stypox commented Aug 1, 2019

^ I added the description now, along with feedback requests :-)
@theScrabi @TobiGr @yausername

@Stypox Stypox changed the title tarvis-ci build fix using custom test runner travis-ci build fix using custom test runner Aug 1, 2019
@Stypox Stypox added the enhancement (New feature or request) label Aug 17, 2019
@mauriciocolli
Contributor

Had a quick overview; my main concern is that there is a chance that some bugs/errors will go unnoticed. I imagine checking the logs frequently would become part of the workflow if this is merged.

The time it takes to conduct any test is displayed as 0ms, since at first results are collected from every test function [...]

It's very inconvenient not to know which test is currently running. This is only noticeable when using an IDE, but it is annoying when debugging, for example.

It seems that this is indeed not fixable: in my tests, if fireTestIgnored is called after a fireTestStarted, it just duplicates the result in the IDE's dashboard (I guess it is considered a terminal event). Using fireTestAssumptionFailed is possible, but it doesn't matter, because there's another issue that messes with the results of tests that contain any error: they will have a fireTestStarted, but the fireTestFailure/fireTestFinished will have to wait, so the dashboard hangs and does not update the next results until all of them finish running.

[...] add some delay between tests to reduce overload, retry a test if it failed, or a combination of both.

A combination of them would be more appropriate here I think, maybe exponential backoff (when retrying each test) would be a good fit.
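
For illustration, a minimal sketch of such a combination: a retry loop whose delay grows exponentially after each failure (the names and the base delay are assumptions, just to demonstrate the idea):

// Hypothetical sketch: retry an action, waiting exponentially longer after each failure.
public final class RetryWithBackoff {
    public static void run(final int maxAttempts, final long baseDelayMs, final Runnable action)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                action.run();
                return; // success, no need to retry
            } catch (RuntimeException e) {
                if (attempt == maxAttempts - 1) {
                    throw e; // out of attempts, let the failure propagate
                }
                // wait baseDelayMs, then 2 * baseDelayMs, then 4 * baseDelayMs, ...
                Thread.sleep(baseDelayMs * (1L << attempt));
            }
        }
    }
}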

Another thing I'd like to get feedback on is how the value for failRatioTriggeringIgnore should be chosen. Is 0.6 = 60% ok?

Not sure how it should be chosen, but 60% seems like an OK value.


One thing about your implementation though: I think you overcomplicated things by subclassing Runner directly; it'd be better to subclass the (default) BlockJUnit4ClassRunner instead. It'd also fix the bug that you can't run individual tests anymore.

Look at this similar runner example using it (not exactly the same behavior, just to demonstrate).
import java.util.ArrayList;
import java.util.List;

import javax.annotation.Nullable;

import org.junit.AssumptionViolatedException;
import org.junit.Ignore;
import org.junit.runner.Description;
import org.junit.runner.notification.Failure;
import org.junit.runner.notification.RunNotifier;
import org.junit.runners.BlockJUnit4ClassRunner;
import org.junit.runners.model.FrameworkMethod;
import org.junit.runners.model.InitializationError;

// ReCaptchaException comes from NewPipeExtractor; the exact package path is assumed here.
import org.schabi.newpipe.extractor.exceptions.ReCaptchaException;

public class NewPipeTestRunner extends BlockJUnit4ClassRunner {

    public NewPipeTestRunner(Class<?> klass) throws InitializationError {
        super(klass);
    }

    private int testFailCount = 0;
    private int testIgnoredByUserCount = 0;
    private final float failRatioTriggeringIgnore = .6f; // Get from annotation

    private final List<MethodResultCollector> result = new ArrayList<>();

    private class MethodResultCollector {
        private final Description methodDescription;
        private final boolean ignoredByUser;
        private String ignoredByUserReason = "";
        private final Throwable thrownException;

        MethodResultCollector(FrameworkMethod frameworkMethod, Description description, Throwable thrownException) {
            this.ignoredByUser = frameworkMethod.getMethod().isAnnotationPresent(Ignore.class);
            this.thrownException = thrownException;
            this.methodDescription = description;

            if (ignoredByUser) {
                ignoredByUserReason = frameworkMethod.getMethod().getAnnotation(Ignore.class).value();
            }
        }

        boolean isIgnoredByUser() {
            return ignoredByUser;
        }

        boolean isSuccessful() {
            return thrownException == null;
        }

        void showResults(RunNotifier notifier, boolean ignoreAnyError) {
            if (ignoredByUser) {
                notifier.fireTestIgnored(methodDescription);
                System.out.println(methodDescription.getMethodName() + "() ignored because of @Ignore" +
                        (!ignoredByUserReason.isEmpty() ? (": " + ignoredByUserReason) : ""));
            } else if (thrownException != null) {
                if (thrownException instanceof AssumptionViolatedException) {
                    notifier.fireTestAssumptionFailed(new Failure(methodDescription, thrownException));
                    System.out.println(methodDescription.getMethodName() + "() ignored since it threw an AssumptionViolatedException, " +
                            "message = " + thrownException.getMessage());
                    thrownException.printStackTrace(System.out);

                } else if (thrownException instanceof ReCaptchaException) {
                    notifier.fireTestIgnored(methodDescription);
                    System.out.println(methodDescription.getMethodName() + "() ignored since it threw a ReCaptchaException");
                    thrownException.printStackTrace(System.out);

                } else if (ignoreAnyError) {
                    notifier.fireTestIgnored(methodDescription);
                    System.out.println(methodDescription.getMethodName() + "() ignored since the whole class " + getTestClass().getName() +
                            " has more than " + (int) (failRatioTriggeringIgnore * 100) + "% of failed tests");
                    thrownException.printStackTrace(System.out);

                } else {
                    notifier.fireTestStarted(methodDescription);
                    notifier.fireTestFailure(new Failure(methodDescription, thrownException));
                    notifier.fireTestFinished(methodDescription);
                }
            } else {
                notifier.fireTestStarted(methodDescription);
                notifier.fireTestFinished(methodDescription);
            }
        }

    }

    // First let the parent runner execute every test method (runChild below only records the
    // outcomes), then compute the fail ratio and fire all notifications in a second pass.
    @Override
    public void run(RunNotifier notifier) {
        super.run(notifier);

        for (MethodResultCollector resultCollector : result) {
            if (!resultCollector.isSuccessful()) testFailCount++;
            else if (resultCollector.isIgnoredByUser()) testIgnoredByUserCount++;
        }

        final int notIgnoredTestCount = testCount() - testIgnoredByUserCount;
        final boolean ignoreAll = notIgnoredTestCount > 1 && (float) testFailCount / notIgnoredTestCount >= failRatioTriggeringIgnore;

        for (MethodResultCollector resultCollector : result) {
            resultCollector.showResults(notifier, ignoreAll);
        }
    }


    // Runs each test method but only records its outcome; the notifications are fired later, in run() above.
    @Override
    protected void runChild(FrameworkMethod method, RunNotifier notifier) {
        final Description description = describeChild(method);

        @Nullable Throwable error = null;
        try {
            methodBlock(method).evaluate();
        } catch (Throwable e) {
            error = e;
        }

        result.add(new MethodResultCollector(method, description, error));
    }
}

So, to sum things up about this:

Pros:

  • Would let tests succeed in case of a service outage or instability.

Cons:

  • Sudden changes to the service would lead to ignoring the (real) failing tests.
  • False positives of the PR/tests passing when in reality they are not.
  • Need to check the log every time (questionable, but a con anyway).
  • Messed up timing in tests (matters when using an IDE).
  • If a test is ignored, we would have to either trigger a rebuild or test locally anyway; we couldn't trust that it is passing if it's ignored.

In the end, I ended up with more cons than pros about this; tell me your opinion about the list.

I believe a simple retry (with the aforementioned exponential backoff) in the downloader would be enough for this project. And if the ReCaptchas are the problem, I don't think just ignoring them is the solution (refer to the last con in the list).

Doing it like youtube-dl, that is, separating the core tests from the service tests, which are classified as "Allowed Failures" in the CI, also sounds good.

import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NewPipeTestRunner extends Runner {
Contributor

Refer to my comment about the superclass. Some reasons are:

  • Individual tests can't be run anymore.
  • Hard-to-track bugs may arise from the custom implementation, as it isn't tested.

@Stypox
Member Author

Stypox commented Oct 28, 2019

Thank you for the feedback; I will answer and try to address the issues you pointed out in the next few days :-D

  • It works the same as the default JUnit runner except for these things:
    - when a test function throws a `ReCaptchaException`, the test is ignored with a message and a stack trace;
    - when the failedTests/totalNotIgnoredTests ratio is above `errorRatioTriggeringIgnore`, all failing tests for that class are ignored (probably YouTube returned something wrong);
    - the time it takes to conduct any test is displayed as 0ms.
  • Instead of the constant variable used before, now the user can define it. For more info see the documentation in NewPipeTestRunnerOptions.java.
  • Now exceptions coming from the implementation are treated correctly (causing the program to crash) instead of only making the current test fail.
  • It is >= 1.0f so that the runner never ignores any failed test, disabling the feature.
  • This removes much of the code that was used to get and invoke methods. Now the runner class is also able to run individual tests.
  • It is too risky to silently ignore tests that actually failed. The fact that many tests in a class fail indicates that there is a problem in the set-up methods (`@BeforeClass`). We can't assume that the problem in the set-up methods is a network error, since it could just as well be a programming error.
  • Also `ReCaptchaException`s are now considered a violated assumption (same as AssumptionViolatedException), since that's what they are. This allows calling `fireTestStarted` before invoking the tested method, thus correctly measuring the elapsed time.
  • This will prevent overloads on the services' servers.
  • Apparently age-restricted videos have no related videos (on Invidious it's the same, too).
  • Thanks to the base class YoutubeStreamExtractorBaseTest.
@Stypox
Member Author

Stypox commented Nov 4, 2019

I have now removed failRatioTriggeringIgnore: after thinking more about it, I agree with you.
This makes it possible to run tests normally (i.e. without caching results for later), so time is now measured properly.

I added three new options to the NewPipeTestRunnerOptions annotation (see the sketch after this list):

  • classDelayMs: how many milliseconds to wait before executing the annotated class.
  • methodDelayMs: how many milliseconds to wait before executing each test method in the annotated class.
  • retry: how many times a failing test should be retried before it is deemed failed.
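
A minimal sketch of how a test class could use these options (the concrete values below are just an example, not recommendations):

@RunWith(NewPipeTestRunner.class)
@NewPipeTestRunnerOptions(
        classDelayMs = 5000,  // wait 5 seconds before the first test of this class
        methodDelayMs = 1000, // wait 1 second before each test method
        retry = 3             // retry a failing test up to 3 times before reporting it as failed
)
public class MyServiceExtractorTest { /* ... */ }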

The one timing-related thing I didn't implement is exponential backoff, because I have no idea how it should work (which base should I use? Is it OK for the exponent to just be the current retry count?)

I switched to BlockJUnit4ClassRunner and the code shrunk by a lot. Thanks for the suggestion ;-)

@mauriciocolli

@Stypox
Member Author

Stypox commented Apr 10, 2020

Closing, as the only useful part of this PR is now in #309
