Add workarounds for anti-framing scripts #636

Open
Mr0grog opened this issue Nov 11, 2020 · 3 comments

@Mr0grog
Member

Mr0grog commented Nov 11, 2020

Some pages have anti-framing/anti-clickjacking code that checks whether the page is in a frame and, if so, hides the content and/or attempts to redirect the top frame to the page. For example, https://www.census.gov/programs-surveys/economic-census.html has this code in its <head>:

<head>
  ...
  <style id="antiClickjack">body { display: none; }</style>
  <script type="text/javascript">
    if (self === top) {
      var antiClickjack =  document.getElementById("antiClickjack");
      antiClickjack.parentNode.removeChild(antiClickjack);
    } else {
      top.location = self.location
    }
  </script>
  ...
</head>

Since we show pages in iframes, this is a problem. We set restrictions on the frame’s code so it can’t redirect the top frame, but this still leaves us with a blank page (and in some cases, a broken page because the script might throw an exception). Some workaround ideas:

  • Inject a script that runs after page load (or maybe just at the end of the page?) and checks whether the html or body element’s computed style has display: none or visibility: hidden. If so, explicitly set that element’s style to display: block and visibility: visible. Something like:

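    // `document.body` can be null if this runs too early, so filter it out.
    var elements = [document.documentElement, document.body].filter(Boolean);
    // Read every computed style before writing any styles: interleaving reads
    // and writes forces repeated style recalculation (layout thrashing).
    var hiddenElements = elements.filter(function isHidden(element) {
        var style = getComputedStyle(element);
        return style.display === 'none' || style.visibility === 'hidden';
    });
    hiddenElements.forEach(function ensureVisible(element) {
        element.style.display = 'block';
        element.style.visibility = 'visible';
    });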

    Some downsides: it won’t fix pages whose scripts errored out, and it won’t work if the thing being hidden is some arbitrary wrapper element in the page (although we could maybe come up with some heuristics for that).

  • Wrap the contents of every inline script on the page in a with block whose object is a proxy for the window. For example, we’d transform the above example from census.gov to:

    <head>
      ...
      <style id="antiClickjack">body { display: none; }</style>
    
      <!-- Insert this element before the first <script> tag -->
      <script type="text/javascript">
        // Create a fake `window` object that makes `self` and `top` look identical.
        if (window.Proxy) {
          window.WINDOW_PROXY = new Proxy(window, {
            get (target, prop, receiver) {
              if (prop === "top" || prop === "self" || prop === "window") {
                return receiver;
              }
              return Reflect.get(target, prop, target);
            }
          });
        }
        else {
          window.WINDOW_PROXY = {self: window, top: window};
        }
      </script>
    
      <!-- Wrap the contents of any <script> tags in `with (WINDOW_PROXY) {...}` -->
      <script type="text/javascript">
        // Wrap the original contents of the script so properties are grabbed from
        // a special proxy object.
        with (WINDOW_PROXY) {
          if (self === top) {
            var antiClickjack =  document.getElementById("antiClickjack");
            antiClickjack.parentNode.removeChild(antiClickjack);
          } else {
            top.location = self.location
          }
        }
      </script>
      ...
    </head>

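    Also not perfect: it only covers scripts that are in the page, not external references (i.e. <script src="some_url"></script>), and it can’t be applied to strict-mode scripts, where with is a syntax error. Calling native window methods through the proxy (e.g. setTimeout(...) inside the with block) may also throw, since their this value would be the proxy rather than the real window. The fallback version that doesn’t use Proxy could be error-prone in other ways (maybe just don’t support that case?). We could also expand this approach to work around some of the errors the iframe sandbox causes (e.g. reading or setting document.cookie).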

  • REALLY complex: add a service worker to do essentially the above to external scripts. This probably won’t work in a lot of cases (service workers don’t always apply) and may not really be worthwhile. It’s probably better accomplished by something even messier: rewriting all [script] URLs so that the front-end server proxies them, and having it do this wrapping. On the other hand, proxying & rewriting (kinda like Wayback/the Memento API) would solve lots of other issues, like CORS problems.

  • Any other ideas? These are the only two obvious approaches that jump out at me.
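
The core of the with/Proxy idea can be sanity-checked outside a browser. This is a minimal sketch that simulates it in Node with plain stand-in objects: framedWindow and parentWindow are made-up names for illustration, not real browser objects. It reuses the same get trap as the WINDOW_PROXY example and uses new Function so that with is allowed even if the surrounding file is strict-mode module code.

```javascript
// Hypothetical stand-ins for a framed page's window and its parent (top).
// In a browser these would be the real `window` and `window.top`.
const parentWindow = { name: "parent" };
const framedWindow = { name: "frame", document: { title: "Economic Census" } };
framedWindow.self = framedWindow;
framedWindow.window = framedWindow;
framedWindow.top = parentWindow;

// Same trap as the WINDOW_PROXY example: self/top/window all return the proxy,
// everything else is read from the real object.
const windowProxy = new Proxy(framedWindow, {
  get(target, prop, receiver) {
    if (prop === "top" || prop === "self" || prop === "window") {
      return receiver;
    }
    return Reflect.get(target, prop, target);
  },
});

// Functions created with `new Function` are sloppy mode unless their body says
// otherwise, so `with` is legal here. The body stands in for a page's inline script.
const runFramed = new Function(
  "WINDOW_PROXY",
  "with (WINDOW_PROXY) { return self === top ? 'shown' : 'redirect'; }"
);

// Without the proxy, the page can tell it is framed.
console.log(framedWindow.self === framedWindow.top); // prints false
// Through the proxy, the anti-framing check passes.
console.log(runFramed(windowProxy)); // prints shown
// The non-Proxy fallback ({self: window, top: window}) also passes this check.
console.log(runFramed({ self: framedWindow, top: framedWindow })); // prints shown
```

Names that aren’t on the proxy’s target (or on the fallback object) fall through to the normal scope chain, which is why the plain-object fallback works for this particular check.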

It might make the most sense to do a combination of the above. We could also push this into the HTML differ instead of doing it here in the front-end.
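
If the script wrapping happens server-side (in the differ or elsewhere), it could start as a string rewrite like the sketch below. Assumptions: wrapInlineScripts and the condensed bootstrap are hypothetical; the regex pass is naive (a real version should use an HTML parser); and it deliberately skips external scripts, module scripts, non-JavaScript types such as JSON data blocks, and scripts that start with a "use strict" directive, since with is a syntax error in strict code.

```javascript
// Hypothetical bootstrap, condensed from the first <script> in the example
// above. It defines WINDOW_PROXY and must run before any wrapped script.
const BOOTSTRAP = `<script>
window.WINDOW_PROXY = window.Proxy
  ? new Proxy(window, {
      get(target, prop, receiver) {
        if (prop === "top" || prop === "self" || prop === "window") return receiver;
        return Reflect.get(target, prop, target);
      }
    })
  : { self: window, top: window };
</script>`;

// `type` values treated as classic JavaScript. Module scripts are excluded
// because they are always strict mode.
const CLASSIC_TYPES = new Set([
  "", "text/javascript", "application/javascript",
  "text/ecmascript", "application/ecmascript",
]);

// Naive pattern: captures a <script> tag's attributes and its body.
const SCRIPT_PATTERN = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;

function wrapInlineScripts(html) {
  let wrappedAny = false;
  const rewritten = html.replace(SCRIPT_PATTERN, (whole, attrs, body) => {
    const typeMatch = attrs.match(/(?:^|\s)type\s*=\s*["']?([^"'\s>]*)/i);
    const type = typeMatch ? typeMatch[1].toLowerCase() : "";
    const isExternal = /(?:^|\s)src\s*=/i.test(attrs);
    const isStrict = /^\s*(["'])use strict\1/.test(body);
    if (isExternal || isStrict || !CLASSIC_TYPES.has(type) || !body.trim()) {
      return whole; // leave this <script> untouched
    }
    wrappedAny = true;
    return `<script${attrs}>with (WINDOW_PROXY) {\n${body}\n}</script>`;
  });
  if (!wrappedAny) return html;

  // Insert the bootstrap right before the first <script> tag of any kind.
  const first = rewritten.search(/<script\b/i);
  return rewritten.slice(0, first) + BOOTSTRAP + rewritten.slice(first);
}
```

One side effect to keep in mind: wrapping a script in a block changes its scoping, so top-level let, const, and class declarations become local to the with block and are no longer visible to later scripts on the page (var declarations still become globals).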

Mr0grog added the bug label Nov 11, 2020
@Mr0grog
Member Author

Mr0grog commented Feb 17, 2021

FWIW, the “right” long-term solution is that we need to serve pages and diffs through a proxy (which probably needs its own subdomain to be safe) that acts kind of like a Memento API, and maybe uses Wombat (from PyWB). That’s a lot of work, though, and I’m thinking of smaller improvements we can make here.
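
As a rough illustration of the URL-rewriting half of that proxy (also the script-URL idea from the service-worker bullet above), script src attributes could be pointed at a proxy endpoint that fetches and wraps each script. This is only a sketch: the /proxy/script?url= endpoint and proxyScriptUrls are hypothetical, relative URLs are resolved against the page’s original URL with the standard URL constructor, and unquoted src values and HTML entities inside URLs are not handled.

```javascript
// Hypothetical endpoint on the front-end server that would fetch a script,
// wrap it in `with (WINDOW_PROXY) {...}`, and return it.
const PROXY_ENDPOINT = "/proxy/script";

// Rewrites quoted `src` attributes on <script> tags to go through the proxy.
// `pageUrl` is the archived page's original URL, used to resolve relative srcs.
function proxyScriptUrls(html, pageUrl) {
  return html.replace(
    /(<script\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi,
    (whole, prefix, quote, src) => {
      let absolute;
      try {
        absolute = new URL(src, pageUrl).href;
      } catch {
        return whole; // leave URLs we can't parse alone
      }
      const proxied = `${PROXY_ENDPOINT}?url=${encodeURIComponent(absolute)}`;
      return `${prefix}${quote}${proxied}${quote}`;
    }
  );
}
```

A Memento/Wayback-style proxy would apply the same kind of rewriting to other subresources (stylesheets, images, XHR targets), which is the part Wombat handles on the client side.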
