Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cookie test to serializer #56

Merged
merged 8 commits into from
Aug 9, 2023
Merged

Add cookie test to serializer #56

merged 8 commits into from
Aug 9, 2023

Conversation

patrickheeney
Copy link
Contributor

@patrickheeney patrickheeney commented Apr 4, 2023

Research issue #55

expect(res).toBe(
"\
WARC/1.0\r\n\
content-length: 62\r\n\
Copy link
Member

@ikreymer ikreymer Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should have the keepHeadersCase apply only to httpHeaders, and always use a Map for the WARC headers. WARC doesn't really have any special fields like Set-Cookie and would be concerned about breaking compatibility with other tools if we produced lowercase WARC headers. Simplest solution is probably to always use a Map for WARC headers and only apply this to the http headers.

Can make that change here:
https://github.com/webrecorder/warcio.js/blob/main/src/lib/warcrecord.ts#L106

Perhaps should even change the flag to keepHttpHeadersCase, though that'd be more of a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I thought about a bigger refactor of the httpHeaderCase but left it out of this PR because of breaking change. I thought what about requiring httpHeaders to be Headers class or Map and let the user decide instead of the library handling it. Or default to using Headers internally and have the keepHeadersCase convert to map before returning.

Copy link
Contributor Author

@patrickheeney patrickheeney Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bigger issue now that node supports multiple cookies natively is that going forward keepHeadersCase=true will not parse or create valid warc responses due to multiple set-cookie responses without a lot of internally handling with storing an array in the map and getting it back out. Might be better to depreciate that and require 18.14.2 with a breaking change at some point to "use the platform."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is solvable in user-land as well or a compatibility layer depending on what you want to maintain. The user can pass in a Map for old functionality or instance of Headers if case sensitive, and pass to native Headers for object or array.

class CaseSensitiveHeaders extends Headers {
    sensitiveNames = new Map();

    constructor(init) {
        super(init);
    }

    append(name, value) {
        this.sensitiveNames.set(name.toLowerCase(), name);
        return super.append(name, value);
    }

    * [Symbol.iterator]() {
        for (const [lowercaseName, sensitiveName] of this.sensitiveNames) {
            yield [sensitiveName, super.get(lowercaseName)];
        }
    }
}

const t2 = new CaseSensitiveHeaders();
t2.append('Header', 'value');
t2.append('header2', 'value2');
for (const [key, value] of t2) {
    console.log(key, value);
}

Or push the issues upstream since there is the .name in native case but not sure there is an easy way to access it.

const t = new Headers();
t.append('Header', 'value');
t.append('header2', 'value2');
console.log(t);

HeadersList {
  cookies: null,
  [Symbol(headers map)]: Map(2) {
    'header' => { name: 'Header', value: 'value' },
    'header2' => { name: 'header2', value: 'value2' }
  },
  [Symbol(headers map sorted)]: null
}

Copy link
Contributor Author

@patrickheeney patrickheeney Apr 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARC doesn't really have any special fields like Set-Cookie and would be concerned about breaking compatibility with other tools if we produced lowercase WARC headers. Simplest solution is probably to always use a Map for WARC headers and only apply this to the http headers.

The problem with Map for the WARC headers is accessing them in a case insensitive way later. warcHeaders.get('Warc-Record-ID'). You'd have to write your own Map to convert everything to lowercase internally which is what Headers is already doing. However I agree lowercase is not a great "look" for these. Could look at canonicalizing them afterwards or internally with your own map implementation like https://github.com/datatogether/warc/blob/master/header.go#L23 . It would be something like Capital-Letter-Except-WARC-And-ID should get close with maybe a simple regex. A bit more maintenance cost and technically they are supposed to be case-insensitive anyway. Or let the user decide again by passing in their own Map or Headers if they don't like the defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thought (sorry for the random string of thoughts). Maybe there needs to be a custom map/header implementation no matter what since the goal of these types of projects is archiving. To some that may mean preserving the case, but more importantly it means saving the headers as the server sends them which may mean invalid to spec headers that Map and Headers is not going to account for. Or should you be true to the standards of what the spec allows which means you are not archiving it exactly as delivered.

One class similar to to node-fetch's headers that would handle an array of values of any type (Set-Cookie, X-Robots-Tag, or any custom values that are outside the spec), perhaps canonicalizing them, preserving, or lowercasing.

@ikreymer ikreymer merged commit 8f8f957 into webrecorder:main Aug 9, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants