-
-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cookie test to serializer #56
Conversation
test/testSerializer.test.ts
Outdated
expect(res).toBe( | ||
"\ | ||
WARC/1.0\r\n\ | ||
content-length: 62\r\n\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should have the keepHeadersCase
apply only to httpHeaders, and always use a Map for the WARC headers. WARC doesn't really have any special fields like Set-Cookie and would be concerned about breaking compatibility with other tools if we produced lowercase WARC headers. Simplest solution is probably to always use a Map for WARC headers and only apply this to the http headers.
Can make that change here:
https://github.com/webrecorder/warcio.js/blob/main/src/lib/warcrecord.ts#L106
Perhaps should even change the flag to keepHttpHeadersCase
, though that'd be more of a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya I thought about a bigger refactor of the httpHeaderCase but left it out of this PR because of breaking change. I thought what about requiring httpHeaders to be Headers class or Map and let the user decide instead of the library handling it. Or default to using Headers internally and have the keepHeadersCase convert to map before returning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bigger issue now that node supports multiple cookies natively is that going forward keepHeadersCase=true
will not parse or create valid warc responses due to multiple set-cookie responses without a lot of internally handling with storing an array in the map and getting it back out. Might be better to depreciate that and require 18.14.2
with a breaking change at some point to "use the platform."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is solvable in user-land as well or a compatibility layer depending on what you want to maintain. The user can pass in a Map for old functionality or instance of Headers if case sensitive, and pass to native Headers for object or array.
class CaseSensitiveHeaders extends Headers {
sensitiveNames = new Map();
constructor(init) {
super(init);
}
append(name, value) {
this.sensitiveNames.set(name.toLowerCase(), name);
return super.append(name, value);
}
* [Symbol.iterator]() {
for (const [lowercaseName, sensitiveName] of this.sensitiveNames) {
yield [sensitiveName, super.get(lowercaseName)];
}
}
}
const t2 = new CaseSensitiveHeaders();
t2.append('Header', 'value');
t2.append('header2', 'value2');
for (const [key, value] of t2) {
console.log(key, value);
}
Or push the issues upstream since there is the .name
in native case but not sure there is an easy way to access it.
const t = new Headers();
t.append('Header', 'value');
t.append('header2', 'value2');
console.log(t);
HeadersList {
cookies: null,
[Symbol(headers map)]: Map(2) {
'header' => { name: 'Header', value: 'value' },
'header2' => { name: 'header2', value: 'value2' }
},
[Symbol(headers map sorted)]: null
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WARC doesn't really have any special fields like Set-Cookie and would be concerned about breaking compatibility with other tools if we produced lowercase WARC headers. Simplest solution is probably to always use a Map for WARC headers and only apply this to the http headers.
The problem with Map for the WARC headers is accessing them in a case insensitive way later. warcHeaders.get('Warc-Record-ID')
. You'd have to write your own Map to convert everything to lowercase internally which is what Headers
is already doing. However I agree lowercase is not a great "look" for these. Could look at canonicalizing them afterwards or internally with your own map implementation like https://github.com/datatogether/warc/blob/master/header.go#L23 . It would be something like Capital-Letter-Except-WARC-And-ID
should get close with maybe a simple regex. A bit more maintenance cost and technically they are supposed to be case-insensitive anyway. Or let the user decide again by passing in their own Map or Headers if they don't like the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thought (sorry for the random string of thoughts). Maybe there needs to be a custom map/header implementation no matter what since the goal of these types of projects is archiving. To some that may mean preserving the case, but more importantly it means saving the headers as the server sends them which may mean invalid to spec headers that Map and Headers is not going to account for. Or should you be true to the standards of what the spec allows which means you are not archiving it exactly as delivered.
One class similar to to node-fetch's headers that would handle an array of values of any type (Set-Cookie, X-Robots-Tag, or any custom values that are outside the spec), perhaps canonicalizing them, preserving, or lowercasing.
always use Map for warc headers, choose map/headers based on keepHeadersCase for http headers only fix warcrecord headers to be Record<string, any>
Research issue #55