Replace Diff-Name with resource name #2

Merged 9 commits on Nov 27, 2023.

99 changes: 69 additions & 30 deletions README.md

* Bandwidth Consumption: Continuously downloading the entire list, especially for minor updates, consumes unnecessary bandwidth.
* Latency: Each full download requires more time than if only the changes were fetched.
* Server Load: Hosting servers experience an unnecessary load.

## Solution


### Changes To Filter Lists Metadata

#### `! Diff-Path`

In order to use the differential update mechanism, we propose adding one new field to the filter list metadata: `Diff-Path`.

This field provides the relative path to the differential file (diff) for the filter list. Applying this diff takes the user's copy of the filter list from its current version to the next one. Crucially, the diff also updates the `Diff-Path` field itself so that it points to the next version's diff. This ensures that the ad blocker always knows where to find the next differential update.

`Diff-Path` also encodes additional information in the file name:

```adblock
${patchName}[-${resolution}]-${epochTimestamp}-${expirationPeriod}.patch[#${resourceName}]
```

* `patchName` - name of the patch file, an arbitrary string to identify the patch.
* `epochTimestamp` - epoch timestamp when the patch was generated (the unit of that timestamp depends on `resolution`, see below).
* `expirationPeriod` - expiration time for the diff update (the unit depends on `resolution`, see below).
* `resolution` - an optional field that specifies the unit of both `expirationPeriod` and `epochTimestamp`: `h` (hours), `m` (minutes), or `s` (seconds). If `resolution` is not specified, `h` is assumed.
* `resourceName` - the name of the resource being patched. It is used to support batch updates; see the [Batch Updates](#batch-updates) section for more details.

The following limitations are imposed on the `Diff-Path`:

* `Diff-Path` MUST be a path relative to the filter list file, e.g. `/list-472234-1.patch` or `../list-472234-1.patch` or similar.
* `Diff-Path` is a mandatory field for enabling the differential updates mechanism.
* `Diff-Path` MUST point to a file whose name conforms to the format described above. If it does not, the field is considered invalid and the differential update mechanism is disabled for the filter list.
* `patchName` MUST be a string of length 1-64 consisting only of ASCII letters, digits, `_` and `.` (no spaces, hyphens, or other special characters). Validation regex: `[a-zA-Z0-9_.]{1,64}`.
* `epochTimestamp` MUST be a valid epoch timestamp (in the unit specified by the `resolution` field).
* `expirationPeriod` MUST be a positive integer.
* `resourceName` is optional; it is explained in the [Resource name](#resource-name) section.

#### Examples

* `list1_v1.0.0-m-28334180-60.patch#list1`
* Patch name is `list1_v1.0.0`.
* Resolution is set to `m` (minutes).
* Timestamp is `28334180` minutes from epoch, i.e. `Wed, 15 Nov 2023 12:20:00 GMT`.
* Expiration period is `60` minutes.
* Resource name is set to `list1`.
* `list1_v1.0.0-472236-1.patch`
* Patch name is `list1_v1.0.0`.
* Resolution is not specified, so it defaults to `h` (hours).
* Timestamp is `472236` hours from epoch, i.e. `Wed, 15 Nov 2023 12:00:00 GMT`.
* Expiration period is `1` hour.
* Resource name is not specified, i.e. the patch does not support batch updates.
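
A minimal Bash sketch of how an ad blocker might split a `Diff-Path` value into the fields described above. The function name `parse_diff_path` and its space-separated output are illustrative assumptions, not part of the spec. The regular expressions follow the `patchName` and resource name rules; the resource name class is written `[a-zA-Z0-9_-]`, which is the same set as `[a-zA-Z0-9-_]` with the hyphen moved to the end so it is unambiguously literal.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not defined by the spec): parse a Diff-Path value and print
# "patchName resolution epochTimestamp expirationPeriod [resourceName]".
# Returns 1 if the file name does not follow the required format.
parse_diff_path() {
  local value="$1"
  # Keep only the file name (plus the optional "#resourceName" part).
  local file="${value##*/}"
  local re='^([a-zA-Z0-9_.]{1,64})(-([hms]))?-([0-9]+)-([0-9]+)\.patch(#([a-zA-Z0-9_-]{1,64}))?$'
  [[ $file =~ $re ]] || return 1

  local name="${BASH_REMATCH[1]}"
  local resolution="${BASH_REMATCH[3]:-h}"   # resolution defaults to hours
  local timestamp="${BASH_REMATCH[4]}"
  local period="${BASH_REMATCH[5]}"
  local resource="${BASH_REMATCH[7]}"

  # expirationPeriod MUST be a positive integer (10# avoids octal parsing).
  (( 10#$period > 0 )) || return 1

  printf '%s %s %s %s' "$name" "$resolution" "$timestamp" "$period"
  [[ -n $resource ]] && printf ' %s' "$resource"
  printf '\n'
}
```

For example, `parse_diff_path '../patches/batch-m-28334120-60.patch#list1'` prints `batch m 28334120 60 list1`. Because `patchName` cannot contain `-`, the hyphen-separated fields can be split without ambiguity.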

#### Resource name

If a list supports batch updates, the `Diff-Path` MUST also have a "hash" part, e.g. `/path.patch#resourceName`. This "hash" is the name of the resource to be patched. In this case, the ad blocker downloads the diff file only once and then applies it to all lists specified in the diff file. See the [Batch Updates](#batch-updates) section for more details.

The rest of this document refers to this part as the "resource name".

* This part is only mandatory when the filter list supports batch differential updates.
* The "hash" part of the URL must be a string of length 1-64. Validation regex: `[a-zA-Z0-9-_]{1,64}`.
* When specified, the `diff name:` directive in the diff file MUST match the resource name; see [Diff Files Format](#diff-files-format) for more details.

#### `! Expires:`

`Expires` continues to work as before: from time to time the ad blocker performs a so-called **"full sync"**. When differential updates are available, it is recommended to set `Expires` to a large value, e.g. `10 days`, so that the ad blocker does not perform full syncs too often.

#### Diff Files Format


`diff name:[name] checksum:[checksum] lines:[lines]`

* `name` - the name of the corresponding filter list, i.e. its [resource name](#resource-name). It is only mandatory when a resource name is specified in the list's `Diff-Path`.
* `checksum` - the expected SHA1 checksum of the file after the patch is applied. This is used to validate the patch.
* `lines` - the number of lines that follow and make up the RCS diff block. Note that `lines` is counted the same way as `wc -l` does, i.e. by counting `\n` characters.

The `diff` directive is optional. If it is not specified, the patch is applied without validation.

Note that the `diff` directive may be extended with additional fields not listed in this spec. Implementations should ignore unknown fields.

> It is recommended to use the `diff checksum:` directive to validate the patching result. This will ensure that the patch is applied correctly and the resulting file is not corrupted.

### Algorithm

1. Refer to the `Diff-Path` to see if a differential update is available.
   * If several lists share the same diff file (i.e. their `Diff-Path` values differ only in the resource name), download the diff file only once. Refer to the [Batch Updates](#batch-updates) section for the details on how batch patches are applied.
   * Calculate the patch expiration time as `(${epochTimestamp} + ${expirationPeriod}) * ${resolution}`, where `${resolution}` stands for the length of one time unit (one hour, one minute, or one second). If the expiration time is in the past, the patch is considered expired and the ad blocker SHOULD attempt to download the update.
1. If the differential update is available, download it and apply it to the current filter list.
   * If the differential update is not empty and has been applied, the `Diff-Path` within the list MUST be updated to point to the next differential update.
   * At this point the ad blocker may decide to wait for a while before checking for the next differential update (see [2. Set Update Timer](#2-set-update-timer)) to avoid overloading the server.
1. If the differential update is not available, the server may signal this by returning one of the following responses:
   * `404 Not Found`
   * `200 OK` with empty content (content length is 0)
   * `204 No Content`

   In this case the ad blocker SHOULD wait for a while and then try again (see [2. Set Update Timer](#2-set-update-timer)).

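The expiration formula and the "no update" responses from the step above can be sketched in Bash as follows. All function names are hypothetical, and the optional "now" argument of `patch_is_expired` is an assumption added only to make the sketch testable. A patch is treated as expired once the current time reaches its expiration time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: length of one resolution unit in seconds.
unit_seconds() {
  case "$1" in
    h) echo 3600 ;;
    m) echo 60 ;;
    s) echo 1 ;;
    *) return 1 ;;
  esac
}

# Expiration time of a patch as a Unix timestamp in seconds:
# (epochTimestamp + expirationPeriod) * unit.
patch_expires_at() {
  local resolution="$1" timestamp="$2" period="$3" unit
  unit=$(unit_seconds "$resolution") || return 1
  echo $(( (10#$timestamp + 10#$period) * unit ))
}

# Succeed if the patch has expired, i.e. an update should be attempted.
# The optional 4th argument overrides the current time.
patch_is_expired() {
  local expires_at now="${4:-$(date +%s)}"
  expires_at=$(patch_expires_at "$1" "$2" "$3") || return 2
  (( now >= expires_at ))
}

# Classify a Diff-Path response from its HTTP status code and body size.
# Prints "patch" when a diff is available, "none" when the server signals
# that there is no update yet, and "error" otherwise.
classify_response() {
  local status="$1" size="$2"
  case "$status" in
    404|204) echo none ;;
    200) if (( size == 0 )); then echo none; else echo patch; fi ;;
    *) echo error ;;
  esac
}
```

For instance, `patch_expires_at h 472234 1` prints `1700046000`, i.e. `Wed, 15 Nov 2023 11:00:00 GMT`. The inputs for `classify_response` could be obtained with `status=$(curl -s -o patch.tmp -w '%{http_code}' "$url")` and `wc -c < patch.tmp`; an `error` result would fall under the fallback mechanism.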
#### 2. Set Update Timer

The update timer depends on the result of the previous update check.

1. If the differential update was not empty and was applied successfully, the ad blocker SHOULD check the expiration time of the new `Diff-Path` file.
   * If the expiration time is in the future, the ad blocker SHOULD wait until that time and then check for the update again.
   * If the expiration time is in the past, the ad blocker SHOULD try to download the new patch and apply it.
1. If the differential update was empty and the list's `Diff-Path` stayed the same, the ad blocker SHOULD delay the next update for at least 30 minutes to avoid overloading the server.
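
A small Bash sketch of the timer rules above. The function `next_check_delay` and its arguments are hypothetical. For the "empty" case, the spec only sets a 30-minute floor; taking the larger of that floor and the time left until the current `Diff-Path` expires is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of seconds to wait before the next update check.
#   outcome    - "applied" (a non-empty patch was applied) or "empty"
#                (no patch was available and Diff-Path did not change)
#   expires_at - expiration time (Unix seconds) of the list's current Diff-Path
#   now        - optional current time, defaults to `date +%s`
next_check_delay() {
  local outcome="$1" expires_at="$2" now="${3:-$(date +%s)}"
  local delay=$(( expires_at - now ))
  case "$outcome" in
    applied)
      # Wait until the new patch expires; if it already has, check right away.
      (( delay < 0 )) && delay=0
      ;;
    empty)
      # Back off for at least 30 minutes to avoid overloading the server.
      (( delay < 1800 )) && delay=1800
      ;;
    *)
      return 1
      ;;
  esac
  echo "$delay"
}
```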

#### 3. Fallback Mechanism

Any unexpected error during the update process SHOULD be treated as a fatal error, and the ad blocker should wait until it is time for the full sync. Note that it should respect the `Expires` value set by the filter list.

### Batch Updates

The mechanism allows a single diff file to serve multiple filter lists. To achieve this, the [resource name](#resource-name) MUST be specified for each filter list that supports batch differential updates. The [resource name](#resource-name) is then used to match a filter list with its corresponding patch in the diff file: the `diff name:` directive in the diff file links each patch to a filter list.

* The list being patched MUST have this exact patch file specified in `Diff-Path`.
* If a filter list specified inside the batch patch is not installed in the ad blocker, the patch for this file SHOULD be ignored.

```adblock
! Title: List 1
! Diff-Path: ../patches/batch-m-28334120-60.patch#list1
```

* `Diff-Path` is relative to the `list1.txt` location, so the final URL of the diff file will be `https://example.com/patches/batch-m-28334120-60.patch`; the `#list1` part is the resource name and is not part of the file URL.
* [Resource name](#resource-name) is set to `list1` here. It is mandatory for lists that support batch differential updates.
* Patch expiration period is set to `60` minutes, creation time is set to `28334120` (Unix timestamp in minutes, i.e. `Wed, 15 Nov 2023 11:20:00 GMT`). The `m` in the file name marks the resolution as minutes.

* List 2

List URL is `https://example.com/list2/list2.txt`.

```adblock
! Title: List 2
! Diff-Path: ../patches/batch-m-28334120-60.patch#list2
```

* `Diff-Path` is relative to the `list2.txt` location, so the final URL of the diff file will be `https://example.com/patches/batch-m-28334120-60.patch`.
* [Resource name](#resource-name) is set to `list2` here. It is mandatory for lists that support batch differential updates.
* Patch expiration period is set to `60` minutes, creation time is set to `28334120` (Unix timestamp in minutes).

* `batch-m-28334120-60.patch`

A file that contains patches for both `list1.txt` and `list2.txt`. It uses the `diff name:` directive to indicate which patch should be applied to which list.

```diff
diff name:list1 checksum:e3c9c883378dc2a3aec9f71578c849891243bc2c lines:3
d2 1
a2 1
! Diff-Path: patches/batch_new.patch#list1
diff name:list2 checksum:be09384422b8d7f20da517d1245360125868f0b9 lines:3
d2 1
a2 1
! Diff-Path: patches/batch_new.patch#list2
```
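
A minimal sketch of how the `diff name:` and `lines:` fields could be used to cut the patch for one resource out of a batch file, written with Bash and `awk`. The function name is hypothetical, and the sketch assumes that every chunk starts with a `diff` directive carrying both `name:` and `lines:`. Lines inside a chunk are counted off rather than parsed, so an RCS body line that happens to start with `diff` is not mistaken for a header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the RCS diff block for one resource name
# from a batch patch file.
extract_resource_patch() {
  local patch_file="$1" resource="$2"
  awk -v res="$resource" '
    # Header line of a chunk: "diff name:<res> ... lines:<n>"
    remaining == 0 && $1 == "diff" {
      name = ""; n = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^name:/)  name = substr($i, 6)
        if ($i ~ /^lines:/) n = substr($i, 7) + 0
      }
      remaining = n
      printing = (name == res)
      next
    }
    # Body lines of the current chunk: print only those of the wanted resource.
    remaining > 0 {
      if (printing) print
      remaining--
    }
  ' "$patch_file"
}
```

With the batch file above, `extract_resource_patch batch-m-28334120-60.patch list2` prints the three lines that follow the `diff name:list2` directive.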

### Examples
10 changes: 6 additions & 4 deletions examples/01_simple/README.md
A filter list with differential updates support.

* [filter_v1.0.0.txt](./filter_v1.0.0.txt) - the oldest version of the list.
* [patches/v1.0.0-472234-1.patch](./patches/v1.0.0-472234-1.patch) - a patch that provides a differential update from `v1.0.0` to `v1.0.1`.
  Expiration period is `1` hour; creation time is `472234` (Unix timestamp in hours, i.e. `Wed, 15 Nov 2023 10:00:00 GMT`).
* [filter_v1.0.1.txt](./filter_v1.0.1.txt) - the next version of the list.
* [patches/v1.0.1-472235-1.patch](./patches/v1.0.1-472235-1.patch) - a patch that provides a differential update from `v1.0.1` to `v1.0.2`.
  Expiration period is `1` hour; creation time is `472235` (Unix timestamp in hours, i.e. `Wed, 15 Nov 2023 11:00:00 GMT`).
* [filter.txt](./filter.txt) - the final version of the list. After all differential updates are applied, you should get this version.

## How patch files were prepared

The patches are created using the `diff` utility:

```shell
# `-n` (`--rcs`) makes diff output an RCS-format patch.
diff -n filter_v1.0.0.txt filter_v1.0.1.txt > patches/v1.0.0-472234-1.patch
diff -n filter_v1.0.1.txt filter.txt > patches/v1.0.1-472235-1.patch
```
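
The reverse operation, applying such an RCS patch, can be sketched with Bash and `awk`. This is a hypothetical helper for illustration only; it assumes the patch uses just the `dN M` (delete `M` lines starting at line `N`) and `aN M` (append the next `M` patch lines after line `N`) commands that `diff -n` emits, with line numbers referring to the original file. Any other line outside an `a` block, such as a `diff checksum:` directive, is skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply an RCS-format patch (as produced by `diff -n`)
# to ORIGINAL and print the result to stdout.
apply_rcs_patch() {
  local original="$1" patch="$2"
  awk '
    # First file: the patch. Record deleted line numbers and appended text.
    FILENAME == ARGV[1] {
      if (adding > 0) { added[at] = added[at] $0 "\n"; adding--; next }
      if ($0 ~ /^[ad][0-9]+ [0-9]+$/) {
        split(substr($0, 2), p, " ")
        n = p[2] + 0
        if (substr($0, 1, 1) == "d") {
          for (i = 0; i < n; i++) deleted[p[1] + i] = 1
        } else {
          at = p[1] + 0; adding = n
        }
      }
      next
    }
    # Second file: the original list, streamed line by line.
    {
      seen = 1
      if (FNR == 1) printf "%s", added[0]   # text appended after "line 0"
      if (!(FNR in deleted)) print
      printf "%s", added[FNR]
    }
    # Empty original file: insertions at position 0 were never emitted.
    END { if (!seen) printf "%s", added[0] }
  ' "$patch" "$original"
}
```

With the example files in this directory, `apply_rcs_patch filter_v1.0.0.txt patches/v1.0.0-472234-1.patch | diff - filter_v1.0.1.txt` should print nothing. An empty patch file leaves the list unchanged.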
2 changes: 1 addition & 1 deletion examples/01_simple/filter.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.2
! Diff-Path: patches/v1.0.2-472236-1.patch
||example.net^
2 changes: 1 addition & 1 deletion examples/01_simple/filter_v1.0.0.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.0
! Diff-Path: patches/v1.0.0-472234-1.patch
||example.org^
2 changes: 1 addition & 1 deletion examples/01_simple/filter_v1.0.1.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.1
! Diff-Path: patches/v1.0.1-472235-1.patch
||example.com^
5 changes: 5 additions & 0 deletions examples/01_simple/patches/v1.0.0-472234-1.patch
@@ -0,0 +1,5 @@
d2 3
a4 3
! Version: v1.0.1
! Diff-Path: patches/v1.0.1-472235-1.patch
||example.com^
5 changes: 0 additions & 5 deletions examples/01_simple/patches/v1.0.0.patch

This file was deleted.

@@ -1,5 +1,5 @@
d2 3
a4 3
! Version: v1.0.2
! Diff-Path: patches/v1.0.2-472236-1.patch
||example.net^
19 changes: 13 additions & 6 deletions examples/02_validation/README.md
Example of a filter list with differential updates support where the patch files also contain the `diff` directive that can be used to validate the patching result.

* [filter_v1.0.0.txt](./filter_v1.0.0.txt) - the oldest version of the list.
* [patches/v1.0.0-m-28334060-60.patch](./patches/v1.0.0-m-28334060-60.patch) - a patch that provides a differential update from `v1.0.0` to `v1.0.1`.
* [filter_v1.0.1.txt](./filter_v1.0.1.txt) - the next version of the list.
* [patches/v1.0.1-m-28334120-60.patch](./patches/v1.0.1-m-28334120-60.patch) - a patch that provides a differential update from `v1.0.1` to `v1.0.2`.
* [filter.txt](./filter.txt) - the final version of the list. After all differential updates are applied, you should get this version.
* [patches/v1.0.2-m-28334180-60.patch](./patches/v1.0.2-m-28334180-60.patch) - an empty patch that signals that no patch is available for the next update yet.

Note that the resolution is specified in the patch names (`m` for minutes).

## How patch files were prepared

The patches are created using the `diff` utility:

```shell
# Calculate the RCS diff for filter_v1.0.1.txt.
diff -n filter_v1.0.0.txt filter_v1.0.1.txt > patches/v1.0.0-m-28334060-60.patch

# Calculate the SHA1 sum of filter_v1.0.1.txt and prepend a `diff` directive to the patch file.
FILENAME="filter_v1.0.1.txt" && \
PATCHFILE="patches/v1.0.0-m-28334060-60.patch" && \
SHASUM=$(shasum -a 1 "$FILENAME" | awk '{print $1}') && \
NUMLINES=$(wc -l < "$PATCHFILE" | awk '{print $1}') && \
echo "diff checksum:$SHASUM lines:$NUMLINES" | cat - "$PATCHFILE" > temp.patch && \
mv temp.patch "$PATCHFILE"

# Calculate the RCS diff for filter.txt.
diff -n filter_v1.0.1.txt filter.txt > patches/v1.0.1-m-28334120-60.patch

# Calculate the SHA1 sum of filter.txt and prepend a `diff` directive to the patch file.
FILENAME="filter.txt" && \
PATCHFILE="patches/v1.0.1-m-28334120-60.patch" && \
SHASUM=$(shasum -a 1 "$FILENAME" | awk '{print $1}') && \
NUMLINES=$(wc -l < "$PATCHFILE" | awk '{print $1}') && \
echo "diff checksum:$SHASUM lines:$NUMLINES" | cat - "$PATCHFILE" > temp.patch && \
mv temp.patch "$PATCHFILE"

# Create an empty patch file to signal that there's no patch available for the
# next update yet.
touch patches/v1.0.2-m-28334180-60.patch
```
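
The timestamps in the patch names above were chosen by hand. A hypothetical Bash helper could derive them from the current time instead. It builds `${patchName}[-${resolution}]-${epochTimestamp}-${expirationPeriod}.patch` by dividing the Unix time in seconds by the unit length, and it omits `h` because hours are the default resolution. The optional fourth argument (a fixed Unix time) is an assumption added for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a patch file name for the given moment
# (defaults to now).
patch_file_name() {
  local name="$1" resolution="$2" period="$3" now="${4:-$(date +%s)}" unit
  case "$resolution" in
    h) unit=3600 ;;
    m) unit=60 ;;
    s) unit=1 ;;
    *) return 1 ;;
  esac
  local timestamp=$(( now / unit ))   # integer division truncates to the unit
  if [[ $resolution == h ]]; then
    printf '%s-%s-%s.patch\n' "$name" "$timestamp" "$period"
  else
    printf '%s-%s-%s-%s.patch\n' "$name" "$resolution" "$timestamp" "$period"
  fi
}
```

For example, `diff -n filter_v1.0.1.txt filter.txt > "patches/$(patch_file_name v1.0.1 m 60)"` would name the patch after the minute in which it was generated.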
2 changes: 1 addition & 1 deletion examples/02_validation/filter.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.2
! Diff-Path: patches/v1.0.2-m-28334180-60.patch
||example.net^
2 changes: 1 addition & 1 deletion examples/02_validation/filter_v1.0.0.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.0
! Diff-Path: patches/v1.0.0-m-28334060-60.patch
||example.org^
2 changes: 1 addition & 1 deletion examples/02_validation/filter_v1.0.1.txt
@@ -1,4 +1,4 @@
! Title: Diff Updates Simple Example List
! Version: v1.0.1
! Diff-Path: patches/v1.0.1-m-28334120-60.patch
||example.com^
6 changes: 6 additions & 0 deletions examples/02_validation/patches/v1.0.0-m-28334060-60.patch
@@ -0,0 +1,6 @@
diff checksum:1ce52b527d56a245f32138e014b1571c19cfb659 lines:4
d2 3
a4 3
! Version: v1.0.1
! Diff-Path: patches/v1.0.1-m-28334120-60.patch
||example.com^
6 changes: 0 additions & 6 deletions examples/02_validation/patches/v1.0.0.patch

This file was deleted.

6 changes: 6 additions & 0 deletions examples/02_validation/patches/v1.0.1-m-28334120-60.patch
@@ -0,0 +1,6 @@
diff checksum:bc43fd3b69b5ad82fdc1524a1a419029a2dd4eae lines:5
d2 3
a4 3
! Version: v1.0.2
! Diff-Path: patches/v1.0.2-m-28334180-60.patch
||example.net^
6 changes: 0 additions & 6 deletions examples/02_validation/patches/v1.0.1.patch

This file was deleted.

Empty file.