[Feature]: Deduplicate crawls #2860

@SuaYoo

Description


Repeated crawling of the same sites often yields duplicate data, which significantly increases the storage needed for web archives. Deduplication reduces storage use by storing each unique piece of content only once.
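For illustration, here is a minimal sketch of content-hash deduplication, the general principle behind WARC "revisit" records: hash each response payload and, when a digest repeats, store a small reference to the original capture instead of the full payload. This is not Browsertrix's actual design; `DedupIndex` and its API are hypothetical names for this sketch.

```python
import hashlib

class DedupIndex:
    """Hypothetical index mapping payload digests to the first capture seen."""

    def __init__(self):
        self._seen = {}  # digest -> (url, record_id) of the original capture

    def check(self, url, payload, record_id):
        digest = "sha256:" + hashlib.sha256(payload).hexdigest()
        original = self._seen.get(digest)
        if original is None:
            # First capture with this payload: remember it and store in full.
            self._seen[digest] = (url, record_id)
            return ("store", digest, None)
        # Duplicate payload: store only a small reference (cf. WARC revisit).
        return ("revisit", digest, original)


index = DedupIndex()
action, digest, original = index.check("https://example.com/", b"<html>...</html>", "rec-1")
# -> ("store", ...): payload not seen before, write the full record
action, digest, original = index.check("https://example.com/", b"<html>...</html>", "rec-2")
# -> ("revisit", ...): duplicate payload, write only a pointer to rec-1
```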

Requirements

See subtasks

Context

Related:

Metadata

Labels

- back end: Requires back end dev work
- feature design: This issue tracks smaller sub-issues that compose a feature
- front end: Requires front end dev work
- ui/ux: This issue requires UI/UX work

Projects

Status

Todo
