Increase Watermark Robustness #20

Open
mhellmeier opened this issue Mar 19, 2024 · 2 comments
Labels
component: watermarker Watermarker Library feature New feature or request


mhellmeier commented Mar 19, 2024

🚀 Feature Request

Current Problem

Editing a watermarked text can destroy the watermark embedded in the cover text. This can happen when sentences are moved within the text, content is deleted or added, or existing content is copied.

Proposed Solution

The overall robustness of the watermarker library needs to be increased.

One possible example:
If a short watermark is embedded repeatedly in a long cover text (e.g., 10 times), the watermarker library should still be able to extract it even if 4 of the 10 repetitions were destroyed.
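
A minimal sketch of one way to meet that goal, assuming the extraction step already yields one byte string per repetition. Python is used only for brevity; `vote_bytewise` is a hypothetical helper and not part of the watermarker library. It decides every byte position by a strict majority over the copies that share the most common length:

```python
from collections import Counter
from typing import Optional, Sequence


def vote_bytewise(copies: Sequence[bytes]) -> Optional[bytes]:
    """Rebuild one watermark from several possibly damaged copies.

    Copies whose length differs from the most common length are ignored;
    every byte position is then decided by a strict majority of the rest.
    Returns None when some position has no strict majority.
    """
    if not copies:
        return None
    # Assumption: the most frequent length is the true watermark length.
    length = Counter(len(c) for c in copies).most_common(1)[0][0]
    aligned = [c for c in copies if len(c) == length]
    result = bytearray()
    for column in zip(*aligned):
        value, count = Counter(column).most_common(1)[0]
        if 2 * count <= len(aligned):
            return None  # no strict majority at this position
        result.append(value)
    return bytes(result)


# 10 repetitions of a short watermark, 4 of them damaged in different ways.
original = b"mark"
copies = [original] * 6 + [b"mXrk", b"ma", b"????", b"markmark"]
print(vote_bytewise(copies))
```

Voting per position rather than per whole watermark means that copies damaged in different places can still contribute their intact bytes, at the cost of assuming that all surviving copies are aligned.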

Additional Context

If a control character is implemented (see #18), the watermarker library needs a strategy for the case that this control character gets destroyed.

@mhellmeier mhellmeier added feature New feature or request component: watermarker Watermarker Library labels Mar 19, 2024

mhellmeier commented May 24, 2024

First starting points:

  • Update the squashing method for all watermarks so that it only returns one watermark based on the length (robustness increase for all watermarks)
  • Update the SizedWatermark so that it uses the size of the watermark (robustness increase for Sized Trendmarks)
  • Add an optional parameter to the watermark extraction that indicates whether multiple unique watermarks might be included (like multipleWatermarks = true). Optionally add an expected number of watermarks if that parameter is set to true.
  • Check error correcting codes for the CRC32 trendmarks
  • Update Trendmark documentation with recommendations for robustness (for example, suggest using the SizedCRC32Watermark if robustness is important and mention the trade-offs).

Minor thing:

  • Update all the naming from SizedWatermark to SizedTrendmark etc.

Future ideas:

  • Use a large language model to decide which of the extracted candidates was the original watermark
  • An optional toggle that filters for or prefers watermarks composed of "sensible" Unicode characters only (i.e. extended Latin alphabet, Arabic numerals, ampersand etc.)


hnorkowski commented Jun 28, 2024

First starting points:

* [ ]  Update the squashing method for all watermarks so that it only returns one watermark based on the length (robustness increase for all watermarks)

Finding the most plausible watermark among basic watermarks (i.e. each is just a list of bytes and nothing is known about what it represents) can be done with frequency analysis. This approach is generic and could be implemented as a static function in Watermark. It needs to be a static function rather than a method because it takes a list of watermarks. The Watermarker and JvmWatermark could use the feature by default.
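
As a rough illustration of the frequency analysis (a Python sketch, not the library's API; `most_frequent`, `multiple`, and `min_share` are made-up names, with `multiple` loosely mirroring the `multipleWatermarks` idea from the list above):

```python
from collections import Counter
from typing import List, Sequence


def most_frequent(watermarks: Sequence[bytes], multiple: bool = False,
                  min_share: float = 0.2) -> List[bytes]:
    """Rank extracted watermarks by how often they occur.

    With multiple=False only the single most frequent watermark is returned.
    With multiple=True every watermark covering at least `min_share` of all
    extracted candidates is returned, most frequent first.
    """
    if not watermarks:
        return []
    ranked = Counter(watermarks).most_common()
    if not multiple:
        return [ranked[0][0]]
    threshold = min_share * len(watermarks)
    return [wm for wm, count in ranked if count >= threshold]


extracted = [b"alice"] * 5 + [b"bob"] * 3 + [b"al\x00ce", b"b0b"]
print(most_frequent(extracted))
print(most_frequent(extracted, multiple=True))
```

In single mode a tie is resolved in favour of the candidate seen first, which is an arbitrary choice; a real implementation would probably want to report ties instead.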

* [ ]  Update the `SizedWatermark` so that it uses the size of the watermark (robustness increase for Sized Trendmarks)

The basic usage is already implemented. The validate method of Trendmark checks for correct size, checksum, and hash (depending on the variant of Trendmark).
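
To illustrate how a size check can weed out damaged candidates before squashing, here is a small Python sketch. The byte layout (1 tag byte followed by a 4-byte little-endian total size) and all function names are assumptions made for this example and may not match the actual Trendmark format:

```python
import struct
from typing import List

# Hypothetical header: 1 tag byte + 4-byte little-endian total size.
HEADER = struct.Struct("<BI")


def build_sized(tag: int, payload: bytes) -> bytes:
    """Prefix the payload with a tag and the total length in bytes."""
    return HEADER.pack(tag, HEADER.size + len(payload)) + payload


def size_matches(raw: bytes) -> bool:
    """True if the declared size equals the actual number of bytes."""
    if len(raw) < HEADER.size:
        return False
    _tag, declared = HEADER.unpack_from(raw)
    return declared == len(raw)


def keep_size_consistent(candidates: List[bytes]) -> List[bytes]:
    """Drop every candidate that was truncated or extended."""
    return [c for c in candidates if size_matches(c)]


good = build_sized(1, b"hello")
print(keep_size_consistent([good, good[:-2], good + b"x"]) == [good])
```

A size check only catches insertions and deletions; substituted bytes of the same length still need a checksum or hash.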

* [ ]  Check error correcting codes for the CRC32 trendmarks

The validation of the checksum is already implemented in the validateChecksum method, which is called automatically by the validate function. It may be possible to correct errors with the CRC32 codes, but I am not sure: CRC32 can correct single-bit errors, whereas our text watermarking algorithm does not work with bits. Currently it works with 4 states. A new method repair could be added to the Checksum interface, and each implementing checksum could then provide a recovery strategy specific to that checksum, if possible.
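
One way such a repair could look for a 4-state alphabet is an exhaustive search over single-symbol changes, sketched below in Python. This assumes that each symbol maps to 2 bits (four symbols per byte) and that the CRC32 is stored as 4 big-endian bytes in front of the payload; both are assumptions for illustration, as are the names `protect`, `is_valid`, and `repair`:

```python
import struct
import zlib
from typing import Optional


def protect(payload: bytes) -> bytes:
    """Hypothetical layout: 4-byte big-endian CRC32 followed by the payload."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def is_valid(raw: bytes) -> bool:
    return len(raw) >= 4 and struct.unpack(">I", raw[:4])[0] == zlib.crc32(raw[4:])


def repair(raw: bytes) -> Optional[bytes]:
    """Try to undo a single corrupted 2-bit symbol by exhaustive search.

    Returns the input if it is already valid, the unique repaired candidate
    if exactly one single-symbol change makes the checksum match, else None.
    """
    if is_valid(raw):
        return raw
    fixes = set()
    for index in range(len(raw)):
        for shift in (0, 2, 4, 6):       # four 2-bit symbols per byte
            for delta in (1, 2, 3):      # XOR masks for the three other values
                candidate = bytearray(raw)
                candidate[index] ^= delta << shift
                if is_valid(bytes(candidate)):
                    fixes.add(bytes(candidate))
    return fixes.pop() if len(fixes) == 1 else None


good = protect(b"trendmark")
damaged = bytearray(good)
damaged[6] ^= 0b10 << 4  # corrupt one 2-bit symbol in the payload
print(repair(bytes(damaged)) == good)
```

The search costs 12 checksum computations per byte, which is fine for short watermarks. It cannot recover inserted or deleted symbols, and it returns None rather than guessing when more than one repair fits.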

* [ ]  Update Trendmark documentation with recommendations for robustness (for example, suggest using the `SizedCRC32Watermark` if robustness is important and mention the trade-offs).

Further analysis methods could be implemented as static methods in Trendmark. The extraction methods of (Jvm)Watermarker could be extended by another parameter trashing: Bool that defines whether all Trendmarks that produce an error or warning are discarded.
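
A minimal Python sketch of what the trashing switch could do. The `Extracted` result type and the `select` function are hypothetical and only stand in for whatever the extraction methods actually return:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Extracted:
    """Hypothetical extraction result: content plus validation messages."""
    content: bytes
    warnings: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)


def select(results: List[Extracted], trashing: bool = False) -> List[Extracted]:
    """Return all results, or only the clean ones when trashing is enabled."""
    if not trashing:
        return list(results)
    return [r for r in results if not r.warnings and not r.errors]


results = [
    Extracted(b"ok"),
    Extracted(b"odd", warnings=["size mismatch"]),
    Extracted(b"bad", errors=["invalid checksum"]),
]
print([r.content for r in select(results, trashing=True)])
```

Keeping the default at False preserves the current behaviour, so callers that want to inspect damaged Trendmarks still can.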
