add allowlist and denylist options to demo #568

roedoejet · 2024-10-25T21:44:06Z

fixes #485

PR Goal?

Allow basic allowlist/denylist functionality in the demo. If an allowlist is supplied, only the words/utterances on it can be synthesized. If a denylist is supplied, only words/utterances not on it can be synthesized. The allowlist is more secure because it is more restrictive. The denylist can be circumvented: I've applied Unicode normalization to prevent Unicode homograph attacks, but there are many other possible attacks, probably beyond the scope of this PR.
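A minimal sketch of the two checks described above. It assumes both lists are held as Python sets of strings and that NFKC is the normalization form (the PR text does not say which form is used). `normalize` and `is_allowed` are hypothetical names, not the demo's actual functions. Giving the allowlist precedence when both lists are supplied is also an assumption. The sketch compares whole utterances, which is the strictest reading of "words/utterances".

```python
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """NFKC-normalize and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", text).strip()


def is_allowed(
    text: str,
    allowlist: Optional[set[str]] = None,
    denylist: Optional[set[str]] = None,
) -> bool:
    """Return True if `text` may be synthesized.

    Assumption: if both lists are given, the allowlist wins.
    """
    utterance = normalize(text)
    if allowlist is not None:
        # Allowlist: only exact (normalized) entries pass.
        return utterance in {normalize(entry) for entry in allowlist}
    if denylist is not None:
        # Denylist: everything passes except exact (normalized) entries.
        return utterance not in {normalize(entry) for entry in denylist}
    return True  # no list configured


# Fullwidth letters fold to ASCII under NFKC...
print(is_allowed("ｈｅｌｌｏ", allowlist={"hello"}))  # True
# ...but a Cyrillic "е" (U+0435) does not, so this slips past the denylist.
print(is_allowed("h\u0435llo", denylist={"hello"}))  # True
```

NFKC folds compatibility variants such as fullwidth letters, but it does not map cross-script look-alikes. A word spelled with a Cyrillic letter therefore still differs from its Latin twin. Catching those would need a confusables mapping such as the one defined in Unicode Technical Standard #39.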

Fixes?

#485

Feedback sought?

Sanity check. Any improvements that would make this more secure would be appreciated; there are many ways to get around the denylist, at least. Ideas that are easy enough can be incorporated into this PR; otherwise we can create issues.

Priority?

0.1.0a5

Tests added?

No tests available for the demo.

How to test?

Create a plain text file and pass it in when running the demo (everyvoice demo fp.ckpt vocoder.ckpt --allowlist path_to_allowlist.txt). Then only those words can be synthesized; everything else should raise an error, both in the demo GUI and through the API.
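The test instructions above rely on a plain text file with one entry per line. This is a minimal sketch of how such a file could be read. It assumes UTF-8 text and that blank lines and surrounding whitespace should be ignored. `load_wordlist` is a hypothetical helper, not the CLI's actual loader. The demo at the bottom writes a throwaway copy of the file to a temporary directory.

```python
import tempfile
from pathlib import Path


def load_wordlist(path: Path) -> set[str]:
    """Read one entry per line, dropping blank lines and edge whitespace."""
    # "utf-8-sig" accepts files with or without a leading byte-order mark.
    with open(path, encoding="utf-8-sig") as f:
        return {line.strip() for line in f if line.strip()}


# Build a throwaway allowlist file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    allowlist_path = Path(tmp) / "allowlist.txt"
    allowlist_path.write_text("This is a test\nhello\n\nworld\n", encoding="utf-8")
    entries = load_wordlist(allowlist_path)

print(sorted(entries))  # ['This is a test', 'hello', 'world']
```

Stripping whitespace means an entry with a stray trailing space in the file still matches. If edge whitespace is meaningful for some entries, drop the `strip()` calls.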

Confidence?

Medium

Version change?

no

Related PRs?

none


semanticdiff-com · 2024-10-25T21:44:08Z


Changed Files

File	Status
everyvoice/demo/app.py	1% smaller
everyvoice/cli.py	0% smaller
everyvoice/tests/test_text.py	0% smaller

github-actions · 2024-10-25T21:48:11Z

CLI load time: 0:00.31
Pull Request HEAD: 17fda8644c5f72d800af0021fa312a6bf7e3a8d0
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov · 2024-10-25T21:48:57Z

Codecov Report

Attention: Patch coverage is 27.58621% with 21 lines in your changes missing coverage. Please review.

Project coverage is 76.18%. Comparing base (6827caa) to head (17fda86).

Files with missing lines	Patch %	Lines
everyvoice/demo/app.py	42.10%	11 Missing ⚠️
everyvoice/cli.py	0.00%	10 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #568      +/-   ##
==========================================
+ Coverage   76.07%   76.18%   +0.10%     
==========================================
  Files          46       46              
  Lines        3386     3414      +28     
  Branches      460      467       +7     
==========================================
+ Hits         2576     2601      +25     
- Misses        707      710       +3     
  Partials      103      103


joanise

Looks pretty straightforward, but also minimalist. Is the feature requirement really that simple? For morphologically complex languages, this will be nearly impossible to use. And the denylist option will have little effect, since it's trivial to circumvent with minor typos.

In any case, I didn't test this, but the code looks OK to me given the simple feature this PR implements.

dlothian · 2024-10-28T21:47:03Z

> Looks pretty straightforward, but also minimalist. Is the feature requirement really that simple? For morphologically complex languages, this will be nearly impossible to use. And the denylist option will have little effect, since it's trivial to circumvent with minor typos.
>
> In any case, I didn't test this, but the code looks OK to me given the simple feature this PR implements.

To address it not working for morphologically complex languages, could the allow/deny lists also accept regular expressions? Then, instead of checking whether an entry is an exact substring of the submitted text, you could re.match it. Verbatim substring matching can also lead to a lot of false positives; see the Scunthorpe problem.
Both the submitted text and the allow/deny lists are normalized, but is case handled? I'm not as familiar with the code base yet, so I'm not sure whether this is handled elsewhere.
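One way the regex suggestion could look, as a sketch only. Allowlist entries are treated as patterns that each word must fully match, so one entry such as `walk(s|ed|ing)?` covers several inflected forms. Denylist entries are wrapped in word boundaries so that a short entry cannot fire inside a longer word (the Scunthorpe problem). Matching is case-insensitive. The helper names are hypothetical. Tokenizing with `\w+` is also an assumption, and it will split words at apostrophes and at some combining marks.

```python
import re


def compile_entries(entries: list[str], whole_word: bool) -> list[re.Pattern]:
    """Compile list entries as case-insensitive regular expressions.

    With whole_word=True each entry is wrapped in word boundaries, so it can
    only match a complete word rather than part of a longer one.
    """
    template = r"\b(?:{})\b" if whole_word else r"(?:{})"
    return [re.compile(template.format(entry), re.IGNORECASE) for entry in entries]


def allowed(text: str, allow_patterns: list[re.Pattern]) -> bool:
    """Every word in `text` must fully match at least one allowlist pattern."""
    words = re.findall(r"\w+", text)
    return all(any(p.fullmatch(w) for p in allow_patterns) for w in words)


def denied(text: str, deny_patterns: list[re.Pattern]) -> bool:
    """True if any denylist pattern occurs as a whole word in `text`."""
    return any(p.search(text) for p in deny_patterns)


allow = compile_entries(["walk(s|ed|ing)?", "home"], whole_word=False)
deny = compile_entries(["cat"], whole_word=True)

print(allowed("Walking home", allow))           # True
print(denied("Please concatenate these", deny))  # False: "cat" is inside a word
```

Because the entries become regular expressions, a malformed entry raises `re.error` when compiled. A pathological pattern can also backtrack catastrophically. Both argue for treating the list files as trusted, owner-supplied input.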

marctessier · 2024-10-29T15:24:08Z

I tested the deny and allow lists, and in principle it all works. BUT things need to be exact matches. As we were discussing earlier, this could be a bit better, e.g. case-insensitive matching, ignoring symbols?

cat allowlist.txt 
This is a test
hello
world
Hello World

In this example, "This is a test" worked :-) but "This is a test." did not, because of the period at the end. Neither did "This is a test world", even though all of its words are in the list (just spread across two lines).

The same applies to the denylist.
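This is a sketch of the two behaviours described above, using the `allowlist.txt` entries from this comment. The normalization case-folds the text and drops characters whose Unicode category is punctuation (P*), so "This is a test." matches the "This is a test" line. A word-level variant accepts "This is a test world" because each of its words occurs somewhere in the list. Both functions are hypothetical illustrations. Whether the demo should match whole utterances or individual words is still an open design question.

```python
import unicodedata

ALLOWLIST = {"This is a test", "hello", "world", "Hello World"}


def normalize(text: str) -> str:
    """NFKC, case-fold, drop punctuation (Unicode category P*), squeeze spaces."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


def utterance_allowed(text: str, allowlist: set[str]) -> bool:
    """The whole normalized utterance must equal one normalized entry."""
    return normalize(text) in {normalize(entry) for entry in allowlist}


def words_allowed(text: str, allowlist: set[str]) -> bool:
    """Each normalized word must occur in some normalized entry."""
    vocab = {word for entry in allowlist for word in normalize(entry).split()}
    return all(word in vocab for word in normalize(text).split())


print(utterance_allowed("This is a test.", ALLOWLIST))       # True
print(utterance_allowed("This is a test world", ALLOWLIST))  # False
print(words_allowed("This is a test world", ALLOWLIST))      # True
```

Stripping every P* character also removes U+0027 APOSTROPHE, which some orthographies use as a letter. U+02BC MODIFIER LETTER APOSTROPHE is category Lm and would survive. Symbols (S* categories) are left untouched in this sketch.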

We can approve this pull request as-is and open a new "improvement" issue; it does answer the "spirit" of what was requested, and it works! Your call, @roedoejet.

marctessier · 2024-10-29T15:52:00Z

I forgot to add that some of the error log output could be omitted, keeping only the last printed line (the "Oops" message with the offending word). The rest of the traceback is not needed and adds noise to the logs:

 Traceback (most recent call last):
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/home/tessierm/miniforge3/envs/ev_dev.ap_485/lib/python3.10/site-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/home/tessierm/SANDBOX/EveryVoice/everyvoice/demo/app.py", line 49, in synthesize_audio
    raise gr.Error(
gradio.exceptions.Error: 'Oops, the word world , is not allowed to be synthesized by this model. Please contact the model owner.'
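The traceback above appears to be printed by Gradio itself, since the outermost frame is `gradio/queueing.py`. Trimming it would therefore be a Gradio-side matter, which is not shown here because it depends on the Gradio version. If the demo also logs rejections on its own, the standard library can reduce an exception to the single `Type: message` line suggested above. This is a sketch; the logger name is hypothetical, and a `ValueError` stands in for `gr.Error`.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("allowlist_demo")  # hypothetical logger name


def log_rejection(exc: BaseException) -> str:
    """Log only the final 'ExceptionType: message' line, without the traceback."""
    summary = "".join(traceback.format_exception_only(type(exc), exc)).strip()
    logger.warning(summary)
    return summary


# A ValueError stands in for gradio.exceptions.Error in this sketch.
summary = log_rejection(
    ValueError("Oops, the word world is not allowed to be synthesized by this model.")
)
```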


roedoejet · 2024-10-29T17:30:49Z

Thanks so much for this, y'all! I've added case handling, word tokenization for the denylist, punctuation removal, and duplicate-character attack prevention. This should be a bit better. Did you end up checking the speed with a large allow/deny list, @marctessier?
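The comment mentions word tokenization for the denylist and "duplicate character attack prevention" but does not show the rule used. This sketch assumes one plausible reading: collapse any run of a repeated character to a single character in both the denylist entries and the input, then compare word by word. `squash` and `denylisted_words` are hypothetical names.

```python
import re
import unicodedata

_REPEATS = re.compile(r"(.)\1+")  # a character followed by one or more copies of itself


def squash(text: str) -> str:
    """NFKC + case-fold, then collapse each run of a repeated character to one."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return _REPEATS.sub(r"\1", text)


def denylisted_words(text: str, denylist: set[str]) -> list[str]:
    """Return the (squashed) words of `text` that hit the squashed denylist."""
    banned = {squash(entry) for entry in denylist}
    tokens = re.findall(r"\w+", squash(text))
    return [token for token in tokens if token in banned]


print(denylisted_words("This is baaad!!", {"bad"}))  # ['bad']
```

The cost is false positives. Legitimate doubled letters collapse too, so with a denylist entry `bot` the word `boot` is also blocked. Collapsing only runs of three or more (`(.)\1{2,}`) would reduce that, at the price of letting `baad` through.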

marctessier · 2024-10-29T17:41:23Z

No, not yet; thanks for the reminder. I will prepare a couple of tests for that and compare the timings in a few minutes.
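On speed with large lists: if each list is normalized once at startup and stored in a set, each per-request lookup is an average constant-time hash probe rather than a scan of the whole list. This is a rough micro-benchmark sketch of that difference. The list size and entry names are synthetic, made up for illustration, and the printed timings will vary by machine.

```python
import timeit

# Synthetic stand-in for a large allow/deny list (made-up entries).
entries = [f"Word{i}" for i in range(100_000)]

# Normalize once, up front, rather than on every request.
as_list = [entry.casefold() for entry in entries]
as_set = frozenset(as_list)

probe = "word99999"  # worst case for the list: the last element
list_seconds = timeit.timeit(lambda: probe in as_list, number=50)
set_seconds = timeit.timeit(lambda: probe in as_set, number=50)
print(f"list scan: {list_seconds:.4f}s, set lookup: {set_seconds:.6f}s")
```

Word-level checks multiply the lookups by the number of words per utterance, which stays cheap with sets. A list of compiled regular expressions, by contrast, brings back a linear pass over every entry for each check.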

feat(demo): add allowlist and denylist options to demo

77c10d4

fixes #485

roedoejet requested review from joanise, dlothian, marctessier, SamuelLarkin and MENGZHEGENG October 25, 2024 21:44

joanise reviewed Oct 28, 2024


roedoejet added 3 commits October 29, 2024 10:21

feat(demo): add more robust text normalization

b16522e

docs: add clearer warning to denylist

5351774

test(demo): add doctest coverage for text normalization

17fda86
