Improved input syntax for `frequency_differencing` operation #1106

praneethratna · 2023-07-31T18:00:06Z

Addresses #967. Now input for frequency_differencing can be given directly in the form of frequency differencing criteria instead of separating out freq/chan from operators.

CC @leewujung

emiliom · 2023-07-31T18:27:59Z

These improvements are very cool!

Since you're adding sympy as a dependency, how "heavy" a library is it? In other words, is it likely to create dependency resolution issues? I see the dependency tree sympy > mpmath > gmpy2, where gmpy2 apparently uses a C library ("gmpy2 is an optimized, C-coded Python extension module ...").

Are there lighter-weight alternatives to sympy that would meet our needs? sympy has tons of capabilities, and here I think we'd be using only a tiny subset.

requirements-dev.txt

praneethratna · 2023-07-31T18:39:31Z

These improvements are very cool!

Since you're adding sympy as a dependency, how "heavy" a library is it? In other words, is it likely to create dependency resolution issues? I see the dependency tree sympy > mpmath > gmpy2, where gmpy2 apparently uses a C library ("gmpy2 is an optimized, C-coded Python extension module ...").

I feel SymPy is lightweight compared to other mathematical libraries found online. Also SymPy has a hard dependency only on mpmath and I don't think it would create dependency resolution issues.

Are there lighter-weight alternatives to sympy that would meet our needs? sympy has tons of capabilities, and here I think we'd be using only a tiny subset.

I'm using SymPy for parsing numbers from strings using it's inbuilt parser. I'm not sure of any other libraries which can be used to do this, but there might be some libraries to do this task exclusively?

emiliom · 2023-07-31T18:45:10Z

Also SymPy has a hard dependency only on mpmath but I don't it would create dependency resolution issues.

Right! Scratch my comment about dependency resolution per se. I did see that the dependency was limited to mpmath > gmpy2, and we don't currently use those libraries. My apologies.

praneethratna · 2023-08-01T05:43:45Z

Right! Scratch my comment about dependency resolution per se. I did see that the dependency was limited to mpmath > gmpy2, and we don't currently use those libraries. My apologies.

No worries! I understand the clarification now. Since we don't use mpmath or gmpy2 libraries, these dependencies doesn't affect us. But we need to look at alternatives for SymPy if that makes the package slow.

leewujung · 2023-08-01T06:14:33Z

Since the equations are pretty predictable, one option would be to just do re all the way?

praneethratna · 2023-08-01T08:06:13Z

Since the equations are pretty predictable, one option would be to just do re all the way?

@leewujung Do you mean we use a single regex expression to check for the validity of the whole frequency differencing criteria? Wouldn't that make the code a bit messy?

leewujung · 2023-08-02T21:06:29Z

@leewujung Do you mean we use a single regex expression to check for the validity of the whole frequency differencing criteria? Wouldn't that make the code a bit messy?

I think it would be ok? For the freqAB case, the Hz/kHz/MHz/GHz cases are fixed, so the number in front of that could be extracted. For the chanAB case, it is the string between the double quotes. The binary comparison operator >, <, >=, <=, == are the allowed cases. And then we need to extracted the number before dB. This doesn't seem overly complicated?

praneethratna · 2023-08-03T02:54:12Z

I think it would be ok? For the freqAB case, the Hz/kHz/MHz/GHz cases are fixed, so the number in front of that could be extracted. For the chanAB case, it is the string between the double quotes. The binary comparison operator >, <, >=, <=, == are the allowed cases. And then we need to extracted the number before dB. This doesn't seem overly complicated?

@leewujung I think it shouldn't be overly complicated, it's just that i split everything into individual parts for better code debugging. I'll make these changes by today.

praneethratna · 2023-08-03T17:16:00Z

@leewujung Even if use a single re expression to check for validity, we again need to use separate re expressions to extract freq/chan values, operators and diff or am i understanding your point in a wrong way?

leewujung · 2023-08-25T01:09:20Z

Hi @praneethratna : sorry for my super late response. What I am thinking in terms of using re to grab out the pattern is like the following:

use patterns like "(?P<freqA>\d+)(?P<unitA>\w?)Hz" to match freqA and freqB, for example:
```
a = re.match("(?P<freq>\d+)\s*(?P<unit>\w?)Hz", "120kHz") # also matches arbitrary space between "120" and "kHz"
a["freq"]  # get the number associated with the frequency
a["unit"]  # get the unit associated with the frequency
```
then you can use a["unit"] to determine the right multiplication factor to get to the numbers in frequency_nominal, which is with units Hz (i.e. 120kHz is 120000 in frequency_nominal).

you can use the pattern above to build up the whole pattern, something like the following:

freqA_pattern = "(?P<freqA>\d+)\s*(?P<unitA>\w?)Hz"
freqA_pattern = "(?P<freqB>\d+)\s*(?P<unitB>\w?)Hz"
db_pattern = "(?P<db>\d+)\s*dB"
cmp_pattern = "\s*(?P<cmp>\S+?)\s*"
DIFF_MATCHER = re.compile(freqA_pattern + "\s*-\s*" + freqB_pattern + cmp_pattern + db_pattern)

and then use outputs of the matched results with some error checking and handling (like converting str to float, k to 1000 etc) to combine with your other parts of the code.

What do you think?

praneethratna · 2023-08-25T20:02:32Z

Hi @praneethratna : sorry for my super late response. What I am thinking in terms of using re to grab out the pattern is like the following:
use patterns like "(?P<freqA>\d+)(?P<unitA>\w?)Hz" to match freqA and freqB, for example:
a = re.match("(?P<freq>\d+)\s*(?P<unit>\w?)Hz", "120kHz") # also matches arbitrary space between "120" and "kHz"
a["freq"]  # get the number associated with the frequency
a["unit"]  # get the unit associated with the frequency
then you can use a["unit"] to determine the right multiplication factor to get to the numbers in frequency_nominal, which is with units Hz (i.e. 120kHz is 120000 in frequency_nominal).
you can use the pattern above to build up the whole pattern, something like the following:
freqA_pattern = "(?P<freqA>\d+)\s*(?P<unitA>\w?)Hz"
freqA_pattern = "(?P<freqB>\d+)\s*(?P<unitB>\w?)Hz"
db_pattern = "(?P<db>\d+)\s*dB"
cmp_pattern = "\s*(?P<cmp>\S+?)\s*"
DIFF_MATCHER = re.compile(freqA_pattern + "\s*-\s*" + freqB_pattern + cmp_pattern + db_pattern)
and then use outputs of the matched results with some error checking and handling (like converting str to float, k to 1000 etc) to combine with your other parts of the code.
What do you think?

@leewujung No worries! I understand it now, these suggestions helped me to refactor the code in a much simpler and clean way.

leewujung · 2023-08-28T03:04:49Z

@praneethratna : there are still places where you use sympy in the code. Could you remove them completely and use re instead? Thanks!

praneethratna · 2023-08-28T08:03:07Z

@leewujung I have removed the usage of sympy completely and also squashed all the commits into one.

echopype/mask/api.py

leewujung

@praneethratna : Thanks for this great PR!! It is awesome to have this intuitive way of putting in the freq diff equations.

I have some small suggestions for the organization and the tests:

Since in Add transient and impulse noise masks #1142 there are more functions added to the mask subpackage, how about in this PR moving the frequency-differencing-related functions (_check_source_Sv_freq_diff and check_freq_diff_and_parse_arguments) to a separate module, say echopype/mask/freq_diff.py and keep the main API function frequency_differencing in echopype/mask/api.py?
for the tests, currently test_frequency_differencing is more of an integrated test, and in it the frequency units are all Hz so the unit matching was not tested. How about adding another set of tests to test for the function you wrote check_freq_diff_and_parse_arguments? This way you can test parsed element values and xfail the ones that should fail, etc.
rename check_freq_diff_and_parse_arguments to _parse_freq_diff_eq to make it obvious that it is a private function, and also making it shorter
rename _check_source_Sv_freq_diff to _check_freq_diff_source_Sv: this more of a style thing, but I feel that having the freq_diff part first make it easier to know where this function is used.

leewujung · 2023-08-31T15:51:01Z

Also, I caught a bug / undesired behavior in the current _check_source_Sv_freq_diff function. In the current implementation, the values of frequency_nominal is not allowed to be duplicated:

echopype/echopype/mask/api.py

Lines 463 to 467 in 254ee10

    
           if len(set(source_Sv.frequency_nominal.values)) < source_Sv.frequency_nominal.size: 
        
               raise ValueError( 
        
                   "The provided source_Sv contains repeated frequency_nominal " 
        
                   "values, this is not allowed!" 
        
               )

However, it is possible that there are 2 channels (with different channel strings) that have the same frequency_nominal values (this is actually why we decided to use channel as the coordinate, instead of frequency_nominal, since coordinates cannot have duplicated values), and users may want to diff these 2 channels using the chanABEq. I think that potentially duplicated frequency_nominal values should only be checked if people use freqABEq to put in their equation.

Looking at the code again, what do you think if we take the following approach to change this function?

keep the first checks # check that channel and frequency nominal are in source_Sv intact
for the below three checks, make it such that only freqAB checks are executed when people use freqABEq, and only chanAB checks are executed when people use chanABEq
```
# make sure that the channel and frequency_nominal values are not repeated in source_Sv
# check that the elements of freqAB are in frequency_nominal
# check that the elements of chanAB are in channel
```
ie we only check for frequency_nominal to not be duplicated and elements of freqAB are in frequency_nominal when people use freqABEq, and only check for channel to not be duplicated and elements of chanAB are in channel when people use chanABEq.

echopype/tests/mask/test_mask.py

echopype/mask/freq_diff.py

echopype/tests/mask/test_mask.py

leewujung

Thanks @praneethratna for your speedy changes! I added some suggestions, and I think once they are addressed we can merge this PR. :)

leewujung

@praneethratna : Thanks a lot for your contribution. We can merge this PR now once the tests run through!

leewujung · 2023-08-31T21:21:12Z

Ugh sorry just remember one thing. In here

echopype/echopype/tests/utils/test_processinglevels_integration.py

Line 106 in 254ee10

def _freqdiff_applymask(test_ds):

frequency_differencing is called (this is something the CI does not catch, unless one uses [all tests ci] (see here). Could you quickly change it? Then I will merge. Thank you!

codecov-commenter · 2023-08-31T21:52:50Z

Codecov Report

Merging #1106 (d8b8b26) into dev (6fc5a1d) will decrease coverage by 9.44%.
Report is 53 commits behind head on dev.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##              dev    #1106      +/-   ##
==========================================
- Coverage   78.12%   68.69%   -9.44%     
==========================================
  Files          65        8      -57     
  Lines        6227      658    -5569     
==========================================
- Hits         4865      452    -4413     
+ Misses       1362      206    -1156

Flag	Coverage Δ
unittests	`68.69% <93.75%> (-9.44%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
echopype/mask/freq_diff.py	`93.44% <93.44%> (ø)`
echopype/mask/api.py	`91.66% <100.00%> (+4.16%)`	⬆️

... and 63 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

emiliom reviewed Jul 31, 2023

View reviewed changes

requirements-dev.txt Outdated Show resolved Hide resolved

leewujung added this to the 0.8.1 milestone Aug 2, 2023

praneethratna requested a review from emiliom August 20, 2023 19:05

praneethratna force-pushed the issue#967-mask branch from d829b55 to 189c23f Compare August 28, 2023 08:01