Ensure only ASCII character set used #12
davegill merged 1 commit into wrf-model:master from davegill:ASCII_CODES_ONLY
tools/finder.F
Outdated
    ELSE
    PRINT *,' '
    PRINT *,'Troubles, with ',problem_line_count,' lines.'
    PRINT *,'File uses only ISO-8859 character codes, outside the standard ASCII range of ',FIRST_VALID,' to ',LAST_VALID
Instead of "File uses only ISO-8859 character codes, outside the standard ASCII range of", perhaps something like "File uses character codes outside the standard ASCII range of"
Mike,
The new print statement:
PRINT *,'File uses character codes outside the standard ASCII range of ',FIRST_VALID,' to ',LAST_VALID
tools/finder.F
Outdated
    EXIT big_read_loop
    END IF

    DO ind = 1 , MAX_LENGTH
The \tab character is fairly prominent in our code, and has ASCII code 9 (which is outside of the 32-127 range). Should this loop include that exception?
Coincidentally, none of the 9 files that I tested contains a tab character. I have now fixed the logic in the code to ignore all tab characters (ASCII code 9).
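The range check with the tab exception discussed above can be sketched as follows. This is a minimal Python illustration, not the Fortran in tools/finder.F: the bounds 32 and 127 are the values quoted in this thread, and the function name `find_problem_columns` is hypothetical.

```python
from typing import List, Tuple

FIRST_VALID = 32   # space; lower bound quoted in the discussion
LAST_VALID = 127   # upper bound quoted in the discussion (127 is DEL)
TAB = 9            # horizontal tab, treated as an allowed exception


def find_problem_columns(line: bytes) -> List[Tuple[int, int]]:
    """Return (column, code) pairs, 1-based, for every disallowed byte in one line."""
    problems = []
    for column, code in enumerate(line, start=1):
        if code == TAB:
            # Tabs are outside the valid range but are deliberately ignored.
            continue
        if code < FIRST_VALID or code > LAST_VALID:
            problems.append((column, code))
    return problems
```

Calling `find_problem_columns("W/m\u00b2".encode("utf-8"))` reports two entries, columns 4 and 5, because the superscript 2 occupies two bytes in UTF-8.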
tools/finder.F
Outdated
    ! usage:
    ! a.out < file.F

    PROGRAM finder
Could we name this something more self-descriptive? Like "nonasciifinder"?
Mike,
The source code now lists the main program as "non_ascii_finder"
tools/finder.F
Outdated
    line_count = 1
    problem_line_count = 0

    ! Loop over eah line of the input file.
Mike,
"each" now has a "c", also added "ubiquitously" to get right back up on that spelling horse.
chem/module_mozcart_wetscav.F
Outdated
    ! Output: GA --- â(x)
    ! Purpose: Compute the gamma function Ahat(x)
    ! Input : x --- Argument of Ahat(x)
    ! ( x is not equal to 0,-1,-2,WHAT GOES HERE )
@ravanah
What should be in that original u'u'u' string?
Ravan,
It was pointed out that maybe this is supposed to be "..."?
Dave
I agree that this was probably the indication for et cetera.
Aren't you going to commit updated var/convertor/wave2grid_kma/pvchkdv.F as well?
tools/finder.F
Outdated
    ! line number and column count (for subsequent editing).

    ! usage:
    ! a.out file.F
How about adding the following (from your commit message) in the program itself.
build the finder program: gfortran -ffree-form finder.F
a.out some-file-name.F
…excluded
TYPE: bug fix
KEYWORDS: ISO, ASCII, sed, byte
SOURCE: internal
DESCRIPTION OF CHANGES:
Authors of a few physics schemes likely used a "cut-and-paste" technique for including references and for units. The offending references used quite a few different characters for an intended dash (minus sign). The offending units all used a superscript numeral 2 to mean "squared", as in W/m^2. I changed some to m^2 and some to m2, as both are used in the modified schemes. There were a few other single modifications (an "a" with a caret hat, etc.). All of the changes were to commented lines. The changes are necessary to allow the use of sed to process the source code.
Outside of the physics directory, a number of files also had characters outside of the Fortran character set (32-127). These were all in comments, but are still being removed.
LIST OF MODIFIED FILES:
chem/module_cam_mam_newnuc.F
chem/module_gocart_dmsemis.F
chem/module_gocart_seasalt.F
chem/module_mozcart_wetscav.F
chem/module_sea_salt_emis.F
dyn_em/module_sfs_driver.F
dyn_em/module_sfs_nba.F
frame/module_cpl.F
hydro/Routing/module_gw_gw2d.F
phys/module_bl_mfshconvpbl.F
phys/module_gocart_seasalt.F
phys/module_ltng_cpmpr92z.F
phys/module_ltng_crmpr92.F
phys/module_ltng_iccg.F
phys/module_mp_nssl_2mom.F
phys/module_mp_wdm6.F
phys/module_sf_bem.F
phys/module_sf_bep.F
phys/module_sf_bep_bem.F
var/convertor/wave2grid_kma/pvchkdv.F (Thanks Jamie!)
TESTS CONDUCTED:
The sed program works on the modified files, and does not work on the original files.
@jamiebresch and @mkavulich Can you guys review again to see if my pull request may now proceed? Thanks
@davegill We would like to have tools/finder.F renamed to tools/non_ascii_finder.F
@davegill Would it be hard to allow the verbosity level to be specified on the command-line, and to let the user give a list of files to be scanned as command-line arguments? I'm imagining something like this:
Michael,
a.out
a.out -v
a.out -v non_ascii_finder.F
a.out -V non_ascii_finder.F
a.out -VV fortran_2003_fflush_test.G
a.out -v fortran_2003_fflush_test.F
a.out -V fortran_2003_fflush_test.F
a.out -VV fortran_2003_fflush_test.F
Dave
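The invocations listed here suggest an optional verbosity flag followed by one or more file names. A rough Python sketch of that argument handling follows. The numeric levels, the `parse_args` name, and the error handling are assumptions made for illustration; the thread only establishes that the flags -v, -V, and -VV exist.

```python
# Hypothetical mapping: the thread names the flags but not their numeric levels.
VERBOSITY_FLAGS = {"-v": 1, "-V": 2, "-VV": 3}


def parse_args(argv):
    """Split command-line arguments into (verbosity, filenames).

    Verbosity defaults to 0 when no flag is given; anything that does not
    start with "-" is treated as a file to scan.
    """
    verbosity = 0
    filenames = []
    for arg in argv:
        if arg in VERBOSITY_FLAGS:
            verbosity = VERBOSITY_FLAGS[arg]
        elif arg.startswith("-"):
            raise ValueError("unknown option: " + arg)
        else:
            filenames.append(arg)
    return verbosity, filenames
```

A caller would typically pass `sys.argv[1:]` and print a usage message when the returned file list is empty, which corresponds to the bare `a.out` and `a.out -v` cases.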
tools/non_ascii_finder.F
Outdated
    PRINT *,'where <verbose level> is -v when using this program with "find", and'
    PRINT *,' <verbose level> is -V when processing a single file'
    ! PRINT *,' <verbose level> is -VV is for developers and debugging'
    PRINT *,'where <filename> is a WRF Fortran source file'
@davegill I suppose I may be nit-picking at this point, but why does the input have to be a WRF Fortran source file? Taking a more general view, this utility tells whether there are any characters outside the set of printable ASCII characters (or characters not acceptable to cpp, or whatever). Also, stating that -v is used when using the program with find doesn't really help anyone to understand what the effect of using -v actually is. Also, stating that -V is used when processing a single file can suggest that the program can process multiple files.
Generally, we could reconsider the printout produced by this program with a broader view of what the program could potentially be used for.
|
@davegill It's purely academic at this point, but it might be interesting to try to detect UTF-8 multi-byte encodings. For example, in the Really, though, the 68th and 69th characters together form a UTF-8 character; their binary encoding is This explains why the line correctly shows the superscript 2 as a single character, but the lines can't show any character. |
@mgduda There are tons of languages and encoding methods. For the source code purpose, I think ASCII-only is a good rule.
@jamiebresch Agreed. To be clear, I was definitely not suggesting that we allow anything but printable ASCII characters in the source code (I think this may even be part of the Fortran standard). Rather, I was only saying that, because UTF-8 can encode more or less every language and is by far the most widely used encoding on, e.g., the web, the checker program could be more clever and recognize multi-byte UTF-8 encodings for what they are, rather than printing two, three, or four error messages for the same multi-byte character. My previous comment only came about because I noticed that some of the messages from the checker referenced two characters when I could only find one in the source code (e.g., the superscript 2), and I started looking further and thought the UTF-8 encoding bit was pretty cool.
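One way to report a multi-byte UTF-8 character once, instead of once per byte, is to read the sequence length from the lead byte and check that the following bytes are continuation bytes (bit pattern 10xxxxxx). The Python sketch below illustrates the idea under stated assumptions: the function names are hypothetical, only bytes of 128 and above are examined, and overlong or otherwise invalid encodings are not rejected.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the byte count implied by a UTF-8 lead byte, or 0 if it is not one."""
    if lead & 0b11100000 == 0b11000000:   # 110xxxxx: two-byte sequence
        return 2
    if lead & 0b11110000 == 0b11100000:   # 1110xxxx: three-byte sequence
        return 3
    if lead & 0b11111000 == 0b11110000:   # 11110xxx: four-byte sequence
        return 4
    return 0


def find_non_ascii(line: bytes):
    """Return (column, raw_bytes) pairs, one per offending character, 1-based columns."""
    results = []
    i = 0
    while i < len(line):
        code = line[i]
        if code < 128:
            i += 1
            continue
        n = utf8_sequence_length(code)
        chunk = line[i:i + n]
        is_complete = n > 0 and len(chunk) == n
        if is_complete and all(b & 0b11000000 == 0b10000000 for b in chunk[1:]):
            # A well-formed multi-byte sequence: report it as a single character.
            results.append((i + 1, chunk))
            i += n
        else:
            # A stray high byte (e.g. ISO-8859 text): report the single byte.
            results.append((i + 1, line[i:i + 1]))
            i += 1
    return results
```

For the superscript 2 case mentioned above, `find_non_ascii("W/m\u00b2".encode("utf-8"))` yields a single entry at column 4 holding the two bytes 0xC2 0xB2.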
Below is a script version of a UTF-8 -> ASCII character converter.
Craig
#!/bin/sh
@(#) utf2ascii Convert UTF-8 to ASCII text
################################################
Convert file encoded in UTF-8 to ASCII text.
Usage: utf2ascii filename
################################################
###############################
Set trap to abort on signal
###############################
#####################################
Process command-line argument(s):
#####################################
exit
…rt_registry minor inconsequential removal of extra quote on memetum preturbations…
Synching up namelist templates