Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid zip/postal codes when using locale en_GB #2390

Open
8 of 10 tasks
mikeybinns opened this issue Sep 11, 2023 · 7 comments · May be fixed by #3234
Open
8 of 10 tasks

Invalid zip/postal codes when using locale en_GB #2390

mikeybinns opened this issue Sep 11, 2023 · 7 comments · May be fixed by #3234
Assignees
Labels
c: bug Something isn't working c: locale Permutes locale definitions has workaround Workaround provided or linked m: location Something is referring to the location module p: 1-normal Nothing urgent
Milestone

Comments

@mikeybinns
Copy link

mikeybinns commented Sep 11, 2023

Pre-Checks

Describe the bug

Very closely related to #1416 but for a different locale, so I though a new ticket would be appropriate.

Great Britain post codes also have some specific rules which means that some of the generated postcodes are invalid.

I have a REGEX string for validating if a GB post code is valid, so I'm sharing it here as hopefully it will help, even if only with automated testing. This regex is created based on this gov.uk document

/^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{1,2}[0-9]{1}[A-Z]{1})([0-9]{1}[ABDEFGHJLNPQRSTUWXYZ]{2})$/gi,

I think the biggest tripping block for Faker here is that the final two letters cannot be C I K M O V.

For those who come across this issue, here's a workaround to get you going in modern TypeScript:

/**
 * Generate a valid UK postcode using faker.
 * 
 * Faker's implementation is too generic to work with strict 
 * post code validators, so we must check the values returned 
 * to see if they are valid
 * @see https://github.com/faker-js/faker/issues/2390
 * 
 * @returns A promise which resolves to a valid UK postcode
 */
async function generatePostCode() {
  return new Promise<string>((resolve) => {
    let postcode = "";
    while (
      postcode
        // Remove all characters except numbers and letters (including spaces as spaces are optional)
        .replaceAll(/[^A-Z0-9]/gi, "")
        // Check value against REGEX
        .match(
          /^([A-Z]{1,2}[0-9]{1,2}|[A-Z]{1,2}[0-9]{1}[A-Z]{1})([0-9]{1}[ABDEFGHJLNPQRSTUWXYZ]{2})$/gi,
        )
    ) {
      postcode = faker.location.zipCode();
    }
    resolve(postcode);
  });
}

Minimal reproduction code

import { fakerEN_GB as faker } from "@faker-js/faker";
console.log(faker.location.zipCode());

Additional Context

In case the link breaks, here is that gov.uk document I mentioned.
Appendix_C_ILR_2017_to_2018_v1_Published_28April17.pdf

Environment Info

System:
  OS: macOS 13.5.1
  CPU: (8) arm64 Apple M1
  Memory: 480.53 MB / 16.00 GB
  Shell: 5.9 - /bin/zsh
Binaries:
  Node: 18.17.1 - ~/.nvm/versions/node/v18.17.1/bin/node
  npm: 9.6.7 - ~/.nvm/versions/node/v18.17.1/bin/npm
Browsers:
  Chrome: 116.0.5845.179
  Safari: 16.6
npmPackages:
  @faker-js/faker: ^8.0.1 => 8.0.2

Which module system do you use?

  • CJS
  • ESM

Used Package Manager

npm

@mikeybinns mikeybinns added c: bug Something isn't working s: pending triage Pending Triage labels Sep 11, 2023
@matthewmayer
Copy link
Contributor

I think the tricky thing is knowing when to stop!

Because in reality the first one-two chars can't be ANY A-Z, they can only be one of the https://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom valid postcode areas. So AB... and AL... are valid but not AA.. or AX...

So then its tempting to hardcode those options.

But then AL only goes up from 1...10, not 1...99. So maybe we should encode them, etc etc.

The only true way to validate if a postcode is valid is to compare it against the full https://en.wikipedia.org/wiki/Postcode_Address_File

In some cases i think its OK to return plausible but invalid data if the complexity costs of adding more validity checks are too high.

I don't really know on which side of this arbitrary line this suggestion falls though!

@mikeybinns
Copy link
Author

I understand what you're saying, but I think my suggestion is only related to format.

The document I've linked to (from gov.uk, which in the UK means it's an official resource) documents the format of the post code which includes possible newly created post codes. It doesn't mention specific area codes (except for examples), only the format for what can and cannot be a post code.

I understand that post codes created by faker may not match up to a real post code that actually exists, but I think it should not provide a post code that can never match because the format is incorrect.

I fully agree that hardcoding current post code areas etc. is a step too far for what the library attempts to achieve, but I do believe that my request falls on the "worth doing" side of the line, but also I understand that this may add a maintenance burden, so I understand if you decide you don't agree :)

@matthewmayer
Copy link
Contributor

i think you're right, however at the moment zipCode() uses faker.helpers.replaceSymbols under the hood which only supports a very basic set of substitutions: #, ? and *, so i dont think this is feasible by changing the patterns alone.

@ST-DDT
Copy link
Member

ST-DDT commented Sep 11, 2023

IIRC we (I) already considered removing replaceSymbols* in favor a more standardized approach such as string replace, regex or fake patterns.

@ST-DDT ST-DDT added p: 1-normal Nothing urgent c: locale Permutes locale definitions m: location Something is referring to the location module s: needs decision Needs team/maintainer decision and removed s: pending triage Pending Triage labels Sep 11, 2023
@ST-DDT ST-DDT added this to the vAnytime milestone Apr 4, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Apr 4, 2024

Team Decision

  • We should replace this usage of replaceSymbols with a call to fake (and potentially then to fromRegexp)

@ST-DDT ST-DDT removed the s: needs decision Needs team/maintainer decision label Apr 4, 2024
@ST-DDT ST-DDT self-assigned this Oct 26, 2024
@ST-DDT
Copy link
Member

ST-DDT commented Oct 26, 2024

I'll tackle this soon.

Would you consider the switch from replaceSymbols to fake a breaking change?

@ST-DDT
Copy link
Member

ST-DDT commented Oct 29, 2024

Here is a workaround, you can use until this is fixed:

const zipCode = faker.helpers.fake([
  '{{helpers.fromRegExp("[A-Z]{1,2}[0-9]{1,2} [0-9][ABDEFGHJLNPQRSTUWXYZ]{2}")}}',
  '{{helpers.fromRegExp("[A-Z]{1,2}[0-9][A-Z] [0-9][ABDEFGHJLNPQRSTUWXYZ]{2}")}}',
]);

@ST-DDT ST-DDT added the has workaround Workaround provided or linked label Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: bug Something isn't working c: locale Permutes locale definitions has workaround Workaround provided or linked m: location Something is referring to the location module p: 1-normal Nothing urgent
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants