indicators: add R044 Business similarities between suppliers (or bidders): common addresses, personnel, phone numbers, etc. #94

Open
yolile opened this issue May 20, 2024 · 8 comments · May be fixed by #100
Assignees
Labels
cmd:indicators Relating to the indicators command

Comments

@yolile
Member

yolile commented May 20, 2024

Methodology

Required OCDS fields: parties/roles IN 'supplier' OR 'tenderer', parties/identifier/id, (parties/contactPoint/telephone OR parties/address/streetAddress OR parties/address/postalCode OR parties/contactPoint/name OR parties/contactPoint/email)

Calculation method:

For suppliers k, j bidding in the same procedure i, flag the procedure if the bidders have the same address (or phone number, contact point, email, etc.):

$R044_i = 1$ if

$\text{parties/address/streetAddress}_{k,i} = \text{parties/address/streetAddress}_{j,i}$
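
The calculation above can be sketched in Python as a rough illustration. This is not Cardinal's implementation: the function names `r044_flag` and `_get` are hypothetical, and the input is assumed to be a single OCDS release parsed into a dict. Parties without an `identifier/id` are skipped, on the assumption that the required fields listed above are needed to tell bidders apart.

```python
from itertools import combinations


def _get(party, path):
    """Walk a nested dict along `path`, returning None if any step is missing."""
    value = party
    for key in path:
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value


def r044_flag(release, path=("address", "streetAddress")):
    """Return 1 if two distinct bidders in the release share the value at `path`, else 0."""
    # Keep only parties that are suppliers or tenderers and that have an identifier.
    bidders = [
        party
        for party in release.get("parties", [])
        if {"supplier", "tenderer"} & set(party.get("roles", []))
        and _get(party, ("identifier", "id")) is not None
    ]
    for k, j in combinations(bidders, 2):
        if _get(k, ("identifier", "id")) == _get(j, ("identifier", "id")):
            continue  # the same organization listed twice is not a match
        value_k, value_j = _get(k, path), _get(j, path)
        if value_k and value_k == value_j:
            return 1
    return 0
```

Passing a different `path`, such as `("contactPoint", "telephone")`, would apply the same exact-match comparison to another of the fields listed above.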

@yolile yolile self-assigned this May 20, 2024
@yolile
Member Author

yolile commented May 21, 2024

Ecuador publishes:

  • Address: countryName, locality, postalCode, region and street address
  • Contact point: name and URL (but contactPoint/name is always the same as parties/name)

For address, I guess we want to compare country, locality, region and street address all together, not street address alone.

And do we want to calculate this for bidders in the same process only or in general?

@Camilamila @jpmckinney

@jpmckinney
Member

Based on https://colab.research.google.com/drive/1q38GlyG7B_uPCsqaFBt1UvT5FnNvEtbM#scrollTo=yg8SFe-09kvD I think this indicator is within the same process only, but I haven't compared to the methodologies in the academic sources.

I think it makes sense to combine fields into a full address, yes.

We might discover that we need to do some normalization (e.g. normalize whitespace, lowercase, maybe normalize punctuation). There's more that can be done #33, but I think we'll limit to basics for now.
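
A minimal sketch of the basic normalization and the combined address, in Python for illustration only. The names `normalize`, `full_address_key` and `ADDRESS_FIELDS` are hypothetical, and the choice to require a street address before producing a key is an assumption, made so that two bidders who only share a country do not match.

```python
import string

# Fields combined into one comparison key, following the discussion above.
ADDRESS_FIELDS = ("countryName", "region", "locality", "streetAddress")

# Map every ASCII punctuation character to a space.
_PUNCTUATION = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Lowercase, replace ASCII punctuation with spaces, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCTUATION).split())


def full_address_key(address):
    """Build one comparable key from an OCDS Address object, or None if unusable."""
    if not address.get("streetAddress"):
        return None  # assumption: a key without a street address is too coarse
    return "|".join(normalize(address.get(field) or "") for field in ADDRESS_FIELDS)
```

Two bidders would then be compared on `full_address_key(...)` rather than on the raw `streetAddress` string.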

@yolile
Member Author

yolile commented May 21, 2024

> I think this indicator is within the same process only

True, you are right, because this indicator is about detecting collusion. The example that the notebook refers to, however, is Control Ciudadano from Paraguay, where we ran the exercise across all bidders, regardless of whether they were bidding in the same process. For this indicator, though, we can implement the original, documented methodology, which covers the same process only.

@yolile
Copy link
Member Author

yolile commented May 21, 2024

> We might discover that we need to do some normalization (e.g. normalize whitespace, lowercase, maybe normalize punctuation). There's more that can be done #33, but I think we'll limit to basics for now.

I tested, and even without any normalization and with exact-match comparison, I got a lot of matches in Ecuador's data (at least when comparing all bidders, regardless of OCID).

@yolile
Copy link
Member Author

yolile commented May 21, 2024

What should be the output of the indicator? Besides flagging the OCID, do we want to output the matching bidders along with why they are similar?

@jpmckinney
Member

We would flag the bidders like in R024, etc. (using set_result! and set_tenderer_map!). We can give a score of 1.0 for exact match. I don't think we have any way to add additional metadata about why they are similar.
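
As a rough illustration of that output, here is a Python sketch that records a score of 1.0 for the procedure and the set of matching bidders. It does not use or reproduce `set_result!` or `set_tenderer_map!`; the function name `flag_bidders` and the two returned dicts are hypothetical stand-ins for whatever those macros populate.

```python
from collections import defaultdict


def flag_bidders(ocid, bidders):
    """Given (bidder_id, comparison_key) pairs for one procedure, return
    (results, tenderer_map): a score per OCID and the flagged bidder IDs per OCID."""
    ids_by_key = defaultdict(set)
    for bidder_id, key in bidders:
        if key:  # ignore bidders with no usable address, phone, etc.
            ids_by_key[key].add(bidder_id)

    # Every bidder that shares a key with at least one other bidder is flagged.
    flagged = set()
    for ids in ids_by_key.values():
        if len(ids) > 1:
            flagged |= ids

    if not flagged:
        return {}, {}
    return {ocid: 1.0}, {ocid: flagged}
```

Note that this shape carries no information about which field matched, which is the limitation described above.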

@jpmckinney jpmckinney added the cmd:indicators Relating to the indicators command label May 23, 2024
@yolile
Member Author

yolile commented May 27, 2024

> I don't think we have any way to add additional metadata about why they are similar.

But should we?

@jpmckinney
Member

jpmckinney commented May 27, 2024

Maybe open a new issue with R044 as an example, since we could add more metadata to any indicator. Right now we don’t have any user research telling us that users want more metadata.


2 participants