ambiguous questions #3

Open
YoungJaeChoung opened this issue Jul 8, 2024 · 2 comments

Comments

@YoungJaeChoung

YoungJaeChoung commented Jul 8, 2024

I think some questions have alternative queries.


  • file name: GeoNuclearData.json

  • question: 'How many nuclear power plants are in preparation to be used in Japan?'

  • query: 'SELECT count(*) FROM nuclear_power_plants WHERE Country = "Japan" AND Status = "Under Construction"'

  • possible query: "select count(*) from nuclear_power_plants where Country = 'Japan' and Status = 'Planned'"


  • file name: GeoNuclearData.json

  • question: Where is the first BWR type power plant built and located?

  • query: SELECT Longitude, Latitude FROM nuclear_power_plants WHERE ReactorType = "BWR" ORDER BY ConstructionStartAt LIMIT 1

  • possible query: select Name, Country from nuclear_power_plants where ReactorType = 'BWR' order by ConstructionStartAt limit 1


  • file name: GeoNuclearData.json

  • question: 'How many PHWR are there today?'

  • query: "select count(*) from nuclear_power_plants where ReactorType = 'PHWR' and Status != 'Shutdown';"

  • possible query: 'SELECT count(*) FROM nuclear_power_plants WHERE ReactorType = "PHWR"'


  • file name: GreaterManchesterCrime.json

  • question: 'Which area do most of the crimes happen?'

  • query: 'SELECT Location FROM GreaterManchesterCrime GROUP BY Location ORDER BY count(*) DESC LIMIT 1'

  • possible query: 'select LSOA from GreaterManchesterCrime group by LSOA order by count(*) desc limit 1;'


  • file name: GreaterManchesterCrime.json

  • question: Where is the safest area?

  • query: SELECT Location FROM GreaterManchesterCrime GROUP BY Location ORDER BY count(*) LIMIT 1

  • possible query: select LSOA from GreaterManchesterCrime group by LSOA order by count(*) asc limit 1

@YoungJaeChoung changed the title from ambiguous question in "GeoNuclearData.json" to ambiguous questions on Jul 8, 2024
@Chia-Hsuan-Lee
Owner

Hello!
Indeed, in text-to-SQL benchmarks it is not uncommon for a question to have multiple valid SQL queries, and annotators typically cannot list every one of them during the annotation process.
To address this, I would suggest looking at evaluation methods other than Exact Match, for example the execution accuracy metric used by BIRD-SQL.
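
As a rough illustration of the difference between the two metrics, here is a minimal sketch of an execution-based comparison. It is not the BIRD-SQL evaluation script; the helper name `execution_match`, the toy table contents, and the order-insensitive row comparison are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def execution_match(conn, gold_sql, pred_sql):
    """Hypothetical helper: True if both queries return the same multiset of rows.

    Row order is ignored here, which is a simplification; queries whose
    correctness depends on ORDER BY would need a stricter comparison.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to run counts as a mismatch.
        return False
    return Counter(gold_rows) == Counter(pred_rows)


# Toy in-memory table; the rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nuclear_power_plants "
    "(Name TEXT, Country TEXT, Status TEXT, ReactorType TEXT)"
)
conn.executemany(
    "INSERT INTO nuclear_power_plants VALUES (?, ?, ?, ?)",
    [
        ("A", "Japan", "Under Construction", "BWR"),
        ("B", "Japan", "Planned", "BWR"),
        ("C", "Japan", "Planned", "PWR"),
        ("D", "India", "Operational", "PHWR"),
    ],
)

gold = "SELECT count(*) FROM nuclear_power_plants WHERE ReactorType = 'PHWR'"
pred = (
    "select count(*) from nuclear_power_plants "
    "where ReactorType = 'PHWR' and Status != 'Shutdown'"
)

exact_match = gold == pred                      # plain string comparison
exec_match = execution_match(conn, gold, pred)  # compares query results

# A pair that differs in results on this toy data, not only in wording.
gold_japan = (
    "SELECT count(*) FROM nuclear_power_plants "
    "WHERE Country = 'Japan' AND Status = 'Under Construction'"
)
pred_japan = (
    "SELECT count(*) FROM nuclear_power_plants "
    "WHERE Country = 'Japan' AND Status = 'Planned'"
)
exec_match_japan = execution_match(conn, gold_japan, pred_japan)
```

On this toy data the two PHWR queries differ as strings but return the same rows, so they fail Exact Match and pass the execution check. The two Japan queries return different counts, so an execution-based metric still treats them as different answers.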

Thanks for pointing this out anyway!

@YoungJaeChoung
Author

YoungJaeChoung commented Jul 18, 2024

Thank you for sharing the paper. I will read it. :)
