What's the "PRQL Way"? Cleaning up raw data with empty fields #4505

richb-hanover · 2024-05-26T00:13:48Z


I have a database of real estate parcels with a column that holds a two- or three-character code for the Zoning District: LCD, RD, BD, etc. The raw data is indexed by a primary key (parcel ID, or PID), but the Zoning District column has a bunch of typos. I wrote a function to patch this up...

# Clean up zoning districts
# Returns two or three character district
let CleanZoningDistrict = zd -> case [
  zd == "ES" => "RD",
  zd == "R" => "RD",
  zd == "SFR" => "RD",
  zd == "URD" => "RD",
  zd == "LCD" => "LCD",
  zd == "CD" => "LCD",
  zd == "LDC" => "LCD",
  zd == "SD" => "RD",
  zd == "" => null,
  true => zd,
]

The raw data also has a bunch of empty fields ("") where the value is simply missing (bad data entry, I suspect). I created a separate table with one row for each parcel that has an empty Zoning District, and I manually filled in a "CorrectedZoningDistrict" column, indexed on the same PID.

My thought was to join the two tables and if the Zoning District is "", use the "CorrectedZoningDistrict".

I can see how to do it in two steps - something like:

ZoningDistrict = CleanZoningDistrict zd
RealZoningDistrict = ZoningDistrict ?? CorrectedZoningDistrict # if main table is null, use the corrected table entry
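
For concreteness, the join plus the two steps might be sketched as one pipeline. This is only a sketch under assumptions: the table names `parcels` and `zoning_corrections` and the raw column name `zd` are hypothetical (only PID and CorrectedZoningDistrict come from the description above), and it assumes the CleanZoningDistrict definition appears earlier in the same query.

```prql
# Hypothetical names: `parcels` is the raw table, `zoning_corrections` is the
# hand-built fix-up table, and `zd` is the raw Zoning District column.
from parcels
# Left join so parcels with no row in the corrections table are kept
join side:left zoning_corrections (==PID)
derive {
  # Step 1: fix typos; an empty string becomes null
  ZoningDistrict = CleanZoningDistrict parcels.zd,
  # Step 2: if step 1 produced null, use the manually corrected value
  RealZoningDistrict = ZoningDistrict ?? zoning_corrections.CorrectedZoningDistrict,
}
```

A left join seems like the safer choice here because the corrections table only has rows for the parcels with an empty Zoning District; an inner join would drop every other parcel.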

Is there a "PRQL Way" to fold that lookup into the CleanZoningDistrict function? Thanks.

Answered by richb-hanover


richb-hanover · 2024-05-30T19:22:34Z


I suspect the two-step process outlined above is the easiest way to do this. (It's clearly straightforward.)

I will need to use the resulting table in lots of different queries, so I plan to create a view (letting PRQL generate the SQL) so that I don't have to clutter up my (new) queries with all the setup. Thanks!
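
For comparison with the two-step approach, here is a rough sketch of what folding the fallback into the function could look like, by giving it a second parameter. The function name `CleanZoningDistrictOr`, the parameter name `fallback`, the table names `parcels` and `zoning_corrections`, and the raw column name `zd` are all hypothetical.

```prql
# Same mapping as CleanZoningDistrict, but an empty string yields the
# caller-supplied fallback instead of null. `fallback` is a hypothetical name.
let CleanZoningDistrictOr = zd fallback -> case [
  zd == "ES" => "RD",
  zd == "R" => "RD",
  zd == "SFR" => "RD",
  zd == "URD" => "RD",
  zd == "LCD" => "LCD",
  zd == "CD" => "LCD",
  zd == "LDC" => "LCD",
  zd == "SD" => "RD",
  zd == "" => fallback,
  true => zd,
]

# Hypothetical table and column names, as described above
from parcels
join side:left zoning_corrections (==PID)
derive RealZoningDistrict = (
  CleanZoningDistrictOr parcels.zd zoning_corrections.CorrectedZoningDistrict
)
```

One difference to be aware of: in this variant a raw value that is already null (rather than "") falls through to the last arm and stays null, whereas the two-step version with ?? would still pick up the corrected value. That is one reason the two-step form may be the more forgiving of the two. Either way, the SQL that PRQL generates for the pipeline is what would go into the view definition.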
