Pareto formula for estimating the median given only subgroup totals #44
Comments
The top item, with L1, L2, etc., is a list of the breaks/groups of data. The second item, with I1, I2, etc., is a list of the widths of those groups. The rest is still a bit of a mystery. |
Is there any kind of documentation out there on the web we can learn from? |
Only in Sandy's brain AFAIK |
Here are the contents of the Excel file I've been told has been used as a model by others. Here are the formulas that appear in column E between rows 9 and 15.
=IF(SUM(B$2:B8)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B7))/B8)*D8+C8,"")
=IF(SUM(B$2:B9)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B8))/B9)*D9+C9,"")
=IF(SUM(B$2:B10)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B9))/B10)*D10+C10,"")
=IF(SUM(B$2:B11)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B10))/B11)*D11+C11,"")
=IF(SUM(B$2:B12)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B11))/B12)*D12+C12,"")
=IF(SUM(B$2:B13)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B12))/B13)*D13+C13,"")
=IF(SUM(B$2:B14)>SUM(B$2:B$14)/2,(ABS(SUM(B$2:B$14)/2-SUM(B$2:B13))/B14)*D14+C14,"")
=MIN(E2:E14) |
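To make the spreadsheet logic easier to follow, here is a rough Python sketch of what those formulas appear to compute. It assumes column B holds the count in each group, column C the group's lower bound, and column D the group's width, which is my reading of the cell references and not something the file states. The function and argument names are made up for illustration.

```python
def interpolated_median(counts, lower_bounds, widths):
    """Estimate a median from grouped counts by linear interpolation.

    Assumed column mapping (a guess from the formulas, not confirmed):
    counts = column B, lower_bounds = column C, widths = column D.
    """
    total = sum(counts)
    half = total / 2
    cumulative = 0  # running total of all groups below the current one
    for count, lower, width in zip(counts, lower_bounds, widths):
        # The first group whose running total passes the halfway mark
        # contains the median (the spreadsheet's IF test).
        if cumulative + count > half:
            # Walk into the group in proportion to how many of its
            # members are needed to reach the halfway mark.
            return lower + (half - cumulative) / count * width
        cumulative += count
    return None  # no groups, or every count is zero


# Hypothetical counts for four groups, each 10 units wide.
estimate = interpolated_median([10, 20, 30, 40], [0, 10, 20, 30], [10, 10, 10, 10])
print(round(estimate, 2))
```

The spreadsheet evaluates every row and then takes `MIN` of column E. The loop above stops at the first qualifying group instead, which should give the same answer because every later row produces a value at or above its own lower bound.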
Here is a Wolfram Alpha entry on the Pareto distribution. |
Here's a Wikipedia article that might be relevant. It would be great to see if we can find that Census source it cites. |
@anthonyjpesce acquired a source document from the Census, which I have posted online here. They pointed me to pages 16 and 17 of this document. ... What we've been calling Pareto is actually linear interpolation (we'll have to rename that), though it seems they use both depending on the application. I think we're going to stick with linear for our purposes. Here are those pages. |
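For anyone curious how the two differ, here is a hedged sketch of a Pareto-style interpolation. It is not copied from the Census document. It is derived from the assumption that, within the group containing the median, the share of units above a value x follows a power law, so the log of that share is linear in log(x). The function name and the fallback to linear interpolation are my own choices.

```python
import math


def pareto_interpolated_median(counts, lower_bounds, widths):
    """Estimate a median by interpolating on a log scale (Pareto assumption)."""
    total = sum(counts)
    half = total / 2
    cumulative = 0  # units in all groups below the current one
    for count, lower, width in zip(counts, lower_bounds, widths):
        if cumulative + count > half:
            upper = lower + width
            share_above_lower = 1 - cumulative / total
            share_above_upper = 1 - (cumulative + count) / total
            if lower <= 0 or share_above_upper <= 0:
                # Logs are undefined here, so fall back to linear
                # interpolation (an assumption of this sketch).
                return lower + (half - cumulative) / count * width
            # Fraction of the way from log(lower) to log(upper) at which
            # the share above x falls to one half.
            fraction = (math.log(share_above_lower / 0.5)
                        / math.log(share_above_lower / share_above_upper))
            return lower * math.exp(fraction * math.log(upper / lower))
        cumulative += count
    return None  # no groups, or every count is zero


# Hypothetical counts for four groups, each 10 units wide.
print(pareto_interpolated_median([10, 20, 30, 40], [0, 10, 20, 30], [10, 10, 10, 10]))
```

On this made-up data the Pareto estimate lands a little below the linear one for the same group, since the power-law assumption puts more of the group's units near its lower bound.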
I wonder how far off a weighted average using the midpoint of each category is in general. Clearly it's less accurate, but it's much easier for me to calculate: I'm computing these medians in large SQL queries, where multi-step logic (like finding which category contains the midpoint) is much harder than in a sequential programming language. |
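One way to get a feel for the gap is to compute both numbers on the same grouped data. The sketch below uses hypothetical counts and made-up function names. It also writes the interpolated median without a "find the middle category" step: compute a running total for every row, keep the rows whose running total passes the halfway mark, and take the minimum candidate, which mirrors the `MIN` over column E in the Excel file. Something similar might be expressible in SQL with a windowed running sum, though I have not tried it.

```python
from itertools import accumulate


def midpoint_weighted_average(counts, lower_bounds, widths):
    """Weighted average of group midpoints; this estimates the mean, not the median."""
    total = sum(counts)
    return sum(c * (lo + w / 2)
               for c, lo, w in zip(counts, lower_bounds, widths)) / total


def median_by_min_of_candidates(counts, lower_bounds, widths):
    """Linear-interpolation median computed row by row, with no early exit."""
    half = sum(counts) / 2
    running = list(accumulate(counts))  # running total through each row
    candidates = [
        # abs() keeps rows past the median group from producing values
        # below their own lower bound, as in the spreadsheet formula.
        lo + abs(half - (run - c)) / c * w
        for c, lo, w, run in zip(counts, lower_bounds, widths, running)
        if run > half and c > 0  # skip empty groups to avoid dividing by zero
    ]
    return min(candidates)


# Hypothetical right-skewed data: four groups, each 10 units wide.
counts, lowers, widths = [40, 30, 20, 10], [0, 10, 20, 30], [10, 10, 10, 10]
print(midpoint_weighted_average(counts, lowers, widths))
print(median_by_min_of_candidates(counts, lowers, widths))
```

On skewed data like this the two can differ noticeably, because the midpoint average is pulled toward the long tail. An open-ended top group also has no midpoint, so that approach needs an extra assumption there.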
Here's some SAS to get us started