
Fix discrepancy with pythonic results #46

Open
jakewilliami opened this issue Oct 30, 2020 · 2 comments

@jakewilliami (Owner) commented Oct 30, 2020

There is a discrepancy between the results of this algorithm and those of the Pythonic one. Both algorithms work, but they produce different results.

@jakewilliami (Owner, Author) commented:

After f9b0719, I began to benchmark the results of basic.jl against Simon Hohberg's example.py. When I started this, I realised (something I had forgotten until now) that, though both algorithms work correctly, there was a discrepancy in accuracy between mine and Hohberg's.

I spent around 12 hours straight last night, and into the wee hours, looking for the source of this discrepancy.

Upon inspection (results pushed in 3e9be4a), here is what I found:

  • There was a copy error in one calculation in the three_horizontal part of the cascade;
  • There was a copy error in the return value of get_vote;
  • There were a few off-by-one errors in create_features, because I had not realised that range(a) in Python is equivalent to 0:(a - 1) in Julia (a minimal illustration follows this list).
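
As a minimal illustration of that last point (standalone Julia, not the project's code):

```julia
# Python's range(a) yields 0, 1, ..., a - 1 (a values, zero-based);
# the equivalent Julia range is 0:(a - 1), not 1:a.
a = 5
@assert collect(0:(a - 1)) == [0, 1, 2, 3, 4]  # what Python's range(5) produces
@assert collect(1:a)       == [1, 2, 3, 4, 5]  # the naive Julia translation
@assert length(0:(a - 1)) == length(1:a)       # same length, but shifted values
```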

The first two corrections did not change the results much; the last one, however, I am still unsure of.

One way to test that the two algorithms (Python and Julia) follow the same procedure is simply to compare how many features each finds. Python found 2429 features for the standard test set, but Julia found 4520. Upon further inspection, the way I can fix this discrepancy is to change the x and the y in the inner-most loops to start searching from zero instead of one, and to subtract one from both end points.
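
For concreteness, here is a sketch of that change; max_x and max_y are placeholder bounds, not the identifiers actually used in create_features:

```julia
max_x, max_y = 10, 10  # placeholder bounds, for illustration only

# Current (1-based) inner loops:
for x in 1:max_x, y in 1:max_y
    # construct features anchored at (x, y)
end

# Adjusted to match the Pythonic feature count: start from zero and
# subtract one from both end points, mirroring Python's range() semantics.
for x in 0:(max_x - 1), y in 0:(max_y - 1)
    # construct features anchored at (x, y)
end
```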

However, even when the number of features obtained is the same, the results are different. They are not hugely different (as I say, both algorithms work), but they are different.

As it doesn't make sense for Julia to index from zero, I have kept the inner-most loops in create_features searching from one to the end point (see 269f26e). As a result, the number of features to search through is greater, and the results are closer to those of Hohberg's algorithm.

One thing I found worth noting is that Python reads directories in seemingly random order, whereas Julia reads them alphabetically. I am unsure whether this explains the persisting discrepancy, but it does seem to change the results obtained for the classification_error vector.
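
If the image order matters, one way to rule it out would be to make the traversal deterministic on both sides; a sketch, with a hypothetical path:

```julia
# Julia's readdir returns entries sorted by default, so the Julia side
# already sees the images alphabetically:
image_files = readdir("data/training_images")  # hypothetical path

# Python's os.listdir makes no ordering guarantee, so the Python side would
# need sorted(os.listdir("data/training_images")) to see the same order.
# With both sides sorted, directory order is eliminated as a variable.
```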

The question now is twofold:

  • Is it correct that, even with the same number of features, the Julian algorithm differs from the Pythonic one? Is the difference explained by the order of the images? Is it explained purely by indexing, or is there something else at play that I have not found?
  • Does this persisting discrepancy actually matter, given that both algorithms are above-chance accurate with a sufficiently large training set?

@jakewilliami self-assigned this Oct 31, 2020
@jakewilliami added the "make correct" label Oct 31, 2020
@jakewilliami removed their assignment Jan 27, 2021
@jakewilliami added the "help-wanted" label and removed the "make correct" label Jan 27, 2021
@jakewilliami (Owner, Author) commented:

[ef4015fe] There was another copy error, which changed the results:

```
# Previous results
    Faces:   312/472   (66.10169491525424% of faces were recognised as faces)
Non-faces: 12894/19572 (65.87982832618026% of non-faces were identified as non-faces)

# After fixing the copy error
    Faces:   235/472   (49.78813559322034% of faces were recognised as faces)
Non-faces: 15457/19572 (78.97506642141835% of non-faces were identified as non-faces)
```
