
Fix discrepancy with pythonic results #46

Open
jakewilliami opened this issue Oct 30, 2020 · 2 comments

@jakewilliami (Owner) commented Oct 30, 2020

There is a discrepancy between the results of this algorithm and those of the Pythonic one. Both algorithms work, but they produce different results.

@jakewilliami (Owner, Author) commented:

After f9b0719, I began to benchmark the results of basic.jl against Simon Hohberg's example.py. When I started this, I realised (something I had forgotten until now) that, though both algorithms work correctly, there was a discrepancy in accuracy between mine and Hohberg's.

I spent around 12 hours straight last night, and into the wee hours, looking for the source of this discrepancy.

Upon inspection (results pushed in 3e9be4a), here is what I found:

  • There was a copy error in one calculation in the three_horizontal part of the cascade;
  • There was a copy error in the return value of get_vote;
  • There were a few off-by-one errors in create_features, because I had not realised that range(a) in Python is equivalent to 0:(a - 1) in Julia (a minimal illustration follows this list).
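
As a minimal illustration of that last point (standalone Julia, not the project's code):

```julia
# Python's range(a) yields 0, 1, ..., a - 1 (a values, zero-based);
# the equivalent Julia range is 0:(a - 1), not 1:a.
a = 5
@assert collect(0:(a - 1)) == [0, 1, 2, 3, 4]  # what Python's range(5) produces
@assert collect(1:a)       == [1, 2, 3, 4, 5]  # the naive Julia translation
@assert length(0:(a - 1)) == length(1:a)       # same length, but shifted values
```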

The first two corrections did not change the results much; the last one, however, I am still unsure of.

One way to test that the two algorithms (Python and Julia) follow the same procedure is simply to compare how many features each finds. Python found 2429 features for the standard test set, but Julia found 4520. Upon further inspection, the way I can fix this discrepancy is to change the x and the y in the inner-most loops to start searching from zero instead of one, and to subtract one from both end points.
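
For concreteness, here is a sketch of that change; max_x and max_y are placeholder bounds, not the identifiers actually used in create_features:

```julia
max_x, max_y = 10, 10  # placeholder bounds, for illustration only

# Current (1-based) inner loops:
for x in 1:max_x, y in 1:max_y
    # construct features anchored at (x, y)
end

# Adjusted to match the Pythonic feature count: start from zero and
# subtract one from both end points, mirroring Python's range() semantics.
for x in 0:(max_x - 1), y in 0:(max_y - 1)
    # construct features anchored at (x, y)
end
```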

However, even when the number of features obtained is the same, the results are different. They are not hugely different (as I say, both algorithms work), but they are different.

As it doesn't make sense for Julia to index from zero, I have kept the inner-most loops in create_features searching from one to the end point (see 269f26e). As a result, the number of features to search through is greater, and the results are closer to those of Hohberg's algorithm.

One thing I found worth noting is that Python reads directories in seemingly random order, whereas Julia reads them alphabetically. I am unsure whether this explains the persisting discrepancy, but it does seem to change the results obtained for the classification_error vector.
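
If the image order matters, one way to rule it out would be to make the traversal deterministic on both sides; a sketch, with a hypothetical path:

```julia
# Julia's readdir returns entries sorted by default, so the Julia side
# already sees the images alphabetically:
image_files = readdir("data/training_images")  # hypothetical path

# Python's os.listdir makes no ordering guarantee, so the Python side would
# need sorted(os.listdir("data/training_images")) to see the same order.
# With both sides sorted, directory order is eliminated as a variable.
```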

The question now is twofold:

  • Is it correct that, even with the same number of features, the Julian algorithm differs from the Pythonic one? Is the difference explained by the order of the images? Is it explained purely by indexing, or is there something else at play that I have not found?
  • Does this persisting discrepancy actually matter, given that both algorithms are above-chance accurate with a sufficiently large training set?

@jakewilliami self-assigned this Oct 31, 2020
@jakewilliami added the "make correct" label Oct 31, 2020
@jakewilliami removed their assignment Jan 27, 2021
@jakewilliami added the "help-wanted" label and removed the "make correct" label Jan 27, 2021
@jakewilliami (Owner, Author) commented:

[ef4015fe] There was another copy error, which changed the results:

```
# Previous results
    Faces:   312/472   (66.10169491525424% of faces were recognised as faces)
Non-faces: 12894/19572 (65.87982832618026% of non-faces were identified as non-faces)

# After fixing the copy error
    Faces:   235/472   (49.78813559322034% of faces were recognised as faces)
Non-faces: 15457/19572 (78.97506642141835% of non-faces were identified as non-faces)
```
