Fixed & Reactivated AutoFitImageClassificationTrainTest hanging by freeing old Tensor objects#4978
Fixed & Reactivated AutoFitImageClassificationTrainTest hanging by freeing old Tensor objects#4978mstfbl wants to merge 25 commits intodotnet:masterfrom
Conversation
…Container in used cases
…cationModelParameters
|
I am now disposing of Tensor models and their C-library objects in |
6e64d60 to
8660ecc
Compare
…ificationTrainTest-memoryFix
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
…://github.com/mstfbl/machinelearning into AutoFitImageClassificationTrainTest-memoryFix
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.AutoML/API/ExperimentResults/CrossValidationExperimentResult.cs
Outdated
Show resolved
Hide resolved
…t .Dispose() calls
With the current implementation, modelFileInfo is never null as in the case where the given root directory is null, a temp file in a temp path is returned. In reply to: 614421392 [](ancestors = 614421392) Refers to: src/Microsoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs:22 in 087c0d5. [](commit_id = 087c0d5, deletion_comment = False) |
src/Microsoft.ML.AutoML/API/ExperimentResults/ExperimentResult.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.AutoML/API/ExperimentResults/ExperimentResult.cs
Outdated
Show resolved
Hide resolved
Yes, but that knowledge does not exist here in this function. modelFileInfo gets passed without any validation to ModelContainer. Tomorrow, someone else may make a code change without realizing that. In reply to: 614783589 [](ancestors = 614783589,614421392) Refers to: src/Microsoft.ML.AutoML/Experiment/Runners/RunnerUtil.cs:22 in 087c0d5. [](commit_id = 087c0d5, deletion_comment = False) |
…ificationTrainTest-memoryFix
Codecov Report
@@ Coverage Diff @@
## master #4978 +/- ##
==========================================
- Coverage 75.81% 73.32% -2.49%
==========================================
Files 993 1007 +14
Lines 181224 188128 +6904
Branches 19510 20246 +736
==========================================
+ Hits 137387 137937 +550
- Misses 38538 44665 +6127
- Partials 5299 5526 +227
|
AutoFitImageClassificationTrainTestis occasionally hanging, even after PR #4939 . The issue here is that Tensor objects saved inITransformer model(now renamed as 'estimatorModel`) are not being automatically freed by C#'s Garbage Collector, as these Tensor objects are made in TensorFlow's C libraries.Edited on 4/9/2020: This PR makes ExperimentResult and CrossValidationExperimentResult implements IDisposable to free remaining C Tensor objects in memory in a deterministic manner. This method of freeing these objects ensures the user will not face null-reference/use-after-free errors when trying to access the model, as this clean up is done after GC runs.
Edited on 4/9/2020: I confirmed that this fixes the hanging "out of memory" and/or "long running test" issues by running
AutoFitImageClassificationTrainTestfor 100 iterations, in addition to running all the other unit tests in this build. In all of these builds, none of the issues described occur. These builds all time-out because running 100 iterations ofAutoFitImageClassificationTrainTesttakes more than 1 hour.I have also reactivated the
AutoFitImageClassificationTrainTestunit test with this fix.**Edit: **