Skip to content

Conversation

@shaofei007
Copy link
Contributor

What changes were proposed in this pull request?

         class FileInputDStream

 [line 162]   protected[streaming] override def clearMetadata(time: Time) {
    batchTimeToSelectedFiles.synchronized {
      val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
      batchTimeToSelectedFiles --= oldFiles.keys

The above code does not remove the old generatedRDDs. "super.clearMetadata(time)" was added to the beginning of clearMetadata to remove the old generatedRDDs.

How was this patch tested?

At the end of clearMetadata, the testing code (print the number of generatedRDDs) was added to check the old RDDS were removed manually.

### What changes were proposed in this pull request?
```DStreams
   class FileInputDStream

  protected[streaming] override def clearMetadata(time: Time) {
    batchTimeToSelectedFiles.synchronized {
      val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
      batchTimeToSelectedFiles --= oldFiles.keys

```
The above code does not remove the old generatedRDDs. "super.clearMetadata(time)" was added to the beginning of clearMetadata to remove the old generatedRDDs.

## How was this patch tested
At the end of clearMetadata, the testing code (print the number of generatedRDDs) was added to check the old RDDS were removed.
[SPARK-21357][DStreams] FileInputDStream not remove out of date RDD
@srowen
Copy link
Member

srowen commented Jul 23, 2017

Seems reasonable, but, I am sort of not sure why it hasn't come up before. @tdas?

@SparkQA
Copy link

SparkQA commented Jul 23, 2017

Test build #3848 has finished for PR 18718 at commit 5a34428.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaofei007
Copy link
Contributor Author

@tdas
how do you think about this PR please?

@asfgit
Can we close it please?

@srowen
Copy link
Member

srowen commented Jul 28, 2017

@shaofei007 I actually think your fix is correct, so I wouldn't close it. The comment says: /** Clear the old time-to-files mappings along with old RDDs */ But only the superclass method clears the old RDDs. In the absence of any comments to the contrary, I'd merge this, at least to master

@srowen
Copy link
Member

srowen commented Jul 29, 2017

Merged to master

@asfgit asfgit closed this in 60e9b2b Jul 29, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants