Skip to content

Conversation

@rakons
Copy link
Contributor

@rakons rakons commented Oct 2, 2019

This PR adds an option to filter duplicates when direct loading data from file and fcb backend.
The average complexity should be N ^ sqrt(2), thanks to the fact that not all the data is read on every entry, but only all the data left.

@zephyrbot zephyrbot added area: native port Host native arch port (native_sim) area: Settings Settings subsystem area: File System area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test labels Oct 2, 2019
@zephyrbot
Copy link

zephyrbot commented Oct 2, 2019

All checks are passing now.

checkpatch (informational only, not a failure)

-:752: WARNING:LONG_LINE: line over 80 characters
#752: FILE: tests/subsys/settings/functional/src/settings_basic_test.c:511:
+	zassert_equal(strlen(ldata->v) + 1, len, "e: \"%s\", a:\"%s\"", ldata->v, buf);

-:758: WARNING:LONG_LINE: line over 80 characters
#758: FILE: tests/subsys/settings/functional/src/settings_basic_test.c:517:
+	zassert_false(strcmp(ldata->v, buf), "e: \"%s\", a:\"%s\"", ldata->v, buf);

- total: 0 errors, 2 warnings, 774 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

@rakons rakons force-pushed the settings_filter_duplicates branch from 1d7ca54 to 32fea5d Compare October 3, 2019 06:23
@Laczen
Copy link
Contributor

Laczen commented Oct 3, 2019

@rakons, what do you mean by filter duplicates ? Is the idea to call the settings_handler just for the latest entry in the fs, or is it something else ?

@rakons
Copy link
Contributor Author

rakons commented Oct 3, 2019

@Laczen - yes. Ignore previous (old) copies and call the cb function only for the final data.

@Laczen
Copy link
Contributor

Laczen commented Oct 3, 2019

@rakons, OK. In that case I think you can achieve the same results without any changes to fcb. What you are trying to achieve is similar to what is done in the compress method. In that method a copy of an entry is done when there is no later entry in the fs that has the same name. In your case you would only call the load if there is no later entry in the fs with the same name. The load can be modified very easily to do the same functionality as the compress, you can remove the fcb_walk routine completely. For the nffs case the same methodology can be used. I would propose the filter setting to be a configuration option. By introducing a load variable that is set to true at the start for every entry and then walking over the remaining fs to see if you need to set load to false (if you find another entry with the same name) it is easy to use the configuration option to decide if you need to walk over the remaining items or not.
At a high level it would look like:

while (fcb_getnext(&cf->cf_fcb, &loc1.loc) == 0) {
   read name1
   load = true;
#if enabled(SETTINGS_SFCB_FILTER_DUPLICATE)
   loc2 = loc1;
   while (fcb_getnext(&cf->cf_fcb, &loc2.loc) == 0) {
       read name2
       if name1 equal to name2 {
           load = false;
           break;
       }
    }
#endif
   if (load) {
      call settings_handler
   }
}

@rakons
Copy link
Contributor Author

rakons commented Oct 3, 2019

It makes a sense. Limiting callbacks levels inside settings is always good move. Shouldnt we lock fcb while processing it using fcb_getnext?

@Laczen
Copy link
Contributor

Laczen commented Oct 3, 2019

It makes a sense. Limiting callbacks levels inside settings is always good move. Shouldnt we lock fcb while processing it using fcb_getnext?

@nvlsianpu, ^^^ should fcb be locked ?

@rakons rakons force-pushed the settings_filter_duplicates branch 2 times, most recently from 5a6eeb5 to 0309a35 Compare October 4, 2019 13:15
@rakons
Copy link
Contributor Author

rakons commented Oct 4, 2019

Reworked limiting callbacks level by @Laczen sugestion.
Currently fcb is not locked during data traversing. Waiting for @nvlsianpu to feedback if it should be locked. It seems that we are missing some locking while data compressing anyway inside fcb.

@nvlsianpu
Copy link
Contributor

It makes a sense. Limiting callbacks levels inside settings is always good move. Shouldnt we lock fcb while processing it using fcb_getnext?

@nvlsianpu, ^^^ should fcb be locked ?

I have been Looking deeply into code. Answer to this question is yes. Need to introduce few more lock into code for ensure multiple access compatibility

I have been Looking deeply into code. Answer to this question is yes. Need to introduce few more lock into code for ensure multiple access compatibility. I Discussed this with @rako of-channel.

@Laczen
Copy link
Contributor

Laczen commented Oct 4, 2019

It makes a sense. Limiting callbacks levels inside settings is always good move. Shouldnt we lock fcb while processing it using fcb_getnext?

@nvlsianpu, ^^^ should fcb be locked ?

I have been Looking deeply into code. Answer to this question is yes. Need to introduce few more lock into code for ensure multiple access compatibility

I have been Looking deeply into code. Answer to this question is yes. Need to introduce few more lock into code for ensure multiple access compatibility. I Discussed this with @rako of-channel.

@rakons, @nvlsianpu, are you sure about the locking ? AFAIR, locking in fcb is only required when doing a write (nothing else changes the fs), when a writing is done fcb allocates the space and locks the fs (fcb_append), data is written during this lock (first unlocking it and then doing the write, afterwards lock it back again). when the writing is finished (fcb_close) the last part is written and then the fs is unlocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed addition IMO does not justify an API change. Either enable it by default or use the filter_duplicate option only on settings_load_direct().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make a big sense to use it when all the settings are loaded at startup. It was the idea - do not change an external API. If somebody knows it is here and wishes to use it - set the specific field in the settings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are changing an external API. Settings can be configured to use a custom backend, for these you are changing the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvlsianpu - What is your opinion? Anybody more we can vote if it is worth to change API between settings and backend or is is better to add another backend function?
The first solution would fail during compilation if not adjusted. The second can lead to runtime error but if the functionality is not used, it would not require any adjustment from the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakons, there are other options: enable it by default (makes the load a little longer) or use config to enable it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would introduce this API change - it is no big change. Making it optional makes things harder to be maintained - I believe adaptation of a current custom back-end will be easy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvlsianpu, why not enable it by default ?

Copy link
Contributor

@nvlsianpu nvlsianpu Oct 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API was introduced like ~2 weeks ago by #19077

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nvlsianpu ?? There is a new change used for passing the bool filter_duplicates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be if (rc <= 0) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If 0 is returned I wish to set pass_entry to 1 and continue the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something strange with the logic here, if no duplicates are found or if a duplicate is found you want to set pass_entry to 1 (unless the description of settings_fcb_check_duplicate is wrong).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass_entry = (rc == 0); - bracket added. If duplicates are found then rc = 1. If no duplicates is found - rc = 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this makes it more clear, but I still think it is not correct. Whenever an error is occurring during the find duplicates there is a jump out of the loop, while it should just be a continue. It should still load whatever it has found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could easily use the exact same methodology in fcb, doing the same thing makes it easier to maintain.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather pass the same variable name into file - is_duplicate does not explain everything and I am worried that it might be used for additional checking in the future. pass_entry explains what this variable does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I would not use is_duplicate, but use a boolean load = true. This explains everything, you set load to false when not needed. The methodology in the file is a lot simpler than in the fcb backend, so keep this methodology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think so. Moreover - to much of tabs is required in files and it makes it hard to keep line length limit. Splitting it into two functions line in fcb backend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very easy to put this in a function for the file as well. This is no reason. Keep the methods equal! Just have this function return 0 if it finds a match, and an error code when it doesn't (a error: end of file or whatever error means loading is needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason to flatten the function return value so much. It might make sense to react in the same way when an error occur and no match is found. The functionality is the same now in both backends.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakons, I do see a reason for both backends to flatten out the function return. At the moment settings reads as much as it can, when you don't flatten out the error settings will return no value at all because there is a error later in the backend, even when this error has nothing to do with the value being read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look into the current code?

@nvlsianpu
Copy link
Contributor

@Laczen fcb settings compression: when one thread is appending to scratch, other thread might append to, so it might occur that it causes even scratch sector full filling before compression is done.

What i see is that the fcb instance is preserved against its state violation., but not against something like that.

@Laczen
Copy link
Contributor

Laczen commented Oct 4, 2019

@Laczen fcb settings compression: when one thread is appending to scratch, other thread might append to, so it might occur that it causes even scratch sector full filling before compression is done.
What i see is that the fcb instance is preserved against its state violation., but not against something like that.

True, but fcb cannot protect against this, I think the settings lock protects against this. But of course this means you can only use the fcb fs for settings.

@rakons rakons force-pushed the settings_filter_duplicates branch 2 times, most recently from 420f0d3 to e072e61 Compare October 7, 2019 10:57
Copy link
Contributor

@Laczen Laczen Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue ? Or even better, remove the if, when length is zero it will set name2[0] = '\0'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, why not.

Copy link
Contributor

@fnde-ot fnde-ot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should consider deprecating backends storing duplicates at some point, but meanwhile this change makes sense to me. Nice job!

@Laczen
Copy link
Contributor

Laczen commented Oct 10, 2019

When the filter_duplicate is a option to the load routine, how do I decide to use it or not ?

This is left to the user and a clear answer needs to be provided.

@rakons
Copy link
Contributor Author

rakons commented Oct 11, 2019

The output for the meeting with @carlescufi and @jhedberg:
We are moving towards the backends that does not present the entity history.
FCB stays here for backward compatibility, FILE would always be there, but we does not expect performance from FILE backend.
What we should do:

  • Make an doc update that from 2.1 we does support the backends that presents the history (then we may remove some complexity from the modules that uses settings).
  • Filter out deleted entries.
  • Add filtering also for global loading function. Does not provide any optional flag in KConfig - this is now the requirement for the backend.

@rakons rakons force-pushed the settings_filter_duplicates branch from 3f41381 to 69f3d28 Compare October 11, 2019 13:30
This commit adds a possibility to activate duplicates filtering
during direct loading.

JIRA: NCSDK-3017

Signed-off-by: Radoslaw Koppel <[email protected]>
This commit adds a possibility to activate duplicates filtering
during direct loading.

JIRA: NCSDK-3017

Signed-off-by: Radoslaw Koppel <[email protected]>
@rakons rakons force-pushed the settings_filter_duplicates branch from 69f3d28 to b07ea3a Compare October 11, 2019 14:44
@rakons rakons requested a review from dbkinder as a code owner October 11, 2019 15:42
@rakons
Copy link
Contributor Author

rakons commented Oct 11, 2019

Seems done
@carlescufi - who can check the documentation?

@carlescufi
Copy link
Member

Seems done
@carlescufi - who can check the documentation?

@dbkinder will, he is automatically added when you edit .rst files.

@carlescufi
Copy link
Member

@Laczen could you take another look?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storied -> stored

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered by @dbkinder comment below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason you left the filter_duplicates as a parameter for this internal API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used to search the duplicates when storing. Storing procedure can have simpler algorithm to remove the duplicates with a * (x) complexity. It is left here for storing performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rakons ok, makes sense, thanks for the clarification!

Copy link
Contributor

@dbkinder dbkinder Oct 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity and fixing some misspellings, how about instead:

This means that the newest data entity is stored after any
older existing data entities. Starting with Zephyr 2.1, the
back-end must filter out all old entities and call the
callback with only the newest entity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Please check if this is what you meant.

@carlescufi
Copy link
Member

@rakons can you fix the checkpatch issues?

This commit updates the settings backend documentation
to clearly state that it cannot provide old entities
before the final one.

Signed-off-by: Radoslaw Koppel <[email protected]>
@rakons rakons force-pushed the settings_filter_duplicates branch from 96f47ad to d956035 Compare October 14, 2019 09:13
@carlescufi carlescufi merged commit bb9c453 into zephyrproject-rtos:master Oct 14, 2019
rakons added a commit to rakons/zephyr that referenced this pull request Oct 21, 2019
This commit fixes the FCB delete test after PR zephyrproject-rtos#19541.
Now the entity callback is not called on deleted element.

Issue: zephyrproject-rtos#19963

Signed-off-by: Radoslaw Koppel <[email protected]>
nashif pushed a commit that referenced this pull request Oct 21, 2019
This commit fixes the FCB delete test after PR #19541.
Now the entity callback is not called on deleted element.

Issue: #19963

Signed-off-by: Radoslaw Koppel <[email protected]>
tejlmand pushed a commit to tejlmand/zephyr that referenced this pull request Oct 30, 2019
This commit fixes the FCB delete test after PR zephyrproject-rtos#19541.
Now the entity callback is not called on deleted element.

Signed-off-by: Radoslaw Koppel <[email protected]>
Signed-off-by: Andrzej Puzdrowski <[email protected]>
@rakons rakons deleted the settings_filter_duplicates branch July 28, 2023 09:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: Documentation area: File System area: native port Host native arch port (native_sim) area: Settings Settings subsystem area: Tests Issues related to a particular existing or missing test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants