NULL fields and tag removal #919

geigerzaehler · 2014-08-25T15:30:25Z

Deleting fields works by setting them to NULL in the database
The Zero plugin completely removes tags instead of storing the empty string (see zero: Remove tags instead of setting them to null #157).

sampsyo · 2014-08-25T16:30:04Z

Awesome; I think this is a great compromise—it's possible to mark something as "deleted" for the purposes of tags without exposing Nones all over the place to beets internals, which will be incredibly useful. Great idea.

One thing I'm slightly worried about is that this makes a store followed by a load "non-transparent". That is, previously, you could count on this:

print(a.somefield)
a.store()
a.load()
print(a.somefield)

always printing the same value twice. Now, if somefield had just been deleted, then you'll get None the first time and a normalized value the second time. This could come up, for example, in the modify command, where we ant to show the difference between the original and modified items, but the latter has not been though a store/load cycle so the None will be exposed.

I know it seems hacky, but maybe it would be useful to have separate state for indicating whether a field has been deleted? For example, Model could have a set called _null_fields that holds fixed field names that are missing values (i.e., they are NULL in the database and should be deleted from the tags).

geigerzaehler · 2014-08-25T16:39:23Z

I agree that the attributes should be invariant under store() and load(), but temporarily breaking the invariance should not be a problem. In the long run I think we should emit None values from the model. (Again the ReplayGain example comes to mind, where 0 means something different than None.) The _null_fields approach seems to be an unnecessary detour.

sampsyo · 2014-08-25T16:51:20Z

I know it's complicated, but I do think it's incredibly valuable for most fields (ReplayGain being an exception!) to avoid None values, at least in the steady state. It's a dynamically-typed version of Tony Hoare's "billion-dollar mistake": missing values are an exceptional case, so it's a shame to force all client code to do handle Nones when it just wants to, for example, print out item.title.upper().

If anything, I think I'd advocate for removing the field entirely from the object when it's unset. For example, del item.title would make subsequent a item.title access raise an AttributeError rather than silently returning None. This would at least make attribute presence behave the same for flexible attributes as for fixed attributes. We could even provide a separate API, like item.get('title'), that would be guaranteed to return a useful value.

I'll continue to ponder also, but this is a surprisingly tricky issue!

geigerzaehler · 2014-08-25T17:14:31Z

A tricky issue indeed, but I like the proposition in the second paragraph; it’s more pythonesque. Also, we already have a normalized API in place: the FormattedMapping.

sampsyo · 2014-08-25T17:21:20Z

Indeed, it could be that the only time you need "guaranteed success" is when formatting to a string, in which case item.formatted()['title'] would work nicely. The only question is whether there are lots places that need to get non-string normalized values: for example, if you want to sort by track number, it might be convenient to do items.sort(key=lambda i: i.track) and use 0 when the value is missing.

It's a pretty classic trade-off: we should endeavor to make the common case simple while keeping the uncommon case possible. If stuff like item.track is the common case, it would be a mistake to force all that code to either test for presence or use a more verbose API; we should instead make the "null-sensitive" version more verbose. But if in practice we mostly do item.get(key) anyway, then there's little harm in making basic access more complex.

geigerzaehler · 2014-08-25T18:06:17Z

for example, if you want to sort by track number, it might be convenient to do items.sort(key=lambda i: i.track) and use 0 when the value is missing.

It think i.get('track', 0) would be a better choice, because it is verbose. Using a different API than that of dict and mapping seems counterintuitive.

sampsyo · 2014-08-25T19:00:33Z

Yeah, behaving like a "normal" dict where values go away when they're deleted is a very strong argument—your translation of my example being a perfect instance where an explicit default is nicer. And it is indeed frustrating that flexible attributes are different from fixed attributes in this way; it really would be better if this could be considered an implementation detail.

Want to give that refactoring a try? We'll need to look for all cases in the code that assume the existence of fixed attributes and make them explicitly handle absence.

geigerzaehler · 2014-08-25T19:21:42Z

Want to give that refactoring a try? We'll need to look for all cases in the code that assume the existence of fixed attributes and make them explicitly handle absence.

Sure. What do you think of the following API

Model API

Actually two APIs: One for programmer input and one for user input.

model[key] Always returns same type or raises IndexError
model.key Same as above, but raises AttributeError
model.get(key, default=None) Same as above, but returns default
instead of raising. If the default keyword is not given, uses
Type.default.
model[key] = value Assigned value is validated, raises ValueError
on failure, uses Type.normalize()
model.key = value Same as above.
model.set(key, value) The value is a string of user input that is
parsed. If parsing fails, a ValueError is raised. Uses
Type.parse()

Types API

parse(value) Used by model.set()
normalize(value) Must be invertible, only simple type conversions,
raises ValueError, used by model[key] = value
default Used by model.get()
format(value) Always return unicode object, value might be None,
used by formatted.get().
to_sql(value) Used by model.store()
from_sql(value) Used by model.load()

Not sure if we need the last two.

Print API

formatted[key] Returns either string or raises IndexError
formatted.get(key, default=None) Uses type.format().

Invariants

model[key] = value
assert model[key] == value
model.store()
model.load()
assert model[key] == value

assert t.from_sql(t.to_sql(x)) == x
assert t.to_sql(t.from_sql(x)) == x

I don’t know if the second line is stricly necessary, or if you we will loose too much flexibility.

sampsyo · 2014-08-25T23:45:19Z

Yes, awesome! I like this direction.

Here are a few comments:

You mentioned Type.default, which I suggest amending to Type.normalize(None), which is what we currently use. For most (all?) of the types we currently have, this is exactly equivalent to Type.null.
I like the addition of model.set for completeness, but currently the only code that needs parsing is the modify command. And in that case, we take advantage of a nice performance optimization by doing all the string-parsing up front before doing a batch set on each object (obj.update(mods), on line 1276 of commands.py currently). So this might not currently be used if we want to keep that optimization.
Yes, also not sure if we need {from,to}_sql. Currently, all Python values are the same as their SQLite equivalents, except for filenames due to a necessary hack (see _bytes_keys).
I'm not sure either, but my intuition says that we may want the flexibility for model[key] = value ; model[key] == value to be false sometimes due to normalization. For example, it might be nice to coerce strs to unicodes when assigning into a string field, or raw datetimes to raw floats (or vice-versa) for date fields.

I'm excited! This seems like it could be a really clarifying simplification.

geigerzaehler · 2014-08-26T16:16:23Z

You mentioned Type.default, which I suggest amending to Type.normalize(None), which is what we currently use.

I was planning on just replacing missing values with a single value from Type.default, just like dict.get() does. The normalize() method is used for assigning fields and should not handle None. That’s what del model[key] is for.

For most (all?) of the types we currently have, this is exactly equivalent to Type.null.

default seems to me like a more descriptive name.

And in that case, we take advantage of a nice performance optimization by doing all the string-parsing up front before doing a batch set on each object (obj.update(mods), on line 1276 of commands.py currently). So this might not currently be used if we want to keep that optimization.

Its not really an optimization since we iterate and parse the modifications. With the new API it becomes

for key, val in mods.items():
    item.set(key, val)

so it’s actually shorter.

or raw datetimes to raw floats (or vice-versa) for date fields.

In this case I'd lean towards assigning either only datetimes or only floats and force the consumer to do the heavy lifting of conversion.

Yes, also not sure if we need {from,to}_sql. Currently, all Python values are the same as their SQLite equivalents, except for filenames due to a necessary hack (see _bytes_keys).

This seems to be the best argument for including the *_sql methods. Another is mtime and atime. In the item we might want to present them as datetimes but we have to serialize them to floats in the database.

sampsyo · 2014-08-26T16:32:41Z

The normalize() method is used for assigning fields and should not handle None. That’s what del model[key] is for.

Normalization already does handle None (that's most of its purpose). You're right that item.something = None should of course not delete the something field, but it is a nice way to reset to the default value for the field—something we don't otherwise have an API for.

This conversion/coercion stuff seems pretty orthogonal to the issue at hand, which is deletion—perhaps we should leave it in place for now to make this refactoring step more focused?

Its not really an optimization since we iterate and parse the modifications.

We're currently parsing the modifications once and then applying them N times to all the modified objects, without re-parsing. Your example would require re-parsing the modifications for every updated item (i.e., item.set implicitly parses). Perhaps I am missing something here?

geigerzaehler · 2014-08-27T13:56:44Z

Normalization already does handle None (that's most of its purpose). You're right that item.something = None should of course not delete the something field, but it is a nice way to reset to the default value for the field—something we don't otherwise have an API for.

You’re right, having an API for assigning default values makes sense. This has lead me to another issue. If don’t think that it makes sense for fixed fields to be missing. Otherwise something like this wouldn’t work as expected

item1.update(dict(item2))

The same issue arises with mediafiles which delete None valued tags.

This conversion/coercion stuff seems pretty orthogonal to the issue at hand, which is deletion.

I think they are pretty closely connected: We have to ask ourselves how deleted values are represented internally in the object and in the database—and this is handled by conversion and coercion. I think we should hijack this issue and do right.

We're currently parsing the modifications once and then applying them N times to all the modified objects, without re-parsing. [...] Perhaps I am missing something here?

It was me. I totally missed the outer loop over objects. We could provide a model.parse(key, value) method and model.update(dict) will iterate and use model[key] = value.

All this has lead me to the following conclusions.

We need to represent None values by None values internally and by NULL values in the database. Otherwise we can not pass a default value to model.get().
For fixed fields model[field] never raises a KeyError but might return None. Also del model[field] is the same as setting it to None.
normalize() should not handle None. If we assign None it is represented as None internally. If we use model[key] there is no need for normalization because the value is the same as the internal representation. There is also no need for normalizaion in model.get(): It just returns the same value as model[key] but replaces None with Type.default.

geigerzaehler · 2014-08-27T14:22:22Z

A basic implementation would look like this

geigerzaehler · 2014-09-10T13:06:48Z

This is now a working implementation. It is a bit hacky in parts but all tests pass. What do you think @sampsyo?

sampsyo · 2014-09-11T05:05:11Z

Hey @geigerzaehler; sorry for being so slow with this discussion! I've been a little busy with real-life stuff lately and therefore managing only to tread water with beets. I am working on formulating a better opinion here; I hope I'll have something smarter to say tomorrow.

geigerzaehler · 2014-09-11T13:18:22Z

Take your time—it’s quite a handful and I’m still not confident I didn’t break anything. I suggest you ignore the FormattedMapping stuff for now, since this is the most convoluted.

Removing a fixed field sets it to NULL in the database. If the model is loaded, all NULL values are passed to `Type.normalize()`, so attributes do not return `None`. Fixes beetbox#886.

We can now set model fields to `None`. `MediaFile` removes the tags corresponding to `None` fields. Hopefully fixes beetbox#157.

All field operations of a model use the methods of a `Type` instance to provide custom handling. Field operations include getting, setting, deleting and conversion to and from SQL. * There is a one-to-one mapping between SQL values and the values stored in the model. Conversion is done with `Type.from_sql()` and `Type.to_sql()`. `NULL` values in SQL are directly mapped to `None` values in the model * `Model[field]` retrieves the internal value stored in the model. For fixed field this either gives a value or `None`. For flex fields this may raise a `KeyError`. * `Model.get(field, default)` uses `Type.default` to replace missing and `None` values * `Model[field] = value` uses `Type.normalize()` to make sure internal values are consisten. Normalization is skipped for `None`. * `Model.set(field, value)` excpects a string value and uses `Type.parse()` to convert and normalize it to an internal value.

This abstraction allows us to generalise a lot of code. It also makes the the `normalize` method, and by extension assignment of values in models, more restrictive. I therefore had to fix a few test cases. For a description of the `model_type` property, see its documentation.

sampsyo · 2014-09-13T07:02:55Z

Thanks for your patience, and all the awesome work on this refactoring effort!

I think I have to advocate again for raising an exception for missing fields instead of returning None. I remain too concerned about the effect of exposing nulls silently throughout beets. In my experience, this kind of silent propagation of bad values can be subtle and pernicious. I just debugged a similar (unrelated) issue today. It's a troublesome anti-pattern for buggy code to not result in an error and instead just do the wrong thing.

There's also the other aforementioned issue of symmetry between fixed and flexible fields. It would be extremely valuable to make the two kinds of attributes as similar in behavior as possible—it should be possible for client code to be unaware of the distinction. The delete-means-None semantics breaks this.

Even so, there are other good pieces in here that we should add individually—from_sql and to_sql, for example, is definitely a good idea. And I'm sure lots of the work you have here in auditing for unchecked field accesses will be useful in an exception-triggering design too.

What do you think? I know there's the not-insignificant problem of making MediaFile work with missing fields. Perhaps we can make this a longer-term thing—we can take it one step at a time, maybe starting with MediaFile.

geigerzaehler · 2014-09-13T15:12:27Z

This is a mere proposal, so I’m perfectly fine if we just pick out the good parts. To be honest I think it has become a little to big to be merged (one sometimes gets carried away 😉) and I would like to split the discussion. There are three parts to this.

Discussing and implementing the semantics of the Model class.
Refactoring to make the code more transparent.
Working with media files to delete tags.

The first step should be 2. I will open another PR with some parts from this one and elaborate why I think it’s a good idea.

Then, I think, we should continue the discussion about semantics here. Maybe we could start be recording the current state. This involves the the interface for MediaFile that allows us to delete tags. For the discussion about new semantics code examples should be key. Not example implementations but examples of behavior, like informal test cases.

sampsyo · 2014-09-13T18:50:11Z

Fantastic—I look forward to seeing the ideas for refactoring; we could certainly use some more transparency in the dbcore stuff.

Describing the semantics is also critical. I'll start compiling some notes too. If we end up with something useful, we should consider adding it to the Sphinx docs too.

jtpavlock · 2020-07-08T22:50:55Z

Is this still relevant?

sampsyo · 2020-07-09T11:45:37Z

It's a super complicated issue and supporting tag deletion would still be great to do someday, but yes, I do think we can close this effort now.

sampsyo force-pushed the master branch 2 times, most recently from 2ded210 to 4b11eed Compare August 27, 2014 17:55

geigerzaehler force-pushed the none_fields branch 2 times, most recently from fd3f7ee to 4038b75 Compare September 10, 2014 12:58

sampsyo force-pushed the master branch from 72fd76b to 832f34c Compare September 11, 2014 02:38

Thomas Scholtes added 9 commits September 11, 2014 22:48

Modify command can remove fixed fields.

6e22ae7

Removing a fixed field sets it to NULL in the database. If the model is loaded, all NULL values are passed to `Type.normalize()`, so attributes do not return `None`. Fixes beetbox#886.

Zero plugin removes tags

11d4223

We can now set model fields to `None`. `MediaFile` removes the tags corresponding to `None` fields. Hopefully fixes beetbox#157.

FormattedMapping raises KeyError on unset fields

2bb78ad

Migrate library models to new dbcore API

a0d657e

Add new Model.parse() class method

0edb148

Formatter only fallsback for specific fields

0e8c50b

Fix typo in tests

9447410

Do not format user defaults

40acd92

Thomas Scholtes added 7 commits September 11, 2014 22:51

Migrate rest of beets to new DB API

31352e8

Use Model._to_sql

60a4749

PEP8 fixes

86ae86f

Add tests that expose bug with integer queries on flexible fields

b9b7391

Fix integer queries on flexible fields from 23362c9

9a0b581

Boolean types are correctly loaded from flex fields

2e412be

geigerzaehler force-pushed the none_fields branch from 326f6e7 to 2e412be Compare September 11, 2014 20:54

geigerzaehler mentioned this pull request Sep 11, 2014

Plugins and flexible field types #947

Closed

2 tasks

geigerzaehler mentioned this pull request Sep 13, 2014

Types for flexible attributes #951

Merged

geigerzaehler mentioned this pull request Sep 14, 2014

Refactor DBcore without changing behaviour #959

Merged

sampsyo added this to the 1.3.10: MediaFile++ milestone Oct 25, 2014

sampsyo mentioned this pull request Nov 30, 2014

cannot modify albumartist on single files #1124

Closed

sampsyo mentioned this pull request Nov 20, 2016

Configure metadata (itemfields) to be used #2273

Closed

flesk mentioned this pull request Mar 28, 2017

modify: Field deletion (foo!) should work for fixed attributes by setting them to null #2497

Closed

sampsyo mentioned this pull request May 15, 2017

RFC: replaygain: R128 support #2560

Merged

sampsyo closed this Jul 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NULL fields and tag removal #919

NULL fields and tag removal #919

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 26, 2014

sampsyo commented Aug 26, 2014

geigerzaehler commented Aug 27, 2014

geigerzaehler commented Aug 27, 2014

geigerzaehler commented Sep 10, 2014

sampsyo commented Sep 11, 2014

geigerzaehler commented Sep 11, 2014

sampsyo commented Sep 13, 2014

geigerzaehler commented Sep 13, 2014

sampsyo commented Sep 13, 2014

jtpavlock commented Jul 8, 2020

sampsyo commented Jul 9, 2020

NULL fields and tag removal #919

NULL fields and tag removal #919

Conversation

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 25, 2014

Model API

Types API

Print API

Invariants

sampsyo commented Aug 25, 2014

geigerzaehler commented Aug 26, 2014

sampsyo commented Aug 26, 2014

geigerzaehler commented Aug 27, 2014

geigerzaehler commented Aug 27, 2014

geigerzaehler commented Sep 10, 2014

sampsyo commented Sep 11, 2014

geigerzaehler commented Sep 11, 2014

sampsyo commented Sep 13, 2014

geigerzaehler commented Sep 13, 2014

sampsyo commented Sep 13, 2014

jtpavlock commented Jul 8, 2020

sampsyo commented Jul 9, 2020