-
You should perhaps offer links to "Tout Culture" and "Le Vitrine". Also, "Le Vitrine" should be spelled "La Vitrine".
-
Since "Validating the timezone offset value will be challenging", should you still assign it a weight of 12, at the risk of getting many errors when generating the score? Although startDate is considered required, shouldn't it be set aside in a first phase, so that the exercise can still generate an indicative score for your proof of concept?
-
I think that location.address.postalCode should be assigned a weight higher than 3. Location is a required property (not a recommended one) and, from my understanding of Artsdata, a postal code can be used just as easily as a sameAs URI to automatically disambiguate location values. It is also much more likely that an organization will have a postal code than a sameAs value in its structured data.
-
There is a typo in the datatype. In Artsdata, the datatypes are xsd:dateTime or xsd:date. The Artsdata pipeline sets the datatype to xsd:date for dates (2024-08-02) or to xsd:dateTime for date-times (2024-08-02T20:00:00-04:00), depending on whether a time is present. This matters in Artsdata for searching and filtering by date and time. However, the schema.org JSON-LD @context sets the datatype of all Event startDate and endDate values to schema:Date, not xsd:dateTime or schema:DateTime. Since the vast majority of JSON-LD in the wild uses the schema.org @context, we really only need to check the syntax, not the datatype. See my comments on detecting errors in the timezone offset: #123 (comment). In this project I will only be able to fix missing timezones, not incorrect ones. @fjjulien My questions:
Syntax should be CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
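To illustrate the syntax check described above, here is a minimal Python sketch. It is not the Artsdata pipeline's code: the function name `classify_start_date` and the returned labels are hypothetical, and the pattern only checks the shape of the string (date only, or date-time with an optional timezone), not whether the calendar values or the offset are correct.

```python
import re

# Shape of the syntax quoted above: CCYY-MM-DD, optionally followed by
# Thh:mm:ss and an optional timezone (Z, +hh:mm or -hh:mm).
# This only checks the pattern; a value such as month 13 would still pass.
DATE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"(?:T(?P<time>\d{2}:\d{2}:\d{2})"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?)?$"
)


def classify_start_date(value: str) -> str:
    """Return a hypothetical label describing the shape of a startDate string."""
    match = DATE_RE.match(value)
    if match is None:
        return "invalid"
    if match.group("time") is None:
        return "date"  # would map to xsd:date
    if match.group("tz") is None:
        return "datetime-missing-timezone"  # the fixable case mentioned above
    return "datetime"  # would map to xsd:dateTime


print(classify_start_date("2024-08-02"))
print(classify_start_date("2024-08-02T20:00:00-04:00"))
print(classify_start_date("2024-08-02T20:00:00"))
```

A missing timezone can be detected this way, whereas an incorrect offset cannot, since any well-formed offset matches the pattern.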
-
I understand the idea of assigning more points to recommended properties, but I wonder whether it is useful to do so for required properties... Let me explain: shouldn't we instead automatically give a score of zero when the required properties are not filled in correctly? That is the "barrier to entry". We could then assign a fixed number of points (e.g. 30) for the full set of required properties. After that, additional properties add value. In the current scenario, if I understand correctly:
With my proposal, the first event would get 0 (zero) and the second would get the fixed number (for example, 30). We could use a different approach, but I think it is essential that events that do not meet the requirements cannot score higher than those that do.
-
Revised weighting proposal
Taking into account all the excellent feedback provided so far, I would like to propose this revised weighting. It introduces a new category worth 4 points for properties that are deemed useful for disambiguation. Required properties are given a weight of 8 points and recommended properties are brought down to a weight of 2 points to address concerns that recommended properties may collectively have a higher cumulative value than required properties. I also propose to integrate @christianroy's proposal to give a null score if an event does not have all three required properties.
Required + disambiguation property: 8 + 4 = 12 points
Required properties: 8 points
Disambiguation properties: 4 points
Recommended properties: 2 points
Other properties: 1 point
Under this proposal:
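The revised weighting and the null-score rule can be sketched in a few lines of Python. This is an illustration only, not the implemented algorithm: the weights come from the proposal above, but the `CATEGORY` mapping of property names to categories is a hypothetical assignment made up for the example, as are the names `WEIGHTS`, `CATEGORY` and `score_event`.

```python
from typing import Iterable

# Weights taken from the revised proposal above.
WEIGHTS = {
    "required+disambiguation": 12,  # 8 + 4
    "required": 8,
    "disambiguation": 4,
    "recommended": 2,
    "other": 1,
}

# ASSUMPTION: this property-to-category assignment is illustrative only;
# the thread does not list which property belongs to which category.
CATEGORY = {
    "name": "required",
    "startDate": "required+disambiguation",
    "location": "required",
    "location.address.postalCode": "disambiguation",
    "endDate": "recommended",
}

REQUIRED = {prop for prop, cat in CATEGORY.items() if cat.startswith("required")}


def score_event(valid_properties: Iterable[str]) -> int:
    """Sum the weights of valid properties; null score if a required one is missing."""
    present = set(valid_properties)
    if not REQUIRED <= present:
        return 0  # the "barrier to entry": all required properties must be present
    # Properties without an explicit category fall back to "other".
    return sum(WEIGHTS[CATEGORY.get(prop, "other")] for prop in present)


print(score_event(["name", "startDate", "location"]))
print(score_event(["name", "startDate", "endDate"]))
```

With this gate in place, an event that lacks a required property can never outscore one that has all of them, whatever else it declares.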
-
Although the scores are meant primarily for internal needs, they will sooner or later start to circulate. We may, for example, include the score in feedback to Digital Discoverability Program participants. However, a score on its own, without any scale or interpretation note, is meaningless. To help external users make sense of the structured data scores, I think we should provide qualitative interpretations based on score tiers. I would like to propose these suggested interpretations:
Please share your comments and propose edits.
-
That's unfortunate, because properties such as endDate are more than mere
nice-to-haves. Would it set the project back if you at least included the
"optional" properties from the Artsdata-specific instructions?
If it does significantly impact the critical path, we will be okay with the
algorithm in its current form.
fj
On Thu, Sept. 5, 2024, 2:45 p.m., Gregory Saumier-Finch <
***@***.***> wrote:
… @fjjulien @christianroy @dlh28 @Liverace This weighting proposal has been
implemented, with the exception of "Other properties: 1 point".
https://github.com/culturecreates/artsdata-score
@fjjulien I propose removing "Other properties: 1 point" from scope because
it is too open-ended.
To try it out
You can test this on individual webpages by going to artsdata.ca, pasting
a webpage URL into the top-right search box, then, in the options for
"External resources", clicking *dereference*, and then clicking the link
*compute score*. This will reload the webpage with the score added to the
Event data (keep scrolling down).
The next task is to enable a batch of webpage URLs.
-
Representing the structured data score over a 100-point scale
An absolute score with a value between 0 and 74 meets all the research needs for the digital discoverability measurement project. However, since the project team is planning to correlate the structured data score with a SERP score, which will likely have a different scale, it may become difficult for a human to make sense of scores on different scales. To make it easier for humans to interpret the structured data score and to compare it with a SERP score, we would like both scores to be transformed into matching 100-point scales.
@saumier Could you implement a rule of three to transform the 74-point score into a 100-point score for human interpretation? No decimals are needed (you may round to the nearest integer). Rather than giving the minimal qualification score (28 for all three required properties) a percentage of 38 (i.e. 28/74), we would like this qualification score to be ascribed a percentage value of 50 (and all higher scores distributed between 50 and 100).
Note: This request falls in the "nice-to-have" category. If it is too difficult to display this percentage score in Nebula and to deliver it through the batch process (on top of the absolute score), we could easily run the rule of three outside of Artsdata. If you hit a blocker, please do not spend countless hours on this request.
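One possible reading of this request, sketched in Python: scores from 28 to 74 are spread linearly over 50 to 100. The function name `to_percent` is hypothetical, and two choices here are assumptions not stated in the request: scores below the qualification score map to 0 (consistent with the null-score rule), and halves round up.

```python
import math

MAX_SCORE = 74         # highest absolute score
QUALIFYING_SCORE = 28  # all three required properties, per the request above


def to_percent(score: int) -> int:
    """Map an absolute score to a 0-100 scale, with 28 -> 50 and 74 -> 100."""
    if score < QUALIFYING_SCORE:
        # ASSUMPTION: non-qualifying events get 0, matching the null-score rule.
        return 0
    score = min(score, MAX_SCORE)  # guard against out-of-range input
    span = MAX_SCORE - QUALIFYING_SCORE
    percent = 50 + (score - QUALIFYING_SCORE) * 50 / span
    # ASSUMPTION: round half up; Python's built-in round() rounds half to even.
    return math.floor(percent + 0.5)


for absolute in (0, 28, 51, 74):
    print(absolute, to_percent(absolute))
```

A plain rule of three would instead be `round(score * 100 / 74)`, which places the qualification score well below 50; the piecewise form is what pins it to 50.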
-
@fjjulien @christianroy The batch tool is ready to be used. I'd like to give a demo to see if it meets the needs of this project. Is there a time you would like to meet for a demo? Note: the tool can also load websites with injected JSON-LD by rendering the JavaScript in a headless browser (this was not available at the start of this project but has been developed since).
-
Context
Goal
Overall goal: Quantify the quality of performing arts events’ structured data.
Specific goal: Design an algorithm that will assign a numeric value (i.e. a “score”) to a given performing arts event’s structured data. The score should reflect the extent to which the structured data is suited for:
In the context of the Artsdata project, the reuse of structured data in events listings such as Tout Culture and La Vitrine represents a particularly important use case.
Specifications
The algorithm should assign a point value to each attribute-value pair found within an event object. The point value (or ‘weighting’) should consider the importance of each attribute-value pair, relative to the above specific goals (disambiguation, and consumption-driven use cases). Those attribute-value pairs that are deemed critically important for these objectives should be given a higher weight. For example, some properties are “required” in Artsdata because they are essential for disambiguation. These properties should be given the highest weight, because disambiguation is a prerequisite for most use cases.
The algorithm’s assessment for each attribute-value pair should ideally be more than a simple true/false based on the presence or absence of a property. For those properties that are deemed important (for example, “required” or “recommended” status in Artsdata), the algorithm should also consider if the value is an expected object for the property (for example, the value for location should be an @type Place) or if it is in the right format (for example, the value for startDate should follow the syntax of ISO 8601). It may also need to verify errors, such as the @id and url properties having the same value.
Initial weighting proposal
As a starting point for discussion, I would like to propose the following point values / weightings to specific attribute-value pairs:
Weight: 12
Weight: 6
Weight: 3
url value)
Weight: 1
Notes:
@id properties for nested objects: Currently, very few sites assign an @id to nested objects, and, of those that do, the value is rarely a valid URI. Unless the algorithm can easily assess the validity of the values, it is better not to assign a nested @id a weighting higher than 1.
performer.type: Website CMSs often automatically assign the same type to all performer objects (for example, “Person” or “PerformingGroup”), regardless of the actual nature of the performer entity. While it would be technically possible to design an algorithm that would be able to guess the performer type based on a reconciliation of the performer.name string, I believe this would be too much work for the potential benefit. For version 1 of the algorithm, we should blindly accept any expected value (i.e. Person or Organization), and not attempt to make a judgement on the validity of the value.
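The value checks described in the specifications and notes above (expected @type for location, @id differing from url, performer type accepted at face value) could look roughly like the following Python sketch. It assumes the event is already parsed from JSON-LD into a plain dictionary; the function names and result keys are hypothetical, and the set of accepted performer types only contains the two values named in the note.

```python
from typing import Any

# Expected performer types named in the note above; subtypes such as
# PerformingGroup would have to be added if they should also be accepted.
EXPECTED_PERFORMER_TYPES = {"Person", "Organization"}


def as_list(value: Any) -> list:
    """JSON-LD values may be single items or arrays; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def check_event(event: dict) -> dict:
    """Run a few illustrative value checks on a parsed JSON-LD Event object."""
    location = event.get("location")
    performers = as_list(event.get("performer"))
    return {
        # The value of location should be an object typed Place.
        "location_is_place": isinstance(location, dict)
        and "Place" in as_list(location.get("@type")),
        # @id and url having the same value is treated as an error.
        "id_differs_from_url": event.get("@id") is None
        or event.get("@id") != event.get("url"),
        # Version 1: accept any expected performer type without judging it.
        "performer_types_expected": all(
            isinstance(p, dict)
            and bool(EXPECTED_PERFORMER_TYPES & set(as_list(p.get("@type"))))
            for p in performers
        ),
    }


sample = {
    "@type": "Event",
    "@id": "https://example.com/events/1",
    "url": "https://example.com/events/1",
    "location": {"@type": "Place", "name": "Example Hall"},
    "performer": [{"@type": "Person", "name": "Example Performer"}],
}
print(check_event(sample))
```

Each check returns a boolean so that a weighting step can decide separately how many points a passing property is worth.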