Add enumerated-value event-TYPE #303

Open
tychonievich opened this issue Apr 20, 2023 · 2 comments


@tychonievich
Collaborator

(Copied from conversation about PR #301)

# Introducing New Attribute/Event Tags vs Better Typification

I have a few points that need to be outlined and investigated as part of this discussion.

  1. GEDCOM files “in the wild”
  2. Typification of base Attributes/Events.
  3. Internationalization, cultural sensitivity, subject expertise

In the Wild GEDCOM

This body seems to rely heavily on GEDCOMs from the wild to drive its decision-making processes. I agree that having a collection of sample/example files is an important part of the data gathering process, specifically as a starting point for understanding what tags are used. However, have we done any analysis on these files to determine their validity and usefulness? In statistical analysis we must be sure that the members of the sample represent the entire population. The questions I have are:

  1. What is the “origin application” for each of these files?
  2. Are they a good cross section of all applications?
  3. Have the files been checked for origin date from the producing software?
  4. Are they culturally/regionally diverse?
  5. Do these files produce “custom tags” that have valid GEDCOM alternatives?
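
A minimal Python sketch of how such a survey could start, assuming each sample file names its producing application in `HEAD.SOUR` and marks custom tags with a leading underscore. The sample data, `ExampleApp`, and the `_MILT` and `_HAIR` tags are hypothetical and are not taken from any real collection of files.

```python
from collections import Counter

def parse_line(line):
    """Split one GEDCOM line into (level, tag, value), skipping any @XREF@ id."""
    tokens = line.strip().split(" ")
    level = int(tokens[0])
    rest = tokens[1:]
    if rest and rest[0].startswith("@"):  # e.g. "0 @I1@ INDI"
        rest = rest[1:]
    tag = rest[0] if rest else ""
    return level, tag, " ".join(rest[1:])

def survey(gedcom_text):
    """Return (origin application from HEAD.SOUR, Counter of underscore-prefixed tags)."""
    origin = None
    custom_tags = Counter()
    in_head = False
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue
        level, tag, value = parse_line(line)
        if level == 0:
            in_head = tag == "HEAD"  # track whether we are inside the header record
        elif in_head and level == 1 and tag == "SOUR":
            origin = value
        if tag.startswith("_"):
            custom_tags[tag] += 1
    return origin, custom_tags

# Hypothetical sample file; the application name and custom tags are made up.
SAMPLE = """\
0 HEAD
1 SOUR ExampleApp
0 @I1@ INDI
1 NAME John /Doe/
1 _MILT Enlisted 1917
1 _MILT Discharged 1919
1 _HAIR Brown
0 TRLR
"""

origin, custom_tags = survey(SAMPLE)
print(origin, dict(custom_tags))
```

Running this over a whole collection and tallying the results per origin application would give a first answer to questions 1, 2 and 5. Dates and regional diversity would need additional header fields and manual review.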

Typification of Attributes and Events

In data design we struggle to contain the breadth of the data structure. We need a serious look at the types of data being represented in order to limit the number of Attributes and Events that any future Standard must include. Part of the job of considering any new Attribute/Event is checking whether it is similar to, or a continuation of, an existing Attribute/Event. Typification of Events/Attributes would also allow a single query to group similar events together.
There are several examples of this in the suggested additions from GEDCOM-X.

  • The additions suggest multiple new “union” types, but MARR can be Typified as a) Religious, b) Civil, c) Common Law, d) Partnership, or e) other/phrase.
  • The additions suggest multiple “military based” events. A single MILT event could be Typified as a) induction, b) commission, c) discharge, d) engagement, e) registration, f) deployment, or g) other/phrase. (NOTE: GEDCOM X has a dozen or more of these, and some are subsets of others; there are over 300 separation codes as part of a “discharge”, AWOL and Desertion among them.)
  • The BURI event can be used in several ways: a) Inhumation, b) Burial at Sea, c) Cremation, d) Donation to Research, e) Lost, f) Natural, g) Green, h) Burial Tree or Scaffolding, i) Cave, j) other/phrase.
  • The additions suggest “health based” events. A single HELT event Typified as a) surgery, b) illness, c) hospitalization, or d) other/phrase, with a set of definitions that explain the use cases, could reduce the breadth of new events.
  • The additions suggest “travel related” events. I would like to rethink this entire concept. I’m not a fan of having separate IMMI and EMIG events (I’ve had need to document the “travel” between these events); rather, I would like to see a Typification of movement events that encompasses more variety in scope. I’m not sure whether an event name such as Travel (TRVL), Movement (MOVE), or Transit (TRANS) could encompass the concept. Typification could be a) Immigration, b) Emigration, c) Vacation, d) Visitation, or e) other/phrase.
  • DNA. I’m not an expert in this area, but I would look to find a way to Typify the various options. One way could be:
1 DNA H
2 TYPE mtDNA
1 DNA R1
2 TYPE Y-DNA
  • National ID Number (IDNO) and Social Security Number (SSN) are both “types” from the same concept and should be combined using Typification!
  • Physical Description (DSCR) would be a great Attribute to use Typification to identify all values of an individual’s physical description, TYPE enumeration could include {height, weight, eye color, tattoos, skin color, hair color, scars, lost limbs, other/phrase}. Doing this would eliminate the need for adding a dozen attributes used by the military and other recording entities, as well as the cumbersome descriptive list suggested by v5.5.1 GEDCOM. For example:
1 DSCR Brown
2 TYPE Hair
  • National or Tribal Origin (NATI) would benefit from Typification. The definition includes: national origin or other folk, house, kindred, lineage, or tribal interest. Use example:
1 NATI Norwegian
2 TYPE National Origin
1 NATI <Tribal Name>
2 TYPE Tribe
1 NATI <Farm Name>
2 TYPE Farm 
1 NATI Atreides
2 TYPE House
1 NATI Mackenzie
2 TYPE Clan
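
The grouping benefit mentioned above can be sketched in Python. This is a rough illustration only: it assumes each fact is a level-1 line whose TYPE sits directly beneath it at level 2, and the function names `typed_facts` and `group_by_tag` are made up for this example.

```python
from collections import defaultdict

def typed_facts(gedcom_text):
    """Collect level-1 structures, pairing each with its level-2 TYPE if present."""
    facts = []
    current = None
    for line in gedcom_text.splitlines():
        if not line.strip():
            continue
        level_str, _, rest = line.strip().partition(" ")
        tag, _, value = rest.partition(" ")
        level = int(level_str)
        if level == 1:
            current = {"tag": tag, "value": value, "type": None}
            facts.append(current)
        elif level == 2 and tag == "TYPE" and current is not None:
            current["type"] = value
        elif level == 0:
            current = None  # a new record starts; do not attach TYPE to an old fact
    return facts

def group_by_tag(facts):
    """Group (type, value) pairs under their shared tag, so one query covers all types."""
    groups = defaultdict(list)
    for fact in facts:
        groups[fact["tag"]].append((fact["type"], fact["value"]))
    return dict(groups)

# Lines taken from the examples in this comment.
SAMPLE = """\
1 NATI Norwegian
2 TYPE National Origin
1 NATI Mackenzie
2 TYPE Clan
1 DSCR Brown
2 TYPE Hair
"""

groups = group_by_tag(typed_facts(SAMPLE))
print(groups["NATI"])
```

A query for all NATI facts returns both the national origin and the clan, while a consumer that only cares about clans can filter on the TYPE value.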

Internationalization, cultural sensitivity, subject expertise

I think that an important goal for this discussion is the inclusion of more data-points in the area of “Internationalization, cultural sensitivity, and subject expertise”.
I’ve already indicated that I’m not an expert in DNA, so my concepts above may not be 100% correct. Genealogy is a science that requires some level of expertise from various cultural and historical perspectives. My interpretation of some of the constructs of past GEDCOM Standards is that cultural concepts in recording Genealogical data were outside the scope and expertise of the original designers. This is not to condemn them for any wrongdoing. I’ve been doing genealogical research for over 40 years in various European locations, but I can’t say that I know or understand all of the concepts there, let alone those of Asian, Middle Eastern, South American or Native cultures.
While researching information for others I’ve come across historical uses for terms that are not covered by the current GEDCOM and may be out of the scope of understanding by the “steering committee”. One example is in Switzerland, the so-called "Bürgerort" (Place of origin). We know through historical context that many cultures (Asian, Middle Eastern) also used “place of origin” to record census information and collect taxes. Individuals/Families were required to return to their ancestral home to be counted. These cultures will use different names for these, and Typification can be used to support/provide a single GEDCOM “Fact” to record the location, but also provide value to support each culture’s term. For example:

1 ORGN <location name>
2 TYPE Bürgerort

Individuals with more expertise in other cultures should be found to investigate and bring together concepts that are found in multiple cultures but are called different things. Enumeration of TYPE values will also help in the translation effort that many applications could use to expand their customer base. The application I use does a lot of tag translations to support multiple languages from around the world and enumerated TYPE would be a step forward.
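
As a rough Python sketch of the translation point: if TYPE values were enumerated, an application could keep one label table per value. The enumerated names (`PLACE_OF_ORIGIN`, `NATIONAL_ORIGIN`), the table, and the `label_for` helper below are all hypothetical and are not part of any GEDCOM version.

```python
# Hypothetical enumerated TYPE values with display labels per locale.
TYPE_LABELS = {
    "PLACE_OF_ORIGIN": {"en": "Place of origin", "de-CH": "Bürgerort"},
    "NATIONAL_ORIGIN": {"en": "National origin"},
}

def label_for(type_value, locale):
    """Return the label for an enumerated TYPE, falling back to English, then to the raw value."""
    labels = TYPE_LABELS.get(type_value)
    if labels is None:
        return type_value  # unknown value: show it untranslated
    return labels.get(locale, labels["en"])

print(label_for("PLACE_OF_ORIGIN", "de-CH"))
print(label_for("PLACE_OF_ORIGIN", "fr"))
```

With free-form TYPE text there is no stable key to hang such a table on, which is why enumeration helps translation.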

Originally posted by @Norwegian-Sardines in #301 (comment)

@chronoplexsoftware

chronoplexsoftware commented May 4, 2023

A combination of better typification and new event / attribute tags would be a more appropriate move in our view. It would allow maximal reuse of existing code (making adoption easier), whilst providing an opportunity to address concerns with localization.

Typification
The problems with the current TYPE tag mainly relate to it being freeform text. It causes significant issues with localization (that is, both different languages and different ways of describing the same event / attribute in the same language). Moreover, it makes it virtually impossible to interpret generic EVEN and FACT, as there is no deterministic way for an application to interpret the TYPE. We can see no other way to address these concerns but to enumerate the TYPE.
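
To illustrate the difference, here is a minimal Python sketch, assuming a hypothetical enumeration for MARR based on the values suggested earlier in this thread. The `MarriageType` enum and `interpret_type` function are illustrative only and are not defined by any GEDCOM version.

```python
from enum import Enum

class MarriageType(Enum):
    # Hypothetical enumerated TYPE values for MARR.
    RELIGIOUS = "RELIGIOUS"
    CIVIL = "CIVIL"
    COMMON_LAW = "COMMON_LAW"
    PARTNERSHIP = "PARTNERSHIP"
    OTHER = "OTHER"

def interpret_type(payload):
    """Return (enum member, phrase); unrecognized text is kept as a phrase under OTHER."""
    try:
        return MarriageType(payload), None
    except ValueError:
        return MarriageType.OTHER, payload

print(interpret_type("CIVIL"))
print(interpret_type("standesamtlich"))
```

An enumerated value maps to exactly one meaning, while a freeform value such as the German "standesamtlich" can only be carried along as an uninterpreted phrase.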

New event / attribute tags
We see very limited benefits in over-abstracting the events / attributes into the smallest possible number of event / attribute tags (see #290 (comment)). Whilst we understand the desire to keep the specification concise, such a step would require significant rewrites of all event and attribute code in applications and would hinder adoption.

Yet, we also agree it would be cumbersome to add tags for every possible suggested event or attribute where TYPE could be used. Surely the extensive list of known extension event / attribute tags could be intelligently abstracted into a smaller subset of new or existing event / attribute tags with an enumerated TYPE e.g. MILT for military related data, DSCR for physical description related data.
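
One way such an abstraction could look in practice, as a Python sketch only: a lookup table folds known extension tags into a shared tag plus TYPE. The extension tags (`_ENLIST`, `_DISCHARGE`, `_HAIR`, `_EYES`) and the TYPE strings are hypothetical, and MILT is the tag proposed in this thread rather than an existing standard tag.

```python
# Hypothetical extension tags mapped to (shared tag, enumerated TYPE value).
EXTENSION_MAP = {
    "_ENLIST": ("MILT", "induction"),
    "_DISCHARGE": ("MILT", "discharge"),
    "_HAIR": ("DSCR", "hair color"),
    "_EYES": ("DSCR", "eye color"),
}

def abstract(tag, value):
    """Rewrite one level-1 extension structure as a shared tag plus a TYPE line."""
    if tag not in EXTENSION_MAP:
        return [f"1 {tag} {value}".rstrip()]  # unknown tags pass through unchanged
    new_tag, type_value = EXTENSION_MAP[tag]
    return [f"1 {new_tag} {value}".rstrip(), f"2 TYPE {type_value}"]

for line in abstract("_HAIR", "Brown"):
    print(line)
```

Existing event and attribute code keeps working on the shared tag, and only the TYPE handling needs to learn the enumerated values.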

@cdhorn
Contributor

cdhorn commented Apr 30, 2024

@tychonievich the CompGen group has compiled an extensive summary of GEDCOM custom tags that also identifies the applications that make use of them. It is certainly worth reviewing.
