Add semantic types from Fast Text Analysis repository #38

Open
3 tasks
ivbeg opened this issue Jun 28, 2022 · 0 comments
@ivbeg
Collaborator

ivbeg commented Jun 28, 2022

Original repository by Tim Segall: https://github.com/tsegall/fta

  • Review the FTA semantic types and append the missing ones to the metacrafter registry
  • Add a mapping table from FTA semantic types to registry identifiers
  • Add FTA (Fast Text Analysis) to the list of tools
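
A minimal sketch of what the mapping table from the second task could look like, assuming it is kept as a plain Python dictionary. The keys are FTA type names taken from the list in this issue; the registry identifiers on the right, and the helper names `to_registry_id` and `unmapped`, are hypothetical placeholders and have not been checked against the actual metacrafter registry.

```python
from typing import Dict, List, Optional

# Hypothetical mapping: keys are FTA semantic type names from the list in
# this issue; values are placeholder registry identifiers, NOT verified
# against the real metacrafter registry.
FTA_TO_REGISTRY: Dict[str, str] = {
    "EMAIL": "email",
    "IPADDRESS.IPV4": "ipv4",
    "IPADDRESS.IPV6": "ipv6",
    "URI.URL": "url",
}


def to_registry_id(fta_type: str) -> Optional[str]:
    """Return the registry identifier for an FTA type, or None if unmapped."""
    return FTA_TO_REGISTRY.get(fta_type)


def unmapped(fta_types: List[str]) -> List[str]:
    """Return the FTA types that have no registry identifier yet, in input order."""
    return [t for t in fta_types if t not in FTA_TO_REGISTRY]
```

Running `unmapped` over the full list of FTA type names would give the candidates to review for the first task.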

List 1. List of semantic types provided by Tim Segall

Semantic Type	Description	Locale
AIRPORT_CODE.IATA	IATA Airport Code	*
CHECKDIGIT.ABA	ABA Number (or Routing Transit Number (RTN))	*
CHECKDIGIT.CUSIP	North American Security Identifier	*
CHECKDIGIT.EAN13	EAN-13 Check digit (also UPC and ISBN-13)	*
CHECKDIGIT.IBAN	International Bank Account Number	*
CHECKDIGIT.ISBN	ISBN-13 identifiers (with hyphens)	*
CHECKDIGIT.ISIN	International Securities Identification Number	*
CHECKDIGIT.LUHN	Digit String that has a valid Luhn Check digit (and length between 8 and 30 inclusive)	*
CHECKDIGIT.SEDOL	UK/Ireland Security Identifier	*
CHECKDIGIT.UPC	Universal Product Code	*
CITY	City/Town	en
COLOR.HEX	Hex Color code	*
COMPANY_NAME	Company Name	en
CONTINENT.CODE_EN	Continent Code	en
CONTINENT.TEXT_EN	Continent Name	en
COORDINATE.LATITUDE_DECIMAL	Latitude (Decimal degrees)	*
COORDINATE.LONGITUDE_DECIMAL	Longitude (Decimal degrees)	*
COORDINATE.LATITUDE_DMS	Latitude (degrees/minutes/seconds)	*
COORDINATE.LONGITUDE_DMS	Longitude (degrees/minutes/seconds)	*
COORDINATE.EASTING	Coordinate - Easting	*
COORDINATE.NORTHING	Coordinate - Northing	*
COORDINATE_PAIR.DECIMAL	Coordinate Pair (Decimal degrees)	*
COUNTRY.ISO-3166-2	Country as defined by ISO 3166 - Alpha 2	*
COUNTRY.ISO-3166-3	Country as defined by ISO 3166 - Alpha 3	*
COUNTRY.TEXT_	Country as a string	de, en
CREDIT_CARD_TYPE	Type of Credit Card - e.g. AMEX, VISA, ...	*
CURRENCY_CODE.ISO-4217	Currency as defined by ISO 4217	*
CURRENCY.TEXT_EN	Currency Name	en
DAY.DIGITS	Day represented as a number (1-31)	*
DAY.ABBR_<LOCALE>	Day of Week Abbreviation (<LOCALE> = Locale, e.g. en-US for English language in US)	Current Locale
DAY.FULL_<LOCALE>	Full Day of Week name (<LOCALE> = Locale, e.g. en-US for English language in US)	Current Locale
EMAIL	Email Address	*
EPOCH.MILLISECONDS	Unix Epoch (Timestamp) - milliseconds	*
EPOCH.NANOSECONDS	Unix Epoch (Timestamp) - nanoseconds	*
FREE_TEXT	Free Text field - e.g. Description, Notes, Comments, ...	de, en, fr
GENDER.TEXT_	Gender	bg, ca, de, en, es, fi, fr, hr, it, ja, ms, nl, pl, pt, ro, sv, tr, zh
GUID	Globally Unique Identifier, e.g. 30DD879E-FE2F-11DB-8314-9800310C9A67	*
HASH.SHA1_HEX	SHA1 Hash - hexadecimal	*
HASH.SHA256_HEX	SHA256 Hash - hexadecimal	*
HONORIFIC_EN	Title (English language)	en
IDENTITY.AADHAR_IN	Aadhar	en-IN, hi-IN
IDENTITY.DUNS	Data Universal Numbering System (Dun & Bradstreet)	*
IDENTITY.EIN_US	Employer Identification Number	en-US
IDENTITY.NHS_UK	NHS Number	en-UK
IDENTITY.SSN_FR	Social Security Number (France)	fr-FR
IDENTITY.SSN_CH	AVH Number / SSN (Switzerland)	de-CH, fr-CH, it-CH
IDENTITY.INDIVIDUAL_NUMBER_JA	Individual Number / My Number (Japan)	ja
INDUSTRY_EN	Industry Name	en
IPADDRESS.IPV4	IP V4 Address	*
IPADDRESS.IPV6	IP V6 Address	*
JOB_TITLE_EN	Job Title	en
LANGUAGE.ISO-639-2	Language code - ISO 639, two character	*
LANGUAGE.TEXT_EN	Language name, e.g. English, French, ...	en
MACADDRESS	MAC Address	*
MONTH.ABBR_<LOCALE>	Month Abbreviation (<LOCALE> = Locale, for example, en-US for English language in US)	Current Locale
MONTH.DIGITS	Month represented as a number (1-12)	*
MONTH.FULL_<LOCALE>	Full Month name (<LOCALE> = Locale, for example, en-US for English language in US)	Current Locale
NAME.FIRST	First Name	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NAME.FIRST_LAST	Merged Name (First Last)	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NAME.LAST	Last Name	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NAME.LAST_FIRST	Merged Name (Last, First)	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NAME.MIDDLE	Middle Name	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NAME.MIDDLE_INITIAL	Middle Initial	br, de, do, en, es, fr, gt, mx, nl, pr, pt
NATIONALITY_EN	Nationality	en
PERSON.AGE	Age (Person)	en, es, fr, es, it, pt
POSTAL_CODE.POSTAL_CODE_	Postal Code	AU, BG, CA, FR, JA, NL, UK, ES, MX, PT, SE, UY
POSTAL_CODE.ZIP5_US	Postal Code	en-CA, en-US
POSTAL_CODE.ZIP5_PLUS4_US	Postal Code + 4	en-CA, en-US
SSN	Social Security Number (US)	en-US
STATE_PROVINCE.COMMUNE_IT	Italian Commune	it-IT
STATE_PROVINCE.COUNTY_	County	en-UK, en-US, hu-HU
STATE_PROVINCE.DISTRICT_NAME_PT	Portuguese District Name	pt-PT
STATE_PROVINCE.MUNICIPALITY_BR	Brazilian Municipality	pt-BR
STATE_PROVINCE.STATE_	State Code	en-AU, pt-BR, es-MX, en-US
STATE_PROVINCE.STATE_NAME_	State Name	en-AU, pt-BR, de-DE, es-MX, en-US
STATE_PROVINCE.STATE_PROVINCE_NA	US State Code/Canadian Province Code/Mexican State Code	en-CA, en-US, es-MX
STATE_PROVINCE.PROVINCE_CA	Canadian Province Code	en-CA, en-US
STATE_PROVINCE.PROVINCE_IT	Italian Province Code	it-IT
STATE_PROVINCE.PROVINCE_ZA	South African Province Code	en-ZA
STATE_PROVINCE.PROVINCE_NAME_CA	Canadian Province Name	en-CA, en-US
STATE_PROVINCE.PROVINCE_NAME_IT	Italian Province Name	it-IT
STATE_PROVINCE.PROVINCE_NAME_ES	Spanish Province Name	es-ES
STATE_PROVINCE.PROVINCE_NAME_NL	Dutch Province Name	nl-NL
STATE_PROVINCE.PROVINCE_NAME_ZA	South African Province Name	en-ZA
STATE_PROVINCE.STATE_PROVINCE_NAME_NA	US State Name/Canadian Province Name	en-CA, en-US, es-MX
STATE_PROVINCE.DEPARTMENT_FR	French Department Name	fr-FR
STATE_PROVINCE.REGION_FR	French Region Name	fr-FR
STATE_PROVINCE.CANTON_CH	Swiss Canton Code	de-CH, fr-CH, it-CH
STATE_PROVINCE.CANTON_NAME_CH	Swiss Canton Name	de-CH, fr-CH, it-CH
STATE_PROVINCE.PREFECTURE_NAME_JP	Japanese Prefecture Name	ja
STREET_ADDRESS_EN	Street Address (English Language)	en
STREET_ADDRESS2_EN	Street Address - Line 2 (English Language)	en
STREET_MARKER_EN	Street Suffix (English Language)	en
TELEPHONE	Telephone Number (Generic)	*
TIMEZONE.IANA	IANA Time Zone (Olson)	*
URI.URL	URL - see RFC 3986	*
VIN	Vehicle Identification Number
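
Several of the types above are defined by a check digit rather than a dictionary or pattern. As an illustration of what a registry rule for CHECKDIGIT.LUHN would have to verify, here is a rough sketch of the standard Luhn check combined with the 8 to 30 length bound from the description. This is not FTA's own implementation, and the function name `is_luhn` is an assumption made for this example.

```python
def is_luhn(value: str) -> bool:
    """Check a digit string against the Luhn algorithm.

    The 8..30 length bound follows the CHECKDIGIT.LUHN description in the
    list above; the function itself is illustrative, not FTA code.
    """
    # Accept ASCII digits only, so that int() below cannot fail.
    if not (value.isascii() and value.isdigit()):
        return False
    if not 8 <= len(value) <= 30:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(value)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

For example, `is_luhn("79927398713")` is true, while changing its last digit makes the check fail.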
@ivbeg ivbeg added the improve semantic types Add/change/remove semantic types (entities) label Jun 28, 2022
@ivbeg ivbeg self-assigned this Jun 28, 2022