Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcode search #9

Merged
merged 53 commits into from
Oct 3, 2020
Merged

Shortcode search #9

merged 53 commits into from
Oct 3, 2020

Conversation

MyriaCore
Copy link
Contributor

@MyriaCore MyriaCore commented Jul 21, 2020

Closes #7. Blocked by #8 (since this PR includes commits from #8). Review blocks merge of #8.

While this MR is being reviewed, users can install my fork via Ulauncher's plugin install system, since the shortcode-search branch has been merged into master as of MyriaCore#2.

New Features / Changes

  • EmojiSpider.py now scrapes pages on emojipedia.org to retrieve shortcode information
  • Users can search shortcodes by beginning their search with :. Otherwise, CLDR names will be searched as per usual.
  • When searching for shortcodes, the shortcode that matched is displayed instead of the emoji's name
  • I sprinkled a few comments here and there to make the codebase a bit more readable 😄
  • Fixes skin tone bug by properly sourcing skin tone data

Technical Information

I've added a shortcode table to the emoji.sqlite database:
image

MyriaCore added 30 commits June 27, 2020 12:25
- Data looks different: `icon` changed to `icon_${style}`
- noto-emoji is now noto, since idk if dashes will be happy
- main.py should now get icon data from the correct place
- twemoji seems to work rly well w/ url naming scheme
- noto/blobmoji don't quote work as well
See discussion in commit 8115b76 for more information
Also, DB Updated with new specs thanks to EmojiSpider modifications! :D
- Fixed scraper's incorrect var refs
- Added Fallback Emoji Style Setting to alleviate emoji version differences
- Added System Emoji display setting to further alleviate emoji version differences
Also, gave up on trying to make the extension icon dynamic.
It's now hardcoded to twemoji
- Default Emoji style is now twemoji to support recognizability betwee the extension icon (also now twemoji)
- Default fallback is noto, because that works better with twemoji than apple IMO
If the EmojiSpider can't get the resource, it will not insert the path
of where the image would be into the database. This way, when we try to
figure out whether to use the fallback, we'll know based on whether or not
we get a None from row['icon_{ICON STYLE}']
@MyriaCore
Copy link
Contributor Author

Name-encoding bug

image

As you can see, the blood type emojis don't have any shortcodes associated. However, the shortcodes are on emojipedia, and should be present.

It appears this is happening because the CLDR Shortnames (as listed on unicode.org) use punctuation that isn't correctly url-encoded, so when the spider tries to access them, it accesses a dead link. The full name of 🅰️ is A button (blood type), which turns into a button (blood type), and then a-button-(blood-type) when URL-ized by name_to_shortcodes.

@MyriaCore
Copy link
Contributor Author

I'm not 100% sure on how I want UX to work here, but I had a few ideas. We'll use the 😄 emoji as an example:

A: How to tell if the user is searching normally or searching 'by shortcode'

  1. The user has a preferences setting that tells us this (this is current behavior)
  2. If the first character of the search is :, then we assume the user is searching by shortcode. Otherwise, we'll search by CLDR name.
    EX: The user types em :smile, and gets 😄 back. The user then types em grinning face with, then selects 😄, and then receives the intended emote.

B: What to display on the search listing when searching by shortcode

  1. The CLDR shortname should be displayed (this is current behavior)
  2. The shortname that matched the user's query should be displayed.

Right now, I'm partway through implementing a setup with A.1 + B.1, but I'm honestly liking A.2 + B.2 better. I just don't quite know how I'd implement B.2.

@MyriaCore
Copy link
Contributor Author

Implementing B.2 - Ideas

  1. 🏗️ Add a new table meant specifically for tracking shortcodes. Since skin_tone.name and emoji.name are used as unique identifiers, this new table might do the same. It'd have 2 columns in this case - shortcode.name and shortcode.code. This would allow a single emoji to have multiple entries - it'd just provide entries with different shortcode.codes that share the same shortcode.name.
  2. 🚧 ATM, the emoji shortcodes are stored as space-seperated values in a single entry. So, maybe they could be parsed out by main.py, and we could use whichever shortcode matches the query and just display that.

So, idea 1 would provide a more robust implementation of this feature, but idea 2 would be way easier, and we'd make less heavy architecture changes when doing so.

@MyriaCore
Copy link
Contributor Author

B.2 Implementation 1 - Searching the new DB structure

Here's the current query string that's used by main.py to search the DB when searching with shortcodes:

SELECT em.name, em.code, em.keywords,
       em.icon_apple, em.icon_twemoji, em.icon_noto, em.icon_blobmoji,
       skt.icon_apple AS skt_icon_apple, skt.icon_twemoji AS skt_icon_twemoji,
       skt.icon_noto AS skt_icon_noto, skt.icon_blobmoji AS skt_icon_blobmoji,
       skt.code AS skt_code
FROM emoji AS em
  LEFT JOIN skin_tone AS skt 
    ON skt.name = em.name AND tone = ?
WHERE shortcodes LIKE %?%
LIMIT 8

An individual entry retrieved w/ this query would return a shortcodes dictionary entry, which should be a space-seperated string of shortcodes.

With B.2 Impl 1's proposed changes, the search query would likely have to be something more like this:

SELECT em.name, em.code, em.keywords,
       em.icon_apple, em.icon_twemoji, em.icon_noto, em.icon_blobmoji,
       skt.icon_apple AS skt_icon_apple, skt.icon_twemoji AS skt_icon_twemoji,
       skt.icon_noto AS skt_icon_noto, skt.icon_blobmoji AS skt_icon_blobmoji,
       skt.code AS skt_code, sc.code AS shortcode
FROM emoji AS em
  LEFT JOIN skin_tone AS skt 
    ON skt.name = em.name AND tone = ?
  LEFT JOIN shortcode AS sc 
    ON sc.name = em.name
WHERE shortcode LIKE ?%
LIMIT 8

Not 100% if this actually works, since I had to familiarize myself with how LEFT JOIN works, but I'm pretty sure this should be all good.

@MyriaCore
Copy link
Contributor Author

At this point, I think I'm probably ready to draft A.2 + B.2.1 as a full solution.

@MyriaCore
Copy link
Contributor Author

I'm noticing something super funky in the DB. It basically has no data on skin tone variations. Not super sure why.

This bug seems like it might have been present since
I first updated this with the EmojiSpider, since
the page that it scrapes from doesn't even list
modifier emojis anymore
@gornostal
Copy link
Member

Hi @MyriaCore
I really appreciate your work!

I wasn't able to review your PRs yet. I hope I can find some time for that soon.

@MyriaCore
Copy link
Contributor Author

Thanks @gornostal! Please review #8 first, as this PR contains code from there.

@MyriaCore
Copy link
Contributor Author

At this point, unless I find some more bugs, the implementation is pretty much done. This PR should be ready for review.

@MyriaCore MyriaCore marked this pull request as ready for review July 23, 2020 20:52
@gornostal gornostal merged commit 3445b30 into Ulauncher:master Oct 3, 2020
@gornostal
Copy link
Member

Thanks a lot for these improvements and sorry it took so long!
Great work 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Shortcode Search
2 participants