
How to do TagUI visual automation using OCR #152

Closed
kensoh opened this issue May 6, 2018 · 5 comments
kensoh commented May 6, 2018

Raising this issue on behalf of a user email so that other users can benefit -


I am planning to make a TagUI script to recognize a receipt and was wondering if there are tutorials on using visual automation OCR?

kensoh added the query label May 6, 2018

kensoh commented May 6, 2018

This part of the homepage covers the setup and use of visual automation - https://github.com/kelaberetiv/TagUI#visual-automation

In particular, you would probably use something like the below -

dclick image_of_pdf_icon.png
wait 5 seconds
read page.png to pdf_contents
echo pdf_contents

Breakdown of what the lines above do -

  1. Double-click the desktop icon of the PDF file to be opened. Note that the dclick step is still in the cutting-edge version. If you installed TagUI from the packaged release, you can download the cutting-edge version to overwrite your local installation.

  2. Wait a few seconds to make sure the PDF file has been opened in the PDF viewer on your laptop.

  3. The read step captures info from elements into a variable. If used with .png or .bmp as the element, it will look for an element on the screen instead of on a webpage. Using page.png tells TagUI to scan the whole screen for text and save the captured OCR text to the variable pdf_contents.

  4. Do your post-processing to extract whatever info you want from the receipt, using standard JavaScript string manipulation. After that you can echo or do other steps with that information. My favorite reference for JS is https://www.w3schools.com/js/default.asp (I usually search for something on Google and it almost always lands on w3schools). A rough sketch of this is shown below.
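
For illustration, a minimal sketch of the whole flow with step 4 added might look like this - the word "Total" and the regex are assumptions about your receipt's format, not something TagUI provides, and I'm relying on the fact that lines which are not TagUI steps are run as JavaScript -

// double-click the PDF icon, wait for the viewer, then OCR the whole screen
dclick image_of_pdf_icon.png
wait 5 seconds
read page.png to pdf_contents
// hypothetical post-processing - pull the first number that follows the word Total
total_amount = (pdf_contents.match(/total[^0-9]*([0-9.]+)/i) || ['', 'not found'])[1]
echo total_amount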

Note that the OCR is based on Tesseract, which is the standard open-source library for OCR. I think even the best state of the art is still poor for handwritten text, but for printed fonts it should do fairly ok. Try the above out and let me know if you run into issues!

kensoh closed this as completed May 6, 2018

adegard commented May 6, 2018

Hi Mr Ken Soh,
Thanks for your great integration of Sikuli with TagUI.
I'm also an AutoHotKey user and frequently use it to automate routines where CSS doesn't give a way to reach buttons etc... but you note in the https://github.com/kelaberetiv/TagUI#visual-automation section that "A screen (real or Xvfb) is needed for visual automation."
I'm curious, and if it's not too much to ask, could you explain a little bit more how to launch a "virtual screen" through TagUI using Xvfb?

Thanks again for your great product!


kensoh commented May 6, 2018

Hi @adegard Xvfb is X Virtual Frame Buffer. It is a virtual display buffer for use on Linux machines that have no display screen / monitor, for example a VM (virtual machine) without a display. To set it up, you can look at the link above and search for setup instructions for your version of Linux. I have not tried setting up Xvfb before, but I have heard from people who used it for other purposes that it can be a pain to configure correctly.
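
With that caveat, on a typical Debian/Ubuntu machine the usual approach looks roughly like the lines below - treat this as an unverified sketch, where the display number :99, the screen size and your_flow_file are arbitrary placeholders -

# install Xvfb (Debian/Ubuntu package name)
sudo apt-get install xvfb
# start a virtual display :99 and point DISPLAY at it before running TagUI
Xvfb :99 -screen 0 1280x1024x24 &
export DISPLAY=:99
tagui your_flow_file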

For the integration with AutoHotKey or RoroScript, there are more details here - #113. I aim to integrate them at a deeper level. Right now, you can write AHK scripts and trigger them with the TagUI run step. I am still thinking over how to make the integration better.
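
For example, something along these lines already works (autohotkey on your PATH and my_script.ahk are placeholder names, not part of TagUI) -

// run passes the command to the OS shell, so an AHK script can be launched from within a flow
run autohotkey my_script.ahk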

Thanks for your encouragement! I'm thankful to AI Singapore, who hired me so that I can continue working on this project as part of my job. Otherwise, frankly, in the 2018 open-source world it is hard to maintain a project like this on your own part-time; if a project becomes successful, the number of users becomes too much to support as a part-time maintainer.


adegard commented May 7, 2018

Thank you @kensoh for your answer. I'm a beginner developer, so a virtual display seems a little complicated to me for now...

About AutoHotKey, I would like to make a little tool for editing TagUI scripts, if possible... Can I share a repository with you to work on it? It is a little menu to remember the main commands in English (activated with Ctrl+left click). It is not complete, but I could share it with you and other users: https://github.com/adegard/tagui_scripts

I read about AI Singapore and other blogs on RPA; it seems that for beginners UiPath is a bit complicated and RPA Express too big a program to install... So in my opinion TagUI is a very good alternative, simple and light. Please continue your project, even if it's so hard to maintain ;-)


kensoh commented May 9, 2018

Re-posted here for more visibility - as it is related to AHK and that issue is still open.
