Extracting alt text from HTML? #137

Closed
matatk opened this issue Jan 3, 2018 · 3 comments

Comments

matatk commented Jan 3, 2018

I've just been recommended textract and it looks very cool; thanks. I am trying to extract text from HTML and it works fine with things like

<div id="blah">Some Text</div>

but it doesn't pick up alt text, such as

<img src="..." alt="A text alternative for this image">

I was wondering if you have any plans for such a feature (or, if I've missed it, how to extract the values of attributes like this)?

@dbashford
Owner

You haven't missed it, and no plans, but no reason it can't be introduced as a feature!

@matatk
Author

matatk commented Mar 21, 2018

Hi, I just got the closure notice for this issue and can see you have been working on it; thanks. I have been working on it too, and created a small library to extract alt and various other user-facing text attributes from HTML. I have been testing it for a while, which is why I didn't mention it here, though I probably should have, sorry.

If you'd like to use the code I wrote for extracting not just alt attributes but other user-facing text from HTML, then feel free (I can help with a PR too if you'd like).

@dbashford
Owner

I can create another issue to track the other user-facing attributes and include them in the solution. It ought not to be too hard to include most of them.
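
For readers who land on this thread, here is a minimal sketch of the idea being discussed: collecting visible text together with the values of user-facing attributes such as alt. It is not textract's implementation and not the library mentioned above. It uses only Python's standard `html.parser`, and the names `USER_FACING_ATTRS`, `TextAndAttrExtractor`, and `extract_text`, as well as the choice of attributes, are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: an illustrative attribute list, not the set any real library uses.
USER_FACING_ATTRS = ("alt", "title", "aria-label", "placeholder")


class TextAndAttrExtractor(HTMLParser):
    """Collect text nodes plus the values of selected user-facing attributes."""

    def __init__(self, attrs=USER_FACING_ATTRS):
        super().__init__(convert_charrefs=True)
        self.attrs = set(attrs)
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        # Attribute names arrive lowercased; value is None for bare attributes.
        for name, value in attrs:
            if name in self.attrs and value and value.strip():
                self.chunks.append(value.strip())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only text nodes.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, attrs=USER_FACING_ATTRS):
    """Return text content and chosen attribute values in document order."""
    parser = TextAndAttrExtractor(attrs)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        '<div id="blah">Some Text</div>'
        '<img src="..." alt="A text alternative for this image">'
    )
    print(extract_text(sample))
    # prints: Some Text A text alternative for this image
```

Attribute values are emitted at the position of their element, so the alt text appears where the image sits in the document. Empty alt attributes, which mark decorative images, contribute nothing.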


2 participants