z505/fast-html-parser

Fast HTML Parser

HTML parser for FPC and Delphi, originally written by Jazarsoft.

  • Modified for use as a pure command-line unit (no dialogs) for Free Pascal.
  • Also added uppercased tags, so that a check for a tag, e.g. FONT, matches every case variant, such as <FONT>, <FoNt>, and <font>.
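
The uppercasing idea can be illustrated outside Pascal. The following is a minimal sketch in Python, not this unit's code: the helper name `tag_name_upper` is hypothetical, and it only shows how normalizing a tag name to uppercase lets one comparison cover every case variant.

```python
def tag_name_upper(raw_tag: str) -> str:
    """Return the uppercased element name of a raw tag such as '<FoNt size=2>'.

    Hypothetical helper for illustration only; not part of this unit.
    """
    # Drop the angle brackets and any padding inside them.
    inner = raw_tag.strip().lstrip("<").rstrip(">").strip()
    # Drop the leading slash of a closing tag.
    inner = inner.lstrip("/").strip()
    if not inner:
        return ""
    # The name runs up to the first whitespace; attributes follow it.
    # A trailing slash (self-closing form, e.g. '<br/>') is not part of the name.
    return inner.split(None, 1)[0].rstrip("/").upper()


def is_tag(raw_tag: str, name: str) -> bool:
    """Case-insensitive check: does raw_tag open or close the element `name`?"""
    return tag_name_upper(raw_tag) == name.upper()
```

With this normalization, `is_tag("< FoNt >", "font")` and `is_tag("<FONT size=2>", "font")` are both true, so calling code never has to list the case variants itself.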

Versions

Revision 18 is version 1 of this tool.

After revision 18, version 2 is being worked on. It adds more object methods for accessing elements, for example by name or ID, much like a DOM.

Todo

  • Keep the entire HTML file in arrays for later use: htmltags[] and text[].
  • Add a second kind of parser that reports OnSection(opentag, text, closetag), so that the tag, its text, and the closing tag arrive together in one procedure and no globals (InTag booleans, etc.) are needed to track state.
  • Associate a number for each open tag with its text, using a record or similar. For example, in <body><b>some text</b></body>, <b> is tag "2" and "some text" is text "1".
  • Turn the parser into a DLL using FPC or C, so that other languages (e.g. Go, Python) can parse HTML quickly through a callback.
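
To make the OnSection idea from the list above concrete, here is a rough sketch in Python. It is not this unit's API: it uses Python's standard `html.parser.HTMLParser` as a stand-in tokenizer, and the names `SectionParser` and `collect_sections` are hypothetical. The point is only that a small stack inside the parser can hand the caller the open tag, text, and close tag in a single callback, so the caller needs no InTag-style globals.

```python
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Calls on_section(opentag, text, closetag) for each element with text.

    Illustrative only. Void elements such as <br> are never closed, so they
    stay on the stack and suppress the section of their parent.
    """

    def __init__(self, on_section):
        super().__init__()
        self.on_section = on_section
        self.stack = []  # one [tag_name, text_parts] entry per open tag

    def handle_starttag(self, tag, attrs):
        self.stack.append([tag, []])

    def handle_data(self, data):
        # Attach non-blank text to the innermost open tag.
        if self.stack and data.strip():
            self.stack[-1][1].append(data.strip())

    def handle_endtag(self, tag):
        # Only report when the close tag matches the innermost open tag.
        if self.stack and self.stack[-1][0] == tag:
            name, parts = self.stack.pop()
            if parts:
                self.on_section("<%s>" % name, " ".join(parts), "</%s>" % tag)


def collect_sections(html):
    """Return every (opentag, text, closetag) triple found in html."""
    found = []
    parser = SectionParser(lambda o, t, c: found.append((o, t, c)))
    parser.feed(html)
    parser.close()
    return found
```

For the input `<body><b>some text</b></body>`, the callback fires once, for the `<b>` element and its text; `<body>` has no text of its own, so it produces no section. Note that `HTMLParser` lowercases tag names before reporting them.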

Reasons to use this parser:

  • Make your own web browser.
  • Make your own text copies of web pages for caching purposes.
  • Grab content from websites without using regular expressions.
  • It seems to be much faster than regular expressions, since it is, after all, a true parser.
  • Convert website tables into spreadsheets (parse TD and TR, turn them into CSV or similar).
  • Convert websites into text files.
  • Convert website tables into CSV or database rows (parse TD and TR).
  • Find certain information in a web page, e.g. all the bold text or all the hyperlinks.
  • Parse websites remotely from a CGI app, using something like Sockets or Synapse and SynWrap to fetch the HTML first. This would allow you to parse information from other websites dynamically and display the data on your own site in real time.
  • Build an HTML editor, either WYSIWYG or partially WYSIWYG. Ambitious, but possible.
  • Build an HTML property editor: not completely WYSIWYG, but able to edit the properties of tags. Work would need to be done to parse each property in a tag.
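
As an illustration of the table-to-CSV use case listed above, the sketch below walks TR and TD tags and writes one CSV line per row. It is written in Python against the standard `html.parser` and `csv` modules rather than this unit, and the names `TableToRows` and `table_to_csv` are hypothetical. The same event-driven shape (start tag, text, end tag) should carry over to a tag and text callback parser.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collects the text of each TD/TH cell, grouped by TR. Illustrative only."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self.row = None   # cells of the row being read, or None outside TR
        self.cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <TD> and <td> match.
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None  # discard a cell left open by malformed markup


def table_to_csv(html):
    """Return the table cells found in html as CSV text."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The `csv` module handles quoting, so a cell that contains a comma is written in double quotes instead of being split into two columns. Nested tables and cells with unclosed TD tags are not handled by this sketch.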
