The History of ATagParser

I've been working on, and using, different versions of ATagParser for my own internal use since 1998. Over the years, I've leaked a couple of limited versions to the public to gauge whether it would have an audience. When I first started writing ATagParser, I wanted a way to turn off the parts of the process that I didn't need. I remember getting quite a bit of feedback on the Elements property (called Objects at the time) and its ability to tune the component on the fly by shutting off certain element types. The Elements property seemed a good way to optimize the parsing process by letting the parser return the information that was important and ignore everything else. One of the side benefits of this approach is that the entire process becomes very dynamic: you can turn Elements on and off in each of the events, which makes for a very capable tag parsing system.


What makes ATagParser different?

There are several things - in no particular order.

It handles all tag formats - not just HTML.

ATagParser makes exactly one pass over the Contents property and, from that single pass, determines everything it needs to know in order to return the information quickly and accurately. Most other publicly available HTML parsers use large doses of the Delphi string functions Pos and Copy. There is not a single call to either Pos or Copy in ATagParser.
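To make the single-pass idea concrete, here is a minimal sketch in Python rather than Delphi. It is not ATagParser's source; the function name scan and the two-state design are assumptions made for illustration. It walks the text once, never searches for a substring and never copies one, and only records where each span starts and ends. Quoted attribute values, comments and script blocks are deliberately ignored to keep it short.

```python
def scan(contents):
    """Yield (kind, start, end) spans in one left-to-right pass.

    Hypothetical illustration only. Each character is examined exactly
    once; no substring search and no substring copy is performed.
    """
    kind = "text"   # type of the span currently being read: "text" or "tag"
    start = 0       # offset where the current span began
    for i, ch in enumerate(contents):
        if kind == "text" and ch == "<":
            if i > start:
                yield ("text", start, i)      # close the text span before the tag
            kind, start = "tag", i
        elif kind == "tag" and ch == ">":
            yield ("tag", start, i + 1)       # the span includes the closing ">"
            kind, start = "text", i + 1
    if start < len(contents):
        # Trailing text, or a tag that was never closed.
        yield (kind, start, len(contents))


spans = list(scan("<b>hi</b>!"))
```

Because only offsets are produced, the caller decides if and when any text is actually extracted.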

Another thing that differentiates ATagParser is that it is event-driven. Most parsers create a parse tree or a document object model (DOM) of some sort that you can use after the parsing process is finished. I opted not to do this because those are slightly higher-level processes and, to be quite honest, a lot of people are confused by their usage. Instead, I opted to use events to pass objects containing all of the information for a specific type of data. By allowing you, the developer, to interact with the various events and the Elements property, it becomes simple to write higher-level processes like trees and tag lists.
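The event-driven approach, combined with switching element types on and off from inside an event, can be sketched roughly as follows. This is Python, not Delphi, and every name in it (TinyEventParser, elements, on_tag, on_text, collect_title) is hypothetical and is not ATagParser's API. The example builds a tag list and only turns text reporting on while inside a title tag.

```python
import re


class TinyEventParser:
    """Hypothetical stand-in for an event-driven tag parser."""

    def __init__(self):
        self.elements = {"tag", "text"}   # element types that get reported
        self.on_tag = None                # callbacks, assigned by the caller
        self.on_text = None

    def parse(self, contents):
        # Simplified tokenizer: a tag is "<...>", everything else is text.
        for m in re.finditer(r"<[^>]*>|[^<]+", contents):
            token = m.group(0)
            kind = "tag" if token.startswith("<") else "text"
            if kind not in self.elements:
                continue                  # this element type is switched off
            handler = self.on_tag if kind == "tag" else self.on_text
            if handler is not None:
                handler(self, token, m.start())


def collect_title(contents):
    parser = TinyEventParser()
    parser.elements = {"tag"}             # start with text reporting off
    tags, title = [], []

    def on_tag(p, token, pos):
        tags.append(token)                # the "tag list" higher-level process
        if token.lower() == "<title>":
            p.elements.add("text")        # turn text on from inside the event
        elif token.lower() == "</title>":
            p.elements.discard("text")    # and off again

    def on_text(p, token, pos):
        title.append(token)

    parser.on_tag = on_tag
    parser.on_text = on_text
    parser.parse(contents)
    return tags, "".join(title)


tags, title = collect_title("<html><title>Hi</title><p>body</p></html>")
```

The text "body" never reaches a handler, because text reporting is off by the time the parser gets to it.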

Along with the Elements property and the event-driven architecture are the various element objects that are passed to the events. Each of these element objects contains all of the information you need to determine what an element is, its type and its location in the document. In addition to this information, certain element objects allow you a glimpse backwards. For example, the TagElement passed via the OnTag event, and the TextElement passed via the OnText event, both contain a PreviousTag property and a PreviousText property. Knowing what came before the current tag, or text, can be very useful in a variety of situations.
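As an illustration of why that glimpse backwards is handy, here is a small Python sketch. The Element class and its previous_tag and previous_text fields are hypothetical names that only mirror the idea of PreviousTag and PreviousText; they are not the actual Delphi classes. The example picks out link text simply by checking which tag came immediately before each piece of text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    kind: str                      # "tag" or "text"
    value: str                     # the raw tag or text
    position: int                  # offset within the document
    previous_tag: Optional[str]    # last tag seen before this element
    previous_text: Optional[str]   # last text seen before this element


def elements(contents):
    """Yield Element objects, each carrying what came before it."""
    prev_tag = prev_text = None
    for m in re.finditer(r"<[^>]*>|[^<]+", contents):
        value = m.group(0)
        kind = "tag" if value.startswith("<") else "text"
        yield Element(kind, value, m.start(), prev_tag, prev_text)
        # Update the "previous" values only after the element was handed out.
        if kind == "tag":
            prev_tag = value
        else:
            prev_text = value


def link_texts(contents):
    """Return every text element that directly follows an opening <a ...> tag."""
    return [e.value for e in elements(contents)
            if e.kind == "text"
            and e.previous_tag is not None
            and e.previous_tag.lower().startswith("<a ")]


found = link_texts('<p>See <a href="x">docs</a> now</p>')
```

No tree or stack is needed for this kind of question; the element itself already knows its immediate predecessor.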


Are there a lot of features in ATagParser?

Yes. And I've needed every feature in ATagParser at least once, which is why they're still in there.


Will ATagParser eventually support Unicode?

Versions 2.03 and later support UTF-8, UTF-16LE and UTF-16BE, but not specific encodings.
See ATagParser's Unicode Support for more information.


Why the name - ATagParser?

Why not? I realize that it doesn't fit with the Delphi component naming scheme, but it does fit what it does. It is A - Tag - Parser. I couldn't call it HTML Parser, or HTML Tag Parser because that's not the only kind of tag that it's capable of parsing.


It's as easy to use, flexible, durable and accurate as anything on the market.


All of the demo programs do something useful and show the various ways of interacting with ATagParser's objects and events. Tag Explorer is the primary demo and can be used to shred ASP, HTML, XHTML and other tag-based documents. I use it exclusively when making changes to the ATagParser source code to make sure the results of any changes are correct.

Well, there it is. I hope you enjoy using ATagParser and that it does everything you need it to do. Thanks!


John E. McTaggart





Copyright 2000 - 2006 John E McTaggart - All rights reserved worldwide