As part of a request to check that an XML file was correctly formatted, I decided to use HTMLTidy from VBScript. The best article I found covered the COM/ATL wrapper for the library. However, that article targeted .NET, and since I was working on a plugin for the RedDot CMS, I wanted to stick with VBScript. I'm also a bit of a novice at referencing COM components from VBScript, so it was a good learning experience overall.

  1. First, download the COM/ATL wrapper from http://users.rcn.com/creitzel/tidy/TidyATL.zip and extract it on your machine (or server).
  2. Then register the TidyATL.dll file using regsvr32. The command I used was:
    regsvr32 /c c:\Tools\TidyATL.dll

Knowing very little about the API of this DLL and finding little about it online, I stumbled across this help page that describes using the OLE/COM Object Viewer: http://msdn.microsoft.com/en-us/library/d0kh9f4c(v=vs.80).aspx

Using the OLE/COM Object Viewer, I was able to locate the TidyDocument class that the command above registers and get a look at the methods it exposes.

The viewer lists the methods available in the library, so from there, with a little trial and error, you can begin to piece together how to use it. I recommend creating a Tidy config file when using the library. Otherwise you have to call the SetOptValue method and look up the integer for each Tidy option, which gets annoying quickly. Here's the config file I used for basic HTML formatting:

// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot, 
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
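
For the XML check mentioned at the start, a variant of this config would presumably switch on Tidy's XML options instead. The following is an untested sketch: it assumes the Tidy build inside TidyATL honors the standard input-xml, output-xml, and add-xml-decl options, and the values are illustrative rather than the ones I actually used.

```
// sketch only: config for checking an XML file (untested with TidyATL)
input-xml: yes
output-xml: yes
add-xml-decl: yes
indent: auto
indent-spaces: 2
wrap: 72
show-warnings: yes
```

With input-xml set, Tidy uses its XML parser rather than the error-correcting HTML parser, so problems in the file should show up in the error file instead of being silently repaired.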

Then, here is the VBScript code to process a file using the library:

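' Create the Tidy COM object registered by TidyATL.dll
Dim TidyDocument, errCode
Set TidyDocument = Server.CreateObject("Tidy.Document")

' Load the options and send diagnostics to a file
TidyDocument.LoadConfig Server.MapPath("tidy.config")
TidyDocument.SetErrorFile Server.MapPath("errors.txt")

' This code treats a negative return value as failure
If TidyDocument.ParseFile(Server.MapPath("file.htm")) < 0 Then
	Response.Write "Error parsing file, check errors file"
Else
	errCode = TidyDocument.CleanAndRepair()
	If errCode < 0 Then
		Response.Write "Error cleaning/repairing file, check errors file"
	Else
		' Write the tidied markup to the response
		Response.Write TidyDocument.SaveString()
	End If
End If
Set TidyDocument = Nothing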
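
If you want to try the library outside of ASP first, the same calls should work from a standalone Windows Script Host script. This is a minimal sketch, not something taken from the TidyATL documentation: it uses only the methods shown above, assumes tidy.config and file.htm sit next to the script, and the file name tidytest.vbs is just a placeholder.

```vbscript
' tidytest.vbs (hypothetical name): run with "cscript //nologo tidytest.vbs"
Option Explicit

Dim fso, baseDir, tidy

' Resolve paths relative to the script, since Server.MapPath is ASP-only
Set fso = CreateObject("Scripting.FileSystemObject")
baseDir = fso.GetParentFolderName(WScript.ScriptFullName)

' Same ProgID as in the ASP example above
Set tidy = CreateObject("Tidy.Document")
tidy.LoadConfig fso.BuildPath(baseDir, "tidy.config")
tidy.SetErrorFile fso.BuildPath(baseDir, "errors.txt")

' As above, a negative return value is treated as failure
If tidy.ParseFile(fso.BuildPath(baseDir, "file.htm")) < 0 Then
    WScript.Echo "Error parsing file, check errors file"
ElseIf tidy.CleanAndRepair() < 0 Then
    WScript.Echo "Error cleaning/repairing file, check errors file"
Else
    WScript.Echo tidy.SaveString()
End If

Set tidy = Nothing
```

On 64-bit Windows, a 32-bit COM DLL is only visible to the 32-bit script host, so if CreateObject fails there, running the script with C:\Windows\SysWOW64\cscript.exe is worth a try.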

I wasn't able to get the OnMessage event wired up to an event handler in VBScript. I tried following the steps listed here: http://msdn.microsoft.com/en-us/library/ms974564 to no avail on my XP development machine. If you are reading this and have gotten the event handler working with TidyATL, please drop me a line in the comments.