HTMLParser.Net
 
.Net HTML Parser - Parse your HTML content with speed and ease

What is HTMLParser.Net?

HTMLParser.Net is a .Net library built on the codebase of the popular Java-based HTMLParser available on sourceforge.net. If you are building applications that involve screen scraping of HTML pages or data extraction from web sites, then you definitely want a tool like HTMLParser.Net in your arsenal. Parsing a page is as simple as writing four lines of code. And if you want to be a little more creative with your parsing and querying of results, the API offers more advanced features that are easy to use.
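
To give a feel for how small such a parsing task can be, here is a minimal sketch of pulling the outbound links out of a page. It uses Python's standard `html.parser` module, not HTMLParser.Net, whose own API is not shown on this page. The names `LinkCollector`, `extract_links` and `SAMPLE` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all anchor tags, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup: two real links and one anchor without an href.
SAMPLE = (
    '<p><a href="/a.html">A</a> <a name="x">no href</a> '
    '<A HREF="http://example.com/">B</A></p>'
)

if __name__ == "__main__":
    for link in extract_links(SAMPLE):
        print(link)
```

The same pattern, a handler that reacts to one tag type and reads its attributes, is the kind of tag-based search the feature list describes.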

Community Edition

We offer a community edition of the library as a free download. This edition has all the features of the professional edition except support for MIME types such as PDF, MS Office documents and XML, and multithreaded crawling capabilities. If your needs are limited to the text/html MIME type, this is a great library to keep in your tool chest.

Features

The API's features include:

  • You can use it with any .Net language (C#, VB.Net, J# etc.).
  • Parses almost all HTML tags and lets you search by tag type, by attribute value, or by regular expression over the content. Some tags that the Java-based HTMLParser project did not support are included in this release.
  • A set of extensible filters that let you exclude content you do not want in your analysis.
  • High-level APIs that answer common questions such as: What are the outbound links on the page? What images are on the page? What tables are on the page? Are there any broken links on the page? And much more.
  • A configuration-file-based HTTP protocol engine that fetches the content from the URL you specify. The crawler follows the instructions in the site's robots.txt file and does not fetch a page if the site blocks it.
  • The HTTP protocol engine is fully capable of handling compressed responses sent from any site. It accepts the gzip, x-zip and deflate content encodings.
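
The last two features, honoring robots.txt and decoding compressed responses, can be sketched with Python's standard library. This is an illustration of the general technique under stated assumptions, not HTMLParser.Net's implementation. The helper names `is_allowed` and `decode_body`, the user agent `ExampleBot` and the sample rules are all hypothetical, and the sketch handles only gzip (plus its `x-gzip` alias) and deflate.

```python
import gzip
import zlib
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def decode_body(body, content_encoding):
    """Decompress a response body according to its Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # Most servers send zlib-wrapped deflate data.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # Unknown or missing encoding: return the body unchanged.
    return body


# Made-up robots.txt rules used only for this example.
ROBOTS = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(is_allowed(ROBOTS, "ExampleBot", "http://example.com/index.html"))
    print(is_allowed(ROBOTS, "ExampleBot", "http://example.com/private/page.html"))
    print(decode_body(gzip.compress(b"<html></html>"), "gzip"))
```

A crawler built this way would call `is_allowed` before each request and skip the page when it returns False, then pass the raw response bytes through `decode_body` before parsing.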


Commercial Use Of Parser

The following are links to some of the web scraper APIs and applications that we have built for our clients. This is not a complete list; the examples are here to show some real-life uses of our parser.

Releases

V4.0 (Pro Edition) 6/1/2010
  • Upgraded to .Net 4.0 framework
  • Fixed assembly attributes to comply with .Net 4.0 security changes
  • Bug fixes and optimization changes
V3.5 (Pro Edition) 2/28/2009
  • Added ability to specify parser configuration per instance of object. This made multi-threaded use of the library possible.
  • Optimized code for .Net 3.0 and higher
  • Enabled use of Proxy server settings
V3.2 Release (Pro and Lite Edition) 9/10/2007
  • Added new filters in API
  • Fixed parser bugs that were also fixed in its Java counterpart
V3.1 Release (Pro and Lite Edition) 9/15/2006
  • All bug fixes from the Pro version have been rolled into the Lite version
  • New filters have been added
  • Ability to override configuration settings at run time
V3.0.1 Release (Pro Edition) - 8/22/2006
  • Added new XOR filter as released in V1.6 of the Java library
  • Added new NodeTreeWalker class as released in V1.6 of the Java library
  • Added support to override some settings in configuration file at run time.
V2.1.42 Release (Pro Edition) - 8/14/2006
  • Fixed a bug where the request URL was constructed incorrectly when empty request attributes were supplied
  • Added cookie container support
V2.1.41 Release (Pro Edition) - 7/25/2006
  • Added capability to parse "table" on page and create a DataTable object from it.
V2.1.39 Release (Pro Edition) - 7/1/2006
  • Professional Version released.
  • Added capability to specify request parameters for POST or GET type of requests
  • Full capability to parse PDF, MS Office documents
  • Full capability to handle compressed response from server
  • New APIs added that facilitate development of multi-threaded and scalable document crawler.

Community Edition

V1.8.0 Release - 8/21/2006
  • Added two new filters as requested by many users
V1.7.3.0 Release - 3/6/2006
  • Tag name change for link tag to "ATag".
  • Fixed a bug that occurred when the charset was switched to a value the framework did not understand.
Community Edition - Release
  • Released the community edition of the parser. This is a free download for educational as well as commercial use.
V1.6.13.0 Release - 2/10/2006
  • Fixed a bug where a string could not be used as the source for parsing
V1.6.12.0 Release - 1/16/2006
  • Added capability to ignore robots.txt settings
  • Added capability to delay fetching of pages.
  • Bug fixes
V1.6.8.0 Release - 1/1/2006
  • Added new APIs to analyze a page.
  • Performance enhancements and bug fixes.
V1.6.5.0 Release - 12/30/2005
  • Added handling of the deflate content encoding to the HTTP protocol engine.
  • Minor bug fixes
V1.6.3.0 Release - 12/22/2005
  • First public release of the library

Copyright © Netomatix