Updated on 24 Jul 2011
By Anthony Buckner
4.1 MB Safe To Install
Specifications
License: Open source under LGPL
Downloads: 574
Platform: Windows All
Publisher: Derrick Oswald
Website: Unknown
Publisher's Description
HTML Parser is a robust, fast, and well-tested package used primarily for transformation or extraction. It features filters, custom tags, visitors, and easy-to-use JavaBeans.

HTMLParser is a fast, real-time parser for real-world HTML. What has attracted most developers to HTMLParser is its simple design, its speed, and its ability to handle streaming real-world HTML.

The parser handles two fundamental use cases: extraction and transformation. The synthesis use case, where HTML pages are created from scratch, is better handled by other tools closer to the source of the data.

In general, to use the HTMLParser you will need to be able to write code in the Java programming language. Although some example programs are provided that may be useful as they stand, it's more than likely you will need (or want) to create your own programs or modify the ones provided to match your intended application.

To use the library, you will need to add either htmllexer.jar or htmlparser.jar to your classpath when compiling and running. The htmllexer.jar provides low-level access to generic string, remark, and tag nodes on the page in a linear, flat, sequential manner.

The htmlparser.jar, which includes the classes found in htmllexer.jar, provides access to a page as a sequence of nested differentiated tags containing string, remark and other tag nodes.
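
To make the flat, sequential view described above more concrete, here is a minimal sketch that splits a page into string, remark, and tag nodes in document order. It uses only the Java standard library and does not call HTMLParser. The class FlatNodeSketch, its Kind enum, and the tokenize method are hypothetical names invented for this illustration, and the scanning is much cruder than a real lexer (for example, it does not handle a '>' inside a quoted attribute value).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is NOT HTMLParser's lexer.
class FlatNodeSketch {
    enum Kind { STRING, REMARK, TAG }

    static final class Node {
        final Kind kind;
        final String text;

        Node(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Splits the input into a flat list of nodes, with no nesting.
    static List<Node> tokenize(String html) {
        List<Node> nodes = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            if (html.startsWith("<!--", pos)) {
                // Remark (comment): runs to the closing "-->" or end of input.
                int end = html.indexOf("-->", pos + 4);
                int stop = (end < 0) ? html.length() : end + 3;
                nodes.add(new Node(Kind.REMARK, html.substring(pos, stop)));
                pos = stop;
            } else if (html.charAt(pos) == '<') {
                // Tag: runs to the next '>' or end of input.
                int end = html.indexOf('>', pos);
                int stop = (end < 0) ? html.length() : end + 1;
                nodes.add(new Node(Kind.TAG, html.substring(pos, stop)));
                pos = stop;
            } else {
                // String: everything up to the next '<' or end of input.
                int end = html.indexOf('<', pos);
                int stop = (end < 0) ? html.length() : end;
                nodes.add(new Node(Kind.STRING, html.substring(pos, stop)));
                pos = stop;
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        for (Node n : tokenize("<p>Hi<!-- note --><b>there</b></p>")) {
            System.out.println(n.kind + " " + n.text);
        }
    }
}
```

For the sample input in main, the sketch yields seven nodes in document order. The <b> tag and its text appear as siblings of the <p> tag rather than as its children, which is the difference between the flat view and the nested view.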

Extraction

Extraction encompasses all the information retrieval programs that are not meant to preserve the source page.

This covers uses like:
· text extraction, for example as input for text search engine databases
· link extraction, for crawling through web pages or harvesting email addresses
· screen scraping, for programmatic data input from web pages
· resource extraction, collecting images or sound
· a browser front end, the preliminary stage of page display
· link checking, ensuring links are valid
· site monitoring, checking for page differences beyond simplistic diffs

There are several facilities in the HTMLParser codebase to help with extraction, including filters, visitors and JavaBeans.
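
As a rough illustration of link extraction combined with a filter, the sketch below pulls href values out of anchor tags and keeps only those that pass a caller-supplied test. It is a stand-in under stated assumptions, not HTMLParser's filter, visitor, or bean API. It relies on a regular expression from the Java standard library, which is brittle on messy real-world HTML, and the names LinkExtractionSketch and extractLinks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: this does NOT use HTMLParser's filters.
class LinkExtractionSketch {
    // Matches href="..." or href='...' inside an <a ...> tag.
    // Unquoted attribute values are deliberately ignored in this sketch.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the href values, in document order, that the filter accepts.
    static List<String> extractLinks(String html, Predicate<String> filter) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(2);
            if (filter.test(link)) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/a\">A</a> "
                + "<a class='x' href='mailto:me@example.com'>B</a>";
        // A crawler would skip mail links; an address harvester would keep only them.
        System.out.println(extractLinks(html, l -> !l.startsWith("mailto:")));
    }
}
```

Passing the filter as a parameter keeps the extraction loop the same for different tasks: a crawler and an email harvester differ only in the test they supply.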

Transformation

Transformation includes all processing where the input and the output are HTML pages.

Some examples are:
· URL rewriting, modifying some or all links on a page
· site capture, moving content from the web to local disk
· censorship, removing offending words and phrases from pages
· HTML cleanup, correcting erroneous pages
· ad removal, excising URLs referencing advertising
· conversion to XML, moving existing web pages to XML

During or after reading in a page, operations on the nodes can accomplish many transformation tasks "in place", and the modified page can then be output with the toHtml() method. Depending on the purpose of your application, you will probably want to look into node decorators, visitors, or custom tags in conjunction with the PrototypicalNodeFactory.
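
The idea of changing nodes in place and then serializing the page can be sketched with a tiny stand-in node model. None of the classes below come from HTMLParser: InPlaceTransformSketch, TextNode, TagNode, and rewriteLinks are hypothetical names, the Node interface here is a local stand-in that only borrows the toHtml() name from the text above, and attribute values are written out without escaping. The example performs URL rewriting by turning site-relative links into absolute ones.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in node model; these are NOT HTMLParser classes.
class InPlaceTransformSketch {
    interface Node {
        String toHtml();
    }

    static final class TextNode implements Node {
        final String text;

        TextNode(String text) {
            this.text = text;
        }

        public String toHtml() {
            return text;
        }
    }

    static final class TagNode implements Node {
        final String name;
        // LinkedHashMap keeps attributes in insertion order when serializing.
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        TagNode(String name) {
            this.name = name;
        }

        public String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                sb.append(' ').append(e.getKey())
                  .append("=\"").append(e.getValue()).append('"');
            }
            sb.append('>');
            for (Node child : children) {
                sb.append(child.toHtml());
            }
            return sb.append("</").append(name).append('>').toString();
        }
    }

    // Walks the tree and rewrites site-relative links on <a> tags in place.
    static void rewriteLinks(Node node, String baseUrl) {
        if (!(node instanceof TagNode)) {
            return;
        }
        TagNode tag = (TagNode) node;
        String href = tag.attributes.get("href");
        if (tag.name.equals("a") && href != null && href.startsWith("/")) {
            tag.attributes.put("href", baseUrl + href);
        }
        for (Node child : tag.children) {
            rewriteLinks(child, baseUrl);
        }
    }

    public static void main(String[] args) {
        TagNode p = new TagNode("p");
        TagNode a = new TagNode("a");
        a.attributes.put("href", "/docs");
        a.children.add(new TextNode("Docs"));
        p.children.add(a);

        rewriteLinks(p, "http://example.com");
        System.out.println(p.toHtml());
        // prints <p><a href="http://example.com/docs">Docs</a></p>
    }
}
```

Because the tree is modified rather than copied, everything the transformation does not touch is serialized exactly as it was built.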
