This release includes significant revisions to the HTML parser to enhance compatibility with HTML5 parsing rules for optional opening and closing tags.
When optional closing tags are omitted, such as </p>, CsQuery's HTML parser will use the HTML5 spec rules to determine when to insert a closing tag. When opening tags for required elements such as tbody are omitted, the parser will generate the missing tags when parsing in document mode. This means you can expect a very high degree of compatibility between the HTML (and selections) generated by CsQuery and the DOM rendered by web browsers when valid HTML is passed.
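As a rough illustration of the optional-tag handling described above, here is a minimal sketch. It assumes the document-mode entry point is `CQ.CreateDocument` (the method names are not spelled out in this post, so treat that as an assumption), and the markup is invented for the example.

```csharp
using System;
using CsQuery;

class OptionalTagsDemo
{
    static void Main()
    {
        // Illustrative markup: the first <p> is never closed,
        // and the table has no explicit <tbody>.
        string html = "<p>first<p>second</p><table><tr><td>cell</td></tr></table>";

        // Assumption: CQ.CreateDocument is the document-mode parser.
        CQ dom = CQ.CreateDocument(html);

        // Under the HTML5 rules the second <p> implicitly closes the first,
        // so the two paragraphs should be siblings rather than nested.
        Console.WriteLine(dom["p"].Length);
        Console.WriteLine(dom["p > p"].Length);

        // In document mode the missing <tbody> should be generated,
        // so this browser-style selector should find the row.
        Console.WriteLine(dom["table > tbody > tr"].Length);
    }
}
```

If the parser behaves as described, the same selectors should give the same counts in a browser console against the same markup.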
The HTML5 spec also includes a set of rules for handling invalid markup. While the CsQuery parser usually makes pretty good decisions about how to handle bad HTML, and should be able to parse just about anything, it doesn't yet comply with the "bad markup" part of the spec - just the "optional" handling part. Over time, though, I intend to continue improving the parser to comply with other parts of the spec as much as possible.
Because the HTML parser will generate tags now, it needs to understand context. If you're creating a fragment that's just supposed to be a building block, you obviously don't want it adding body tags around your markup.
There are now three static methods for parsing HTML:
Create a content block
This method is meant to be used for complete HTML blocks that are not self-contained documents. Examples of this are a piece of content retrieved from a CMS, or a template. It should be used for anything that is a complete block, but is intended to be embedded in another document. Using this method, missing tags will be handled according to the HTML5 spec EXCEPT for adding the optional body tags. Additionally, any text found at the root of the markup will be wrapped in span tags, making it safe to insert into nodes that cannot have text directly as children.
Create a document.
This method creates a complete HTML document. If the head tags are missing, they will be created. Stranded text nodes (e.g. outside of body) will be moved inside the body. If you're parsing HTML from the web or from a file that's supposed to represent a complete HTML document, use this.
Create a fragment.
This method interprets the content as a true fragment that you can use for any purpose. No new elements will be created. The rules for optional closing tags are still honored -- to do otherwise would just result in the default handling for any broken/unclosed tag being used instead. But no optional tags like tbody will be generated even if they are expected to be found. This method is the default handling for creating HTML from a selector, e.g.
var html = dom["<div></div>"];
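To compare the three modes side by side, the sketch below parses the same markup with each one and prints the result. The method names `CQ.Create`, `CQ.CreateDocument` and `CQ.CreateFragment` are assumed here to correspond to the content block, document and fragment modes respectively; the input string is made up for the example, and the exact output may vary between CsQuery versions.

```csharp
using System;
using CsQuery;

class ParseModesDemo
{
    static void Main()
    {
        // Illustrative input: root-level text plus a table with no <tbody>.
        string html = "some text<table><tr><td>cell</td></tr></table>";

        // Assumed mapping of method names to the three modes described above.
        CQ block = CQ.Create(html);            // content block
        CQ document = CQ.CreateDocument(html); // complete document
        CQ fragment = CQ.CreateFragment(html); // true fragment

        // Render() returns the markup for the parsed DOM, so the
        // differences in generated tags are visible in the output.
        Console.WriteLine("Content block: " + block.Render());
        Console.WriteLine("Document:      " + document.Render());
        Console.WriteLine("Fragment:      " + fragment.Render());
    }
}
```

Going by the descriptions above, only the document output should contain head and body elements, and only the fragment output should leave the table without a tbody.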
- The jQuery :input pseudoclass was added. It had been inadvertently omitted from prior versions.
- All selectors can include escaped characters now
- HTML parser permits all valid characters in class and attribute names. Previously, the : and . characters were stop characters.
- CQ object's property indexer overloads now align with the
- Migrated all of the tests from Sizzle. (A few of the bugs fixed in this release were found as a result of implementing the Sizzle test suite).
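A short sketch of how the selector changes in the list above might be exercised. The markup and names (`ns:item`, `user.name`) are invented for the example, and `CQ.CreateFragment` is assumed to be the fragment-mode method.

```csharp
using System;
using CsQuery;

class EscapedSelectorDemo
{
    static void Main()
    {
        // Illustrative markup: a class name containing ':' and an id containing '.'.
        CQ dom = CQ.CreateFragment(
            "<div class=\"ns:item\"></div><input id=\"user.name\" type=\"text\">");

        // Inside a selector, ':' and '.' have to be backslash-escaped; unescaped,
        // they would start a pseudoclass or a class name.
        // A verbatim string (@"...") keeps the backslash intact in C#.
        Console.WriteLine(dom[@".ns\:item"].Length);
        Console.WriteLine(dom[@"#user\.name"].Length);

        // The jQuery :input pseudoclass matches form controls such as <input>.
        Console.WriteLine(dom[":input"].Length);
    }
}
```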
- Issue #12: CSS class names being output in lowercase
- Issue #11: :hidden selector not selecting
- Issue #8: allow leading + and - signs in nth-child type equations
- Corrected a problem with some last-child selectors (found during Sizzle unit test migration, no bug report)
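For context on Issue #8, nth-child style selectors take an equation of the form an+b, and either term may carry a leading sign (for example `+3n-2` or `-n+6`). The following is a self-contained sketch of parsing and evaluating such equations. It is not CsQuery's actual implementation; `NthEquation` and its members are hypothetical names, and it is a little more lenient about whitespace than the CSS grammar.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class NthEquation
{
    // Either "<a>n", "<a>n<sign><b>" (groups 1-3), or a plain signed integer (group 4).
    static readonly Regex Pattern = new Regex(
        @"^([+-]?[0-9]*)n(?:([+-])([0-9]+))?$|^([+-]?[0-9]+)$");

    // Parses text such as "2n+1", "+3n-2", "-n+6", "+5", "odd" or "even".
    public static void Parse(string text, out int a, out int b)
    {
        string s = Regex.Replace(text, @"\s+", "").ToLowerInvariant();
        if (s == "odd") { a = 2; b = 1; return; }
        if (s == "even") { a = 2; b = 0; return; }

        Match m = Pattern.Match(s);
        if (!m.Success)
            throw new FormatException("Not an an+b expression: " + text);

        if (m.Groups[4].Success)
        {
            // Plain integer, possibly with a leading sign.
            a = 0;
            b = int.Parse(m.Groups[4].Value, CultureInfo.InvariantCulture);
            return;
        }

        // The coefficient may be "", "+", "-", or a signed number.
        string coeff = m.Groups[1].Value;
        if (coeff == "" || coeff == "+") a = 1;
        else if (coeff == "-") a = -1;
        else a = int.Parse(coeff, CultureInfo.InvariantCulture);

        b = m.Groups[3].Success
            ? int.Parse(m.Groups[2].Value + m.Groups[3].Value, CultureInfo.InvariantCulture)
            : 0;
    }

    // True when the 1-based index equals a*n + b for some integer n >= 0.
    public static bool Matches(int a, int b, int index)
    {
        if (a == 0) return index == b;
        int diff = index - b;
        return diff % a == 0 && diff / a >= 0;
    }

    static void Main()
    {
        int a, b;
        Parse("-n+6", out a, out b);
        Console.WriteLine("a=" + a + ", b=" + b);
        for (int i = 1; i <= 8; i++)
            Console.WriteLine(i + ": " + Matches(a, b, i));
    }
}
```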
This release also includes some performance optimizations; nth-child type selectors in particular should be an order of magnitude faster as a result of caching the results of each calculation.
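The post does not say exactly what is cached, so the following is only a generic sketch of the idea: remember the result of each calculation so that repeated checks skip the work. `NthChildCache` is a hypothetical name, and the arithmetic in `Calculate` merely stands in for whatever the engine actually computes (in a real engine the expensive step is more likely locating an element's position among its siblings).

```csharp
using System;
using System.Collections.Generic;

class NthChildCache
{
    private readonly Dictionary<string, bool> cache = new Dictionary<string, bool>();

    // Counts how many times the underlying calculation actually ran.
    public int Calculations { get; private set; }

    public bool IsMatch(int a, int b, int index)
    {
        string key = a + "," + b + "," + index;
        bool result;
        if (!cache.TryGetValue(key, out result))
        {
            // Cache miss: compute once and remember the answer.
            result = Calculate(a, b, index);
            cache[key] = result;
        }
        return result;
    }

    // Stand-in for the real work: does index equal a*n + b for some n >= 0?
    private bool Calculate(int a, int b, int index)
    {
        Calculations++;
        if (a == 0) return index == b;
        int diff = index - b;
        return diff % a == 0 && diff / a >= 0;
    }

    static void Main()
    {
        var cache = new NthChildCache();
        cache.IsMatch(2, 1, 3);
        cache.IsMatch(2, 1, 3); // served from the cache
        Console.WriteLine("Calculations performed: " + cache.Calculations);
    }
}
```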
CsQuery is a complete CSS selector engine and jQuery port for .NET 4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.