Insights and discoveries
from deep in the weeds
Outsharked

Tuesday, June 26, 2012

CsQuery 1.1.2 Released

CsQuery 1.1.2 has been released. You can get it from NuGet or from the source repository on GitHub.

New features

This release includes significant revisions to the HTML parser to enhance compatibility with HTML5 parsing rules for optional opening and closing tags.

When optional closing tags such as </p> are omitted, CsQuery's HTML parser uses the HTML5 spec rules to determine where to insert the closing tag. When opening tags for required elements such as head and tbody are omitted, the parser generates the missing tags when parsing in document mode. This means that, when valid HTML is passed, you can expect a very high degree of compatibility between the HTML (and selections) generated by CsQuery and the DOM rendered by web browsers.
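Here's a minimal sketch of that behavior. It assumes the CsQuery NuGet package and a compiler that supports C# 9 top-level statements, and the markup is just an illustrative example. Going by the rules described above, the list items and paragraphs should come out as siblings rather than nested inside one another, and the missing head and body elements should be generated.

```csharp
// Requires the CsQuery NuGet package; uses C# 9 top-level statements.
using System;
using CsQuery;

// No </li> or </p> closing tags, and no html, head or body tags at all.
string markup = "<ul><li>one<li>two</ul><p>three<p>four";

// Document mode: omitted closing tags are inferred using the HTML5 rules,
// and the missing html, head and body elements are generated.
CQ doc = CQ.CreateDocument(markup);

// Each <li> should be closed by the next <li>, and each <p> by the next <p>,
// so none of them should end up nested inside another.
Console.WriteLine($"li elements:        {doc["li"].Length}");
Console.WriteLine($"nested li elements: {doc["li li"].Length}");
Console.WriteLine($"p elements:         {doc["p"].Length}");
Console.WriteLine($"nested p elements:  {doc["p p"].Length}");
Console.WriteLine($"head elements:      {doc["head"].Length}");
Console.WriteLine($"body elements:      {doc["body"].Length}");

// Full markup, including the generated tags.
Console.WriteLine(doc.Render());
```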

The HTML5 spec also includes a set of rules for handling invalid markup. While the CsQuery parser usually makes pretty good decisions about how to handle bad HTML, and should be able to parse just about anything, it doesn't yet comply with the "bad markup" part of the spec, only the part covering optional tags. Over time, though, I intend to keep improving the parser so that it complies with as much of the rest of the spec as possible.

API Change

Because the HTML parser now generates tags, it needs to understand context. If you're creating a fragment that's just supposed to be a building block, you obviously don't want it adding html and body tags around your markup.

There are now three static methods for parsing HTML:

  • CQ.Create(..)
    Create a content block.

    This method is meant for complete HTML blocks that are not self-contained documents, such as a piece of content retrieved from a CMS, or a template. Use it for anything that is a complete block but is intended to be embedded in another document. With this method, missing tags are handled according to the HTML5 spec EXCEPT that the optional html and body tags are not added. Additionally, any text found at the root of the markup will be wrapped in span tags, making it safe to insert into nodes that cannot have text directly as children.

  • CQ.CreateDocument(..)
    Create a document.

    This method creates a complete HTML document. If the html, body or head tags are missing, they will be created. Stranded text nodes (e.g. outside of body) will be moved inside the body. If you're parsing HTML from the web or from a file that's supposed to represent a complete HTML document, use this.

  • CQ.CreateFragment(..)
    Create a fragment.

    This method interprets the content as a true fragment that you can use for any purpose. No new elements will be created. The rules for optional closing tags are still honored, since ignoring them would just mean that the default handling for broken or unclosed tags gets used instead. However, no optional tags like tbody will be generated, even where they would normally be expected. This is the default method used when creating HTML from a selector, e.g.
    var html = dom["<div></div>"];
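The differences among the three methods are easiest to see side by side. This sketch (again assuming the CsQuery package and C# 9 top-level statements) parses the same table, written without a tbody tag, in each mode and prints how many html, body and tbody elements result, along with the rendered markup.

```csharp
// Requires the CsQuery NuGet package; uses C# 9 top-level statements.
using System;
using CsQuery;

// A table written without the optional tbody tag.
string table = "<table><tr><td>cell</td></tr></table>";

// Content block: missing tags are handled per HTML5, but no html or body wrapper is added.
CQ block = CQ.Create(table);

// Complete document: html, head and body are created if missing,
// and required tags such as tbody are generated.
CQ document = CQ.CreateDocument(table);

// True fragment: closing-tag rules still apply, but no elements are generated.
CQ fragment = CQ.CreateFragment(table);

foreach (var (name, dom) in new[] { ("Create", block), ("CreateDocument", document), ("CreateFragment", fragment) })
{
    Console.WriteLine($"{name}: html={dom["html"].Length}, body={dom["body"].Length}, tbody={dom["tbody"].Length}");
    Console.WriteLine(dom.Render());
}
```

Based on the descriptions above, only CreateDocument should add html and body, and CreateFragment should leave the table without a tbody.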

Other Enhancements

  • The jQuery :input pseudoclass was added. It had been inadvertently omitted from prior versions.
  • All selectors can now include escaped characters.
  • The HTML parser permits all valid characters in class and attribute names. Previously, the : and . characters were treated as stop characters.
  • The CQ object's property indexer overloads now align with the Select method overloads.
  • Migrated all of the tests from Sizzle, jQuery's selector engine. (A few of the bugs fixed in this release were found while implementing the Sizzle test suite.)
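A quick way to exercise several of these enhancements at once is sketched below, assuming the CsQuery package and C# 9 top-level statements; the form markup, IDs and class names are made up for illustration. The doubled backslashes are C# string escaping: the selectors themselves are #user\.name and .field\:required.

```csharp
// Requires the CsQuery NuGet package; uses C# 9 top-level statements.
using System;
using CsQuery;

// Hypothetical form: an ID containing "." and a class name containing ":".
string html =
    "<form>" +
    "<input id=\"user.name\" type=\"text\" class=\"field:required\">" +
    "<textarea></textarea>" +
    "<select><option>a</option></select>" +
    "<button>Go</button>" +
    "</form>";

CQ dom = CQ.CreateFragment(html);

// :input matches input, textarea, select and button elements, as in jQuery.
CQ inputs = dom[":input"];

// The property indexer and Select accept the same selector string.
CQ viaSelect = dom.Select(":input");

// Escaped characters: "." in the ID and ":" in the class name are backslash-escaped.
CQ byId = dom["#user\\.name"];
CQ byClass = dom[".field\\:required"];

Console.WriteLine($":input matches:           {inputs.Length}");
Console.WriteLine($"Select(\":input\") matches: {viaSelect.Length}");
Console.WriteLine($"#user\\.name matches:      {byId.Length}");
Console.WriteLine($".field\\:required matches: {byClass.Length}");
```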

Bug Fixes

  • Issue #12: CSS class names being output in lowercase
  • Issue #11: :hidden selector not selecting input[type=hidden]
  • Issue #8: allow leading + and - signs in nth-child type equations
  • Corrected a problem with some last-child selectors (found during Sizzle unit test migration, no bug report)
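For reference, nth-child-type selectors take an equation of the form an+b, and a position p (counting from 1) matches when p = an+b for some n ≥ 0. The sketch below is a hypothetical, standalone parser and matcher, not CsQuery's actual code, showing how leading signs such as the ones in -n+3 or +2n+1 can be handled. ParseNth, ParseSigned and Matches are names made up for this example, and it uses C# 9 top-level statements.

```csharp
// Standalone sketch (C# 9 top-level statements); not CsQuery's implementation.
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: parses an nth-child expression such as "-n+3",
// "+2n+1", "+5" or "odd" into the coefficients of a*n + b.
static (int A, int B) ParseNth(string expr)
{
    string s = expr.Replace(" ", "").ToLowerInvariant();
    if (s == "odd") return (2, 1);
    if (s == "even") return (2, 0);

    int nIndex = s.IndexOf('n');
    if (nIndex < 0)
    {
        // No "n": just an offset, which may carry a leading sign ("+5", "-2").
        return (0, ParseSigned(s));
    }

    // Coefficient before "n": "", "+" and "-" mean 1, 1 and -1.
    string aPart = s.Substring(0, nIndex);
    int a = aPart switch
    {
        "" or "+" => 1,
        "-" => -1,
        _ => ParseSigned(aPart),
    };

    // Offset after "n": must be empty or start with an explicit sign.
    string bPart = s.Substring(nIndex + 1);
    if (bPart.Length > 0 && bPart[0] != '+' && bPart[0] != '-')
        throw new FormatException($"Invalid nth-child expression: {expr}");
    int b = bPart.Length == 0 ? 0 : ParseSigned(bPart);

    return (a, b);
}

// Accepts an optional leading "+" or "-".
static int ParseSigned(string text) =>
    int.Parse(text, NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture);

// A 1-based position matches when position == a*n + b for some integer n >= 0.
static bool Matches(int a, int b, int position)
{
    int diff = position - b;
    if (a == 0) return diff == 0;
    return diff % a == 0 && diff / a >= 0;
}

foreach (string expr in new[] { "-n+3", "+2n+1", "+5", "-2n+5", "even" })
{
    var (a, b) = ParseNth(expr);
    var positions = Enumerable.Range(1, 8).Where(p => Matches(a, b, p));
    Console.WriteLine($"{expr,-6} a={a,2} b={b,2} matches positions {string.Join(", ", positions)}");
}
```

Note that a negative coefficient such as -n+3 selects only the first few positions, which is why the leading sign matters.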

This release also includes some performance optimizations. In particular, nth-child-type selectors should be an order of magnitude faster because the results of each calculation are now cached.
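As for the caching, one way it could work is sketched below. This is a hypothetical illustration rather than CsQuery's implementation: it memoizes, for each (a, b, sibling count) combination, which positions match, so repeated tests of the same selector across many sibling lists become an array lookup. The cache key and the MatchingPositions and IsNthChild helpers are assumptions made for the example, which uses C# 9 top-level statements.

```csharp
// Standalone sketch (C# 9 top-level statements); not CsQuery's implementation.
using System;
using System.Collections.Generic;

// Hypothetical cache: (a, b, sibling count) -> which 1-based positions match a*n + b.
var cache = new Dictionary<(int A, int B, int Count), bool[]>();
int computations = 0;

bool[] MatchingPositions(int a, int b, int count)
{
    var key = (a, b, count);
    if (cache.TryGetValue(key, out var cached))
        return cached;

    // Cache miss: solve the equation once for every position in a list of this size.
    computations++;
    var hits = new bool[count + 1]; // index 0 unused; positions are 1-based
    for (int p = 1; p <= count; p++)
    {
        int diff = p - b;
        hits[p] = a == 0 ? diff == 0 : diff % a == 0 && diff / a >= 0;
    }
    cache[key] = hits;
    return hits;
}

// Once the sibling count is known, testing an element is just an array lookup.
bool IsNthChild(int a, int b, int position, int siblingCount) =>
    MatchingPositions(a, b, siblingCount)[position];

// Simulate :nth-child(2n+1) applied to 100 lists of 10 items each.
int matched = 0;
for (int list = 0; list < 100; list++)
{
    for (int p = 1; p <= 10; p++)
    {
        if (IsNthChild(2, 1, p, 10)) matched++;
    }
}

Console.WriteLine($"matched {matched} elements with {computations} match table(s) computed");
```

With the cache in place, the simulated run builds the match table once instead of evaluating the equation for each of the 1,000 elements.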


CsQuery is a complete CSS selector engine and jQuery port for .NET 4 and C#. It's on NuGet as CsQuery. For documentation and more information, please see the GitHub repository and the posts about CsQuery on this blog.
