Insights and discoveries
from deep in the weeds
Outsharked

Wednesday, June 27, 2012

CsQuery Performance vs. Html Agility Pack and Fizzler

I put together some performance tests to compare CsQuery to the only practical alternative that I know of (Fizzler, an HtmlAgilityPack extension). I tested against three different documents:

  • The sizzle test document (about 11 k)
  • The wikipedia entry for "cheese" (about 170 k)
  • The single-page HTML 5 spec (about 6 megabytes)

The overall results are:

  • HAP is faster at loading the string of HTML into an object model. This makes sense, since I don't think Fizzler builds an index (or perhaps it builds only a relatively simple one). CsQuery takes anywhere from 1.1 to 2.6x longer to load the document. More on this below.
  • CsQuery is faster for almost everything else. Sometimes by factors of 10,000 or more. The one exception is the "*" selector, where sometimes Fizzler is faster. For all tests, the results are completely enumerated; this case just results in every node in the tree being enumerated. So this doesn't test the selection engine so much as the data structure.
  • CsQuery did a better job at returning the same results as a browser. Each of the selectors here was verified against the same document in Chrome using jQuery 1.7.2, and the numbers match those returned by CsQuery. This is probably because HtmlAgilityPack handles optional (missing) tags differently. Additionally, nth-child is not implemented completely in Fizzler - it only supports simple values (not formulae).

The most dramatic results are when running a selector of a single ID or a nonexistent ID in a large document. CsQuery returns the result (an empty set) over 100,000 times faster than Fizzler. This is almost certainly because it doesn't index on IDs; other selectors are much faster in Fizzler than this (though still substantially slower than CsQuery).

Size Matters

In the very small documents (the 11k sizzle test document) CsQuery still beats Fizzler, but by much less. The ID selector is still pretty substantial about 15-15x faster. For more complex selectors, the margin is just over 1x to 3x faster.

On the other hand, in very large documents, the edge that Fizzler has in loading the documents seems to mostly disappear. CsQuery is only about 10% slower at loading the 6 megabyte "large" document. This could be an opportunity for optimizing CsQuery - this seems to indicate that overhead just in creating a single document is dragging performance down. Or, it could be indicative of the makeup of the respective test documents. Maybe CsQuery does better with more elements, and Fizzler with more text - or vice versa.

You can see a detailed comparison of all the tests so far here in a google doc:

"FasterRatio" is how much faster the winner was than the loser. Yellow ones are CsQuery; red ones are Fizzler.

Red in the "Same" column means the two engines returned different results.

Try It Out

This output can be created directly from the CsQuery test project under "Performance."


CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

4 comments:

  1. Excellent comparison! Do you know of a similar test comparing changing of the source, not just querying it? For example, changing all images sources and replacing a string in all text elements? A test like that would be complimentary to this methinks! I would do it myself but I don't know how.

    ReplyDelete
  2. I haven't done a test like that, but I suspect HTML Agility Pack would be faster, for the same reasons it's faster when parsing: it doesn't have an index. So when CsQuery makes changes, it also has to update the index to reflect the new DOM. It would certainly be worthwhile though, because I don't even have a good sense for how fast CsQuery is when making substantive DOM changes.

    At the same time performance of making changes to the DOM is likely to be much faster than selector performance (especially for HTML Agility Pack) in general. That is - the reason HAP is so slow for selectors is because it doesn't use an index, so it basically has to seek through the entire node tree to locate matches. For small documents, no big deal, but as you can see it makes a huge difference on larger documents. But changing the DOM is a targeted operation that requires only changing a few pointers in the node you're moving, and perhaps updating references among siblings where it gets moved to. So chances are, for either HAP, even substantial modifications to the DOM will take less time than typical queries. For CsQuery, it might be a closer comparison, since queries are usually pretty optimized. In any event it is definitely something I'm interested in looking at, and will put together some tests whenever I have a chance.

    One other thing - this test is fairly outdated (I should update it) since the HTML parser in CsQuery has been completely replaced as of version 1.3, and many other changes have been made. Generally performance of CsQuery has improved since then, though. If you want to see how things fare now, you can run the tests yourself from the CsQuery.PerformanceTests project.

    ReplyDelete
  3. a Problem: HP Printer not connecting to my laptop.
    I had an issue while connecting my 2 year old HP printer to my brother's laptop that I had borrowed for starting my own business. I used a quick google search to fix the problem but that did not help me. I then decided to get professional help to solve my problem. After having received many quotations from various companies, i decided to go ahead with Online Tech Repair (www.onlinetechrepairs.com). Reasons I chose them over the others:
    1) They were extremely friendly and patient with me during my initial discussions and responded promptly to my request.
    2) Their prices were extremely reasonable.
    3) They were ready and willing to walk me through the entire process step by step and were on call with me till i got it fixed.
    How did they do it
    1) They first asked me to state my problem clearly and asked me a few questions. This was done to detect any physical connectivity issues with the printer.
    2) After having answered this, they confirmed that the printer and the laptop were functioning correctly.
    3) They then, asked me if they could access my laptop remotely to troubleshoot the problem and fix it. I agreed.
    4) One of the tech support executives accessed my laptop and started troubleshooting.
    5) I sat back and watched as the tech support executive was navigating my laptop to spot the issue. The issue was fixed.
    6) I was told that it was due to an older version of the driver that had been installed.
    My Experience
    I loved the entire friendly conversation that took place with them. They understood my needs clearly and acted upon the solution immediately. Being a technical noob, i sometimes find it difficult to communicate with tech support teams. It was a very different experience with the guys at Online Tech Repairs. You can check out their website www.onlinetechrepairs.com or call them on 1-914-613-3786.
    Would definitely recommend this service to anyone who needs help fixing their computers.
    Thanks a ton guys. Great Job....!!


    ReplyDelete
  4. VIRUS REMOVAL

    Is Your Computer Sluggish or Plagued With a Virus? – If So you Need Online Tech Repairs
    As a leader in online computer repair, Online Tech Repairs Inc has the experience to deliver professional system optimization and virus removal.Headquartered in Great Neck, New York our certified technicians have been providing online computer repair and virus removal for customers around the world since 2004.
    Our three step system is easy to use; and provides you a safe, unobtrusive, and cost effective alternative to your computer service needs. By using state-of-the-art technology our computer experts can diagnose, and repair your computer system through the internet, no matter where you are.
    Our technician will guide you through the installation of Online Tech Repair Inc secure software. This software allows your dedicated computer expert to see and operate your computer just as if he was in the room with you. That means you don't have to unplug everything and bring it to our shop, or have a stranger tramping through your home.
    From our remote location the Online Tech Repairs.com expert can handle any computer issue you want addressed, like:
    • - System Optimization
    • - How it works Software Installations or Upgrades
    • - How it works Virus Removal
    • - How it works Home Network Set-ups
    Just to name a few.
    If you are unsure of what the problem may be, that is okay. We can run a complete diagnostic on your system and fix the problems we encounter. When we are done our software is removed; leaving you with a safe, secure and properly functioning system. The whole process usually takes less than an hour. You probably couldn't even get your computer to your local repair shop that fast!
    Call us now for a FREE COMPUTER DIAGONISTIC using DISCOUNT CODE (otr214421@gmail.com) on +1-914-613-3786 or chat with us on www.onlinetechrepairs.com.

    ReplyDelete