Outsharked: November 2011

Insights and discoveries
from deep in the weeds

Tuesday, November 29, 2011

CsQuery 1.0 is imminent

Over the last four months I've done a lot of work on CsQuery, a C# port of jQuery that is available on GitHub. I have been using it extensively in a few web site projects and it's quite solid. I've ported most of the jQuery tests that are relevant (DOM manipulation, traversing, selection, attributes, utility functions).

Rather than update the list of implemented methods, I've compiled a list of the methods that still remain to be implemented. There are not many. :) Everything else that's not in CsQuery already is browser-DOM specific (e.g. related to events, callbacks, etc.) or is a utility function that I don't think is useful in C#.

jQuery Methods NOT Implemented In CsQuery

        Detach
        Empty
        NextAll
        NextUntil
        End
        WrapAll
        WrapInner
        ParentsUntil
        OffsetParent
        PrevAll
        PrevUntil
        Prepend
        PrependTo
        Slice
        jquery.Contains
        jquery.Grep

... plus a few CSS selectors. Additionally, there is extensive support for dynamic/Expando objects using a special JsObject class and CsQuery.Extend (which works pretty much as you would expect), though anything that implements IDictionary<string,object> can be used as the target for object creation methods. This lets you work with objects in JSON form, or dynamic objects, almost seamlessly, e.g.:

// Create a new dom from a string of html

var myDom = CsQuery.Create(html);

// "AttrSet" and "CssSet" are the equivalents of jQuery's attr(object) and css(object). C# can't
// overload on return type alone, and Attr(string) and Css(string) already return the value of
// the named item, so methods that can be passed a string of JSON data get a "Set" suffix.

myDom.Select("div.sidebar")
    .AddClass("courier")
    .CssSet("{'border': '1px solid black', 'font-weight': 'bold'}");

// create a new anonymous object. You can also use any conventional object or expando object
// as a source parameter in CsQuery.Extend

var data = new { pageName="My Home Page", url="/myhomepage.html"};

// "null" below is a convention for the empty object {}. You can also pass a new expando object,
// this is just shorthand. The parameters match jQuery.extend. This merges the properties of data,
// and the object created from the JSON string passed. There's also a CsQuery.ParseJSON method 
// for explicitly creating a new expando object from JSON. Finally, CsQuery.Extend will work
// with conventional objects as the target (first parameter). In this case, it will only update
// existing properties with the new data, since you can't add properties to an existing non-expando
// object.

dynamic dataExtended = CsQuery.Extend(null,data,"{ 'access':'all' }");
myDom.Data("page",dataExtended).Hide();
myDom.RenderSelection();

// outputs: 
//   <div class="sidebar courier" style="border: 1px solid black; font-weight: bold; display: none;"
//       data-page='{"pageName": "My Home Page", "url": "/myhomepage.html", "access": "all" }'>
//   </div>
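The merge behavior described in the comments above can be sketched without CsQuery at all. This is a minimal illustration under stated assumptions, not CsQuery's implementation: `MergeSketch.Extend` is a hypothetical helper that copies entries from each source dictionary into a target, lets later sources win, and treats a null target as a new empty object.

```csharp
using System;
using System.Collections.Generic;

public static class MergeSketch
{
    // Hypothetical stand-in for an Extend-style merge; not part of CsQuery.
    public static IDictionary<string, object> Extend(
        IDictionary<string, object> target,
        params IDictionary<string, object>[] sources)
    {
        // A null target plays the role of the empty object {}.
        var result = target ?? new Dictionary<string, object>();

        foreach (var source in sources)
        {
            if (source == null) continue;

            foreach (var pair in source)
            {
                // Later sources overwrite earlier values for the same key.
                result[pair.Key] = pair.Value;
            }
        }
        return result;
    }
}
```

Calling `MergeSketch.Extend(null, data, extra)` with two dictionaries returns a new dictionary holding the keys of both, with values from `extra` taking precedence where the keys overlap.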

There are still some other features I want to implement, but I am hoping to get some examples together and create a version 1.0 distribution in the next month or so. The code is solid and well tested, and it makes server-side HTML management a joy compared to WebControls, Razor/HTML helpers, and so on, where you have limited control over server-side HTML layout. And your brain can work with HTML exactly the same way on the server as it needs to on the client. Your whole browser DOM is right in front of you. It's great for scraping too.

I have not done extensive performance testing, but have done a little. It's easily fast enough for real-time HTML parsing. Of course, if you plan to use it on something serving a thousand pages a second, this might matter, but I suspect most people would find it plenty fast. On my laptop, it can parse a 5 megabyte HTML file with over 100,000 unique nodes (the entire HTML 5 spec) into an indexed DOM in 2.5 seconds. Selecting all the DIVs (over 3,300) takes less than 1/100th of a second. Now - 2.5 seconds is an eternity for a web server, but this is meant to be an unrealistic situation, and there would be little reason to parse a big page of static HTML that you had no intention of manipulating. A web page that's 20K, which is more typical, would be less than 1/100th of a second. There's definitely room to make it faster, too, but it's plenty fast now, and I suspect it's a lot faster than manipulating and rendering a page with something like WebControls anyway.

Features that I still want to add:

Asynchronous HTTP gets - right now when using CsQuery.Server().CreateFromUrl() to load a DOM from the web, code execution is blocked while the get is performed. This is probably fine for some basic web scraping, but will slow things down a lot for any substantive real-time usage. I started coding for an async model but have not finished yet.
Form postback management - there's a basic tool for repopulating form elements from their postback data in the Server() module. This needs to be fleshed out and tested a bit, though, because I have not used it too much as I haven't created a lot of conventional HTML forms lately.
Framework and view engine - I've developed a useful, simple framework as part of one project. This includes some custom HTML tags like <csq-include src="..." />, <csq-when [conditions]>...</csq-when> to do things like server-side includes, environment-specific includes, and so on. These are not really specific to CsQuery but rather CsQuery is used to implement them, and they make working with pure HTML a lot easier.
Templates - something like the jQuery template plugin. Of course it's a piece of cake to write CsQuery code to do simple substitutions, but it would be nice to integrate some of that functionality into a framework.
Client script communication - one of the things that CsQuery makes very convenient is preconfiguring data for client-side controls. For example, say you have a grid control. A typical usage might be to initialize the control with an ajax request upon first page load. This causes the page to be rendered with no data at first, then perhaps an ajax loader shown to the user while it gets the default data. Why not pass the first batch of data directly to the control? It's easy to use CsQuery.Data() to pass data as an attribute of an HTML element, then in your javascript, just grab it with jQuery.Data(). This requires using some HTML element as a payload container. Not a big deal, but I would like to standardize this convention and create methods to abstract it.
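The payload-container idea in the last item can be sketched in plain C#. This is only an illustration of the convention, not a CsQuery API: `PayloadSketch.DataAttribute` is a hypothetical helper, and it assumes `System.Text.Json` is available for serialization.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.Json;

public static class PayloadSketch
{
    // Hypothetical helper: renders an empty element whose data-* attribute
    // carries the initial data for a client-side control as JSON.
    public static string DataAttribute(string elementId, string name, object data)
    {
        string json = JsonSerializer.Serialize(data);

        // Encode quotes and angle brackets so the JSON is safe inside an attribute value.
        string encoded = WebUtility.HtmlEncode(json);

        return "<div id=\"" + WebUtility.HtmlEncode(elementId) + "\" data-" + name
            + "=\"" + encoded + "\"></div>";
    }
}
```

On the client, jQuery's `.data()` reads an attribute like this and parses the JSON value, so the control can start with its first batch of data instead of issuing an ajax request.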

Anyway, it's getting close, but feel free to download the project from github and give it a try. The basic usage could not be simpler.

var myDom = CsQuery.Create(htmlString);
var content = myDom["#maincontent > div.title"];
var newContent = myDom["<span>Hello world!</span>"].Css("font-weight","bold");
content.Append(newContent);
Response.Write(content.Render());

Thursday, November 10, 2011

How to run NUnit tests in Visual Studio 2010/MSTest

There are probably millions of lines of test code written against NUnit, and most people take no joy in switching to MSTest just so they can use Visual Studio's IDE. There is an extension called Visual NUnit which adds some support. This is actually a really nice, solid extension, but unfortunately, it doesn't solve the most basic problem: being able to debug tests from directly within your project.

MSTest is unfortunately closed, sealed, locked. It's virtually impossible to extend it. But there is a quick and dirty way to get around this problem and have the best of both worlds: NUnit as a testing framework, but still have the ability to run (and debug!) tests from within VS. And you don't have to sacrifice anything - they will still work in any NUnit test runner.

Step 1: Assert Your Independence

Add both testing framework namespaces:
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NUnit.Framework;
Create aliases alongside your using directives:
using Assert = NUnit.Framework.Assert;
using CollectionAssert = NUnit.Framework.CollectionAssert;
using StringAssert = NUnit.Framework.StringAssert;
This causes the Assert references to unambiguously refer to the NUnit version. That covers the objects; then there are all the attributes used to mark things for the framework. Luckily, most attributes do not conflict. Description is an exception: you'll have to pick a framework if you use this attribute, e.g.:
using Description = NUnit.Framework.DescriptionAttribute;
This causes every use of Description to resolve to the NUnit attribute.

Step 2: Search & Replace

You need to add all the corresponding MSTest attributes to get the IDE runner to recognize things. Just add both attributes to each class/method, e.g. [TestClass,TestFixture]

Function	NUnit	Microsoft
Run once before and after all tests (class-level attribute in NUnit)	[SetUpFixture]	[AssemblyInitialize] [AssemblyCleanup]
Run once at start per class	[TestFixtureSetUp]	[ClassInitialize]
Run once at end per class	[TestFixtureTearDown]	[ClassCleanup]
Identify a class containing tests	[TestFixture]	[TestClass]
Run before each test	[SetUp]	[TestInitialize]
Run after each test	[TearDown]	[TestCleanup]
... and, of course, a test	[Test]	[TestMethod]
(SetUpFixture and AssemblyInitialize/AssemblyCleanup work differently: in NUnit a class is marked as [SetUpFixture] and has [SetUp] and [TearDown] methods. In MS, the attributes apply directly to static methods, with a limit of one each per assembly.)
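Put together, steps 1 and 2 might look like the following sketch. The namespace, class, and method names are hypothetical, and the example assumes the project references both the MSTest assembly and an NUnit version that still ships `Assert.AreEqual` on `NUnit.Framework.Assert` (2.x or 3.x).

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NUnit.Framework;
// Both frameworks define Assert; the alias resolves it in favor of NUnit.
using Assert = NUnit.Framework.Assert;

namespace Example.Tests   // hypothetical namespace
{
    // One attribute from each framework on the class...
    [TestClass, TestFixture]
    public class ArithmeticTests   // hypothetical test class
    {
        private int _value;

        // ...and on each per-test setup, test, and cleanup method.
        [TestInitialize, SetUp]
        public void BeforeEachTest()
        {
            _value = 2;
        }

        [TestMethod, Test]
        public void AddsTwoNumbers()
        {
            Assert.AreEqual(4, _value + 2);
        }

        [TestCleanup, TearDown]
        public void AfterEachTest()
        {
            _value = 0;
        }
    }
}
```

Each runner only looks for its own attributes and ignores the other framework's, which is why the same class can be discovered by both.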

Step 3: Instance Setup/Teardown

There are some other differences. The class-level setup methods for MS must be static, whereas NUnit allows them to be instance methods. Recoding everything to use static methods is a headache, so I just do this instead. Chances are, your unit tests already inherit from some other class. If so, just change your template class. If not, add one. To fake the NUnit instance startup/teardown methods, just use the constructor and destructor of your base class:

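// Base class for all tests: the constructor and destructor stand in for
// the framework's setup and teardown hooks.
public class Test
{
    public Test()
    {
        Setup();
    }

    // Note: this runs whenever the garbage collector finalizes the object,
    // not immediately after the tests finish.
    ~Test()
    {
        TearDown();
    }

    // Override in a test class to do setup work
    public virtual void Setup()
    {    }

    // Override in a test class to do cleanup work
    public virtual void TearDown()
    {    }
}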

I've basically just skipped out on using any of the framework class-level setup/teardown methods, and use the regular class constructor/destructor instead. In each unit test, you just override Setup and TearDown. You probably do this already if your tests inherit from a base class; this just changes the mechanism by which they are invoked. I haven't thought too much about possible side effects of this, but it would seem to be functionally equivalent.
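One side effect worth knowing about is the order in which C# runs things when a base constructor calls a virtual method. The sketch below is framework-free so it can be run on its own; `TestBase` mirrors the base class above under a different name, and `ParserTests` is a hypothetical test class that would normally also carry the test attributes.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the base class above: the constructor stands in for setup.
public class TestBase
{
    public TestBase()
    {
        Setup();
    }

    public virtual void Setup() { }
    public virtual void TearDown() { }
}

// Hypothetical test class; in a real project it would also be marked
// [TestClass, TestFixture] and its test methods [TestMethod, Test].
public class ParserTests : TestBase
{
    // Field initializers run before the base constructor, so Log is usable in Setup.
    public readonly List<string> Log = new List<string>();
    public string Input;

    public ParserTests()
    {
        Log.Add("derived constructor");
    }

    public override void Setup()
    {
        // Called from the base constructor, before the derived constructor body.
        Log.Add("Setup");
        Input = "<div></div>";
    }
}
```

After `new ParserTests()`, the log holds "Setup" followed by "derived constructor". Anything Setup relies on therefore has to come from a field initializer rather than from the derived constructor body.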

Step 4: TestContext

If you happen to be using TestContext, this will be another conflict, since both frameworks have a same-named object. The MS static initialization methods have it as a parameter, too, whereas for the NUnit framework, it's a static object you can always access. An easy solution is just to alias the MS one, since you probably haven't written any code against it yet, e.g.:
using MsTestContext = Microsoft.VisualStudio.TestTools.UnitTesting.TestContext;
using TestContext = NUnit.Framework.TestContext;
Now, if you actually want to use the MS static methods, you can just use MsTestContext as a parameter, and TestContext will refer unambiguously to the NUnit one.
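A short sketch of how the two aliases might be used side by side. The class and method names are hypothetical, and the example assumes references to both frameworks; the test body only inspects the types, so it does not depend on either runner's context being active.

```csharp
using Microsoft.VisualStudio.TestTools.UnitTesting;
using NUnit.Framework;
using Assert = NUnit.Framework.Assert;
using MsTestContext = Microsoft.VisualStudio.TestTools.UnitTesting.TestContext;
using TestContext = NUnit.Framework.TestContext;

[TestClass, TestFixture]
public class ContextTests   // hypothetical test class
{
    // MSTest expects a public static method taking its own TestContext type,
    // which is reachable here through the MsTestContext alias.
    [ClassInitialize]
    public static void ClassInit(MsTestContext context)
    {
    }

    [TestMethod, Test]
    public void AliasesPointAtDifferentFrameworks()
    {
        // The unqualified name resolves to NUnit's class because of the alias.
        Assert.AreEqual("NUnit.Framework", typeof(TestContext).Namespace);
        Assert.AreEqual("Microsoft.VisualStudio.TestTools.UnitTesting",
            typeof(MsTestContext).Namespace);
    }
}
```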

Step 5: Convert to Test Project

Visual Studio won't give you the testing tools until you add this to the .csproj file of your test project. It goes under Project/PropertyGroup.

<ProjectTypeGuids>{3AC096D0-A1C2-E12C-1390-A8335801FDAB};{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}</ProjectTypeGuids>

Step 6: Develop, Test, Not Necessarily In That Order!

You are now done. If you've been careful, nothing you've done will in any way break this when running under NUnit, and all these tests will now run directly in the Visual Studio IDE as well.

In Summary:

Alias conflicting objects
Use both the NUnit & MS attributes on each class/method as appropriate
Deal with non-implemented instance setup & teardown methods using constructor/destructor
Convert to a test project

... and you should be good to go. While it may take a little bit of work to update large existing test suites, it's mostly search and replace. For new work, just build this into your template, and it's already done.

Tuesday, November 8, 2011

IE7 & quirks removes trailing space from empty HTML elements

Pull up this bad boy in IE7 standards or quirks mode.

http://jsfiddle.net/QNs8s/8/

Setting `innerText` or `innerHTML` on an empty inline element causes the space after it to be erased, e.g. if you take

"this is some <span id="field"></span> inline text"

and set innerText on that span to "more", you get

"this is some  <span id="field">more</span>inline text"

which renders as

this is some moreinline text

The solution is to start with something inside the span, e.g.

"this is some  <span id="field">&nbsp;</span>inline text"

I can't believe I've never come across this before, but googling didn't turn up anything about it. Another irritation for supporting old IE.