Outsharked

Insights and discoveries from deep in the weeds

Tuesday, October 16, 2012

CsQuery 1.3 Released

CsQuery 1.3 has been released. You can get it from NuGet or from the source repository on GitHub.

New HTML5 Compliant Parser

This release replaces the original HTML parser with the validator.nu HTML5 parser: a complete, standards-compliant HTML5 parser built from the same codebase used in Gecko-based web browsers (e.g. Firefox). You should expect excellent compatibility with the DOM that a web browser would build from the same markup. Problems people have had in the past with character set encoding, invalid HTML parsing, and other edge cases should simply go away.

In the process of implementing the new parser, some significant changes were made to the input and output API in order to take advantage of its capabilities. While these revisions are generally backwards compatible with 1.2.1, there are a few potentially breaking changes. These can be summarized as follows:

  • DomDocument.DomRenderingOptions has been removed. The concept of assigning output options to a Document doesn't make sense any more (if it ever did); rather, you define options for how output is rendered at the time you render it.
  • IOutputFormatter interface has changed. This wasn't really used for anything before, so I doubt this will impact anyone, but it's conceivable that someone coded against it. The interface has been revised somewhat, and it is now used extensively to define a model for rendering output.

Hopefully, these changes won't impact you much or at all. But with this small price comes a host of new options for parsing and rendering HTML.

Create Method Options

In the beginning, there was but a single way to create a new DOM from HTML: Create. And it was good. But as the original parser evolved towards HTML5 compliance, the CreateFragment and CreateDocument methods were added, to define intent. Different rules apply depending on the context: a full document must always have an html tag (among others) for example. But you wouldn't want to add any missing tags if your intent was to create a fragment that was not supposed to stand alone.

The new parser has some more toys. It lets us define an expected document type (HTML5, HTML4 Strict, HTML4 Transitional). We can tell it the context in which we expect our HTML to be found when it starts parsing. We can choose to discard comments, and decide whether to permit self-closing XML tags. All of these things went into the Create method, allowing you complete control over how your input gets processed.

New Overloads

The basic Create method has overloads to accept a number of different kinds of input:

    public static CQ Create(string html)
    public static CQ Create(char[] html)
    public static CQ Create(TextReader html)
    public static CQ Create(Stream html)
    public static CQ Create(IDomObject element)
    public static CQ Create(IEnumerable<IDomObject> elements)

Additionally, there are similar overloads with parameters that let you control each option:


    public static CQ Create(string html, 
            HtmlParsingMode parsingMode = HtmlParsingMode.Auto, 
            HtmlParsingOptions parsingOptions = HtmlParsingOptions.Default,
            DocType docType = DocType.Default)

When calling the basic methods, the "default" values of each of these will be used. The default values are defined on the CsQuery.Config object (the "default defaults" are shown here -- if you change these on the config object, your new values will be used whenever a default is requested):

    CsQuery.Config.HtmlParsingOptions = HtmlParsingOptions.None;
    CsQuery.Config.DocType = DocType.HTML5;

Note that HtmlParsingOptions is a [Flags] enum. This means you can specify more than one option. So you could, for example, call Create like this:

    var dom = CQ.Create(someHtml, HtmlParsingOptions.Default | HtmlParsingOptions.IgnoreComments);

If you pass a method both Default and some other option(s), it will merge the default values with any additional options you specified. On the other hand, passing options that do not include Default will result in only the options you passed being used.
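
To make the merge rule concrete, here's a quick sketch (the AllowSelfClosingTags flag is an assumption based on the self-closing-tags option mentioned above; check the enum for the exact names):

    // Default is included: the configured default options are merged with IgnoreComments
    var merged = CQ.Create(someHtml, parsingOptions:
        HtmlParsingOptions.Default | HtmlParsingOptions.IgnoreComments);

    // Default is not included: only AllowSelfClosingTags is used; the configured defaults are ignored
    var explicitOnly = CQ.Create(someHtml, parsingOptions:
        HtmlParsingOptions.AllowSelfClosingTags);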

The other methods remain more or less unchanged. CreateDocument and CreateFragment now simply call Create using the appropriate HtmlParsingOption to define the intended document type.

    public static CQ CreateDocument(...)
    public static CQ CreateFragment(...)
    public static CQ CreateFromFile(...)
    public static CQ CreateFromUrl(...)
    public static CQ CreateFromUrlAsync(...)

The Create method offers a wide range of options for input and parsing. These other methods were created for convenience, before the input API had been fully thought out. Though I don't intend to deprecate them right away, I will probably not extend them to support the various options. Anything you can do with these methods can be done about as easily with `Create` and a helper of some kind. For example, if you want to load a DOM from a file using options other than the defaults, you can just pass `File.Open(..)` to the standard `Create` method.
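
Here's a sketch of loading a file with non-default options; it assumes the Stream overload accepts the same optional parameters as the string overload shown above, and the parameter values are purely illustrative:

    using (var stream = System.IO.File.OpenRead("MyDocument.html"))
    {
        var dom = CQ.Create(stream,
            HtmlParsingMode.Document,
            HtmlParsingOptions.IgnoreComments,
            DocType.HTML5);

        // ... work with dom ...
    }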

Render Method Options

The Render method signatures look pretty much the same as in 1.2.1, but a lot has changed behind the scenes. The IOutputFormatter interface, which used to be more or less a placeholder, now runs the show. All output is controlled by OutputFormatters implementing this interface. Any Render method which doesn't explicitly identify an OutputFormatter will use the default formatter provided by the service locator CsQuery.Config.GetOutputFormatter.

    public static Func<IOutputFormatter> GetOutputFormatter {get;set;}

You can replace the default locator with any delegate that returns IOutputFormatter. Additionally, you can assign a single instance to the CsQuery.Config.OutputFormatter property; if set, it supersedes the service locator. When you use this approach, the object must be thread safe, since a new instance will not be created for each use.
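
For example, to make one of the built-in formatters (listed below) the default for every Render call, you could point the locator at it with a trivial delegate (a sketch):

    CsQuery.Config.GetOutputFormatter = () => OutputFormatters.HtmlEncodingMinimum;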

There are a number of built-in IOutputFormatter objects accessible through the static OutputFormatters factory:

    OutputFormatters.HtmlEncodingBasic
    OutputFormatters.HtmlEncodingFull
    OutputFormatters.HtmlEncodingMinimum
    OutputFormatters.HtmlEncodingMinimumNbsp
    OutputFormatters.HtmlEncodingNone
    OutputFormatters.PlainText

Each of these except the last returns an OutputFormatter configured with a particular HtmlEncoder. The last strips out HTML and returns just the text contents (to the best of its ability). The factory also has Create methods that let you configure a formatter with specific DomRenderingOptions. Complete details of these options are in the Render method documentation.
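
As a rough sketch of the two approaches (the exact Render overloads and factory signatures should be checked against the Render documentation; they are assumptions here):

    var dom = CQ.Create("<div>Fish &amp; chips <!-- comment --></div>");

    // render using one of the built-in formatters...
    string minimal = dom.Render(OutputFormatters.HtmlEncodingMinimum);

    // ...or build a formatter from the factory with specific DomRenderingOptions
    var formatter = OutputFormatters.Create(DomRenderingOptions.RemoveComments);
    string noComments = dom.Render(formatter);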

Bug Fixes

  • Issue #51: Fix an issue with compound subselectors whose target included CSS matches above the level of the context.
  • Fixed an issue where :empty could incorrectly return false when nodes that are neither text nodes nor elements (e.g. comments) are present

Other New Features

The completely new HTML parser, input and output models aren't enough for you? Well, there are a couple other minor new features.

  • CsQuery should now compile under Mono, after a suggested change to `CsQuery.Utility.JsonSerializer.Deserialize` that avoids an unimplemented Mono framework feature.
  • Added a HasAttr method to test for the presence of a named attribute.
  • Add CSS descriptor for Paged Media Module per Pull Request #40 from @kaleb
  • `CQ.DefaultDocType` has been marked as obsolete and will be removed in a future version. Use `Config.DocType` instead.
  • `CQ.DefaultDomRenderingOptions` has been marked as obsolete and will be removed in a future version. Use `Config.DomRenderingOptions` instead.
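
A quick sketch illustrating two of the items above (it assumes HasAttr is a CQ method returning a bool for the first matched element):

    CsQuery.Config.DocType = DocType.HTML5;   // instead of the obsolete CQ.DefaultDocType

    var dom = CQ.Create("<a href='/home'>Home</a><a>anchor</a>");
    bool hasHref = dom["a"].HasAttr("href");  // true: the first anchor has an href attribute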

There are other changes in the complete change log, however, many of them are related to the deprecated parser and no longer relevant.

Thanks To The Community

This is a big project, and the new parser is a huge step forward. I think you'll find this release is fast, stable, flexible, and standards-compliant. I owe a debt to the people who suffered through the development and beta releases over the last couple of months; without their patience and feedback, this would not have been possible. A bug report is a gift! So thanks to all the givers. The following is a list of all the people who've contributed code or bug reports recently. (If I missed anyone, it wasn't intentional!) Thanks - please keep it coming.

Vitallium (code), kaleb (code), petterek, ilushka85, laurentlbm, martincarlsson, allroadcole, Nico1234, Uncleed, Vids, Arithmomaniac, CJCannon, muchio7, SaltyDH


CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Thursday, September 13, 2012

Using your favorite Visual Studio 2010 add-ins/extensions in VS2012

I've just about finished my transition from Visual Studio 2010 to Visual Studio 2012. While this has probably been the easiest of any VS update I can remember, it wasn't without a few painful moments. Here's a summary of the annoyances and the solutions I found.

Uppercase Menus

Why, Microsoft, why? I don't want my menus to shout at me. It just looks so... 1992. Luckily, the fix is a piece of cake and requires adding a registry key:

[HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\11.0\General] "SuppressUppercaseConversion"=dword:00000001

Or just run this to add it automatically: vs_menu_case.reg

Impenetrable color themes

Metro has its moments, but eliminating any visual distinction between windows, boundaries, and areas is not one of them. Neither of the two themes that come with VS2012 was especially workable for me.

Visual Studio 2012 Color Theme Editor to the rescue. The "blue" theme that comes packaged with this painless extension is comfortingly familiar to those used to VS2010's default scheme. Hooray! I can find the edge of a window again.

Ultrafind (and other non-updated VS2010 extensions)

Did you use Ultrafind with VS2010? If no, I feel sorry for you. If yes, you probably miss it now, since it hasn't been updated.

Not content to wait for an update, I threw caution to the wind and figured I'd see what happens if I just shoehorned it into VS2012. What do you know-- it works. Here's how to get your VS2010 extensions running in VS2012. Warning: I know nothing about what, if any, differences there may be in the extension model from VS2010 to VS2012. This works for me. It's absolutely not guaranteed to work for you or for all extensions, but there's not likely much harm you can do.

1. Locate your VS2010 user extensions folder.

Start by opening up C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\devenv.pkgdef which shows you the locations from where extensions are loaded. Anything you've installed will likely be in "UserExtensionsFolder":

"UserExtensionsRootFolder" = "$AppDataLocalFolder$\Extensions"

This is probably located here:

C:\Users\{username}\AppData\Local\Microsoft\VisualStudio\10.0\Extensions

2. Copy them.

Within this folder should be a subfolder for each extension you've installed. Copy just the folders for the extensions you want to migrate from here to the same folder for VS2012 -- the same path, but with "11.0" instead of "10.0". For Ultrafind, the folder is named "Logan Mueller [MSFT]".

3. Clear cache.

There are two ".cache" files in the Extensions folder. Just delete them. This step might not be needed; I tried this a couple times with and without. If you don't do it, VS seems to get confused about which extensions are enabled. If you do, you may need to re-enable other extensions that are installed.

4. Enable.

You should now just be able to restart VS2012 and see your extension in the extension manager. Cross your fingers and click Enable.


Build Version Increment (and other add-ins)

You can use a similar technique for add-ins. It's even easier. The one I really care about is Build Version Increment, which seems even less likely than Ultrafind to get an update any time soon (since it was barely updated for VS2010!).

1. Find the add-ins folder.

Go to Tools->Options->Add In Security from within VS to find the add-in search path. (I happen to keep mine in Dropbox so they stay in sync across several machines.) If you've never touched this, your add-ins are probably located in %VSMYDOCUMENTS%\Addins, which is here:

C:\Users\{username}\Documents\Visual Studio 2010\Addins

I have no idea why it's in a completely different place than extensions. Never question Microsoft logic.

2. Copy files

Like before, just copy the files related to the addins you want to migrate to the same folder for VS 2012. It could be just a single file called "something.addin". For BuildVersionIncrement there's also a DLL.

3. Update version.

Edit the "*.addin" file and look for this section:

..
    <HostApplication>
        <Name>Microsoft Visual Studio</Name>
        <Version>10.0</Version>
    </HostApplication>
    ..

Just change that "10.0" to "11.0" and save. That's all. Restart Visual Studio. If the add-in isn't immediately available, go to Tools->Add-in Manager and it should be listed; you can enable it there.

Monday, August 13, 2012

jQuery :text filter selector deprecated in 1.8

... and why it matters

In the list of things changed for jQuery 1.8, you might miss this one, buried deep in the change log:

#9400: Deprecate :text, :radio, :checkbox, etc. selector extensions

Sure enough, it's got the scarlet-letter "Deprecated" tag. What the...? These jQuery pseudo-selectors are probably the first thing I ever learned about using jQuery. This seems to be a... confusing move at best.

Most of these jQuery extension selectors are easily replaced using longform CSS. Indeed, this is the rationale presented with the original request: they're redundant. For example, :checkbox is literally the same as input[type=checkbox]. While I've always liked the terseness of the jQuery aliases, I could live without them.

The problem is specifically with the :text selector. The CSS version, input[type=text], does not work the same as the jQuery :text selector: when an input has no type attribute, :text will still select it, but the CSS version will not. CSS works only against actual attributes in the markup. This matters for "text" inputs because "text" is the default value. It's perfectly legal, valid, and even encouraged by some (because it's terse) to omit the "type" attribute for the ubiquitous text input. The simplest possible text input is just <input />.

Behold, a textbox, which you will then style with jQuery...

[The original post embeds a live demo here: a text input with no type attribute, alongside a "Check if you love koala bears" checkbox.]

...or NOT, since you can't select it without :text!!!

Okay, this is not the end of the world. "Deprecated" is a lot different from "removed." jQuery contains features that were deprecated years ago, and it's not especially likely that this is going to be removed any time soon. But most people are uncomfortable writing new code that uses features they know are slated for future removal. So, starting with jQuery 1.8, you need to either choose to always have a "type" attribute, even though it's not required, or use a feature that's been deprecated to select all "text" inputs.

So, this post is mostly an observation. If at some point in the future :text stopped working, you could always use a simple plugin to replace it. No big deal. But it's certainly a curious feature to remove. It's at the core of jQuery's original purpose: making it easy to work with HTML; filling the void left by the DOM and CSS. The :text filter clearly fills such a void; this change undoes something useful.

Wednesday, August 8, 2012

CsQuery 1.2 Released

CsQuery 1.2 has been released. You can get it from NuGet or from the source repository on GitHub.

This release does not add any significant new features, but it coincides with the first formal release of the CsQuery.Mvc framework. This framework simplifies integrating CsQuery into an MVC project by allowing you to intercept the HTML output of a view before it's rendered and inspect or alter it using CsQuery. Additionally, it adds an HtmlHelper method for CsQuery so you can create HTML directly in Razor views. It's on NuGet as CsQuery.Mvc.

Breaking Change

Though this change is unlikely to affect many people, it is a significant change to the public API for DOM element creation. Any code which creates DOM elements using "new" such as:

    IDomElement obj = new DomElement("div");

will not compile, and should be replaced with:

    IDomElement obj = DomElement.Create("div");

This was necessary to support a derived object model for complex HTML element implementations, to better model the browser DOM. Previously, any element-type-specific functionality was handled conditionally. This was OK when the DOM model was mostly there to support a jQuery port, but as I have worked to create a more accurate representation of the browser DOM itself, it became clear this was not sustainable going forward.

In the new model, some DOM element types will be implemented using classes that derive from DomElement. This means that creating a new element must be done from a factory so that element types which have more specific implementations will be instances of their unique derived type.

Any code that used CQ.Create or Document.CreateElement will be unaffected: this will only be a problem if you had been creating concrete DomElement instances using new.

Bug Fixes

  • Issue #27 - .Value for some HTML tags not implemented

CsQuery.Mvc

As usual I'm behind on documentation, but the usage of CsQuery.Mvc is simple and there's an example MVC3 project in the GitHub repo.

The CsQuery MVC framework lets you directly access the HTML output from an MVC view. It adds a property Doc to the controller and methods Cq_ActionName that run concurrently with action invocations, letting you manipulate the HTML via CsQuery before it's rendered. There's basic documentation in the readme and there's also an example MVC application showing how to use it. You can also take a look at the CsQuery.Mvc.Tests project which is, itself, an MVC application.
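
Here's a minimal sketch of that pattern; the base class name and exact signatures are assumptions, so see the readme and the example application for the real details:

    public class HomeController : CsQueryController
    {
        public ActionResult Index()
        {
            return View();
        }

        // Paired with Index(): Doc is the CQ object wrapping the view's rendered
        // HTML, and changes made here appear in the final output.
        public void Cq_Index()
        {
            Doc["h1"].AddClass("title");
        }
    }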

Using the CsQuery HTML helper requires adding a reference to CsQuery.Mvc in Views/web.config as usual for any HtmlHelper extension methods:

<system.web.webPages.razor>
    <host factoryType="System.Web.Mvc.MvcWebRazorHostFactory, System.Web.Mvc, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31BF3856AD364E35" />
    <pages pageBaseType="System.Web.Mvc.WebViewPage">
      <namespaces>
        <add namespace="System.Web.Mvc" />
        <add namespace="System.Web.Mvc.Html" />
        <add namespace="System.Web.Mvc.Ajax" />
        <add namespace="System.Web.Routing" />
        <add namespace="CsQuery.Mvc"/>
      </namespaces>
   </pages>
</system.web.webPages.razor>
Now you can do this in a Razor view:

    @Html.HtmlTag("div").AddClass("someclass").Text("some text")

... or anything at all that you can do with CsQuery normally, and the HTML output of the CQ object will be inserted inline.

CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Tuesday, July 10, 2012

CsQuery 1.1.3 Released

CsQuery 1.1.3 has been released. You can get it from NuGet or from the source repository on GitHub.

New features

This release adds an API for extending the selector engine with custom pseudo-class selectors. In jQuery, you can do this with code like James Padolsey's :regex extension. In C#, we can do a little better, since we have classes and interfaces to make our lives easier. To that end, in the CsQuery.Engine namespace, you can now find:

    interface IPseudoSelector
        IPseudoSelectorFilter        
        IPseudoSelectorChild
   
    abstract class PseudoSelector: IPseudoSelector
        PseudoSelectorFilter: IPseudoSelectorFilter
        PseudoSelectorChild: IPseudoSelectorChild

The two different incarnations of the base IPseudoSelector interface represent two different types of pseudoclass selectors, which jQuery calls basic filters and child filters. Technically there are also content filters but these work the same way as "basic filters" in practice.

If you are only testing characteristics of the element itself, use a filter-type selector. If an element's inclusion in a set depends on its children (such as :contains, which tests text-node children) or on its position relative to its siblings (such as :nth-child), then you should probably use a child-type selector. In many cases you could do it either way. For example, :nth-child could be implemented by looking at each element's ElementIndex property and figuring out whether it's a match. But it is much more efficient to start from the parent and handpick each child that's at the right position.

The basic API

To create a new filter, implement one of the two interfaces. They both share IPseudoSelector:

  • IPseudoSelector Interfaces

        public interface IPseudoSelector
        {
            string Arguments { get; set; }
            int MinimumParameterCount { get; }
            int MaximumParameterCount { get; }
            string Name { get; }
        }
    

    In both cases, you should set the min/max values to the number of parameters you want your filter to accept (the default is 0). "Name" should be the name of this filter as it will be used in a selector. Then choose the one that works best for your filter:

        public interface IPseudoSelectorChild : IPseudoSelector
        {
            bool Matches(IDomObject element);
            IEnumerable<IDomObject> ChildMatches(IDomContainer element);
        }
    
        public interface IPseudoSelectorFilter: IPseudoSelector
        {
            IEnumerable<IDomObject> Filter(IEnumerable<IDomObject> selection);
        }
    
  • PseudoSelector Abstract Class

        /// <summary>
        /// Base class for any pseudoselector that implements validation of min/max parameter values, and
        /// argument validation. When implementing a pseudoselector, you must also implement an interface for the type
        /// of pseudoselector
        /// </summary>
    
        public abstract class PseudoSelector : IPseudoSelector
        {
            #region private properties
    
            private string _Arguments;
            
            /// <summary>
            /// Gets or sets criteria (or parameter) data passed with the pseudoselector
            /// </summary>
    
            protected virtual string[] Parameters {get;set;}
    
            /// <summary>
            /// A value to determine how to parse the string for a parameter at a specific index.
            /// </summary>
            ///
            /// <param name="index">
            /// Zero-based index of the parameter.
            /// </param>
            ///
            /// <returns>
            /// NeverQuoted to treat quotes as any other character; AlwaysQuoted to require that a quote
            /// character bounds the parameter; or OptionallyQuoted to accept a string that can (but does not
            /// have to be) quoted. The default abstract implementation returns NeverQuoted.
            /// </returns>
    
            protected virtual QuotingRule ParameterQuoted(int index)
            {
                return QuotingRule.NeverQuoted;
            }
    
            #endregion
    
            #region public properties
    
            /// <summary>
            /// This method is called before any validations are called against this selector. This gives the
            /// developer an opportunity to throw errors based on the configuration outside of the validation
            /// methods.
            /// </summary>
            ///
            /// <value>
            /// The arguments.
            /// </value>
    
            public virtual string Arguments
            {
                get
                {
                    return _Arguments;
                }
                set
                {
    
                    string[] parms=null;
                    if (!String.IsNullOrEmpty(value))
                    {
                        if (MaximumParameterCount > 1 || MaximumParameterCount < 0)
                        {
                            parms = ParseArgs(value);
                        }
                        else
                        {
                            parms = new string[] { ParseSingleArg(value) };
                        }
    
                        
                    }
                    ValidateParameters(parms);
                    _Arguments = value;
                    Parameters = parms;
                    
                }
            }
    
            /// <summary>
            /// The minimum number of parameters that this selector requires. If there are no parameters, return 0
            /// </summary>
            ///
            /// <value>
            /// An integer
            /// </value>
    
            public virtual int MinimumParameterCount { get { return 0; } }
    
            /// <summary>
            /// The maximum number of parameters that this selector can accept. If there is no limit, return -1.
            /// </summary>
            ///
            /// <value>
            /// An integer
            /// </value>
    
            public virtual int MaximumParameterCount { get { return 0; } }
    
            /// <summary>
            /// Return the properly cased name of this selector (the class name in non-camelcase)
            /// </summary>
    
            public virtual string Name
            {
                get
                {
                    return Utility.Support.FromCamelCase(this.GetType().Name);
                }
            }
    
            #endregion
    
            #region private methods
    
            /// <summary>
            /// Parse the arguments using the rules returned by the ParameterQuoted method.
            /// </summary>
            ///
            /// <param name="value">
            /// The arguments
            /// </param>
            ///
            /// <returns>
            /// An array of strings
            /// </returns>
    
            protected string[] ParseArgs(string value)
            {
                List<string> parms = new List<string>();
                int index = 0;
    
    
                IStringScanner scanner = Scanner.Create(value);
               
                while (!scanner.Finished)
                {
                    var quoting = ParameterQuoted(index);
                    switch (quoting)
                    {
                        case QuotingRule.OptionallyQuoted:
                            scanner.Expect(MatchFunctions.OptionallyQuoted(","));
                            break;
                        case QuotingRule.AlwaysQuoted:
                            scanner.Expect(MatchFunctions.Quoted());
                            break;
                        case QuotingRule.NeverQuoted:
                            scanner.Seek(',', true);
                            break;
                        default:
                            throw new NotImplementedException("Unimplemented quoting rule");
                    }
    
                    parms.Add(scanner.Match);
                    if (!scanner.Finished)
                    {
                        scanner.Next();
                        index++;
                    }
                    
                }
                return parms.ToArray();
            }
    
            /// <summary>
            /// Parse single argument passed to a pseudoselector
            /// </summary>
            ///
            /// <exception cref="ArgumentException">
            /// Thrown when one or more arguments have unsupported or illegal values.
            /// </exception>
            /// <exception cref="NotImplementedException">
            /// Thrown when the requested operation is unimplemented.
            /// </exception>
            ///
            /// <param name="value">
            /// The arguments.
            /// </param>
            ///
            /// <returns>
            /// The parsed string
            /// </returns>
    
            protected string ParseSingleArg(string value)
            {
                IStringScanner scanner = Scanner.Create(value);
    
                var quoting = ParameterQuoted(0);
                switch (quoting)
                {
                    case QuotingRule.OptionallyQuoted:
                        scanner.Expect(MatchFunctions.OptionallyQuoted());
                        if (!scanner.Finished)
                        {
                            throw new ArgumentException(InvalidArgumentsError());
                        }
                        return scanner.Match;
                    case QuotingRule.AlwaysQuoted:
    
                        scanner.Expect(MatchFunctions.Quoted());
                        if (!scanner.Finished)
                        {
                            throw new ArgumentException(InvalidArgumentsError());
                        }
                        return scanner.Match;
                    case QuotingRule.NeverQuoted:
                        return value;
                    default:
                        throw new NotImplementedException("Unimplemented quoting rule");
                }
            
            }
    
            /// <summary>
            /// Validates a parameter array against the expected number of parameters.
            /// </summary>
            ///
            /// <exception cref="ArgumentException">
            /// Thrown when the wrong number of parameters is passed.
            /// </exception>
            ///
            /// <param name="parameters">
            /// Criteria (or parameter) data passed with the pseudoselector.
            /// </param>
    
            protected virtual void ValidateParameters(string[] parameters) {
    
                if (parameters == null)
                {
                     if (MinimumParameterCount != 0) {
                         throw new ArgumentException(ParameterCountMismatchError());
                     } else {
                         return;
                     }
                }
    
                if ((parameters.Length < MinimumParameterCount ||
                        (MaximumParameterCount >= 0 &&
                            (parameters.Length > MaximumParameterCount))))
                {
                    throw new ArgumentException(ParameterCountMismatchError());
                }
    
            }
    
            /// <summary>
            /// Gets the string for a parameter count mismatch error.
            /// </summary>
            ///
            /// <returns>
            /// A string to be used as an exception message.
            /// </returns>
    
            protected string ParameterCountMismatchError()
            {
                if (MinimumParameterCount == MaximumParameterCount )
                {
                    if (MinimumParameterCount == 0)
                    {
                        return String.Format("The :{0} pseudoselector cannot have arguments.",
                            Name);
                    }
                    else
                    {
                        return String.Format("The :{0} pseudoselector must have exactly {1} arguments.",
                         Name,
                         MinimumParameterCount);
                    }
                } else if (MaximumParameterCount >= 0)
                {
                    return String.Format("The :{0} pseudoselector must have between {1} and {2} arguments.",
                        Name,
                        MinimumParameterCount,
                        MaximumParameterCount);
                }
                else
                {
                    return String.Format("The :{0} pseudoselector must have between {1} and {2} arguments.",
                         Name,
                         MinimumParameterCount,
                         MaximumParameterCount);
                }
            }
    
            /// <summary>
            /// Get a string for an error when there are invalid arguments
            /// </summary>
            ///
            /// <returns>
            /// A string to be used as an exception message.
            /// </returns>
    
            protected string InvalidArgumentsError()
            {
                return String.Format("The :{0} pseudoselector has some invalid arguments.",
                            Name);
            }
    
            #endregion
    
  • PseudoSelectorChild Abstract Class

        public abstract class PseudoSelectorChild: 
            PseudoSelector, IPseudoSelectorChild
        {
            /// <summary>
            /// Test whether an element matches this selector.
            /// </summary>
            ///
            /// <param name="element">
            /// The element to test.
            /// </param>
            ///
            /// <returns>
            /// true if it matches, false if not.
            /// </returns>
    
            public abstract bool Matches(IDomObject element);
    
            /// <summary>
            /// Basic implementation of ChildMatches, runs the Matches method 
            /// against each child. This should be overridden with something 
            /// more efficient if possible. For example, selectors that inspect
            /// the element's index could get their results more easily by 
            /// picking the correct results from the list of children rather 
            ///  than testing each one.
            /// 
            /// Also note that the default iterator for ChildMatches only 
            /// passes element (e.g. non-text node) children. If you wanted 
            /// to design a filter that worked on other node types, you should
            /// override this to access all children instead of just the elements.
            /// </summary>
            ///
            /// <param name="element">
            /// The parent element.
            /// </param>
            ///
            /// <returns>
            /// A sequence of children that match.
            /// </returns>
    
            public virtual IEnumerable<IDomObject> ChildMatches(IDomContainer element)
            {
                return element.ChildElements.Where(item => Matches(item));
            }
        }
    
    
  • PseudoSelectorFilter Abstract Class

        public abstract class PseudoSelectorFilter: 
            PseudoSelector, IPseudoSelectorFilter
        {
            /// <summary>
            /// Test whether an element matches this selector.
            /// </summary>
            ///
            /// <param name="element">
            /// The element to test.
            /// </param>
            ///
            /// <returns>
            /// true if it matches, false if not.
            /// </returns>
    
            public abstract bool Matches(IDomObject element);
    
            /// <summary>
            /// Basic implementation of Filter: runs the Matches method 
            /// against each element of the selection. Same caveats as above.
            /// </summary>
            ///
            /// <param name="elements">
            /// The sequence of elements to filter.
            /// </param>
            ///
            /// <returns>
            /// A sequence of the elements that match.
            /// </returns>
    
            public virtual IEnumerable<IDomObject> Filter(IEnumerable<IDomObject> elements)
            {
                return elements.Where(item => Matches(item));
            }
        }
    

If you implement one of the abstract classes, you get most of the functionality pre-rolled:

  • Name is the un-camel-cased name of the class itself, e.g. class MySpecialSelector would become a selector :my-special-selector
  • MinimumParameterCount and MaximumParameterCount are 0, meaning no parenthesized parameters.
  • Arguments is parsed into a protected property string[] Parameters (using comma as a separator) using the min/max values as a guide. Additionally, you can override QuotingRule ParameterQuoted(int index) and return a value to tell the class how to parse each parameter. The index refers to the zero-based position of the parameter, and QuotingRule is an enum that indicates how quoting should be handled for the parameter at that position: NeverQuoted, AlwaysQuoted or OptionallyQuoted. NeverQuoted means single and double quotes will be treated as regular characters, and AlwaysQuoted means single- or double-quote bounds are required. OptionallyQuoted means that quotes, if found, will be treated as bounding quotes, but are not required.
  • The PseudoSelectorChild class implements ChildMatches by simply passing each element child to the Matches function. If you want to test other types of children (like text nodes) or have a smarter way to choose matching children, then override it.
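
To make that concrete, here is a minimal sketch of a hypothetical child-type selector, :second-child, built on the abstract class. Matches answers the question for a single element, while the ChildMatches override picks the correct child of each parent directly instead of testing every element:

    using System.Collections.Generic;
    using System.Linq;
    using CsQuery;
    using CsQuery.Engine;

    public class SecondChild : PseudoSelectorChild
    {
        // True when the element is the second element child of its parent.
        public override bool Matches(IDomObject element)
        {
            var parent = element.ParentNode as IDomContainer;
            return parent != null &&
                   parent.ChildElements.Skip(1).FirstOrDefault() == element;
        }

        // Faster path: pick the second element child of each parent directly.
        public override IEnumerable<IDomObject> ChildMatches(IDomContainer element)
        {
            var second = element.ChildElements.Skip(1).FirstOrDefault();
            if (second != null)
            {
                yield return second;
            }
        }
    }

Because Name is derived from the class name, this filter would be used as doc["li:second-child"] without any further configuration.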

Adding Your Selector to CsQuery

Here's the cool part. To add your selector to CsQuery, you don't need to do anything. If you include it in a namespace called CsQuery.Extensions, it will automatically be detected. This works as long as the extension can be found in the assembly that first invokes a selector when the application starts. If for some reason this might not be the case, you can force CsQuery to register the extensions explicitly by calling the following from the assembly in which they're found:

    CsQuery.Config.PseudoClassFilters.Register();

You can also pass an Assembly object to that method. Finally, you can register a filter type explicitly:

    CsQuery.Config.PseudoClassFilters.Register("my-special-selector", typeof(MySpecialSelector));

The Name property isn't used when you register an extension this way.

Example

Here's a port of the :regex selector mentioned above. This can also be found in the test suite under CSharp\Selectors\RegexExtension.cs.

  • Regular Expression Filter Code

        using System;
        using System.Text.RegularExpressions;
        using CsQuery;
        using CsQuery.Engine;
        using CsQuery.ExtensionMethods;

        // Alias the framework Regex class, since this filter class is itself named Regex.
        using SysRegex = System.Text.RegularExpressions.Regex;
    
        class Regex : PseudoSelectorFilter
        {
            private enum Modes
            {
                Data = 1,
                Css = 2,
                Attr = 3
            }
    
            private string Property;
            private Modes Mode;
            private SysRegex Expression;
    
            public override bool Matches(IDomObject element)
            {
                switch (Mode)
                {
                    case Modes.Attr:
                        return Expression.IsMatch(element[Property] ?? "");
                    case Modes.Css:
                        return Expression.IsMatch(element.Style[Property] ?? "");
                    case Modes.Data:
                        return Expression.IsMatch(element.Cq().DataRaw(Property) ?? "");
                    default:
                        throw new NotImplementedException();
                }
            }
    
            private void Configure()
            {
                var validLabels = new SysRegex("^(data|css):");
    
                if (validLabels.IsMatch(Parameters[0]))
                {
                    string[] subParm = Parameters[0].Split(':');
                    string methodName = subParm[0];
    
                    if (methodName == "data")
                    {
                        Mode = Modes.Data;
                    }
                    else if (methodName == "css")
                    {
                        Mode = Modes.Css;
                    }
                    else
                    {
                        throw new ArgumentException("Unknown mode for regex pseudoselector.");
                    }
                    Property = subParm[1];
                }
                else
                {
                    Mode = Modes.Attr;
                    Property = Parameters[0];
                }
    
                // The expression trims whitespace the same way as the original
                // Trim() would work just as well but left this way to demonstrate
                // the CsQuery "RegexReplace" extension method
    
                Expression = new SysRegex(Parameters[1].RegexReplace(@"^\s+|\s+$",""),
                    RegexOptions.IgnoreCase | RegexOptions.Multiline);
            }
    
    
            // We override "Arguments" to do some setup when this selector
            // is first created, rather than parse the arguments on each 
            // iteration as in the Javascript version. This technique should 
            // be used universally to do any argument setup. Selectors with no
            // arguments by definition should have no instance-specific
            // configuration to do, so there would be no point in overriding 
            // this for that kind of filter.
    
            public override string Arguments
            {
                get 
                {
                    return base.Arguments;
                }
                set
                {
                    base.Arguments = value;
                    Configure();
                }
            }
    
            // Allow either parameter to be optionally quoted since they're both
            // strings: return OptionallyQuoted regardless of index.

            protected override QuotingRule ParameterQuoted(int index)
            {
                return QuotingRule.OptionallyQuoted;
            }
    
            public override int MaximumParameterCount
            {
                get { return 2; }
            }
            public override int MinimumParameterCount
            {
                get { return 2; }
            }
    
            public override string Name
            {
                get { return "regex"; }
            }
        }
    
    

This is actually a relatively complicated pseudo-selector. To see some simpler examples, just go look at the source code for the CsQuery CSS selector engine. Most of the native selectors have been implemented using this API. The exceptions are pseudoselectors that match only on indexed characteristics, e.g. all the tag and type selectors such as :input and :checkbox. These could have been set up the same way, but they wouldn't be able to take advantage of the index if they were implemented as filters.
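
For instance, a trivial (made-up) filter that matches elements whose inline style sets red text needs nothing more than a Matches implementation; everything else, including the selector name :red-text derived from the class name, comes from the abstract class:

    public class RedText : PseudoSelectorFilter
    {
        public override bool Matches(IDomObject element)
        {
            // Style is the element's inline style collection, used the same
            // way as in the :regex example above.
            return element.Style["color"] == "red";
        }
    }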

Speaking Of Which... Selector Performance

Many of the same rules about selector performance apply here as they do in jQuery. Don't do this:

   var sel = doc[":some-filter"].Filter("div");

Do this:

   var sel = doc["div:some-filter"];

Obviously that's a pretty silly example - most people wouldn't go out of their way to do the first. But generally speaking, you should order your selectors this way:

  • ID, tag, and class selectors first;
  • attribute selectors next;
  • filters last

Unlike jQuery, it doesn't matter whether a filter or selector is "native" to CSS or not - everything is native in CsQuery. What matters is whether it's indexed. All attribute names (but not values), node names (tags), classes and ID values are indexed. It doesn't matter if you combine selectors -- the index can still be used as long as you're selecting on one of those things. But you should try to organize your selectors to choose the most specific indexed criteria first.

It's very fast for CsQuery to pull records from the index. So if you are targeting an ID, which is unique, always use that first. Classes are probably the next best, followed by tag names, and attributes last. Nodes with a certain attribute will be identified from the index just as fast as anything else, but the engine still has to check the value of each node that has that attribute against your selection criteria.
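
A quick sketch of that ordering advice: both lines below return the same elements, but the first narrows the candidate set using the indexed tag and class before the unindexed attribute-value test runs, while the second pulls every element carrying that attribute first and filters afterwards.

    var better = doc["td.price[data-currency='USD']"];
    var worse  = doc["[data-currency='USD']"].Filter("td.price");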


CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Wednesday, June 27, 2012

CsQuery Performance vs. Html Agility Pack and Fizzler

I put together some performance tests to compare CsQuery to the only practical alternative that I know of (Fizzler, an HtmlAgilityPack extension). I tested against three different documents:

  • The sizzle test document (about 11 k)
  • The wikipedia entry for "cheese" (about 170 k)
  • The single-page HTML 5 spec (about 6 megabytes)

The overall results are:

  • HAP is faster at loading the string of HTML into an object model. This makes sense, since I don't think Fizzler builds an index (or perhaps it builds only a relatively simple one). CsQuery takes anywhere from 1.1 to 2.6x longer to load the document. More on this below.
  • CsQuery is faster for almost everything else. Sometimes by factors of 10,000 or more. The one exception is the "*" selector, where sometimes Fizzler is faster. For all tests, the results are completely enumerated; this case just results in every node in the tree being enumerated. So this doesn't test the selection engine so much as the data structure.
  • CsQuery did a better job at returning the same results as a browser. Each of the selectors here was verified against the same document in Chrome using jQuery 1.7.2, and the numbers match those returned by CsQuery. This is probably because HtmlAgilityPack handles optional (missing) tags differently. Additionally, nth-child is not implemented completely in Fizzler - it only supports simple values (not formulae).

The most dramatic results are when running a selector for a single ID or a nonexistent ID in a large document. CsQuery returns the result (an empty set) over 100,000 times faster than Fizzler. This is almost certainly because Fizzler doesn't index IDs; other selectors are much faster in Fizzler than this one (though still substantially slower than in CsQuery).

Size Matters

In the very small document (the 11k sizzle test document) CsQuery still beats Fizzler, but by much less. The ID selector is still substantially faster, about 15x. For more complex selectors, the margin ranges from just over 1x to about 3x.

On the other hand, in very large documents, the edge that Fizzler has in loading the documents seems to mostly disappear. CsQuery is only about 10% slower at loading the 6 megabyte "large" document. This could be an opportunity for optimizing CsQuery - this seems to indicate that overhead just in creating a single document is dragging performance down. Or, it could be indicative of the makeup of the respective test documents. Maybe CsQuery does better with more elements, and Fizzler with more text - or vice versa.

You can see a detailed comparison of all the tests so far here, in a Google Doc:

"FasterRatio" is how much faster the winner was than the loser. Yellow ones are CsQuery; red ones are Fizzler.

Red in the "Same" column means the two engines returned different results.

Try It Out

This output can be created directly from the CsQuery test project under "Performance."


CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Tuesday, June 26, 2012

CsQuery 1.1.2 Released

CsQuery 1.1.2 has been released. You can get it from NuGet or from the source repository on GitHub.

New features

This release includes significant revisions to the HTML parser to enhance compatibility with HTML5 parsing rules for optional opening and closing tags.

When optional closing tags are omitted, such as </p>, CsQuery's HTML parser will use the HTML5 spec rules to determine when to insert a closing tag. When opening tags for required elements such as head and tbody are omitted, the parser will generate the missing tags when parsing in document mode. This means you can expect a very high degree of compatibility between the HTML (and selections) generated by CsQuery, and the DOM rendered by web browsers, when valid HTML is passed.

The HTML5 spec also includes a set of rules for handling invalid markup. While the CsQuery parser usually makes pretty good decisions about how to handle bad HTML, and should be able to parse just about anything, it doesn't yet comply with the "bad markup" part of the spec - just the "optional tags" part. Over time, though, I intend to continue improving the parser to comply with other parts of the spec as much as possible.

API Change

Because the HTML parser will generate tags now, it needs to understand context. If you're creating a fragment that's just supposed to be a building block, you obviously don't want it adding html and body tags around your markup.

There are now three static methods for parsing HTML:

  • CQ.Create(..)
    Create a content block

    This method is meant to be used for complete HTML blocks that are not self-contained documents. Examples of this are a piece of content retrieved from a CMS, or a template. It should be used for anything that is a complete block but is intended to be embedded in another document. Using this method, missing tags will be handled according to the HTML5 spec EXCEPT for adding the optional html and body tags. Additionally, any text found at the root of the markup will be wrapped in span tags, making it safe to insert into nodes that cannot have text directly as children.

  • CQ.CreateDocument(..)
    Create a document.

    This method creates a complete HTML document. If the html, body or head tags are missing, they will be created. Stranded text nodes (e.g. outside of body) will be moved inside the body. If you're parsing HTML from the web or from a file that's supposed to represent a complete HTML document, use this.

  • CQ.CreateFragment(..)
    Create a fragment.

    This method interprets the content as a true fragment that you can use for any purpose. No new elements will be created. The rules for optional closing tags are still honored -- to do otherwise would just result in the default handling for any broken/unclosed tag being used instead. But no optional tags like tbody will be generated even if they are expected to be found. This method is the default handling for creating HTML from a selector, e.g.
    var html = dom["<div></div>"];
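
To make the differences concrete, here's a rough sketch of how the three methods treat the same sloppy markup (the comments describe the intended behavior; exact output may differ):

    var block = CQ.Create("<p>one<p>two");          // closes the first <p>; no html/body added
    var doc   = CQ.CreateDocument("<p>one<p>two");  // generates the html, head and body elements
    var frag  = CQ.CreateFragment("<td>cell</td>"); // leaves the stray cell alone; no table/tbody generated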

Other Enhancements

  • The jQuery :input pseudoclass was added. It had been inadvertently omitted from prior versions.
  • All selectors can include escaped characters now
  • HTML parser permits all valid characters in class and attribute names. Previously, the : and . characters were stop characters.
  • The CQ object's property indexer overloads now align with the Select method overloads.
  • Migrated all of the tests from Sizzle. (A few of the bugs fixed in this release were found as a result of implementing the Sizzle test suite).

Bug Fixes

  • Issue #12: CSS class names being output in lowercase
  • Issue #11: :hidden selector not selecting input[type=hidden]
  • Issue #8: allow leading + and - signs in nth-child type equations
  • Corrected a problem with some last-child selectors (found during Sizzle unit test migration, no bug report)

This release has also had some performance optimizations; nth-child type selectors in particular should be an order of magnitude faster as a result of caching the results of each calculation.


CsQuery is a complete CSS selector engine and jQuery port for .NET4 and C#. It's on NuGet as CsQuery. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Tuesday, June 19, 2012

ImageMapster 1.2.5 released

After 9 months I've finally released an update to ImageMapster. Download the latest release distribution or go to GitHub to see the source.

Since 1.2.4 much has changed. If you've been following along the development, a lot of this may be old news, but this covers most of what's changed since the last official release.

New Features

  • clickNavigate allows binding a URL to an area, just like a regular HTML imagemap! Seriously, this was a common request - sometimes people just wanted the map to highlight areas on mouseover, but otherwise act the same. You could always do this by capturing a click event and then setting window.location, but this method streamlines it.

    It offers a few conveniences, e.g. when an area has only href='#' then it will not navigate even when this option is enabled, and if any valid href target is found on any area in a group, then it will be used no matter which area in the group is clicked.

  • A new keys option allows you to obtain a list of keys associated with an area or area group. That is, you can assign more than one key to an area, e.g. this area:
    <area href="#" data-key="area1, group1" coords="...">
    has two keys, "area1" and "group1." This lets you create different, independent groups which you can control separately. The first key on the list is always the primary key, though, and determines whether something is considered "selected". So sometimes, given a key, you want to find out other keys associated with it, so you can select or deselect associated areas in response to an action. This option gives you easy access to data on the relationships between area keys.

  • mouseoutDelay option lets you specify a time in milliseconds that a highlighted area will remain highlighted after the mouse leaves. (If another area is highlighted before this time elapses, the old one will be removed immediately.) This is useful for sparse maps, e.g. maps where only small areas can be highlighted and most of the image is not part of any area. Because a user's pointer may only be over an area briefly, the effect could appear flickery or jerky. This option lets you keep an area highlighted for some time after the pointer leaves to avoid the problem.

  • Rendering options can be passed on-the-fly with set allowing you to have complete control over the appearance of every area without having to define area options up front.

Bug Fixes, Improvements

  • Many compatibility and stability improvements to resolve conflicts with browser plugins (AdBlock in particular) and solve some browser issues. Fading effects now work consistently in IE 6-8 too.
  • More robust binding to handle situations that caused problems such as the imagemap being initially hidden or extremely slow-loading images
  • Tooltips can be positioned outside the boundaries of the image. A few bugs related to tooltip positioning were fixed.
  • rebind and snapshot have been cleaned up a lot, allowing you to chain events to create complex initial effects. For example, this code would bind a map using a set of options defined in initial_opts, then highlight "CA" using the "fill" and "fillColor" options shown, then finally take a snapshot and rebind with a different set of options basic_opts. All the effects that were rendered before the snapshot will now be part of a static backdrop. Fiddle with it.
        $('img').mapster(initial_opts)
            .mapster('set',true,'CA', {
                fill: true,
                fillColor: '00ff00'
            })
            .mapster('snapshot')
            .mapster('rebind',basic_opts);
    
  • resize has been improved to increase smoothness and performance. A bug that caused its callback to be fired at the wrong time has been fixed.

What's next?

First, I'm not going to wait 9 months to make a new release next time. The delay was a result of being dissatisfied with the state of JavaScript testing frameworks for testing complex UI tools; I never felt comfortable calling this a "release" while the tests were a mess. That was probably a mistake, since thousands of people downloaded the old version even though I knew it had many bugs that have since been fixed. I won't make that mistake again.

The next major release will include a new API as an option. That is, instead of calling mapster with mapster('method',...) you will be able to obtain an actual mapster object and call its methods directly, e.g.


var mapster = $('img').mapster(initial_opts);
    mapster.set('CA', {fill: true, fillColor: '00ff00' })
        .snapshot()
        .rebind(basic_opts);

While sticking to the jQuery model makes sense to a point, this tool has become sufficiently built-out that it's a hindrance when doing anything beyond the basics. The old methods will still be perfectly valid.

There will be panning and zooming. I started coding some more sophisticated zoom effects that work with "resize" to let you easily zoom directly to an area. I stopped when I realized feature creep was preventing me from getting a new release finished and fixing bugs. Now it's time to get back to that.

Better tooltips. Lots of people ask about controlling the position and functionality of tooltips. I plan to add some better integrated support for tooltip manipulation.

Feature selection. I broke the source code into modules some time ago because it was becoming unwieldy as a single file. My secondary goal in doing this was to allow creating custom builds using only the features you need; for example, if you don't care about tooltips, why include that extra code? This is really more of a web site feature than anything else, but it is (almost) possible to exclude some modules now.

What else? Let me know if you have ideas, or want to contribute!

Wednesday, June 13, 2012

CsQuery 1.1 Released, and available on NuGet

CsQuery 1.1 has been released. This is a major milestone; the library now implements every CSS2 and CSS3 selector.

Additionally, CsQuery is now available on NuGet:

    PM> Install-Package CsQuery

There are two important API changes from prior versions.

  • The IDomElement.NodeName property now returns its result in uppercase. Formerly, results were returned in lowercase, so any code that compares the node name against a lowercase string will break, e.g.
        CQ results = dom["div, span"];
        foreach (IDomObject item in results) {
    //        if (item.NodeName=="div") {
            if (item.NodeName=="DIV") {
                ...
            }
        }
    

    I realize this can easily break code in ways that the compiler cannot detect, and I apologize for that; but consistency with the browser DOM is important, and this change was a long time coming.

  • The CsQuery.Server object has been removed. Methods for loading a DOM from an http server have been replaced with static methods on the CQ object:
        // synchronous
        var doc = CQ.CreateFromUrl("http://www.jquery.com");
      
        // asynchronous with delegates to call upon completion
        CQ.CreateFromUrlAsync("http://www.jquery.com", responseSuccess => {
            Dom = responseSuccess.Dom;        
        }, responseFail => {
            // ..
        });
    
        // asynchronous using IPromise (similar to C#5 Task)
        var promise = CQ.CreateFromUrlAsync("http://www.jquery.com");
        var promise2 = CQ.CreateFromUrlAsync("http://www.cnn.com");
    
        promise.Then(successDelegate);
        promise2.Then(successDelegate,failDelegate);
    
        When.All(promise,promise2).Then(allFinishedDelegate);
    
    See Creating a new DOM and Promises in the readme for more details.

New Features in 1.1

Implemented all missing CSS pseudoclass selectors:

    :nth-last-of-type(N)              :nth-last-child(N)    
    :nth-of-type(N)                   :only-child
    :only-of-type                     :empty
    :last-of-type                     :first-of-type
Implemented all missing jQuery pseudoclass selectors (a quick usage sketch follows below):
    :parent                           :hidden
    :header
  • Added IDomObject.Name property
  • Added IDomObject.Type property
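
Here is a minimal sketch of a few of the new selectors and properties in use; the markup and variable names are purely illustrative:

    var dom = CQ.Create("<ul><li>one</li><li>two</li><li></li></ul><input name='q' type='text' />");

    // CSS3 structural pseudoclasses
    var second = dom["li:nth-of-type(2)"];     // the "two" item
    var empties = dom["li:empty"];             // the last (empty) li

    // jQuery-style pseudoclasses
    var parents = dom["ul:parent"];            // elements that have children

    // the new Name and Type properties mirror the corresponding attributes
    string name = dom["input"][0].Name;        // "q"
    string type = dom["input"][0].Type;        // "text"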

Bug Fixes

  • Don't consider html node a child when targeted by child-targeting selectors (consistent with browser behavior)
  • Fix checkbox lists in Forms.RestorePost
  • Fixed pseudoselectors following a descendant combinator returning only direct-descendant matches (e.g., div :empty)
  • Issue #5 - Remove enforcement of unique id attribute when parsing HTML

CsQuery is a complete port of jQuery written in C# for .NET4. For documentation and more information please see the GitHub repository and posts about CsQuery on this blog.

Thursday, June 7, 2012

Async web gets and Promises in CsQuery

More recent versions of jQuery introduced a "deferred" object for managing callbacks using a concept called Promises. Though this is less relevant for CsQuery because your work won't be interactive for the most part, there is one important situation where you will have to manage asynchronous events: loading data from a web server.

Making a request to a web server can take a substantial amount of time, and if you are using CsQuery for a real-time application, you probably won't want to make your users wait for the request to finish.

For example, I use CsQuery to provide current status information in the "What's New" section of the ImageMapster web site. I do this by scraping GitHub and parsing out the relevant information. But I certainly do not want to make anyone wait while the server makes a remote web request to GitHub (which could be slow or inaccessible). Rather, the code keeps track of the last time it updated its information using a static variable. If the data has become "stale", it initiates a new async request, and when that request completes, it updates the cached data.

So, the http request that actually triggered the update will be shown the old information, but there will be no lag. Any requests coming in after the request to GitHub has finished will of course use the new information. The code looks pretty much like this:

    private static DateTime LastUpdate;
    
    if (LastUpdate.AddHours(4) < DateTime.Now) {

        // stale - start the update process. The actual code makes three 
        // independent requests to obtain commit & version info

        var url = "https://github.com/jamietre/ImageMapster/commits/master";
        CQ.CreateFromUrlAsync(url)
           .Then(response => {
               LastUpdate = DateTime.Now;
               var gitHubDOM = response.Dom;
               ... 
               // use CsQuery to extract needed info from the response
           });
    }

    ...

    // render the page using the current data - code flow is never blocked even if an update
    // was requested

Though C# 5 includes language features that greatly improve asynchronous handling, such as `await`, I didn't want to "wait", and the promise API used so often in Javascript is actually extraordinarily elegant. Hence I decided to make a basic C# implementation to support this pattern.

The `CreateFromUrlAsync` method can return an `IPromise` object. The basic promise interface (from CommonJS Promises/A) has only one method:

    then(success,failure,progress)

The basic use in JS is this:

    someAsyncAction().then(successDelegate,failureDelegate);

When the action is completed, "success" is called with an optional parameter from the caller; if it fails, "failure" is called.

I decided to skip progress for now; handling the two callbacks in C# requires a bit of overloading because function delegates can have different signatures. The CsQuery implementation can accept any delegate that has zero or one parameters and returns either void or a value. A promise can also be generically typed, with the generic type identifying the type of the parameter that is passed to the callback functions. So the signature for `CreateFromUrlAsync` is this:

    IPromise<ICsqWebResponse> CreateFromUrlAsync(string url, ServerConfig options = null)

This makes it incredibly simple to write code with success & failure handlers inline. By strongly typing the returned promise, you don't have to cast the delegates, as in the original example: the `response` parameter is implicitly typed as `ICsqWebResponse`. If I wanted to add a fail handler, I could do this:

    CQ.CreateFromUrlAsync(url)
        .Then(responseSuccess => {
            LastUpdate = DateTime.Now;
             ...
        }, responseFail => {
             // do something
        });

CsQuery provides one other useful promise-related function, `When.All`. This lets you create a new promise that resolves when every one of a set of promises has resolved. This is especially useful here, since it means you can initiate several independent web requests and have a promise that resolves only when all of them are complete. It works like this:

    var promise1 = CQ.CreateFromUrlAsync(url);
    var promise2 = CQ.CreateFromUrlAsync(url);

    CsQuery.When.All(promise1,promise2).Then(successDelegate, failDelegate);

You can also give it a timeout which will cause the promise to reject if it has not resolved by that time. This is valuable for ensuring that you get a resolution no matter what happens in the client promises:

    // Automatically reject after 5 seconds

    CsQuery.When.All(5000,promise1,promise2)
        .Then(successDelegate, failDelegate);

`When` is a static object that is used to create instances of promise-related functions. You can also use it to create your own deferred entities:

    var deferred = CsQuery.When.Deferred();

    // a "deferred" object implements IPromise, and also has methods to resolve or reject

    deferred.Then(successDelegate, failDelegate);
    deferred.Resolve();   // causes successDelegate to run

What's interesting about promises, too, is that they can be resolved *before* the appropriate delegates have been bound and everything still works:

    var deferred = CsQuery.When.Deferred();

    deferred.Resolve();
    deferred.Then(successDelegate, failDelegate);   // successDelegate runs immediately

I may completely revisit this once VS2012 is out; the `await` keyword cleans things up a little bit, and the `Task.WhenAll` feature does the same thing as `When.All` here. By the way, the basic API and operation for "when" was 100% inspired by Brian Cavalier's excellent when.js project, which I use extensively in Javascript.
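
For comparison, here is a rough sketch of what that might look like in C# 5, assuming a hypothetical Task-returning variant of CreateFromUrlAsync existed. CsQuery does not provide one; the method name below is made up purely to illustrate the parallel:

    // hypothetical: CreateFromUrlTaskAsync stands in for any method returning
    // Task<ICsqWebResponse>; it is not part of CsQuery
    async Task RefreshAsync(string url1, string url2)
    {
        var t1 = CreateFromUrlTaskAsync(url1);
        var t2 = CreateFromUrlTaskAsync(url2);

        // Task.WhenAll plays the same role as When.All
        await Task.WhenAll(t1, t2);

        var dom1 = t1.Result.Dom;   // both responses are available here
        var dom2 = t2.Result.Dom;
    }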

Monday, June 4, 2012

Using CsQuery with MVC views

Update 7/17/2012: The source repository now includes a complete MVC example project that implements a custom view engine using CsQuery, allowing you to simply add methods to a controller to have access to the page's markup before rendering as a CQ object, e.g.

    public class AboutController : CsQueryController
    {

        public ActionResult Index()
        {
           
            return View();
            
        }

        // runs for the "Index" action after the ActionResult is returned,
        // providing access to the final HTML before it's rendered

        public void Cq_Index()
        { 
            // add the "highlight" class to all anchors

            Doc["a"].AddClass("highlight");
        }
    }
Take a look at the MVC example for more information. The contents of this blog post are accurate but the example provides much more detail as well as a complete implementation, since it's not completely trivial to intercept the final HTML for a page in an MVC application.

Original Post

I've been neglecting CsQuery, the C# jQuery port lately, and I feel bad about that. But I haven't forgotten it. Quite the opposite, I'm gearing up to create the first formal release, get it onto NuGet, and publish a web site with interactive demos and documentation. It's going to take a little while to move this all forward but it's in progress.

There's been a spark of outside interest in the project in the last month or so, which has inspired me to get moving again on some of this stuff. Things always slow down at work in the summer so the timing is good and I hope to have this thing in a more consumer-friendly format soon.

In the meantime, here's a nugget from Rick Strahl about rendering MVC views as strings. If you're using CsQuery with ASP.NET MVC, this is a technique you will almost certainly use to feed your MVC markup into CsQuery for further manhandling.

I described a similar technique in this question. Rick's post encapsulates this cleanly in a class. To get from there to a CsQuery object is a piece of cake:

    string message=ViewRenderer.RenderView("~/views/template/ContactSellerEmail.cshtml",
        model,ControllerContext);

    // create a CsQuery object from the HTML string
    CQ messageDom = CQ.Create(message);

    // do stuff...
    messageDom["#content-placeholder"].ReplaceWith(...);

    // render it back to a string of HTML
    message = messageDom.Render();

Tuesday, May 15, 2012

On Javascript Style

My two cents on the ongoing debate over punctuation in Javascript.

This two year old gist from Isaac Schlueter, author of NPM, has been a focal point for this debate. He describes a rationale for putting commas first, beginning with these simple words:

Note how the errors pop out in the comma-first style.

He's absolutely right. And that, right there, is why I think it's a bad style choice.

I want punctuation to disappear when it's not doing something important. Commas that separate array elements or object properties that are already on separate lines are not doing anything important. So putting commas first may save me from a few comma-related errors every day. (Which, of course, JSHINT will catch promptly, along with all the other errors I make every day). But I will spend ten or twelve hours a day, every day, looking at commas instead of variable names or object definitions. I will have to visually parse the punctuation for every single line I read, instead of not thinking about it.

I agree that putting commas first makes errors more apparent. I disagree that an entire aesthetic should be defined by this desire.

There are some arguments made over the course of this two-year-old (and still going strong) gist about the sorts of errors that automated syntax-checking tools will not catch, which are also used to justify the style. For example, this is valid:

var x = [ ["asdf", "foo", "bar"],
          ["baz", "blerg", "boof"]
          ["quux", "antimatter"] ];

The parser will interpret the third row as a property indexer applied to the second array rather than a new element, which is not what was intended.

All I can say is, how much time do you spend hardcoding two-dimensional arrays? On the other hand, how much time do you spend reading object constructs and var statements? The latter are the foundation for pretty much every bit of code you will write in Javascript. The former is an anomaly.

Yes. You can come up with situations where the error introduced by a missing comma would resemble valid code and not be caught by JSHINT. A comma-first syntax would probably have helped you avoid that problem.

I am not sure such a situation has ever come up in practice, though. This is OCD. This is carrying a bottle of antiseptic and spraying every surface before you touch it, because it (maybe) will prevent you from getting a cold once in a while. The cure is much worse than the disease!

Avoiding syntax errors is important. But is it more important than making code that can be read and understood with the least amount of effort? I don't think so. If comma-first adds even 1% to the amount of mental energy I need to read and understand some code, it's not worth it.

One final note. I'm not a purist at all. Use multiple var statements (but only before your first line of functional code, of course!) Use side effects. Don't use semicolons. I think the goal should be to write code that is the most easily understood, and a lot of "non-purist" techniques can be used to create terse code that's more readable than its longform version. I feel like a lot of the rhetoric on this topic has been about a debate over purism. It shouldn't be.

I have a tendency to use semicolons rigorously, but that's because I program in C# half the time and it's hard to get my brain to move easily back and forth between the two mindsets. But I completely understand the rationale often cited for using ASI: semicolons are visual clutter. Yup, they sure are! And I think that same rationale should apply doubly so for putting commas at the beginning of the line instead of the end. At least when punctuation is at the end of a line, our "left to right" brains can easily dismiss it. When it's at the beginning, you can't avoid it.

Thursday, March 29, 2012

Passing control from a custom HttpHandler to the default handler in asp.net

Using System.Web.Routing and IIS7 you can do some pretty interesting stuff with a web app. It also opens up a lot of possibilities for "modernizing" old webforms applications that are stuck with ugly paths and query strings by overriding the default handlers and parsing out the page path yourself.

One thing I wanted to do was to map certain paths back to an aspx page, but dynamically - e.g. I wanted to be able to parse out a path and, using complex logic, build a new "real" path+querystring that I could just pass to the default handler. This way I can create nice clean API-like paths and map them to an old, ugly query-string based API. The routing part is a piece of cake. (Well, not really, but it's not a mystery). But how do you invoke the default handler from code? There's a baffling lack of info out there, so I figured I'd post a solution for future coders.

After some digging I realized that, actually, the default handler is just the plain old System.Web.UI.Page object. I tried to just create an instance of one and call ProcessRequest like you would any other handler. Nothing at all. No error, no output.

Microsoft is decidedly no help with this either. For Page.ProcessRequest, their documentation actually says:

You should not call this method.
Priceless!

Let us proceed to tread into "you have been warned" territory, though. The problem is that you (apparently) aren't just supposed to create a new Page; you should use the PageHandlerFactory. Unfortunately, MS has inconveniently saddled that thing with an internal constructor, so you can't make one of those, either.

Thanks to Robert's C# musings for the answer to this one, which is the key to solving this problem. You can't directly instantiate a class with an internal constructor, but you can create an instance of any class without calling a constructor at all, using the well-hidden GetUninitializedObject method. Since the constructor on this factory exists only to prevent us from using the factory, and doesn't actually do anything useful, that's not a problem. Once you've got access to the handler factory, the rest falls into place pretty easily. Here's some basic code that maps any path to default.aspx, converting the original path to a query parameter.

public void ProcessRequest(HttpContext context) {

    // the internal constructor doesn't do anything but prevent you from instantiating
    // the factory, so we can skip it.
    
    PageHandlerFactory factory =
        (PageHandlerFactory)System.Runtime.Serialization.FormatterServices
            .GetUninitializedObject(typeof(System.Web.UI.PageHandlerFactory));

    // you may want to use context.Request.PathInfo - in my case I mapped a wildcard
    // to this handler so it's always blank.

     string newTarget = "default.aspx"; 
     string newQueryString = "path=" + context.Request.Path;
     string oldQueryString = context.Request.QueryString.ToString();

     // append the original querystring, if any, after the new "path" parameter
     string queryString = newQueryString + (oldQueryString != "" ?
         "&" + oldQueryString :
         "");

     // the 3rd parameter must be just the path to the file target (no querystring).
     // the 4th parameter should be the physical path to the file, though it also
     //   works fine if you pass an empty string - perhaps that's only to override
     //   the usual presentation based on the path?

     var handler = factory.GetHandler(context, "GET", newTarget,
         context.Request.MapPath(newTarget));

     // Update the context object as it should appear to your page/app, and
     // assign your new handler.

     context.RewritePath(newTarget, "", queryString);
     context.Handler = handler;

     // .. and done

     handler.ProcessRequest(context);
}

The PageHandlerFactory can certainly be created statically so you don't have the overhead of reflection for every request. The actual "Page" handler must be created each time, though, because it's not marked as IsReusable.
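
For example, a minimal sketch of that arrangement might look like this (the class name is illustrative and error handling is omitted):

public class CleanPathHandler : IHttpHandler {

    // pay the reflection cost once, when the type is initialized
    private static readonly PageHandlerFactory factory =
        (PageHandlerFactory)System.Runtime.Serialization.FormatterServices
            .GetUninitializedObject(typeof(System.Web.UI.PageHandlerFactory));

    public bool IsReusable {
        get { return true; }
    }

    public void ProcessRequest(HttpContext context) {

        string newTarget = "default.aspx";

        // the Page handler itself must still be obtained per request,
        // since the handler it returns is not reusable

        var handler = factory.GetHandler(context, "GET", newTarget,
            context.Request.MapPath(newTarget));

        context.RewritePath(newTarget, "", "path=" + context.Request.Path);
        handler.ProcessRequest(context);
    }
}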

Monday, March 12, 2012

SharpLinter now works with inline scripts in HTML files

It will look for embedded scripts and only validate what's inside legal <script type="text/javascript"></script> blocks. Any file that's not called *.js or *.javascript will be treated this way.

Go get Sharplinter 1.0.2. You can also just download the binary distribution.

SharpLinter is a C# command-line tool for validating javascript. It is highly configurable, can produce output in customized formats, and integrates with Visual Studio. It should work with any editor or tool that supports using external tools to process files. The readme includes instructions for use with Visual Studio, Sublime Text 2, and Textpad.

Wednesday, March 7, 2012

Area groups in ImageMapster

ImageMapster has had the capability to form complex area groups for some time, but I never got around to documenting it well. More recently, I added a keys feature that lets you get the keys associated with an area (or another key), and also the ability to pass rendering options directly with the set method. This opens up a lot of possibilities for area manipulation that were technically possible before, but not very easy.

I put together an example showing how area groups can be used, along with these new features, to have a great deal of control over the effects.

Take a look at it on jsFiddle and play around! I'm going to start a library of interactive examples to include on the web site. If you like it, let me know, or put together one that shows your own techniques so I can share it!

Complete documentation is on the project web site.


Intro to the ImageMapster Area Groups example:


This example shows how to use mapKey to create groups that you can use to control sets of areas independently.
  • Each area in the imagemap has a custom attribute data-state. This defines the groups that each area belongs to.
  • The mapKey: 'data-state' option identifies this attribute for the imagemap. The values in the mapKey can be used to select, deselect or highlight areas or groups of areas from code.
  • Areas can belong to more than one group. In this example, New England states belong to three groups: a state code like "ME", the group "new-england", and possibly the group "really-cold".
  • Options can be set for area groups. These options only apply when the group is activated using its group name. Notice if you click a New England state, it's red (like the other states) but when you activate it using "new-england" or "really-cold", it's blue.
  • When you mouse over an area, the first group in the list determines what gets highlighted. In this example, most states are actually defined by more than one area HTML element. The first value of data-state is the state code, ensuring that when you mouse over a state, all the different areas that make it up get highlighted together, even if they aren't connected. New England states and Hawaii are good examples of this (the islands are separate, and the New England states have separate text markers).
  • Areas are separate logical entities. You will notice if you click one of the group links, then highlight a state in New England, it highlights again (in a different color, per the render_select options). This means that area groups are not *directly* a good way to just act as if the user selected each area in the group, but...
  • You can use the keys method to get the primary keys for a group, and the get_options method to get the options, and set them manually. Click "New England As Separate States" below. The "Texas with Custom Options" is a simpler case of setting custom rendering options.

Tuesday, March 6, 2012

SharpLinter now supports Sublime Text 2

Okay, okay. Nothing at all has changed with SharpLinter - github; Sublime Text 2 supports pretty much anything that generates consistent output, given the right config :)

Here's how to add a build system for Sublime Text 2 that uses SharpLinter for Javascript files.
  1. Select Tools -> Build System -> New Build System
  2. Enter the following to create a build config that works against javascript files:
{
    "cmd": ["sharplinter", "-v","-ph","best","*.min.js", "$file"],
    "file_regex": "^(.*?)\\(([0-9]+)\\): ()(.*)$",
    "selector": "source.js"
}

Save it, and you're done. The regex should match SharpLinter's default output and let you use F4 and shift+F4 to navigate any errors within your file.

The "cmd" property for Submlime Text 2 should contain the command followed by any options you want, so this could be as simple as

...
"cmd": ["sharplinter", "$file"],
...

to run SharpLinter with default options against the active file. The options above are just the ones I like to use, which provide verbose output, minify to *.min.js on success, and use the best compression method (usually yui).

Go ahead and set yourself up with SublimeOnSaveBuild and you can have it run automatically every time you save.

Thursday, February 16, 2012

Dragging and Dropping onto HTML Image Maps

A user of my jQuery plugin ImageMapster, while working through a problem, asked if it was possible to set up an imagemap that you could drop things onto. I had never done this before, but it seemed like an interesting problem with lots of potential applications, so I set about trying to solve it. At the bottom of this page is a simple working example using Marvin the Martian, and I'll explain how it works before we get there.

Using jQueryUI's draggable method, it's easy to create things that can be dragged. Things get a little tricky because of z-indexes, though. When you drag something, it needs to have the highest z-index on the page, or it will disappear behind things when you drag it over them. But at the same time, an imagemap will not activate unless it has the highest z-index. A paradox!

Luckily, when using ImageMapster, things are already set up so you can have your dragging element on top and still have an active imagemap. This is the very nature of how it works: when binding an image, ImageMapster makes a copy of the image to use as a backdrop, and then makes your original image invisible through CSS using opacity:0. So all you need to do is make sure the z-index of your drag element is between those two things. A little understanding of ImageMapster's element scheme is all that's needed:

The topmost layer, the HTML image map itself, isn't really a layer in that you don't need to set the z-index of its elements explicitly. However, it does act like a layer in that mouse events over the areas will supersede events over the image itself. But basically, the original image is the topmost element.

Finally, all this stuff is wrapped up in a div. This is useful to know because it means you can use jQuery's siblings method to easily change the z-index of everything that's important except your original image.

The code below is the basic logic you'll need to make something draggable over a live imagemap.

    
    var img = $('#my-image-map'), item = $('#draggable-object');

    img.mapster();
    ...

    // after binding with imageMapster, set the z-index of the image's siblings 
    // to zero (the lowest). The image copy that is created, as well as the canvases 
    // that render the highlight effects, are all siblings of the original image 
    // after binding ImageMapster.
    
    // IMPORTANT: This must be done *after* mapster has finished binding -- see the
    // actual code below for the use of "onConfigured" to do this

    img.siblings().css('z-index',0);

    // set the image itself to something higher. 
    
    img.css('z-index',10);

    // the draggable element should have a z-index inbetween the visible
    // background and effects images (which are all set to 0 now) and the
    // original image+imagemap (now set to 10).

    item.css('z-index',5);


That's almost all you need to do. What happens when you drop the martian somewhere? It now has a z-index that is lower than that of the original image. Even though the imagemap is not visible and the martian is, you won't be able to grab the martian again, because once you drop it, its old z-index takes effect.

To address this, you need a little more sleight of hand. When someone first grabs the draggable element, change its z-index to be a value between the two image layers. When they drop it, though, change it to something that's higher than the original image, so it will be on top and can be picked up again.

In Action!

Here's the functioning example; all the code follows (or just use your browser to look at it). Drag the martian onto Mars to "win". Any other planet will give you a negative response, and nothing happens if you drop him in space. I've also set it up on jsfiddle.net. Enjoy!

Help me get home!

Code:


Internet Explorer notes: This doesn't seem to work in IE < 9 -- in that the imagemap areas are not activated on mouseover while dragging. I'll address this, and present a solution, in my next post.

    var img = $('#planets'), martian=$('#martian');
    
    img.mapster({ 
        mapKey: 'alt',
        fillOpacity: 0.8, 
        fillColor: 'ff0000', 
        stroke: true,
        strokeWeight: 2, 
        strokeColor: '00ff00',
        onConfigured: function() {
            // this will be called only after ImageMapster is done setting up
            img.siblings().css('z-index',0);
            img.css('z-index',10);
        }
    });   

    martian.css('z-index',5)
        .draggable({
            drag: function() {
                $(this).css('z-index', 5)
            }
        });

    img.droppable({
        drop: function(e,ui) {
            // set z-index to the highest, so it can be dragged again
            $(ui.draggable).css('z-index',20);
 
            // returns the mapKey value of the currently highlighted item
            var landing=img.mapster('highlight');

            if (landing=='Mars') {
                alert("Thanks for bringing me home!!!");
                martian.draggable('disable').hide('slow');
            } else if (landing) {
                 alert("I don't live on " +  landing);       
            }
        }
    });