Building a Better Stock Ticker

by jerod.venema 11. March 2010 23:15

(Cross-posted to Comet Daily)

One of the most common use-cases for Comet is the ubiquitous stock ticker demo. There are several reasons why this particular demo is so popular: it’s easy to understand why people want stock quotes in real-time, it’s obvious that changes are occurring (even without any user interaction), and at least for a simple demo, it’s fairly simple to implement.

Although building a demo is quite trivial, building a stock ticker that has useful functionality quickly raises a number of questions, many of which have been directed to me at Frozen Mountain. One question in particular has been directed to me multiple times, and is more of a design question: what’s the best way to implement a stock ticker using Comet?

Note: for the purposes of this article, I’m going to assume we’re working with a Comet server built on the Bayeux protocol, such as WebSync.

The Problem

The “best way to implement a stock ticker” is rather ambiguous, so let’s take a moment and first define our scenario clearly so we know what the problem really entails.

Let’s work with a fairly common scenario. Assume each user has a “portfolio” of stocks, each stock containing between 10 and 50 points of data (such as min price, max price, current value, etc). Each user also has the ability to show or hide certain data points. Since we’re talking about 10-50 data points per stock, and we obviously want to avoid sending up to five times the necessary amount of data, we want to send each user only the exact set of information they require.

At First Glance

If you just take this problem at face value without digging into the details, one approach jumps out pretty quickly. The data set is unique per user, so just send each user their own set of data! This approach would look something like this:

  • Create a unique channel for the user
  • Build a custom “portfolio” object for each user
  • Change the serialization of that object to only include the properties selected by the user
  • For every user, when new data arrives, check to see if the new data matches the list of properties the user cares about, and if so, send it to them

While this approach would (somewhat) work, it has a few fairly notable disadvantages:

  • State must be maintained for each user on the server, increasing memory usage substantially
  • A lot of additional checking needs to be done to determine if the data should be sent
  • The stock data gets tied directly to a specific user

Our first disadvantage, maintaining state, can be resolved with more hardware and persistent storage. Obviously, if we can eliminate the need to maintain state for thousands of users, we reduce the application complexity, remove the need for persistent storage (in case of a soft application reset), and we’re going to need a lot less hardware. Pretty straightforward.

Our second disadvantage, additional data checking, is less important, but still a hassle. When the data is sent, it has to be checked to make sure it’s in the list of data this user cares about. If the sum total of the data is nothing, then the request doesn’t actually need to be sent at all. Not impossible by any stretch of the imagination, but annoying. And software should never be annoying.

Our third disadvantage with this approach, the tight coupling of data to individual users, is a little more interesting. Typically, to get data streaming, the data is published from a separate process - a Windows service, Linux/Unix daemon, whatever. This process contains no information about the web users, and is just managing the data. That means we need to make an adjustment to either the process (to make it aware of the users by using a common storage mechanism, such as a database) or the web application (to make it pre-process the data, such as in WebSync’s “BeforePublish” event, and push the data to the individual users). Neither of these options sounds ideal, and both would still require the web application to be stateful.

So with all these issues, where does that leave us? If we analyze the problems with our “first glance” approach, it becomes obvious that what we’re looking for is a solution that is both 1) stateless and 2) loosely coupled between the data and users.

Now that we’ve agreed on that…let’s locate that solution!

Channel Mania

Let’s re-state our problem a little, now that we’ve gotten a little more in-depth. Basically, we need to have a way for any user to be able to access live changes for any, all, or none of the data points. If we think about this in terms of the Bayeux protocol, what we’re really saying is that each data point needs its own channel.

That means no more per-user channels, which in turn means our application can be stateless, which also implies that we can keep our publishing of data separate from our consuming of the data. Ok, interesting. Of course, this solution immediately raises some questions as well.

First, isn’t creating a channel for every data point a lot of extra overhead? Well, there are two places where we really have to care about overhead: the server process that’s managing all the messages and the size of the actual message itself. The message size can be easily managed by keeping channel names short, so that’s not a big problem. The server processing of the messages is actually a very interesting discussion…

While I can’t speak for certain about implementations of the Bayeux protocol other than WebSync, I believe our implementation may be similar to many others in this particular aspect. WebSync makes using multiple channels a very efficient method of message distribution. In fact, WebSync only actually creates a channel when at least one user subscribes to it, and destroys it once the last subscription to that channel is gone. What that means is that publishing a message to 10,000 channels when users are only subscribed to 5 channels results in 9,995 messages immediately getting discarded, which is the fastest possible operation that can be performed. So we’re ok on server overhead too.
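To make the "create on first subscribe, destroy on last unsubscribe" idea concrete, here is a minimal sketch of a channel registry. It is only an illustration under my own assumptions (the class name, method names, and in-memory Map are hypothetical) and is not WebSync's actual implementation.

```javascript
// Hypothetical in-memory channel registry; NOT WebSync's real implementation.
class ChannelRegistry {
  constructor() {
    // channel name -> Set of subscriber callbacks.
    // Only channels with at least one subscriber ever appear here.
    this.channels = new Map();
  }

  subscribe(channel, callback) {
    if (!this.channels.has(channel)) {
      this.channels.set(channel, new Set()); // created on first subscription
    }
    this.channels.get(channel).add(callback);
  }

  unsubscribe(channel, callback) {
    const subscribers = this.channels.get(channel);
    if (!subscribers) return;
    subscribers.delete(callback);
    if (subscribers.size === 0) {
      this.channels.delete(channel); // destroyed with the last subscription
    }
  }

  // Returns the number of subscribers the message was delivered to.
  publish(channel, data) {
    const subscribers = this.channels.get(channel);
    if (!subscribers) return 0; // nobody is listening: discard after one lookup
    for (const callback of subscribers) callback({ channel, data });
    return subscribers.size;
  }
}
```

In a structure like this, publishing to a channel nobody has subscribed to costs a single map lookup, which is the property the paragraph above relies on.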

Final Result

Now we’ve answered our two main objections to this solution: we can keep message sizes small, and having lots and lots of channels is actually a very good thing for efficient message distribution. We’ve also noted that this solution allows us to reach our goals of statelessness and clean separation of tasks between the publishing and consuming of data. So, as a result, we end up with all kinds of channels, each of which might look something like:

    "/ticker/GOOG/max"
    "/ticker/GOOG/min"
    "/ticker/GOOG/current"

So the only question that remains is how to actually subscribe to all these channels without creating a ton of additional requests? Well guess what - the Bayeux protocol accounts for this very scenario. When you make a subscribe request, you can subscribe to multiple channels in a single shot:

  client.subscribe({
      channels: [
          "/ticker/GOOG/max",
          "/ticker/GOOG/min",
          "/ticker/GOOG/avg"
      ],
      onReceive: function(args){
          // handle the incoming data...args.channel will describe the 
          // channel, so if we need to we can update specific table 
          // columns, rows, etc.
      }
  });
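As a rough sketch of how the channel list might be built and how an incoming channel name could be mapped back to a table cell, consider the helpers below. The portfolio shape and both function names are hypothetical, invented for this illustration; they are not part of the WebSync client API.

```javascript
// Hypothetical helpers; the portfolio shape and function names are illustrative only.

// Turns { GOOG: ["max", "min"] } into ["/ticker/GOOG/max", "/ticker/GOOG/min"].
function channelsForPortfolio(portfolio) {
  const channels = [];
  for (const [symbol, fields] of Object.entries(portfolio)) {
    for (const field of fields) {
      channels.push("/ticker/" + symbol + "/" + field);
    }
  }
  return channels;
}

// Splits "/ticker/GOOG/max" into { symbol: "GOOG", field: "max" }.
// Returns null for channels that do not match the two-level ticker layout.
function parseTickerChannel(channel) {
  const parts = channel.split("/"); // ["", "ticker", "GOOG", "max"]
  if (parts.length !== 4 || parts[0] !== "" || parts[1] !== "ticker") {
    return null;
  }
  return { symbol: parts[2], field: parts[3] };
}
```

The array from channelsForPortfolio could be passed as the channels option of the subscribe call, and parseTickerChannel(args.channel) inside onReceive would identify the row (symbol) and column (field) to update.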

Another interesting aspect of this approach is that it is highly extensible. We’re now able to work with 2D data (stocks => rows, data points => columns) but we can extend our approach out to 3D data with another path portion on the channel such as "/ticker/GOOG/{date}/avg", and start displaying data in a cube instead of a table!

Man, now I want to make a 3D stock ticker…

Regarding WebSync On-Demand and SSL

by jerod.venema 20. February 2010 01:08

We've had a number of questions about SSL for WebSync On-Demand, so I thought I'd write up a quick post about what's actually going on behind the scenes.

First, let me clarify that WebSync Server works great over SSL. All you have to do is load the script tag over SSL, and everything else will be taken care of automagically; WebSync detects the secure connection, and all requests will immediately switch to SSL, no other changes required.

That said, with WebSync On-Demand, the issue is not so straightforward, and centers around load balancing algorithms and the need for sticky sessions while maintaining our "proxying" capability. To understand the problem, you have to know that typically, load balancers allow sticky sessions via several methods, two common methods being IP-based balancing and query parameter-based balancing.

IP-based load balancing works great with WebSync Server. However, with WebSync On-Demand, we have to account for proxying, which allows developers to add custom processing to the requests; this is what allows you to, for example, add authentication and authorization to your WebSync On-Demand requests. However, that also means that a given client's session isn't tied to a specific IP address, so that load balancing scenario goes out the window.

Enter query-based load balancing. This works very well, as the query is unique per-user, and follows along nicely with the request even if it has been proxied. The problem that comes up, however, is that if you introduce SSL, the query parameter is encrypted when it reaches the load balancer, so it can't use the query parameter to load balance the request. Whoops!
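To illustrate what sticky routing on a query parameter means, here is a rough sketch. The parameter name clientId, the backend list, and the hash function are all assumptions made up for this example; it does not describe how the WebSync On-Demand load balancer actually works.

```javascript
// Hypothetical sticky routing keyed on a query parameter.
// "clientId" and the backend names are assumptions for illustration only.
function pickBackend(requestUrl, backends) {
  const clientId = new URL(requestUrl).searchParams.get("clientId");
  if (clientId === null) return null; // nothing to key the session on

  // Simple deterministic string hash, kept as an unsigned 32-bit integer.
  let hash = 0;
  for (let i = 0; i < clientId.length; i++) {
    hash = (hash * 31 + clientId.charCodeAt(i)) >>> 0;
  }

  // The same clientId always maps to the same backend, whatever IP sent the request.
  return backends[hash % backends.length];
}
```

Note that a function like this needs to read the URL, so it can only run on a request that has already been decrypted.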

As a result, we have to place SSL decryption in *front* of our load balancer, so that when the load balancer gets the request, it's able to use the query parameter and maintain our sticky sessions. This is a bit more complicated, and we're still working through the details, but we are planning at some point in the near future to add SSL to the list of supported features for WebSync On-Demand!

Tags: ,

websync

Comet Best Practices: Subscribing and Initial State Load

by anton.venema 5. February 2010 04:29

Many complications arise when building a real-time web application that employs "comet" or "reverse Ajax" to push data from the server. This is mostly due to the fact that real-time web applications require multi-threading, either directly or indirectly, and many of the same issues that arise in threading scenarios directly parallel the issues present in comet-based websites.

A frequently found issue has to do with the timing of subscribing to a comet channel versus the initial state load of the web page.

For example, consider a simple chat application. When the page loads up, the user expects to see some recent chat messages, as well as start receiving any newly published messages. The initial state load is a distinct operation from the subscription to receive new messages. As such, one must be executed before the other. There are only two possible options:

  1. Load state, then subscribe.
  2. Subscribe, then load state.

The first option results in the possibility of missing messages. It's feasible (and likely under load) for a user to publish a message after the state has loaded, but before the subscription has completed. This message would not be included in the state load, and it would not be delivered to the user over comet since the subscription was not present at the time of delivery.

The second option results in the possibility of duplicated messages. It's feasible (and again likely under load) for a user to publish a message after the subscription has completed, but before the state has loaded. This message would be delivered to the user over comet since the subscription was present at the time of delivery, and it would be included in the state load.

It has been suggested that the initial state load should be delivered in the response to the subscription request. This is absolutely possible with WebSync, but it does not remove the problem - it just narrows the gap in which the problem can occur. The scenario where messages go missing or are duplicated becomes less likely, but we find ourselves in a dangerous situation when we try to hide bugs instead of fix them. Testing becomes more difficult and we run the risk of unleashing a bug that is more difficult to track down.

So what do we do?

In general, the best thing to do is not lose messages. (As always, there are exceptions, but this is the general case we have found.) So, subscribe first, and load state data afterwards, usually through an Ajax request.

  • Page loads and a loading indicator is shown.
  • WebSync client is created and subscribes to the channel.
  • In the subscribe onSuccess handler, POST a request for initial state with Ajax.
  • In the Ajax onSuccess handler, populate the UI and remove the loading indicator.

To get rid of the duplicate messages, set a loaded flag to true once the UI has been populated. If the WebSync client receives a message and the loaded flag is false, push it onto a list. Once the loaded flag has been set to true, read through the list and process new messages, discarding duplicates. Duplicates can be identified by a server-generated ID that is attached to every message publication.
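The buffering and de-duplication steps can be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes every message (pushed or loaded) carries the server-generated ID in an id property; wiring it to the WebSync client and the Ajax call is left out.

```javascript
// Hypothetical helper: buffers pushed messages until the initial state has
// loaded, then replays them, skipping anything the state load already contained.
// Assumes every message carries a server-generated "id" property.
class InitialStateBuffer {
  constructor(render) {
    this.render = render;     // called once per unique message, in order
    this.loaded = false;      // the "loaded" flag described above
    this.pending = [];        // messages pushed before the state load finished
    this.seenIds = new Set(); // IDs that have already been rendered
  }

  // Call this from the comet client's receive handler.
  onMessage(message) {
    if (!this.loaded) {
      this.pending.push(message);
      return;
    }
    this.deliver(message);
  }

  // Call this from the Ajax success handler with the initial messages.
  onStateLoaded(initialMessages) {
    for (const message of initialMessages) this.deliver(message);
    this.loaded = true;
    for (const message of this.pending) this.deliver(message);
    this.pending = [];
  }

  deliver(message) {
    if (this.seenIds.has(message.id)) return; // duplicate: already shown
    this.seenIds.add(message.id);
    this.render(message);
  }
}
```

Tracking the rendered IDs in a set keeps the duplicate check independent of message ordering, at the cost of remembering every ID; an application with ordered IDs might only need to remember the highest one seen.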

Tags:

programming | websync

Automatic Foreign Objects in SubSonic3 SimpleRepository

by anton.venema 30. December 2009 21:38

SubSonic3's SimpleRepository is a wonder to behold. It's simple, clean, effective, and represents a huge step forward in abstracting away the DAL and allowing developers to focus on what matters.

I've been using it in development for a while now, and I have found that it comes up short when dealing with foreign keys. For one thing, foreign relationships are not persisted to the database, so data integrity is not 100% guaranteed for less-than-meticulous programmers.

What is more crucial, however, and the subject of this post, is the difficulty in loading up foreign objects from their respective keys.

Consider a simple case of a Car and a Wheel:

public class Car
{
    public int Id { get; set; }
}
public class Wheel
{
    public int Id { get; set; }
    public int CarId { get; set; }
}

In code, whenever you have an instance of a Wheel and want to reference the Car it belongs to, you have to do something like this:

// wheel is an instance of Wheel
SimpleRepository repo = new SimpleRepository("connection-string");
Car c = repo.Single<Car>(wheel.CarId);

Pretty simple! We can abstract away the need to always supply a connection string to the SimpleRepository by setting up a static method.

public class Repository
{
    public static SimpleRepository GetRepository()
    {
        return new SimpleRepository("connection-string");
    }
}

We can even abstract out the primary key for our models, knowing all our models will have a unique integer primary key.

public class Record
{
    public int Id { get; set; }
}
public class Car : Record
{
}
public class Wheel : Record
{
    public int CarId { get; set; }
}

That aside, let's get back to foreign object loading, and look at the reverse process. Given a Car, retrieve its Wheels.

// car is an instance of Car
SimpleRepository repo = Repository.GetRepository();
List<Wheel> wheels = repo.Find<Wheel>(w => w.CarId == car.Id).ToList();

Again, fairly simple.

So, what's the problem?

Well, there are two problems actually. The first is that there are performance issues. Consider a common case where the Car instance is passed around to a few methods. If any of those methods (or methods that they call, etc.) have to access the Wheels, they will have to make separate round-trips to the database. Ideally, once the Wheels have been loaded once, they will be cached with the Car instance. The second problem is that of code duplication. If the model changes, the expressions that describe the foreign key relationships will have to be updated everywhere.

So how do we fix it?

Ideally, we would use properties on the models to reflect the foreign key relationships. Something like:

public class Car : Record
{
    public List<Wheel> Wheels { get; }
}
public class Wheel : Record
{
    public int CarId { get; set; }
    public Car Car { get; set; }
}

So that's what we will do :) By abstracting away the details of the foreign key lookups and caching the foreign key objects, we can write the process once and reuse it in every single one of our models. This is what the real-world implementation will look like:

public class Car : Record
{
    public List<Wheel> Wheels
    {
        get { return GetForeignList<Wheel>(w => w.CarId == Id); }
    }
}
public class Wheel : Record
{
    public int CarId { get; set; }
    public Car Car
    {
        get { return GetForeign<Car>(CarId); }
        set { CarId = SetForeign(value); }
    }
}

The GetForeign<T>, SetForeign<T>, and GetForeignList<T> methods are implemented as protected methods in the Record base class we built earlier. All the complexity is wrapped into these methods, including an in-memory cache, so the models can just be... models.

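using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Record
{
    public int Id { get; set; }

    // Per-instance cache of loaded foreign objects, keyed by relation name.
    private Hashtable ForeignCache = new Hashtable();

    protected T GetForeign<T>(int key) where T : Record, new()
    {
        string relation = typeof(T).Name;
        T foreign = ForeignCache[relation] as T;
        // Reload when nothing is cached or the foreign key has changed.
        if (foreign == null || foreign.Id != key)
        {
            foreign = Repository.GetRepository().Single<T>(key);
            ForeignCache[relation] = foreign;
        }
        return foreign;
    }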

    protected int SetForeign<T>(T foreign) where T : Record, new()
    {
        string relation = typeof(T).Name;
        ForeignCache[relation] = foreign;
        return (foreign == null) ? 0 : foreign.Id;
    }

    protected List<T> GetForeignList<T>(Expression<Func<T, bool>> expression) where T : Record, new()
    {
        return GetForeignList<T>(expression, false);
    }

    protected List<T> GetForeignList<T>(Expression<Func<T, bool>> expression, bool refresh) where T : Record, new()
    {
        string relation = "l-" + typeof(T).Name;
        List<T> foreign = ForeignCache[relation] as List<T>;
        if (foreign == null || refresh)
        {
            foreign = Repository.GetRepository().Find<T>(expression).ToList();
            ForeignCache[relation] = foreign;
        }
        return foreign;
    }
}

PHP's json_encode has weird forward slash escaping

by jerod.venema 12. December 2009 01:46

OK, we've just uploaded a new version of our PHP publisher and proxy.

We are using PHP's built-in json_encode function (available since PHP 5.2.0) to create our JSON strings, and it turns out that there's a bug, or at least a pseudo-bug, in how the json_encode function is implemented. For some reason, it escapes forward slashes in the output. This is rather odd, and was the underlying issue that could eventually result in an "invalid json" complaint in our JavaScript client.

It appears that this issue may be resolved in PHP 5.3+, so those of you running the latest version may not have seen any issues. In either case, the updated scripts should work.
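For context, the JSON grammar does allow a forward slash to be written as \/, so a spec-compliant parser should decode both forms to the same string. The small check below uses the built-in JSON.parse to show that; the channel name is just an example value, and the snippet does not reproduce the failure in our client.

```javascript
// A JSON string literal with escaped forward slashes: "\/ticker\/GOOG"
const escapedJson = '"\\/ticker\\/GOOG"';
// The same value without escaping: "/ticker/GOOG"
const plainJson = '"/ticker/GOOG"';

// A spec-compliant parser decodes both to the identical string.
const decodedEscaped = JSON.parse(escapedJson);
const decodedPlain = JSON.parse(plainJson);
const decodeToSameValue = decodedEscaped === decodedPlain;
```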

Anyhow, for those of you using WebSync with PHP, grab a new copy of the PHP libraries from our downloads section and you'll be good to go.

Special thanks to Andrew Betts for finding and reporting this one to us.

Comet Daily

by jerod.venema 8. December 2009 01:12

Looks like the guys over at Comet Daily picked up on our Ajaxian article and had a chance to check us out. Those are some bright fellas over there, so be sure to check them out, especially if you're working with the Bayeux protocol and comet.

Ajaxian Announcement

by jerod.venema 8. December 2009 01:05

Well, I forgot to announce this here back when it happened, but WebSync was featured on Ajaxian a while back. My article for them was hopefully an interesting read, and might give some insight into what goes on here at Frozen Mountain, and how we're working to build the best comet server for those of you out there working with ASP.NET and the Microsoft platform.

VisualWebGUI AddOn for WebSync Announced

by jerod.venema 1. December 2009 23:12

Daniel, one of the developers over at arcalife (www.arcalife.com), has been using WebSync OnDemand for a while now, and he and his company have just released a very cool addition for WebSync: the WebSyncControl for the VisualWebGUI framework. This control allows users of the VisualWebGUI framework to easily add comet capabilities to any project, with a very simple control. Kudos guys, great work!

3rd Party Controls

by jerod.venema 1. December 2009 23:06

Daniel, one of the developers over at arcalife (www.arcalife.com), has been using WebSync OnDemand for a while now, and has just released a very cool addition for WebSync: the WebSyncControl for the VisualWebGUI framework. This control allows users of the VisualWebGUI framework to easily add comet capabilities to any project, with a very simple control. Kudos guys, great work!

Building a Managed Comet Server Part 1 - Scalability in ASP.NET

by anton.venema 18. November 2009 23:34

A common misconception about IIS and the .NET framework is that they are unable to scale well to tens of thousands of simultaneous requests. WebSync has proven that this is not the case, so why does that misconception still exist?

Not just a page anymore

In classic ASP, there was no way to make an incoming request go "idle" and just wait for something else to push it along. Instead, requests blocked on the thread pool, eventually saturating it so no further requests could be processed.

ASP.NET takes the same approach with standard requests, but provides another option as well - the IHttpAsyncHandler. This handler is designed to allow multiple requests with long-running executions to take minimal CPU usage and not saturate the thread pool, a perfect solution for building a comet server. It's no surprise, then, that this is the foundation on which WebSync is built.

Beyond the basics

Of course, creating the async handler is pointless if each request just blocks on a thread from the CLR thread pool waiting for events. Even if each request blocked for just a few seconds, you'd quickly exhaust the thread pool under load. The requests have to be offloaded for batch analysis in a separate bounded thread pool. The bounded thread pool can then make use of shared data structures to balance the request load while waiting for events to trigger a response.

To make this work, the CLR thread pool has to have a large number of threads available for the handling of incoming and outgoing requests, and the secondary bounded thread pool has to have just a few threads for the handling of everything in between. The trick is to get incoming requests off the CLR thread pool quickly for long-running processes so the secondary thread pool can bear the weight and keep the CLR thread pool light on its feet.

Tags: , , ,

programming | websync