Comparing Haskell and Node concurrency performance

  • > Which leads to the 'callback hell' that we all know and hate. Part of the bill-of-goods we accept when using Node is that in exchange for better time and space characteristics, we lose the thread as an abstraction.

    Perhaps my head has been stuck in javascript/node land for too long, but I think accusations about javascript producing callback hell now seem a bit disingenuous, even for relative novices to the language.

    It's 2016, and there are many well-documented and widely adopted solutions arising from external libraries and developments in ECMAScript. Thanks to transpilers like Babel/TypeScript, we can even shoehorn these new ECMAScript features into older browsers.

  • It feels like this generation has never had to use Windows 3.1, or maintain an event-loop application long term.

    There are good reasons we went to thread-based models: developer productivity and safety. Event loops are fine for toy demos, or very carefully managed products (trading systems, NGINX), but not for use as general-purpose hammers.

    Every single bit of extra friction and cognitive overhead costs you dearly a few years down the line. We scrambled away from this stuff as soon as we could, and there's no good reason to go back.

  • I don't think this comparison is really fair at all... it seems cherry-picked to highlight what is already known to be a bad use case for node. As for the multiple async calls, Promises and async functions take care of that (not to be confused with the reference to the `async` library).

    First, using clustering and memoization would improve the throughput a lot. I did something similar when adapting a JS-based script library for use in node, because I knew it would lock the main loop otherwise. Beyond this, CPU-intensive work should be avoided in your service loop regardless; it's best distributed to an RPC/worker pool.
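
    A minimal sketch of that clustering idea, using Node's built-in cluster and http modules (fib here is just a hypothetical stand-in for a CPU-heavy route):

        var cluster = require('cluster');
        var http = require('http');
        var os = require('os');

        // Deliberately naive and CPU-bound, like the article's slow route.
        function fib(n) {
            return n < 2 ? n : fib(n - 1) + fib(n - 2);
        }

        if (cluster.isMaster) {
            // Fork one worker per core; each worker has its own event
            // loop, so a slow request only blocks its own worker.
            os.cpus().forEach(function () { cluster.fork(); });
        } else {
            http.createServer(function (req, res) {
                res.end(String(fib(40)));
            }).listen(8080); // workers share the listening socket
        }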

    In terms of scale, node scales as well as or better than a lot of frameworks; it's just that you will usually want to use similar techniques locally as well as remotely.

    Another poor example is needing millions of references in a single thread: Node will die spectacularly. That doesn't mean it shouldn't be used for many use cases; it only means that it's bad at some of them.

    I find that node is great as an intermediate/translation layer... your UI talks directly to node, tightly coupled, and node translates against backend databases or other services as a gatekeeper for your front end. It lets you make the data the shape that is most convenient, with the least disconnect in thought and approach.

    It's also pretty great for certain types of orchestration control, and even in the proof-of-concept stages of applications. A first version of almost anything I've tried in Node is usually much faster to build than on alternative platforms, and it often performs well enough to stick with. Developer productivity is more important than absolute scale at the beginning, and if you have a plan to scale horizontally, you can do that for a while before you need to break off other optimizations.

  • This article is misleading. Here's the real problem: https://github.com/AndrewRademacher/fpco-article-examples/bl...

    Anyone who understands JavaScript can see that the recursion invoked in the slow route is not asynchronous (each recursive invocation keeps piling onto the call stack without ever releasing it)! You'd have to use process.nextTick (or setTimeout) if you wanted to recurse asynchronously without spawning a new process...

    For these kinds of unusual, heavy computations, though, you'd be better off using the child_process module to spawn a new process and do the recursion inside that process so that it doesn't block the main event loop.
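
    A rough sketch of that child_process approach (fib.js is a hypothetical worker script; fork/send/message are the real child_process API):

        var fork = require('child_process').fork;

        // Offload the heavy recursion to a forked child so the main
        // event loop stays free to serve other requests.
        function fibInChild(n, callback) {
            var child = fork('./fib.js');
            child.once('message', function (result) {
                callback(null, result);
                child.kill();
            });
            child.once('error', callback);
            child.send(n); // kick off the computation in the child
        }

        // fib.js (the worker side) would be roughly:
        //   process.on('message', function (n) { process.send(fib(n)); });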

    This has nothing to do with starvation. Node.js just has a completely different approach to this kind of problem.

  • On one thread, I am defending FP from critics who are unaware of the state of the art.

    EDIT: Just to make it clear early on, I agree with the article's conclusion that Node.js is not as good at compute-heavy workloads as Haskell. I simply object to any use of "the nested callback problem" as valid in 2016. It's an issue exclusively for legacy code and developers who take pride in writing outdated code.

    It seems only fair, then, that I should also defend Javascript from people obviously unaware of the state of the art in pseudo-imperative programming. And by state of the art, I mean "has been around in some languages for 3+ years."

    The example:

        request('http://example.com/random-number', function(error, response1, body) {
          request('http://example.com/random-number', function(error, response2, body) {
            request('http://example.com/random-number', function(error, response3, body) {
                ...
            });
          });
        });
    
    But modern Javascript (before you start, yes, it runs on every browser with preprocessing, which is normal for this ecosystem) would make it look more like this:

        // rp is a request promise, multiple options for creating them
        async function make3StaticRequests() {
            try {
                var res1 = await rp('http://example.com/random-number')
                var res2 = await rp('http://example.com/random-number')
                var res3 = await rp('http://example.com/random-number') 
                // ...
            }
            catch(error) {
                // ... 
            }   
        }
    
        // And of course the promise library allows for many things
        // you'd like with applicative functors, like binding groups
        // of operations together and evaluating them all.
    
        function randomNumberPromise() {
            return rp('http://example.com/random-number')
        }
    
        async function make3StaticRequests() {
            var [res1, res2, res3] = await Promise.all([randomNumberPromise(),
                                                        randomNumberPromise(),
                                                        randomNumberPromise()])
            // ...
        }
    
    I don't really understand why people feel comfortable writing up comparison articles without doing sufficient research into what they're comparing things to.

    That said, the article's point about large compute workloads starving other operations is very much true, and a good example of the weaknesses of V8 as a server-side programming environment.

  • The first part about comparing the callbacks in Javascript vs do notation in Haskell is super misleading. The do notation desugars to code that looks essentially identical to the Javascript version. Sugar is not a negligible consideration in system choice, but that kinda thing just bugs me.
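
    A sketch of what that "essentially identical" shape looks like on the JS side, reusing the hypothetical rp request-promise helper from the comment above: do-notation desugars to nested binds, which in promise terms is just nested .then() callbacks.

        rp('http://example.com/random-number').then(function (res1) {
            return rp('http://example.com/random-number').then(function (res2) {
                return rp('http://example.com/random-number').then(function (res3) {
                    // ... same nesting as the callback version
                });
            });
        });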

  • A lot of the readability problems with NodeJS the author mentions at the beginning of the article have been solved in numerous ways. Promise-based workflows, for instance, allow one to define a step-by-step flow very similar to the counterexamples provided. Packages like babel can expose things like yield and async/await, which get us even closer. I'm not saying it's ideal, but it certainly mitigates the worst parts of the 'callback hell' problem pointed out.

  • "Many web servers, for example achieve concurrency by creating a new thread for every connection. In most platforms, this comes at a substantial cost. The default stack size in Java 512KB, which means that if you have 1000 concurrent connections, your program will consume half a gigabyte of memory just for stack space. "

    I was a WebLogic developer; we fixed this in the late 90s, but the Volano chat benchmark was still run for no apparent reason. Somehow everyone else didn't get the message until Netty was released and people started using it.

    Obviously what you want is multi-threaded execution with asynchronous I/O. Using node on multi-core systems just doesn't make a lot of sense, as you end up having to duplicate your entire program on each core to get the full performance of the machine. Not unlike 512KB/thread, but much worse, especially if you cache anything locally in the process, like template compilation, etc.

  • > Node ... popularized the event-loop

    Tcl was doing event loops quite successfully in the late 90s and was fairly popular back in the day.

    These days, I'd use Erlang (Elixir).

    This is kind of ranty, but also makes me laugh: https://www.youtube.com/watch?v=bzkRVzciAZg

  • node is fast when the heaviest work is delegated to libuv or to performant native modules. If you need to do heavy work in V8 itself, it slows down significantly.

    What would I call "heavy work"? E.g., compression, serialization, encryption, image processing... tasks that are bound by CPU and not only by I/O. Usually you want to delegate that to a native module and not do it yourself in JavaScript. If you absolutely have to do it in JavaScript, then you need to make sure the task doesn't block the event loop. To play nicer with the event loop, you inject something like setImmediate or process.nextTick after a certain amount of time or number of iterations... otherwise you will starve other tasks in the loop, notably I/O.
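
    A sketch of that chunking idea (the chunk size is an arbitrary choice):

        // Walk a huge array without monopolizing the event loop:
        // process one slice, then yield via setImmediate so pending
        // I/O callbacks get a chance to run between slices.
        function sumInChunks(items, callback) {
            var total = 0;
            var i = 0;
            var CHUNK = 10000; // arbitrary slice size

            function step() {
                var end = Math.min(i + CHUNK, items.length);
                for (; i < end; i++) {
                    total += items[i];
                }
                if (i < items.length) {
                    setImmediate(step); // let queued I/O run first
                } else {
                    callback(total);
                }
            }
            step();
        }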

    node is also not a really good idea if you need a lot of interprocess communication.

    It is a very viable alternative, though.

  • Here we go, a node version that uses a least-latency load balancer and doesn't exhibit the problem:

    https://github.com/spion/fpco-article-examples

    It's an interesting benchmark, but it needs more work to give a more accurate picture. It would be nice if it:

      * used wrk to measure requests and latency percentiles (example command below)
      * made the percentage of "slow" requests tweakable
      * made the number of workers per core tweakable
    
    Then we could generate a nice chart that shows latency percentiles as a function of the percentage of slow requests and workers per core, and compare with Haskell.
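
    For the wrk point, something like this already reports latency percentiles (host and numbers are placeholders):

        wrk -t4 -c256 -d30s --latency http://localhost:8080/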

  • Another nitpick - most Java-based web/app servers have used thread pooling (or similar approaches) for at least 10 years. It seems overly simplified to say these environments always "spawn a new thread".

  • Regarding concurrency, I'd suggest having a look at Elixir. It's growing like a weed and offers the best programming experience you can find. Not kidding. Just try it.

  • I guess if we're nitpicking, then here's another one:

    > Looking near the top of the output, we see that Haskell's run-time system was able to create 100,000 threads while only using 165 megabytes of memory. We are roughly consuming 1.65 kilobytes per thread.

    Those are not the same kind of threads that the author is talking about at the beginning of the article. Those are green threads, and as such they are multiplexed onto a much smaller number of real system threads to do work in parallel. What that means is that they can't, for example, all make a system call at the same time. Go has the same issue.

  • Lambda (anonymous) functions are so popular in JS that it might be a surprise that you can actually name your functions. Heck, you can even use them like any other object: pass them along, store them in lists, return a function from a function, etc.
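
    A trivial illustration (all names hypothetical):

        // A named function is a value like any other:
        function double(x) { return x * 2; }

        // ...store functions in lists...
        var ops = [double, function triple(x) { return x * 3; }];

        // ...and return a function from a function.
        function compose(f, g) {
            return function (x) { return f(g(x)); };
        }

        console.log(ops.map(function (f) { return f(10); })); // [20, 30]
        console.log(compose(double, double)(5));              // 20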

  • > However, Haskell doesn't impose an additional burden on the design of your software to accomplish that goal.

    giggles uncontrollably

    I like Haskell, but saying that it doesn't impose a design burden is incredibly misleading.

  • Is the author really just comparing single-core to multi-core? I didn't read any specs on the machine that ran the benchmarks, but assuming it's multi-core, are the Haskell tests using all the cores while node is only using one?

    node is probably not the best choice for truly CPU-bound operations, but you can sometimes get by using the native cluster module to spread work over multiple cores.

  • Node has concurrency? :P

  • >> The difference here is stark because in Node.JS's execution model, the moment it receives a request on the slow route, it must fully complete the computation for that route before it can begin on a new request.

    With this statement, the author acknowledges that the Node.js code in the slow route was not asynchronous. The test is therefore invalid; it's comparing apples and oranges.

    Node.js is more than capable of handling different requests asynchronously (regardless of whether they are fast or slow); if you have any kind of blocking or waiting around happening, then you're doing it wrong.

    I'm so tired of all the anti-Node.js propaganda; it's hurting people. If I walk into one more company where some zombie tells me that they're migrating away from Node.js because "the Node.js event loop starves the CPU", I'm going to have a stroke.

    In reality, all Node.js 'starvation' problems can be solved with the 'cluster' module or the 'child_process' module.

    Since we've been talking a lot about 'fake news' on Facebook recently, maybe we should start talking about how fake news is affecting Hacker News. This anti-Node.js strain is particularly virulent.

  • Comparing apples to oranges

  • So someone compared performance and programming model, and lo and behold, found Haskell to be superior.

    What might be more interesting:

    - what is the salary difference between a Node dev vs. a Haskeller?

    - is there a productivity difference? does the salary over- or under-compensate?

    - is correctness a core business concern?

    - if I need to hire 10 devs, can I do that?

  • I'm getting sick of seeing this. Node wasn't intended for compute-heavy workloads...ever...at any point...for any reason. This is like the 50th time someone has decided it was appropriate to point this out by generating Fibonacci numbers (among other things).

    Go watch the node.js presentation Ryan Dahl gave at JSConf 2009; he addresses this during that speech.

    His opinion on the "right way" to do concurrency is a little polarizing, but, quote: "the right way to do concurrency is to use a single thread and have an event loop. this requires that what you 'do' outside of IO waits not take very long".