Hacker News

Always use [closed, open) intervals

by smukherjee19 on 11/22/2022, 4:02:15 AM with 36 comments

by rav on 11/22/2022, 11:37:36 AM
Half-open intervals are why I try as much as possible to stay away from languages that use 1-based indexing (Lua, Julia, Matlab, R, ...) - 1-based indexing lends itself to closed intervals because an array of N elements has [1,N] as its index range, whereas 0-based indexing lends itself to half-open intervals because an array of N elements has [0,N) as its index range.
-------
However, I know of one case where closed intervals really shine. Consider displaying a zoomable map in tiles. On a given zoom level, each tile has some coordinates (x;y) where x and y are integers denoting the virtual column and row. Suppose that we allow zooming out by a factor of 2, so that two-by-two tiles are aggregated into a single tile. Then a natural choice for the coordinates of the zoomed-out tile is (floor(x/2);floor(y/2)), that is, divide by two and round down. Suppose that a dataset has data on tile coordinates [x1,x2]×[y1,y2], meaning that there's only data on tiles (x;y) where x1≤x≤x2 and y1≤y≤y2. These are closed intervals, but stay with me - the reason they are nice in this case is how you compute the range of valid tile coordinates when you zoom out: the range becomes [floor(x1/2),floor(x2/2)]×[floor(y1/2),floor(y2/2)], that is, you simply divide the range endpoints by two and round down. If you try to do this with half-open intervals, then you need some +1/-1 shenanigans, which are normally what I try to avoid by going for half-open intervals.
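A minimal sketch of the tile example in one dimension, assuming integer tile coordinates and a zoom factor of 2 (the function names are hypothetical). Python's `//` floors toward negative infinity, so it matches floor(x/2) for negative coordinates too. The closed version just divides both endpoints; the half-open version needs the -1/+1 adjustment the comment describes.

```python
def zoom_out_closed(lo: int, hi: int) -> tuple[int, int]:
    """Closed tile range [lo, hi] -> closed range at the next zoom level."""
    # Every tile x in [lo, hi] maps to parent x // 2, and the parents are
    # contiguous, so both endpoints can simply be floor-divided by 2.
    return lo // 2, hi // 2


def zoom_out_half_open(lo: int, hi: int) -> tuple[int, int]:
    """Half-open tile range [lo, hi) -> half-open range at the next zoom level.

    Assumes the range is non-empty (hi > lo).
    """
    # The last tile is hi - 1: map it to its parent, then add 1 to make the
    # end exclusive again (the +1/-1 adjustment).
    return lo // 2, (hi - 1) // 2 + 1


print(zoom_out_closed(3, 8))     # (1, 4)
print(zoom_out_half_open(3, 9))  # (1, 5)
```

Applying the same function to x and y independently gives the 2D rectangle.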
by ChrisMarshallNY on 11/22/2022, 10:20:29 AM
> Never, ever, ever use [closed, closed] intervals
I’m not really a fan of “never” or “always” rules when it comes to programming. I’ve found it’s usually better to have “make sure to justify deviations from” heuristics.
I usually use [closed..open) ranges (as they are called in Swift), but sometimes, an inclusive range is a lot more appropriate, for expressing an operation (for example, I may express a range as start...end, as opposed to start..<(end + 1)).
by elcaro on 11/22/2022, 11:35:08 AM
I'll repeat the sentiment that "always" and "never" rules usually have their fair share of exceptions.
And while I agree that [closed, open) intervals are often the best choice... sometimes what you want is [closed, closed], or (open, open), and it's nice to use a language that makes that easy.
For example, Raku makes it easy to do:
```
  [closed, closed] with ($a .. $b)

  [closed, open) with ($a ..^ $b)

  (open,  open) with ($a ^..^ $b)

  (open, closed] with ($a ^.. $b)
```
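Python has no operators for these four variants, but for integers they can all be expressed on top of the half-open `range`. A minimal sketch, using a hypothetical helper `int_range`; it is only an integer analogue, since Raku's ranges also accept non-integer endpoints. The inclusive case is the `range(start, end + 1)` counterpart of Swift's `start...end`.

```python
def int_range(a: int, b: int, *, include_start: bool = True,
              include_end: bool = False) -> range:
    """Integers between a and b with each endpoint included or excluded.

    The defaults give the usual half-open [a, b).
    """
    start = a if include_start else a + 1
    stop = b + 1 if include_end else b
    return range(start, stop)


print(list(int_range(1, 5, include_end=True)))                       # a .. b   -> [1, 2, 3, 4, 5]
print(list(int_range(1, 5)))                                         # a ..^ b  -> [1, 2, 3, 4]
print(list(int_range(1, 5, include_start=False)))                    # a ^..^ b -> [2, 3, 4]
print(list(int_range(1, 5, include_start=False, include_end=True)))  # a ^.. b  -> [2, 3, 4, 5]
```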
by dragontamer on 11/22/2022, 2:32:23 PM
Ehh, this post misses the important tidbit.
You should keep your code consistent across your organization, so that a large number of programmers know how your code works. You should have a "default writing style", and the "default writing style" should be used unless you have very, very, very good reasons to avoid it. (And an errant +1 or -1 here and there isn't a good enough reason to switch.)
There are four styles of intervals. Let's say you want to represent a loop of 5 iterations numbered [555, 556, 557, 558, 559]. You've got:
* [closed, closed] -- [555, 559]
* [closed, open> -- [555, 560>
* <open, closed] -- <554, 559]
* <open, open> -- <554, 560>
There's not much difference to any of these four. As long as you pick a singular choice, get comfortable with its quirks, and make it consistent across your organization, you get benefits.
The main reason we use [closed, open> is that Dijkstra (father of structured programming back in the 1960s) argued for [closed, open> when presented with all these options. (He argued for zero-based indexing as well.)
The "[closed, closed]" set is one-too-small (559 - 555 == 4), so you need to add +1 to the representation.
The <open, open> set is one-too-large: (560 - 554) == 6. So this too seems prone to off by one error.
[closed, open> and <open, closed] both give the correct size of 5 when you subtract, but both "include" a number that isn't in the set. In [closed, open>, the latter number isn't part of the array (560 is "one past the end"), while in <open, closed], the first number isn't part of the array.
Make of that what you will. [closed, open> became a programmer convention for these reasons. The important bit is to know all the quirks and off-by-one errors associated with this representation.
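The size arithmetic above, as a quick check: subtracting the bounds of each style for the five iterations 555 to 559 gives 4, 5, 5 and 6, and the [closed, open> pair is exactly what Python's `range()` takes.

```python
# The five iterations 555, 556, 557, 558, 559 in each of the four styles.
styles = {
    "[closed, closed]": (555, 559),
    "[closed, open>": (555, 560),
    "<open, closed]": (554, 559),
    "<open, open>": (554, 560),
}

for name, (a, b) in styles.items():
    print(f"{name:<17} b - a = {b - a}")  # 4, 5, 5, 6 in this order

# The [closed, open> pair is exactly what Python's range() takes:
print(list(range(555, 560)))  # [555, 556, 557, 558, 559]
```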
by jmull on 11/22/2022, 12:16:07 PM
Half-closed is generally good because it tends to reduce off-by-one errors.
But this article overstates the case, especially for floating point, where the distinction between a < b and a <= b is not straightforward. Sometimes you’re modeling closed intervals, and in that case using closed intervals is the right decision.
The “splitting by time” section should probably just be removed, since it confuses the point and doesn’t add anything. The scenario doesn’t really make sense: if you wanted this, you’d store the registration time, and to get the hour you’d be better off truncating the value than using an interval. Also, if you’re going to be doing math on time values, you’d better know the precision of the values you’re working with (among many other things). Intervals, however closed or open, aren’t going to help you there.
Maybe I shouldn’t criticize so much, since I agree with the general point. But this makes the case awkwardly.
by eigenspace on 11/22/2022, 7:15:27 AM
There are convincing cases one can make in favour of half-open intervals (at least in certain circumstances), but this isn’t it.
This is just a rambling, absolutist mess.
by igammarays on 11/22/2022, 12:40:39 PM
Why would you want a half-open interval when booking an AirBnB or flight? If I search for flights from February 24 to February 24 I don’t expect the empty interval.
by chenglou on 11/22/2022, 12:08:53 PM
Quite a few APIs use a pair of `{start, length}` instead, which in the context of the post's example, is even clearer. Empty interval would be `length == 0`, time interval would be a single array of `starts`, etc. Fewer subtractions (to get length) usually end up nicer too.
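A minimal sketch of a `{start, length}` pair and its conversion from a half-open interval; the names `Span`, `end_exclusive` and `span_from_half_open` are hypothetical, not any particular API. Converting from [a, b) costs only the one subtraction b - a, and the empty interval is simply `length == 0`.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A {start, length} pair covering start, start + 1, ..., start + length - 1."""
    start: int
    length: int

    @property
    def end_exclusive(self) -> int:
        # The equivalent half-open interval is [start, start + length).
        return self.start + self.length

    def is_empty(self) -> bool:
        return self.length == 0


def span_from_half_open(a: int, b: int) -> Span:
    """Convert a half-open [a, b) into {start, length}: the length is just b - a."""
    return Span(a, b - a)


s = span_from_half_open(555, 560)
print(s, s.end_exclusive)  # Span(start=555, length=5) 560
```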
by BiteCode_dev on 11/22/2022, 9:31:40 AM
It's the default in Python, and it's more helpful than not, so I would say it's a good design decision. But "always" is a dangerous word in engineering, and in this case it's definitely not warranted.
Case in point: last week I worked with lists of dates, and I needed the last date to cleanly bracket my sliding windows as a time period.
by knorl on 11/22/2022, 9:22:59 AM
I think this advice could be better summed up into: to minimise off-by-one errors, choose a consistent strategy for describing intervals, and stick with it as much as is sensible.
by branko_d on 11/22/2022, 2:45:13 PM
Just for completeness I'll mention that there is another style:
```
    start, count
```
This seems to be popular in .NET ecosystem.
by vglocus on 11/22/2022, 2:27:40 PM
I have recently been on the other side of this argument for a specific case.
In our case, users of our API specify date ranges representing a list of partitions. So we are not counting nights between dates, but rather selecting a set of daily or hourly buckets.
Here (maybe even only here) I argue that inclusive ranges feel more intuitive.
I find it much more intuitive to represent the 1st 7 days of January as
['2022-01-01', '2022-01-07']
compared to
['2022-01-01', '2022-01-08').
Another very common example is to specify the last 7 days (incl 'today') in which case I find
[today().minusDays(6), today()]
to be a clearer representation than
[today().minusDays(6), today().plusDays(1))
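One common compromise, sketched here under the assumption that a single conversion at the API boundary is acceptable: accept inclusive day ranges from users, then store and compute with half-open ranges internally. The helper names are hypothetical, and a fixed date stands in for `today()`.

```python
from datetime import date, timedelta


def inclusive_to_half_open(first: date, last: date) -> tuple[date, date]:
    """[first, last] (what users type) -> [first, last + 1 day) (what code stores)."""
    return first, last + timedelta(days=1)


def days_in(start: date, stop: date) -> list[date]:
    """Every day in the half-open range [start, stop)."""
    return [start + timedelta(days=i) for i in range((stop - start).days)]


# "The first 7 days of January", entered inclusively:
start, stop = inclusive_to_half_open(date(2022, 1, 1), date(2022, 1, 7))
print(start, stop, len(days_in(start, stop)))  # 2022-01-01 2022-01-08 7

# "The last 7 days, including today", with a fixed date standing in for today():
today = date(2022, 11, 22)
start, stop = inclusive_to_half_open(today - timedelta(days=6), today)
print(start, stop)  # 2022-11-16 2022-11-23
```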
by andreareina on 11/22/2022, 4:57:44 AM
*half-open intervals. There have been times I needed an (open, closed] interval. But there have also been times when I wanted a fully closed interval because I was setting an arbitrary limit, and it's easier to tell a user "100 is the maximum" than "it must be below 100". Otherwise, what value should they enter to max it out: 99? 99.99? etc.
To add some nuance I'd say that if you're dividing a larger interval into smaller subintervals then a half-open one is probably what you want.
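A minimal sketch of that subdivision case, assuming integer bounds with b ≥ a and n ≥ 1 (the function name is hypothetical): each boundary ends one chunk and starts the next, so the pieces tile [a, b) with no gaps, no overlaps, and no +1/-1.

```python
def split_half_open(a: int, b: int, n: int) -> list[tuple[int, int]]:
    """Split [a, b) into n consecutive half-open chunks whose lengths differ by at most 1."""
    bounds = [a + (b - a) * i // n for i in range(n + 1)]
    # Consecutive boundaries are shared: chunk i is [bounds[i], bounds[i + 1]).
    return list(zip(bounds, bounds[1:]))


print(split_half_open(0, 10, 3))  # [(0, 3), (3, 6), (6, 10)]
```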
by zasdffaa on 11/22/2022, 11:14:09 AM
I know Dijkstra's paper and it's short, good and should be read but this article is wrong in saying always. It feels like a newbie programmer came across a good thing then lost all proportion; use the right tool for the right job, as ever.
by the_cramer on 11/22/2022, 8:20:36 AM
This depends on use case. If you are doing a frontend layer on top of closed-open, then the frontend will have to handle the points the article is rambling about.
Users are used to selecting a date range in closed-closed format, for example.
by ur-whale on 11/22/2022, 2:22:14 PM
Wholeheartedly agree that sticking - where possible - to [closed, open) is a good idea. It has helped me tremendously when implementing, e.g., computational geometry algorithms. Robust triangle rasterization comes to mind.
Another interesting point: in the weird corner of the world where I grew up, half-open intervals were always denoted as [low_bound, hi_bound[
I am of course completely biased, but I've always found this notation much more elegant and intuitively obvious than the [low_bound, hi_bound) that seems to be the prevalent norm in the anglo world.
Using '[' after the upper bound clearly shows that we're open at the top whereas the ')' is fairly arbitrary.
And while I'm on the topic of weird culture-induced quasi-arbitrary biases: I had a math teacher that would bark (and I mean BARK!) at us if we ever used '>' in inequalities.
The justification was that with this constraint, all inequalities ended up written with their two members laid out following the standard left-to-right drawing of the real line, which made it much easier to picture what was going on geometrically.
It also enforced consistency throughout a long demonstration - one less thing added to the cognitive load.
He was made fun of a lot by the student body, of course, but later in life, as a programmer, I have found myself sticking to the habit: I force myself to mostly use '<', very rarely '<=', and almost never '>' or '>='.
I find this makes code much more readable, just as my old teacher's rule did for math inequalities, and for pretty much the exact same reasons.
Of course, doing that does not help at all when reading other folks' code, those uncivilized heretical users of the 'greater than' form.
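Python's chained comparisons fit this habit well: a half-open membership test can be written in number-line order, left to right. A small sketch with hypothetical function names; both functions compute the same thing, the difference is only in how they read.

```python
def in_half_open(x: float, lo: float, hi: float) -> bool:
    # Reads like the number line: lo, then x, then hi.
    # Python evaluates this as (lo <= x) and (x < hi).
    return lo <= x < hi


def in_half_open_mixed(x: float, lo: float, hi: float) -> bool:
    # The same test, but the operands jump back and forth.
    return x >= lo and hi > x


print(in_half_open(0, 0, 1), in_half_open(1, 0, 1))  # True False
```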
by chkas on 11/22/2022, 1:09:53 PM
Python uses "closed open" intervals with `range(0, n)`; the reverse is then `range(n - 1, -1, -1)`, which is highly unintuitive. Combined with 0-based array indexing, this makes certain algorithms very cumbersome, for example the Knuth shuffle. In Python this is:
```
    from random import randrange
    x = [10, 20, 30, 40, 50 ]
    for i in range(len(x) - 1, 0, -1):
        r = randrange(i + 1)
        x[i], x[r] = x[r], x[i]
    print(x)
```
With 1-based indexing and inclusive ranges it would be much more understandable:
```
    a[] = [ 10 20 30 40 50 ]
    for i = len a[] downto 2
        r = random i
        swap a[i] a[r]
    end
    print a[]
```
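For comparison, one Python idiom that avoids writing reversed bounds is to call `reversed()` on the forward half-open range; whether that reads better than the 1-based version is a matter of taste. A sketch of the same Knuth (Fisher-Yates) shuffle written that way:

```python
from random import randrange


def knuth_shuffle(x: list) -> None:
    """Shuffle x in place (Fisher-Yates / Knuth shuffle)."""
    # reversed(range(1, len(x))) visits len(x) - 1, ..., 1: the same indices as
    # range(len(x) - 1, 0, -1), but written with the forward half-open bounds.
    for i in reversed(range(1, len(x))):
        r = randrange(i + 1)  # uniform over 0..i inclusive, i.e. [0, i + 1)
        x[i], x[r] = x[r], x[i]


x = [10, 20, 30, 40, 50]
knuth_shuffle(x)
print(x)
```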
by sshine on 11/22/2022, 10:03:04 AM
This is one of those "well, of course!" pieces.
But I've bookmarked it, in case I run into someone who thinks they disagree, in which case I can offload the explanation.
I had a similar incident with colleagues who had discovered the "Default" trait and started adding defaults to everything, including things that didn't have good defaults, and things where they didn't mean "default" but something quite specific, such as "empty". The canonical "don't do that!" blog post didn't exist, so I had to write one.
by enqk on 11/22/2022, 11:21:15 AM
I think this has to do with the nature of the underlying metric. Closed-open intervals are the way to go for integers. However, they don't seem to be a good fit for sampling from continuous space.
by personalityson on 11/22/2022, 2:48:05 PM
Closed/open makes sense for continuous measures; for integers, closed/closed is more readable.
by bheadmaster on 11/22/2022, 10:39:21 AM
I prefer [a, b) intervals because they encode two pieces of important information directly:
1. The first element (a)
2. The length (b-a)
Which are what we most often need.
by zx8080 on 11/22/2022, 10:57:05 AM
Am I the only one who noticed a bit unusual (or not?) ligature for 'st'?
by Gehinnn on 11/22/2022, 11:44:14 AM
Don't call your integer bound vars `start` and `end`, please. Either use `start` and `endExclusive`, or start and length; this greatly reduces confusion.
In my experience, half-open integer intervals lead to fewer `- 1`s in the code.
by jwilk on 11/22/2022, 6:19:15 PM
> You could try [T, T-1], but that's a bit clunky and it won't work if T is a decimal number.
Huh? What do they mean by "decimal number"?
by hans_castorp on 11/22/2022, 3:19:29 PM
FWIW, Postgres converts all ranges (= intervals in this context) for discrete data types to half-open intervals, even when a closed one was requested.
So the daterange '[2022-01-01, 2022-01-07]' will result in [2022-01-01,2022-01-08) and the integer range '[1,7]' will result in '[1,8)'
So it seems the Postgres devs agree with the author.
Edit: typo fixed for integer range
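A minimal sketch that mimics that canonicalization for integer range literals; the function is hypothetical, not part of any PostgreSQL client library, and it ignores unbounded and already-empty literals. An exclusive lower bound moves up by one, an inclusive upper bound moves up by one, and a range with nothing left in it becomes `empty`, which is how PostgreSQL displays such a range.

```python
def canonicalize_int_range(text: str) -> str:
    """Rewrite an integer range literal such as '[1,7]' or '(1,7)' in half-open form."""
    lo_bracket, hi_bracket = text[0], text[-1]
    lo_text, hi_text = text[1:-1].split(",")
    lo, hi = int(lo_text), int(hi_text)
    if lo_bracket == "(":
        lo += 1  # exclusive lower bound -> first included integer
    if hi_bracket == "]":
        hi += 1  # inclusive upper bound -> one past the last included integer
    return "empty" if lo >= hi else f"[{lo},{hi})"


print(canonicalize_int_range("[1,7]"))  # [1,8)
print(canonicalize_int_range("(1,7)"))  # [2,7)
```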
by return_to_monke on 11/22/2022, 3:35:51 PM
I am surprised I can't see this here. Wasn't anyone else taught [closed; closed] and ]open; open[ notation in school?
by nvartolomei on 11/22/2022, 7:54:07 PM
E. W. Dijkstra constructed a similar argument to argue that numbering should start at 0 https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
by green-in-gold on 11/22/2022, 5:40:27 PM
The other nice property of [closed, open) intervals is that they concatenate perfectly: [a, b) ++ [b, c) == [a, c)
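The concatenation property checked with Python's half-open `range` (the helper names are hypothetical). With closed intervals, the second piece has to start at b + 1 so that b is not counted twice.

```python
def half_open(lo: int, hi: int) -> list[int]:
    return list(range(lo, hi))


def closed(lo: int, hi: int) -> list[int]:
    return list(range(lo, hi + 1))


a, b, c = 3, 7, 12
# Half-open: the shared endpoint b is reused as-is, and lengths add: (b - a) + (c - b) == c - a.
print(half_open(a, b) + half_open(b, c) == half_open(a, c))  # True
# Closed: the second piece must start at b + 1.
print(closed(a, b) + closed(b + 1, c) == closed(a, c))  # True
```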
by alexchantavy on 11/22/2022, 7:20:47 PM
Dumb question but why is ‘[‘ called ‘closed’ and ‘)’ called ‘open’? I somehow would have thought the reverse.
by nequo on 11/22/2022, 5:01:43 PM
As it goes with maxims like this, it depends on the problem domain. In probability theory and statistics, a cumulative distribution function is defined as F(x) := Pr(X <= x), not Pr(X < x).
Like others are saying, it is consistency within the code base that probably matters the most.
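The difference between Pr(X <= x) and Pr(X < x) shows up directly when computing an empirical CDF. A minimal sketch over sorted samples (function names hypothetical): `bisect_right` counts samples ≤ x, matching the closed-at-x definition of F, while `bisect_left` counts samples < x. The two disagree exactly when x equals a sample value.

```python
from bisect import bisect_left, bisect_right


def ecdf(sorted_samples: list[float], x: float) -> float:
    """Empirical CDF F(x): fraction of samples <= x (closed at x)."""
    return bisect_right(sorted_samples, x) / len(sorted_samples)


def fraction_below(sorted_samples: list[float], x: float) -> float:
    """Fraction of samples < x, i.e. Pr(X < x); differs from F(x) at ties."""
    return bisect_left(sorted_samples, x) / len(sorted_samples)


samples = [1, 2, 2, 3]
print(ecdf(samples, 2), fraction_below(samples, 2))  # 0.75 0.25
```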
by runarberg on 11/22/2022, 5:32:22 PM
The notation here really bothered me. The author defines the [closed, open) interval [a, b) as the list of all numbers x that fulfill a ≤ x < b. Good so far, but when they talk about the empty interval [a, a), we run into a problem, because a ≤ x < a is a nonsensical statement: a cannot be both equal to and strictly less than itself.
I think this is a problem when borrowing math concepts to programming. What the author is really talking about here is slicing, not intervals, and the slicing behavior is hopefully well defined on the construct you are working with, most of the time in a manner that makes sense to each construct, or in a consistent manner to other related constructs in the language.
If the author stuck to programming concepts, I don’t think this would be a rule we should abide by, but rather a guideline which can be employed. And I think most programmers value consistency, so this really isn’t that much of an issue.
by bob1029 on 11/23/2022, 10:22:23 AM
I was just watching a YouTube series on arithmetic coding, and I was wondering why the intervals were specified this way. It makes a lot more sense now. The "b - a" use case is quite prevalent there.
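A toy sketch of why half-open intervals suit arithmetic coding, assuming a hypothetical fixed three-symbol model and exact `Fraction` arithmetic (a textbook illustration, not the method from any particular series). Each symbol owns a half-open slice of [0, 1), so neighbouring slices never overlap, and encoding repeatedly narrows [low, high) using the current width high - low.

```python
from fractions import Fraction

# Hypothetical model: each symbol owns a half-open slice [lo, hi) of [0, 1).
MODEL = {
    "A": (Fraction(0), Fraction(1, 2)),
    "B": (Fraction(1, 2), Fraction(3, 4)),
    "C": (Fraction(3, 4), Fraction(1)),
}


def encode_interval(message: str) -> tuple[Fraction, Fraction]:
    """Return the final half-open interval [low, high) that identifies message."""
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low  # the "b - a" of the current interval
        s_lo, s_hi = MODEL[sym]
        # Both right-hand sides use the old low before either name is rebound.
        low, high = low + width * s_lo, low + width * s_hi
    return low, high


low, high = encode_interval("AB")
print(low, high)  # 1/4 3/8
```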
by danans on 11/22/2022, 4:50:40 PM
What's with the funny connected "st" in the article? I've never seen that before.
by phoe-krk on 11/22/2022, 1:04:34 PM
How does [2, 2) make sense?
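One way to read [2, 2): no integer x satisfies 2 ≤ x < 2, so it is the empty interval, and its length is 2 - 2 = 0. It still has a position, which is what makes it useful. A small Python sketch:

```python
empty = range(2, 2)  # no integer x satisfies 2 <= x < 2
print(len(empty), list(empty))  # 0 []

# An empty slice still marks a position: assigning to it inserts at index 2.
xs = [0, 1, 3, 4]
xs[2:2] = [2]
print(xs)  # [0, 1, 2, 3, 4]

# [a, a) is also the zero-length piece of a split: [0, 2) ++ [2, 2) ++ [2, 5) == [0, 5).
print(list(range(0, 2)) + list(range(2, 2)) + list(range(2, 5)) == list(range(0, 5)))  # True
```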
by eurasiantiger on 11/22/2022, 6:57:10 AM
This is not universal advice. If it’s nonsensical for your app to have start-end events of zero length, nothing in the article applies, and using closed intervals does make things a bit cleaner.