> Never, ever, ever use [closed, closed] intervals
I'm not really a fan of "never," or "always" rules, when it comes to programming. I've found it's usually better to have "make sure to justify deviations from" heuristics.
I usually use [closed..open) ranges (as they are called in Swift), but sometimes, an inclusive range is a lot more appropriate, for expressing an operation (for example, I may express a range as start...end, as opposed to start..<(end + 1)).
I'll repeat the sentiment that "always" and "never" rules usually have their fair share of exceptions.
And while I agree that [closed, open) intervals are often the best choice... sometimes what you want is [closed, closed], or (open, open), and it's nice to use a language that makes that easy.
For example, Raku makes it easy to do:
[closed, closed] with ($a .. $b)
[closed, open) with ($a ..^ $b)
(open, open) with ($a ^..^ $b)
(open, closed] with ($a ^.. $b)
Ehh, this post misses the important tidbit.
You should keep your code consistent across your organization, so that a large number of programmers know how your code works. You should have a "default writing style", and the "default writing style" should be used unless you have very, very, very good reasons to avoid it. (And an errant +1 or -1 here and there isn't a good enough reason to switch.)
There are four styles of intervals. Let's say you want to represent a loop of 5 iterations numbered [555, 556, 557, 558, 559]. You've got:
* [closed, closed] -- [555, 559]
* [closed, open> -- [555, 560>
* <open, closed] -- <554, 559]
* <open, open> -- <554, 560>
There's not much difference to any of these four. As long as you pick a singular choice, get comfortable with its quirks, and make it consistent across your organization, you get benefits.
The main reason we use [closed, open> is that Dijkstra (father of structured programming back in the 1960s) argued for [closed, open> when presented with all these options (and argued for zero-based indexing as well).
The "[closed, closed]" set is one-too-small (559 - 555 == 4), so you need to add +1 to the representation.
The <open, open> set is one-too-large: (560 - 554) == 6. So this too seems prone to off by one error.
[closed, open> and <open, closed] both give the correct size of 5 when you subtract, but each "includes" a number that doesn't exist. In [closed, open>, the latter number isn't part of the array (560 is "one past the end"), while in <open, closed], the first number isn't part of the array.
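The size arithmetic above can be checked mechanically. This is only a sketch: the `STYLES` table and the helper names `members` and `size_error` are made up for illustration, and it assumes integer bounds.

```python
# Four ways to describe the same five iterations: 555, 556, 557, 558, 559.
# Each entry is (low, high, low_is_included, high_is_included).
# These names are illustrative and do not come from any library.
STYLES = {
    "[closed, closed]": (555, 559, True, True),
    "[closed, open>": (555, 560, True, False),
    "<open, closed]": (554, 559, False, True),
    "<open, open>": (554, 560, False, False),
}

def members(low, high, low_included, high_included):
    """Expand an integer interval into the list of values it contains."""
    first = low if low_included else low + 1
    last = high if high_included else high - 1
    return list(range(first, last + 1))

def size_error(low, high, low_included, high_included):
    """How far (high - low) is from the true number of elements."""
    return (high - low) - len(members(low, high, low_included, high_included))

for name, bounds in STYLES.items():
    print(name, members(*bounds), "high - low is off by", size_error(*bounds))
```

All four styles expand to the same five numbers; only the two half-open styles have `high - low` equal to the element count.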
Make of that what you will. [closed, open> became a programmer convention for these reasons. The important bit is to know all the quirks / off-by-one errors associated with this representation.
Half-closed is generally good because it tends to reduce off-by-one errors.
But this article overstates the case. Especially for floating-point, where the distinction between a < b and a <= b is not straightforward. Sometimes you're modeling closed intervals, and in that case using closed intervals is the right decision.
The "splitting by time" section should probably just be removed, since it confuses the point and doesn't add anything. The scenario doesn't really make sense (if you wanted this, you'd store the registration time, and to get the hour you'd be better off truncating the value, not using an interval). Also, if you're going to be doing math on time values, you'd better know the precision of the values you're working with (among many other things). Intervals, however closed or open, aren't going to help you there.
Maybe I shouldn't criticize so much, since I agree with the general point. But this makes the case awkwardly.
There are convincing cases one can make in favour of half-open intervals (at least in certain circumstances), but this isn't it.
This is just a rambling, absolutist mess.
Why would you want a half-open interval when booking an AirBnB or flight? If I search for flights from February 24 to February 24 I don't expect the empty interval.
Quite a few APIs use a pair of `{start, length}` instead, which in the context of the post's example, is even clearer. Empty interval would be `length == 0`, time interval would be a single array of `starts`, etc. Fewer subtractions (to get length) usually end up nicer too.
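A rough sketch of what a `{start, length}` pair might look like in Python. The `Span` class and `consecutive_spans` helper are hypothetical names for illustration, not taken from any particular API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """Hypothetical {start, length} interval over integers."""
    start: int
    length: int

    def is_empty(self) -> bool:
        # The empty interval needs no special endpoint trick.
        return self.length == 0

    def end_exclusive(self) -> int:
        return self.start + self.length

    def __contains__(self, x: int) -> bool:
        return self.start <= x < self.start + self.length

def consecutive_spans(starts, total_end):
    """Build back-to-back spans from a sorted list of starts.

    Each span ends where the next one begins; the last ends at total_end.
    """
    ends = list(starts[1:]) + [total_end]
    return [Span(s, e - s) for s, e in zip(starts, ends)]

spans = consecutive_spans([0, 10, 25], 40)
```

Here a run of adjacent intervals is stored as a single list of starts plus one final end, and the length is available without a subtraction at each use site.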
It's the default in Python, and it's more helpful than not, so I would say it's a good design decision. But "always" is a dangerous word in engineering, and in this case, definitely not warranted.
Case in point, last week I worked with a list of dates, and I needed the last date to bracket my sliding windows as a time period cleanly.
I think this advice could be better summed up into: to minimise off-by-one errors, choose a consistent strategy for describing intervals, and stick with it as much as is sensible.
Just for completeness I'll mention that there is another style:
start, count
This seems to be popular in the .NET ecosystem.
I have recently been on the other side of this argument for a specific case.
In our case we (users of our API) are to specify date ranges, representing a list of partitions. So we are not counting nights between dates, but rather a set of daily or hourly buckets.
Here (maybe even only here) I argue that inclusive ranges feel more intuitive.
I find it much more intuitive to represent the 1st 7 days of January as
['2022-01-01', '2022-01-07']
compared to
['2022-01-01', '2022-01-08').
Another very common example is to specify the last 7 days (incl 'today') in which case I find
[today().minusDays(6), today()]
to be a clearer representation than
[today().minusDays(6), today().plusDays(1))
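For comparison, here is how the two spellings of "the last 7 days" might look with Python's `datetime`. The helper names `days_inclusive` and `days_half_open` are made up for this sketch, and "today" is pinned to a fixed date so the result is reproducible.

```python
from datetime import date, timedelta

def days_inclusive(first: date, last: date):
    """All days d with first <= d <= last."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]

def days_half_open(start: date, end_exclusive: date):
    """All days d with start <= d < end_exclusive."""
    return [start + timedelta(days=i) for i in range((end_exclusive - start).days)]

today = date(2022, 1, 7)  # fixed date, chosen only for the example

# Inclusive: [today - 6 days, today]
last_week_closed = days_inclusive(today - timedelta(days=6), today)

# Half-open: [today - 6 days, today + 1 day)
last_week_half_open = days_half_open(today - timedelta(days=6), today + timedelta(days=1))
```

Both calls yield the same seven dates; the difference is whether the `+ 1` lives in the caller's arguments or inside the helper.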
*half-open intervals. There have been times I needed an (open, closed] interval. But there have also been times when I wanted a fully closed interval because I was setting an arbitrary limit and it's easier to tell a user "100 is the maximum" versus "it must be below 100", so what value should be put to max it out, 99? 99.99? etc.
To add some nuance I'd say that if you're dividing a larger interval into smaller subintervals then a half-open one is probably what you want.
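A small sketch of that subdivision case, assuming integer bounds. The function name `split_half_open` is hypothetical.

```python
def split_half_open(start, end_exclusive, step):
    """Split [start, end_exclusive) into consecutive half-open chunks.

    Every chunk has at most `step` elements; the end of one chunk is
    the start of the next, so nothing is skipped or counted twice.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [
        (lo, min(lo + step, end_exclusive))
        for lo in range(start, end_exclusive, step)
    ]

chunks = split_half_open(0, 10, 4)  # (0, 4), (4, 8), (8, 10)

# Expanding the chunks gives back every element exactly once.
covered = [x for lo, hi in chunks for x in range(lo, hi)]
```

With closed intervals the same split would need a `- 1` on each chunk end and a `+ 1` on each following start.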
I know Dijkstra's paper and it's short, good and should be read but this article is wrong in saying always. It feels like a newbie programmer came across a good thing then lost all proportion; use the right tool for the right job, as ever.
This depends on use case. If you are doing a frontend layer on top of closed-open, then the frontend will have to handle the points the article is rambling about.
Users are used to selecting a date range in [closed, closed] format, for example.
Wholeheartedly agree that sticking - where possible - to [closed, open) is a good idea. It has helped me tremendously when implementing, e.g., computational geometry algorithms. Robust triangle rasterization comes to mind.
Another interesting point: in the weird corner of the world where I grew up, half-open intervals were always denoted : [low_bound, hi_bound[
I am of course completely biased, but I've always found this notation much more elegant and intuitively obvious than the [low_bound, hi_bound) that seems to be the prevalent norm in the anglo world.
Using '[' after the upper bound clearly shows that we're open at the top whereas the ')' is fairly arbitrary.
And while I'm on the topic of weird culture-induced quasi-arbitrary biases: I had a math teacher that would bark (and I mean BARK!) at us if we ever used '>' in inequalities.
The justification was that with this constraint, all inequalities ended up written and laid out with its two members respecting the standard "left-to-right" drawing of the real line, which made it much easier to picture what was going on geometrically.
It also enforced consistency throughout a long demonstration - one less thing added to the cognitive load.
He was made fun of a lot by the student body, of course, but later in life, as a programmer, I have found myself sticking to the habit and I always force myself to mostly use '<', very rarely '<=', and almost never '>' and '>='.
I find this makes code much more readable, just like back in the days of my old teacher with math inequalities, and pretty much for the exact same reasons.
Of course, doing that does not help at all when reading other folks' code, those uncivilized heretical users of the 'greater than' form.
Python uses "closed open" intervals with `range(0, n)`; the reverse is then `range(n - 1, -1, -1)`, which is highly unintuitive. Combined with 0-based array indexing, this makes certain algorithms very cumbersome. For example, the Knuth shuffle. In Python this is:
    from random import randrange
    x = [10, 20, 30, 40, 50]
    for i in range(len(x) - 1, 0, -1):
        r = randrange(i + 1)
        x[i], x[r] = x[r], x[i]
    print(x)
With 1-based indexing and inclusive ranges it would be much more understandable:
    a[] = [ 10 20 30 40 50 ]
    for i = len a[] downto 2
        r = random i
        swap a[i] a[r]
    end
    print a[]
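For what it's worth, Python also lets you write the descending loop as `reversed(range(...))`, which avoids spelling out the `-1` endpoints by hand. A sketch of the same shuffle in that style follows; the function name `knuth_shuffle` and the seeded `Random` instance are choices made here for the example, not part of the comment above.

```python
from random import Random

def knuth_shuffle(items, rng):
    """In-place Fisher-Yates shuffle, walking i from the last index down to 1."""
    for i in reversed(range(1, len(items))):
        r = rng.randrange(i + 1)  # 0 <= r <= i
        items[i], items[r] = items[r], items[i]
    return items

n = 5
# reversed(range(n)) visits the same indices as range(n - 1, -1, -1).
assert list(reversed(range(n))) == list(range(n - 1, -1, -1))

x = knuth_shuffle([10, 20, 30, 40, 50], Random(0))  # seeded only for repeatability
print(x)
```

The `i + 1` in `randrange(i + 1)` remains, since the swap partner has to be drawn from the inclusive range 0..i.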
This is one of those "well, of course!" pieces.
But I've bookmarked it, in case I run into someone who thinks they disagree, in which case I can offload the explanation.
I had a similar incident with colleagues who had discovered the "Default" trait and starting adding defaults to everything, including things that didn't have good defaults, and things where they didn't mean default but actually something quite specific such as "empty". The canonical "don't do that!" blog post didn't exist, so I had to create one.
I think this has to do with the nature of the metric underneath. Closed-open intervals are the way to go for integers. However, they don't seem to be a good fit for sampling from continuous space.
Closed/open makes sense for continuous measures; for integers, closed/closed is more readable.
I prefer [a, b) intervals because they encode two pieces of important information directly:
1. The first element (a)
2. The length (b-a)
Which are what we most often need.
Am I the only one who noticed a bit unusual (or not?) ligature for 'st'?
Don't call your integer bound vars `start` and `end` please. Either use `start` and `endExclusive` or `start` and `length` - this greatly reduces confusion.
In my experience, half-open integer intervals lead to fewer `- 1` in the code.
> You could try [T, T-1], but that's a bit clunky and it won't work if T is a decimal number.
Huh? What do they mean by "decimal number"?
FWIW, Postgres converts all ranges (=interval in this context) for discrete data types to half open intervals even when a closed one was requested.
So the daterange '[2022-01-01, 2022-01-07]' will result in [2022-01-01,2022-01-08) and the integer range '[1,7]' will result in '[1,8)'
So it seems the Postgres devs agree with the author.
Edit: typo fixed for integer range
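The normalization described above is easy to mimic for integers. This is an illustrative sketch only: `to_half_open` is a made-up helper and says nothing about how Postgres implements it internally.

```python
def to_half_open(low, high, low_inclusive=True, high_inclusive=True):
    """Rewrite an integer interval as (start, end_exclusive).

    Illustrative only: this mimics the behaviour described in the
    comment above for discrete types, not Postgres's actual code.
    """
    start = low if low_inclusive else low + 1
    end_exclusive = high + 1 if high_inclusive else high
    return start, end_exclusive

# '[1,7]' and '(0,8)' describe the same integers as '[1,8)'.
canonical = to_half_open(1, 7)
```

Because every discrete interval has exactly one half-open spelling, two ranges can be compared for equality by comparing their canonical bounds.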
I am surprised I can't see this here. Wasn't anyone else taught [closed; closed] and ]open; open[ notation in school?
E. W. Dijkstra constructed a similar argument to argue that numbering should start at 0 https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
The other nice property of [closed, open) intervals is that they concatenate perfectly: [a, b) ++ [b, c) == [a, c)
Dumb question but why is "[" called "closed" and ")" called "open"? I somehow would have thought the reverse.
As it goes with maxims like this, it depends on the problem domain. In probability theory and statistics, a cumulative distribution function is defined as F(x) := Pr(X <= x), not Pr(X < x).
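To make the `<=` versus `<` distinction concrete, here is a sketch of an empirical CDF over a sorted sample using the standard `bisect` module. The function names `ecdf` and `strictly_below` are invented for this example.

```python
from bisect import bisect_left, bisect_right

def ecdf(sorted_sample, x):
    """Empirical F(x): fraction of observations <= x (closed on the right)."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def strictly_below(sorted_sample, x):
    """Fraction of observations < x, shown for comparison."""
    return bisect_left(sorted_sample, x) / len(sorted_sample)

sample = [1, 2, 2, 3]  # must already be sorted

at_two = ecdf(sample, 2)               # counts 1, 2, 2
below_two = strictly_below(sample, 2)  # counts only 1
```

The two functions differ exactly at points where the sample has mass, which is why the choice of closed or open matters for discrete distributions.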
Like others are saying, it is consistency within the code base that probably matters the most.
The notation here really bothered me. The author defines the [closed, open) interval [a, b) as the list of all numbers x that fulfill a ≤ x < b. Good so far, but when they talk about the empty interval [a, a) we get into a problem, because a ≤ x < a is a nonsensical statement. a cannot be equal to and strictly less than itself.
I think this is a problem when borrowing math concepts to programming. What the author is really talking about here is slicing, not intervals, and the slicing behavior is hopefully well defined on the construct you are working with, most of the time in a manner that makes sense to each construct, or in a consistent manner to other related constructs in the language.
If the author would stick with programming concepts, I don't think this is a rule we should abide by, rather, a guideline which can be employed. And I think most programmers value consistency, so this really isn't that much of an issue.
I was just watching YouTube series on arithmetic coding and I was wondering why the intervals were specified this way. Makes a lot more sense now. The "b - a" use case is quite prevalent there.
What's with the funny connected "st" in the article? I've never seen that before.
How does [2, 2) make sense?
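One way to read it, sketched in Python: the condition 2 <= x < 2 is simply false for every x, so the interval contains nothing. The helper name `in_half_open` is made up for this example.

```python
def in_half_open(x, a, b):
    """True when a <= x < b."""
    return a <= x < b

candidates = range(-5, 6)

# No x satisfies 2 <= x < 2, so [2, 2) selects nothing.
empty = [x for x in candidates if in_half_open(x, 2, 2)]

# Moving the upper bound by one admits exactly one value.
single = [x for x in candidates if in_half_open(x, 2, 3)]

# Python's own range follows the same convention.
builtin_empty = list(range(2, 2))
```

The predicate is perfectly well defined; it just happens to be unsatisfiable, which is what an empty set is.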
This is not universal advice. If it's nonsensical for your app to have start-end events of zero length, nothing in the article applies, and using closed intervals does make things a bit cleaner.
Half-open intervals are why I try as much as possible to stay away from languages that use 1-based indexing (Lua, Julia, Matlab, R, ...) - 1-based indexing lends itself to closed intervals because an array of N elements has [1,N] as its index range, whereas 0-based indexing lends itself to half-open intervals because an array of N elements has [0,N) as its index range.
-------
However, I know of one case where closed intervals really shine. Consider displaying a zoomable map in tiles. On a given zoom level, each tile has some coordinates (x;y) where x and y are integers denoting the virtual column and row. Suppose that we allow zooming out by a factor 2, so that two-by-two tiles are aggregated into a single tile. Then a natural choice for the coordinates of the zoomed-out tile are (floor(x/2);floor(y/2)), that is, divide by two and round down. Suppose that a dataset has data on tile coordinates [x1,x2]×[y1,y2], meaning that there's only data on tiles (x;y) where x1≤x≤x2 and y1≤y≤y2. These are closed intervals, but stay with me - the reason they are nice in this case is because of how you compute the range of valid tile coordinates when you zoom out: The range becomes [floor(x1/2),floor(x2/2)]×[floor(y1/2),floor(y2/2)] - that is, you simply divide the range endpoints by two and round down. If you try to do this with half-open intervals, then you need some +1/-1 shenanigans, which are normally what I try to avoid by going for half-open intervals.
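A one-dimensional sketch of that zoom-out step, with both representations side by side. The function names are hypothetical, and the half-open version assumes a non-empty range.

```python
def zoom_out_closed(lo, hi):
    """Closed tile range [lo, hi] -> closed range one zoom level out."""
    return lo // 2, hi // 2  # // is floor division in Python

def zoom_out_half_open(lo, hi_exclusive):
    """Half-open tile range [lo, hi_exclusive) -> half-open range one level out.

    Assumes lo < hi_exclusive. The -1/+1 converts to the last included
    tile, halves it, then converts back to an exclusive bound.
    """
    return lo // 2, (hi_exclusive - 1) // 2 + 1

# Tiles 3..6 aggregate into zoomed-out tiles 1..3.
closed = zoom_out_closed(3, 6)        # closed range [1, 3]
half_open = zoom_out_half_open(3, 7)  # half-open range [1, 4)
```

Applying the same function to the y range gives the second factor of the product.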