Researchers Have Found A Faster Way To Do Integer Linear Programming

Quanta Magazine
That’s not to say it’s easy work. It wasn’t until 1983 that the mathematician Hendrik Lenstra proved that the general problem was even solvable, providing the first algorithm that could do it. Lenstra thought about ILP geometrically. First, he turned the inequalities at the heart of ILP into a convex shape, such as any regular polygon. This shape represents the constraints of the individual problem you’re solving, whether it’s couch production or airline scheduling, so the shape’s interior corresponds to all possible values that could solve the inequalities, and thus the problem. Lenstra called this shape the convex body. The problem’s dimension influences the dimension of this shape: With two variables it takes the form of a flat polygon; in three dimensions it is a Platonic solid, and so on.
Lenstra next imagined all the integers as an infinite set of grid points, known in mathematics as a lattice. A two-dimensional version looks like a sea of dots, and in three dimensions it looks like the points where steel beams in a building connect. The dimension of the lattice also depends on the dimension of a given problem.
To solve a given ILP problem, Lenstra showed that you just look for where the possible solutions meet the set of integers: at the intersection of the convex body and the lattice. And he came up with an algorithm that could search this space exhaustively — but to be effective, it sometimes had to break the problem up into pieces of smaller dimensions, adding many steps to the runtime.
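A toy way to see this framing in code (purely illustrative brute force, not Lenstra's algorithm, and the constraints below are invented): a two-variable ILP is just "find the best lattice point inside a convex region".

    # Toy illustration: the "convex body" is the region satisfying the
    # inequalities, and the candidate solutions are the lattice (integer)
    # points inside it. Brute force works in 2D but not in high dimension.
    def lattice_points_in_body(bound=20):
        points = []
        for x in range(bound + 1):
            for y in range(bound + 1):
                if x + 2 * y <= 8 and 3 * x + y <= 9:   # the constraints
                    points.append((x, y))
        return points

    # Pick the lattice point that maximizes a linear objective, e.g. 2x + 3y.
    best = max(lattice_points_in_body(), key=lambda p: 2 * p[0] + 3 * p[1])
    print(best)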
In the following years, several researchers explored how to make this algorithm run faster. In 1988, Ravi Kannan and László Lovász introduced a concept called the covering radius, borrowed from the study of error-correcting codes, to help the convex body and lattice intersect more efficiently. Roughly, the covering radius makes sure that the convex body always contains at least one integer point, no matter where you place it on the lattice. As a result, the size of the covering radius also determines how efficiently you can solve the ILP problem.
So it all came down to determining the size of the ideal covering radius. Unfortunately, this proved to be a hard problem on its own, and the best Kannan and Lovász could do was narrow down a possible value by searching for upper and lower bounds. They showed that the upper bound — the maximum size of the covering radius — scaled up linearly with the dimension. This was pretty fast, but not enough to significantly speed up the ILP runtime. Over the next 30 years, other researchers could only do slightly better.
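For readers who want the precise object behind that description, the covering radius of a convex body K with respect to a lattice Λ is usually defined (this is the standard definition, not anything specific to the new paper) as

    \mu(K, \Lambda) = \min \{ t > 0 : \Lambda + tK = \mathbb{R}^n \},

the smallest scaling factor at which every translate of tK is guaranteed to contain a lattice point; the upper and lower bounds discussed here are bounds on this quantity.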
What ultimately helped Reis and Rothvoss break through was an unrelated mathematical result that focused purely on lattices. In 2016, Oded Regev and Stephens-Davidowitz showed, in effect, how many lattice points could fit within a specific shape. Reis and Rothvoss applied this to other shapes, which allowed them to better estimate the number of lattice points contained in an ILP covering radius, lowering the upper bound. “The latest breakthrough came with the realization that you can actually do other kinds of shapes,” Regev said.
This new tightened upper bound was a vast improvement, allowing Reis and Rothvoss to achieve a dramatic speedup of the overall ILP algorithm. Their work brings the runtime to (log n)^O(n), where n is the number of variables and the exponent O(n) means it scales linearly with n. (This expression is considered "almost" the same as the run time of the binary problem.)
“It’s a triumph at the intersection of math, computer science and geometry,” said Daniel Dadush of the national research institute CWI in the Netherlands, who helped pioneer the algorithm Reis and Rothvoss used to measure the ILP runtime.
For now, the new algorithm hasn’t actually been used to solve any logistical problems, since it would take too much work updating today’s programs to make use of it. But for Rothvoss, that’s beside the point. “It’s about the theoretical understanding of a problem that has fundamental applications,” he said.
As to whether the computational efficiency of ILP could be further improved, researchers are still hopeful that they’ll keep approaching the ideal runtime — but not anytime soon. “That would require a fundamentally new idea,” Vempala said.
Score: 347
Author: pseudolus
So many discrete optimization problems can be translated into linear programs. It's a really powerful set of tools to know, kind of like SAT solvers.
I only recently learned about linear programming. I started with PuLP and Python to get a grasp. It was one of those "How did I miss this??" moments as a developer.
Do you have any recommendations on where to start?
I wish I could remember how I even learned LP tools existed. I started with this: https://coin-or.github.io/pulp/
The Google OR-tools library is also a good starting point.
I learned about linear programming in uni, but alas I don't think a mathematician's course on linear programming would be a good starting point for practical programmers.
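To make the PuLP suggestion above concrete, here is a minimal hedged sketch (assuming PuLP and its bundled CBC solver are installed; the products and numbers are invented):

    # Tiny ILP in PuLP: maximize profit from two products under two
    # resource limits, with integer production quantities.
    from pulp import LpProblem, LpMaximize, LpVariable, LpStatus, value

    prob = LpProblem("toy_production", LpMaximize)
    couches = LpVariable("couches", lowBound=0, cat="Integer")
    chairs = LpVariable("chairs", lowBound=0, cat="Integer")

    prob += 300 * couches + 90 * chairs      # objective: total profit
    prob += 4 * couches + 1 * chairs <= 40   # labor hours available
    prob += 6 * couches + 2 * chairs <= 72   # material available

    prob.solve()                             # CBC by default
    print(LpStatus[prob.status], value(couches), value(chairs), value(prob.objective))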
Winston's "Operations Research: Applications and Algorithms" is the authority so far as I can tell. Trivial to find old editions online.
For positive integers m and n, have an m x n matrix A of real numbers. Then also have n x 1 x, 1 x n c, and m x 1 b. Seek x to solve LP1:
maximize z = cx
subject to
Ax = b
x >= 0
Instead just as easily can do minimize.
Instead of =, might be given >= or <=; add slack and/or surplus variables to get the problem in the form of LP1.
Any x so that
Ax = b
x >= 0
is feasible. If there is such an x, then LP1 is feasible; else LP1 is infeasible. If LP1 is feasible and z is bounded above over the feasible x, then LP1 is bounded and has an optimal solution x (z as large as possible). Else a feasible LP1 is unbounded above.
So, LP1 is feasible or not. If feasible, then it is bounded or not. If bounded, then there is at least one optimal solution.
Regard n x 1 x as a point in R^n for the real numbers R.
Cute: If all the numbers in LP1 are rational, then have no need for the reals.
The set of all feasible x is the feasible region and is closed (in the usual topology of R^n) and convex. If LP1 is bounded, then there is at least one optimal x that is an extreme point of the feasible region. So, it is sufficient to look only at the extreme points.
To determine if LP1 is feasible or not, and if feasible bounded or not, and if bounded to find an optimal x, can use the simplex algorithm which is just some carefully selected linear algebra elementary row operations on
z = cx
Ax = b
The iterations of the simplex algorithm have x move from one extreme point to an adjacent one that is as good or better on z.
A LOT is well known about LP1 and the simplex algorithm. There is a simpler version for a least cost network flow problem where move from one spanning tree to another.
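If anyone wants to poke at LP1 numerically without writing their own simplex code, here is a minimal hedged sketch with SciPy's linprog (my tool choice, not the commenter's; linprog minimizes, so pass -c to maximize):

    # LP1: maximize z = cx subject to Ax = b, x >= 0.
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 1.0, 1.0],
                  [2.0, 1.0, 0.0]])
    b = np.array([10.0, 8.0])
    c = np.array([4.0, 2.0, 1.0])

    res = linprog(-c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
    if res.success:
        print("optimal x:", res.x, "z =", -res.fun)
    else:
        print("infeasible or unbounded:", res.message)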
If insist that the components of x be integers, then are into integer linear programming and the question of P = NP. In practice there is a lot known about ILP, e.g., via G. Nemhauser.
I used to teach LP at Ohio State -- there are lots of polished books from very elementary to quite advanced.
I attacked some practical ILP problems successfully.
I got into ILP (set covering, a darned clever idea since get to handle lots of goofy, highly non-linear constraints, costs, etc. efficiently) for scheduling the fleet at FedEx. The head guy at FedEx wrote me a memo making that problem my work -- officially I reported to the Senior VP Planning, but for that ILP work in every real sense reported to the head guy. The promised stock was very late, so I went for a Ph.D. and got good at lots of math, including optimization, LP, and ILP, etc.
Conclusion: A career in LP or ILP is a good way to need charity or sleep on the street -- literally, no exaggeration.
For some of what AI is doing or trying to do now, LP/ILP stands to be tough competition, tough to beat. And same for lots more in the now old applied math of optimization. Bring a strong vacuum cleaner to get the thick dust off the best books.
People really need to come up with better names. "Linear Programming" or "Integer Linear Programming" mean absolutely nothing.
Also, anything dealing with finding the minimum distance can be short-circuited by keeping the shortest distance found so far and not taking paths that exceed it. This is how approximate nearest neighbor works, and it can still speed up the full solution. Figuring out full paths that have short average distances first can also get to shorter distances sooner.
You can also cluster points knowing you probably don't want to jump from one cluster to another multiple times.
This made me wonder why it's called programming (since clearly it's not the sense of the word programming most HN'ers are used to).
https://en.wikipedia.org/wiki/Mathematical_optimization#Hist...
From that link:
Programming in this context does not refer to computer programming, but comes from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.
Is that also true of 'dynamic programming'?
If you don't know, you are in for a treat. Here is Bellman's own description of how he came up with the term "dynamic programming":
I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, 'Where did the name, dynamic programming, come from?'
The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence.
You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose?
In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense.
It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities (Autobiography, p. 159).
See: https://pubsonline.informs.org/doi/pdf/10.1287/opre.50.1.48....
People up-thread have been getting cranky about the use of "programming" in this sense.
Now of course, "programming" for optimization has been well-entrenched since the 1970s at least. But perhaps Bellman's story does give some cover to those who feel the word "programming" has been wrongly appropriated?
Oh gosh — I was vastly out of the loop: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Thanks! That's a classic for sure.
Take "programming" to mean "scheduling" and the ancient crusty term acquires some meaning.
Absolutely nothing? You mean other than the relationship with linear systems and linear algebra?
Linear programming solvers use lots of heuristics (not entirely unlike the ones you sketched) internally. The important thing to keep in mind is that their heuristics only speed up the time spent finding the optimal solution. But you still get a proof at the end that they actually found the optimal solution. (Or if you stop earlier, you get a provable upper bound estimate of how far you are away at worst from the optimal solution.)
Heuristics like the ones you sketched don't (easily) give you those estimates.
Software engineers interested in ML/algorithms should learn about linear programming. It's surprising how many problems can be formulated as linear optimization.
For example, in college I was talking to my Industrial Engineer friend about the average minimum number of swaps required to place billiards balls in an acceptable starting position in the rack (triangle). We both happened to write programs that used monte-carlo sampling to solve it - but my solution did BFS on the state space of a graph, and his used linear programming (which was _probably_ more efficient)
ILP is NP-complete.
Yes? We do manage to solve ILP problems in practice quite nicely. In fact, most NP problems that you come across in practice are relatively tractable for most practical instances.
Eg for the knapsack problem you have to actually work very hard to get a hard instance in the first place.
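One hedged way to see why typical knapsack instances are tame: with integer weights of moderate size, the textbook dynamic program runs in O(n * capacity), so hardness only bites when the numbers themselves are enormous. A sketch:

    # Classic 0/1 knapsack DP, O(n * capacity) time.
    def knapsack(values, weights, capacity):
        best = [0] * (capacity + 1)
        for v, w in zip(values, weights):
            for cap in range(capacity, w - 1, -1):  # go downward so each item is used once
                best[cap] = max(best[cap], best[cap - w] + v)
        return best[capacity]

    print(knapsack([60, 100, 120], [10, 20, 30], 50))  # -> 220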
That's not correct. First of all, you can't solve a neoclassical economy using LP, because equilibrium constraints can only be represented as complementarity constraints. You would have to give up on some aspects, like dynamic prices.
The linear complementarity problem in itself is NP-hard. So you're screwed from the get-go, because your problems are now LPCC problems. Good luck finding an LPCC solver. I can confirm that an open source QPCC solver exists though, which should be even slower.
Next is the fact that if you wanted to build a neoclassical economy model, only global optimization will do.
This means that you need to simulate every time step in one large LPCC model, instead of using a finite horizon. Due to the perfect information assumption, you must know about the state of every person on the planet. You're going to need millions of variables due to simple combinatorial explosion.
It's kind of startling how these assumptions, which are supposed to make analytical solutions tractable by the way, also make non-analytical solutions literal hell.
And before you say that prices can be determined iteratively, as I mentioned, you would run into the problem that future prices are unknown to you, so how are you going to plug them into the second time step? The very thing you want to calculate depends on its future value.
Economics is a weird science, where experienced reality works much better than the theory.
Computational economics is a relatively new field where intelligent agents are used with lots of runs instead of general optimization solvers, I believe. Pretty nifty. One of my colleagues publishes a good bit on it.
It's not. It's NP-hard. The easiest proof is that the best known algorithm is greater than O(2^N).
Just because it is NP-hard in the worst case doesn't mean it is not practical, as can be seen in the many theorems on the conditions under which the regular polynomial-time LP algorithm provides an integer solution.
I foresee a future where industrial engineering and CS are combined into some super-degree. There is currently a surprising amount of overlap in the OR side of things, but I'm shocked by how few IE grads can program their way out of a box. It's a shame, really.
CS already is the super-degree.
How so?
It qualifies you for an opinion on any subject.
A CS degree also qualifies you for on-the-job training in writing code, that odious task that your professors find trivial but somehow are also terrible at it.
We just don't have time. Incentives are elsewhere. Any time devoted to writing good code for a paper is time we cannot use to work on the next paper, (shudder) grant application, or a plethora of other things that we are either forced or incentivized to do. I miss coding from when I was in a more junior stage of my career and could afford time for it, and I think my fellow professors mostly feel the same; I don't think many would dismiss it as trivial or odious.
Operations Research is basically Industrial Engineering + Mathematical Optimization + programming familiarity. It's super useful.
> but I'm shocked by how few IE grads can program their way out of a box. It's a shame, really.
These days, you could replace the "IE" in your sentence by any of many, many disciplines and still be correct.
As much as mathematicians will hate to hear this, CS is a new and more tangible/practical way to do maths and should therefore hold a spot in a general education as central as maths has in the last few centuries.
One of my favourite courses in grad school was approximation algorithms and it involved reductions to LP. Lots of fun, can recommend.
Great short article. I haven't looked deeply into the math behind this yet, but this looks to be a preprint [0]. It doesn't appear they're looking directly at the Space Groups as a way to reduce out any symmetries or repetitions that may occur (thus generalizing simplifications of the problem "space"), but it would be interesting to see whether those structures apply or not. I say this as someone who writes software to apply the Space Groups and describe the Voronoi cells around points (or groups of points) distributed through them, so I'm familiar with the "uncanny" ways effects propagate. [1]
I'm also not a mathematician (just a lowly architect), so I'm way out of my depth here. But it's fascinating, and as someone looking at paths across these generated honeycombs, this result bears more investigation for me as well.
[0] https://arxiv.org/pdf/2303.14605.pdf
[1] If you know a mathematician who might be interested in collaborating on this kind of work, ping me. This is ongoing work, and as I said I'm out of my depth mathematically. But I have run into some interesting properties that don't seem that deeply investigated, which may bear deeper study by an actual expert.
This discovery may change the world in unpredictable and perhaps very big ways. Discoveries like this put all the self-important feature / model developers that we work with in our big tech day jobs into context.
Can you explain specifically what about it you think will change the world, and why?
About the travelling salesperson problem, below is a quote from the latest Sapolsky's book Determined: A Science of Life without Free Will. I am not sure how relevant this is for software developers, but still fascinating:"An ant forages for food, checking eight different places. Little ant legs get tired, and ideally the ant visits each site only once, and in the shortest possible path of the 5,040 possible ones (i.e., seven factorial). This is a version of the famed “traveling salesman problem,” which has kept mathematicians busy for centuries, fruitlessly searching for a general solution. One strategy for solving the problem is with brute force— examine every possible route, compare them all, and pick the best one. This takes a ton of work and computational power— by the time you’re up to ten places to visit, there are more than 360,000 possible ways to do it, more than 80 billion with fifteen places to visit. Impossible. But take the roughly ten thousand ants in a typical colony, set them loose on the eight- feeding- site version, and they’ll come up with something close to the optimal solution out of the 5,040 possibilities in a fraction of the time it would take you to brute- force it, with no ant knowing anything more than the path that it took plus two rules (which we’ll get to). This works so well that computer scientists can solve problems like this with “virtual ants,” making use of what is now known as swarm intelligence."
There's been more than a few of these "nature solves NP-hard problems quickly!" kinds of stories, but usually, when one digs deeper, the answer is "nature finds local optima for NP-hard problems quickly!" and the standard response is "so do pretty trivial computer algorithms."
In the case of TSP with a Euclidean metric (i.e., each node has fixed coordinates, and the cost of an edge is the Euclidean distance between its endpoints), we can actually give you a polynomial-time algorithm to find a tour within a factor 1 + ε of the optimal solution (albeit with a running time exponential in 1/ε).
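For a sense of what "pretty trivial computer algorithms" means here, a hedged sketch of 2-opt local search on random points; like the ants, it settles into a local optimum with no guarantee of global optimality:

    # Start from an arbitrary tour and repeatedly reverse segments (2-opt)
    # while that shortens the tour; stop at a local optimum.
    import math, random

    def tour_length(tour, pts):
        return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def two_opt(pts):
        tour, improved = list(range(len(pts))), True
        while improved:
            improved = False
            for i in range(1, len(pts) - 1):
                for j in range(i + 1, len(pts)):
                    cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                    if tour_length(cand, pts) < tour_length(tour, pts):
                        tour, improved = cand, True
        return tour

    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(8)]
    print(tour_length(two_opt(pts), pts))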
https://scottaaronson.blog/?p=266
"""I went to the hardware store, bought some glass plates, liquid soap, etc., and found that, while Nature does often find a minimum Steiner tree with 4 or 5 pegs, it tends to get stuck at local optima with larger numbers of pegs."""
"Did he try jiggling it a bit, and then less and less and less?"( Annealing /s )
If you try to make your path close to a circle, it's obviously not guaranteed to be optimal, but it'll probably be close enough for most small practical applications.
The Evolutionary Computation Bestiary [1] lists a wide variety of animal-behavior-inspired heuristics. The foreword includes this great disclaimer:
"While we personally believe that the literature could do with more mathematics and less marsupials, and that we, as a community, should grow past this metaphor-rich phase in our field's history (a bit like chemistry outgrew alchemy), please note that this list makes no claims about the scientific quality of the papers listed."
I have a dumb question: how long will it take before this result becomes a practical MIP solver beating SCIP or Gurobi?
Couldn't either of those implement this algorithm?
Don't forget about the HiGHS solver [1]. MIT licensed and getting to the point where it's outperforming SCIP on the Mittelmann benchmarks [2].
[1]: https://github.com/ERGO-Code/HiGHS
[2]: https://mattmilten.github.io/mittelmann-plots/
HiGHS is more of an alternative to Bonmin and Minotaur than Couenne and SCIP. In my experience though the presolvers in SCIP are extremely dangerous, and it's easy to end up with a local optimum even when that isn't your goal.
> For now, the new algorithm hasn't actually been used to solve any logistical problems, since it would take too much work updating today's programs to make use of it. But for Rothvoss, that's beside the point. "It's about the theoretical understanding of a problem that has fundamental applications," he said.
I don't see how "it would take too much work updating today's programs". Most domain-specific models call out to Gurobi, CPLEX, or FICO solvers for large problems, and open source ones like SCIP for the small ones. There is a standard MPS format in which you can exchange models between all of these solvers, and the formulation of the problem shouldn't change, just the solving approach inside the solver.
Can someone enlighten me? I could see if they are arguing, this will require a new implementation, and if so, there is a ton of benefit the world would see from doing so.
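On the interchange point, a small hedged sketch (assuming PuLP again, whose writeMPS export I'm relying on) of dumping a model to MPS so any MPS-capable solver can read it:

    # Build a tiny mixed-integer model and write it out as an MPS file.
    from pulp import LpProblem, LpMinimize, LpVariable

    prob = LpProblem("exchange_demo", LpMinimize)
    x = LpVariable("x", lowBound=0, cat="Integer")
    y = LpVariable("y", lowBound=0)
    prob += 2 * x + 3 * y          # objective
    prob += x + y >= 4             # one constraint

    prob.writeMPS("exchange_demo.mps")  # hand this file to Gurobi, CPLEX, SCIP, HiGHS, ...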
The new algorithm of R&R would need to replace the algorithms at the core of Gurobi, CPLEX, etc. These tools are marvels of engineering, extremely complex, the results of decades of incremental improvements. It would likely take significant research effort to even figure out a way to incorporate the new discoveries into these engines.
Why would it need to replace them? From the article, they claim they have found a way to reduce the upper bound faster when searching large integer problems. I don't see how that affects the current searching process. In all of these solvers you can enter an upper bound yourself if you have knowledge of the problem and know a previous solution. So it seems that if this is just a programmatic way of reducing the upper bound, it should fit right in with current approaches. What am I missing?
It's a research paper. You can write a theoretical paper and let others apply it practically; others can figure out the practical aspects and report benchmark results, or build on the theory. This paper only has 2 authors. The other solvers are probably applying technique-specific tricks and speedups, and you're working with approximate optimization; it's not that easy to move everything over.
> This paper only has 2 authors.So? I don't get the relevance of the author count.
It's quite easy to go tell other people what they should do with their time. These researchers are in the business of improving algorithms. Implementing them in large industrial (or open source) code bases in a maintainable way -- and then actually maintaining that code -- is a different skillset, a different set of interests, and as was pointed out, beside the point.
Either you believe their results, then be grateful. Someone (you!) can implement this.
Your tone comes off as entitled.
> Implementing them in large industrial (or open source) code bases in a maintainable way -- and then actually maintaining that code -- is a different skillset, a different set of interests
You're making a very general point about how algorithm research and software development are two different things, which is of course true. However, OP's question is genuine: a lot of research in OR is very practical, and researchers often hack solvers to demonstrate that whatever idea offers a benefit over existing solving techniques. There is no reason to believe that a good new idea like this one couldn't be demonstrated and incorporated into new solvers quickly (especially given the competition).
So the quoted sentence is indeed a bit mysterious. I think it was just meant to avoid comments such as "if it's so good why isn't it used in CPLEX?".
> business of improving algorithms
You do realize that the solver companies are in exactly the same boat, right?
And given how much the licenses cost, I'd love a new player to show up and bring them down to a reasonable level.
No they're not. They're in the business of making their customers' problems solve fast and well. That's of course strongly related, but it is _not_ the same. An algorithm may well be (and this is what OP might be hinting at) more elegant and efficient, but execute worse on actually existing hardware.
Every time an integer feasible point is found during the iterative process these algorithms use (branch and bound), you get a new upper bound on the global minimum. It's not clear to me how these dynamically generated upper bounds, highly specific to the particular problem, relate to the upper bounds of a more general nature that R&R produce.
> upper bounds of a more general nature that R&R produce
If it's an upper bound, it should be pretty easy to plug into the existing stuff under the hood in these solvers. Can you provide any insight into how the R&R "upper bound" is different and "more general in nature"?
I don't think they're talking about a bound for the optimum objective value, but a theoretical upper bound for a covering radius related to a convex body and a lattice. The bound would be useful in a lattice-based algorithm for integer linear programming. I don't think there exists an implementation of a lattice algorithm that is practical for non-toy integer linear programming problems, let alone one that is competitive with commercial ILP solvers.
Honestly? The search for the 'exactly optimal solution' is way overrated.
I think you can get a moderately efficient solution using heuristics at 1/10 of the time or less
Not to mention developer time and trying to figure out which constraints make your problem infeasible. Especially as they get more complicated because you want to make everything linear
I agree, especially when considering that a model is also not reality. However, what folks often do is find a Linear Solution quickly, then optimize on the Integer Solution, which gives you a gap that you can use to choose termination.
> It would likely take significant research effort to even figure out a way to incorporate the new discoveries into these engines.
What? Have you ever used a solver before? The actual APIs exposed to the user are very simple interfaces that should allow swapping out the backend regardless of the complexity. The idea is that a new algorithm—short of something like "updating the solution to adjust to a change in data"—would not require any sort of research to slot in as an implementation for the existing interface.
The interface is simple, but modern solvers apply a ton of heuristics that often dramatically reduce problem size, so a naive implementation of a better algorithm that isn't hooked deeply into the core of an existing ILP solver is likely to be very slow.
Why is this exposed to the user? If it isn't exposed to the user, what on earth are you talking about?
From what I gather the parent post is saying that it is easy to make a naive implementation of this improvement, but due to the naivety of the implementation it will be slower in practice. Hence it is a lot of work (and thus difficult) to actually put this improvement into practice.
Why would the API expose the heuristics to the user? Because an intelligent user can make minor adjustments and turn certain features on/off to sometimes dramatically increase performance depending on the problem.
These solvers get faster every year; how exactly are they supposed to stay the world's fastest if people invent better algorithms all the time that never get implemented by the commercial offerings?
You seem to be confusing problem formulation with the problem solution. It is true there is a standard way to exchange the problem formulation through something like MPS (though it seems AMLs like AMPL etc. have taken over). All this format gives you is a standard mathematical formulation of the problem. However, the solution process is something very specific to the individual solver, and they have their own data structures, algorithms and heuristic techniques to solve the problem. None of these are interchangeable or public (by design), and you cannot just insert some outside numbers in the middle of the solver process without being part of the solver code and having knowledge of the entire process.
Wouldn't the "open source ones like SCIP for the small ones" be public by design?
All these solvers use branch and bound to explore the solution space and "fathom" (i.e., eliminate candidate search trees if the lowest possible value for the tree is above an already found solution). The upper bound that the solver calculates via pre-solve heuristics and other techniques does vary from solver to solver. However, they all have a place for an "upper bound", and there are mechanisms in all of these solvers for updating that value in a current solve. If this paper were a completely orthogonal implementation from everything that exists in these solvers today, and it can produce a new upper bound faster than other techniques, it should be fairly plug and play.
I have an undergrad OR degree, and I have been a practitioner for 18 years in LP/MIP problems. So I understand the current capacities of these solvers, and have familiarity with these problems. However, I am out of my depth trying to understand the specifics of this paper, and would love to be corrected where I am missing something.
What is OR?
Operations Research: https://en.m.wikipedia.org/wiki/Operations_research
In many cases you can actually insert outside numbers in the middle of the solver process via callbacks. For example, see IncumbentUpdater at https://python-mip.readthedocs.io/en/latest/classes.html
And various C APIs for solvers have other callbacks.
It's generally quite limited of course, for the reasons you mentioned.
The math programming languages of AMPL, AIMMS, GAMS, etc. are dying in my industry and being replaced by general industry languages like Python/Java + solver API.
I honestly think that's just journalism for "no one implemented it in production yet". Which is not surprising for an algorithm less than a year old. I don't think it's worth expanding on and explaining "too much work". That being said, sometimes if an algorithm isn't the fastest but it's fast and cheap enough, it is hard to argue to spend money on replacing it. Which just means that will happen later.
Furthermore, you might not even see improvements until you implement an optimized version of a new algorithm. Even if big O notation says it scales better... The old version may be optimized to use memory efficiently, to make good use of SIMD or other low-level techniques. Sometimes getting an optimized implementation of a new algorithm takes time.
> I don't see how "it would take too much work updating today's programs".
It seems to me you meant "I don't know what part of this research makes it especially hard to integrate into current solvers (and I would like to understand) ".
But people seem to be interpreting "why didn't they just integrate this into existing solvers? Should be easy (what lazy authors)".
Just trying to clear up some misunderstanding.
The randomized algorithm that Reis & Rothvoss [1] present at the end of their paper will not be implemented in Gurobi/CPLEX/XPRESS. It remains a fantastic result regardless (see below). But first let me explain.
In terms of theoretical computational complexity, the best algorithms for "integer linear programming" [2] (whether the variables are binary or general integers, as in the case tackled by the paper) are based on lattices. They have the best worst-case big-O complexity. Unfortunately, all current implementations need (1) arbitrary-size rational arithmetic (like provided by gmplib [3]), which is memory hungry and a bit slow in practice, and (2) some LLL-type lattice reduction step [4], which does not take advantage of matrix sparsity. As a result, those algorithms cannot even start tackling problems with matrices larger than 1000x1000, because they typically don't fit in memory... and even if they did, they are prohibitively slow.
In practice instead, integer programming solvers are based on branch-and-bound, a type of backtracking algorithm (like used in SAT solving), and at every iteration, they solve a "linear programming" problem (same as the original problem, but all variables are continuous). Each "linear programming" problem could be solved in polynomial time (with algorithms called interior-point methods), but instead they use the simplex method, which is exponential in the worst case!! The reason is that all those linear programming problems to solve are very similar to each other, and the simplex method can take advantage of that in practice. Moreover, all the algorithms involved greatly take advantage of sparsity in any vector or matrix involved. As a result, some people routinely solve integer programming problems with millions of variables within days or even hours.
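To make that loop concrete, here is a heavily simplified toy sketch of my own (assuming SciPy for the LP relaxations; real solvers add presolve, cutting planes, warm-started simplex and far smarter branching):

    # Toy branch-and-bound for  max c.x  s.t.  A_ub x <= b_ub, x >= 0, x integer.
    # Every node solves an LP relaxation; nodes whose LP bound cannot beat the
    # incumbent are fathomed.
    import math
    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A_ub, b_ub, bounds):
        best_val, best_x = -math.inf, None
        stack = [bounds]
        while stack:
            bnds = stack.pop()
            res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=bnds, method="highs")
            if not res.success or -res.fun <= best_val:
                continue                       # infeasible node, or pruned by bound
            frac = [i for i, v in enumerate(res.x) if abs(v - round(v)) > 1e-6]
            if not frac:                       # integral: new incumbent
                best_val, best_x = -res.fun, res.x
                continue
            i, v = frac[0], res.x[frac[0]]     # branch on a fractional variable
            lo, hi = bnds[i]
            stack.append([b if j != i else (lo, math.floor(v)) for j, b in enumerate(bnds)])
            stack.append([b if j != i else (math.ceil(v), hi) for j, b in enumerate(bnds)])
        return best_val, best_x

    c = np.array([5.0, 4.0])
    A_ub = np.array([[6.0, 4.0], [1.0, 2.0]])
    b_ub = np.array([24.0, 6.0])
    print(branch_and_bound(c, A_ub, b_ub, [(0, None), (0, None)]))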
As you can see, the solver implementers are not chasing the absolute best theoretical complexity. One could say that the theory and practice of discrete optimization has somewhat diverged.
That said, the Reis & Rothvoss paper [1] is deep mathematical work. It is extremely impressive on its own to anyone with an interest in discrete maths. It settles a 10-year-old conjecture by Dadush (the length of time a conjecture remains open is a rough heuristic many mathematicians use to evaluate how hard it is to (dis)prove). Last November, it was presented at FOCS, one of the two top conferences in computer science theory (together with STOC). Direct practical applicability is beside the point; the authors will readily confess as much if asked in an informal setting (they will of course insist otherwise in grant applications -- that's part of the game). It does not mean it is useless: in addition to the work having tremendous value in itself because it advances our mathematical knowledge, one can imagine that practical algorithms based on its ideas could push the state of the art of solvers, a few generations of researchers down the line.
At the end of the day, all those algorithms are exponential in the worst case anyway. In theory, one would try to slightly shrink the polynomial in the exponent of the worst-case complexity. Instead, practitioners typically want to solve one big optimization problem, not a family of problems of increasing size n. They don't care about the growth rate of the solving-time trend line. They care about solving their one big instance, which typically has structure that does not make it a "worst-case" instance for its size. This leads to distinct engineering decisions.
[1] https://arxiv.org/abs/2303.14605
[2] min { c^T x : A x >= b, x in R^n, some components of x in Z }
[3] https://gmplib.org/
[4] https://www.math.leidenuniv.nl/~hwl/PUBLICATIONS/1982f/art.p...
Thanks for your information. I think it really bridges the gap between the people who are interested in this algorithm and MILP "users". I have two more questions.
1. Usually we deal with models with both integer and continuous variables (MILP). Conceptually B&B tackles ILP and MILP in similar ways. Is there any difficulty for lattice-based methods to be extended to solve MILP?
2. How likely do you think this lattice type algorithm will overcome the difficulties you mentioned and eventually replace B&B, totally or partly (like barrier vs simplex methods)?
The open source solvers are a mess of 30 years of PhD students' random contributions. It's amazing they work at all. If you can possibly avoid actually implementing anything using them, you will.
Maybe what they mean is that, despite an asymptotic advantage, the new algorithm performs worse for many use cases than the older ones. This might be due to the many heuristics that solvers apply to make problems tractable, as others have mentioned, as well as good old software engineering optimization. So the work that's required is for someone to take this algorithm and implement it in a way that levels the playing field with the older ones.
Minor nitpick, but the title of this submission should specify "Integer Linear Programming", since the integer part is a much bigger deal. Polynomial-time algorithms have been known for linear programming for decades; _integer_ linear programming is NP-hard.
I think we fixed that, albeit by accident, when I edited the title earlier. If it needs further fixing let me know!
You are right that integer linear programming is NP-hard; but faster algorithms for continuous linear programming are also super interesting and impactful.
Continuous linear programming is also _hard_. Not in the sense of NP-hard, but in the sense of there being lots of algorithmic and engineering aspects that go into an efficient, modern LP solver. Even just the numerics are complicated enough.
(And many integer linear programming solvers are based on continuous linear programming solvers.)
True, these are all fair points! I didn't intend to diminish the impact or complexity of linear programming solvers. Well-written solvers are some of the most useful and powerful computational tools that exist today.
Linear programming is very cool. I loved Vasek Chvatal's book as a kid, having accidentally bought it thinking it was for computers.
But it's tricky to understand and implement and it struggles with real life constraints, i.e. this whole specialty just for integers.
Monte Carlo is trivial to understand and implement, adapts to changes and constraints trivially, and should be just as good.
I'm sure for something high end like chip design you will do both. I'd be surprised to hear of real-life stories where linear programming beats Monte Carlo.
Linear programming on reals is "easy"... You can just check all the points. I believe you can follow the shell of the legal polytope and just use a greedy algorithm to choose the next point that will minimize your goal.
If you can get away with a continuous linear program I don't see why you'd use Monte Carlo. The simplex method will get you an exact answer.
> But it's tricky to understand and implement and it struggles with real life constraints, i.e. this whole specialty just for integers.
Integers are actually harder to deal with than rational numbers in linear programming. Many solvers can also deal with a mixed problem that has both rational and integer variables.
Monte Carlo simulations are an entirely different beast. (Though you probably mean simulated annealing? But that's close enough, I guess. Linear programming is an optimization technique. Monte Carlo by itself doesn't have anything to do with optimization.)
One problem with these other approaches is that you get some answer, but you don't know how good it is. Linear programming solvers either give you the exact answer, or otherwise they can give you a provable upper bound estimate of how far away from the optimal answer you are.
[2023] The paper was uploaded first in March https://arxiv.org/abs/2303.14605
While this is an interesting theoretical result, we need to remember that they found an algorithm that is (log n)^O(n). In other words, this is not practical to solve problems with moderate to large size n.
Compared to the previous bound of n^n, (log n)^n looks pretty good.
> In other words, this is not practical to solve problems with moderate to large size n.
This depends entirely on your definition of "moderate to large". Many real-world problems can be solved easily using existing MILP solvers. We will likely never find an algorithm that can solve arbitrarily large instances of NP-complete problems in practice. Heck, it's easy to generate lists that are too large to be sorted with O(n^2) bubblesort.
It's a great result but probably not useful. Similarly to how interior point methods have better theoretical complexity than simplex for LPs, but fine-tuned simplex in reality almost always wins.
I remember the news articles when Karmarkar's algorithm for linear programming was announced.
So this is for the special case of non-negative and non-zero weights only, right? But those cases are the only sane ones, avoiding recursive loops winning a time-travel-alike race.
It seems their result has been out for almost a year now... https://arxiv.org/abs/2303.14605
I'm curious how this affects Traveling Salesman. I was under the impression that all NP-complete problems take O(n!). Does this method improve it at all?
So what? It takes time for the community to digest the result, grasp its significance, and then write popularization articles about it. If you want to know what's being discovered right this second, read arXiv preprints. If you want to know what was discovered semi-recently and you want an explanation in layman's terms that puts the results in perspective, read popularization pieces a while later.
Depending on the problem it can also be O(2^n), but that is always the worst-case scenario. Modern ILP solvers employ a variety of heuristics that in many cases significantly reduce the time needed to find a solution. Anecdotally, some years back I was developing MILPs with millions of variables and constraints, and most of them could be solved within minutes to hours. But some of them could not be cracked after weeks, all depending on the inputs.
We actually don't know how long NP-complete problems take to solve. We conjecture that it's superpolynomial, but that can still be exponentially faster than O(n!). Often, the concrete problems we are interested in have some internal structure that makes them easier to solve in practice. Solving Boolean formulas is NP-complete, but we routinely solve problems with millions of variables. ILP (and SAT) solvers are interesting because they are great at ruling out large areas that cannot contain solutions. It's also easy to translate many problems into ILP or SAT problems.
> I was under the impression that all NP-Complete problems take O(n!).
SAT is NP-complete and the naive algorithm ("just try every combination") is O(2^n). Even for TSP there is a dynamic programming approach that takes O(n^2 * 2^n) instead of O(n!).
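For the curious, a hedged sketch of that O(n^2 * 2^n) approach (Held-Karp); it is exact, but the 2^n table limits it to small n:

    # Held-Karp: dp[S][j] = shortest path that starts at city 0, visits exactly
    # the cities in bitmask S, and ends at city j.
    import math

    def held_karp(dist):
        n = len(dist)
        dp = [[math.inf] * n for _ in range(1 << n)]
        dp[1][0] = 0                                  # start at city 0
        for S in range(1 << n):
            if not S & 1:
                continue                              # every subset must contain city 0
            for j in range(n):
                if dp[S][j] == math.inf:
                    continue
                for k in range(n):
                    if S & (1 << k):
                        continue                      # k already visited
                    nS = S | (1 << k)
                    dp[nS][k] = min(dp[nS][k], dp[S][j] + dist[j][k])
        full = (1 << n) - 1
        return min(dp[full][j] + dist[j][0] for j in range(1, n))

    dist = [[0, 2, 9, 10],
            [1, 0, 6, 4],
            [15, 7, 0, 8],
            [6, 3, 12, 0]]
    print(held_karp(dist))  # length of the shortest closed tour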
Lowering the algorithmic upper bound for a core NP-complete problem is always extremely interesting. However, this is not necessarily related to improving runtime for practical implementations solving the problem in question. Solvers for mixed integer programming (MIP) use a lot of algorithms in conjunction with loads of heuristics. Building up the library of heuristics and strategies is a crucial part of why the improvements in MIP solvers have outpaced Moore's law. From https://www.math.uwaterloo.ca/~hwolkowi/henry/teaching/f16/6..., the improvements in hardware from 1990 to 2014 were 6500x, but the improvements to the software are responsible for an 870000x performance improvement.
The referenced article may become another part of the puzzle in continuing performance improvements for MIP solvers, but it is not in any way a given.
1. It's "run time", not "runtime". Sorry to be pedantic but they are different. The former is for algorithms, the latter is for Java.
2. The article seems to say that they have improved the performance concretely, not simply that bounds have been tightened.
It's still an NP-class algorithm, so as n grows large the exponent will still grow large, but the base value is lower.
Run time, runtime and run-time are all commonly used for the thing you want to call run time. None are incorrect.
https://www.collinsdictionary.com/dictionary/english/runtime
Informally they are used interchangeably. However, their definitions are different. The subject here is academic research involving algorithmic complexity. What papers do you know of that refer to algorithmic complexity and "runtime"?
Context matters. It is incorrect in this context.
Going on arXiv computer science and searching for the string "runtime" returns 6135 results, many of which seem to use it in the sense we're talking about here.
https://arxiv.org/search/cs?query=runtime&searchtype=all&abs...
In any case Quanta Magazine is not a formal academic journal, and neither is hacker news. We're having an informal discussion about a popular science article.
Your 6135 results are ignoring that there are two different usages. I said "run time" relates to algorithmic complexity. That's one definition.
The other definition is also valid in research, as it relates to environments. Like Java.
The first result in your query is an example of the second definition. It is not an example showing these are interchangeable.
Fine, I edited "most" in my previous comment to "many". There are still plenty of examples (even on the first page) where "runtime" is used to mean algorithmic complexity:
https://arxiv.org/abs/2401.14645
https://arxiv.org/abs/2401.13770
https://arxiv.org/abs/2401.12253
https://arxiv.org/abs/2401.12205
https://arxiv.org/abs/2401.10856
I'll respond to what I think are the real points:
Precision doesn't matter in this case? I disagree.
It was a dick move and pretentious to call it out?
Maybe so, it was not my intent and I apologize if it was the case.
Keeping the HN tradition of "pedantry at all costs" alive, I applaud you.
Was #1 really worth mentioning...?
Isn't this a stylistic choice? For example, some style guides would suggest using "run time", but also "run-time error".
The original paper (https://arxiv.org/pdf/2303.14605.pdf) that the linked article reports on is a theoretical algorithms paper, and does not contain any references to actual solvers or implementations. For practical cases it is not that important what the worst-case complexity is, but rather what the expected complexity is of solving problems that occur in practice.
> Sorry to be pedantic
As always, a guaranteed way to make friends.