Markov Combinatorial Processes for Reinforcement Learning and Combinatorial Optimization Problems


  • Francesca Guerriero
  • Francesco Paolo Saccomanno



Keywords: Markov decision process, combinatorial optimization, reinforcement learning, bin packing problem


Solving combinatorial optimization problems is a crucial challenge in many real-world applications. These problems require selecting the optimal combination from a large set of possibilities, subject to the specific constraints of the problem under consideration, in order to maximize or minimize an objective function. In recent years, Reinforcement Learning (RL) has attracted considerable attention as a promising tool for tackling these complex tasks. The main difficulty in solving combinatorial optimization problems with RL lies in overcoming the sequential nature of the Markov Decision Process (MDP) model on which the solution algorithms are based. In this work, we present an extension of the MDP that enables software agents to learn from a model that better reflects the non-sequential nature of these problems. The results demonstrate that, for the first time, a software agent can provide optimal solutions or, at the very least, solutions with minimal deviation from the optimal values, in the majority of the benchmark instances used in the computational study.
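For context, the conventional sequential MDP framing that the paper's extension is designed to relax can be sketched as follows. This is an illustrative example only, not the authors' method: it frames one-dimensional bin packing as a sequence of per-item decisions, with a simple first-fit rule standing in for a learned policy, and the item order fixed in advance, which is precisely the sequential restriction the abstract discusses.

```python
def pack_first_fit(items, capacity):
    """Greedy stand-in 'policy' for a sequential bin-packing MDP.

    State  : remaining capacities of the bins opened so far.
    Action : index of the bin receiving the current item, or "open a new bin".
    Reward : negative number of bins used (an RL agent would maximize this).
    """
    bins = []  # remaining capacity of each open bin
    for size in items:
        for i, free in enumerate(bins):
            if free >= size:
                bins[i] -= size  # action: place item in existing bin i
                break
        else:
            bins.append(capacity - size)  # action: open a new bin
    return len(bins)  # episode return would be -len(bins)

print(pack_first_fit([4, 8, 1, 4, 2, 1], capacity=10))  # prints 2
```

Because each decision depends only on the items seen so far, the agent cannot reconsider earlier placements; the MDP extension proposed in the paper is aimed at lifting exactly this limitation.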


H. Dai, E. B. Khalil, Y. Zhang, B. Dilkina, and L. Song, “Learning combinatorial optimization algorithms over graphs,” arXiv preprint arXiv:1704.01665, 2017.

A. Nowak, S. Villar, A. S. Bandeira, and J. Bruna, “A note on learning algorithms for quadratic assignment with graph neural networks,” arXiv preprint arXiv:1706.07450, 2017.

N. Mazyavkina, S. Sviridov, S. Ivanov, and E. Burnaev, “Reinforcement learning for combinatorial optimization: A survey,” Computers & Operations Research, vol. 134, pp. 539–548, 2021.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.

J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.

Y. Bengio, A. Lodi, and A. Prouvost, “Machine learning for combinatorial optimization: A methodological tour d’horizon,” European Journal of Operational Research, 2021.

F. Guerriero and F. P. Saccomanno, “A machine learning approach for the bin packing problem,” in Proceedings of the 12th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 7–9 September 2023.

Z. Li, Q. Chen, and V. Koltun, “Combinatorial optimization with graph convolutional networks and guided tree search,” Advances in Neural Information Processing Systems, pp. 539–548, 2018.

S. Basu, Design methods and analysis of algorithms. PHI Learning Pvt. Ltd., 2013.

S. Martello and P. Toth, Knapsack Problems: Algorithms and Computer Implementations. Wiley, 1990.

P. Schaus, J.-C. Régin, R. Van Schaeren, W. Dullaert, and B. Raa, “Cardinality reasoning for bin-packing constraint: Application to a tank allocation problem,” Lecture Notes in Computer Science, 2012.

L. Wei, Z. Luo, R. Baldacci, and A. Lim, “A new branch-and-price-and-cut algorithm for one-dimensional bin-packing problems,” INFORMS Journal on Computing, 2019.

M. Delorme, M. Iori, and S. Martello, “Bin packing and cutting stock problems: Mathematical models and exact algorithms,” European Journal of Operational Research, vol. 255, pp. 1–20, 2016.

S. Martello and D. Vigo, “Exact solution of the two-dimensional finite bin packing problem,” Management Science, 1997.

R. E. Korf, “A new algorithm for optimal bin packing,” in Eighteenth National Conference on Artificial Intelligence. USA: American Association for Artificial Intelligence, 2002, pp. 731–736.

E. L. Schreiber and R. E. Korf, “Improved bin completion for optimal bin packing and number partitioning,” in Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, 2013.

E. Falkenauer, “A New Representation and Operators for Genetic Algorithms Applied to Grouping Problems,” Evolutionary Computation, vol. 2, no. 2, pp. 123–144, 1994.

F. Brandao and J. Pedroso, “Bin packing and related problems: General arc-flow formulation with graph compression,” Computers & Operations Research, 2016.

F. Guerriero and F. P. Saccomanno, “A hierarchical hyper-heuristic for the bin packing problem,” Soft Computing, vol. 27, no. 18, pp. 12997–13010, 2022.

D. S. Johnson, A. J. Demers, J. D. Ullman, M. R. Garey, and R. L. Graham, “Worst-case performance bounds for simple one-dimensional packing algorithms,” SIAM J. Comput., vol. 3, no. 4, pp. 299–325, 1974.

D. S. Johnson, “Fast algorithms for bin packing,” Journal of Computer and System Sciences, vol. 8, no. 3, pp. 272–314, 1974.

E. G. Coffman Jr., J. Csirik, G. Galambos, S. Martello, and D. Vigo, Bin Packing Approximation Algorithms: Survey and Classification. New York, NY: Springer New York, 2013, pp. 455–531.

C. Munien and A. E. Ezugwu, “Metaheuristic algorithms for one-dimensional bin-packing problems: A survey of recent advances and applications,” Journal of Intelligent Systems, vol. 30, no. 1, pp. 636–663, 2021.

J. Filar and K. Vrieze, Competitive Markov Decision Processes. Berlin, Heidelberg: Springer-Verlag, 1996.

Y. Ye, “A new complexity result on solving the Markov decision problem,” Mathematics of Operations Research, vol. 30, no. 3, pp. 733–749, 2005.

V. Gabillon, M. Ghavamzadeh, and B. Scherrer, “Approximate dynamic programming finally performs well in the game of tetris,” in Advances in Neural Information Processing Systems, C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, Eds., vol. 26. Curran Associates, Inc., 2013.

D. P. Bertsekas, “Feature-based aggregation and deep reinforcement learning: A survey and some new implementations,” 2018.

W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2011.

M. Hutter, “Extreme state aggregation beyond Markov decision processes,” Theoretical Computer Science, vol. 650, pp. 73–91, 2016, Algorithmic Learning Theory.

R. Meshram and K. Kaza, “Simulation based algorithms for Markov decision processes and multi-action restless bandits,” 2020.

H. S. Chang, J. Hu, M. C. Fu, and S. I. Marcus, Simulation-Based Algorithms for Markov Decision Processes, 2nd ed. Springer Publishing Company, Incorporated, 2013.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis, “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” ArXiv, vol. abs/1712.01815, 2017.

D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.

M. L. Littman, T. L. Dean, and L. P. Kaelbling, “On the complexity of solving Markov decision problems,” CoRR, vol. abs/1302.4971, 2013.

R. Givan, T. Dean, and M. Greig, “Equivalence notions and model minimization in Markov decision processes,” Artificial Intelligence, vol. 147, no. 1, pp. 163–223, 2003.

Y. Mansour and S. Singh, “On the complexity of policy iteration,” 2013.

R. D. C. Monteiro and T. Tsuchiya, “A variant of the Vavasis–Ye layered-step interior-point algorithm for linear programming,” SIAM Journal on Optimization, vol. 13, no. 4, pp. 1054–1079, 2003.

O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Advances in Neural Information Processing Systems, pp. 2692–2700, 2015.

H. Hu, X. Zhang, X. Yan, L. Wang, and Y. Xu, “Solving a new 3D bin packing problem with deep reinforcement learning method,” 2017.

D. Li, Z. Gu, Y. Wang, C. Ren, and F. C. M. Lau, “One model packs thousands of items with recurrent conditional query learning,” CoRR, vol. abs/2111.06726, 2021.

W. Kool and M. Welling, “Attention solves your TSP,” arXiv preprint arXiv:1803.08475, 2018.

B. Balaji, J. Bell-Masterson, E. Bilgin, A. Damianou, P. M. Garcia, A. Jain, A. Luo, A. Maggiar, B. Narayanaswamy, and C. Ye, “Reinforcement learning benchmarks for online stochastic optimization problems,” 2019.

M. Bender, B. Bradley, G. Jagannathan, and K. Pillaipakkamnatt, “Sum-of-squares heuristics for bin packing and memory allocation,” ACM Journal of Experimental Algorithmics, vol. 12, 2007.

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, 2015.

E. Falkenauer, “A hybrid grouping genetic algorithm for bin packing,” Journal of Heuristics, vol. 2, no. 1, pp. 5–30, Jun 1996.




How to Cite

Guerriero, F., & Saccomanno, F. P. (2023). Markov Combinatorial Processes for Reinforcement Learning and Combinatorial Optimization Problems. International Journal of Computing, 22(4), 447-454.