Open Access Open Access  Restricted Access Subscription Access

SIMPLE EFFECTIVE FAST INVERSE SQUARE ROOT ALGORITHM WITH TWO MAGIC CONSTANTS

Oleh Horyachyy, Leonid Moroz, Viktor Otenko

Abstract


The purpose of this paper is to introduce a modification of Fast Inverse Square Root (FISR) approximation algorithm with reduced relative errors. The original algorithm uses a magic constant trick with input floating-point number to obtain a clever initial approximation and then utilizes the classical iterative Newton-Raphson formula. It was first used in the computer game Quake III Arena, causing widespread discussion among scientists and programmers, and now it can be frequently found in many scientific applications, although it has some drawbacks. The proposed algorithm has such parameters of the modified inverse square root algorithm that minimize the relative error and includes two magic constants in order to avoid one floating-point multiplication. In addition, we use the fused multiply-add function and iterative methods of higher order in the second iteration to improve the accuracy. Such algorithms do not require storage of large tables for initial approximation and can be effectively used on field-programmable gate arrays (FPGAs) and other platforms without hardware support for this function.

Keywords


inverse square root; FISR algorithm; initial approximation; magic constant; IEEE 754 standard; floating-point arithmetic; FMA function; maximum relative error; Newton-Raphson; Householder.

Full Text:

PDF

References


B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, second ed., Oxford Univ. Press, New York, 2010, 641 p.

N.H.F. Beebe, The Mathematical-Function Computation Handbook: Programming Using the MathCW Portable Software Library, Springer, 2017, 1115 p.

M.D. Ercegovac, T. Lang, Division and Square Root: Digit Recurrence Algorithms and Implementations, Boston: Kluwer Academic Publishers, 1994, 230 p.

J.-M. Muller, N. Brisebarre, F. de Dinechin, C.-P. Jeannerod, V. Lefevre, G. Melquiond, N. Revol, D. Stehle, S. Torres, Handbook of Floating-Point Arithmetic, Springer, 2010, 572 p.

J.-M. Muller, N. Brunie, F. de Dinechin, C.-P. Jeannerod, M. Joldes, V. Lefevre, G. Melquiond, N. Revol, S. Torres, “Software implementation of floating-point arithmetic,” in: Handbook of Floating-Point Arithmetic, Birkhäuser, Cham, 2018, pp. 321-374.

F. Lemaitre, B. Couturier, L. Lacassagne, “Cholesky factorization on SIMD multi-core architectures,” Journal of Systems Architecture, vol. 79, pp. 1-15, 2017.

Id Software, Quake III Arena, quake3-1.32b code, [Online]. Available: https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c

C. Lomont, Fast Inverse Square Root, Purdue University, Tech. Rep., 2003, [Online]. Available at: http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf

P. Martin, “Eight rooty pieces,” Overload Journal, vol. 24, issue 135, pp. 8-12, 2016.

D.H. Eberly, GPGPU Programming for Games and Science, CRC Press, 2014, 469 p.

L. Moroz, C.J. Walczyk, A. Hrynchyshyn, V. Holimath, J.L. Cieśliński, “Fast calculation of inverse square root with the use of magic constant – analytical approach,” Applied Mathematics and Computation, vol. 316, issue C, pp. 245-255, 2018.

C.J. Walczyk, L.V. Moroz, J.L. Cieśliński, “Improving the accuracy of the fast inverse square root algorithm,” ArXiv Preprint, arXiv:1802.06302, 2018.

A. Hasnat, T. Bhattacharyya, A. Dey, S. Halder and D. Bhattacharjee, “A fast FPGA based architecture for computation of square root and inverse square root,” Proceedings of the International Conference on Devices for Integrated Circuit (DevIC), Kalyani, India, 2017, pp. 383-387.

C.J. Hsu, J.L. Chen and L.G. Chen, “An efficient hardware implementation of HON4D feature extraction for real-time action recognition,” Proceedings of the 2015 IEEE International Symposium on Consumer Electronics (ISCE), 2015, pp. 1-2.

Z. Li, Y. Chen and X. Zeng, “OFDM synchronization implementation based on Chisel platform for 5G research,” Proceedings of the 2015 IEEE 11th International Conference on ASIC (ASICON), 2015, pp. 1-4.

S. Zafar and R. Adapa, “Hardware architecture design and mapping of fast inverse square root’s algorithm,” Proceedings of the International Conference on Advances in Electrical Engineering (ICAEE), 2014, pp. 1-4.

C.A. Navarro, M. Vernier, N. Hitschfeld, B. Bustos, “Competitiveness of a non-linear block-space GPU thread map for simplex domains,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, issue 12, pp. 2728-2741, 2018.

A. Gazneli et al. Adaptive Nonlinear System Control using Robust and Low-complexity Coefficient Estimation, U.S. Patent No 15/648, 205, 2018.

Espressif Systems, ESP32-WROOM-32 (ESP-WROOM-32) datasheet. Version 2.4, 2018.

Tensilica Inc., Xtensa Instruction Set Architecture (ISA), Reference Manual, 2018.

S. Xiao, T. Isshiki, D. Li, H. Kunieda, “HOG-based object detection processor design using ASIP methodology,” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 100, issue 12, pp. 2972-2984, 2017.

M. Ramirez-Martinez, F. Sanchez-Fernandez, P. Brunet, S. M. Senouci and El-Bay Bourennane, “Dynamic management of a partial reconfigurable hardware architecture for pedestrian detection in regions of interest,” Proceedings of the 2017 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2017, pp. 1-7.

J. Lv, F. Wang and Z. Ma, “Peach fruit recognition method under natural environment,” Proceedings of the Eighth International Conference on Digital Image Processing (ICDIP’2016), Chengu, China, August 29, 2016, vol. 10033, paper 1003317.

B.K. Anirudh, V. Venkatraman, A. R. Kumar and S. D. Sumam, “Accelerating real-time computer vision applications using HW/SW co-design,” Proceedings of the 2017 International Conference on Computer, Communications and Electronics (Comptelix), Jaipur, India, July 1-2, 2017, pp. 458-463.

Intel Corporation, Intel® Cyclone® 10 GX device overview. C10GX51001, 2018.


Refbacks

  • There are currently no refbacks.
hgs yükleme