Floating-point division involves the computation of the ratio $(1 + M_x)/(1 + M_y)$, where $M_x$ and $M_y$ represent the mantissas of the input values. In this paper, we propose a new method for approximating this operation using a linear function of $M_x$, with coefficients that depend on $M_y$. The coefficients are calculated to minimize the Mean Relative Error Distance (MRED) of the approximation. To this end, the range of $M_y$ is partitioned into $N$ sub-intervals, in each of which the minimization of MRED is formulated as a linear programming problem whose solution gives the optimal coefficient values. The hardware implementation requires a small lookup table, two multipliers and an adder. Aggressive coefficient quantization is exploited to further optimize the design. The resulting MRED improves as $N$ increases, ranging from 1.4% to 0.33%. Implementation results in a 28 nm CMOS technology show that the proposed design outperforms the state of the art, offering the best trade-off between hardware complexity and accuracy. Results for two image processing applications, change detection and JPEG compression, demonstrate high output quality, with SSIM very close to 1 and PSNR values exceeding 50 dB.
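The idea of a lookup-table-driven linear approximation of the mantissa ratio can be illustrated with a minimal Python sketch. It is not the authors' design. It assumes one constant coefficient pair per sub-interval, taken from the reciprocal at the sub-interval midpoint rather than from the linear programming step, and it ignores coefficient quantization. The names `build_lut`, `approx_ratio` and `mred` are hypothetical, and the sketch makes no attempt to reproduce the reported MRED figures.

```python
def build_lut(n):
    """Return one (a, b) coefficient pair per sub-interval of my in [0, 1).

    Assumption: coefficients come from the midpoint reciprocal 1/(1 + y_mid),
    a simple stand-in for the LP-optimized values described in the abstract.
    """
    lut = []
    for k in range(n):
        y_mid = (k + 0.5) / n
        r = 1.0 / (1.0 + y_mid)
        lut.append((r, r))  # approximation is a + b * mx
    return lut


def approx_ratio(mx, my, lut):
    """Approximate (1 + mx) / (1 + my) as a linear function of mx."""
    n = len(lut)
    k = min(int(my * n), n - 1)  # sub-interval index selected by my
    a, b = lut[k]
    return a + b * mx


def mred(n, samples=200):
    """Mean relative error distance over a uniform grid of mantissa pairs."""
    lut = build_lut(n)
    total = 0.0
    for i in range(samples):
        mx = (i + 0.5) / samples
        for j in range(samples):
            my = (j + 0.5) / samples
            exact = (1.0 + mx) / (1.0 + my)
            total += abs(approx_ratio(mx, my, lut) - exact) / exact
    return total / (samples * samples)


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, mred(n))
```

Even with these naive coefficients the mean relative error shrinks as the number of sub-intervals grows, which is the trend the abstract reports for the optimized design.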
Novel Low-Power Floating-Point Divider With Linear Approximation and Minimum Mean Relative Error / Di Meo, Gennaro; Strollo, A. G. M.; De Caro, Davide. - In: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS. I, REGULAR PAPERS. - ISSN 1549-8328. - 70:12(2023), pp. 5275-5288. [10.1109/TCSI.2023.3312974]
Novel Low-Power Floating-Point Divider With Linear Approximation and Minimum Mean Relative Error
Di Meo, Gennaro; Strollo, A. G. M.; De Caro, Davide
2023