This paper addresses the problem of monodimensional (1D) FFT parallel computation on constant-valence multicomputers, i.e. on parallel systems made up of processing elements (PEs) which do not share memory and are connected to a bounded number of neighbours. After a qualitative analysis of several possible partitionings of the DIT FFT algorithm, a decomposition is introduced that has good scalability properties and makes it possible to use sections of sequential code based on the most common 1D-FFT algorithms. If a computing architecture with indirect binary n-cube interconnection network is used, the proposed decomposition guarantees strictly local communications and therefore requires no through-routing support. These characteristics have a positive impact on software development and also on overall performance. Furthermore, thanks to a pipelined organization of the PEs, the resulting architecture has high potentialities for real-time signal processing. As these useful features are obtained at the 'expense' of an uneven workload distribution, computing efficiency is relatively low but does not significantly change in a wide range of the number of processors. An implementation on a Transputer based system is presented along with the performance results obtained. Finally a simple analytical model of the architecture is shown, that allows the values of the main performance parameters to be obtained as a function of the number of processors used and of the elementary response times of the first stage of PEs.
Parallel 1d-fft Computation On Constant-valence Multicomputers / Mazzeo, Antonino; Villano, Umberto. - In: SOFTWARE-PRACTICE & EXPERIENCE. - ISSN 0038-0644. - STAMPA. - 25:(1995), pp. 681-704. [10.1002/spe.4380250607]
Parallel 1d-fft Computation On Constant-valence Multicomputers
MAZZEO, ANTONINO;VILLANO, UMBERTO
1995
Abstract
This paper addresses the problem of monodimensional (1D) FFT parallel computation on constant-valence multicomputers, i.e. on parallel systems made up of processing elements (PEs) which do not share memory and are connected to a bounded number of neighbours. After a qualitative analysis of several possible partitionings of the DIT FFT algorithm, a decomposition is introduced that has good scalability properties and makes it possible to use sections of sequential code based on the most common 1D-FFT algorithms. If a computing architecture with indirect binary n-cube interconnection network is used, the proposed decomposition guarantees strictly local communications and therefore requires no through-routing support. These characteristics have a positive impact on software development and also on overall performance. Furthermore, thanks to a pipelined organization of the PEs, the resulting architecture has high potentialities for real-time signal processing. As these useful features are obtained at the 'expense' of an uneven workload distribution, computing efficiency is relatively low but does not significantly change in a wide range of the number of processors. An implementation on a Transputer based system is presented along with the performance results obtained. Finally a simple analytical model of the architecture is shown, that allows the values of the main performance parameters to be obtained as a function of the number of processors used and of the elementary response times of the first stage of PEs.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.