A method of determining the time relation between an original or input speech signal (10) and an output speech signal (15) affected by time warping in a communications system, such as a VoIP (Voice over Internet Protocol) system, wherein corresponding speech bursts (11, 12; 16, 17) of the input speech signal (10) and the output speech signal (15) are located in accordance with a predefined signal property thereof. The corresponding speech bursts (11, 12; 16, 17) thus located are time aligned (10, 30) to correct continuous and discontinuous warping effects. A performance estimate is generated by comparing the time-aligned input and output speech signals (10, 30) using cross-correlation techniques and PSQM (Perceptual Speech Quality Measure) or PSQM+ (Enhanced Perceptual Speech Quality Measure) techniques.
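
The burst location and time alignment steps can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the "predefined signal property" is taken to be a simple amplitude threshold, the delay of each burst is estimated by a brute-force cross-correlation search over a bounded lag range, and the signals are plain lists of samples. The function names (`find_bursts`, `burst_delays`, `align_output`) and the parameters `threshold`, `min_gap`, and `max_lag` are hypothetical and chosen for illustration only. The PSQM or PSQM+ comparison is not reproduced here.

```python
def find_bursts(signal, threshold, min_gap):
    """Locate bursts as (start, end) index pairs where |sample| > threshold.

    Active samples separated by more than min_gap samples are treated as
    belonging to different bursts. The amplitude threshold is an assumed
    stand-in for the predefined signal property.
    """
    bursts = []
    start = None
    last_active = None
    for i, sample in enumerate(signal):
        if abs(sample) > threshold:
            if start is None:
                start = i
            elif i - last_active > min_gap:
                bursts.append((start, last_active + 1))
                start = i
            last_active = i
    if start is not None:
        bursts.append((start, last_active + 1))
    return bursts


def burst_delays(inp, out, threshold, min_gap, max_lag):
    """Estimate, for each input burst, the lag that maximizes cross-correlation.

    Returns a list of (start, end, lag); a positive lag means the burst
    appears later in the output than in the input.
    """
    delays = []
    for start, end in find_bursts(inp, threshold, min_gap):
        segment = inp[start:end]
        best_lag, best_score = 0, float("-inf")
        for lag in range(-max_lag, max_lag + 1):
            score = 0.0
            for k, ref in enumerate(segment):
                j = start + k + lag
                if 0 <= j < len(out):
                    score += ref * out[j]
            if score > best_score:
                best_lag, best_score = lag, score
        delays.append((start, end, best_lag))
    return delays


def align_output(inp, out, delays):
    """Build a time-aligned copy of the output by shifting each burst back."""
    aligned = [0] * len(inp)
    for start, end, lag in delays:
        for i in range(start, end):
            j = i + lag
            if 0 <= j < len(out):
                aligned[i] = out[j]
    return aligned


# Illustrative data: two bursts delayed by different amounts, which mimics
# discontinuous warping between the bursts.
inp = [0] * 5 + [1, 2, 3, 2, 1] + [0] * 10 + [2, -1, 3, -2, 1] + [0] * 10
out = [0] * 40
out[7:12] = [1, 2, 3, 2, 1]      # first burst arrives 2 samples late
out[24:29] = [2, -1, 3, -2, 1]   # second burst arrives 4 samples late

delays = burst_delays(inp, out, threshold=0.5, min_gap=3, max_lag=6)
aligned = align_output(inp, out, delays)
```

Estimating the lag per burst rather than once for the whole signal is what lets the sketch absorb a delay that changes between bursts. A practical system would normally normalize the correlation and work on sampled audio frames rather than toy integer lists.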

