Gait recognition, one of the most important and effective biometric technologies, has a significant advantage in long-distance recognition systems. Among existing gait recognition methods, template-based approaches may lose temporal information, while sequence-based methods cannot fully exploit the temporal relations within a sequence. To address these issues, we propose a novel multiple-temporal-scale gait recognition framework that integrates temporal information at multiple temporal scales, making use of both frame-level and interval-level fusion information. The interval-level representation is realized by a local transformation module. Concretely, a 3D convolutional neural network (3D CNN) is applied at both the small and the large temporal scales to extract spatial-temporal information. In addition, a frame pooling method is developed to address the mismatch between the input of the 3D network and the video frames, and a novel 3D basic network block is designed to improve efficiency. Experiments demonstrate that the proposed multiple-temporal-scale 3D CNN based gait recognition method achieves better performance than most recent state-of-the-art methods on the CASIA-B dataset. The proposed method obtains a rank-1 accuracy of 96.7% under the normal condition, and outperforms other methods in average accuracy by at least 5.8% and 11.1%, respectively, in complex scenarios.
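
To make the frame-level and interval-level ideas concrete, the following is a minimal NumPy sketch and not the implementation evaluated here. The abstract does not specify the operators, so everything in it is an assumption made for illustration: the function names `frame_pooling` and `interval_aggregate`, the use of an element-wise maximum as the aggregation, the `(C, T, H, W)` feature layout, and the interval length of 3 frames. It shows how collapsing the temporal axis yields a fixed-size representation for sequences of different lengths, and how grouping frames into short intervals gives a coarser temporal scale.

```python
import numpy as np


def frame_pooling(features: np.ndarray) -> np.ndarray:
    """Collapse the temporal axis of a (C, T, H, W) feature volume.

    Assumption: an element-wise max over time is used, so sequences with
    different numbers of frames T map to the same (C, H, W) shape.
    """
    if features.ndim != 4:
        raise ValueError("expected features with shape (C, T, H, W)")
    return features.max(axis=1)


def interval_aggregate(features: np.ndarray, interval: int = 3) -> np.ndarray:
    """Summarise non-overlapping intervals of `interval` frames.

    Assumption: each interval is reduced with an element-wise max, and
    trailing frames that do not fill a whole interval are dropped.
    Returns an array of shape (C, T // interval, H, W).
    """
    c, t, h, w = features.shape
    n_intervals = t // interval
    if n_intervals == 0:
        raise ValueError("sequence is shorter than one interval")
    usable = n_intervals * interval
    grouped = features[:, :usable].reshape(c, n_intervals, interval, h, w)
    return grouped.max(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_seq = rng.random((4, 17, 8, 6))  # 17 frames
    long_seq = rng.random((4, 30, 8, 6))   # 30 frames

    # Both sequences reduce to the same fixed-size representation.
    print(frame_pooling(short_seq).shape, frame_pooling(long_seq).shape)

    # Coarser temporal scale: 17 frames become 5 intervals of 3 frames.
    print(interval_aggregate(short_seq, interval=3).shape)
```

In this sketch the fine scale is the original frame sequence and the coarse scale is the interval-level output; a 3D CNN could then be applied to each before the temporal axis is pooled away.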