Accurate and efficient traffic speed prediction is crucial for improving road safety and efficiency. With the emergence of deep learning and the availability of extensive traffic data, data-driven methods with increasingly complex structures and deeper neural networks have been widely adopted for this task. Regardless of their design, these models aim to optimize overall average performance without distinguishing among different traffic states. However, predicting traffic speed under congestion is typically more important than under free flow, since downstream tasks such as traffic control and optimization focus on congested conditions rather than free flow. Unfortunately, most state-of-the-art (SOTA) models do not differentiate between traffic states during training and evaluation. To this end, we first comprehensively study the performance of SOTA models under different speed regimes to illustrate their low accuracy in low-speed prediction. We further propose a novel Congestion-Aware Sparse Attention transformer (CASAformer) to enhance prediction performance under low-speed traffic conditions. Specifically, the CASA layer emphasizes congestion data and reduces the impact of free-flow data. Moreover, we adopt a new congestion-adaptive loss function for training so that the model learns more from congestion data. Extensive experiments on real-world datasets show that CASAformer outperforms SOTA models for predicting speeds below 40 mph across all prediction horizons.
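To make the congestion-adaptive training idea concrete, the sketch below shows one plausible way a speed-prediction loss could be reweighted toward congested samples. The function name `congestion_adaptive_mse`, the 40 mph threshold, and the weight value are illustrative assumptions for exposition only, not the exact formulation used by CASAformer.

```python
import torch

def congestion_adaptive_mse(pred: torch.Tensor,
                            target: torch.Tensor,
                            threshold: float = 40.0,
                            congestion_weight: float = 4.0) -> torch.Tensor:
    """Hypothetical congestion-adaptive loss sketch.

    Squared errors are weighted more heavily where the ground-truth speed
    falls below a congestion threshold (in mph), so the model learns more
    from congested samples. Threshold and weight are assumed values.
    """
    weights = torch.where(target < threshold,
                          torch.full_like(target, congestion_weight),
                          torch.ones_like(target))
    return (weights * (pred - target) ** 2).mean()
```

In this sketch, a prediction error at 25 mph (congested) contributes four times as much to the loss as the same error at 60 mph (free flow); any monotone weighting of the speed regime would serve the same purpose.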