Vision-based action recognition of construction workers has attracted increasing attention for its diverse applications. Although previous studies have achieved state-of-the-art performance using spatial-temporal features, considerable challenges remain on cluttered and dynamic construction sites. Because workers' actions are closely related to the construction entities around them, this paper proposes a novel system that enhances action recognition using semantic information. A data-driven scene parsing method, named label transfer, is adopted to recognize construction entities across the entire scene. A probabilistic model of actions with context is established. Worker actions are first classified using dense trajectories, and the classification is then refined using the recognized construction objects. Experimental results on a comprehensive dataset show that the proposed system outperforms the baseline algorithm by 10.5%. The paper provides a new way to integrate semantic information globally, rather than through conventional object detection, which can only depict local context. The proposed system is especially suitable for construction sites, where semantic information is rich, from local objects to global surroundings. Compared with other methods that use object detection to integrate context information, it is easy to implement, requires no tedious training or parameter tuning, and scales with the number of recognizable objects.
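
The two-stage idea described above (classify actions from motion, then refine with recognized objects) can be illustrated with a minimal sketch. The abstract does not state the form of the probabilistic model, so the naive Bayes fusion below is an assumption chosen for illustration, not the paper's formulation. The function name `fuse_action_with_context`, the co-occurrence table, and all numbers are hypothetical.

```python
import math


def fuse_action_with_context(action_scores, object_presence, cooccurrence, eps=1e-6):
    """Re-weight motion-based action scores by scene context.

    action_scores:   list of P(action | motion features), one per action
                     (stand-in for a dense-trajectory classifier output).
    object_presence: list of 0/1 flags, one per object class, marking which
                     objects a scene parser found (stand-in for label transfer).
    cooccurrence:    hypothetical table, cooccurrence[a][o] = P(object o is
                     present | action a). Assumed here, not from the paper.
    Returns a normalized posterior over actions.
    """
    log_post = []
    for score, row in zip(action_scores, cooccurrence):
        # Start from the motion-based score (clamped to avoid log(0)).
        total = math.log(max(score, eps))
        # Naive Bayes assumption: objects are independent given the action.
        for present, p in zip(object_presence, row):
            p = min(max(p, eps), 1.0 - eps)
            total += math.log(p) if present else math.log(1.0 - p)
        log_post.append(total)
    # Normalize in a numerically stable way.
    peak = max(log_post)
    unnorm = [math.exp(v - peak) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Illustrative values only: motion alone slightly favors "sawing",
# but the scene contains bricks and no timber.
actions = ["bricklaying", "sawing"]
motion_scores = [0.45, 0.55]
objects_seen = [1, 0]  # [brick, timber]
cooccurrence = [
    [0.9, 0.1],  # bricklaying: bricks likely, timber unlikely
    [0.1, 0.9],  # sawing: bricks unlikely, timber likely
]

posterior = fuse_action_with_context(motion_scores, objects_seen, cooccurrence)
best = actions[posterior.index(max(posterior))]
```

In this toy case the context term overturns the motion-only decision and `best` becomes `"bricklaying"`. Adding a recognizable object only adds a column to the co-occurrence table, which is one way to read the scalability claim, though the actual system may handle this differently.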