Detecting the presence of workers, plant, equipment, and materials (i.e. objects) on sites to improve safety and productivity has formed an integral part of computer vision-based research in construction. Such research has tended to rely on computer vision and pattern recognition approaches that depend heavily on the manual extraction of features and on small datasets (<10k images/label), which can limit inter- and intra-class variability. This reliance hinders their ability to detect objects accurately on construction sites and to generalize to different datasets. To address this limitation, an Improved Faster Regions with Convolutional Neural Network Features (IFaster R-CNN) approach is developed to automatically detect the presence of objects in real time. The approach comprises: (1) establishment of a dataset of workers and heavy equipment to train the CNN; (2) extraction of feature maps from images using a deep model; (3) extraction of region proposals from the feature maps; and (4) object recognition. To validate the model's ability to detect objects in real time, a specific dataset is established to train the IFaster R-CNN models to detect workers and plant (e.g. excavators). The results reveal that the IFaster R-CNN is able to detect the presence of workers and excavators with a high level of accuracy (91% and 95%, respectively). The accuracy of the proposed deep learning method exceeds that of current state-of-the-art descriptor methods in detecting target objects in images.
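
The region proposal step (3) is only summarized above. In the standard Faster R-CNN pipeline, heavily overlapping proposals are typically pruned by non-maximum suppression (NMS) based on intersection over union (IoU) before recognition. The following is a minimal sketch of that generic procedure, not the IFaster R-CNN implementation itself: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The 0.5 default is an illustrative assumption, not a value
    taken from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical proposals: two overlapping boxes and one separate box.
    proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(proposals, confidences))
```

In this sketch the second proposal overlaps the first with an IoU of about 0.68, so it is suppressed and only the first and third proposals are passed on to recognition.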