Object detection is one of the most exciting fields in computer vision, allowing us to identify and label objects in an image or video. Using TensorFlow and the Object Detection API, we can build and customize a model to recognize objects in our dataset.
First, you need to make sure your environment is set up with the necessary libraries for object detection. Use the following commands to install TensorFlow and the related libraries:
```bash
pip install tensorflow
pip install tf_slim
pip install tensorflow-object-detection-api
```
In this step, you’re installing:

- `tensorflow`: the core deep-learning framework.
- `tf_slim`: a lightweight library that the Object Detection API depends on.
- `tensorflow-object-detection-api`: a packaged distribution of the Object Detection API.
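Before moving on, you can run a quick sanity check to confirm the installs succeeded (this assumes the pip package exposes the `object_detection` module):

```bash
python -c "import tensorflow as tf; print(tf.__version__)"
python -c "import object_detection; print('Object Detection API is available')"
```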
Once these libraries are installed, you’re ready to move to the next step: preparing your dataset.
To train an object detection model, you need to have a properly labeled dataset. This means you should have both the images and their corresponding annotations.
The annotations should either be in Pascal VOC format (XML files) or COCO format (JSON files). You can find tools like LabelImg or VGG Image Annotator to help you label the dataset if it’s not already annotated.
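For reference, here is a minimal Pascal VOC annotation for a single bounding box (the file name and the `car` label are placeholders):

```xml
<annotation>
  <filename>image_001.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>car</name>
    <bndbox>
      <xmin>64</xmin>
      <ymin>96</ymin>
      <xmax>576</xmax>
      <ymax>384</ymax>
    </bndbox>
  </object>
</annotation>
```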
Here’s how to organize your dataset: place your images and their annotation files into separate `train` and `test` folders, as in the layout sketched below.
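A typical directory layout (the names here are just a common convention) looks like this:

```
dataset/
├── train/
│   ├── images/
│   └── annotations/
└── test/
    ├── images/
    └── annotations/
```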
TensorFlow’s Object Detection API is a collection of pre-trained models and utilities for creating custom object detection models. To use it, you’ll need to clone the TensorFlow models repository and compile the necessary components.
Run the following commands:
```bash
git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
```
Explanation:

- `git clone` downloads the TensorFlow models repository.
- `cd models/research` moves into the `research` directory, which contains the Object Detection API.
- `protoc` compiles the API’s protocol buffer (`.proto`) definitions into Python modules.

TensorFlow models typically expect data to be in TFRecord format. TFRecord is a highly efficient binary format for storing large datasets. You’ll need to write a script to convert your images and annotations into TFRecords.
Here’s an example script that converts images and annotations to TFRecord:
```python
import tensorflow as tf
from object_detection.utils import dataset_util
import os
import io
from PIL import Image

def create_tf_example(image_path, annotation):
    filename = image_path.encode('utf8')  # Path to image file
    with tf.io.gfile.GFile(image_path, 'rb') as fid:
        encoded_image_data = fid.read()  # Encoded image bytes
    # Read the real image dimensions instead of hard-coding them
    image = Image.open(io.BytesIO(encoded_image_data))
    width, height = image.size
    image_format = b'jpeg'  # Assuming JPEG format

    # Assuming a single bounding box and a single class per image for simplicity;
    # for a real dataset, parse these from the `annotation` file instead
    x_min = 0.1  # Normalized coordinates in [0, 1]
    x_max = 0.9
    y_min = 0.2
    y_max = 0.8
    class_text = b'object'  # Placeholder class name
    class_label = 1         # Placeholder class ID (must match your label map)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature([x_min]),
        'image/object/bbox/xmax': dataset_util.float_list_feature([x_max]),
        'image/object/bbox/ymin': dataset_util.float_list_feature([y_min]),
        'image/object/bbox/ymax': dataset_util.float_list_feature([y_max]),
        'image/object/class/text': dataset_util.bytes_list_feature([class_text]),
        'image/object/class/label': dataset_util.int64_list_feature([class_label]),
    }))
    return tf_example

def convert_to_tfrecord(images_dir, annotations_dir, output_path):
    writer = tf.io.TFRecordWriter(output_path)
    for image_file in os.listdir(images_dir):
        image_path = os.path.join(images_dir, image_file)
        annotation_file = os.path.join(annotations_dir, image_file.replace('.jpg', '.xml'))
        # Assuming XML annotations; parse them in create_tf_example for a real dataset
        tf_example = create_tf_example(image_path, annotation_file)
        writer.write(tf_example.SerializeToString())
    writer.close()

# Convert the dataset
convert_to_tfrecord('path/to/train/images', 'path/to/train/annotations', 'train.record')
```
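To sanity-check the conversion (a minimal sketch; `train.record` is the file written above), you can count the serialized examples:

```python
import tensorflow as tf

# Iterate over the TFRecord file and count the serialized examples
dataset = tf.data.TFRecordDataset('train.record')
count = sum(1 for _ in dataset)
print(f'{count} examples in train.record')
```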
This script performs the following steps:

- Reads each image’s encoded bytes and determines its dimensions.
- Packs the image data, a placeholder class label, and normalized bounding-box coordinates into a `tf.train.Example`.
- Serializes each example and writes it to a TFRecord file.
You’ll need to modify this script to match your dataset format (e.g., Pascal VOC, COCO), replacing the hard-coded placeholders with the boxes and labels parsed from your annotation files; a sketch of such a parser for Pascal VOC follows.
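For Pascal VOC, for example, the bounding boxes can be read with the standard library’s XML parser (a minimal sketch, assuming one XML file per image in the format shown earlier):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(annotation_path):
    """Parse a Pascal VOC XML file into class names and normalized boxes."""
    root = ET.parse(annotation_path).getroot()
    size = root.find('size')
    width = float(size.find('width').text)
    height = float(size.find('height').text)

    names, boxes = [], []
    for obj in root.findall('object'):
        names.append(obj.find('name').text)
        bndbox = obj.find('bndbox')
        # Normalize pixel coordinates to [0, 1], as the TFRecord features expect
        boxes.append((
            float(bndbox.find('xmin').text) / width,
            float(bndbox.find('ymin').text) / height,
            float(bndbox.find('xmax').text) / width,
            float(bndbox.find('ymax').text) / height,
        ))
    return names, boxes
```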
Once your dataset is ready in TFRecord format, you can choose a pre-trained model from TensorFlow’s Model Zoo. These models have been trained on large datasets like COCO, making them a great starting point for custom object detection.
Download a pre-trained model (such as an SSD or Faster R-CNN variant) and adjust its configuration file to fit your dataset. You can find configuration files in `models/research/object_detection/samples/configs/`.
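At a minimum, you’ll usually edit the fields below (the paths and the SSD model type are placeholders; the exact structure depends on the model you download):

```
model {
  ssd {
    num_classes: 1  # Number of classes in your dataset
  }
}
train_config {
  batch_size: 8
  fine_tune_checkpoint: "path/to/pretrained/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
train_input_reader {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}
eval_input_reader {
  label_map_path: "path/to/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "test.record"
  }
}
```

The label map referenced above is a small text file mapping each class ID to a name, for example:

```
item {
  id: 1
  name: 'car'
}
```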
With everything configured, you can now train your model using the following command:
```bash
python models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=path/to/your/model.config \
    --model_dir=path/to/output_directory \
    --alsologtostderr
```
Explanation:

- `pipeline_config_path`: Path to the model configuration file you edited earlier.
- `model_dir`: Directory where checkpoints and logs will be saved.

During training, you can evaluate your model on the test dataset using this command:
```bash
python models/research/object_detection/model_main_tf2.py \
    --pipeline_config_path=path/to/your/model.config \
    --model_dir=path/to/output_directory \
    --checkpoint_dir=path/to/output_directory
```
This evaluates your model using the saved checkpoints from the training process.
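You can also monitor training and evaluation metrics (loss curves, mAP) in TensorBoard by pointing it at the same output directory:

```bash
tensorboard --logdir=path/to/output_directory
```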
Once the model is trained, you can use it to perform object detection on new images. Here’s how you can load the model and run inference:
```python
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.utils import label_map_util
from object_detection.builders import model_builder
from object_detection.utils import visualization_utils as viz_utils
import cv2

# Load the pipeline config and build the detection model
configs = config_util.get_configs_from_pipeline_file('path/to/model.config')
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# Restore the latest checkpoint
ckpt = tf.compat.v2.train.Checkpoint(model=detection_model)
ckpt.restore('path/to/checkpoint').expect_partial()

# Map class IDs to human-readable names using the label map
category_index = label_map_util.create_category_index_from_labelmap(
    'path/to/label_map.pbtxt', use_display_name=True)

@tf.function
def detect_fn(image):
    # TF2 detection models run in three stages: preprocess, predict, postprocess
    image, shapes = detection_model.preprocess(image)
    prediction_dict = detection_model.predict(image, shapes)
    return detection_model.postprocess(prediction_dict, shapes)

# Load image and run detection
def detect_objects(image_path):
    image_np = cv2.imread(image_path)
    input_tensor = tf.convert_to_tensor(image_np[tf.newaxis, ...], dtype=tf.float32)
    detections = detect_fn(input_tensor)

    # Visualize results; postprocessed classes are 0-based, label maps are 1-based
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np,
        detections['detection_boxes'][0].numpy(),
        detections['detection_classes'][0].numpy().astype(int) + 1,
        detections['detection_scores'][0].numpy(),
        category_index,
        use_normalized_coordinates=True,
        line_thickness=8)
    cv2.imshow('Object Detection', image_np)
    cv2.waitKey(0)

# Example usage (the path is a placeholder)
detect_objects('path/to/test_image.jpg')
```
This script:

- Builds the model from the pipeline config and restores the latest training checkpoint.
- Loads the label map so class IDs can be displayed as names.
- Runs the image through the model’s preprocess, predict, and postprocess stages.
- Draws the detected boxes, labels, and scores on the image and displays it.
By following these steps, you can create a custom object detection model using TensorFlow and the Object Detection API. Whether you’re detecting cars, fruits, or anything else, the key is to have a well-labeled dataset and a properly configured model.