Hierarchical Detection Network (HDN)

Multiple Object Detection by Sequential Monte Carlo and Hierarchical Detection Network (HDN)

This page gives a high-level overview of our research on the Hierarchical Detection Network (HDN). For more details, please refer to our article published in the CVPR 2010 proceedings.

Contents

Overview
Motivation and Intuition
Results
Summary
Publications and Further Reading

Overview

In this paper, we propose a novel framework for detecting multiple objects in 2D and 3D images. Since a joint multi-object model is difficult to obtain in most practical situations, we focus here on detecting the objects sequentially, one-by-one. The interdependence of object poses and strong prior information embedded in our domain of medical images results in better performance than detecting the objects individually. Our approach is based on Sequential Estimation techniques, frequently applied to visual tracking. Unlike in tracking, where the sequential order is naturally determined by the time sequence, the order of detection of multiple objects must be selected, leading to a Hierarchical Detection Network (HDN). We present an algorithm that optimally selects the order based on probability of states (object poses) within the ground truth region. The posterior distribution of the object pose is approximated at each step by sequential Monte Carlo. The samples are propagated within the sequence across multiple objects and hierarchical levels. We show on 2D ultrasound images of left atrium, that the automatically selected sequential order yields low mean detection error. We also quantitatively evaluate the hierarchical detection of fetal faces and three fetal brain structures in 3D ultrasound images.

Motivation and Intuition

The most challenging aspects of multi-object detection algorithms are designing detectors that are fast and robust, modeling the spatial relationships between objects, and determining the detection order. In this paper, we propose a multi-object detection system that addresses these challenges.

The computational speed and robustness of our system are increased by hierarchical processing. In detection, one major problem is how to effectively propagate object candidates across the levels of the hierarchy. This typically involves defining a search range at a fine level where the candidates from the coarse level are refined. Incorrect selection of the search range leads to higher computational cost, lower accuracy, or drift of the coarse candidates towards incorrect refinements. The search range in our technique is part of the model that is learned from the training data. Furthermore, our detection schedule is designed to minimize the uncertainty of the detections and to optimally order the hierarchical scales.

**Figure 1:** Examples of multi-object detection: five landmarks of left atrium (LA) apical two chamber (A2C) view (left) and 3D ultrasound volume of fetal brain with three anatomies (right).

Our approach to multi-object detection is motivated by Sequential Estimation techniques, frequently applied to visual tracking. We sample from a sequence of probability distributions, but the sequence specifies a spatial order rather than a time order as in tracking. The posterior distribution of each object pose (state) is estimated based on all observations so far. The observations are features computed from image neighborhoods surrounding the objects. The likelihood of a hypothesized state that gives rise to observations is based on a deterministic model learned using a large annotated database of images. The transition model that describes the way the poses of objects are related is Gaussian.
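
The sampling step described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the 2D position-only pose, the offset and noise values, and the toy likelihood (a stand-in for a trained detector) are all assumptions made for illustration. The sketch propagates pose samples from an already detected object to the next object through a Gaussian transition model, weights them by the likelihood, and resamples.

```python
import numpy as np

# Hypothetical "true" position of the second object, used only by the toy likelihood.
TARGET = np.array([30.0, 12.0])

def toy_likelihood(pose):
    """Stand-in for a trained detector score (assumption, not the paper's model)."""
    return float(np.exp(-0.5 * np.sum((pose - TARGET) ** 2) / 4.0))

def propagate_and_resample(prev_samples, mean_offset, sigma, likelihood, rng):
    """One sequential step from a precursor object to the next object."""
    # Gaussian transition: next pose = precursor pose + mean offset + noise.
    noise = rng.normal(0.0, sigma, size=prev_samples.shape)
    predicted = prev_samples + mean_offset + noise
    # Weight every predicted pose by the likelihood of the observation.
    weights = np.array([likelihood(p) for p in predicted], dtype=float)
    weights /= weights.sum()
    # Resample so the samples concentrate in high-probability regions.
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]

rng = np.random.default_rng(0)
# Samples approximating the posterior of the first (already detected) object.
first_object = rng.normal([10.0, 10.0], 1.0, size=(200, 2))
second_object = propagate_and_resample(
    first_object, np.array([20.0, 0.0]), 3.0, toy_likelihood, rng
)
estimate = second_object.mean(axis=0)  # point estimate of the second object's pose
```

In this toy setting the resampled set clusters between the position predicted by the transition model and the position favored by the likelihood, which is the behavior the sequential formulation relies on.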

Employing the sequential sampling model allows us to use fewer samples of the object pose and formally extend this class of algorithms to multiple objects. This saves computational time and increases accuracy since the samples are taken from the regions of high probability of the posterior distribution. Many ideas from the Sequential Sampling literature on visual tracking can likely be extended to multi-object detection. We demonstrate the benefit of the sampling when detecting multiple landmarks in 2D images of the left atrium. Unlike in tracking, where the sequential order is naturally determined by the time progression, the order in multi-object detection must be selected. In our algorithm, the order is selected such that the uncertainty of the detections is minimized. So, instead of using the immediate precursor in the Markov process, the transition model could be based on any precursor, which is optimally selected. This leads to a Hierarchical Detection Network (HDN). The likelihood of a hypothesized pose is computed using a trained detector. The detection scale is introduced as another parameter of the likelihood model and the hierarchical schedule is determined in the same way as the spatial schedule.
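
A greedy selection of the detection order and of the precursor for each object could look like the following sketch. The `reliability` table, the object names, and the scores are hypothetical; in practice such scores would be measured on annotated data, for example as the probability of sampled states falling within the ground truth region. The exact criterion used in the paper may differ from this simplification.

```python
def greedy_order(objects, reliability):
    """Greedily build a detection order.

    reliability[(precursor, obj)] is a hypothetical score for detecting `obj`
    given `precursor`; precursor None means the object is detected on its own.
    Returns a list of (object, chosen precursor) pairs in detection order.
    """
    order = []
    remaining = set(objects)
    while remaining:
        # Any already detected object may serve as the precursor.
        precursors = [None] + [obj for obj, _ in order]
        # Pick the most reliable (object, precursor) pair among the remaining objects.
        _, obj, pre = max(
            ((reliability.get((p, o), 0.0), o, p)
             for o in sorted(remaining) for p in precursors),
            key=lambda item: item[0],
        )
        order.append((obj, pre))
        remaining.remove(obj)
    return order

# Made-up scores for three objects A, B, C.
scores = {
    (None, "A"): 0.9, (None, "B"): 0.5, (None, "C"): 0.3,
    ("A", "B"): 0.8, ("A", "C"): 0.4, ("B", "C"): 0.7,
}
schedule = greedy_order(["A", "B", "C"], scores)
```

Because the precursor is chosen among all previously detected objects rather than only the most recent one, the result is a network of dependencies rather than a single chain.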

Results

The goal is to automatically determine the detection order of five left atrium landmarks. The landmark detectors are trained independently using 281 annotated images. A total of 46 annotated images from the testing data set were used to obtain the detection order. The remaining 90 cases were used for detection and comparative evaluation.

**Figure 2:** The final automatically selected detection order. At first, it might seem that landmarks 01 and 17 would be preferred over landmarks 05 and 13 due to the higher distinctiveness of the region. However, the high appearance variation of landmarks 01 and 17 makes landmarks 05 and 13 preferable.

Our next experiment is on detecting three fetal brain structures in 3D ultrasound data. The output of the system is a visualization of the plane with correct orientation and centering as well as a biometric measurement of the anatomy. A total of 589 expert-annotated images were used for training and 295 for testing. The volumes have an average size of 250 × 200 × 150 mm. We use three resolutions in a hierarchical system shown in Figure 3. The quantitative evaluation is in Table 1, and several examples of detected structures are in Figure 4. The average detection error with HDN is 2.2 mm, compared to 4.8 mm for a system without HDN.

**Figure 3:** The detection order and the hierarchy of three brain structures: Cerebellum (CER), Cisterna Magna (CM), and Lateral Ventricles (LV). Scale selection is applied.

**Table 1:** Measurement errors (in mm) of the hierarchical detection system (top part of the table) compared to an earlier system without the hierarchy (bottom part). Mean error, standard deviation, median error, and maximum error are computed. The system was trained using the number of volumes specified in the 6th column and tested on the number of volumes specified in the 7th column. The average detection error using the hierarchy is 2.2 mm on data with 1 mm finest resolution. The average error of the system without the hierarchy is 4.8 mm.

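| Structure | mean | std | median | max | #train | #test |
|---|---|---|---|---|---|---|
| CER (with hierarchy) | 2.289 | 0.884 | 2.213 | 4.197 | 589 | 295 |
| CM (with hierarchy) | 2.149 | 0.807 | 2.075 | 4.019 | 589 | 295 |
| LV (with hierarchy) | 2.245 | 0.817 | 2.154 | 3.891 | 589 | 295 |
| CER (without hierarchy) | 4.961 | 6.767 | 3.422 | 59.607 | 589 | 295 |
| CM (without hierarchy) | 4.989 | 6.832 | 3.519 | 68.679 | 589 | 295 |
| LV (without hierarchy) | 4.565 | 5.023 | 3.097 | 39.176 | 589 | 295 |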
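
As a quick check of the reported averages, the per-structure mean errors from Table 1 can be averaged directly. The variable names below are chosen for illustration; the numbers are copied from the table.

```python
from statistics import mean

# Mean errors (mm) per structure, copied from Table 1.
with_hierarchy = {"CER": 2.289, "CM": 2.149, "LV": 2.245}
without_hierarchy = {"CER": 4.961, "CM": 4.989, "LV": 4.565}

avg_with = mean(list(with_hierarchy.values()))
avg_without = mean(list(without_hierarchy.values()))

print(f"with hierarchy:    {avg_with:.1f} mm")
print(f"without hierarchy: {avg_without:.1f} mm")
```

Rounded to one decimal place, the two averages agree with the 2.2 mm and 4.8 mm figures quoted in the caption.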

**Figure 3:** Final sequential detection result (cyan) compared to ground truth (red). Notice that the landmarks are accurately detected despite the noise, high appearance and shape variations, and shadowing effects. The landmark detection errors (in pixels) are shown below each image in the left-bottom-right order.

(6.39, 6.91, 4.64, 7.21, 6.26)	(2.42, 6.84, 9.95, 7.33, 8.41)	(7.94, 5.03, 7.03, 8.18, 5.00)	(5.24, 9.83, 6.16, 5.12, 7.71)

**Figure 4:** Final hierarchical detection result (cyan) compared to ground truth (red). The last two columns show the agreement of the detection plane in the sagittal and coronal cross section.

Summary

We have presented a Sequential Monte Carlo based Hierarchical Detection Network (HDN) for detecting multiple objects. The order of detection is automatically determined by a greedy algorithm that puts the most reliable detections earlier in the detection sequence. The detectors are organized in a multi-scale hierarchy with the scale parameter included in the order selection process. We have shown the effectiveness of the automatic order selection process on the detection of five left atrium landmarks in 2D ultrasound images. The multi-scale hierarchical detectors have higher detection accuracy than systems based on a single level, as we demonstrated on the detection of the fetal face and three fetal brain structures in 3D ultrasound images.

The described framework opens up several possible avenues of future research. One area we are particularly interested in is how to include dependence on multiple objects at each detection stage. This will result in a stronger geometrical constraint and therefore improve performance on objects that are difficult to detect by exploiting only the pairwise dependence.

Publications and Further Reading

Sofka, M., Zhang, J., Good, S., Zhou, S.K., Comaniciu, D., 2014. Automatic Detection and Measurement of Structures in Fetal Head Ultrasound Volumes Using Sequential Estimation and Integrated Detection Network (IDN). IEEE Transactions on Medical Imaging 33, 1054–1070.
Routine ultrasound exam in the second and third trimesters of pregnancy involves manually measuring fetal head and brain structures in 2D scans. The procedure requires a sonographer to find the standardized visualization planes with a probe and manually place measurement calipers on the structures of interest. The process is tedious, time consuming, and introduces user variability into the measurements. This paper proposes an Automatic Fetal Head and Brain (AFHB) system for automatically measuring anatomical structures from 3D ultrasound volumes. The system searches the 3D volume in a hierarchy of resolutions and by focusing on regions that are likely to be the measured anatomy. The output is a standardized visualization of the plane with correct orientation and centering as well as the biometric measurement of the anatomy. The system is based on a novel framework for detecting multiple structures in 3D volumes. Since a joint model is difficult to obtain in most practical situations, the structures are detected in a sequence, one-by-one. The detection relies on Sequential Estimation techniques, frequently applied to visual tracking. The interdependence of structure poses and strong prior information embedded in our domain yields faster and more accurate results than detecting the objects individually. The posterior distribution of the structure pose is approximated at each step by sequential Monte Carlo. The samples are propagated within the sequence across multiple structures and hierarchical levels. The probabilistic model helps solve many challenges present in the ultrasound images of the fetus such as speckle noise, signal drop-out, shadows caused by bones, and appearance variations caused by the differences in the fetus gestational age. This is possible by discriminative learning on an extensive database of scans comprising more than two thousand volumes and more than thirteen thousand annotations. The average difference between ground truth and automatic measurements is below 2 mm with a running time of 6.9 seconds (GPU) or 14.7 seconds (CPU). The accuracy of the AFHB system is within inter-user variability and the running time is fast, which meets the requirements for clinical use.
```
@article{sofka:tmi14,
  author = {Sofka, Michal and Zhang, Jingdan and Good, Sara and Zhou, S.~Kevin and Comaniciu, Dorin},
  title = {Automatic Detection and Measurement of Structures
                in Fetal Head Ultrasound Volumes Using Sequential
                Estimation and Integrated Detection Network ({IDN})},
  journal = {IEEE Transactions on Medical Imaging},
  year = {2014},
  month = may,
  volume = {33},
  number = {5},
  pages = {1054--1070},
  doi = {10.1109/TMI.2014.2301936}
}
```

Sofka, M., Ralovich, K., Birkbeck, N., Zhang, J., Zhou, S.K., 2011. Integrated Detection Network (IDN) for Pose and Boundary Estimation in Medical Images. In: Proceedings of the 8th International Symposium on Biomedical Imaging (ISBI 2011). Chicago, IL.
The expanding role of complex object detection algorithms introduces a need for flexible architectures that simplify interfacing with machine learning techniques and offer easy-to-use training and detection procedures. To address this need, the Integrated Detection Network (IDN) proposes a conceptual design for rapid prototyping of object and boundary detection systems. The IDN uses a strong spatial prior present in the medical imaging domain and a large annotated database of images to train robust detectors. The best detection hypotheses are propagated throughout the detection network using sequential sampling techniques. The effectiveness of the IDN is demonstrated on two learning-based algorithms: (1) automatic detection of fetal brain structures in ultrasound volumes, and (2) liver boundary detection in MRI volumes. Modifying the detection pipeline is simple and allows for immediate adaptation to the variations of the desired algorithms. Both systems achieved low detection error (3.09 and 4.20 mm for two brain structures and 2.53 mm for boundary).
```
@inproceedings{sofka:isbi11,
  author = {Sofka, Michal and Ralovich, Krist\'{o}f and Birkbeck, Neil and Zhang, Jingdan and Zhou, S.~Kevin},
  title = {Integrated Detection Network ({IDN}) for Pose and Boundary
                Estimation in Medical Images},
  booktitle = {Proceedings of the 8th International Symposium on
                    Biomedical Imaging (ISBI 2011)},
  year = {2011},
  month = {30~Mar -- 2~Apr},
  address = {Chicago, IL}
}
```

Sofka, M., Zhang, J., Zhou, S.K., Comaniciu, D., 2010. Multiple Object Detection by Sequential Monte Carlo and Hierarchical Detection Network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). San Francisco, CA, USA.
In this paper, we propose a novel framework for detecting multiple objects in 2D and 3D images. Since a joint multi-object model is difficult to obtain in most practical situations, we focus here on detecting the objects sequentially, one-by-one. The interdependence of object poses and strong prior information embedded in our domain of medical images results in better performance than detecting the objects individually. Our approach is based on Sequential Estimation techniques, frequently applied to visual tracking. Unlike in tracking, where the sequential order is naturally determined by the time sequence, the order of detection of multiple objects must be selected, leading to a Hierarchical Detection Network (HDN). We present an algorithm that optimally selects the order based on probability of states (object poses) within the ground truth region. The posterior distribution of the object pose is approximated at each step by sequential Monte Carlo. The samples are propagated within the sequence across multiple objects and hierarchical levels. We show on 2D ultrasound images of left atrium, that the automatically selected sequential order yields low mean detection error. We also quantitatively evaluate the hierarchical detection of fetal faces and three fetal brain structures in 3D ultrasound images.
```
@inproceedings{sofka:cvpr10,
  author = {Sofka, Michal and Zhang, Jingdan and Zhou, S.~Kevin and Comaniciu, Dorin},
  title = {Multiple Object Detection by Sequential {M}onte {C}arlo
                    and Hierarchical Detection Network},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision
                    and Pattern Recognition (CVPR)},
  month = "13--18~" # jun,
  address = {San Francisco, CA, USA},
  year = {2010},
  annote = {}
}
```