Topological issues in image analysis

            Consider the following questions. How do you teach a computer

·        to count the number of people in the airport or components in a microchip?

·        to tell letter P from letter B or locate all tunnels in a human bone or a large molecule?

·        to find voids in a piece of alloy?

            Why do we care?

            Knowing the number of people in an airport or a supermarket is important for security. Knowing the number of components in a microchip may tell you that it is defective. You need a computer able to distinguish letters to read handwritten tax forms. The existence of a large tunnel in a protein molecule may be an indication that its function is to hold a DNA molecule. The number of tunnels and voids may reflect the strength of a bone affected by osteoporosis, or of another porous material. The applications are endless.

            To summarize, it is often necessary to compute topological characteristics of the image. Specifically, these characteristics are the Betti numbers, which are directly related to the three questions above.

·        B0 is the number of objects or parts of objects in the image.

·        B1 is the number of holes or tunnels. For the letter O and a donut, it is 1; for the letter B and an inner tube, it is 2.

·        B2 is the number of voids or cavities. For both a hollow ball and an inner tube, it is 1.
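            As an illustration, here is a minimal Python sketch of how B0 and B1 could be computed for a small binary 2D image. It is not the method of any particular software. It assumes that the image is a list of rows of 0s and 1s, that objects are 8-connected while the background is 4-connected (a common convention, but an assumption here), and that every hole is a bounded piece of background. The names count_components, betti_numbers_2d, and parse are made up for this example.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` by flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers_2d(image):
    """Return (B0, B1) for a binary image given as a list of 0/1 rows."""
    cols = len(image[0])
    # Pad with a background border so that the outside is a single region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (cols + 2)])
    b0 = count_components(padded, 1, N8)       # objects
    b1 = count_components(padded, 0, N4) - 1   # background regions minus the outside
    return b0, b1

def parse(text):
    """Turn a picture drawn with '#' and '.' into rows of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in line] for line in text.split()]

LETTER_O = parse("""
####
#..#
#..#
####
""")

LETTER_B = parse("""
####
#..#
####
#..#
####
""")

print(betti_numbers_2d(LETTER_O))  # (1, 1)
print(betti_numbers_2d(LETTER_B))  # (1, 2)
```

            The two letters have the same B0 but different B1, which is exactly what separates O from B.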

            Currently, the tools available for the analysis of the image are mostly qualitative. Quantitative topological description has been missing or insufficient.

            One of the most important applications of image analysis is image retrieval. The problem is of enormous complexity, so let’s consider just a few very simple search queries.

            Find all portraits from among all my family photos. What about group portraits? These searches are clearly about counting the number of objects. Hence, you need B0.

            Find all Ps, As, and Os in the text. What about Bs? Find X-rays of fractured skulls in a medical database. What about clogged arteries? Find proteins in the Protein Data Bank that can hold a DNA molecule. These searches are clearly about detecting and counting the number of holes and tunnels. Hence, you need B1.

            Find out if there are any air bubbles in the bloodstream. Find all intact pipes, tires, balls, etc. Separating intact from damaged objects of almost any kind will require counting the number of parts, tunnels, and voids. Hence, you need B0, B1, and B2.
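            The same idea carries over to 3D. The sketch below counts voids (B2) in a small voxel volume by counting the pieces of empty space that are sealed off from the outside. It is only an illustration under stated assumptions: the volume is a nested list volume[z][y][x] of 0s and 1s, empty space is treated as 6-connected, and the names count_voids and solid_block are hypothetical. It does not compute B1 in 3D, which takes more work.

```python
from collections import deque

N6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def count_voids(volume):
    """Count enclosed cavities in volume[z][y][x] (1 = material, 0 = empty)."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])

    def is_empty(z, y, x):
        # Coordinates refer to the volume padded by one empty layer on each side.
        inside = 1 <= z <= nz and 1 <= y <= ny and 1 <= x <= nx
        return (not inside) or volume[z - 1][y - 1][x - 1] == 0

    seen = set()
    regions = 0
    for z in range(nz + 2):
        for y in range(ny + 2):
            for x in range(nx + 2):
                if (z, y, x) in seen or not is_empty(z, y, x):
                    continue
                regions += 1
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in N6:
                        p = (cz + dz, cy + dy, cx + dx)
                        if (0 <= p[0] < nz + 2 and 0 <= p[1] < ny + 2
                                and 0 <= p[2] < nx + 2
                                and p not in seen and is_empty(*p)):
                            seen.add(p)
                            queue.append(p)
    # One of the empty regions is the outside; the rest are voids.
    return regions - 1

def solid_block(n):
    """An n x n x n block completely filled with material."""
    return [[[1] * n for _ in range(n)] for _ in range(n)]

bubble = solid_block(3)
bubble[1][1][1] = 0        # a sealed air bubble in the middle

punctured = solid_block(3)
punctured[1][1][1] = 0
punctured[1][1][0] = 0     # the bubble now opens to the outside

print(count_voids(bubble))     # 1
print(count_voids(punctured))  # 0
```

            In this toy example, the "intact" block has one void and the "damaged" one has none, so the count alone tells them apart.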

            Currently, there is no software on the market that can solve all these problems. Some of them are the subject of extensive research, but so far the solutions are inadequate in terms of robustness of the results. The approaches are mostly based on statistical study of the image. The topological decomposition of the image is ignored. This makes these methods problematic even for retrieval of binary images. This is further discussed below.

            To fully analyze the image, it may be insufficient to know the number of topological features of each kind. For example, you may want to have a separate count of adults and children in the airport. Even though the microchip has the correct number of components, it may still be defective because these components are of the wrong size or shape. You can’t distinguish letters A and O this way because both have one hole. Whether the tunnels in a human bone go mostly across the bone or along it may significantly affect its strength. A protein can hold a DNA molecule only if the hole is of the appropriate width. The properties of the alloy or foam will depend on the size, shape, and location of the bubbles.

            Therefore, there is a need to capture and extract topological features from the image. Then, beyond counting their number, we can also measure them in any number of ways. We can compare and identify them. We can use them in image search and retrieval. For more on this, see this article in our wiki.
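            To make "measuring" concrete, here is a small Python sketch that extracts every object from a binary image and reports its area and bounding box. It is a toy illustration, not a description of any particular product. The 8-connectivity, the function name measure_objects, and the size cutoff of 4 pixels are all assumptions chosen for the example.

```python
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def measure_objects(image):
    """Return the area and bounding box of every 8-connected object."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            area, top, left, bottom, right = 0, r, c, r, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in N8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # box is (top row, left column, bottom row, right column)
            objects.append({"area": area, "box": (top, left, bottom, right)})
    return objects

IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
]

objects = measure_objects(IMAGE)
sizes = [obj["area"] for obj in objects]
large = [a for a in sizes if a >= 4]   # the cutoff of 4 is arbitrary
small = [a for a in sizes if a < 4]

print(sizes)                   # [6, 2]
print(len(large), len(small))  # 1 1
```

            Both objects contribute equally to B0, but once they are extracted they can be counted separately by size, in the same way one might separate adults from children.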

            Another important application of image analysis is “image enhancement”. However, this term carries a judgment about the original image; therefore, we prefer the term image simplification. Simplification means removal of features from the image. It is especially applicable to removal of objects and regions. The choice may be based on the importance of the region to be removed. The measure of importance is determined separately and could be the area of the region or, in the case of grayscale images, its contrast. Thus, we can remove objects of small size and/or with low contrast. This process is called “denoising”.

            Some of the above problems are treated by methods well-known in the imaging industry. The first drawback of these methods, however, is their narrow applicability. A given method applies to 2D images but not 3D images, or to “point clouds” but not digital images, or to still images but not movies, etc. Many depend significantly on the context. Therefore, there is a need for a simple, yet unified way to deal with all of these applications.

            Image segmentation usually includes prior image denoising. Normally, denoising requires a priori information about the image and the nature of the noise. Therefore, these techniques cannot be safely applied outside of their original scope. In particular, denoising methods are content-dependent because the nature of noise in photography, fingerprinting, or electron microscopy is different.
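            A minimal sketch of simplification by area, for a binary image, might look as follows. The assumptions are ours: objects are 8-connected, importance is measured by pixel count, and the function name remove_small_objects and its min_area parameter are hypothetical. The function returns the list of removed objects along with the new image, so nothing disappears without a record.

```python
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def remove_small_objects(image, min_area):
    """Erase objects with fewer than min_area pixels.

    Returns (simplified_image, removed), where removed holds the pixel
    coordinates of each erased object. The input image is not modified.
    """
    rows, cols = len(image), len(image[0])
    result = [list(row) for row in image]
    seen = [[False] * cols for _ in range(rows)]
    removed = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in N8:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_area:
                for y, x in pixels:
                    result[y][x] = 0
                removed.append(sorted(pixels))
    return result, removed

IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
]

simplified, removed = remove_small_objects(IMAGE, min_area=3)
print(simplified)  # [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
print(removed)     # [[(1, 4), (2, 4)]]
```

            The threshold min_area is the single setting the user controls, and the removed list shows exactly what was taken out of the image.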

            Currently, the goal of image processing software is to “enhance” the image and simply give the new image to the user. This is unacceptable in image analysis. Indeed, how can one be sure that something important has not been lost during the “enhancement”? The only way is to visually compare the new (“good”) image to the original (“bad”) one. The same problem exists in the context of image compression. Further, both “smoothing” and thresholding introduce arbitrariness into the process of simplification.

            Unspecified changes to the original are unacceptable in many settings. For example, in science the removal of an “insignificant” feature from the image without approval by the researcher can result in a missed discovery. In medicine, such a practice may lead to a misdiagnosis. In forensic imaging, the enhanced image may be challenged in court. In photography, the user has to deal with aftereffects of enhancements, such as halos, blur, etc. Therefore, there is a need for a method that allows total control by the user over what and how much is removed from the image. Read more about our approach.

            From the point of view of a consumer, there is also a need for a simple yet user-controlled image simplification method. Even in a scientific setting, the user is often unfamiliar with either the imaging terminology, such as filtering, snakes, dilation, erosion, etc., or the advanced mathematics involved, such as wavelets, Fourier transforms, or PDEs. When this is the case, the user is forced to accept the results provided by the software and rely entirely on visual inspection and subjective judgment. There is a need for an image analysis and simplification method so elementary that even a user with no background in imaging or mathematics will be able to understand exactly what is happening and control the outcome.