Friday, January 24, 2020

[Scrap] Zero to Hero: Guide to Object Detection using Deep Learning: Faster R-CNN, YOLO, SSD

In this post, I shall explain object detection and various algorithms like Faster R-CNN, YOLO, and SSD. We shall start at beginners' level and progress to the state of the art in object detection, understanding the intuition, approach, and salient features of each method.
What is Image Classification?
Image classification takes an image and predicts the object in it. For example, when we built a cat-dog classifier, we took images of a cat or dog and predicted their class:
What do you do if both cat and dog are present in the image?
What would our model predict? To solve this problem we can train a multi-label classifier which will predict both classes (dog as well as cat). However, we still won't know the location of the cat or dog. The problem of identifying the location of an object (given its class) in an image is called localization. If the object class is not known, we have to determine not only the location but also the class of each object.
Predicting the location of an object along with its class is called object detection. Instead of predicting only the class of an object in an image, we now have to predict the class as well as a rectangle (called a bounding box) containing that object. It takes 4 variables to uniquely identify a rectangle. So, for each instance of an object in the image, we shall predict the following variables (a minimal code sketch follows the list):
class_name, 
bounding_box_top_left_x_coordinate,
bounding_box_top_left_y_coordinate,
bounding_box_width,
bounding_box_height
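As a minimal illustration, one detection could be stored as a small record like the sketch below (the field names are assumptions for illustration, not from any particular library):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label plus its bounding box."""
    class_name: str
    x: float       # top-left x coordinate of the box
    y: float       # top-left y coordinate of the box
    width: float   # box width
    height: float  # box height

# Example: a dog detected at (10, 20) inside a 100x80 box
det = Detection("dog", 10, 20, 100, 80)
print(det)
```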
Just like multi-label image classification problems, we can have multi-class object detection problems where we detect multiple kinds of objects in a single image:
In the following sections, I will cover all the popular methodologies for training object detectors. Historically, there have been many approaches to object detection, starting from Haar Cascades proposed by Viola and Jones in 2001. However, we shall be focusing on state-of-the-art methods, all of which use neural networks and deep learning.
Object detection can be modeled as a classification problem where we take windows of fixed size from the input image at all possible locations and feed these patches to an image classifier.
Demo of a sliding window detector
Each window is fed to the classifier, which predicts the class of the object in the window (or background if none is present). Hence, we know both the class and location of the objects in the image. Sounds simple! Well, there are a few more problems. How do you know the size of the window so that it always contains the object? Look at these examples:
Small sized object
Big sized object. What size do you choose for your sliding window detector?

As you can see, objects can be of varying sizes. To solve this problem, an image pyramid is created by scaling the image. The idea is that we resize the image at multiple scales and count on the fact that our chosen window size will completely contain the object in one of these resized images. Most commonly, the image is downsampled (reduced in size) until a certain condition, typically a minimum size, is reached. On each of these images, a fixed-size window detector is run. It's common to have as many as 64 levels in such pyramids. Now, all these windows are fed to a classifier to detect the object of interest. This solves the problems of size and location.
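A rough Python sketch of the pyramid-plus-sliding-window idea, using OpenCV only for resizing (the scale factor, window size, and stride here are illustrative assumptions):

```python
import cv2

def pyramid(image, scale=1.5, min_size=(64, 64)):
    """Yield progressively downsampled copies of the image."""
    yield image
    while True:
        h, w = image.shape[:2]
        image = cv2.resize(image, (int(w / scale), int(h / scale)))
        if image.shape[0] < min_size[1] or image.shape[1] < min_size[0]:
            break
        yield image

def sliding_windows(image, window=(64, 64), stride=16):
    """Yield (x, y, patch) for every fixed-size window position."""
    for y in range(0, image.shape[0] - window[1] + 1, stride):
        for x in range(0, image.shape[1] - window[0] + 1, stride):
            yield x, y, image[y:y + window[1], x:x + window[0]]

# Every patch from every pyramid level would be fed to a classifier:
# for level in pyramid(img):
#     for x, y, patch in sliding_windows(level):
#         score = classifier(patch)  # hypothetical classifier
```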
                                          
There is one more problem: aspect ratio. Objects come in various shapes; a sitting person will have a different aspect ratio than a standing or sleeping person. We shall cover this a little later in this post. There are various methods for object detection like R-CNN, Faster R-CNN, SSD, etc. Why do we have so many methods, and what are the salient features of each? Let's have a look:

1. Object Detection using HOG Features:

In a groundbreaking paper in the history of computer vision, Navneet Dalal and Bill Triggs introduced Histogram of Oriented Gradients (HOG) features in 2005. HOG features are computationally inexpensive and work well for many real-world problems. On each window obtained from running the sliding window over the pyramid, we calculate HOG features, which are fed to an SVM (support vector machine) to create a classifier. We were able to run this in real time on videos for pedestrian detection, face detection, and many other object detection use-cases.
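A minimal sketch of this HOG-plus-SVM pipeline using scikit-image and scikit-learn (the 64×128 window size and the random training data are placeholders, not the original detector's setup):

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(patch):
    """Compute HOG descriptors for one 64x128 grayscale window."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder training data: positive (object) and negative (background) windows
patches = np.random.rand(20, 128, 64)   # 20 fake 64x128 windows
labels = np.array([1] * 10 + [0] * 10)  # 1 = object, 0 = background

svm = LinearSVC()
svm.fit([hog_features(p) for p in patches], labels)

# At detection time, each sliding-window patch is scored the same way:
score = svm.decision_function([hog_features(patches[0])])
```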

2. Region-based Convolutional Neural Networks (R-CNN):

Since we had modeled object detection as a classification problem, success depends on the accuracy of classification. After the rise of deep learning, the obvious idea was to replace HOG-based classifiers with a more accurate convolutional neural network based classifier. However, there was one problem: CNNs were too slow and computationally very expensive. It was impossible to run a CNN on the huge number of patches generated by a sliding window detector. R-CNN solves this problem by using an object proposal algorithm called Selective Search, which reduces the number of bounding boxes fed to the classifier to roughly 2000 region proposals. Selective Search uses local cues like texture, intensity, color, and/or a measure of insideness to generate all the probable locations of objects. Now, we can feed these boxes to our CNN-based classifier. Remember, the fully connected part of the CNN takes a fixed-size input, so we resize (without preserving aspect ratio) all the generated boxes to a fixed size (224×224 for VGG) and feed them to the CNN. Hence, there are 3 important parts of R-CNN (sketched in code after the list below):
  1. Run Selective Search to generate object proposals.
  2. Feed these patches to the CNN, followed by an SVM to predict the class of each patch.
  3. Refine the boxes by training a bounding box regressor separately.
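The three steps might look roughly like this in code. This is a hedged sketch: it uses OpenCV's contrib implementation of Selective Search (requires the opencv-contrib-python package) and a torchvision VGG-16 as the CNN; the per-class SVMs and the bounding box regressor are only indicated in comments.

```python
import cv2
import torch
import torchvision.transforms as T
from torchvision.models import vgg16

# 1. Run Selective Search to generate region proposals (opencv-contrib-python)
img = cv2.imread("image.jpg")
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()
rects = ss.process()[:2000]  # keep ~2000 (x, y, w, h) proposals

# 2. Warp each proposal to 224x224 (aspect ratio not preserved), run the CNN
model = vgg16(weights="IMAGENET1K_V1").eval()
to_input = T.Compose([T.ToTensor(), T.Resize((224, 224))])
with torch.no_grad():
    for (x, y, w, h) in rects[:5]:  # just a few, for illustration
        patch = img[y:y + h, x:x + w]
        scores = model(to_input(patch).unsqueeze(0))
# 3. In the real R-CNN, the CNN's penultimate features feed per-class SVMs,
#    and a separately trained bounding box regressor refines each box.
```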

3. Spatial Pyramid Pooling (SPP-net):

Still, R-CNN was very slow, because running the CNN on the 2000 region proposals generated by Selective Search takes a lot of time. SPP-net tried to fix this. With SPP-net, we calculate the CNN representation for the entire image only once and use it to derive the CNN representation of each patch generated by Selective Search. This is done by performing a pooling-type operation on JUST the section of the last conv layer's feature maps that corresponds to the region. The rectangular section of the conv layer corresponding to a region can be found by projecting the region onto the conv layer, taking into account the downsampling in the intermediate layers (simply dividing the coordinates by 16 in the case of VGG).
There was one more challenge: we need to generate a fixed-size input for the fully connected layers of the CNN, so SPP-net introduces one more trick. It uses spatial pyramid pooling after the last convolutional layer, as opposed to the traditionally used max-pooling. The SPP layer divides a region of any arbitrary size into a constant number of bins, and max pooling is performed on each bin. Since the number of bins remains the same, a constant-size vector is produced, as demonstrated in the figure below.
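A minimal sketch of such a spatial pyramid pooling layer in PyTorch (the 4×4, 2×2, 1×1 bin levels are a common choice, used here as an assumption; this is not the authors' code):

```python
import torch
import torch.nn.functional as F

def spp_layer(feature_region, bin_sizes=(4, 2, 1)):
    """Max-pool an arbitrary-size N x C x H x W region into a fixed-length vector.

    Each level divides the region into bin_size x bin_size cells and max-pools
    each cell, so the output length is C * sum(b*b for b in bin_sizes)
    regardless of the input H and W.
    """
    pooled = [F.adaptive_max_pool2d(feature_region, b).flatten(1)
              for b in bin_sizes]
    return torch.cat(pooled, dim=1)

# Two regions of different spatial size produce identical-length vectors:
r1 = torch.randn(1, 512, 13, 9)
r2 = torch.randn(1, 512, 7, 21)
assert spp_layer(r1).shape == spp_layer(r2).shape  # (1, 512 * 21)
```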
However, there was one big drawback with SPP-net: it was not trivial to perform back-propagation through the spatial pooling layer. Hence, the network only fine-tuned its fully connected part. SPP-net paved the way for the more popular Fast R-CNN, which we will see next.

4. Fast R-CNN:

Fast R-CNN uses the ideas from SPP-net and R-CNN and fixes the key problem in SPP-net, i.e., it makes end-to-end training possible. To propagate gradients through spatial pooling, it uses a simple back-propagation calculation, very similar to the max-pooling gradient calculation, with the exception that pooling regions overlap, so a cell can receive gradients from multiple regions.
Fast R-CNN also added bounding box regression to the neural network training itself, so the network now has two heads: a classification head and a bounding box regression head. This multitask objective is a salient feature of Fast R-CNN, as the network no longer needs to be trained independently for classification and localization. These two changes reduce the overall training time and increase accuracy compared to SPP-net, thanks to the end-to-end learning of the CNN.
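A rough PyTorch sketch of this two-headed design, using torchvision's roi_pool for the spatial pooling step (the backbone is omitted, and the layer sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class FastRCNNHeads(nn.Module):
    """Shared trunk feeding a classification head and a box regression head."""
    def __init__(self, in_channels=512, num_classes=21, pool_size=7):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * pool_size * pool_size, 4096), nn.ReLU(),
        )
        self.cls_head = nn.Linear(4096, num_classes)       # class scores
        self.bbox_head = nn.Linear(4096, num_classes * 4)  # per-class box deltas

    def forward(self, feature_map, rois):
        # rois: (K, 5) tensor of (batch_index, x1, y1, x2, y2) in image coords;
        # spatial_scale projects image coords onto the conv feature map (1/16 for VGG)
        pooled = roi_pool(feature_map, rois, output_size=7, spatial_scale=1 / 16)
        shared = self.fc(pooled)
        return self.cls_head(shared), self.bbox_head(shared)

# The multitask loss is a sum: cross-entropy on the class scores plus a
# smooth L1 loss on the box deltas of the ground-truth class.
```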

5. Faster R-CNN:

So, what did Faster R-CNN improve? Well, it's faster. And how does it achieve that? The slowest part of Fast R-CNN was Selective Search (or Edge Boxes). Faster R-CNN replaces Selective Search with a very small convolutional network, called the Region Proposal Network (RPN), to generate regions of interest.
To handle variations in the aspect ratio and scale of objects, Faster R-CNN introduces the idea of anchor boxes. At each location, the original paper uses 3 scales of anchor boxes, 128×128, 256×256, and 512×512, and three aspect ratios, 1:1, 2:1, and 1:2. So, in total, at each location we have 9 boxes, for which the RPN predicts the probability of being background or foreground. Bounding box regression is applied to refine the anchor boxes at each location. The RPN thus outputs bounding boxes of various sizes along with the corresponding probabilities. These variously sized boxes are passed onward by applying spatial pooling, just as in Fast R-CNN; the remaining network is similar to Fast R-CNN. Faster R-CNN is 10 times faster than Fast R-CNN with similar accuracy on datasets like VOC 2007. That's why Faster R-CNN has been one of the most accurate object detection algorithms. Here is a quick comparison between the various versions of R-CNN.
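Generating those 9 anchors at a single location is straightforward; here is an illustrative sketch (boxes are centered on the location and returned as (x1, y1, x2, y2) corners):

```python
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    """Return the 9 anchor boxes (x1, y1, x2, y2) centered at (cx, cy).

    ratios are height:width, so 2.0 is a tall box and 0.5 a wide one;
    each box keeps the area of its scale (scale * scale).
    """
    boxes = []
    for s in scales:
        for r in ratios:
            h = s * np.sqrt(r)
            w = s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

print(anchors_at(400, 300).shape)  # (9, 4): 3 scales x 3 aspect ratios
```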

Regression-based object detectors:

So far, all the methods discussed handle detection as a classification problem by building a pipeline in which object proposals are first generated and then sent to classification/regression heads. However, a few methods pose detection as a regression problem. Two of the most popular are YOLO and SSD. These detectors are also called single shot detectors. Let's have a look at them:

6. YOLO (You Only Look Once):

For YOLO, detection is a simple regression problem which takes an input image and learns the class probabilities and bounding box coordinates. Sounds simple?
YOLO divides each image into an S×S grid, and each grid cell predicts N bounding boxes along with a confidence score for each. The confidence reflects the accuracy of the bounding box and whether the box actually contains an object (regardless of class). YOLO also predicts a classification score for each cell, for every class seen in training. Combining the box confidence with the class score gives the probability of each class being present in a predicted box.
So, S×S×N boxes are predicted in total. However, most of these boxes have low confidence scores, and if we set a threshold, say 30% confidence, we can remove most of them, as shown in the example below.
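For illustration, decoding the predictions and applying the confidence threshold might look like the following sketch (the tensor layout, grid size, and the 30% threshold are assumptions for this sketch, not the paper's exact format):

```python
import numpy as np

S, N, C = 7, 2, 20  # grid size, boxes per cell, number of classes

# Fake network output: per cell, N boxes of (x, y, w, h, confidence)
# plus C conditional class probabilities shared by the cell's boxes.
boxes = np.random.rand(S, S, N, 5)
class_probs = np.random.rand(S, S, C)

kept = []
for i in range(S):
    for j in range(S):
        for b in range(N):
            x, y, w, h, conf = boxes[i, j, b]
            # class score = box confidence * conditional class probability
            scores = conf * class_probs[i, j]
            best = scores.argmax()
            if scores[best] > 0.3:  # discard low-confidence boxes
                kept.append((x, y, w, h, best, scores[best]))

print(f"{len(kept)} of {S * S * N} boxes survive the threshold")
```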
Notice that at runtime, we run the image through the CNN only once. Hence, YOLO is super fast and can run in real time. Another key difference is that YOLO sees the complete image at once, as opposed to looking only at generated region proposals as in the previous methods. This contextual information helps it avoid false positives. However, one limitation of YOLO is that it only predicts one class per grid cell; hence, it struggles with very small objects.

7. Single Shot Detector (SSD):

The Single Shot Detector achieves a good balance between speed and accuracy. SSD runs a convolutional network on the input image only once and computes a feature map. It then runs a small 3×3 convolutional kernel on this feature map to predict bounding boxes and classification probabilities. SSD also uses anchor boxes at various aspect ratios, similar to Faster R-CNN, and learns offsets rather than the boxes themselves. To handle scale, SSD predicts bounding boxes after multiple convolutional layers. Since each convolutional layer operates at a different scale, it is able to detect objects of various sizes.
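An illustrative PyTorch sketch of the per-layer prediction head (channel counts, feature-map sizes, and the number of default boxes per location are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

num_classes, num_anchors = 21, 6  # assumptions for the sketch

def ssd_head(in_channels):
    """3x3 conv predicting, per location: 4 box offsets + class scores
    for each of the num_anchors default boxes."""
    return nn.Conv2d(in_channels, num_anchors * (4 + num_classes),
                     kernel_size=3, padding=1)

# Feature maps from successively deeper (coarser) backbone layers;
# each scale gets its own head, so each detects objects of a different size.
feature_maps = [torch.randn(1, 512, 38, 38), torch.randn(1, 1024, 19, 19),
                torch.randn(1, 512, 10, 10)]
heads = [ssd_head(f.shape[1]) for f in feature_maps]

for f, head in zip(feature_maps, heads):
    out = head(f)  # (1, num_anchors*(4+C), H, W): predictions at every cell
    print(out.shape)
```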
That's a lot of algorithms. Which one should you use? Currently, Faster R-CNN is the choice if you are fanatic about accuracy numbers. However, if you are strapped for computation (perhaps running on an Nvidia Jetson), SSD is the better recommendation. Finally, if accuracy is not too much of a concern but you want to go super fast, YOLO is the way to go. First, a visual understanding of the speed vs. accuracy trade-off:
SSD seems like a good choice, as we are able to run it on video with very little accuracy trade-off. However, it may not be that simple; look at this chart comparing the performance of SSD, YOLO, and Faster R-CNN on objects of various sizes. At large sizes, SSD performs similarly to Faster R-CNN. However, look at the accuracy numbers when the object size is small: the gap widens.
YOLO vs SSD vs Faster R-CNN for various object sizes
Choosing the right object detection method is crucial and depends on the problem you are trying to solve and your set-up. Object detection is the backbone of many practical computer vision applications, such as autonomous cars, security and surveillance, and many industrial applications. Hopefully, this post gave you an intuition for and understanding of each of the popular algorithms for object detection.

Wednesday, January 8, 2020

Types of Polarizing Films


1. Iodine-based Polarizing Film
A high-definition LCD polarizing film with high transmittance and high polarization. A transparent PVA film is treated with iodine, whose strong dichroism gives the film its ability to absorb light across the visible range. Most polarizing films used in LCDs to date are iodine-based.
2. Dye-based Polarizing Film
A highly durable polarizing film whose optical properties change little even under high-temperature and high-humidity conditions; this durability is why it is used in LCDs with demanding durability requirements. Dye-based polarizing films also allow relatively easy color adjustment, so films of various colors can be manufactured, and they are used in fields such as sunglasses.
3. Retardation Polarizing Film
Used mainly for fast-response liquid crystal displays (STN-LCD: Super Twisted Nematic LCD); a wide variety of products results depending on which retardation film, with which characteristics, is combined with the polarizing film at which angle. When light from the backlight is linearly polarized by the panel's lower polarizing film and passes through the optically anisotropic liquid crystal, the retardation it accumulates differs between passing through the liquid crystal cell perpendicularly and passing through it obliquely, creating a phase difference. The retardation film compensates for this phase difference caused by the birefringence of the liquid crystal itself, equalizing the transmitted amounts of R, G, and B light, thereby improving the viewing angle and color. Retardation films made of PC (polycarbonate) are mainly used for STN, and recently retardation polarizing films using norbornene-based COP (cyclo-olefin polymer) polymers have been applied to TFT-LCD VA mode. There are many kinds of retardation polarizing films, depending on the liquid crystal mode in which they are used.
4. Liquid-crystal-compensated Polarizing Film
STN LCDs often use a PC retardation film for phase compensation, but since it compensates only at specific wavelengths, perfect black/white (B/W) rendering is difficult. To improve on this, a liquid-crystal-coated compensation film, twisted in the opposite direction to the liquid crystal of the STN cell, is used to compensate the phase difference arising in the STN cell across all wavelengths, enabling perfect B/W rendering. These liquid-crystal-compensated polarizing films are applied to transmissive/transflective color STN LCDs and B/W STN LCDs.
5. Transflective Polarizing Film
A transflective polarizing film combines transmissive and reflective properties. One of the most important factors in mobile displays is power consumption, since high consumption shortens a device's operating time. So, instead of the power-hungry conventional transmissive type, a reflective function that exploits ambient light is added, and this functional film is used as the lower-panel material of the resulting transflective LCDs. There are many kinds of transflective polarizing films depending on the materials and target properties. B/W STN LCDs use a product whose transmittance is adjusted by adding pigment to the adhesive (ST type), while CSTN LCDs and TFT LCDs use retardation polarizing films suited to each liquid crystal mode.
6. High-reflectance Transflective Polarizing Film
Recently, to reduce power consumption in STN LCDs and give the display a cleaner appearance, use of high-reflectance transflective polarizing films (SG type) has been expanding: a metal-deposited film replaces the conventional pigment-based transflective film to raise the reflectance, and a diffusive adhesive improves the appearance.
7. Reflective Polarizing Film
Made by laminating a metal-deposited reflective film onto an ordinary transmissive (iodine-based) polarizing film; used in reflective LCDs.
8. Surface Anti-reflection Polarizing Film
There are two surface anti-reflection treatments: AG (anti-glare) and AR (anti-reflection). AG treatment forms an irregular surface on the film so that external light scatters diffusely off the surface, while AR treatment forms multiple thin-film layers of differing refractive index on the surface by deposition or coating. A polarizing film without anti-reflection treatment typically has a reflectance of about 4%; an AG film is around 2%, and an AR film is below 0.5%.


Source: https://blog.naver.com/moys79/110024192997

Source: Zero to Hero: Guide to Object Detection using Deep Learning: Faster R-CNN, YOLO, SSD
https://cv-tricks.com/object-detection/faster-r-cnn-yo...