Object Detection

Object Localization

  • figure out where in the picture the object (e.g. a car) is
  • detect all objects and localize them

Bounding Boxes

  • $p_c$: is there any object
  • $b_x$: x of midpoint
  • $b_y$: y of midpoint
  • $b_h$: the height
  • $b_w$: the width
  • $c$: class label

Sliding Windows Detection

    • input a small rectangular region of the image
    • make predictions with a ConvNet on that region
    • slide the window across the entire image
    • convolutional implementation: run the ConvNet on the entire image
    • make predictions for all windows at the same time
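
A minimal sketch of the naive sliding step, before the convolutional speed-up. It only enumerates the window coordinates that a ConvNet would be applied to; the function name `sliding_windows`, the square window, and the `stride` parameter are assumptions made for illustration.

```python
from typing import Iterator, Tuple


def sliding_windows(image_h: int, image_w: int, window: int,
                    stride: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (top, left, bottom, right) for every square window that fits."""
    for top in range(0, image_h - window + 1, stride):
        for left in range(0, image_w - window + 1, stride):
            yield (top, left, top + window, left + window)


# Hypothetical 6 x 8 image, 4 x 4 window, stride 2.
windows = list(sliding_windows(image_h=6, image_w=8, window=4, stride=2))
```

Each window would be cropped and classified separately here. The convolutional implementation avoids that repeated work by sharing the computation across overlapping windows.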

YOLO algorithm

  • single convolutional implementation
  • take the midpoint of each of the objects
  • assign the object to the grid cell containing the midpoint
  • output labels
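
The midpoint-to-cell assignment can be sketched as below. It assumes midpoints are given in normalized image coordinates in [0, 1) and a square grid; `assign_to_cell` and its return layout are hypothetical names chosen for this example.

```python
def assign_to_cell(x_mid: float, y_mid: float, grid_size: int):
    """Return (row, col, x_in_cell, y_in_cell) for a normalized midpoint."""
    # Clamp so a midpoint exactly on the right/bottom edge stays in the grid.
    col = min(int(x_mid * grid_size), grid_size - 1)
    row = min(int(y_mid * grid_size), grid_size - 1)
    # Offsets of the midpoint measured from the top-left corner of its cell.
    x_in_cell = x_mid * grid_size - col
    y_in_cell = y_mid * grid_size - row
    return row, col, x_in_cell, y_in_cell


center = assign_to_cell(0.5, 0.5, grid_size=3)
corner = assign_to_cell(0.9, 0.1, grid_size=3)
```

Only the cell that contains the midpoint is responsible for the object, even if the object spans several cells.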


Single Label

  • precise bounding boxes
    • (5 + m) dimensional output
      • m: number of object classes
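
A small sketch of how one (5 + m) dimensional label could be assembled: an object flag, the midpoint x and y, the height and width, then a one-hot class vector. The helper `make_label` and the choice to zero-fill the "no object" case are assumptions for illustration.

```python
def make_label(has_object, box, class_index, num_classes):
    """Build [object flag, x, y, height, width, one-hot class (m entries)]."""
    if not has_object:
        # With no object the remaining entries are "don't care"; zeros here.
        return [0.0] * (5 + num_classes)
    one_hot = [0.0] * num_classes
    one_hot[class_index] = 1.0
    b_x, b_y, b_h, b_w = box
    return [1.0, b_x, b_y, b_h, b_w] + one_hot


label = make_label(True, (0.5, 0.4, 0.3, 0.2), class_index=1, num_classes=3)
empty = make_label(False, None, None, num_classes=3)
```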

Target Label

    • n: number of cells

Intersection over Union (IoU)

  • IoU of two bounding boxes = area of their intersection / area of their union
  • a predicted box counts as correct if IoU ≥ 0.5 (at least)
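
A minimal IoU sketch. It assumes boxes are given as corner coordinates `(x1, y1, x2, y2)` rather than midpoint, height and width; that representation is a choice made for this example only.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
```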

Non-max Suppression

  • clean up multiple detections
    • may end up with multiple detections on each object


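  • discard all boxes whose confidence $p_c$ (probability that an object is present) is under a threshold
  • pick the box with the highest $p_c$
  • suppress all other boxes that have a high IoU (high overlap) with the picked box
  • repeat with the remaining boxes until none are left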
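
The steps above can be sketched for a single class as follows. Detections are assumed to be `(score, box)` pairs with corner-format boxes, and the default thresholds 0.6 and 0.5 are illustrative values, not fixed parts of the algorithm.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.5):
    """Return the detections that survive suppression, best score first."""
    # Step 1: drop low-confidence detections, then sort by score.
    candidates = sorted((d for d in detections if d[0] >= score_threshold),
                        key=lambda d: d[0], reverse=True)
    kept = []
    while candidates:
        # Step 2: keep the most confident remaining detection.
        best = candidates.pop(0)
        kept.append(best)
        # Step 3: suppress everything that overlaps it too much.
        candidates = [d for d in candidates
                      if iou(best[1], d[1]) < iou_threshold]
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),     # overlaps the first box heavily
    (0.7, (20, 20, 30, 30)),   # a separate object
    (0.3, (0, 0, 10, 10)),     # below the score threshold
]
kept = non_max_suppression(detections)
```

With several object classes, the same procedure would typically be run once per class.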

Anchor Boxes

  • detect overlapping objects
  • assign each object
    • to the grid cell containing its midpoint
    • to the anchor box with the highest IoU with the object's shape
  • (5 + m) * a dimensional output per grid cell
    • a: number of anchor boxes
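
A sketch of the anchor choice, under the assumption that the IoU is computed on shapes only, with the object and each anchor aligned at the same midpoint. The helpers `shape_iou` and `best_anchor` and the two anchor shapes are hypothetical.

```python
def shape_iou(h_a, w_a, h_b, w_b):
    """IoU of two boxes that share the same midpoint (shape comparison only)."""
    inter = min(h_a, h_b) * min(w_a, w_b)
    return inter / (h_a * w_a + h_b * w_b - inter)


def best_anchor(box_h, box_w, anchors):
    """Index of the anchor (height, width) with the highest shape IoU."""
    scores = [shape_iou(box_h, box_w, h, w) for h, w in anchors]
    return max(range(len(anchors)), key=scores.__getitem__)


anchors = [(2.0, 1.0), (1.0, 2.0)]            # one tall anchor, one wide anchor
tall_object = best_anchor(1.8, 0.9, anchors)  # e.g. a pedestrian
wide_object = best_anchor(0.8, 2.2, anchors)  # e.g. a car
```

The object's label is then written into the (5 + m) slice of the cell's output that belongs to the chosen anchor.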

Face Recognition

  • Face Verification: classify whether an input image is the claimed person
  • Face Recognition: classify who the person in the image is

Face Verification

Similarity Function

  • d(img1, img2) = degree of difference between images
    • $d(img1, img2) \le \tau$: same person
    • $d(img1, img2) > \tau$: different person
    • $\tau$: hyperparameter
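
A minimal sketch of the thresholding rule. It assumes each image has already been mapped to an embedding vector (as a Siamese network would do) and uses the squared Euclidean distance as d; the function names and the threshold value are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def is_same_person(emb1, emb2, tau):
    """Verification decision: same person if the distance is at most tau."""
    return squared_distance(emb1, emb2) <= tau


# Hypothetical 2-dimensional embeddings and threshold.
close_pair = is_same_person([0.0, 0.0], [0.1, 0.1], tau=0.5)
far_pair = is_same_person([0.0, 0.0], [1.0, 1.0], tau=0.5)
```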

Siamese Network

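  • learn parameters of the encoding $f(x)$ such that
    • if $x^{(i)}, x^{(j)}$ are the same person, then $\|f(x^{(i)}) - f(x^{(j)})\|^2$ is small
    • if $x^{(i)}, x^{(j)}$ are not the same person, then $\|f(x^{(i)}) - f(x^{(j)})\|^2$ is large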

Triplet Loss

  • triplet
    • Anchor
    • Positive
    • Negative
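  • minimize
    • $d(A, P) - d(A, N) + \alpha$, so that $d(A, P) + \alpha \le d(A, N)$
      • $\alpha$: margin

Loss Function

  • $L(A, P, N) = \max(d(A, P) - d(A, N) + \alpha, 0)$
  • choose triplets that are hard to train on
    • $d(A, P) \approx d(A, N)$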
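
A sketch of the triplet loss for one triplet, assuming d is the squared Euclidean distance between embedding vectors and an illustrative margin of 0.2. The names `triplet_loss` and `squared_distance` are chosen for this example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(A, P) - d(A, N) + margin, 0) for a single triplet."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


anchor, positive = [0.0, 0.0], [0.1, 0.0]
easy = triplet_loss(anchor, positive, negative=[1.0, 0.0])   # N is far away
hard = triplet_loss(anchor, positive, negative=[0.2, 0.0])   # N close to A
```

The easy triplet already satisfies the margin and contributes zero loss, which is why hard triplets, where d(A, P) and d(A, N) are close, are the ones that drive learning.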

Binary Classification

Neural Style Transfer

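Cost Function

  • $J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$
    • C: content image, S: style image, G: generated image
    • $\alpha$, $\beta$: weights of the two terms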

Content Cost Function

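  1. use a pre-trained ConvNet
  2. let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer l on the content image C and the generated image G
  3. if $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images have similar content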
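
One way to measure "similar" is a squared difference between the two sets of activations. The sketch below assumes the activations of the chosen layer have been flattened into plain lists; the factor 1/2 is one common normalization and is an assumption here.

```python
def content_cost(a_content, a_generated):
    """Half the squared L2 distance between two flattened activation lists."""
    if len(a_content) != len(a_generated):
        raise ValueError("activations must come from the same layer")
    return 0.5 * sum((c - g) ** 2 for c, g in zip(a_content, a_generated))


# Hypothetical flattened activations of one layer.
same = content_cost([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = content_cost([1.0, 2.0], [0.0, 0.0])
```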

Style Cost Function

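Style Matrix for the Style Image

  • $a^{[l]}_{i,j,k}$: activation at position (i, j, k) of layer l (height i, width j, channel k)
  • $G^{[l](S)}_{kk'} = \sum_i \sum_j a^{[l](S)}_{i,j,k} \, a^{[l](S)}_{i,j,k'}$
    • size: $n_c^{[l]} \times n_c^{[l]}$
    • S: style image
    • $G^{[l](S)}_{kk'}$: measures how correlated channels k and k' are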
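
A sketch of the style (Gram) matrix for one layer: for each pair of channels it sums the products of their activations over all spatial positions. Activations are assumed to be a nested list indexed as `activations[i][j][k]` (height, width, channel); the function name is illustrative.

```python
def style_matrix(activations):
    """Return the n_c x n_c matrix of channel-by-channel correlations."""
    n_c = len(activations[0][0])
    gram = [[0.0] * n_c for _ in range(n_c)]
    for row in activations:          # over height i
        for pixel in row:            # over width j
            for k in range(n_c):
                for k2 in range(n_c):
                    gram[k][k2] += pixel[k] * pixel[k2]
    return gram


# Hypothetical layer output: height 1, width 2, 2 channels.
acts = [[[1.0, 2.0], [3.0, 4.0]]]
gram = style_matrix(acts)
```

The same function would be applied to the generated image's activations, and the style cost then compares the two matrices.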
Style Matrix for the Generated Image

Cost Function

Generate Image

  1. Initialize G randomly
  2. use gradient descent to minimize J(G)
