Algorithms
In this section I discuss important algorithms that were used and how they have been customized for this project. If I found few algorithms that are close together in what they do, then I have provided description, implementation and reasons why I used one of them over the other.
Generate Pattern
The pattern is generated using the following steps:- Create a sequence of incrementing positive integers (i.e. 1, 2, 3, …, 212)
- Randomly shuffle the integers in the sequence
- Pick an unused integer from the sequence with bits n = {b1, b2, …, b12} such that:
- COUNT(bi = pi) = 2, where 1 = i = 12 and pi is a bit from last used integer
- 3 = COUNT(bi = 1) < 9 for all 1 = i = 12
- 3 = COUNT(bi = 0) < 9 for all 1 = i = 12
- For every subset S = {bi, bi+1, …, bi+k} where 1 = i < i+k = 12 such that every element of S is the same, LENGTH(S) = 4
- Let revN = {b12, ..., b1}, check if revN has not been used before
- Flag numbers n and revN as used
- Repeat step 3 until there are no unused integers remaining that fit the properties
These steps produce a sequence where any two adjacent integers are different from each other in their bit representation and have varying 1’s and 0’s. The reasons for making these specific properties is in order to make it easier to detect the pattern considering amount of noise that is present from printing, lighting and lens effects. Condition 3a checks the count of matching bits in the same position of previous and current integers. Condition 3b and 3c check that there are enough variety of 1’s and 0’s in the current integer. Condition 3d checks that length of any substring of consecutive and identical bits is no longer than 4, because otherwise there will not be much variation in the pattern. Condition 3e makes sure that a reverse order of bits does not match another number, because this can produce false positives.
Next, the sequence is converted into its bit representation where each bit and its state is coded using assigned shade and colour. This is then printed on each line where 24 pixels space is provided for the user to write between the lines. The final image is saved into a file called “pattern.tga”.
Canny Edge Detection
I have chosen Canny Edge Detection, because “[it is] probably the most used edge detector in today’s machine vision community” (Trucco & Verri, pp. 70). The steps taken are as follows:
- Apply Gaussian smoothing kernel
- Use Sobel operator to calculate horizontal and vertical gradient
- Detect edge direction using arctangent and round it to nearest neighbour
- Apply non-maximum suppression along the edge using the direction
- Threshold and hysteresis
(Trucco & Verri, pp. 58-60, 71-80)
The Gaussian smoothing kernel is built as a 5 by 5 integer matrix with sigma 1.4. The sigma was chosen because it provides enough smoothing to eliminate the little holes visible in the physical paper due to the zooming of the lens on the camera. See Figure 4 for the kernel. Note that the sum is divided by 115.
| 2 | 4 | 5 | 4 | 2 |
| 4 | 9 | 12 | 9 | 4 |
| 5 | 12 | 15 | 12 | 5 |
| 4 | 9 | 12 | 9 | 4 |
| 2 | 4 | 5 | 4 | 2 |
Figure 4: Gaussian smoothing kernel
The Sobel operator uses pair of 3 by 3 convolution masks to estimate the horizontal and vertical gradients. See Figure 5 for the two masks. The gradient is then approximated using |Gx| + |Gy|. I use this information to calculate the angle of the edge at each pixel by performing arctangent of Gy/Gx and rounding the angle to 0, 45, 90 and 135. These four angles represent all the possible straight edges that can pass through the eight possible neighbouring pixels.
| -1 | -2 | -1 |
| 0 | 0 | 0 |
| 1 | 2 | 1 |
| -1 | 0 | 1 |
| -2 | 0 | 2 |
| -1 | 0 | 1 |
Figure 5: Horizontal and vertical convolution masks for Sobel operator
Non-maximum suppression is performed on each pixel if its respective neighbours according to the angle have greater edge value than the current pixel. Otherwise, the edge value is checked against upper and lower thresholds followed by hysteresis that suppresses the pixel if all the eight neighbours are not edge pixels. These steps provide me with a thin edge that can be used in the next stage of the process.
| 0 | -1 | -1 |
| 1 | 1 | -1 |
| 0 | 1 | 0 |
| -1 | -1 | -1 |
| 0 | 1 | 0 |
| 1 | 1 | 1 |
An
alternative to performing non-maximum suppression and hysteresis is
to apply the thinning kernels under four rotations: 0, 90, 180 and 270
degrees (Gonzalez, ch. 9). See Figure 6
for the kernels where -1 must be a non-edge, 1 must be an edge, and
0 can be either. This is an iterative approach, which requires all 8
versions of the kernels to be applied until the image converges. My
testing, as well as others’ (Fisher, 2003), show that this method
is comparable to Canny in terms of edge detection, but produces more
noise. I found that it can be slower due to the multiple iterations.
I decided against using this method in my experiments (implementation
can be found the function Cimage::Thinning for further consideration).

Figure
7: Input image (left) and edge detection output (right)
Refer to Figure 7 for the input image being processed by Canny Edge detector. You can observe that most of the noise is filtered out and that the two lines are perfectly available for the next stage. Also pay attention how lens blurred the left line, which resulted in a much thicker line and produced a set of rectangles. This can cause problems, because it will most likely be detected as two parallel lines rather than one.
Hough Transform
I use a version of Hough Transform that detects straight lines. The parametric equation I use to describe a straight line is
x cos(m) + y sin(m) = b
where (x, y) are the coordinates of a pixel and (m, b) are the varying orientation and radius from origin parameters of all the lines that pass through that pixel. Then in (m, b) space one (x, y) pixel is represented as a line. An intersection of two lines in (m, b) space implies that there is a possible line between the two (x, y) pixels in (x, y) space. Thus, a line in (x, y) space is represented by an intersection of many lines in (m, b) space, which basically vote for it. One way to detect these lines is to look for local maxima in the (m, b) space. The steps to this algorithm are as follows:
- Define the (m, b) parameter space as a 2-dimensional integer array D of the size 360 (degrees) by width × height (maximum offset from the origin)
- Set D to zero and populate it with votes based on edge pixels in (x, y) space
- Select lowest unused threshold from a predefined list
- Find all local maxima in D greater than a threshold
- Look for a good line count:
- If line count is between 1 and MaxLines, then break
- Else repeat step 3 with a higher threshold
- Check each line against other lines to see if it has an angle and offset very close to another line in which case I average the similar ones
(Trucco & Verri, pp. 96-99)
The
algorithm makes each edge pixel vote for the possible lines that can
go through it. The local maxima are used to find lines that get the
most votes. See Figure
8 for sample parameter
space on the left where there are four maxima. If you see Figure 7
again, then you can notice that there are four connected groups: two
lines, one dot in the middle, and one small line on the right. Therefore,
the two maxima in the centre that are very close to each other are what
I really want and I pick them out by using thresholds in steps 3-4.

Figure
8: Hough parameter space (left) and detected lines (right)
Step 6 is necessary to eliminate parallel lines that are very close to each other, because the edge detection sometimes finds two edges that describe one actual line on the paper. Furthermore, in the next stages I can use the angle of the line to rotate the photograph, so I pass it down to the next stage. This advantage is another reason I chose Hough Transform for my line recognition.
Pattern Recognition
This algorithm runs through each line from the previous section and attempts to find colours, construct numbers, validate sequence and store valid points. It is a multistage algorithm that can be best described with pseudo code below.
|
for each line
votes <- array[ length(line) ] for i = 1 to length(line) votes[i] <- bestVote( line[i] ) end points <- nullVector() for way = 1 to 2 if way = 2 then reverse( votes ) end segments <- extractSegments( votes ) prevNum <- -1 for each s in segments num <- convertSegmentToBits( s ) if properPattern( prevNum, num ) then points.add( num ) end prevNum <- num end end line.add( points ) end |
Figure 9: Pseudo code for pattern recognition
From the previous stage we have lines and all the pixels that are on the line. I have to convert this to what our pattern had initially. To do this a function called Cimage::prVoter goes through every pixel and checks the saturation and hue for either a red or a green pixel. If no colour is found then it checks by intensity for the gray or black pixel. This function returns a list of votes that we can now segment.
Segmenting is done in a function called Cimage::prSegments that creates a segment from two adjacent different colours. This segment is the 11 most significant bits of the number from the pattern. The coloured pixel is the least significant bit. Since we do not know the image orientation at this point we must run the recognition function twice: once forwards and once backwards.
Next, I go through all the segments in Cimage::prRunner and compare the numbers to the pattern. The actual recognition happens in the function called Cimage::prParser where a “runner” runs through the votes in a segment and extrapolates each bit. The size of the bit is trivially calculated based on the segment size divide by 11. However, in some cases due to perspective warp two adjacent segments might be of different size. This means there is a scale factor present. I use this scale factor when determining the size of each bit by applying geometric series summation formula (I know the total size, scalar and number of bits, so I can find the initial bit size).
Stitching
From the line detection and pattern recognition stages I already have collected a fair bit of information that can help me properly position and orientate the photograph. If I choose one line to be the reference line, then Hough Transform has provided me with the rotation to position the line parallel to the horizontal axis. The pattern recognition provides me with a reference number. I can now extract the scale, because I know the distance between the least significant bits of two adjacent numbers and I know how much they should be in the pattern. After scaling and rotating I can subtract the position of the reference number in the pattern from its position in the updated image to get the translation. Knowing this information I can algorithmically apply the steps to the rest of the pixels in the image:
- Scale the image using distance between two adjacent numbers
- Rotate the image using angle of the reference line
- Translate using reference number
The scaling is done using Cimage::Resize function, which uses the sinc interpolator, which tends to produce better results over other similar algorithms (McHugh has good analysis). For rotation I chose Area Mapping algorithm, which is not the fastest but produces good results. The performance is not really an issue at this point, because I apply the rotation to the scaled image that is reduced to around 40 by 30 pixels. The algorithm works by taking each destination pixel and finding closest 4 source pixels that it partially covers. Then I compute the destination value by taking a weighted average of the 4 source using bilinear interpolation. Overall, linear transformation is not the best way to stitch the image since it does not account for perspective effects such as warping or shearing.
To resolve the perspective problem I need at least 4 non-collinear points. To get these points I need 2 lines from which I look for pairs of points. If I have this information, then I can proceed to apply non-linear transformation. Non-linear transforms cannot be composed from a sequence of translations, rotations, shears and scaling for obvious reason that they are not linear. That is why I cannot re-use information from previous stages.
My research showed two practical transforms to bring the photograph back to its original plane. The projection transform is a nonlinear transform, which keeps straight lines straight but does not preserve the angle between them. The lines from the pattern are in fact straight and need to remain straight. The transform can be described by the next equations:
x' = (a*x + b*y + c) / (g*x + h*y + 1)
y' = (d*x + e*y + f) / (g*x + h*y + 1)
The components allow for warping to appear, which is not possible with linear transformations such as affine transform. To solve the transformation matrix I use standard Gaussian Jordan elimination algorithm with pivoting (Numerical Recipes, pp. 36-41). Another possible non-linear 4 point transform is a bilinear transform. The equation for this is:
x' = a*x + b*y + c*x*y + d
y' = e*x + f*y + g*x*y + h
It
appears a bit faster since it avoids division. The down side is that
this transform does not preserve straight lines. Overall, the transform
can estimate the projection transform if the warping effect is small.

Figure
10: Projection (left) and bilinear (right) transforms
In the Figure 10 I show an example of the two projections being applied. It is apparent that projection transform preserves the straight lines, but slightly miss the angles. The bilinear transform warps the top line towards the right location, but the bottom one is still off. I have left both transforms in my implementation as an option.