Components
This section briefly describes the components, technology, and algorithms used. The digital pen project has three main components: the pen with an attached camera, the lined paper with a specially designed pattern, and the software that processes the camera images and reproduces the physical paper on a computer screen.
Pen and Camera


Figure 1: The prototype pen
The prototype created for the project uses a Logitech Zoom web camera that has been stripped of its plastic case. The camera is attached to the pen with a metal wire, with some tape to hold the wires in place. The camera is tilted slightly towards the pen, so that the two form a triangle. This lets the camera capture everything around the pen except the area behind it. An added bonus is that the pen can still be held by its original grip fairly comfortably. Since a user usually holds the pen at an angle, tilted towards themselves, this arrangement positions the camera roughly vertically (perpendicular to the paper). See Figure 1 for an illustration.
Lined Paper
The main idea behind the paper is that the user prints lines on regular white printer paper to turn it into lined paper. The lines themselves carry an embedded pattern, which allows the software to determine the pen's location and orientation. See Figure 2 for a sample pattern. The pattern encodes numbers as bits, where each shade or colour identifies a specific bit and its state. For example, green and red represent the least significant bits, with green being the off state; black and gray represent the remaining bits, with black being the off state.
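The colour scheme described above can be sketched as a small encoder and decoder. This is only an illustration, not the project's code: the function names are hypothetical, the 12-bit default is the final width given later in this section, and `low_bits` (how many low-order bits use green/red) is an assumption, since the text does not say how many bits fall into that group.

```python
# Colour names follow the text; everything else here is an assumption.
LOW_OFF, LOW_ON = "green", "red"      # least significant bit(s)
HIGH_OFF, HIGH_ON = "black", "gray"   # remaining bits


def encode_number(value, total_bits=12, low_bits=1):
    """Return one colour per bit of `value`, most significant bit first.

    `low_bits` is hypothetical: the number of low-order bits drawn in
    green/red rather than black/gray.
    """
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    colours = []
    for position in range(total_bits - 1, -1, -1):
        bit = (value >> position) & 1
        if position < low_bits:
            colours.append(LOW_ON if bit else LOW_OFF)
        else:
            colours.append(HIGH_ON if bit else HIGH_OFF)
    return colours


def decode_colours(colours):
    """Rebuild the number from a list of colours (MSB first)."""
    value = 0
    for colour in colours:
        if colour not in (LOW_OFF, LOW_ON, HIGH_OFF, HIGH_ON):
            raise ValueError("unknown colour: %r" % (colour,))
        value = (value << 1) | int(colour in (LOW_ON, HIGH_ON))
    return value


print(encode_number(5, total_bits=4))  # ['black', 'gray', 'black', 'red']
```

Using a different colour pair for the low-order bits means a decoder can tell where a number ends from the colours alone, under the assumptions stated above.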
Separating consecutive numbers in the pattern is a problem similar to one faced in serial protocols. One such protocol is the Serial Line Internet Protocol (SLIP), which has an END character that is escaped by a sequence of two characters if it occurs in the data (RFC 1055). Since I have the advantage of making a unique END bit that never occurs in the data (by giving it its own colour), I do not have to worry about escaping it. The pen software knows the sequence that is printed on the paper, so it can extract location and orientation by detecting the numbers that appear in the photographs. The pattern has gone through several revisions in order to combat noise and errors.
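For comparison, the escaping that SLIP needs, and that the printed pattern avoids, can be sketched as follows. The byte values are the ones defined in RFC 1055; the function name is hypothetical and the sketch covers only the sending side.

```python
# Special byte values defined in RFC 1055.
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD


def slip_encode(payload: bytes) -> bytes:
    """Escape END/ESC bytes in the payload and append the END marker."""
    out = bytearray()
    for byte in payload:
        if byte == END:
            out += bytes([ESC, ESC_END])   # END inside data becomes two bytes
        elif byte == ESC:
            out += bytes([ESC, ESC_ESC])   # ESC itself must be escaped too
        else:
            out.append(byte)
    out.append(END)                        # frame terminator
    return bytes(out)


print(slip_encode(b"\x01\xc0\x02").hex())  # 01dbdc02c0
```

Because the pattern's END marker uses a colour that data bits never take, none of this escaping is required on paper.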

Figure 2: Pattern printed on white paper
The first constraint is that the lines must be one pixel thick, because that is the standard for lined paper. This means that during detection an error can only appear as a wrongly detected sequence of numbers. This must not happen, because a wrong location would then be used and would disrupt the stitching process. The only remedy would be for the user to go over the area again with the pen, which is unacceptable. My first pattern was a sequence of numbers incremented by one. The problem with this pattern is that two adjacent numbers differ very little, so it produces a fair number of false positives. I go into detail on how I improved this in the Algorithms section. The number of errors was reduced, but not eliminated.
In order to eliminate the errors completely I had to add redundancy. To do this I increased the number of bits that represent each number in the pattern. I started with 11 bits, because covering a resolution of 506 by 674 (the same ratio as 8.5” by 11” paper) with 24 pixels of space between the lines requires around 1300 numbers. With 11 bits there are already (2048 - 1300) / 1300 ≈ 0.58 unused numbers for every used one, a ratio of about 1:0.58. However, I wanted a larger pool of unused numbers to eliminate the errors, so I switched to 12 bits. The pattern then uses a total of 1218 numbers chosen out of a possible 4096 (fewer numbers are needed because each number is now longer), which gives a ratio of 1:2.36 used to unused numbers.
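The arithmetic above can be checked with a few lines of Python. The `numbers_needed` helper rests on assumptions that the text does not state: one pixel per bit, no separator pixels between numbers, and a line on every 24th row starting at row 0. Under those assumptions it reproduces the figure of 1218 for 12 bits and gives a value close to the "around 1300" quoted for 11 bits.

```python
def unused_ratio(bits, used):
    """Unused code values per used one, for a code of `bits` bits."""
    total = 1 << bits
    if not 0 < used <= total:
        raise ValueError("used must be between 1 and 2**bits")
    return (total - used) / used


def numbers_needed(width, height, spacing, bits):
    """Numbers that fit on the page (assumption: one pixel per bit)."""
    per_line = width // bits               # whole numbers per printed line
    lines = (height - 1) // spacing + 1    # rows 0, spacing, 2*spacing, ...
    return per_line * lines


print(round(unused_ratio(11, 1300), 3))    # 0.575
print(round(unused_ratio(12, 1218), 3))    # 2.363
print(numbers_needed(506, 674, 24, 11))    # 1334
print(numbers_needed(506, 674, 24, 12))    # 1218
```

Going from 11 to 12 bits doubles the code space while slightly reducing the count of numbers on the page, which is why the share of unused values grows so sharply.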
Software
The software is the glue between the previous two components. It takes the input and produces the output that the user sees on the screen. The process is explained in Figure 3.

Figure 3: Flow diagram of the detection process
The software allows the user to save the pattern and print it on paper. The user can then start using the software with the paper and pen. The input is captured from the camera on the pen and passed through a series of routines. The mask stage applies a pre-computed mask, which removes the part of the pen that appears in the photograph. Next I perform edge detection using the Canny edge detector. This stage is a sub-process with multiple sub-algorithms fine-tuned for this particular application. The Hough transform is then applied to the edge image in order to find lines. Conveniently, I can assume that every line I care about runs across the whole picture, from one edge to the other. This gives me clean results to pass to the next stage.
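To make the line-finding step concrete, here is a minimal pure-Python sketch of Hough voting over a list of edge points. It is not the project's implementation: the function names, the one-degree angular resolution, and the use of a dictionary as accumulator are all assumptions chosen for brevity. Each edge point votes for every line rho = x cos(theta) + y sin(theta) that passes through it, and the bin with the most votes is taken as the strongest line.

```python
import math


def hough_votes(edge_points, theta_steps=180):
    """Accumulate votes in (rho, theta-step) space for each edge point."""
    votes = {}
    for x, y in edge_points:
        for step in range(theta_steps):
            theta = math.pi * step / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, step)] = votes.get((rho, step), 0) + 1
    return votes


def strongest_line(edge_points, theta_steps=180):
    """Return (rho, theta in degrees, vote count) of the best-supported line."""
    votes = hough_votes(edge_points, theta_steps)
    (rho, step), count = max(votes.items(), key=lambda item: item[1])
    return rho, 180.0 * step / theta_steps, count


# A horizontal run of edge pixels on row 3 of a 100-pixel-wide image.
points = [(x, 3) for x in range(100)]
print(strongest_line(points))  # (3, 90.0, 100)
```

Here theta is the angle of the line's normal, so a horizontal line appears at 90 degrees. The same angle is what a later stage can reuse as the rotation of the photograph.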
The pattern recognition stage reconstructs the pattern from the input lines. At this point the input is a line of values, each of which is the closest approximation to a shade or colour used in the original pattern. The only problem is that each original bit is now represented by a group of pixels. Once the pattern is detected, it is compared against the original pattern in order to verify that two adjacent numbers are in the correct order. The output of this stage is the location and orientation of the photograph. Conveniently, I obtain the rotation from the Hough transform, which already records the slope of each line. Finally, the last stage adjusts the photograph by resizing and rotating it, then pastes it into the output image, preserving the pen mask described at the beginning.
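Two of the steps above lend themselves to a short sketch: collapsing groups of pixels back into one symbol per bit, and checking two adjacent numbers against the printed sequence. Both functions are hypothetical illustrations. In particular, `pixels_per_bit` is assumed to be known (in practice it would depend on the camera's distance from the paper), and runs shorter than half a bit are simply dropped as noise.

```python
from itertools import groupby


def collapse_runs(samples, pixels_per_bit):
    """Turn classified pixels along a line into one symbol per bit.

    `pixels_per_bit` is an assumed, known scale factor.
    """
    if pixels_per_bit <= 0:
        raise ValueError("pixels_per_bit must be positive")
    symbols = []
    for colour, run in groupby(samples):
        length = len(list(run))
        # Round half up; runs shorter than half a bit contribute nothing.
        count = int(length / pixels_per_bit + 0.5)
        symbols.extend([colour] * count)
    return symbols


def locate_pair(printed_sequence, first, second):
    """Index of `first` if `second` directly follows it, else None."""
    for index in range(len(printed_sequence) - 1):
        if (printed_sequence[index] == first
                and printed_sequence[index + 1] == second):
            return index
    return None


pixels = ["black"] * 4 + ["gray"] * 9 + ["red"] * 3
print(collapse_runs(pixels, 4))            # ['black', 'gray', 'gray', 'red']
print(locate_pair([7, 19, 4, 33], 19, 4))  # 1
```

Requiring the pair to match in order means a single misread number is rejected rather than silently mapped to a wrong location.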
I chose the Targa (TGA) format for my image files because it lets me store basic file information, such as the number of channels, width, and height, along with the raw bytes for each channel. This saved time coding and debugging this portion of the project.
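The simplicity of the format is easy to see in code. The sketch below builds and reads back an uncompressed true-colour TGA image: an 18-byte header followed by the raw pixel bytes. The function names are hypothetical, only 3- and 4-channel images are handled, and this is not the project's own reader or writer. Note that TGA stores pixels in blue, green, red (and alpha) order.

```python
import struct

# Little-endian, no padding: 18 bytes in total.
TGA_HEADER = "<BBBHHBHHHHBB"


def encode_tga(width, height, channels, raw):
    """Return the bytes of an uncompressed true-colour TGA image."""
    if channels not in (3, 4):
        raise ValueError("only 3 or 4 channels are supported in this sketch")
    if len(raw) != width * height * channels:
        raise ValueError("raw data does not match the image dimensions")
    image_type = 2                                   # uncompressed true-colour
    descriptor = 0x20 | (8 if channels == 4 else 0)  # top-left origin, alpha bits
    header = struct.pack(
        TGA_HEADER,
        0, 0, image_type,        # ID length, colour-map type, image type
        0, 0, 0,                 # colour-map specification (unused)
        0, 0, width, height,     # x origin, y origin, width, height
        channels * 8, descriptor,
    )
    return header + bytes(raw)


def decode_tga_header(data):
    """Return (width, height, channels) from the first 18 bytes."""
    fields = struct.unpack(TGA_HEADER, data[:18])
    return fields[8], fields[9], fields[10] // 8


image = encode_tga(2, 1, 3, bytes([0, 0, 255, 255, 0, 0]))  # red, then blue
print(len(image), decode_tga_header(image))  # 24 (2, 1, 3)
```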