Pragma: no-cache
and with the following body (as a byte array, NOT strings - see code below for clarification):
22, 00, 62, 3C, 0A, 13, 22, 02, 65, 6E, BA, D3, F0, 3B, 0A, 08, 01, 10, 01, 28,
01, 30, 00, 38, 01, 12, 1D, 0A, 09, 69, 50, 68, 6F, 6E, 65, 20, 4F, 53, 12, 03,
34, 2E, 31, 1A, 00, 22, 09, 69, 50, 68, 6F, 6E, 65, 33, 47, 53, 1A, 02, 08, 02,
22, 02, 08, 01
If your post is successful, you should receive a 200 OK. This code demonstrates how to create and post your CSSID (synchronous requests are used for simplicity):
namespace GoogleGoggles.NET
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;

    public class GoogleGoggles
    {
        // The POST body required to validate the CSSID.
        private static byte[] CssidPostBody = new byte[] { 34, 0, 98, 60, 10, 19, 34,
            2, 101, 110, 186, 211, 240, 59, 10, 8, 1, 16, 1, 40, 1, 48, 0, 56, 1, 18,
            29, 10, 9, 105, 80, 104, 111, 110, 101, 32, 79, 83, 18, 3, 52, 46, 49,
            26, 0, 34, 9, 105, 80, 104, 111, 110, 101, 51, 71, 83, 26, 2, 8, 2, 34,
            2, 8, 1 };

        // Bytes trailing the image byte array. Look at the next code snippet to see
        // where it is used in SendPhoto() method.
        private static byte[] TrailingBytes = new byte[] { 24, 75, 32, 1, 48, 0, 146,
            236, 244, 59, 9, 24, 0, 56, 198, 151, 220, 223, 247, 37, 34, 0 };

        // Generates a cssid: 16 hex characters built from two random 32-bit values.
        private static string Cssid
        {
            get
            {
                Random random = new Random((int)DateTime.Now.Ticks);
                return string.Format(
                    "{0}{1}",
                    random.Next().ToString("X8"),
                    random.Next().ToString("X8"));
            }
        }

        static void Main(string[] args)
        {
            string cssid = GoogleGoggles.Cssid;
            GoogleGoggles.ValidateCSSID(cssid);
            // See next code snippet for SendPhoto()
            //GoogleGoggles.SendPhoto(cssid, yourImageByteArray);
            Console.ReadLine();
        }

        // Validates the CSSID we just created, by POSTing it to Goggles.
        private static void ValidateCSSID(string cssid)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
                string.Format(
                    "http://www.google.com/goggles/container_proto?cssid={0}",
                    cssid));
            GoogleGoggles.AddHeaders(request);
            request.Method = "POST";
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(
                    GoogleGoggles.CssidPostBody,
                    0,
                    GoogleGoggles.CssidPostBody.Length);
                stream.Flush();
            }

            // GetResponse() throws a WebException on a non-success status.
            // Dispose the response so the connection is released for reuse.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine(response.StatusCode);
            }
        }

        // Adds the headers shared by the CSSID and image POST requests.
        private static void AddHeaders(HttpWebRequest request)
        {
            request.ContentType = "application/x-protobuffer";
            request.Headers["Pragma"] = "no-cache";
            request.KeepAlive = true;
        }
    }
}
If you received a 200 OK response, you can use the same CSSID for future requests until you receive an error with it. As a rule of thumb, I generate a new one each time I launch the application and keep using it until I get an error.
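The "keep the CSSID until it fails" rule of thumb can be sketched as a small wrapper. This is only an illustration under some assumptions: CssidSession is a hypothetical class, and the three delegates stand in for the Cssid getter, a variant of ValidateCSSID() that reports success as a bool, and whatever request you want to run with the CSSID. None of these signatures come from the code above.

```csharp
using System;

// Hypothetical helper: caches one CSSID and replaces it only after a failure.
public sealed class CssidSession
{
    private readonly Func<string> generate;        // creates a fresh CSSID
    private readonly Func<string, bool> validate;  // POSTs it; true on 200 OK
    private string current;                        // null until first validated

    public CssidSession(Func<string> generate, Func<string, bool> validate)
    {
        this.generate = generate;
        this.validate = validate;
    }

    // Runs the request with the cached CSSID. If it fails, a new CSSID is
    // generated and validated, and the request is retried exactly once.
    public bool Execute(Func<string, bool> request)
    {
        if (this.current == null && !this.Renew())
        {
            return false;
        }

        if (request(this.current))
        {
            return true;
        }

        return this.Renew() && request(this.current);
    }

    // Generates a new CSSID and keeps it only if validation succeeds.
    private bool Renew()
    {
        string candidate = this.generate();
        this.current = this.validate(candidate) ? candidate : null;
        return this.current != null;
    }
}
```

Retrying only once is a deliberate choice in this sketch, so that a persistent server-side failure does not turn into an endless loop of new CSSIDs.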
Moving on. Now that you have a valid CSSID, we can start posting images. But first we need to understand a bit about Google's ProtoBuffer. Protocol Buffers, or ProtoBuffer, is Google's way of serializing structured data for use in communication protocols, data storage, and more (in their own words). Explaining ProtoBuffer in detail is out of the scope of this post - I will only cover the basics, so you will have to follow the link if you would like to know more about it. First you will need the following:
- A jpeg image - I haven't played with other formats. If you will be using images taken from a phone camera (if you are building a phone application, for example), you will likely need to resize them, since high-megapixel phone cameras are common nowadays.
- The size of the jpeg image in bytes.
You will then need to calculate the following integers. Let x be the size of the image in bytes.
a = x + 32
b = x + 14
c = x + 10
Now, you will need to encode a, b, c and x into varints. Let's assume x = 27985. To encode it into a varint, do the following:
- Convert the number into binary:
110110101010001
- From right to left, divide the bits into groups of 7 bits each. Pad the leftmost group with leading zeros to make it 7 bits if needed:
0000001 1011010 1010001
- Reverse the groups:
1010001 1011010 0000001
- Convert the groups of 7 bits into bytes. From left to right, set the msb (most significant bit) if there are further bytes to come. When you reach the last byte, the msb should be 0:
11010001 11011010 00000001
0xD1DA01
So x is 0xD1DA01 in varint. Repeating the same exercise, you will find that a, b and c encode to 0xF1DA01, 0xDFDA01 and 0xDBDA01 respectively.
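If you want to double-check the arithmetic, the steps above can be sketched in a few lines of C#. VarintSketch, Encode and Decode are illustrative names chosen for this sketch, and it works on unsigned values only.

```csharp
using System;
using System.Collections.Generic;

// Illustrative helper for checking the worked example by hand.
public static class VarintSketch
{
    // Emits 7 bits per byte, least significant group first, and sets the
    // msb on every byte except the last one.
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();
        do
        {
            byte b = (byte)(value & 0x7F);
            value >>= 7;
            if (value != 0)
            {
                b |= 0x80; // more bytes follow
            }

            bytes.Add(b);
        }
        while (value != 0);
        return bytes.ToArray();
    }

    // Reverses Encode: strips each msb and reassembles the 7-bit groups.
    public static uint Decode(byte[] bytes)
    {
        uint value = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            value |= (uint)(b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0)
            {
                break; // last byte of the varint
            }
        }

        return value;
    }
}
```

For example, BitConverter.ToString(VarintSketch.Encode(27985)) gives "D1-DA-01", and decoding those three bytes gives back 27985.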
Now you are ready to conduct a search using a picture. Send a POST request to the same URL http://www.google.com/goggles/container_proto?cssid=[your cssid] with the same headers:
Content-Type: application/x-protobuffer
Pragma: no-cache
and with the following body (as a byte array, NOT strings):
0A a 0A b 0A c 0A x [image as a byte array] [trailing bytes]
With our example, that would be:
0A F1 DA 01 0A DF DA 01 0A DB DA 01 0A D1 DA 01 [image as a byte array] [trailing bytes]
The trailing bytes are (as byte array, NOT strings - see code below for more details):
18, 4B, 20, 01, 30, 00, 92, EC, F4, 3B, 09, 18, 00, 38, C6, 97, DC, DF, F7, 25,
22, 00
This code summarizes the above and shows how to post an image (again, using synchronous requests for simplicity):
// Conducts an image search by POSTing an image to Goggles, along with a valid CSSID.
public static void SendPhoto(string cssid, byte[] image)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
        string.Format(
            "http://www.google.com/goggles/container_proto?cssid={0}",
            cssid));
    GoogleGoggles.AddHeaders(request);
    request.Method = "POST";

    // x = image size
    int x = image.Length;
    byte[] xVarint = GoogleGoggles.ToVarint32(x).ToArray<byte>();
    // a = x + 32
    byte[] aVarint = GoogleGoggles.ToVarint32(x + 32).ToArray<byte>();
    // b = x + 14
    byte[] bVarint = GoogleGoggles.ToVarint32(x + 14).ToArray<byte>();
    // c = x + 10
    byte[] cVarint = GoogleGoggles.ToVarint32(x + 10).ToArray<byte>();

    // 0A [a] 0A [b] 0A [c] 0A [x] [image bytes] [trailing bytes]
    using (Stream stream = request.GetRequestStream())
    {
        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);
        // a
        stream.Write(aVarint, 0, aVarint.Length);
        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);
        // b
        stream.Write(bVarint, 0, bVarint.Length);
        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);
        // c
        stream.Write(cVarint, 0, cVarint.Length);
        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);
        // x
        stream.Write(xVarint, 0, xVarint.Length);
        // Write image
        stream.Write(image, 0, image.Length);
        // Write trailing bytes
        stream.Write(
            GoogleGoggles.TrailingBytes,
            0,
            GoogleGoggles.TrailingBytes.Length);
        stream.Flush();
    }

    // GetResponse() throws a WebException on a non-success status.
    // Dispose the response so the connection is released for reuse.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine(response.StatusCode);
    }
}
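// Encodes an int32 into varint32.
public static IEnumerable<byte> ToVarint32(int value)
{
    // Work on the unsigned value so the right shift never sign-extends.
    uint remaining = (uint)value;
    do
    {
        // Take the lowest 7 bits.
        int i = (int)(remaining & 0x7F);
        remaining >>= 7;
        // Set the msb if any higher bits remain. Checking the whole remainder
        // (not just the next 7 bits) keeps groups that are all zeros, as in 128.
        if (remaining != 0)
        {
            i += 128;
        }
        yield return ((byte)i);
    }
    while (remaining != 0);
}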
If the POST succeeded, you should receive a 200 OK with a ProtoBuffer response body. The response will include a bunch of URLs depending on the results your image search generated. If Goggles did not find any results for your image search, it returns URLs of similar images to browse. If it found one or more results, it returns URLs representing the search results. In other words, the search results are URLs meant to be viewed in a web browser, rather than an XML or JSON blob. I will not explain how to interpret/parse the URLs out of the ProtoBuffer response body in this post (perhaps in a future post).
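That said, if you want a rough starting point, the sketch below walks the generic ProtoBuffer wire format (tag, then a varint, fixed-width value, or length-prefixed payload) and collects every length-prefixed payload that starts with "http". It is a guess at a workable approach and not a description of the actual Goggles schema, which I do not know. GogglesResponseSketch and ExtractUrls are made-up names, and group fields (wire types 3 and 4) are simply treated as unparseable.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: schema-less scan of a ProtoBuffer message for URLs.
public static class GogglesResponseSketch
{
    public static List<string> ExtractUrls(byte[] message)
    {
        var urls = new List<string>();
        Walk(message, 0, message.Length, urls);
        return urls;
    }

    // Walks the fields in data[start..end). Returns false as soon as the
    // bytes stop looking like well-formed wire data.
    private static bool Walk(byte[] data, int start, int end, List<string> urls)
    {
        int pos = start;
        while (pos < end)
        {
            ulong tag;
            if (!TryReadVarint(data, ref pos, end, out tag)) return false;

            switch ((int)(tag & 7))
            {
                case 0: // varint value: read and ignore
                    ulong ignored;
                    if (!TryReadVarint(data, ref pos, end, out ignored)) return false;
                    break;
                case 1: // 64-bit fixed value
                    if (end - pos < 8) return false;
                    pos += 8;
                    break;
                case 5: // 32-bit fixed value
                    if (end - pos < 4) return false;
                    pos += 4;
                    break;
                case 2: // length-prefixed: a string, bytes, or nested message
                    ulong length;
                    if (!TryReadVarint(data, ref pos, end, out length)) return false;
                    if (length > (ulong)(end - pos)) return false;
                    int len = (int)length;
                    string text = Encoding.ASCII.GetString(data, pos, len);
                    if (text.StartsWith("http://", StringComparison.Ordinal) ||
                        text.StartsWith("https://", StringComparison.Ordinal))
                    {
                        urls.Add(text);
                    }
                    else
                    {
                        // Might be a nested message; a failed parse is ignored.
                        Walk(data, pos, pos + len, urls);
                    }

                    pos += len;
                    break;
                default: // groups and unknown wire types are not handled
                    return false;
            }
        }

        return true;
    }

    private static bool TryReadVarint(byte[] data, ref int pos, int end, out ulong value)
    {
        value = 0;
        for (int shift = 0; shift < 64 && pos < end; shift += 7)
        {
            byte b = data[pos++];
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return true;
        }

        return false;
    }
}
```

You would call GogglesResponseSketch.ExtractUrls(body) on the bytes read from response.GetResponseStream(). Because the scan has no schema, it cannot tell a result link from an icon or thumbnail URL; that part is still up to you.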
Before I end this post, let's take a quick look at an example. If you post the image shown below, using the above directions/code:
you will get results similar to this (obtained from a Wireshark trace):
24b
..
.....;B
.......s '.../.-
http://www.google.com/goggles/images/text.png2.en...;Q.O
http://www.google.com/goggles/result?
cssid=3FEDE280756ECB89&rh=F0FB212061A4DC0D...=.
.Hello, goggles!..Text
.....;..
.......} ......
Bhttp://benedictehh.blogg.no/images/315802-2-1241960116211-
n400.jpg._http://t1.gstatic.com/images?q=
tbn:TGUR3fPDjEkjMM&w=120&h=87&usg=___LKNIk0JoGL1z6Rc43y8v
yv9DDE=.Ahttp://benedictehh.blogg.no/1241959699_lesevurdere_
din_blogg.html...;Q.Ohttp://www.google.com/goggles
/result?cssid=3FEDE280756ECB89&rh=066B8E469AA45B09...="
.religi..se bilder.
Similar Image...;...E..Y...
0
You will notice that Goggles was able to convert the image to text (Hello, goggles! in green). The link highlighted in yellow is the search result with "Hello, goggles!" as the search term. The link will probably have expired by the time you check it out.
The rest of the ProtoBuffer (most of it shown as '.' because the values don't have an ASCII representation) is probably additional metadata that Google provides for a better UX. I am willing to bet that some of these values are coordinates inside the picture which, when interpreted correctly, show the user where the text in the picture resides. I am not interested enough to decode them myself though (yet).
If you are planning to build an application using this, I am interested to hear what it is you are trying to build - please feel free to drop a line. Similarly, if you have any comments or suggestions, please do not hesitate to post a comment.