Thursday, February 3, 2011

Google Goggles API

Update 5/18/11 - Finally I had the time to update the post. I now have a working Google Goggles API set. In case you are wondering what changed: The protobuffer number a has changed from x + 401 to x + 32, where x is the image size. Also, there are trailing bytes that you need to send after the image byte array when posting the photo. All of this information is detailed in the post - just copy the code as-is and it should work.  Also, I am no longer having issues sending images larger than 140 KB.  I do not know what the upper limit is, but it seems to have increased.

Update 2/14/11 - I have been battling a class of failures that I discovered when posting an image. It turns out Goggles expects images to be less than 140 KB in size. If you pass in an image that is far too large (1 MB), you will get a 404 failure. If you pass in an image slightly larger than 140 KB, you will get a 200 OK, but with an empty response. The image has to be less than 140 KB.

Google Goggles is a service that allows you to search the web using pictures taken from your mobile device. I was quite taken with it when I took a picture of a painting in my living room, and it correctly recognized it as Banksy's Flower Thrower stencil.

I started to look for an API set for Goggles online to help me with a WP7 application that I have in mind, but did not find anything. So I went about trying to reverse engineer it myself, and I came out with some successful results.

A word of caution, though: Google hasn't officially published an API set yet, so a lot of this is subject to change. In other words, use it at your own risk.

So let's start. You do not need to authenticate to use the Goggles API. To use the API, you will need to generate the following:

  • CSSID: a string composed of 16 random hexadecimal digits. An example of a CSSID would be 43CBE97C0C31FE5A. The CSSID will remain valid for subsequent requests until you receive an error using it; then you will have to generate a new one.
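
As a minimal sketch of generating such a string, the helper below turns 8 random bytes into 16 hexadecimal digits. The names CssidExample and NewCssid are illustrative and not part of any Goggles API, and it is an assumption that the service accepts any 16 hex digits regardless of how they were produced.

```csharp
using System;
using System.Security.Cryptography;

public static class CssidExample
{
    // Returns 16 uppercase hexadecimal digits built from 8 random bytes.
    // Hypothetical helper: the post only says the CSSID is 16 random hex digits.
    public static string NewCssid()
    {
        byte[] bytes = new byte[8];
        using (RandomNumberGenerator rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(bytes);
        }

        // BitConverter.ToString yields "43-CB-E9-..."; drop the dashes.
        return BitConverter.ToString(bytes).Replace("-", string.Empty);
    }
}
```

Calling CssidExample.NewCssid() once at application start and reusing the value matches the reuse rule described above.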

Once you have generated a CSSID, you will need to send a POST request to http://www.google.com/goggles/container_proto?cssid=[your cssid] with the following headers:

    Content-Type: application/x-protobuffer
    Pragma: no-cache

and with the following body (as a byte array, NOT strings - see code below for clarification):

    22, 00, 62, 3C, 0A, 13, 22, 02, 65, 6E, BA, D3, F0, 3B, 0A, 08, 01, 10, 01, 28,
    01, 30, 00, 38, 01, 12, 1D, 0A, 09, 69, 50, 68, 6F, 6E, 65, 20, 4F, 53, 12, 03,
    34, 2E, 31, 1A, 00, 22, 09, 69, 50, 68, 6F, 6E, 65, 33, 47, 53, 1A, 02, 08, 02,
    22, 02, 08, 01
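
To make the "byte array, NOT strings" point concrete, here is a small sketch that converts a hex listing like the one above into the raw bytes it denotes. HexBytes and Parse are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class HexBytes
{
    // Parses a comma-separated list of two-digit hex values, such as
    // "22, 00, 62, 3C", into the raw bytes they denote.
    public static byte[] Parse(string hexList)
    {
        char[] separators = { ',', ' ', '\r', '\n', '\t' };
        return hexList
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(token => byte.Parse(
                token, NumberStyles.HexNumber, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```

For instance, HexBytes.Parse("22, 00, 62, 3C") gives the bytes 34, 0, 98 and 60 in decimal. The request body is those four raw bytes, not the eleven or so characters of the text "22, 00, 62, 3C".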

If your post is successful, you should receive a 200 OK. This code demonstrates how to create and post your CSSID (synchronous requests are used for simplicity):

namespace GoogleGoggles.NET
{
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;

    public class GoogleGoggles
    {
        // The POST body required to validate the CSSID.
        private static byte[] CssidPostBody = new byte[] { 34, 0, 98, 60, 10, 19, 34,
            2, 101, 110, 186, 211, 240, 59, 10, 8, 1, 16, 1, 40, 1, 48, 0, 56, 1, 18,
            29, 10, 9, 105, 80, 104, 111, 110, 101, 32, 79, 83, 18, 3, 52, 46, 49, 
            26, 0, 34, 9, 105, 80, 104, 111, 110, 101, 51, 71, 83, 26, 2, 8, 2, 34,
            2, 8, 1 };
 
        // Bytes trailing the image byte array. Look at the next code snippet to see
        // where it is used in SendPhoto() method.
        private static byte[] TrailingBytes = new byte[] { 24, 75, 32, 1, 48, 0, 146,
            236, 244, 59, 9, 24, 0, 56, 198, 151, 220, 223, 247, 37, 34, 0 };

        // Generates a cssid.
        private static string Cssid
        {
            get
            {
                Random random = new Random((int)DateTime.Now.Ticks);
                return string.Format(
                    "{0}{1}",
                    random.Next().ToString("X8"), 
                    random.Next().ToString("X8"));
            }
        }

        static void Main(string[] args)
        {
            string cssid = GoogleGoggles.Cssid;
            GoogleGoggles.ValidateCSSID(cssid);

            // See next code snippet for SendPhoto()
            //GoogleGoggles.SendPhoto(cssid, yourImageByteArray);
            Console.ReadLine();
        }

        // Validates the CSSID we just created, by POSTing it to Goggles.
        private static void ValidateCSSID(string cssid)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
                string.Format(
                "http://www.google.com/goggles/container_proto?cssid={0}", 
                cssid));

            GoogleGoggles.AddHeaders(request);
            request.Method = "POST";

            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(
                    GoogleGoggles.CssidPostBody, 
                    0,
                    GoogleGoggles.CssidPostBody.Length);

                stream.Flush();
            }

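            // Dispose the response when done and print the status code so the
            // outcome (200 OK on success) is visible in the console.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("ValidateCSSID: {0}", response.StatusCode);
            }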
        }

        private static void AddHeaders(HttpWebRequest request)
        {
            request.ContentType = "application/x-protobuffer";
            request.Headers["Pragma"] = "no-cache";
            request.KeepAlive = true;
        }
    }
}

If you received a 200 OK response, you will use the same CSSID for future requests until you receive an error using it. As a rule of thumb, I generate a new one each time I launch the application, and keep on using it until I get an error.

Moving on. We can start posting images now that you have a valid CSSID. But first we will need to understand a bit about Google's ProtoBuffer. Protocol Buffers, or ProtoBuffer, is Google's way of serializing structured data for use in communication protocols, data storage, and more (in their own words). Explaining ProtoBuffer in detail is out of the scope of this post - I will only cover the basics, so you will have to read Google's Protocol Buffers documentation if you would like to know more about it. First you will need the following:

  • A JPEG image - I haven't played with other formats. If you will be using images taken from a phone camera (if you are building a phone application, for example), you will likely need to resize them, since high-megapixel phone cameras are common nowadays.
  • The size of the jpeg image in bytes.

You will then need to calculate the following integers. Let x be the size of the image in bytes.

    a = x + 32
    b = x + 14
    c = x + 10

Now, you will need to encode a, b, c and x into varints. Let's assume x = 27985. To encode it into a varint, do the following:
  1. Convert the number into binary:
    110110101010001
  2. From right to left, divide the bits into groups of 7 bits each. Pad the leftmost group with leading zeros to make it 7 bits if needed:
    0000001 1011010 1010001
  3. Reverse the groups:
    1010001 1011010 0000001
  4. Convert the groups of 7 bits into bytes. From left to right, set the msb (most significant bit) if there are further bytes to come. For the last byte, the msb should be 0:
    11010001 11011010 00000001
    0xD1DA01
So x is 0xD1DA01 in varint. Repeating the same exercise, you will find that a, b and c are encoded as 0xF1DA01, 0xDFDA01 and 0xDBDA01 respectively.
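
The four steps above can be checked mechanically. The following is a minimal sketch, assuming the standard Protocol Buffers base-128 varint rule (7 bits per byte, least significant group first, high bit set on every byte except the last). VarintExample, Encode and ToHex are illustrative names, not part of the post's code.

```csharp
using System;
using System.Collections.Generic;

public static class VarintExample
{
    // Encodes an unsigned 32-bit value as a base-128 varint.
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();

        // While more than 7 bits remain, emit the low 7 bits with the
        // continuation bit (0x80) set, then drop those bits.
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }

        // Last group: continuation bit clear.
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Formats bytes as "D1-DA-01" for comparison with the worked example.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

With x = 27985, VarintExample.ToHex(VarintExample.Encode(27985)) gives D1-DA-01, and x + 32, x + 14 and x + 10 give F1-DA-01, DF-DA-01 and DB-DA-01, matching the values worked out by hand.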

Now you are ready to conduct a search using a picture. Send a POST request to the same URL http://www.google.com/goggles/container_proto?cssid=[your cssid] with the same headers:

    Content-Type: application/x-protobuffer
    Pragma: no-cache

and with the following body (as a byte array, NOT strings):

    0A a 0A b 0A c 0A x [image as a byte array] [trailing bytes]

With our example, that would be:

    0A F1 DA 01 0A DF DA 01 0A DB DA 01 0A D1 DA 01 [image as a byte array] [trailing bytes]

The trailing bytes are (as byte array, NOT strings - see code below for more details):

    18, 4B, 20, 01, 30, 00, 92, EC, F4, 3B, 09, 18, 00, 38, C6, 97, DC, DF, F7, 25,
    22, 00

This code summarizes the above and shows how to post an image (again, using synchronous requests for simplicity):

// Conducts an image search by POSTing an image to Goggles, along with a valid CSSID.
public static void SendPhoto(string cssid, byte[] image)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
        string.Format(
        "http://www.google.com/goggles/container_proto?cssid={0}", 
        cssid));

    GoogleGoggles.AddHeaders(request);
    request.Method = "POST";

    // x = image size
    int x = image.Length;
    byte[] xVarint = GoogleGoggles.ToVarint32(x).ToArray<byte>();

    // a = x + 32
    byte[] aVarint = GoogleGoggles.ToVarint32(x + 32).ToArray<byte>();

    // b = x + 14
    byte[] bVarint = GoogleGoggles.ToVarint32(x + 14).ToArray<byte>();

    // c = x + 10
    byte[] cVarint = GoogleGoggles.ToVarint32(x + 10).ToArray<byte>();

    // 0A [a] 0A [b] 0A [c] 0A [x] [image bytes] [trailing bytes]
    using (Stream stream = request.GetRequestStream())
    {
        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);

        // a
        stream.Write(aVarint, 0, aVarint.Length);

        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);

        // b
        stream.Write(bVarint, 0, bVarint.Length);

        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);

        // c
        stream.Write(cVarint, 0, cVarint.Length);

        // 0x0A
        stream.Write(new byte[] { 10 }, 0, 1);

        // x
        stream.Write(xVarint, 0, xVarint.Length);

        // Write image 
        stream.Write(image, 0, image.Length);

        // Write trailing bytes
        stream.Write(
            GoogleGoggles.TrailingBytes, 
            0, 
            GoogleGoggles.TrailingBytes.Length);

        stream.Flush();
    }

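    // Dispose the response when done and print the status code so the
    // outcome (200 OK on success) is visible in the console.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        Console.WriteLine("SendPhoto: {0}", response.StatusCode);
    }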
}

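// Encodes a non-negative int32 into varint32 (least significant 7-bit group first).
public static IEnumerable<byte> ToVarint32(int value)
{
    // Work on an unsigned copy so the right shift never sign-extends.
    uint remaining = (uint)value;

    // Emit 7 bits at a time with the continuation bit (0x80) set while more
    // bits remain. Testing the whole remaining value, rather than only the
    // next 7-bit group, keeps an all-zero group (as in 128 or 16385) from
    // ending the encoding early.
    while (remaining >= 0x80)
    {
        yield return (byte)((remaining & 0x7F) | 0x80);
        remaining >>= 7;
    }

    // Final group: continuation bit clear. Zero encodes as the single byte 0x00.
    yield return (byte)remaining;
}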
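
The body layout described earlier (0A a 0A b 0A c 0A x, then the image, then the trailing bytes) can also be assembled in memory, which makes it easy to inspect before anything is sent. This is only a sketch: GogglesBody, Build and WriteVarint are hypothetical names, and the offsets 32, 14 and 10 and the trailing bytes are copied from the post without any claim about what they mean.

```csharp
using System;
using System.IO;

public static class GogglesBody
{
    // Trailing bytes copied verbatim from the post.
    private static readonly byte[] TrailingBytes =
    {
        24, 75, 32, 1, 48, 0, 146, 236, 244, 59, 9,
        24, 0, 56, 198, 151, 220, 223, 247, 37, 34, 0
    };

    // Writes value as a base-128 varint (low 7-bit group first).
    private static void WriteVarint(Stream stream, uint value)
    {
        while (value >= 0x80)
        {
            stream.WriteByte((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        stream.WriteByte((byte)value);
    }

    // Builds: 0A a 0A b 0A c 0A x [image] [trailing bytes],
    // with a = x + 32, b = x + 14, c = x + 10 and x = image size.
    public static byte[] Build(byte[] image)
    {
        uint x = (uint)image.Length;
        using (var body = new MemoryStream())
        {
            foreach (uint length in new[] { x + 32, x + 14, x + 10, x })
            {
                body.WriteByte(0x0A);
                WriteVarint(body, length);
            }

            body.Write(image, 0, image.Length);
            body.Write(TrailingBytes, 0, TrailingBytes.Length);
            return body.ToArray();
        }
    }
}
```

For a 27985-byte image the first sixteen bytes of GogglesBody.Build(image) are 0A F1 DA 01 0A DF DA 01 0A DB DA 01 0A D1 DA 01, the same header as in the worked example. The resulting array can be written to the request stream in a single call.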

If the POST succeeded, you should receive a 200 OK with a ProtoBuffer response body. The response will include a bunch of URLs depending on the results your image search generated. If Goggles did not find any results for your image search, it will return URLs of similar images to browse. If it found one or more results for your search, it will return URLs representing the search results. In other words, the search results are URLs to be viewed in a web browser, rather than an XML or a JSON blob. I will not explain how to interpret/parse the URLs out of the ProtoBuffer response body in this post (perhaps in a future post).
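
Short of a full parser, the URLs can be pulled out of the raw response bytes with a heuristic. The sketch below assumes each URL is stored as a ProtoBuffer length-delimited string, so the byte (or two-byte varint) just before "http" holds its length; that appears consistent with the sample trace in this post, where 'O' (0x4F = 79) precedes a 79-character result URL. ResponseUrls and Extract are hypothetical names, and this is not a real ProtoBuffer decoder.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ResponseUrls
{
    // Heuristic: finds "http" in the response and uses the one- or two-byte
    // varint immediately before it as the string length.
    public static List<string> Extract(byte[] response)
    {
        var urls = new List<string>();
        byte[] marker = Encoding.ASCII.GetBytes("http");

        for (int i = 1; i + marker.Length <= response.Length; i++)
        {
            if (!StartsWith(response, i, marker)) continue;

            // The last byte of a varint always has its high bit clear.
            int length = response[i - 1];
            if (length >= 0x80) continue;

            // A preceding byte with the high bit set is the low 7 bits
            // of a two-byte length.
            if (i >= 2 && response[i - 2] >= 0x80)
            {
                length = (response[i - 2] & 0x7F) | (length << 7);
            }

            if (length < marker.Length || i + length > response.Length) continue;

            urls.Add(Encoding.ASCII.GetString(response, i, length));
            i += length - 1; // continue scanning after this URL
        }

        return urls;
    }

    private static bool StartsWith(byte[] data, int offset, byte[] prefix)
    {
        for (int j = 0; j < prefix.Length; j++)
        {
            if (data[offset + j] != prefix[j]) return false;
        }
        return true;
    }
}
```

Relying on the length prefix rather than scanning for the next unprintable byte matters because the byte after a URL is often a printable field tag, as in "text.png2" in the sample trace. Lengths that need three or more varint bytes are not handled.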

Before I end this post, let's take a quick look at an example. If you post the image shown below, using the above directions/code:

you will get results similar to this (obtained from a Wireshark trace):

    24b


    ..
    .....;B
    .......s '.../.-
    http://www.google.com/goggles/images/text.png2.en...;Q.O
    http://www.google.com/goggles/result?
    cssid=3FEDE280756ECB89&rh=F0FB212061A4DC0D...=.
    .Hello, goggles!..Text
    .....;..
    .......} ......
    Bhttp://benedictehh.blogg.no/images/315802-2-1241960116211-
    n400.jpg._http://t1.gstatic.com/images?q=
    tbn:TGUR3fPDjEkjMM&w=120&h=87&usg=___LKNIk0JoGL1z6Rc43y8v
    yv9DDE=.Ahttp://benedictehh.blogg.no/1241959699_lesevurdere_
    din_blogg.html...;Q.Ohttp://www.google.com/goggles
    /result?cssid=3FEDE280756ECB89&rh=066B8E469AA45B09...="
    .religi..se bilder.
    Similar Image...;...E..Y...

    0

You will notice that Goggles was able to convert the image to text ("Hello, goggles!", in green). The link highlighted in yellow is the search result with "Hello, goggles!" as the search term. The link will probably have expired by the time you check it out.

The rest of the ProtoBuffer response (most of it shown as '.' because the values don't have an ASCII representation) is probably additional metadata that Google provides for a better UX. I am willing to bet that some of these values are coordinates inside the picture which, when interpreted correctly, show the user where the text in the picture resides. I am not interested enough to decode them myself though (yet).

If you are planning to build an application using this, I am interested to hear what it is you are trying to build - please feel free to drop me a line. Similarly, if you have any comments or suggestions, please do not hesitate to post a comment.


52 comments:

  1. Man I wish they would release an official API for this. This is a great resource to stumble upon though. Thanks for sharing!

  2. How to create a image byte array?

    I'm trying different ways but I always get: Bad Request

  3. Hey Stefan, I am assuming that you are reading the image from a file on disk - is that correct? If so, you need to make sure that you are reading the correct amount of bytes from disk. If you right-click the image on your disk, you will notice that the file has two sizes, size and size on disk. You need to read 'size' amount of bytes from disk, otherwise, you will be sending extra gibberish in your POST request.

    A crude way of doing this:

    string path = @"C:\image.jpg";

    // File.ReadAllBytes reads exactly the file's length in bytes
    // (the 'size' value, not 'size on disk').
    byte[] imageByteArray = File.ReadAllBytes(path);

    Also, make sure your image is less than 140KB.

    Let me know if this solved your issue. I will see if I can update the post with this info.

  4. Is this still working for you guys?

  5. This is fantastic, and works great - as far as it goes. I'm trying to work out the protobuf stuff - I'd like to see all the data in a structured format so I can do sensible things with it - but I don't have any info on how to go about doing it.

    It looks like you need a .proto file to tell the app how to decode stuff - is that correct? I can't find one for goggles.

    Any idea where I can find one, or is this a closely guarded secret of google's? It seems like this .proto file is like a rosetta stone for the data - but I don't have it. :(

  6. Hi Fadi, thanks for the post. I've been working on this for a while, and made a Google Goggles client for iPhone a few months ago ("Noogle Noggles"). That was with an earlier version of the protocol. Could you shoot me an email? I've got some stuff to show you (and a couple of questions). Thanks!

  7. ^^ Guitar - I'm glad you enjoyed the post. You are correct - in an ideal world, you would need proto buffer so that the client can decode response. My understanding from proto buffers is that only the client developer would know how to decode the response. I don't think it's closely guarded secret, it's just that there is no official API and therefore no .proto file. If Google ever comes with an official API, I suggest they remove proto buffers.

    You can have a go at decoding it yourself (and you will need to interpret the results yourself) by learning from http://code.google.com/apis/protocolbuffers/docs/overview.html. But be warned, it will be a painstaking process, and you are not guaranteed to get it right. Hope this helps.

  8. Hello. Congratulations on your blog. I really liked your article, but I have not managed to run the code. I do not know what language you use or whether you need some library. Greetings

  9. Hi,

    I am getting 400 error response now,but it was working few days before .

    Any idea if goggle has changed something or i am making a mistake some where.

    I am not able to validate the CSS ID also.

  10. Rajantak, it appears that Google updated their backend service and no longer like the data we are sending them =( I am getting HTTP 400 when posting a photo, with "Unparseable Container Request" error string. It's worth mentioning that validating the CSSID still works for me. I will see what I can do and post an update.

    Juanjo, the code is written in .NET C#.

  11. Thanks. I've never worked with .NET C#. Although I am getting 400 error also, I learnt something new. I'll watch your blog.

  12. @fadi Thanks for the confirmation...I am using java to post the data as i am not very much comfortable with .net....
    Hope Google comes with an API soon...

  13. Thanks for your blog post, Fadi. I discovered it by accident only recently and tried to understand a bit more of the protocol.
    Everything i found this far is represented in form of a python dict (basically what would be the .proto-file, but in a different format). You can get it here https://github.com/deetch/goggles-experiment/blob/master/parse_dict.py.
    The code using it doesn't work at the moment because of the recent change mentioned above. To make it work again and to find out more of the protobuf fields I'd need more network captures (I neither own an Android nor IPhone myself, everything i got so far is based on the trace and info Fadi provided).

  14. Thanks customvid - that's some interesting information.

    I was able to get this working again :-) I will update the post soon with my new findings.

  15. Great, Fadi! I've spent a day on this too but couldn't figure out the new API completely. What I got so far:

    0x22007a
    3 bytes: varint of size + 18
    0x08
    <1 byte>
    0x12
    3 bytes: varint of size + 12
    0x0a
    3 bytes: varint of size -336
    0x0a
    3 bytes: varint of size -340
    0x0a
    3 bytes: varint of size -352
    n bytes: jpg dump

    I have no idea what the one byte is for (seems random), and those diffs don't seem always to be the same, for small images they are different (14/9/-10/-13).

    Any more info from your side would be very appreciated!

  16. I finally had the time to update my post. I hope I detailed all the information you guys need to get this working again - let me know if you are having issues, or something is not clear.

  17. I am new to C# but still somehow manage to put your code in place and able to compile it properly. When i run the it, the Console window comes up and it remains there without any messages. I am not receiving 200 OK or 400 message. When i press enter the screen disappears.

    As i am not getting neither successful message(200 Ok) nor error response(400) i suspect that i am doing something wrong. Could anyone figure out what it could be possibly. Many thanks in advance..

  18. I am finally able to see the response OK. (I was not printing anything earlier on console window from the response i was getting, My Bad!).
    Now i am getting response.Statuscode OK for both the requests (one while validating CCSID and other while sending the image to the server. Unfortunately i am not getting anything as a result. When i print request.contentlength in the console window after making the reqest in sendphoto codeblock, i get -1 everytime. Could you please let me know what could be the problem..

  19. Hello again,

    Sorry for posting so many times..I am finally able to get the results (Being naive in C# took me a while)..

    Well as it is difficult to parse the data within C# framework, i think we should try to make similar changes in python code too as it has the parsing facility. I am trying to figure out how to put trailing bytes but so far not succeeded. If any one could do that, please kindly post as it would be of great help to me and hopefully many others.

    Here is the link again to python google goggles:
    https://github.com/deetch/goggles-experiment/

  20. Can someone please post the above code in Java? Thanks

  21. I would like to make this service better, as I am helping design an app for mobile devices for a group of volunteers that I work with who are not native English speakers. We want to use it to recognize foreign Magic the Gathering cards, but so far it is only recognizing them as cards, not which specific card each one is.

    Anyone who knows who I can talk to about making this better, please feel free to contact me at Goggles@starcitygames.com

    Thanks
    -=Matt

    Replies
    1. You might want to look at iqengines.com. It's a paid service for image recognition and should work for a limited set of cards.

  22. The changes are not working still....i just copied the code and tried to run it...i m getting 400 response.
    Anyone who has got the response successfully ??

    Any HELP!!!!

  23. Hey I'm having a hard time getting this to work in C# or python. I've made the changes indicated in the blog but the C# version just returns a 404 and the python version keeps attempting a different CCSID. Any thoughts?

  24. Same here - Getting "Unparseable Container Request" as a response.
    The python example keeps trying new CSSIDs.
    Please help. can you please share some pointers on how this could be resolved?

  25. Great post and really helpful, a follow up article on getting any data out of the protobuffer would be well appreciated.

    I have noticed that my version of the code only works on files that are exactly 20Kb such as the one you use in the post and another one I created.

  26. Hello great post, i want to develop an android application called snap2call that simply takes a picture(using the camera phone) of a poster, billboard, newpaper ad which contains contact details such as phone, email address and then initiates a call to that number or sends an email...i would be glad for any directions in this project. Thanks

  27. Thank you John - it is good to know that it is working for files of 20 KB, as I didn't get the chance to investigate this yet.

  28. Hi Fadi,

    This post is awesome - looks really interesting! Before I try to implement this in to an Android app, are you aware if the code is still working?

    Also, As John says above, it would be really helpful if you could post about how to get data out of the protobuffer.

    Thanks
    Aaron

  29. Hi Fadi,

    Does the code above still work? I get a valid cssid - but posting an image gives a 400 error.

    Any help will be appreciated.

  30. Hi,

    Had this working up until a couple of days ago but I'm now getting 500 internal server errors. I wrote a parser for the protobuf response which parses to XML and was about to publish the code but I'm no longer sending a valid request. Would be a doddle to fix but I don't have an iphone/android phone! Can someone sniff some packets and post them? I'll publish the parser (coded in PHP) as open source as soon as this is fixed. Many thanks to deetch for his python parser as it pushed me in the right direction!

    Thanks,

    Matt

  31. Hey Matt,

    I can't even tell what language this is in. I really want to help you, and I have an iphone to help, but I need to be pointed in the direction of what language to compile this code in, before I can begin to help.

    Tom

  32. Hi Fadi,

    May I ask if the code you are providing is still working? I had tried to convert that code into Objective C, to use Goggle in my application. However, I get "Unparseable Container Request".

    So I try to use VS 2010 to build ur code, and I got 400 Bad Request :(

    Do I need a ProtoBuffer parser to do this or this work around is blocked by Google?

    Thanks,

    Nguyen

  33. Hi Fadi,

    I convert your code to java. I finish step 1 "validate CSSID", it's response 200 OK. But I failed with step 2, it response 500 (HTTP/1.1 500 Internal Server Error)
    Please help me. Thanks

    Note: If you allow, I will post my code.

  34. Hi Fadi,

    Great post, but as the same as a few of the guys above I am getting the 400 Error (Bad Request). I am assuming that Google are blocking the code from working again. Is there any work around at the minute? It would be great to play around with this functionality before the official API is released. Does anyone know a date of when this will happen?

  35. This comment has been removed by a blog administrator.

  36. That URL returns a 404 for me. Has it moved?

  37. Awesome work Fadi! Can we connect to discuss a few things regarding image recognition? I would love to ask you a few questions about what projects you are working on now.

    -Chris
    http://www.linkedin.com/in/ctfield

  38. Thanks a lot, I have a working solution (Objective-C)

  39. Hello can anyone check, if something changed again on Google's end?
    the CCSID query returns OK but when passing the jpeg image, it now returns BAD REQUEST?

  40. I get bad request after generating a valid CCSID.

  41. After validating my CCSID, which is OK, i get 400, bad request when trying to call goggles, same problem as ell....
    Anyone news?

  42. Hi, it seems a good work! But where could i find some source code?

  43. Hi
    I have an app your code.
    It worked for more than a year.
    But since a few months, the CCSID is no longer validating.
    Did you find their new algorithm ?

  44. hi
    i really wish to do a software in that suppose we are trying to upload a film the title have to be analysed and should get output of that particular film name. and i think goggles principle can be implemented. any one have suggestions?? pls contact me email 8vishnu.kj@gmail.com
    whatsapp- +918089927042

  45. Hi there, the CSSID validating URL doesn't work because, I think, Google had moved the service.

    Anyone have found the new URL?

    Regards,

    @Fadi: I working on a app that need to use a powerful ocr and googles seems the best. Please contact me at aaglietti [at] noovle.it

  46. Good job,thanks,i learn a lot.
    I have an IPhone 5c and a an updated version of the Google Search app. In a Google discussion I was reading, a person asked if Goggles had been dropped from the Google search app for IOS. The response was that when you tap the search bar in the Google search app you will see a camera icon on the bottom right hand corner. Aside from part of the keyboard there is nothing on the bottom right hand corner of the screen when I tap the search bar. I attached a screen shot to show you what I'm looking at. "I feel like Im on crazy pills!"

