Import Scans or Go Multilingual

Sunday, September 27, 2009 at 5:32 PM

About a month ago, we launched v3.0 of the Documents List Data API and promised more features were on the way. Well, today we're releasing two experimental features in the API: OCR and Document Translation.

The first, Optical Character Recognition (OCR), allows your application to create editable Google Documents from high-resolution images containing text (such as faxes or scanned letters). To perform OCR on a .png, .jpg, or .gif upload, add the ocr=true parameter to your upload request:

POST /feeds/default/private/full?ocr=true HTTP/1.1
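As a rough illustration, the OCR upload can be assembled with Python's standard library. This is a minimal sketch that builds the request without sending it. The host `https://docs.google.com`, the `GData-Version: 3.0` and `Slug` headers, the ClientLogin-style `Authorization` header, and the helper name `build_ocr_upload` are assumptions that this post does not state.

```python
import os
import urllib.request
from urllib.parse import urlencode

# Assumption: full URL of the feed shown in the request line above.
FEED_URL = "https://docs.google.com/feeds/default/private/full"

# The post lists .png, .jpg and .gif as the formats OCR accepts.
OCR_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".gif": "image/gif"}


def build_ocr_upload(image_bytes, filename, auth_token):
    """Build (but do not send) a POST that uploads an image with ocr=true.

    Hypothetical helper; header names other than Content-Type are assumptions.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext not in OCR_TYPES:
        raise ValueError("OCR expects a .png, .jpg or .gif file, got " + filename)
    url = FEED_URL + "?" + urlencode({"ocr": "true"})
    headers = {
        "GData-Version": "3.0",                             # assumed API version header
        "Content-Type": OCR_TYPES[ext],                     # MIME type from the extension
        "Slug": filename,                                   # assumed: title of the new doc
        "Authorization": "GoogleLogin auth=" + auth_token,  # hypothetical token
    }
    return urllib.request.Request(url, data=image_bytes, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; reading the scan with `open(path, "rb")` keeps the image bytes intact.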

OCR will only work well on high-resolution images. The quality of the extracted text isn't perfect yet, but we're busy improving it!
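Because results depend so heavily on resolution, it can be worth checking an image's pixel dimensions locally before uploading it. The following is a minimal sketch for PNG files. It relies only on the fixed layout of the PNG header: the IHDR chunk always comes first and stores width and height as big-endian 32-bit integers. The function name `png_dimensions` is illustrative, and the post itself does not say how many pixels count as high resolution.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) read from a PNG file's IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    then width and height as big-endian unsigned 32-bit integers.
    """
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Only the first 24 bytes of the file are needed, so the check stays cheap even for large scans.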

Secondly, we have integrated Google Translate into the API. As a result, you can translate a document during upload. Simply add the targetLanguage and sourceLanguage parameters to your upload request:

POST /feeds/default/private/full?targetLanguage=de&sourceLanguage=en HTTP/1.1

If sourceLanguage is omitted, we'll try to auto-detect the document's language. All languages supported by Google Translate are supported in the API.
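The translation parameters can be built up in code in a similar way. In this sketch, the helper name `translation_upload_url` and the `https://docs.google.com` host are assumptions. `sourceLanguage` is added only when the caller supplies one, so leaving it out falls back to the auto-detection described above.

```python
from urllib.parse import urlencode

# Assumption: full URL of the upload feed used in the requests above.
FEED_URL = "https://docs.google.com/feeds/default/private/full"


def translation_upload_url(target_language, source_language=None):
    """Return the upload URL for a translated import (hypothetical helper).

    sourceLanguage is optional: when it is omitted, the service is
    expected to auto-detect the document's language.
    """
    params = {"targetLanguage": target_language}
    if source_language is not None:
        params["sourceLanguage"] = source_language
    return FEED_URL + "?" + urlencode(params)
```

For example, `translation_upload_url("de", "en")` reproduces the query string in the request line above, while `translation_upload_url("de")` omits `sourceLanguage`.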

As always, see the documentation for details. There's also a live demo (source will be available soon) up at googlecodesamples.com.

9 comments:

Bob said...

In the live demo, I uploaded a fairly clean PNG of simple text. The result was a blank document.

tehkubix said...

I get the following error:

Error processing document:
Expected response code 200, got 400 GData InvalidEntryException Could not convert document.

I have tried both my own PNG and the sample below the upload form.

Eric (Google) said...

Thanks for the posts!

We had a small outage in the service earlier today. That may have been the cause of these errors.
I recommend trying again to see if the 400 is gone.

Keep in mind, as stated in the blog post, that the image needs to be high quality (characters should be larger than 10px). That means a screenshot probably won't work very well.

Cheers,
Eric

sajin said...

I got some output, but only a small portion of the doc.

Anyway it is very useful.

priyanshi said...

Hi, this is Shaheen.

My site has dropped out of the top Google rankings. Earlier it was among the top 20. Please tell me how I can restore the site to its former position.

Bob said...

@priyanshi,

This is not the right forum for your question at all. It is also a fairly complicated question. In a nutshell, for your site to rank highly and keep ranking highly, you need to build a site that is very relevant to the topic you are focusing on, that is well liked and fairly popular, and that is considered an authoritative source on that topic. After that, it should rank relatively well for keywords pertaining to its main topic.

You also have to contend with competition: other sites built around the same or similar topics that also become popular, well liked, and authoritative.
