Actually, this is not about how translate.google.com works. It’s about loading HTML from an arbitrary URL, adding some extra JavaScript and CSS, and redisplaying the page on a different domain.
A simple test page
I’ve made a simple test web page:
Hello world
Let’s translate it with Google Translate.
The translation page has two iframes (details omitted):
Google Translate
Let’s look at the second iframe, the one which begins with ...src="/translate_p?... This is the actual translated page.
Compare the source code of the original page with that of the translated page. The translated version has been bulked up considerably.
Hello
world Hej verden
Leaving only the really important stuff:
Hej verden
Two notable things have changed from the original. The head section has extra script tags and a base tag. In the body section, the phrase Hello World has been translated into Danish.
Summary of modifications to original page
In summary, Google Translate has done the following to the original page:
- Added script tags
- Added a style tag
- Added a base tag
- Added an iframe tag
- Replaced content text with translated version
- Marked up content with some span tags, for a fancy tooltip
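The head-related modifications in this list can be sketched in a few lines of Python. This is only a rough illustration of the general idea, not Google's actual implementation: the function name inject_into_head, the script path /hypothetical/translate.js, the .tip style rule, and the base URL http://skipperkongen.dk/tmp/ are all assumptions made for the example.

```python
import re
from html import escape

def inject_into_head(page: str, base_href: str, extra_head: str = "") -> str:
    """Insert a <base> tag plus extra markup right after the opening <head> tag."""
    injected = f'<base href="{escape(base_href, quote=True)}">' + extra_head
    # Match the opening <head> tag, with or without attributes (but not <header>).
    head_open = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
    match = head_open.search(page)
    if match is None:
        # No <head> found: leave the document untouched rather than guess.
        return page
    return page[:match.end()] + injected + page[match.end():]

# Hypothetical stand-in for the test page; the real markup may differ.
original = "<html><head><title>Test</title></head><body><p>Hello world</p></body></html>"
rewritten = inject_into_head(
    original,
    base_href="http://skipperkongen.dk/tmp/",  # assumed original location
    extra_head='<script src="/hypothetical/translate.js"></script>'
               "<style>.tip{color:gray}</style>",
)
print(rewritten)
```

A regular expression is enough for a sketch like this; a real proxy would more likely use a proper HTML parser, since real-world markup is often malformed.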
What do the script tags do?
This is the most complex part. I’m not done analysing this yet.
What does the style tag do?
This is simply to provide some styling of the fancy tooltip added with the span tags.
What does the iframe tag do?
In short I don’t know yet.
What does the base tag do?
The base tag is there to make sure that relative paths, like the image path, still work even if the HTML is loaded from a different domain than the original skipperkongen.dk domain.
This:
Makes this work:
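To see why the base tag matters, the URL resolution rule can be imitated with Python's urllib.parse.urljoin. This is a small sketch under stated assumptions: the helper name resolve, the relative path images/logo.png, the base URL http://skipperkongen.dk/tmp/, and the proxy URL http://translate.google.com/translate_p are illustrative values, not taken from the actual pages.

```python
from typing import Optional
from urllib.parse import urljoin

def resolve(document_url: str, relative: str, base_href: Optional[str] = None) -> str:
    """Resolve a relative URL roughly the way a browser does, honouring <base href>."""
    # A <base href> replaces the document URL as the starting point for resolution.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, relative)

proxy_url = "http://translate.google.com/translate_p"  # illustrative proxy address
image_src = "images/logo.png"                          # hypothetical relative path

# Without a base tag, the image is looked up on the proxy's domain.
print(resolve(proxy_url, image_src))
# → http://translate.google.com/images/logo.png

# With a base tag, it is looked up on the original domain.
print(resolve(proxy_url, image_src, base_href="http://skipperkongen.dk/tmp/"))
# → http://skipperkongen.dk/tmp/images/logo.png
```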
Would this work with AJAX?
Many pages use AJAX to load content. I expect Google Translate not to work in this case, because of the browser's same-origin restrictions on cross-domain requests. In theory it could be done by creating a dynamic service proxy on the Google domain, not taking authentication issues into account.
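The restriction in question comes down to comparing origins, where an origin is the combination of scheme, host, and port. Here is a minimal Python sketch of that comparison. The helper names origin and same_origin, the file name data.txt, and the translate.google.com URL are assumptions made for the example, and real browsers apply additional rules that are not modelled here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when none is given explicitly.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a: str, url_b: str) -> bool:
    """True when both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)

page_direct = "http://skipperkongen.dk/tmp/test2.html"
page_proxied = "http://translate.google.com/translate_p"  # illustrative proxy address
ajax_target = "http://skipperkongen.dk/tmp/data.txt"      # hypothetical AJAX endpoint

print(same_origin(page_direct, ajax_target))   # → True
print(same_origin(page_proxied, ajax_target))  # → False
```

When the page is served from the original domain, the AJAX request is same-origin. Once the page is served through the proxy, the same request targets a different origin.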
Let’s try with a page that replaces the header text with AJAX.
...
When loading via http://skipperkongen.dk/tmp/test2.html, the page looks like this:
Hello World
When loading via Google Translate, the page looks like this:
...
So the conclusion is that Google does not do anything about data fetched via AJAX.