Actually, this post is not about how translate.google.com works. It's about loading HTML from a random URL, adding some extra JavaScript and CSS, and redisplaying the page on a different domain.
A simple test page
I've made a simple test web page:
    <html>
    <head></head>
    <body>
    <h1>Hello world</h1>
    <img src="world.jpg" />
    </body>
    </html>
Let's translate it with Google Translate.
The translation page is a frameset with two frames (details omitted):
    <html>
    <head>
    <title>Google Translate</title>
    </head>
    <frameset>
    <frame src="/translate_n?...">
    <frame src="/translate_p?...">
    </frameset>
    </html>
Let's look at the second frame, the one with src="/translate_p?...". This is the actual translated page.
Compare the source code of the original page with that of the translated page. It's been beefed up considerably.
    <html>
    <head>
    <script>
    (function() {
        function ti_a(b) {
            this.t = {};
            this.tick = function(c, d, a) {
                a = a ? a : (new Date).getTime();
                this.t[c] = [ a, d ]
            };
            this.tick("start", null, b)
        }
        var ti_b = new ti_a;
        window.jstiming = {
            Timer : ti_a,
            load : ti_b
        };
        try {
            var ti_ = null;
            if (window.chrome && window.chrome.csi)
                ti_ = Math.floor(window.chrome.csi().pageT);
            if (ti_ == null)
                if (window.gtbExternal)
                    ti_ = window.gtbExternal.pageT();
            if (ti_ == null)
                if (window.external)
                    ti_ = window.external.pageT;
            if (ti_)
                window.jstiming.pt = ti_
        } catch (ti_c) {
        }
        ;
    })()
    </script>
    <script src="http://translate.googleusercontent.com/translate/static/biEfM_qFbxU/js/translate_c.js"></script>
    <script>
        _infowindowVersion = 1;
        _intlStrings._originalText = "Original English text:";
        _intlStrings._interfaceDirection = "ltr";
        _intlStrings._interfaceAlign = "left";
        _intlStrings._langpair = "en|da";
        _intlStrings._feedbackUrl = "http://translate.google.com/translate_suggestion";
        _intlStrings._currentBy = "Current translation on %1$s by %2$s";
        _intlStrings._unknown = "unknown";
        _intlStrings._suggestTranslation = "Contribute a better translation";
        _intlStrings._submit = "Contribute";
        _intlStrings._suggestThanks = "Thank you for contributing your translation suggestion to Google Translate.";
        _intlStrings._reverse = false;
    </script>
    <style type="text/css">
    .google-src-text {
        display: none !important
    }
    .google-src-active-text {
        display: block !important;
        color: black !important;
        font-size: 12px !important;
        font-family: arial, sans-serif !important
    }
    .google-src-active-text a {
        font-size: 12px !important
    }
    .google-src-active-text a:link {
        color: #00c !important;
        text-decoration: underline !important
    }
    .google-src-active-text a:visited {
        color: purple !important;
        text-decoration: underline !important
    }
    .google-src-active-text a:active {
        color: red !important;
        text-decoration: underline !important
    }
    </style>
    <meta http-equiv="X-Translated-By" content="Google">
    <base href=http://skipperkongen.dk/tmp/test.html />
    </head>
    <body>
    <iframe src="http://translate.google.com/translate_un?hl=en&ie=UTF-8&sl=en&tl=da&u=http://skipperkongen.dk/tmp/test.html&prev=_t&rurl=translate.google.com&twu=1&lang=en&usg=ALkJrhhpCjCAYEWwbQX9TROT-522jGdGEw" width=0 height=0 frameborder=0 style="width: 0px; height: 0px; border: 0px;"></iframe>
    <h1><span onmouseover= _tipon(this); onmouseout= _tipoff(); >
    <span class="google-src-text" style="direction: ltr; text-align: left">Hello world</span> Hej verden
    </span></h1>
    <img src=world.jpg />
    </body>
    <script>
    _addload(function() {
        _setupIW();
        _csi('en', 'da', 'http://skipperkongen.dk/tmp/test.html');
    });
    </script>
    </html>
Leaving only the really important stuff:
    <head>
    <script src="http://translate.googleusercontent.com/translate/static/biEfM_qFbxU/js/translate_c.js"></script>
    <base href=http://skipperkongen.dk/tmp/test.html />
    </head>
    <body>
    <h1>Hej verden</h1>
    <img src=world.jpg />
    </body>
Two notable things have changed from the original. The head section has extra script tags and a base tag, and in the body section the phrase "Hello world" has been translated into Danish.
Summary of modifications to the original page
In summary, Google Translate has done the following to the original page:
- Added script tags
- Added a style tag
- Added a base tag
- Added an iframe tag
- Replaced the content text with a translated version
- Marked up the content with some span tags, for a fancy tooltip
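The first three modifications are mechanical and can be sketched as a plain string transformation. This is my own minimal sketch, not Google's code: it assumes the proxy already holds the original HTML as a string, and the function name `rewriteForProxy` and the script URL are made up. It uses regular expressions for brevity; a real proxy would want a proper HTML parser.

```javascript
// Hypothetical sketch (not Google's implementation): inject a <base> tag,
// extra <script> tags and a <style> tag into fetched HTML so the page can be
// redisplayed from another domain. Regex-based for brevity only.
function rewriteForProxy(html, originalUrl, scriptUrls, css) {
  // Escape characters that would break out of a double-quoted attribute.
  const attr = (s) => s.replace(/&/g, '&amp;').replace(/"/g, '&quot;');

  // <base> goes first so it precedes any relative URL in <head>.
  const injected =
    `<base href="${attr(originalUrl)}">` +
    scriptUrls.map((src) => `<script src="${attr(src)}"></script>`).join('') +
    `<style type="text/css">${css}</style>`;

  // "\b" keeps <header> from matching.
  const headTag = /<head\b[^>]*>/i;
  if (headTag.test(html)) {
    // Function replacer, so "$" in the injected text is not treated specially.
    return html.replace(headTag, (tag) => tag + injected);
  }
  // No <head>: create one right after <html>, or prepend as a last resort.
  const htmlTag = /<html\b[^>]*>/i;
  if (htmlTag.test(html)) {
    return html.replace(htmlTag, (tag) => `${tag}<head>${injected}</head>`);
  }
  return `<head>${injected}</head>${html}`;
}

const testPage = '<html> <head></head> <body> <h1>Hello world</h1> <img src="world.jpg" /> </body> </html>';
console.log(rewriteForProxy(
  testPage,
  'http://skipperkongen.dk/tmp/test.html',
  ['http://proxy.example/static/translate.js'], // hypothetical script URL
  '.google-src-text { display: none !important }'
));
```

Google's output places the base tag at the end of the head instead; that works there because its injected script URLs are absolute.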
What do the script tags do?
This is the most complex part. I'm not done analysing this yet.
What does the style tag do?
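This simply provides styling for the fancy tooltip added with the span tags. The google-src-text class hides the original English text inside each span, while google-src-active-text displays such text as a block with its own font and link colours.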
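I haven't analysed what _tipon and _tipoff in translate_c.js actually do, so the following is only a guess at the simplest mechanism the two CSS classes allow: swap the class of the hidden source-text span on mouseover, and swap it back on mouseout. The names `setSourceTextVisible`, `tipOn` and `tipOff` are hypothetical stand-ins. Unlike Google's `_tipoff()`, `tipOff` here takes the span as an argument. The "Original English text:" string and the `_setupIW` call suggest the real script shows the text in a popup rather than inline.

```javascript
// Hypothetical stand-ins for _tipon/_tipoff (the real functions live in
// translate_c.js and were not analysed). They toggle the hidden source-text
// span between the two classes defined in the injected <style> tag.
function setSourceTextVisible(span, visible) {
  const from = visible ? 'google-src-text' : 'google-src-active-text';
  const to = visible ? 'google-src-active-text' : 'google-src-text';
  // span.children holds element children only, i.e. the inner source-text
  // span, not the "Hej verden" text node. Assumes the span has just one class.
  for (const child of span.children) {
    if (child.className === from) child.className = to;
  }
}

const tipOn = (span) => setSourceTextVisible(span, true);
const tipOff = (span) => setSourceTextVisible(span, false);

// Wired up in the page like Google's markup:
// <span onmouseover="tipOn(this)" onmouseout="tipOff(this)">...</span>
```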
What does the iframe tag do?
In short, I don't know yet.
What does the base tag do?
The base tag is there to make sure that relative paths, like the image path, still work even though the HTML is served from a different domain than the original skipperkongen.dk domain: relative URLs are resolved against the base URL instead of the translated page's own URL.
This:
    <base href=http://skipperkongen.dk/tmp/test.html />
Makes this work:
    <img src="world.jpg" />
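Browsers resolve relative URLs with the rules of the WHATWG URL Standard, which JavaScript exposes through the `URL` constructor, so the effect of the base tag is easy to reproduce. The sketch assumes the translated frame is served from http://translate.google.com/translate_p (query string left out); the helper name `resolve` is my own.

```javascript
// Resolve a relative URL the way the browser does for <img src="world.jpg">.
function resolve(relative, baseUrl) {
  return new URL(relative, baseUrl).href;
}

// With <base href=http://skipperkongen.dk/tmp/test.html /> in place:
console.log(resolve('world.jpg', 'http://skipperkongen.dk/tmp/test.html'));
// -> http://skipperkongen.dk/tmp/world.jpg

// Without it, the base URL is the frame's own URL on Google's domain
// (assumed here to be http://translate.google.com/translate_p):
console.log(resolve('world.jpg', 'http://translate.google.com/translate_p'));
// -> http://translate.google.com/world.jpg, which is not where the image lives
```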
Would this work with AJAX?
Many pages use AJAX to load content. I expect Google Translate not to work in this case, because of the browser's same-origin policy, which keeps a page's scripts from reading responses from other domains. In theory it could be made to work by running a dynamic proxy service on the Google domain, setting authentication issues aside.
Let's try a page that replaces the heading text using AJAX.
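Such a proxy could work roughly like this: every URL the page's scripts request is resolved against the original page URL and rewritten into a same-origin request to an endpoint on the translating domain, which fetches the resource server-side. The endpoint http://translate.example/proxy and its `u` parameter are invented for this sketch. Making the page's scripts actually call it (for example by wrapping XMLHttpRequest) is left out, and so is authentication: the browser would send the proxy domain's cookies, not the original site's.

```javascript
// Hypothetical same-origin proxy endpoint on the translating domain.
const PROXY_ENDPOINT = 'http://translate.example/proxy';

// Turn a (possibly relative) AJAX URL from the original page into a request
// to the proxy, which would fetch the real resource on the server side.
function toProxyUrl(requestUrl, originalPageUrl) {
  const target = new URL(requestUrl, originalPageUrl); // absolute original URL
  const proxied = new URL(PROXY_ENDPOINT);
  proxied.searchParams.set('u', target.href); // percent-encoded automatically
  return proxied.href;
}

console.log(toProxyUrl('message.txt', 'http://skipperkongen.dk/tmp/test2.html'));
// -> http://translate.example/proxy?u=http%3A%2F%2Fskipperkongen.dk%2Ftmp%2Fmessage.txt
```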
    <html>
    <head>
    <script src="http://code.jquery.com/jquery-1.5.min.js" type="text/javascript"></script>
    <script type="text/javascript">
    $(function() {
        $.get('message.txt', function(data) {
            $('h1').text(data);
        });
    });
    </script>
    </head>
    <body>
    <h1>...</h1>
    </body>
    </html>
When loaded directly from http://skipperkongen.dk/tmp/test2.html, the page looks like this:
Hello World
When loaded through Google Translate, the page looks like this:
...
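So the conclusion is that Google Translate does nothing about data fetched via AJAX: the heading keeps its "..." placeholder, so the AJAX-loaded text never even makes it into the translated page.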
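This fits the same-origin expectation from earlier. Assuming the translated test2 page also gets a base tag pointing at http://skipperkongen.dk/tmp/test2.html and is served from http://translate.google.com (both assumptions, by analogy with the first test page), jQuery's request for message.txt resolves to skipperkongen.dk, a different origin from the page's own. A simplified version of the browser's check, with the hypothetical helper `isCrossOrigin`:

```javascript
// Simplified same-origin check: two URLs share an origin only if scheme,
// host and port all match (URL.origin combines exactly those three).
function isCrossOrigin(requestUrl, baseHref, pageUrl) {
  const target = new URL(requestUrl, baseHref); // resolved like $.get('message.txt')
  return target.origin !== new URL(pageUrl).origin;
}

// Assumed values for the translated test2 page:
const test2Base = 'http://skipperkongen.dk/tmp/test2.html';
const translatedPageUrl = 'http://translate.google.com/translate_p';

// true: the response is withheld unless skipperkongen.dk opts in via CORS
console.log(isCrossOrigin('message.txt', test2Base, translatedPageUrl));
// false: loaded directly, the page and message.txt share an origin
console.log(isCrossOrigin('message.txt', test2Base, test2Base));
```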