How translate.google.com works

Actually, this is not really about how translate.google.com works internally. It’s about loading HTML from an arbitrary URL, adding some extra JavaScript and CSS, and redisplaying the page on a different domain.

A simple test page

I’ve made a simple test web page:

<html>
  <head></head>
  <body>
    <h1>Hello world</h1>
    <img src="world.jpg" />
  </body>
</html>

Let’s translate it with Google Translate.

The translation page is a frameset with two frames (details omitted):

<html>
<head>
<title>Google Translate</title>
</head>
<frameset>
	<frame src="/translate_n?...">
	<frame src="/translate_p?...">
</frameset>
</html>

Let’s look at the second frame, the one whose src begins with /translate_p?... This is the actual translated page.

Compare the source code of the original page with that of the translated page. It has been bulked up considerably.

<html>
<head>
<script>
	(function() {
		function ti_a(b) {
			this.t = {};
			this.tick = function(c, d, a) {
				a = a ? a : (new Date).getTime();
				this.t[c] = [ a, d ]
			};
			this.tick("start", null, b)
		}
		var ti_b = new ti_a;
		window.jstiming = {
			Timer : ti_a,
			load : ti_b
		};
		try {
			var ti_ = null;
			if (window.chrome && window.chrome.csi)
				ti_ = Math.floor(window.chrome.csi().pageT);
			if (ti_ == null)
				if (window.gtbExternal)
					ti_ = window.gtbExternal.pageT();
			if (ti_ == null)
				if (window.external)
					ti_ = window.external.pageT;
			if (ti_)
				window.jstiming.pt = ti_
		} catch (ti_c) {
		}
		;
	})()
</script>
<script
	src="http://translate.googleusercontent.com/translate/static/biEfM_qFbxU/js/translate_c.js"></script>
<script>
	_infowindowVersion = 1;
	_intlStrings._originalText = "Original English text:";
	_intlStrings._interfaceDirection = "ltr";
	_intlStrings._interfaceAlign = "left";
	_intlStrings._langpair = "en|da";
	_intlStrings._feedbackUrl = "http://translate.google.com/translate_suggestion";
	_intlStrings._currentBy = "Current translation on %1$s by %2$s";
	_intlStrings._unknown = "unknown";
	_intlStrings._suggestTranslation = "Contribute a better translation";
	_intlStrings._submit = "Contribute";
	_intlStrings._suggestThanks = "Thank you for contributing your translation suggestion to Google Translate.";
	_intlStrings._reverse = false;
</script>
<style type="text/css">
.google-src-text {
	display: none !important
}
 
.google-src-active-text {
	display: block !important;
	color: black !important;
	font-size: 12px !important;
	font-family: arial, sans-serif !important
}
 
.google-src-active-text a {
	font-size: 12px !important
}
 
.google-src-active-text a:link {
	color: #00c !important;
	text-decoration: underline !important
}
 
.google-src-active-text a:visited {
	color: purple !important;
	text-decoration: underline !important
}
 
.google-src-active-text a:active {
	color: red !important;
	text-decoration: underline !important
}
</style>
<meta http-equiv="X-Translated-By" content="Google">
<base href=http://skipperkongen.dk/tmp/test.html />
</head>
<body>
<iframe
	src="http://translate.google.com/translate_un?hl=en&ie=UTF-8&sl=en&tl=da&u=http://skipperkongen.dk/tmp/test.html&prev=_t&rurl=translate.google.com&twu=1&lang=en&usg=ALkJrhhpCjCAYEWwbQX9TROT-522jGdGEw"
	width=0 height=0 frameborder=0
	style="width: 0px; height: 0px; border: 0px;"></iframe>
<h1><span onmouseover=
	_tipon(this);
onmouseout=
	_tipoff();
>
<span class="google-src-text" style="direction: ltr; text-align: left">Hello
world</span> Hej verden </span></h1>
<img src=world.jpg />
</body>
<script>
	_addload(function() {
		_setupIW();
		_csi('en', 'da', 'http://skipperkongen.dk/tmp/test.html');
	});
</script>
</html>

Leaving only the really important stuff:

<head>
	<script src="http://translate.googleusercontent.com/translate/static/biEfM_qFbxU/js/translate_c.js"></script>
	<base href=http://skipperkongen.dk/tmp/test.html />
</head>
<body>
	<h1>Hej verden</h1>
	<img src=world.jpg />
</body>

Two notable things have changed from the original. The head section has extra script tags and a base tag. In the body section, the phrase Hello world has been translated into Danish.

Summary of modifications to original page

In summary, Google Translate has done the following to the original page:

  1. Added script tags
  2. Added a style tag
  3. Added a base tag
  4. Added an iframe tag
  5. Replaced content text with translated version
  6. Marked up content with some span tags, for a fancy tooltip
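The head modifications in the list above can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not Google’s actual code: the function name injectIntoHead and the script URL are made up for the example, and a regular expression is a crude stand-in for a real HTML parser.

```javascript
// Hypothetical helper: inserts a <script> tag and a <base> tag right after
// the opening <head> tag. The name and the script URL are illustrative only.
function injectIntoHead(html, baseHref, scriptSrc) {
  const extra =
    '\n<script src="' + scriptSrc + '"></script>' +
    '\n<base href="' + baseHref + '" />';
  // Match the first opening <head> tag (but not <header>), case-insensitively.
  // A replacer function is used so "$" in `extra` is inserted literally.
  return html.replace(/<head(\s[^>]*)?>/i, (match) => match + extra);
}

const original =
  '<html><head></head><body><img src="world.jpg" /></body></html>';

const rewritten = injectIntoHead(
  original,
  'http://skipperkongen.dk/tmp/test.html',
  'http://example.com/extra.js' // placeholder, not a real Google URL
);

console.log(rewritten);
```

The sketch does no escaping of attribute values and leaves the body untouched; replacing the text and adding the span markup would be a separate step.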

What do the script tags do?

This is the most complex part. I’m not done analysing this yet.

What does the style tag do?

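This simply provides styling for the fancy tooltip added with the span tags. The .google-src-text rule hides the original text inside the page, and the .google-src-active-text rules presumably style that text when it is shown in the tooltip.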

What does the iframe tag do?

In short I don’t know yet.

What does the base tag do?

The base tag is there to make sure that relative paths, like the image path, still work even though the HTML is loaded from a different domain than the original skipperkongen.dk domain.

This:

	<base href=http://skipperkongen.dk/tmp/test.html />

Makes this work:

	<img src="world.jpg" />
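To see what the base tag buys us, here is a small sketch using the standard URL class (available in browsers and Node.js), which resolves relative paths the same way. The second base URL is an assumption: it stands in for wherever the translated page is actually served from, with the query string left out.

```javascript
// With the <base> tag: the relative path resolves against the original page.
const base = 'http://skipperkongen.dk/tmp/test.html';
const resolved = new URL('world.jpg', base).href;
console.log(resolved); // http://skipperkongen.dk/tmp/world.jpg

// Without it, the path would resolve against the URL of the translated page.
// This address is an assumption made for the sake of the example.
const assumedProxyUrl = 'http://translate.google.com/translate_p';
const unresolved = new URL('world.jpg', assumedProxyUrl).href;
console.log(unresolved); // http://translate.google.com/world.jpg
```

The second URL points at an image that does not exist on the Google domain, which is why the base tag is needed.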

Would this work with AJAX?

Many pages use AJAX to load content. I’m expecting Google Translate to not work in this case, because of the browser’s same-origin policy: the translated page is served from a Google domain, while the AJAX request goes to the original domain. In theory it could be done by creating a dynamic service proxy on the Google domain, not taking authentication issues into account.

Let’s try with a page that replaces the header text with AJAX.

<html>
	<head>
	<script src="http://code.jquery.com/jquery-1.5.min.js" type="text/javascript"></script>
	<script type="text/javascript">
		$(function() {
			// Once the DOM is ready, fetch message.txt and use it as the header text.
			$.get('message.txt', function(data) {
				$('h1').text(data);
			});
		});
	</script>
	</head>
	<body>
		<h1>...</h1>
	</body>
</html>

When loaded directly from http://skipperkongen.dk/tmp/test2.html, the page looks like this:

Hello World

When loaded via Google Translate, the page looks like this:

...

So the conclusion is that Google Translate does not do anything about data fetched via AJAX. In this test the fetched text never even appears: the header keeps its placeholder.
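A likely explanation for this result is the browser’s same-origin policy. The sketch below compares origins with the standard URL class. It rests on two assumptions that I have not verified: that the translated page is served from translate.google.com (the frame src in the frameset is relative), and that the base tag makes message.txt resolve to the skipperkongen.dk domain. The helper name sameOrigin is made up for the example.

```javascript
// Hypothetical helper: two URLs share an origin when scheme, host and port match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Where the AJAX request ends up, assuming the <base> tag points at test2.html.
const ajaxTarget = new URL(
  'message.txt',
  'http://skipperkongen.dk/tmp/test2.html'
).href;

// Page loaded directly from the original domain.
const directPage = 'http://skipperkongen.dk/tmp/test2.html';
// Assumed address of the translated page (query string left out).
const translatedPage = 'http://translate.google.com/translate_p';

console.log(ajaxTarget);                             // http://skipperkongen.dk/tmp/message.txt
console.log(sameOrigin(directPage, ajaxTarget));     // true
console.log(sameOrigin(translatedPage, ajaxTarget)); // false
```

When the origins differ, the browser blocks the script from reading the response unless the server explicitly allows it, so the callback that updates the header never gets the data.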
