Designing a JavaScript Interpreter to Emulate Browser Function Scoping

04.29.2009 - 8:00 AM

This post looks at how one might develop a JavaScript interpreter to emulate a browser environment. In particular, it focuses on how JavaScript scopes variables and functions, and how that affects the interpreter's design. It also shows how a Web site can be designed to exploit the behavior of a browser's JavaScript interpreter to make emulation difficult.

In JavaScript, a variable must be assigned a value before that value can be used.

    var foo = 1;
    document.write(foo); // prints "1"

    document.write(foo); // prints "undefined": 'foo' exists but has no value yet
    var foo = 1;

On the other hand, functions can be called before their declaration in the document.

    document.write(calculate()); // prints "3"
    function calculate () { return 1 + 2; }

Here's why: while both variables and functions are technically created when their execution scope is entered (e.g., at the beginning of the file for globally scoped variables and functions), variables are not assigned their initial value until the variable definition is executed. If the variable is used before that point (as 'foo' is in the second example above), its value is undefined. (Reading a name that is never declared anywhere, by contrast, raises a "not defined" error.) Because functions, on the other hand, are created and assigned when the execution scope is entered, a function declaration can appear after its use. (Note that this applies only to function declarations, not functions created with function expressions or by the Function constructor.)
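The difference between the three cases can be checked directly. The following is a minimal sketch, not part of the original examples: it collects results in a hypothetical `output` array instead of calling document.write so that it runs outside a browser, and the names `declared`, `early`, `expressed` and `neverDeclared` are made up for illustration.

```javascript
// Collect results in an array so the sketch runs outside a browser.
var output = [];

// 1. A function declaration is usable before its position in the source.
output.push(declared());
function declared() { return 1 + 2; }

// 2. A var is created on scope entry but not assigned until its line runs.
output.push(typeof early);
var early = 1;
output.push(early);

// 3. A function expression is just a var until the assignment executes,
//    so calling it early fails with a TypeError (undefined is not callable).
try {
  expressed();
} catch (e) {
  output.push(e instanceof TypeError);
}
var expressed = function () { return 1 + 2; };

// 4. A name that is never declared throws a ReferenceError when read.
try {
  neverDeclared;
} catch (e) {
  output.push(e instanceof ReferenceError);
}

// output is now [3, "undefined", 1, true, true]
```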

To implement a JavaScript interpreter that emulates a browser-like environment for analyzing JavaScript code, we have to keep these scoping and symbol resolution rules in mind. To handle calls to functions that are declared later, we might design a two-pass interpreter. The first pass would go through the source file and collect all of the function declarations in the global scope. The second pass would then go through the code and resolve the function calls, mapping the calls to defined functions (including those declared later in the file, which were discovered in the first pass) and reporting errors for calls to undefined functions.
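As a rough illustration of the two-pass idea, here is a minimal sketch over a toy program. The statement format (`type`, `name`, `body`) and the helper names `collectDeclarations` and `resolveCalls` are assumptions made for this example; a real interpreter would work on a parsed syntax tree.

```javascript
// Hypothetical toy program: a flat list of statements, not real parser output.
var program = [
  { type: 'call', name: 'calculate' },   // used before its declaration
  { type: 'function', name: 'calculate', body: function () { return 1 + 2; } },
  { type: 'call', name: 'missing' }      // never declared anywhere
];

// Pass 1: collect every function declaration in the global scope.
function collectDeclarations(statements) {
  var scope = {};
  statements.forEach(function (stmt) {
    if (stmt.type === 'function') {
      scope[stmt.name] = stmt.body;
    }
  });
  return scope;
}

// Pass 2: walk the calls in order and match each one against the scope.
function resolveCalls(statements, scope) {
  var results = [];
  statements.forEach(function (stmt) {
    if (stmt.type !== 'call') {
      return;
    }
    if (Object.prototype.hasOwnProperty.call(scope, stmt.name)) {
      results.push(String(scope[stmt.name]()));
    } else {
      results.push("error: '" + stmt.name + "' is not defined");
    }
  });
  return results;
}

var twoPassResults = resolveCalls(program, collectDeclarations(program));
// twoPassResults is ["3", "error: 'missing' is not defined"]
```

The early call to calculate succeeds because the first pass has already seen the declaration, while the call to the undeclared name is reported as an error.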

There is another complication, however. Functions can also be defined by external JavaScript files linked in an HTML <script> tag. For example, in the code below, the function calculate is called in example.html but defined in calculate.js. The external JavaScript file calculate.js must be dynamically requested and interpreted by the JavaScript emulator in order to simulate the browser environment. Already it starts to feel like building our own browser.

calculate.js:

    function calculate () { return 1 + 2; }

example.html:

    <script src="calculate.js"></script>
    <script type="text/javascript">
    document.write(calculate()); // prints "3"
    </script>

This complication means that we can't report any undefined function errors for function references until run time, when the script code is actually interpreted. The next logical step would be to have the first pass go through the file, both collecting function declarations and requesting and passing over externally referenced JavaScript files. Then, the second pass can do the same thing as before: match references to definitions that are available, and report errors for those that aren't.

If we want to mirror the behavior of real browsers, though, we must note another distinction. A function can be used before its declaration if both are in the same script. If a function definition is loaded from another document instead, it must be linked in before it is called. In the example below, the call to calculate in correct.html will work, but the call in incorrect.html will produce an error.

calculate.js:

    function calculate () { return 1 + 2; }

correct.html:

    <script src="calculate.js"></script>
    <script type="text/javascript">
    document.write(calculate()); // prints "3"
    </script>

incorrect.html:

    <script type="text/javascript">
    document.write(calculate()); // 'calculate' undefined error
    </script>
    <script src="calculate.js"></script>

This behavior seems confusing at first. Why is there an inconsistency in the scoping rules?

The JavaScript emulator could be made to mimic this behavior by keeping track of which functions were declared in the page itself and which were loaded from external files. This would reflect the behavior of real browsers in the examples so far, and would allow us to do function name resolution statically. However, consider the following example.

calculate.js:

    function calculate () { return 1 + 2; }

tricky.html:

    <script type="text/javascript">
    function calculateWrapper() { return calculate(); }
    </script>
    <script src="calculate.js"></script>
    <script type="text/javascript">
    document.write(calculateWrapper()); // prints "3"
    </script>

Here, the code is similar to incorrect.html (above). The difference is that although the call to the externally defined function appears before the file is loaded, that call is not executed until the wrapper is invoked after the load. This is perfectly valid JavaScript, but an emulator that resolves names statically by tracking where each function came from would report an undefined function error. There is no way around it: function name resolution has to happen at run time.
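To make the run-time lookup concrete, the sketch below models tricky.html as three scripts that run in document order against one shared scope object. The modeling is an assumption for illustration: `globalScope`, `written`, `callByName` and the representation of each script as a function are invented here and are not how any particular browser is implemented.

```javascript
// Hypothetical model: each script is a function that receives the shared
// global scope object, and scripts run strictly in document order.
var globalScope = {};
var written = [];

// Run-time lookup: fails only if the name is missing when the call executes.
function callByName(scope, name) {
  if (typeof scope[name] !== 'function') {
    throw new ReferenceError(name + ' is not defined');
  }
  return scope[name]();
}

var scripts = [
  // Inline script 1: declares the wrapper; its body is not executed yet.
  function (scope) {
    scope.calculateWrapper = function () {
      return callByName(scope, 'calculate');
    };
  },
  // External script calculate.js: defines calculate when it is loaded.
  function (scope) {
    scope.calculate = function () { return 1 + 2; };
  },
  // Inline script 2: the wrapper finally runs, after calculate exists.
  function (scope) {
    written.push(callByName(scope, 'calculateWrapper'));
  }
];

scripts.forEach(function (run) { run(globalScope); });
// written is [3]
```

Because the lookup of calculate is deferred until the wrapper body actually executes, the order in which the names appear in the source does not matter, only the order in which the code runs.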

Why do browsers behave this way? JavaScript files can be loaded with an HTML <script> tag, but they can also be loaded dynamically by the JavaScript code itself. By manipulating the Document Object Model, we can append a new <script> tag to force the loading of an external file.

    var newScript = document.createElement('script');
    newScript.setAttribute('type', 'text/javascript');
    newScript.setAttribute('src', 'calculate.js');
    document.getElementsByTagName('head')[0].appendChild(newScript);

Using this technique, we could even write obfuscated JavaScript code to make it difficult to tell statically what files are being loaded. The only effective way to detect the dynamic loading of other JavaScript files is to interpret the code -- which means we must leave the loading of external files until run time.
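As one illustration of why static inspection falls short, the sketch below builds the file name from character codes so that the string 'calculate.js' never appears in the source. The helper names `hiddenSource`, `loadHidden` and `makeFakeDocument` are hypothetical, and the fake document is only a minimal stand-in that records what would be requested; it is the kind of hook an emulator might place on its DOM, not a real browser API.

```javascript
// Hypothetical obfuscation: the file name never appears as a literal.
function hiddenSource() {
  var codes = [99, 97, 108, 99, 117, 108, 97, 116, 101, 46, 106, 115];
  return String.fromCharCode.apply(null, codes);
}

// Same DOM calls as the plain example, but the src is computed at run time.
function loadHidden(doc) {
  var el = doc.createElement('script');
  el.setAttribute('type', 'text/javascript');
  el.setAttribute('src', hiddenSource());
  doc.getElementsByTagName('head')[0].appendChild(el);
  return el;
}

// Minimal stand-in for a browser document that records requested sources.
function makeFakeDocument() {
  var requested = [];
  var head = {
    appendChild: function (el) {
      requested.push(el.attrs.src);
      return el;
    }
  };
  return {
    requested: requested,
    createElement: function (tag) {
      var el = { tag: tag, attrs: {} };
      el.setAttribute = function (key, value) { el.attrs[key] = value; };
      return el;
    },
    getElementsByTagName: function (tag) {
      return tag === 'head' ? [head] : [];
    }
  };
}

var fakeDocument = makeFakeDocument();
loadHidden(fakeDocument);
// fakeDocument.requested is ["calculate.js"]
```

Only by executing the code does the requested file name become visible, which is the point the paragraph above makes.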

A two-pass approach is still necessary, because as we saw in the second example above, we could encounter a call to a function that is declared later, and the call must succeed. However, because externally declared functions cannot be processed during this static first pass, undefined function errors cannot be reported until the runtime second pass. This gives us no way (with only two passes) to allow functions to be called before they are dynamically loaded at run time, just like browsers. Perhaps with a third pass we could know what would be loaded ahead of time, but then what about external code that redefines the same function multiple times, or whose function definition depends on the time of day? Ultimately we see why browsers treat externally defined functions differently than functions defined within the page.
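The time-of-day case can be sketched in a few lines. This is a hypothetical example: `loadCalculate` stands in for the body of an external script, and the clock is passed in as a parameter only so that the behavior can be shown deterministically.

```javascript
// Hypothetical external script body: which calculate gets defined depends
// on the hour at which the script happens to run.
function loadCalculate(scope, now) {
  scope.calculate = now.getHours() < 12
    ? function () { return 1 + 2; }
    : function () { return 2 + 2; };
}

var morningScope = {};
var eveningScope = {};
loadCalculate(morningScope, new Date(2009, 3, 29, 8, 0));   // 8:00 local time
loadCalculate(eveningScope, new Date(2009, 3, 29, 20, 0));  // 20:00 local time
// morningScope.calculate() returns 3; eveningScope.calculate() returns 4
```

No additional static pass can decide which definition applies, because the answer depends on state that exists only when the code runs.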

In conclusion, to effectively interpret JavaScript code in a way that emulates a browser's environment, we need a multi-pass approach. First, we need to collect all of the functions in the global scope, by scanning through all of the JavaScript code on the page. Next, we need to dynamically interpret the JavaScript, keeping track of the current state of the Document Object Model at every step. We need to be able to dynamically request and link in new JavaScript source files, and we need to include these in the scope object appropriately when we do our function name resolution at run time. Without these, a Web page can be designed to exploit a browser in a way that an emulator won't detect.
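Putting the pieces together, an emulator loop might look roughly like the sketch below. Everything in it is an assumption made for illustration: scripts are modeled as objects with hypothetical `declarations` and `body` fields, `files` is an in-memory stand-in for the network, and `loadScript` stands in for appending a <script> element to the DOM. The sketch hoists declarations as each script is entered, and it assumes a dynamically loaded file runs before the next queued script, which a real browser does not guarantee.

```javascript
function emulatePage(scripts, files) {
  var env = { scope: {}, output: [], errors: [] };
  var queue = scripts.slice();

  // Stand-in for a DOM insertion of a <script> element at run time.
  env.loadScript = function (src) {
    if (Object.prototype.hasOwnProperty.call(files, src)) {
      queue.unshift(files[src]);
    } else {
      env.errors.push('could not load ' + src);
    }
  };

  // Resolve function names only when the call actually executes.
  env.call = function (name) {
    if (typeof env.scope[name] !== 'function') {
      throw new ReferenceError(name + ' is not defined');
    }
    return env.scope[name]();
  };

  while (queue.length > 0) {
    var script = queue.shift();
    // Pass 1 for this script: make its function declarations available.
    Object.keys(script.declarations).forEach(function (name) {
      env.scope[name] = script.declarations[name];
    });
    // Pass 2: interpret the body; undefined names surface only here.
    try {
      script.body(env);
    } catch (e) {
      env.errors.push(e.message);
    }
  }
  return env;
}

// Hypothetical in-memory stand-in for the network.
var files = {
  'calculate.js': {
    declarations: { calculate: function () { return 1 + 2; } },
    body: function () {}
  }
};

var page = emulatePage([
  // Calls calculate too early: recorded as an error.
  { declarations: {}, body: function (env) { env.call('calculate'); } },
  // Loads the external file dynamically.
  { declarations: {}, body: function (env) { env.loadScript('calculate.js'); } },
  // Uses its own later-declared function and the loaded one.
  { declarations: { show: function () { return 'ready'; } },
    body: function (env) { env.output.push(env.call('show'), env.call('calculate')); } }
], files);
// page.errors is ["calculate is not defined"]; page.output is ["ready", 3]
```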

Security Researcher: Erik Buchanan
