GWTMap - Reverse Engineering Google Web Toolkit Applications

By Oliver Simonnet on 21 October, 2020

TLDR

GWTMap is a new tool to help map the attack surface of Google Web Toolkit (GWT) based applications. The purpose of this tool is to facilitate the extraction of any service method endpoints buried within a modern GWT application’s obfuscated client-side code, and attempt to generate example GWT-RPC request payloads to interact with them.

You can get it here: https://github.com/FSecureLABS/GWTMap

WTF is GWT and why is it interesting

Google Web Toolkit (GWT) is a development framework that allows Java developers to create single page applications (SPA) without the need to write any JavaScript or (for the most part) HTML. This is achieved by allowing Java developers to do what they do best, write Java, whilst the framework does the heavy lifting and compiles that Java into the client-side JavaScript - which then becomes the application’s front-end.

GWT has fascinated me ever since I first came across it on an assessment and had no clue what I was looking at. Since that time I’ve come face to face with it again periodically, but always struggled for tooling. But what makes it so different, and why would I want bespoke tooling?

To understand why, we first need to understand two of its most prominent attributes which separate it from many other frameworks:

The client-side code

As mentioned, the front-end HTML / JavaScript of a GWT application is written in Java using the framework’s various UI APIs - which feels similar to Java Swing. The difference is, once you’re done, instead of it being compiled to bytecode, the front-end Java is compiled into multiple obfuscated JavaScript code permutations, each permutation representing an optimised implementation for a particular web browser. These permutations are loaded by a bootstrap file which identifies the browser in use when the application is accessed.

During compilation, obfuscation of the client-side code permutation happens by default, which makes it a challenge to investigate or leverage from an assessment point of view. However, it is interesting nonetheless due to the way in which GWT’s communication protocol works. I will dive into that in a minute, but what you should keep in mind is that both the client and server (in GWT applications) are always aware of exactly what objects, methods, and services exist within the application. This means, if you can understand the client-side code, you can most likely enumerate and map out all of the application’s functionality, be it publicly accessible endpoints, or more sensitive administrative endpoints behind authenticated sections of the application.

The communication protocol

Due to the front-end being written in Java, its development naturally makes use of native Java objects. However, JavaScript is a completely different language in which these Java objects don’t natively exist. As such, the method used to pass data between the client and server needs a way to retain and transfer this information.

To solve this, GWT makes use of a custom communication protocol called GWT-RPC which serializes Java objects for transmission between the client and the server. However, unlike many other serialization technologies, the client and server in GWT-RPC communication are both fully aware of all objects that should be transmitted during any given request / response. This is achieved using a service policy file, which stores a listing of all serializable objects and their IDs for a given service.

To bring some actual context into the mix here and highlight how different GWT-RPC communication is from your typical SOAP or REST APIs, here is an example request made during the login process of a demo application:

POST /olympian/authenticationService HTTP/1.1
Host: 127.0.0.1:8888
Content-Type: text/x-gwt-rpc; charset=utf-8
X-GWT-Permutation: 4D2DF0C23D1B58D2853828A450290F3F
X-GWT-Module-Base: http://127.0.0.1:8888/olympian/
Content-Length: 251

7|0|7|http://127.0.0.1:8888/olympian/|D48D4639E329B12508FBCA3BD0FC3780|
com.ecorp.olympian.client.asyncService.AuthenticationService|login|java
.lang.String/2004016611|bob|password123|1|2|3|4|2|5|5|6|7|

A breakdown of the request is as follows:

  • POST /olympian/authenticationService -> The Service Path
  • X-GWT-Permutation: 4D2DF0C23D1B58D2853828A450290F3F -> The “strong name” for the client-side code permutation
  • 7 -> Version 7 of the serialization protocol
  • 0 -> RPC Flag values (0/1/2)
  • 7 -> The length of the “string table” that immediately follows
  • [STRING TABLE] -> Seven pipe-delimited strings that will be referenced to build the RPC call
  • 1 -> http://127.0.0.1:8888/olympian/ -> The base URL for the GWT module
  • 2 -> D48D4639E329B12508FBCA3BD0FC3780 -> The “strong name” of the service’s policy file, which outlines what objects and types can be serialised
  • 3 -> com.ecorp.olympian.client.asyncService.AuthenticationService -> The remote service interface
  • 4 -> login -> The method being called
  • 2 -> The number of parameters the method expects
  • 5 -> java.lang.String/2004016611 -> The declared type of the first parameter
  • 5 -> java.lang.String/2004016611 -> The declared type of the second parameter
  • 6 -> bob -> The index in the string table of the value of the first parameter
  • 7 -> password123 -> The index in the string table of the value of the second parameter
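
As a rough illustration of the breakdown above, the request body can be split back into its parts with a few lines of Python. This is a minimal sketch and not code taken from GWTMap: the function names `parse_gwt_rpc` and `describe_call` are hypothetical, and it assumes a String-only call like the login example, with no escaped pipe characters inside any value.

```python
def parse_gwt_rpc(body):
    """Split a GWT-RPC request body into version, flags, string table and payload."""
    fields = body.rstrip("|").split("|")
    version, flags, table_len = (int(f) for f in fields[:3])
    string_table = fields[3:3 + table_len]
    payload = [int(f) for f in fields[3 + table_len:]]
    return version, flags, string_table, payload


def describe_call(string_table, payload):
    """Resolve the 1-based payload indexes against the string table.

    Assumption: every parameter is a String, so each value is a table index.
    """
    def lookup(i):
        return string_table[i - 1]

    base_url, policy, service, method = (lookup(i) for i in payload[:4])
    count = payload[4]
    types = [lookup(i) for i in payload[5:5 + count]]
    values = [lookup(i) for i in payload[5 + count:5 + 2 * count]]
    return {"base_url": base_url, "policy": policy, "service": service,
            "method": method, "types": types, "values": values}


# The login request body shown earlier in this post.
BODY = ("7|0|7|http://127.0.0.1:8888/olympian/|D48D4639E329B12508FBCA3BD0FC3780|"
        "com.ecorp.olympian.client.asyncService.AuthenticationService|login|"
        "java.lang.String/2004016611|bob|password123|1|2|3|4|2|5|5|6|7|")

version, flags, table, payload = parse_gwt_rpc(BODY)
print(describe_call(table, payload))
```

Calls that carry primitive values or complex objects hold direct values and runtime types in the payload, so `describe_call` would need extending before it could be applied to them.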

The above is just a quick summary of a very simple request containing no complex Java types or custom objects, and no need to distinguish between “declared” and “runtime” types. I will expand on this slightly further on, but for an in-depth and detailed look into GWT-RPC, I would redirect you to Brian Slesinsky’s excellent paper “The GWT-RPC wire protocol” [1].

Existing tools and their limitations

Now GWT may not be massively popular, but it's been around for a long time. So surely people have done research and created tooling to aid testing? Yes, yes they have!

GWT Penetration Testing Toolset – GDS (2010)

The good people at Gotham Digital Science produced the “GWT Penetration Testing Toolset” back in 2010, consisting of three tools [2][3]:

  • GWTEnum – Enumerated methods within GWT’s obfuscated client-side code;
  • GWTParse – Identified variable parameters within GWT-RPC request payloads;
  • GWTFuzzer – A Proof of Concept tool to automatically fuzz RPC requests with Burp.

GWTEnum was the type of tool I was after. However, it was written back in 2010 and the way GWT obfuscates its code has changed. As an example, GWTEnum searched for “*.cache.html” permutation files, which were only used by GWT versions <= 2.6 (released in 2014). Instead, newer versions generate “*.cache.js” files which have a different structure.

Another limitation of GWTEnum was that it did not parse “fragmented” code permutations, a fragmented permutation being one split across multiple files to aid performance. As such, it could only identify a limited portion of an application’s total attack surface based on the initial permutation loaded by the application's bootstrap file.

Gwt.py - @steventseeley (2017)

In 2017 Steven Seeley (mr_me) created another tool to enumerate GWT-RPC methods and also attempted to generate serialised GWT-RPC request payloads for each method [4][5]. However, this tool was also limited to older versions of GWT and only worked against non-obfuscated code permutations. Furthermore, although it generated GWT-RPC request payloads, it didn’t correlate the payload with its respective service path or policy file, which is necessary (at least in versions 2.7+) to perform a successful request to any given service method.

Limitations of the tool aside, the research on Steven’s website regarding GWT is an exceptionally good resource and I highly recommend giving it a read to understand more about the general GWT-RPC request structure.

GWTab – TheHackerish (2020)

In 2020 TheHackerish created a new Burp Suite extension called GWTab [6][7] which ported the functionality of GDS’s GWTParse directly into Burp, allowing it to automatically highlight input values within GWT-RPC payloads. This tool still works great, but it can only be used to parse GWT-RPC requests you’ve already logged in Burp, and can’t be used to enumerate additional / unknown methods.

There are some other tools which have also been created over the years, but overall what I concluded from this was:

  • There has been some good research into GWT;
  • A few tools have been created, but all serve very specific and limited purposes;
  • Most of these tools sadly no longer work, or don’t consistently work with modern versions of GWT (2.7+).

Solving these problems

So what I was after was a tool that could:

  • Enumerate all exposed GWT methods and services;
  • Parse both obfuscated and non-obfuscated code permutations;
  • Detect if a permutation was fragmented, and find any missing fragments;
  • Correlate methods with their service paths and policy files;
  • Generate GWT-RPC POST request examples;
  • Work with all[?] modern versions of GWT (2.7+);

Deciding to take the challenge head on, the first thing on my agenda was to understand the obfuscated permutation files, and identify what patterns could be used to enumerate data.

Carving out method signatures from obfuscated code

To get started I wrote a test GWT application, compiled it non-obfuscated, and took a look at the implementation of the “login” method call within the generated JavaScript. I started with non-obfuscated code as I could then use this as a baseline to compare against when later reviewing the obfuscated version.

Before showing the generated JavaScript, take note that the “login” method is part of the “AuthenticationService” and its Java interface definition is as follows: 

@RemoteServiceRelativePath("authenticationService")
public interface AuthenticationService extends RemoteService{
  User login(String username, String password);
  ...
}

With that in mind, we can see the generated implementation of the login method call within the client-side JavaScript below: 

function $login_0(this$static, username, password, callback){
  var helper, streamWriter;
  helper = new RemoteServiceProxy$ServiceHelper(this$static, 'AuthenticationService_Proxy', 'login');
  try {
    streamWriter = $start(helper, 'com.ecorp.olympian.client.asyncService.AuthenticationService', 2);
    $append(streamWriter, '' + $addString(streamWriter, 'java.lang.String/2004016611'));
    $append(streamWriter, '' + $addString(streamWriter, 'java.lang.String/2004016611'));
    $append(streamWriter, '' + $addString(streamWriter, username));
    $append(streamWriter, '' + $addString(streamWriter, password));
    $finish_0(helper, callback, ($clinit_RequestCallbackAdapter$ResponseReader() , OBJECT));
  }
   catch ($e0) {
    $e0 = toJava($e0);
    if (!instanceOf($e0, 12))
      throw toJs($e0);
  }
}

So what did this tell me?

  • I could see that all service endpoint methods get their own function;
  • That the second line of that function declared the service proxy (“AuthenticationService_Proxy”) along with the method name (“login”);
  • Following this, there was a try-catch statement, of which the first line declared the remote service interface value (“com.ecorp.olympian.client.asyncService.AuthenticationService”) and the number of parameters the method expects (2);
  • Immediately following that, there was an even number of function calls (excluding "$finish_0"), of which the first half declare the method parameter types (“java.lang.String”), and the second half declare the parameter values (username, password).

Further review found that this was relatively consistent for all methods. So with this as a starting point, I re-compiled the application in its default mode (obfuscated) and began reversing the obfuscated client-side code - searching for some of these known key values. After a bit of trial and error I found the obfuscated version of the above “$login_0” function had become a function called “Cd” as can be seen below:

Unfortunately, it’s not all that clear that the above is the same implementation of the login function, and my understanding of the line offsets and their purposes (within the non-obfuscated code) has been lost. To try and make this a bit clearer I created a “clean_code()” function in Python that would take the obfuscated code and programmatically re-insert line-breaks and indentation:
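
The original implementation is not reproduced here, so the following is only a minimal sketch of what such a function might look like. It breaks lines after "{", "}" and ";" while tracking brace depth, and it leaves string literals untouched. The `MINIFIED` input is an assumption: it was produced by stripping the whitespace from the cleaned "Cd" function shown further down, and the sketch does not try to handle constructs such as "for(;;)" headers or regex literals.

```python
def clean_code(js, indent="    "):
    """Re-insert line breaks and indentation into minified JavaScript (naive)."""
    lines, current, depth = [], "", 0
    quote, escaped = None, False

    def flush():
        # Emit the buffered statement at the current nesting depth.
        nonlocal current
        if current.strip():
            lines.append(indent * depth + current.strip())
        current = ""

    for ch in js:
        if quote:                       # inside a string literal: copy verbatim
            current += ch
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current += ch
        elif ch == "{":
            current += ch
            flush()
            depth += 1
        elif ch == "}":
            flush()
            depth = max(depth - 1, 0)
            current = ch
            flush()
        elif ch == ";":
            current += ch
            flush()
        else:
            current += ch
    flush()
    return "\n".join(lines)


# Illustrative input only (whitespace-stripped copy of the cleaned function).
MINIFIED = ("function Cd(b,c,d,e){var f,g;f=new yu(b,iH,'login');try{g=xu(f,jH,2);"
            "nu(g,''+cu(g,cH));nu(g,''+cu(g,cH));nu(g,''+cu(g,c));nu(g,''+cu(g,d));"
            "wu(f,e,(Nu(),Ju))}catch(a){a=Xq(a);if(!bl(a,12))throw Yq(a)}}")

print(clean_code(MINIFIED))
```

Splitting on a fixed set of characters is crude, but it is enough to put each statement on its own line, which is all that offset-based matching needs.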

With what was now the beginning of my new tool, I ran it against the obfuscated permutation and found that to my surprise, there was an almost perfect 1-to-1 structural match of the original function:

function Cd(b,c,d,e){
        var f,g;
        f=new yu(b,iH,'login');
        try{
                g=xu(f,jH,2);
                nu(g,''+cu(g,cH));
                nu(g,''+cu(g,cH));
                nu(g,''+cu(g,c));
                nu(g,''+cu(g,d));
                wu(f,e,(Nu(),Ju))
        }
        catch(a){
                a=Xq(a);
                if(!bl(a,12))throw Yq(a)
        }

}

The complication here was that some of the original string values were now minified variables and nested function calls. However, I now knew that if I could identify the “method signature” (e.g. “f=new yu(b,iH,'login');”) and add in line breaks, I could infer at which offsets the other key values were located. I could then also programmatically identify the values of nested variables and functions using regular expressions.
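
To make the idea of offsets concrete, here is a small sketch that reads the key values relative to the signature line of the cleaned "Cd" function above. The regular expressions and the `read_method` helper are hypothetical, and the fixed offsets (+2 for the interface line, +3 onwards for the parameter types) are an assumption that only holds for this particular layout.

```python
import re

TOKEN = r"('[^']*'|[\w$]+)"   # a quoted string or a minified identifier
SIGNATURE = re.compile(r"=\s*new\s+[\w$]+\(\s*[\w$]+\s*,\s*" + TOKEN + r"\s*,\s*" + TOKEN + r"\s*\)")
START = re.compile(r"\(\s*[\w$]+\s*,\s*" + TOKEN + r"\s*,\s*(\d+)\s*\)")
LAST_ARG = re.compile(r",\s*" + TOKEN + r"\s*\)+\s*;?\s*$")


def read_method(lines, sig_index):
    """Read proxy, method, interface and declared types by line offset."""
    proxy, method = SIGNATURE.search(lines[sig_index]).groups()
    interface, count = START.search(lines[sig_index + 2]).groups()
    first = sig_index + 3
    types = [LAST_ARG.search(line).group(1)
             for line in lines[first:first + int(count)]]
    return {"proxy": proxy, "method": method,
            "interface": interface, "types": types}


CLEANED = """function Cd(b,c,d,e){
        var f,g;
        f=new yu(b,iH,'login');
        try{
                g=xu(f,jH,2);
                nu(g,''+cu(g,cH));
                nu(g,''+cu(g,cH));
                nu(g,''+cu(g,c));
                nu(g,''+cu(g,d));
                wu(f,e,(Nu(),Ju))
        }
}""".splitlines()

sig_index = next(i for i, line in enumerate(CLEANED) if SIGNATURE.search(line))
print(read_method(CLEANED, sig_index))
```

The values returned are still minified identifiers such as "iH" and "cH", so a further lookup is needed to recover the original strings.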

At first I was quite concerned as there was a huge amount of duplicate local variable names within the code in general. However, the originally hard-coded string values which formed part of these lines seemed to always be global variables. And all global variables were declared toward the top of the permutation. For example, I could find the original value of the service proxy by extracting the value of the "iH" global variable within the permutation:

A day and a bit of reversing, Python, and regex magic later, I could successfully enumerate all methods and their associated parameter types within my obfuscated permutation file:
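
A lookup of this kind could be sketched as follows. The `resolve_global` helper is hypothetical, and the shape of the `SAMPLE` declaration (a comma-separated "var" statement of single-quoted strings) is an assumption about how the globals appear, not a copy of real compiler output.

```python
import re


def resolve_global(name, code):
    """Return the string assigned to a minified global, or None if not found."""
    pattern = (r"(?<![\w$])" + re.escape(name) +   # whole identifier only
               r"\s*=\s*'((?:[^'\\]|\\.)*)'")      # single-quoted value
    match = re.search(pattern, code)
    return match.group(1) if match else None


# Hypothetical global declarations for illustration.
SAMPLE = ("var cH='java.lang.String/2004016611',iH='AuthenticationService_Proxy',"
          "jH='com.ecorp.olympian.client.asyncService.AuthenticationService';")

print(resolve_global("iH", SAMPLE))
```

The negative lookbehind stops a short name such as "H" from matching inside a longer identifier like "cH".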

$ ./wip.py -u "http://192.168.22.120/olympian/037A330198815EAE6A360B7107F8C442.cache.js"
...
AuthenticationService.login( java.lang.String/2004016611, java.lang.String/2004016611 )
AuthenticationService.logout( com.ecorp.olympian.shared.model.ContextId/1042941669 )
...
AccountService.getBio( com.ecorp.olympian.shared.model.ContextId/1042941669 )
AccountService.requestPasswordReset( java.lang.String/2004016611 )
...
TestService.testStringAndBoolean( java.lang.String/2004016611, java.lang.Boolean/476441737 )
TestService.testStringAndByte( java.lang.String/2004016611, B )
TestService.testStringAndChar( java.lang.String/2004016611, C )
...

You may notice a "B" and "C" parameter type in the above output. I'll expand on this later.

So awesome, I was back to what GWTEnum had to offer back in 2010, but with modern GWT in 2020!

Trying to match all possible variations

In my excitement, I tested it against the permutation files for some applications written with different GWT versions and… and well it didn’t work... for any of them!

It turned out that the code obfuscation process was not always the same. In fact, it turned out there were quite a few different variations of how the code was obfuscated. So although my initial regular expressions worked for my application, they didn’t work for many others.

To try and tackle this, I took a larger sample of different GWT variations, found the common patterns for each line I was interested in, and updated my (nightmarish) regex to match and segment them all. For context, here are a few instances of how different parts of the code were represented in different GWT permutations:

Method signature patterns:

Method signatures were the simplest. However, sometimes they consisted of only variables, other times they had hard-coded strings, and the character sets and lengths used for variable names varied:

helper = new RemoteServiceProxy$ServiceHelper(this$static, 'AuthenticationService_Proxy', 'login');
e=new yu(b,zH,YG);
g=new yu(b,_G,'login');
k=new q8(this,Sqc,Llc);
j=new q8(this,Sqc,'login');
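
One way to cover all five of these variations with a single expression is sketched below. This is not the regular expression used by the tool, and as written it would also match any other three-argument constructor call, so in practice it would need to be narrowed.

```python
import re

# assignment target, constructor, "this" reference, proxy, method
SIGNATURE = re.compile(
    r"([\w$]+)\s*=\s*new\s+([\w$]+)\(\s*([\w$]+)\s*,"
    r"\s*('[^']*'|[\w$]+)\s*,\s*('[^']*'|[\w$]+)\s*\)"
)

SAMPLES = [
    "helper = new RemoteServiceProxy$ServiceHelper(this$static, 'AuthenticationService_Proxy', 'login');",
    "e=new yu(b,zH,YG);",
    "g=new yu(b,_G,'login');",
    "k=new q8(this,Sqc,Llc);",
    "j=new q8(this,Sqc,'login');",
]


def parse_signature(line):
    """Return (proxy, method) tokens, which may be literals or variable names."""
    match = SIGNATURE.search(line)
    if not match:
        return None
    return match.group(4), match.group(5)


for sample in SAMPLES:
    print(parse_signature(sample))
```

Quoted tokens are hard-coded values, while bare tokens are minified globals that still need to be resolved.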

Method parameter patterns:

Method parameters were a bit more varied. Variable values were often nested inside functions, and hardcoded values took various formats. In particular, Boolean values were sometimes represented as ternary statements, which my regular expressions were not expecting at all:

$append(streamWriter, '' + $addString(streamWriter, username));
TK(i,c);
aL(i,''+QK(i,d));
O7(h,M7(h,c));
c8(h.a,d?'1':'0');
nu(f,''+cu(f,'hardcoded'));
su(f.a,'5.699999809265137');

Remote Service Interface patterns:

By far the most drastic variations were found in the remote service interface definitions. Sometimes they were just a function, though more frequently they were a function used to initialise a local variable. Other times they were hugely different and nested inside some complex and long string of “things”, with completely different offsets to the other patterns:

streamWriter = $start(helper, 'com.some.example.client.service', 2);
h=(e9()&&f9(g9(g.c,g.a,jlc)),g.d=g.e.mg(),Q7(g.d,'com.some.example.client.service'),Q7(g.d,g.b),O7(g.d,2),g.d);
g=(e9()&&f9(g9(f.c,f.a,jlc)),f.d=f.e.mg(),Q7(f.d,irc),Q7(f.d,f.b),O7(f.d,2),f.d);
wbk(e,'com.some.example.client.service',2);
g=wbk(f,'com.some.example.client.service',2);
g=wbk(f,pFl,2);

Service definition patterns:

Service definitions were relatively consistent. However, one key difference was that on occasion, depending how the front-end was written, service paths were not included within the definition “line” itself, and only the policy file strong name could be extracted:

RemoteServiceProxy.call(this, getModuleBaseURL(), 'authenticationService', 'D48...780', SERIALIZER_0);
dd.call(this,ng(),'authenticationService','D48D4639E329B12508FBCA3BD0FC3780',zd)
nL.call(this,Am(),'../example/authenticationService','D48D4639E329B12508FBCA3BD0FC3780',x3)
m8.call(this,Lk(),'D48D4639E329B12508FBCA3BD0FC3780',snb)
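
The obfuscated forms above can be matched with an optional capture group for the service path, as in the sketch below. The expression is an illustration only, and it assumes the policy strong name is always a 32-character upper-case hex string, as in these samples. The first (non-obfuscated) sample is left out because its strong name is abbreviated in the listing.

```python
import re

SERVICE = re.compile(
    r"[\w$]+\.call\(\s*this\s*,\s*[\w$]+\(\)\s*,\s*"
    r"(?:'([^']*)'\s*,\s*)?"          # optional relative service path
    r"'([0-9A-F]{32})'\s*,\s*"        # policy file strong name
    r"[\w$]+\s*\)"                    # serializer reference
)

SAMPLES = [
    "dd.call(this,ng(),'authenticationService','D48D4639E329B12508FBCA3BD0FC3780',zd)",
    "nL.call(this,Am(),'../example/authenticationService','D48D4639E329B12508FBCA3BD0FC3780',x3)",
    "m8.call(this,Lk(),'D48D4639E329B12508FBCA3BD0FC3780',snb)",
]


def parse_service(line):
    """Return (service_path or None, policy_strong_name), or None if no match."""
    match = SERVICE.search(line)
    return match.groups() if match else None


for sample in SAMPLES:
    print(parse_service(sample))
```

When the path group comes back as None, only the policy strong name is known and the service path has to be found some other way.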

The limits of this information with regard to service definitions would be a challenge later, as at best I could enumerate the service paths and policy strong names, and at worst, only the policy strong names. However, neither of these values would directly correlate to the information found within a method’s definition, unless the relative service path of the service matched the service name within a method’s remote service interface definition, and I found this to not always be the case.

But either way, with some extra regex wizardry, I was able to consistently extract almost all method and service information from my full sample of different application code permutations.

Almost!? Why were some still missing? Ahh yes. Code fragmentation!

Identifying and rebuilding fragmented code

Code fragmentation in GWT (or as I believe it’s formally called: ‘Dead-for-now (DFN) code splitting’) is a way to reduce the size of the initial permutation file downloaded on first access, by splitting functionality into “deferred” code fragment files. This increases performance by speeding up the initial load time of the application, allowing the browser to load additional code as and when it’s needed.

So how could I infer this behaviour from the static code? And how could I then retrieve the fragments?

After some research I found that every permutation had a function called “__gwtStartLoadingFragment”, regardless of whether it was actually fragmented or not:

function __gwtStartLoadingFragment(frag) {
    var fragFile = 'deferredjs/' + $strongName + '/' + frag + '.cache.js';
    return __gwtModuleFunction.__startLoadingFragment(fragFile);
}

This was useful in itself as it told me that fragments for a given permutation were always located within a directory named after the active code permutation (aka strong name), within a “deferredjs” subdirectory. Furthermore, this deferred JS directory was always at the root of the module’s base.

I also found that although fragments seemed to load in a random order (21, 27, 18, 2, 6) during use, they are generated incrementally during compilation. The reason they are not seen to load in any particular order is because the fragment being loaded depends on what action is being performed by the user.  

So the first code fragment for the permutation “4D2DF0C23D1B58D2853828A450290F3F” would be found at:

  • http://127.0.0.1/base/deferredjs/4D2DF0C23D1B58D2853828A450290F3F/1.cache.js

So I knew where fragments could be found, but I still needed to identify if a permutation was actually fragmented. 

It turned out that the function “__gwtStartLoadingFragment” retained its un-obfuscated name, even within obfuscated permutations. Furthermore, although it was always defined, it would only be called later on in the code if the permutation was indeed fragmented. For example, here is what the code looks like to start loading fragments within an obfuscated permutation:

Knowing this, I could simply enumerate all instances of the string “__gwtStartLoadingFragment” within the permutation, and if it was found more than once, I knew the application would load at least one additional fragment at “some point”. I could then enumerate incremental fragment IDs within the active permutation’s “deferredjs” directory.

So things were now getting quite complex with the code, and I needed to implement additional logic to categorise whatever code the script was looking at. This resulted in me breaking "code" into three categories: GWT bootstrap, permutation, and fragment, where the bootstrap file identifies and loads the required browser permutation, and the permutation file loads any missing code fragments.
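
The check and the resulting fragment locations can be sketched in a few lines. The helper names and the `max_id` parameter are hypothetical, and the sample strings in the usage lines are made up for illustration. In practice the upper bound would probably be found by requesting incremental IDs until the server stops returning fragments, which is an assumption rather than something stated above.

```python
MARKER = "__gwtStartLoadingFragment"


def is_fragmented(permutation_js):
    """One occurrence is the definition; any further occurrence is a call."""
    return permutation_js.count(MARKER) > 1


def fragment_urls(module_base, strong_name, max_id):
    """Build candidate fragment URLs 1..max_id under the deferredjs directory."""
    base = module_base.rstrip("/")
    return [f"{base}/deferredjs/{strong_name}/{i}.cache.js"
            for i in range(1, max_id + 1)]


# Made-up snippets: a definition only, then a definition plus one call.
definition = "function __gwtStartLoadingFragment(frag){}"
print(is_fragmented(definition))
print(is_fragmented(definition + ";__gwtStartLoadingFragment(1)"))
print(fragment_urls("http://127.0.0.1/base/", "4D2DF0C23D1B58D2853828A450290F3F", 2))
```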

A few new regular expressions later, I could now point the script at an application’s bootstrap file (“{module-name}.nocache.js”), enumerate and select a permutation (“{hex}.cache.js”), identify any missing fragments (“{int}.cache.js”), and then extract all exposed services and methods. Optionally, I could also now point the script at any one of those code types directly - useful if I wanted to skip the bootstrap file and analyse a specific permutation directly.
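
The three code types can be told apart from the URL alone, as in the following sketch. The `classify` function is hypothetical and relies only on the file naming conventions described above. It assumes strong names are 32 upper-case hex characters, as in the examples in this post.

```python
import re
from urllib.parse import urlparse

FRAGMENT = re.compile(r"/deferredjs/[0-9A-F]{32}/\d+\.cache\.js$")
PERMUTATION = re.compile(r"/[0-9A-F]{32}\.cache\.js$")


def classify(url):
    """Categorise a GWT resource as bootstrap, permutation or fragment."""
    path = urlparse(url).path
    if FRAGMENT.search(path):
        return "fragment"
    if PERMUTATION.search(path):
        return "permutation"
    if path.endswith(".nocache.js"):
        return "bootstrap"
    return "unknown"


for url in (
    "http://192.168.22.120/olympian/olympian.nocache.js",
    "http://192.168.22.120/olympian/46DC1658EDF5314D1F33E5554C8C54A7.cache.js",
    "http://192.168.22.120/olympian/deferredjs/46DC1658EDF5314D1F33E5554C8C54A7/1.cache.js",
):
    print(classify(url), url)
```

The fragment pattern is tested first because a fragment path would otherwise fall through to "unknown", and checking the most specific form first keeps the three rules independent of each other.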

After confirming this worked consistently across my samples, I implemented additional logic and refined my regular expressions to allow the script to transparently parse both obfuscated and non-obfuscated permutations:

./wip.py -u "http://192.168.22.120/olympian/olympian.nocache.js"

[+] Analysing...
http://192.168.22.120/olympian/olympian.nocache.js
Permutation: http://192.168.22.120/olympian/46DC1658EDF5314D1F33E5554C8C54A7.cache.js
+ fragment : http://192.168.22.120/olympian/deferredjs/46DC1658EDF5314D1F33E5554C8C54A7/1.cache.js
+ fragment : http://192.168.22.120/olympian/deferredjs/46DC1658EDF5314D1F33E5554C8C54A7/2.cache.js

[+] Methods Found...
...
AuthenticationService.login( java.lang.String/2004016611, java.lang.String/2004016611 )
AuthenticationService.logout( com.ecorp.olympian.shared.model.ContextId/1042941669 )
...
SystemService.localCheck( com.ecorp.olympian.shared.model.ContextId/1042941669 )
SystemService.serverStatus( com.ecorp.olympian.shared.model.ContextId/1042941669, java.lang.String/2004016611 )
SystemService.contact( com.ecorp.olympian.shared.model.ContactForm/196160789 )
...

All that was left to do now was add the ability to generate example GWT-RPC requests for each method.

Kind of Implementing GWT-RPC Serialization in Python

Solving this challenge was harder than I anticipated, but I knew it was feasible to an extent from Steven Seeley’s “gwt.py” tool. Though, when reviewing how my test GWT application serialised various types and objects, I noticed some discrepancies between his output and what I expected. Not understanding it enough to know why, I decided to come at it with a fresh mind.

To begin I created a “TestService” which implemented various methods for serializing a range of Java types including: integers, strings, booleans, chars, ArrayLists, etc. After running each one, I analysed how each type was being serialised, and then by reading some additional documentation (including the implementation of “com.google.gwt.rpc.server.RPC.java”) I noted some very specific behaviours.

Delving into everything would be an entire post in itself, but the key thing I found was that GWT split its types into “Simple” and “Complex” categories.

Serializing Simple Types

Simple types had short-hand aliases within the serialised payload, and include the following:

  • I -> java.lang.Integer
  • D -> java.lang.Double
  • F -> java.lang.Float
  • B -> java.lang.Byte
  • Z -> java.lang.Boolean
  • S -> java.lang.Short
  • C -> java.lang.Character
  • J -> java.lang.Long

For example, within an RPC request to send a Boolean and String value to the server, the serialised payload would look something like this:

...testStringAndBool|java.lang.String/2004016611|Z|Bob|1|2|3|4|2|5|6|7|0|

With a breakdown of what I called the “parameter map”, being as follows:

1|2|3|4| -> First 4 elements of the string table (inc method name)
2| -> Number of parameters for the method (2)
    5| -> Index of the declared type of param 1 (java.lang.String)
    6| -> Index of the declared type of param 2 (boolean (Z))
        7| -> Index of the value of param 1 (Bob)
        0| -> Direct value of param 2 (False (0))

This demonstrated that when a value was of type String (i.e. not one of the simple types) the parameter map would hold the index of the value within the string table itself. However, when a parameter was a simple type (e.g. a boolean value (Z)), the parameter map would hold the value directly – in place of a string table index.

This behaviour seemed relatively consistent throughout the simple types with the exception of the Long type, which held a base64-esque encoded representation of the given number. I say relatively, as additional test cases uncovered that this was very much dependent on how the RPC call had been written in Java.

For example, for all intents and purposes this is the exact same RPC request, however it is serialised somewhat differently:

testStringAndBoolean|java.lang.String/2004016611|java.lang.Boolean/476441737|Bob|1|2|3|4|2|5|6|7|6|0|

By breaking this down, we get the following structure:

1|2|3|4| -> First 4 elements of the string table (inc method name)
2| -> Number of parameters for the method (2)
    5| -> Index of the declared type of param 1 (String)
    6| -> Index of the declared type of param 2 (java.lang.Boolean)
        7| -> Index of the value of param 1 (Bob)
        6| -> Index of the runtime type for param 2 (java.lang.Boolean)
            0| -> Direct value of param 2 (False (0))

So why is this different? To figure it out we can take a quick look at the Java interface for the “TestService”:

@RemoteServiceRelativePath("testService")
public interface TestService extends RemoteService {
...
    String testStringAndBool(String name, boolean flag);
    String testStringAndBoolean(String name, Boolean flag);
...
}

Reviewing the differences between the two definitions informed me that types could only be mapped to their simple-type aliases if they were defined using their primitive Java types (e.g. “boolean”) and not via a reference to a complex Java object (e.g. the “Boolean” class). The object reference transformed the simple type into a complex type, and serialization took this into consideration.

So although this didn’t really matter from an implementation perspective, this was something I needed to be aware of, as it presented an alternative way to serialize simple types depending on what signature I extracted from the obfuscated code.

Implementation wasn’t too difficult. I could use the logic I already had to parse the method calls and create two lists, one of parameter types and another of parameter values, which would provide a 1-to-1 mapping of types and values. However, these lists couldn’t be used directly in the RPC request as they would contain duplicates, and each RPC call should only have one instance of any particular string (even if it’s a parameter value).

To fix this I created a third “normalised” List of types, which only contained unique type values, as I could then use this List within the RPC string table. Then when building the string table and parameter map I could iterate over each parameter value, and when a string was found, append it directly to the string table after the list of unique values. However, when a simple type was found, nothing would be added to the string table but its value would be added directly into the parameter map.

An example of the output when a String and integer are serialized can be seen below:

...|processStringAndInt|java.lang.String/2004016611|I|test|1|2|3|4|2|5|6|7|26|
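
The string table and parameter map logic described above can be sketched roughly as follows. This is not the tool's implementation. The `build_request` function and the "POLICY_STRONG_NAME" placeholder are hypothetical, the boxed Boolean branch only covers the single case shown earlier, and the Long (J) encoding is deliberately not handled.

```python
STRING = "java.lang.String/2004016611"
BOXED_BOOLEAN = "java.lang.Boolean/476441737"
SIMPLE_ALIASES = set("IDFBZSCJ")   # aliases listed earlier (J is not encoded here)


def build_request(header, params):
    """header: [base URL, policy strong name, service interface, method name]
    params: list of (declared_type, value) pairs."""
    table = list(header)

    def index(entry):
        # Each string appears once in the table; indexes are 1-based.
        if entry not in table:
            table.append(entry)
        return table.index(entry) + 1

    pmap = [1, 2, 3, 4, len(params)]
    pmap += [index(declared) for declared, _ in params]    # unique types first
    for declared, value in params:
        if declared in SIMPLE_ALIASES:
            pmap.append(value)                              # direct value
        elif declared == BOXED_BOOLEAN:
            pmap += [index(declared), value]                # runtime type, then value
        else:
            pmap.append(index(value))                       # string table index
    fields = [7, 0, len(table)] + table + pmap
    return "|".join(str(field) for field in fields) + "|"


header = ["http://127.0.0.1:8888/olympian/", "POLICY_STRONG_NAME",
          "com.ecorp.olympian.client.asyncService.TestService", "processStringAndInt"]
print(build_request(header, [(STRING, "test"), ("I", 26)]))
```

Adding every declared type before any value is what keeps the unique types at the front of the string table, matching the ordering seen in the captured requests.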

Serializing Complex Types

Now, when more complex Java types were in use, the serialization was a bit different. For example, take a look at the following “testListAndString” method and “names” List definitions:

String testListAndString(List<String> names, String surname);
List<String> names = new ArrayList<String>();

Note that "names" is an "ArrayList<String>" implementation of "List<String>". Serializing this we get the following:

testListAndString|java.util.List|java.lang.String/2004016611|java.util.ArrayList/4159755760|Bob|Alice|Smith|1|2|3|4|2|5|6|7|2|6|8|6|9|10|

Now when we break this down we find a slightly different structure to when only passing simple types:

1|2|3|4| -> First 4 elements of the string table (inc method name)
2| -> Number of parameters for the method (2)
    5| -> Index of param 1's declared type (java.util.List)
    6| -> Index of param 2's declared type (java.lang.String)
        7| -> Index of param 1's runtime type (java.util.ArrayList)
        2| -> Number of List elements to follow (2)
            6| -> Index of list element 1's declared type (java.lang.String)
                8| -> Index of list element 1's value (Bob)
            6| -> Index of list element 2's declared type (java.lang.String)
                9| -> Index of list element 2's value (Alice)
        10| -> Index of param 2's value (Smith)

We can contrast this in a similar manner to the Boolean / boolean distinction. If the parameter is declared directly as an ArrayList, rather than as the List interface it implements, the serialization omits the separate List declared type, and the declared and runtime types for the List within the serialised request reference the same value index (ArrayList):

testArrayListAndString|java.util.ArrayList/4159755760|java.lang.String/2004016611|Bob|Alice|Smith|1|2|3|4|2|5|6|5|2|6|7|6|8|9|
  
1|2|3|4| -> First 4 elements of the string table (inc method name)
2| -> Number of parameters for the method (2)
    5| -> Index of param 1's declared type (java.util.ArrayList)
    ...
        5| -> Index of param 1's runtime type (java.util.ArrayList)
        2| -> Number of List elements to follow (2)
        ...
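
Both list layouts can be reproduced with a short sketch. The `build_list_request` function is hypothetical and hard-codes the assumption that every list is an ArrayList of Strings. The "POLICY_STRONG_NAME" value is a placeholder, as the policy file name for the test service is not given here.

```python
STRING = "java.lang.String/2004016611"
ARRAY_LIST = "java.util.ArrayList/4159755760"


def build_list_request(header, params):
    """params: (declared_type, value) pairs; a Python list value is
    serialised as an ArrayList of Strings, anything else as a String."""
    table = list(header)

    def index(entry):
        if entry not in table:
            table.append(entry)
        return table.index(entry) + 1

    pmap = [1, 2, 3, 4, len(params)]
    pmap += [index(declared) for declared, _ in params]   # declared types
    for _, value in params:                               # runtime / element types
        if isinstance(value, list):
            index(ARRAY_LIST)
            index(STRING)
    for _, value in params:                               # values
        if isinstance(value, list):
            pmap += [index(ARRAY_LIST), len(value)]
            for element in value:
                pmap += [index(STRING), index(element)]
        else:
            pmap.append(index(value))
    fields = [7, 0, len(table)] + table + pmap
    return "|".join(str(field) for field in fields) + "|"


base = ["http://127.0.0.1:8888/olympian/", "POLICY_STRONG_NAME",
        "com.ecorp.olympian.client.asyncService.TestService"]
print(build_list_request(base + ["testListAndString"],
                         [("java.util.List", ["Bob", "Alice"]), (STRING, "Smith")]))
print(build_list_request(base + ["testArrayListAndString"],
                         [(ARRAY_LIST, ["Bob", "Alice"]), (STRING, "Smith")]))
```

Registering the runtime and element types in a separate pass, before any value is added, is what places "java.util.ArrayList" ahead of "Bob" in the string table when the parameter is declared as a List.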

This became slightly more complex to implement, as method declaration patterns - which I was using as a starting point within the obfuscated code – only contained the declared types of parameters.  So when I found a List, I wouldn’t know if it was implemented as an “ArrayList” or another List type, or even if it was a List of Strings or integers. You can see this limitation below where we just find the List type and String type declared, but no other context regarding the List's implementation:

function $testListAndString(this$static, names, sname, callback){
  var ex, helper, streamWriter;
  helper = new RemoteServiceProxy$ServiceHelper(this$static, 'TestService_Proxy', 'testListAndString');
  try {
    streamWriter = $start(helper, 'com.ecorp.olympian.client.asyncService.TestService', 2);
    $append(streamWriter, '' + $addString(streamWriter, 'java.util.List'));
    $append(streamWriter, '' + $addString(streamWriter, 'java.lang.String/2004016611'));
    $writeObject(streamWriter, names);
    $append(streamWriter, '' + $addString(streamWriter, sname));
    $finish_0(helper, callback, ($clinit_RequestCallbackAdapter$ResponseReader() , STRING));
  }

On the other hand, if it was declared as an ArrayList directly, it would show as such above. However, I still couldn’t determine whether it was a List of Strings or of some other type / custom object. Another limitation was that it wasn’t possible to predetermine how long the list would be, as this is fully dependent on what the user provides at runtime.
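
To illustrate how little the declaration pattern gives away, here is a minimal Python sketch that pulls the method name, service name, and declared types out of the readable snippet above. The regular expressions are illustrative assumptions that target only this non-obfuscated form. They are not the patterns GWTMap uses, and obfuscated permutations rename these functions.

```python
import re

# The non-obfuscated proxy function from the article, used as sample input.
SAMPLE = r"""
function $testListAndString(this$static, names, sname, callback){
  var ex, helper, streamWriter;
  helper = new RemoteServiceProxy$ServiceHelper(this$static, 'TestService_Proxy', 'testListAndString');
  try {
    streamWriter = $start(helper, 'com.ecorp.olympian.client.asyncService.TestService', 2);
    $append(streamWriter, '' + $addString(streamWriter, 'java.util.List'));
    $append(streamWriter, '' + $addString(streamWriter, 'java.lang.String/2004016611'));
    $writeObject(streamWriter, names);
    $append(streamWriter, '' + $addString(streamWriter, sname));
    $finish_0(helper, callback, ($clinit_RequestCallbackAdapter$ResponseReader() , STRING));
  }
"""

# Hypothetical patterns for the readable form only.
HELPER_RE = re.compile(r"ServiceHelper\([\w$]+,\s*'([^']+)',\s*'([^']+)'\)")
START_RE = re.compile(r"\$start\(\w+,\s*'([^']+)',\s*(\d+)\)")
TYPE_RE = re.compile(r"\$addString\(\w+,\s*'([^']+)'\)")


def extract_signature(js):
    """Return the statically visible parts of one proxy method, or None."""
    helper = HELPER_RE.search(js)
    start = START_RE.search(js)
    if helper is None or start is None:
        return None
    return {
        "proxy": helper.group(1),
        "method": helper.group(2),
        "service": start.group(1),
        "param_count": int(start.group(2)),
        # Only quoted literals are declared types; runtime values such as
        # `sname` are variables and are deliberately not matched.
        "declared_types": TYPE_RE.findall(js),
    }


sig = extract_signature(SAMPLE)
print(sig["method"], sig["declared_types"])
```

The result contains "java.util.List" and the String signature, but nothing about the list's implementation, element type, or length.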

To try and compensate for this, I implemented logic to assume that if a List type was found, it would be of the ArrayList implementation and hold Strings. This will of course not always be correct, but it worked for many cases and was a starting point.

I also had to create a custom format to hold these “enriched” List type signatures, so that I could break them back down again when building the types list for the string table. I opted for the following format:

java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>>
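
The defaulting rule and the enriched format can be sketched in a few lines of Python. The function name is hypothetical and this is only an approximation of the idea, not the tool's implementation.

```python
ARRAY_LIST = "java.util.ArrayList/4159755760"
STRING = "java.lang.String/2004016611"


def enrich(declared_type):
    """Attach assumed implementation / element types to a declared list type."""
    if declared_type == "java.util.List":
        # Assumption: a bare List is an ArrayList of Strings at runtime.
        return f"{declared_type}<{ARRAY_LIST}<{STRING}>>"
    if declared_type == ARRAY_LIST:
        # Assumption: an ArrayList holds Strings.
        return f"{declared_type}<{STRING}>"
    return declared_type  # non-list types are left untouched


print(enrich("java.util.List"))
```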

Then when building the normalized types list, I could break this enriched “type string” into its individual types using the following regular expression:

(java\.util\.(?:[A-Za-z]+)?List(?:[0-9/]+)?)<([a-zA-Z0-9\./]+)[<>]?(?:(.*[^>]))?>
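
As a rough illustration of how that expression behaves, the Python sketch below applies it to the enriched strings shown in this article. The helper name is hypothetical, and types that do not match are returned unchanged.

```python
import re

# The regular expression from the article, unchanged.
ENRICHED_LIST_RE = re.compile(
    r"(java\.util\.(?:[A-Za-z]+)?List(?:[0-9/]+)?)<([a-zA-Z0-9\./]+)[<>]?(?:(.*[^>]))?>"
)


def split_enriched(type_string):
    """Break an enriched list signature into its individual type strings."""
    match = ENRICHED_LIST_RE.match(type_string)
    if match is None:
        return [type_string]  # not an enriched list type
    # The third group is absent for a directly declared ArrayList.
    return [group for group in match.groups() if group is not None]


print(split_enriched(
    "java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>>"))
print(split_enriched(
    "java.util.ArrayList/4159755760<java.lang.String/2004016611>"))
```

The first call yields three types (List, ArrayList, String) and the second yields two (ArrayList, String).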

Serializing Complex Objects

After reviewing the simple and complex types, I realised that there was in fact a third category of types: custom Java objects, which at this point I couldn’t serialize as I had no idea how they were structured within the obfuscated client-side code.

This is something of a significant limitation, as custom objects are commonly used and passed between the client and server in GWT applications. However, I implemented some logic which attempts to make an educated guess as to how they are structured by treating them 'similar' to String values with an additional reference to a runtime type, as this was how I saw a few examples structured (though I must say rather simple ones, with a single property).

I was low on time, but I think in theory some logic could be implemented to identify and trace the obfuscated “$writeObject” call and its parameter to figure out what a given List consists of, or what properties a custom object may have.

$ ./gwtmap.py -u "http://192.168.22.120/olympian/olympian.nocache.js" --backup    

   ___|  \        / __ __|   \  |     \      _ \
  |       \  \   /     |    |\/ |    _ \    |   |
  |   |    \  \ /      |    |   |   ___ \   ___/
 \____|    _/\_/      _|   _|  _| _/    _\ _|
                             version 0.1

[+] Analysing
====================
http://192.168.22.120/olympian/olympian.nocache.js
Permutation: http://192.168.22.120/olympian/037A330198815EAE6A360B7107F8C442.cache.js
+ fragment : http://192.168.22.120/olympian/deferredjs/037A330198815EAE6A360B7107F8C442/1.cache.js
+ fragment : http://192.168.22.120/olympian/deferredjs/037A330198815EAE6A360B7107F8C442/2.cache.js


[+] Module Info
====================
GWT Version: 2.9.0
Content-Type: text/x-gwt-rpc; charset=utf-8
X-GWT-Module-Base: http://192.168.22.120/olympian/
X-GWT-Permutation: 037A330198815EAE6A360B7107F8C442


[+] Methods Found
====================
----- AccountService -----

AccountService.getBio( com.ecorp.olympian.shared.model.ContextId/1042941669 )
AccountService.validateToken( java.lang.String/2004016611 )
...

----- TestService -----

TestService.testArrayList( java.util.ArrayList/4159755760<java.lang.String/2004016611> )
TestService.testXsrfToken( java.lang.String/2004016611 )
...

[+] Summary
====================
Backup: ./1601747650_037A330198815EAE6A360B7107F8C442.cache.js
Showing 5/5 Services
Showing 25/25 Methods

GWTMap – My attempt at a solution

Skipping ahead past a lot of coding, debugging, frustration, and success, I turned the little test script I was working with into a user-friendly and fully functional tool!

Finally, I had a tool that could extract all method signatures from obfuscated and non-obfuscated GWT permutations, including fragmented ones. And the tool could semi-reliably generate example serialized GWT-RPC request payloads which could be saved to a file, passed to tools like Burp Suite, or automatically tested using the tool itself.

I also performed a number of test cases to check that the tool worked against both obfuscated and non-obfuscated code for all of the following GWT versions:

  • GWT Version: 2.9.0 - released 2020
  • GWT Version: 2.8.2 - released 2017
  • GWT Version: 2.8.1 - released 2017
  • GWT Version: 2.8.0 - released 2016
  • GWT Version: 2.7.0 - released 2014

Or at least it seemed to consistently work based on my rather limited sample of test applications.

And with that, GWTMap was created! (Super cool and creative name I know!)

$ ./gwtmap.py -h
usage: gwtmap.py [-h] [--version] [-u <TARGET_URL>] -F <FILE>
                 [-b <BASE_URL>] [-p <PROXY>] [-c <COOKIES>]
                 [-f <FILTER>] [--basic] [--rpc] [--probe]
                 [--svc] [--code] [--color] [--backup [DIR]] [-q]

Enumerates GWT-RPC methods from {hex}.cache.js permutation files

Arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -u <TARGET_URL>, --url <TARGET_URL>
                        URL of the target GWT {name}.nocache.js bootstrap or {hex}.cache.js file
  -F <FILE>, --file <FILE>
                        path to the local copy of a {hex}.cache.js GWT permutation file
  -b <BASE_URL>, --base <BASE_URL>
                        specifies the base URL for a given permutation file in -F/--file mode
  -p <PROXY>, --proxy <PROXY>
                        URL for an optional HTTP proxy (e.g. -p http://127.0.0.1:8080)
  -c <COOKIES>, --cookies <COOKIES>
                        any cookies required to access the remote resource in -u/--url mode (e.g. 'JSESSIONID=ABCDEF; OTHER=XYZABC')
  -f <FILTER>, --filter <FILTER>
                        case-sensitive method filter for output (e.g. -f AuthSvc.checkSession)
  --basic               enables HTTP Basic authentication if require. Prompts for credentials
  --rpc                 attempts to generate a serialized RPC request for each method
  --probe               sends an HTTP probe request to test each method returned in --rpc mode
  --svc                 displays enumerated service information, in addition to methods
  --code                skips all and dumps the 're-formatted' state of the provided resource
  --color               enables console output colors
  --backup [DIR]        creates a local backup of retrieved code in -u/--url mode
  -q, --quiet           enables quiet mode (minimal output)

Example: ./gwtmap.py -u "http://127.0.0.1/example/example.nocache.js" -p "http://127.0.0.1:8080" --rpc --color

Example Workflow

To demonstrate an example workflow, here I first run GWTMap against my target application, and specify the “--backup” flag to ensure the code state (fragments included) is backed up locally after processing:

From the summary I can see that 5 services and 25 methods were identified, and that I now have a local backup of the complete permutation.

With a local backup of the code, I no longer need to reconnect to the target server during analysis and I can just pass the backup file using the “-F/--file” flag. I can also filter the output to only display what I'm interested in using the “--filter” argument. For example, in this case I'm only interested in methods of the TestService:

$ ./gwtmap.py -F ./1601747650_037A330198815EAE6A360B7107F8C442.cache.js \
--filter TestService

   ___|  \        / __ __|   \  |     \      _ \
  |       \  \   /     |    |\/ |    _ \    |   |
  |   |    \  \ /      |    |   |   ___ \   ___/
 \____|    _/\_/      _|   _|  _| _/    _\ _|
                             version 0.1

[+] Analysing
====================
./1601747650_037A330198815EAE6A360B7107F8C442.cache.js
Warning: Individual permutation files in -F/--file mode do not include deferred fragments


[+] Module Info
====================
GWT Version: 2.9.0
Content-Type: text/x-gwt-rpc; charset=utf-8
X-GWT-Module-Base: http://127.0.0.1/stub/
X-GWT-Permutation: 037A330198815EAE6A360B7107F8C442


[+] Methods Found
====================

----- TestService -----

TestService.testArrayList( java.util.ArrayList/4159755760<java.lang.String/2004016611> )
TestService.testXsrfToken( java.lang.String/2004016611 )
TestService.testListAndList( java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>>, java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>> )
TestService.testIntAndList( I, java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>> )
TestService.testListAndString( java.util.List<java.util.ArrayList/4159755760<java.lang.String/2004016611>>, java.lang.String/2004016611 )
TestService.testStringAndBoolean( java.lang.String/2004016611, java.lang.Boolean/476441737 )
TestService.testStringAndByte( java.lang.String/2004016611, B )
TestService.testStringAndChar( java.lang.String/2004016611, C )
TestService.testStringAndInt( java.lang.String/2004016611, I )
TestService.testStringAndBool( java.lang.String/2004016611, Z )
TestService.testStringAndFloat( java.lang.String/2004016611, F )
TestService.testStringAndLong( java.lang.String/2004016611, java.lang.Long/4227064769 )
TestService.testArrayListAndString( java.util.ArrayList/4159755760<java.lang.String/2004016611>, java.lang.String/2004016611 )
TestService.testDetails( java.lang.String/2004016611, java.lang.String/2004016611, I, D, java.lang.String/2004016611 )


[+] Summary
====================
Showing 1/5 Services
Showing 14/25 Methods

There is a warning to highlight that individual permutation files do not include deferred JavaScript fragments (which is true). However, in this case it’s fine as we are passing a “complete” permutation generated by GWTMap - rather than an individual permutation file from the application itself.

Now let’s say I want to generate an example RPC request for one of these methods. I can further filter on a specific method, and generate an RPC request for it using the “--rpc” flag. To further reduce output, I can run the tool in “quiet mode” using “-q”.

However, as I’m now analysing a local file and not a remote URL, the base module will be unknown during RPC generation. But never fear, you can fix this by passing the module base URL with the “--base” argument: 

$ ./gwtmap.py -F ./1601747650_037A330198815EAE6A360B7107F8C442.cache.js \
--base http://192.168.22.120/olympian/ --filter TestService.testDetails \
--rpc -q

Warning: Individual permutation files in -F/--file mode do not include deferred fragments

----- TestService -----

POST /olympian/testService HTTP/1.1
Host: 192.168.22.120
Content-Type: text/x-gwt-rpc; charset=utf-8
X-GWT-Permutation: 037A330198815EAE6A360B7107F8C442
X-GWT-Module-Base: http://192.168.22.120/olympian/
Content-Length: 260

7|0|10|http://192.168.22.120/olympian/|67E3923F861223EE4967653A96E43846|com.ecorp.olympian.client.asyncService.TestService|testDetails|I|D|java.lang.String/2004016611|§param_Bob§|§param_Smith§|§param_Im_a_test§|1|2|3|4|5|7|7|5|6|7|8|9|§32§|§

As seen above, this outputs a GWT-RPC request which I can save to a file or paste directly into Burp Intruder. As GWTMap outputs the RPC payload with the parameters pre-flagged with the same markers Burp uses, when pasted inside Intruder I can instantly see where the variables are within the request, and can start probing for vulnerabilities.

Alternatively, I can have GWTMap automatically probe the service whilst optionally setting Burp as a proxy to log the traffic for later use. This can be done using the “--probe” and “--proxy” arguments. GWTMap will then automatically issue an HTTP POST request using the generated payload and output the response:

$ ./gwtmap.py -F ./1601747650_037A330198815EAE6A360B7107F8C442.cache.js \
--base http://192.168.22.120/olympian/ --proxy http://127.0.0.1:8080 \
--filter TestService.testDetails --rpc --probe -q                                 

----- TestService -----

TestService.testDetails( java.lang.String/2004016611, java.lang.String/2004016611, I, D, java.lang.String/2004016611 )
...
...
HTTP/1.1 200
//OK[1,["Name: param_Bob param_Smith\nAge: 32\nWeight: 76.6\nBio: param_Im_a_test\n"],0,7]

If need be, HTTP cookies and HTTP Basic authentication can also be set using the “--cookies” and “--basic” flags respectively. However, neither was required in this case, and by reviewing the output above I can see that the RPC request was generated and successfully used to interact with the specified (filtered) service.
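
For anyone who would rather script such a probe themselves, the following is a minimal sketch using only Python's standard library. It is not how GWTMap sends its requests. The function names are hypothetical, the headers mirror the generated request shown earlier, and the body passed in is expected to be a complete payload (the "§" markers are stripped before sending).

```python
import urllib.request


def strip_burp_markers(body):
    """Remove the Burp-style position markers from a generated payload."""
    return body.replace("§", "")


def build_probe_request(service_url, module_base, permutation, body):
    """Build (but do not send) a GWT-RPC POST request."""
    return urllib.request.Request(
        service_url,
        data=strip_burp_markers(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/x-gwt-rpc; charset=utf-8",
            "X-GWT-Module-Base": module_base,
            "X-GWT-Permutation": permutation,
        },
    )


def send(request, proxy=None, timeout=10):
    """Send the request, optionally through an HTTP proxy such as Burp."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Placeholder body for illustration only; a real payload comes from --rpc.
req = build_probe_request(
    "http://192.168.22.120/olympian/testService",
    "http://192.168.22.120/olympian/",
    "037A330198815EAE6A360B7107F8C442",
    "§param_Bob§|§param_Smith§|",
)
print(req.get_method(), req.full_url)
```

Calling send(req, proxy="http://127.0.0.1:8080") would then route the request through a local proxy, in the same spirit as the “--proxy” argument.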

Now I can go into my Burp history tab, find the request, and continue to interact with the service, manually searching for vulnerabilities within this enumerated method.

Conclusion

GWT applications are as interesting as they can be frustrating. Hopefully this research helped you understand the topic a little better and GWTMap can help make testing a bit easier.

GWTMap should (hopefully) work against all versions of GWT between 2014 and 2020 (obfuscated or not) even if the permutations are fragmented. However, it does have some limitations in that it can only serialize a small set of Java Types when generating RPC requests, and infers the specifics of List and custom objects.

Depending on how the services have been initialised within the client-side Java implementation, it may also fail to retrieve the service paths on occasion, or fail to correlate a method with a given service correctly. However, if this happens a warning will be returned and GWTMap will give some suggestions based on the information it has.

If you think it’s lacking or could be improved, or better yet, if you can fix some of its limitations, feel free to contribute to it and help keep it alive!

Acknowledgements

Special appreciation goes to the research and effort put into pre-existing tools such as those published by GDS, Steven Seeley, and TheHackerish.

References

[1] Brian Slesinsky - The GWT-RPC wire protocol
https://docs.google.com/document/d/1eG0YocsYYbNAtivkLtcaiEE5IOF5u4LUol8-LL0TIKU/edit

[2] GDSSecurity - GWTEnum: Enumerating GWT-RPC Method Calls
https://blog.gdssecurity.com/labs/2010/7/20/gwtenum-enumerating-gwt-rpc-method-calls.html

[3] GDSSecurity - GWT-Penetration-Testing-Toolset
https://github.com/GDSSecurity/GWT-Penetration-Testing-Toolset

[4] srcincite.io (Steven Seeley) - From Serialised to Shell :: Auditing Google Web Toolkit
https://srcincite.io/blog/2017/04/27/from-serialised-to-shell-auditing-google-web-toolkit.html

[5] srcincite.io (Steven Seeley) - gwt.py
https://github.com/sourceincite/tools/blob/master/gwt.py

[6] TheHackerish - Hacking a Google Web Toolkit application
https://thehackerish.com/hacking-a-google-web-toolkit-application/

[7] TheHackerish - GWTab Burp Extension
https://github.com/thehackerish/GWTab