Tips for Speeding Up Your Remote Appium Sessions

January 28, 2020
Jonathan Lipps

I am the founder of a consulting firm called CloudGrey, where we focus on mobile automation: helping clients get their mobile automation strategy going, deciding whether they should use Appium or something else, training teams, and so on. I also work on the Appium open-source project and have been involved with it since the beginning.

Execute Driver Script

The first topic I want to cover today is a new feature in Appium called executeDriverScript. It sounds a lot like execute script, which you might know as a feature from Selenium and from Appium in web mode, but this is something a little bit different. So, first let’s talk about the problem this feature is trying to solve.

As a bit of background, remember that Appium is built around a client-server architecture. What that means is that the part of Appium which is actually automating devices, the part physically connected to a device, is not necessarily located in the same place as your test code. When you’re writing Appium test code, you’re importing an Appium client into your favorite programming language and writing commands like driver.findElement.

But what’s actually happening under the hood when you run those commands in your programming language is that those commands are turned into HTTP requests and sent to an Appium server, which is running somewhere on a network that’s accessible to you. It might be on your local network. If you’re running Appium locally, you don’t really notice very much that Appium might be far away, because it’s not. It’s in the same place as your test script, but Appium might be running somewhere on the other side of the world from your test script.

Because Appium has this client-server architecture, Appium servers can be located anywhere in the world as long as you can send an HTTP request to them, and this provides a ton of benefits. It means that you can have clients written in many different languages, all speaking to a server which is a single implementation of a sort of REST API. The server receives commands and doesn’t have to be rewritten in each of those languages. You can also talk to Appium servers around the world, which is nice because those servers might be connected to devices around the world, devices that you don’t have access to locally. Yet you can still write and run your test script locally on your machine without having to zip it up and send it somewhere.

There are a lot of really great benefits to this architecture, but it does have some downsides as well. One of the things we can observe is that, because of this client-server architecture, every Appium command is a separate HTTP request.

This isn’t a big deal when you’re running your tests against an Appium server that’s running locally, because HTTP requests over the local interface are basically instant. They take a bit of time, of course, but not enough to notice. Now imagine the Appium server is across the world. Every command that you execute in your test script triggers an HTTP request that has to traverse the global internet all the way around the world, which can be a bit time-consuming.


Let’s take a look at this in a little bit more detail. What this diagram is showing is a sort of timeline from top to bottom of an Appium test which usually takes place within the course of a single Appium session. So, in any Appium test, you typically have a new session call which starts up the device or simulator or whatever that you’re working with, and then at some point you have a quit session call.

So those are always there in any given test, but in between those two you might have any number of commands. You might have two commands. You might have 200 commands. It all depends on what you’re trying to accomplish in your test. The thing that this diagram is pointing out is that for each of these commands, your client is sending a network request to the Appium server, and then the server is sending a response back.

Now if we imagine that the server is somewhere far away from you, or if you have particularly slow internet, or if the packets have to route through a particularly slow path, then you might be looking at a decent amount of time for each of these requests and responses to take place. Maybe let’s say a hundred milliseconds as an arbitrary guess. I think from where I’m sitting in Vancouver to let’s say Tokyo, Japan, average latency, kind of base latency, would be something like 90 milliseconds. So, we could imagine that this might be a real-world example of me running a test against an Appium server in Tokyo.

So, what we observe from all this is that if we want to figure out our total test time, we obviously have to factor latency into the equation. It’s not just the time that the Appium server takes to execute the command; it’s also the time that it takes the client to send the command and the time that the Appium server takes to send the response back to the client.

So we can express that total latency with the little formula that I have down at the bottom. It starts from the number of test commands (or test steps) plus 2. It’s plus 2 because we always have the new session and the quit session calls, and those aren’t counted as test steps. They’re boilerplate that is a part of every single session, and on top of them you have your actual test commands.

That count is multiplied by the latency between the client and server (I should say average latency, really), and then by 2, because we have both a request and a response for each command.

So, let’s take a look at what that looks like with an example in practice. Here’s the formula again, written in something that looks like code. I guess this would be PHP code or something like that, but hopefully you can understand it. Now let’s substitute actual numbers for the variables. Say 90 is the number of commands in my test, so in total we have 92 commands going back and forth across the network, and say we have a latency of 100 milliseconds for each direction of each command. Then the total latency of our test, or I should say the time lost due to latency, is 92 x 100 x 2 = 18,400 milliseconds, or 18.4 seconds.

This is a fair amount of time we’ve lost just due to latency in our test. This is time when our Appium server isn’t doing anything and the client isn’t doing anything. It’s just data passing over the wire. If you multiply this number by the number of tests in your build, say a hundred tests, that’s now 1,840 seconds. Divide that by 60 and you get the number of minutes that you’ve wasted in your build just due to latency: a little over 30.
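To make the arithmetic concrete, here is a small sketch of that formula in JavaScript. The function and variable names are illustrative choices, not anything from Appium; the numbers are the ones from the example above (90 commands, 100 ms in each direction, 100 tests in a build).

```javascript
// Time lost to network latency in one session, in milliseconds.
// The "+ 2" covers the new-session and quit-session calls; the "* 2"
// covers the request and the response for each command.
function totalLatencyMs(numCommands, oneWayLatencyMs) {
  return (numCommands + 2) * oneWayLatencyMs * 2;
}

const perTestMs = totalLatencyMs(90, 100);         // 18400 ms, i.e. 18.4 s
const perBuildSeconds = (perTestMs * 100) / 1000;  // 1840 s across 100 tests
const perBuildMinutes = perBuildSeconds / 60;      // a little over 30 minutes

console.log(perTestMs, perBuildSeconds, perBuildMinutes.toFixed(1));
```

Because the latency term multiplies the command count, a longer test or a more distant server makes the total grow quickly.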

Obviously wasted is a strong term. Part of this is just a fact of life and you might be fine with it, but it’s something that we can optimize, and that’s what the point of this executeDriverScript feature is, which I’m coming to describe here pretty soon.

Other than just adding time to your build, network latency can also lead to unreliability. Imagine you have a test where there is a bunch of interaction happening within your application, and that interaction is fairly fast and fairly time-dependent. So, this is especially true in apps like games where speediness is very important. If you’re trying to play a game, if you wait two seconds too long to tap a button or an object – that object might not even be there anymore. If you now imagine trying to automate playing that game with Appium, you can see that time loss due to latency isn’t just time that you wasted. It could also be time that prevents your test from actually succeeding.

This is one reason that the authors of the WebDriver API decided to do something pretty unique with the touch actions API in the new version of the WebDriver protocol. They could have had each action sent one at a time: put the finger here, then press it down, then swipe it here, then do something else with it. But if each of those individual components of the action were sent over the network, there would be no guarantee as to when they would happen in relation to one another, because network latency is there. Sometimes requests take longer than other times. Sometimes there’s internet weather. All this kind of stuff.

So, that API allows grouping of a bunch of different actions together that are then kind of unpacked and executed on the server which is local to the device, and therefore can ensure that things are happening with a very precise amount of time in between them.

So, executeDriverScript is basically an attempt to do this for all Appium commands. Not just actions, but any command that you can think of. It’s essentially a way of batching commands together to work around latency. You can think of it as creating a batch of commands which all get executed on the server rather than being sent over the network one at a time. So how does this command work? Well, if you are using the Java client, it looks like calling driver.executeDriverScript. Depending on the language you use, this might look a little different, and you’ll have to refer to the documentation to see the command in your favorite language.

How do we use this in practice? Well, let’s first take a look at a very simple test that’s written without this feature. This is a test of login in my silly little test app. We’re first setting an implicit wait timeout, because we’re lazy and don’t want to use explicit waits. Then we’re finding these different elements by accessibility ID and interacting with them using the standard Appium WebDriver API: calling driver.findElement and then clicking, or calling driver.findElement and then sending keys.

So we’re filling out a username and a password, clicking the login button, and then asserting that we see the log out text to prove that we actually logged in successfully. So this is how we would normally write this test without using executeDriverScript. What if we wanted to batch all of these commands together? So then instead of being executed one at a time and waiting for the request and the response to occur for each of these, what if we group them together?

Let’s remember that it’s not just the six or so lines here. It’s actually almost double that in terms of the number of commands that we’re running: finding an element is a command, click is a command, and so on. So we’ve actually got about eight or nine or ten different commands represented here.

So if we wanted to group all these together, we could use executeDriverScript like so: calling driver.executeDriverScript and putting all of what looks like code into one long string. So what’s going on here? Let’s look at this with reference to the previous code that we just saw, the normal method. Each of the lines in the normal method is now represented by a line within our string. Instead of saying driver.manage().timeouts().implicitlyWait, we are saying await driver.setImplicitTimeout. If you know the WebDriver API, you can read the code within this string and immediately see that it is just WebDriver code.
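The slide with the actual script isn’t reproduced in this write-up, so here is a rough sketch of what such a script string might look like. The accessibility IDs, credentials, and XPath are hypothetical, and the countCommands helper is just an illustration for tallying commands; only driver.setImplicitTimeout, the $ function, and the tilde prefix come from the talk.

```javascript
// Body of a driver script, written as one string. It is WebdriverIO-style
// JavaScript that assumes a pre-connected `driver` object on the server.
// The accessibility IDs, credentials, and XPath below are hypothetical.
const loginScript = `
  await driver.setImplicitTimeout(10000);
  await (await driver.$('~username')).setValue('alice');
  await (await driver.$('~password')).setValue('mypassword');
  await (await driver.$('~loginBtn')).click();
  await driver.$('//*[@text="Logout"]');
`;

// Each `await` marks one command that would otherwise be its own
// HTTP round trip between client and server.
function countCommands(script) {
  return (script.match(/\bawait\b/g) || []).length;
}

console.log(countCommands(loginScript)); // 8
```

The string would then be handed to the client’s executeDriverScript method (driver.executeDriverScript in the Java client), so that these eight commands travel in a single HTTP request instead of eight.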

Now you might not know what this word await means, you might not know what this dollar sign function is doing, or why there’s a tilde character in front of the accessibility IDs. That might seem a little opaque to you, but if you know how to write Appium tests in any language, you could probably look at this and say that we’re encoding a whole set of steps, a whole multitude of commands, in one string. It’s a lot like what you can do in JavaScript when you call the eval function: you put a bunch of JavaScript in a string and then evaluate it all in one go. That’s more or less what’s happening here. This is actually JavaScript code that we’ve written in this string. And it’s not just any JavaScript code; it’s code that’s written with access to a driver object, and this driver object comes from the WebdriverIO library.

Execute Driver Script Allows Remote Running of a WebdriverIO Script

So, executeDriverScript is basically a method that allows you to remotely execute a WebdriverIO script, written as a single string. If you’ve never heard of WebdriverIO, it’s probably the best WebDriver client for JavaScript. It supports Selenium. It supports Appium. It’s very well maintained. I’ve made a few contributions to it myself. It’s a great project, really fun to work with, and a lot of people use it to write their test automation code.

If you wanted to know what this dollar sign function means, or what this tilde means, you would go to the WebdriverIO documentation and APIs and learn about that. Since we’re on this screen, I’ll just share what they mean. The dollar sign function is essentially the findElement call, just a little more concise, and the tilde in front of a string means use the accessibility ID locator strategy. Instead of having separate strategies and selectors in the WebdriverIO API, they’re all kind of mashed together into one string, and the WebdriverIO library intelligently looks at the shape of the string and decides what locator strategy to use.

For accessibility ID, the magic first character is this tilde, whereas in the XPath query, because we start with two forward slashes, the library is able to detect that it’s an XPath locator strategy without us having to specify that explicitly.
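As a rough illustration of that idea, the sketch below picks a locator strategy from the shape of the selector string. It is not WebdriverIO’s actual implementation, which handles many more selector forms; the function name and the fallback label are made up for this example.

```javascript
// Simplified illustration only: it covers just the two selector shapes
// mentioned in the talk and labels everything else as not handled.
function guessLocatorStrategy(selector) {
  if (selector.startsWith('~')) {
    // Leading tilde: accessibility ID, with the tilde stripped off.
    return { using: 'accessibility id', value: selector.slice(1) };
  }
  if (selector.startsWith('//')) {
    // Leading double slash: treat the whole string as an XPath query.
    return { using: 'xpath', value: selector };
  }
  return { using: 'not covered by this sketch', value: selector };
}

console.log(guessLocatorStrategy('~username'));
console.log(guessLocatorStrategy('//*[@text="Logout"]'));
```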

We also have this await keyword. That’s because in JavaScript we are dealing with promises, and await is a way of waiting for the result of a promise and unwrapping it without having to mess with promises in their native form. It lets us write cleaner code. We’ve got two awaits on each of the lines because we’re actually running two commands: one for finding and one for clicking. There might be a more elegant way to write this. This is the first way that occurred to me, and it works using the WebdriverIO library.

Why JavaScript?

So, that’s a little bit about WebdriverIO, but we can step back and ask: why JavaScript? Why use JavaScript in the first place? The answer is that Appium itself is written in JavaScript, so executing this code on the server side was more or less trivial, because we had access to the WebdriverIO API natively from within Node.js (which is what Appium runs on). Also, a lot of people use this library. Most people have some familiarity with JavaScript, even if they don’t with other languages, and on the Appium team we like doing things in JavaScript. Sometimes it bothers some people, but we’ve been fairly successful by sticking with JavaScript, so it seemed like a good idea.

You might also be thinking: wait a second, every time I’ve written JavaScript or read anything about JavaScript, I’ve always heard to never put code in strings and evaluate it blindly. That’s actually really good advice, because what we’re enabling here through this feature is essentially a kind of remote code execution, which is always risky if you allow your users to wrap up arbitrary code and run it in another context. Who knows what kinds of clever and dastardly things they might do? So this is taken for granted. This is true.

Because there is a fundamental concern about security (we never know what somebody might be able to do), to use this feature in your Appium server you actually have to start the server with a special flag. This is a flag that has been introduced recently, the allow-insecure flag, where you give a special label to tell Appium to turn on a feature which is not on by default. So if you’re running an Appium server, this feature is off by default. You don’t have to worry about any security implications of it unless you explicitly turn this feature on.

What kind of security implications are we really talking about here? How concerned should you be by turning this feature on if you are an Appium administrator? Personally, I don’t think you should be that concerned because we take a lot of care in really delineating what you can and can’t do with the code snippets you send in.

First of all, all the code is not actually executed in the Appium context. It’s executed in a completely separate node.js process, so it doesn’t have access to any of the objects in Appium’s memory. It can’t snoop on other sessions that are happening. It actually has no access to any of the node.js API itself, so it can’t read anything from the file system or call any of the standard node.js libraries.

All it has access to is the driver object from WebDriverIO, which is pre-connected to your session, not anyone else’s session. There’s a console object, so you can write logs to the console. There’s a promise object in case you want to insert some sleeps or something into your script. So this is a pretty minimal set of things. You have access to the full JavaScript language and syntax, so you can write loops and if statements and things like that. Otherwise, it’s not like you have a lot of tools at your disposal to try and do anything nasty.

I should also say that with this API you can write logs which get returned to you as part of the API response, and you can also return any kind of object that you want. As long as it’s JSON serializable, you’ll get that object back in your client script. Also, if you return an element from this script, say you find an element and return it as the last step in your driver script, it will be turned into a web element that you can use as a native web element in your test script too. So it’s a pretty intuitive and pretty useful feature.

So, let’s talk about what this looks like in practice in terms of solving the latency problem. Remember, the whole idea is that if you can combine a bunch of commands into a batch and have them executed locally to the Appium server, then you save the time lost due to latency for the request and response of each of those commands. So we should expect that if we use executeDriverScript and run our commands in this batch mode (instead of one at a time), the latency problem will basically go away. That could mean a large time savings within our test steps if we’re executing our test on servers that are not very close to our client script.

And this is indeed what we find. Here is a little chart from a series of tests that I ran. I compared time lost due to latency across three setups: local execution, devices in California (not close to me, but not too far away), and devices quite a bit further away in Tokyo, Japan. If we just look at the average test time for the scenarios where we’re not using executeDriverScript, we see that the farther the server gets from the client, the higher the average test time, and this is solely due to latency. It isn’t due to the Appium server or device taking longer in those places. We’re actually just adding all of these seconds due to latency.

Between running locally and running in Tokyo, we’ve got about a 2x difference. It takes twice as long for a test to run in Tokyo as it does locally, just because Tokyo is farther away. When we look at the situations where we are using executeDriverScript, what we see is a pretty significant speed-up. We don’t see the speed-up locally. We see a 0.8% speed-up there, because we save almost no time by batching commands. I’m surprised we see anything at all. This 0.8% should probably be regarded as nil, because time lost due to latency on the local network is basically nil.

When I’m running tests on devices in California from my office in Vancouver, I see about a 40% speed-up if I put my commands within the executeDriverScript batch. Against Tokyo devices, I see an even bigger speed-up. In case you’re wondering about the details of the tests that I ran, here they are.

These were real Android devices in these different places around the world. Each test condition (that is, each combination of server location and use or non-use of executeDriverScript) contained 15 separate tests. Each test contained five loops of a login/logout flow, the same steps that I showed you earlier. In other words, each of these 15 tests contained 90 commands, for a total of 1,350 commands per condition. That’s a pretty significant number of commands, so I think our results here are pretty solid.

It’s also worth saying that the speed-up percentages you see are speed-ups of test time, not total session time, so this doesn’t include session startup. Session startup can be a big factor in overall test time, but test time varies with the number of test steps you have, whereas startup time does not. So I didn’t factor startup into the speed-up, so that we are really comparing apples to apples here.

So to summarize this section on executeDriverScript: latency can be a huge cause of slowness and unreliability. We can work around this using executeDriverScript by batching whatever commands we want into a single HTTP call. executeDriverScript uses the WebdriverIO API. You basically write WebdriverIO code as a string and then send that to the Appium server (which then unwraps it and executes it).

Because of potential security concerns, you do need to start Appium with a special flag to make sure that you’re not unknowingly opening yourself up to the security implications. That being said, there are very few really bad things that I think are possible within this model, but I should disclaim that I’m not a security expert. There are hackers out there that are probably a lot smarter than me who might be able to take advantage of this. So, talk to your security people. Figure it out. Beware.

I should also say that while this is, I think, a really interesting feature, you shouldn’t use it just because you want to speed things up. If you don’t have a real problem with latency, it brings problems of its own. For example, stack traces will no longer point you to a certain line in your client script. You’ll have to dig through the Appium logs to figure out what exactly failed and where. So there are a few different reasons that it’s probably not a good idea to use this unless you really need the performance benefits that it can provide. That being said, I think it’s a pretty interesting new feature.

Direct Connect

So, let’s move on and talk about another strategy for reducing the amount of time lost to latency, this time specifically when running tests on cloud services that themselves have devices scattered around the world. Another way of putting this is that executeDriverScript helps with time lost due to latency between your client script and the endpoint your client is speaking to at the cloud service.

But what about latency that exists within a cloud service itself? This can be an issue because many Appium clouds use load balancers to make life easier for the users of that service. In other words, what we can find is an architecture like this, where you’re the user of this cloud service writing a test and you’re using an Appium client, which is local to you. Then as the location of the Appium server that you encode in your scripts, you’re given some kind of cloud service entry point. This is often a load balancer. So no matter which devices or locations or platforms you’re automating, there’s often a single Appium endpoint that’s kind of an entry point into the whole cloud service.

That load balancer is responsible for sending your Appium requests to other Appium servers, potentially hundreds or thousands of other Appium servers, which are themselves connected to devices or emulators or simulators. The load balancer is responsible for keeping track of session IDs and making sure that create session requests are getting sent to the right place and that subsequent commands are getting sent to the right place. So this is a very common architecture that we find.

Notice that within this architecture, we don’t just have an HTTP request between the client and the cloud service entry point. We also have HTTP requests between the entry point (the load balancer) and the most remote, or outermost, Appium servers: the ones that are actually connected to devices and actually running the Appium software. So what kind of network connection lies between the load balancer and these Appium servers? Well, as a user of the cloud service, you don’t necessarily know. It could be local, it could be within a single data center, but it could also be across the world. You might have a load balancer in California and an Appium server in Tokyo, and that’s quite a long way apart.

So if you’re running a test from Tokyo, your requests are now going to California and then back to Tokyo, when theoretically that’s a bit unnecessary. So there is time lost due to latency within cloud environments as well. Unfortunately, you have no real control over that. It all depends on how the cloud service has architected its system. Before, we had a formula where total latency, or time lost, is the number of commands plus 2, times the latency, doubled. Now we have something a little more complex, with both the outside latency and the inside (internal) latency. When you’re dealing with a load balancer, you have to add those two together in place of the single latency figure, and then multiply the result by 2 as before.
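As a sketch, the extended formula could be written like this in JavaScript. The function name is an illustrative choice, and the 30 ms and 60 ms latencies are hypothetical values picked only to show the calculation; they are not measurements from these experiments.

```javascript
// Time lost to latency when every command passes through a load balancer.
// outsideMs: one-way latency from client to load balancer.
// insideMs:  one-way latency from load balancer to the Appium server.
function totalLatencyViaLoadBalancerMs(numCommands, outsideMs, insideMs) {
  // "+ 2" for the new-session and quit-session calls,
  // "* 2" for the request and the response.
  return (numCommands + 2) * (outsideMs + insideMs) * 2;
}

// Hypothetical example: 90 commands, 30 ms outside, 60 ms inside.
console.log(totalLatencyViaLoadBalancerMs(90, 30, 60)); // 16560
```

With an internal latency of zero this reduces to the original single-hop formula, which is a quick way to check that the two agree.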

So here’s an example; this is what I actually ran in my experiments. We’ve got commands going from Vancouver to a load balancer in California and then over to Japan. That’s not completely out of the way, but there is a more optimal path for my requests to take, which would be to go directly from Vancouver to Japan. Depending on the way that internet traffic routing works, the direct path might actually be a lot faster. There are obviously physical constraints due to the speed of light and things like that, but there are also other practical constraints, like which paths packets have to take. It might be even less efficient going through California than this simple picture shows.

So, the good news is that because of some work that we’ve done in the Appium client libraries, cloud providers can now have their load balancers (their entry points) connect clients (your test scripts) directly to the more remote Appium servers (the ones that are directly connected to the devices). That turns the previous architecture into something where, after the initial request to the load balancer, subsequent requests from the client no longer go through the load balancer as an intermediary. Instead, they go directly to the Appium server. It essentially cuts out the middleman, saving any time lost due to latency from making these multiple hops.

We call this Direct Connect. The way it works is that your client first speaks to the load balancer, which starts a session on a more remote Appium host somewhere within the cloud service. Then, when the load balancer responds to your client with the new session result, it decorates that response with information about the specific remote Appium host that is going to be servicing your session.

Now the Appium client can use that information to talk directly to that host, if it wants, rather than going through the load balancer. The way that the load balancer decorates the response is by adding a specific set of keys to the response object, all of which start with directConnect. There are four pieces of information to specify: the protocol that should be used, the host, the port, and the path that should be prefixed to any of the API calls.
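Here is a minimal sketch of how a client might turn those four pieces of information into a base URL. It assumes the keys are named directConnectProtocol, directConnectHost, directConnectPort, and directConnectPath; the helper function, the host name, and the other values are hypothetical and not taken from any particular client library.

```javascript
// Build the base URL of the Appium host named in a new-session response.
// Returns null when any of the four directConnect keys is missing, in which
// case a client would simply keep talking to the load balancer.
function directConnectBaseUrl(caps) {
  const {
    directConnectProtocol,
    directConnectHost,
    directConnectPort,
    directConnectPath,
  } = caps;
  if (!directConnectProtocol || !directConnectHost ||
      !directConnectPort || directConnectPath === undefined) {
    return null;
  }
  return `${directConnectProtocol}://${directConnectHost}:${directConnectPort}${directConnectPath}`;
}

// Hypothetical response values; the host name is made up.
const responseCaps = {
  platformName: 'Android',
  directConnectProtocol: 'https',
  directConnectHost: 'tokyo-host.example.com',
  directConnectPort: 443,
  directConnectPath: '/wd/hub',
};

console.log(directConnectBaseUrl(responseCaps));
```

Falling back to the load balancer when a key is missing is one possible design choice for this sketch, so that a provider which hasn’t turned the feature on still works.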

HeadSpin was able to implement this for me, so I was again able to gather some data on the actual consequences of using this feature in terms of time lost due to latency. So again, let’s run some numbers. This test worked very similarly to the executeDriverScript test. I had all the test clients located in my office in Vancouver, and I ran tests on devices located in California and Tokyo, Japan, sometimes using Direct Connect and sometimes not.

When I wasn’t using Direct Connect, my requests were going to California and then on to the devices. When I was using Direct Connect, they were going directly to the most remote Appium server. Interestingly, and I think this is a good illustration, what we see when using Direct Connect with sessions pointed at the California devices is basically no speed-up: a very small speed-up of 1.2% (which you might consider real or might just be an artifact of the tests that I ran). That’s because the load balancer is in California and the devices are as well.

Whether or not I’m using Direct Connect, basically the commands are following the same route. There might be a little bit of an extra bit of time loss due to latency between the load balancer in California and the devices in California, but they’re so close to each other it basically doesn’t make any difference.

We can see a huge difference, however, when I ran tests on devices in Tokyo. Because when using Direct Connect, my test commands were able to go directly to Tokyo and back rather than going first to California and then onto Tokyo. We saw that just by cutting out this hop to California, we were able to get a speed-up of about 30%. The details of the test that I ran are basically the same as before – about 1,350 commands per condition. These results are pretty solid in my opinion.

To kind of sum up where we’re at with Direct Connect, the problem that it solves is the latency that might exist within a cloud service itself due to the combination of load balancers and geographic distribution of devices. Geographic distribution of devices is great. You can run your tests on devices in Tokyo or devices in Nairobi or whatever, but the cost for that is all this extra latency. Using Direct Connect, the cloud service can help our client know where to speak directly so we no longer have to lose that time by making the extra hop through the load balancer.

So, that’s basically it. The one thing to be aware of is that Direct Connect might not work well if you have a corporate firewall which hasn’t been configured to allow connections to the specific Appium hosts that the load balancer might redirect you to. If your corporate firewall is only configured to allow connections to the load balancer, then when your Appium client tries to connect directly to an Appium host that’s not also in your firewall’s whitelist, you might run into problems. You might not be able to actually use Direct Connect.

The other thing is that because you might want to enable or disable Direct Connect on the client side, you might have to include a special flag in your client initialization options to turn it on. For example, in the WebdriverIO library, which I used to run all these tests, to turn on Direct Connect on the client side you have to set the enableDirectConnect option to true.
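As a sketch, the options object passed to WebdriverIO might look something like the following. The host name, port, path, and capabilities are hypothetical placeholders for whatever your cloud provider gives you; the part that matters here is the enableDirectConnect entry.

```javascript
// Options of the kind passed to WebdriverIO when creating a session.
// Everything except enableDirectConnect is a hypothetical placeholder.
const wdioOptions = {
  protocol: 'https',
  hostname: 'cloud-entry-point.example.com', // the load balancer
  port: 443,
  path: '/wd/hub',
  enableDirectConnect: true, // follow directConnect keys if the server sends them
  capabilities: {
    platformName: 'Android',
  },
};

console.log(wdioOptions.enableDirectConnect);
```

The client still points at the load balancer; the option only tells it that it may switch hosts after the new session response comes back.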

Otherwise, you don’t actually have to know anything about the Direct Connect keys that I showed you. This all just happens transparently within your client code, and you just get the benefits of it as long as the cloud provider has turned it on, as long as your client has it implemented and enabled, and as long as your firewall doesn’t block it. So that is Direct Connect. The other nice thing I guess I could say about it is that unlike executeDriverScript, it doesn’t come with any of the security implications that I talked about before.
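For readers curious about what "transparently" might look like, the following is a rough sketch of the decision a client makes, not the actual code of any Appium client. It assumes the new-session response carries the four Direct Connect keys (directConnectProtocol, directConnectHost, directConnectPort, directConnectPath) without a vendor prefix. The function names and the example hostnames are hypothetical.

```python
from typing import Mapping, Optional

# Keys the server is assumed to return alongside the session capabilities.
DIRECT_KEYS = (
    "directConnectProtocol",
    "directConnectHost",
    "directConnectPort",
    "directConnectPath",
)


def direct_connect_url(caps: Mapping[str, object]) -> Optional[str]:
    """Build the direct URL, or return None unless all four keys are present."""
    values = [caps.get(key) for key in DIRECT_KEYS]
    if any(value is None for value in values):
        return None
    protocol, host, port, path = values
    return f"{protocol}://{host}:{port}{path}"


def choose_endpoint(load_balancer_url: str,
                    caps: Mapping[str, object],
                    enabled: bool) -> str:
    """Use the direct host only when the client opted in and the server offered one."""
    if not enabled:
        return load_balancer_url
    return direct_connect_url(caps) or load_balancer_url


# Hypothetical response from a cloud provider that supports Direct Connect.
response_caps = {
    "directConnectProtocol": "https",
    "directConnectHost": "tokyo-host.example.com",
    "directConnectPort": 443,
    "directConnectPath": "/wd/hub",
}
print(choose_endpoint("https://lb.example.com/wd/hub", response_caps, enabled=True))
```

Falling back to the load balancer when any key is missing means a provider without Direct Connect support keeps working unchanged. A firewall block, on the other hand, would only show up when the client first tries the direct host.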

iOS Startup

All right, as a little bonus, let’s talk a bit about iOS startup times, and we’ll see how this relates to cloud execution performance in a moment. Just as a brief review, iOS startup is slow for several reasons. If you’re using a simulator, it takes forever to boot up; if you’re using a real device, your device cleaning process might take some time. Appium by default always reinstalls and cleans applications, which takes a while. The tool that we use called WebDriverAgent needs to be built, or discovered if it’s already been built, and that can take a very long time if you need to build WebDriverAgent for the first time. Even retrieving a WebDriverAgent that’s already been built can take up to a couple of seconds, because the tools that Appium uses to figure out where it is depend on Apple binaries (which for some reason are slow, and we have no idea why). But there you have it.

Of course, WebDriverAgent also needs to launch on the device itself, come alive, and start listening on a port. Stuff like that can dramatically slow down your iOS startup time. If you stack all these things on top of each other, you could be looking at minutes of startup time, especially if you have a cold simulator boot.

Tips for Improving iOS Startup Time

There are a bunch of ways to work around this when you’re using Appium locally, and we covered some of this in an Appium Pro article not too long ago. Basically if you’re using a simulator, you can pre-launch that simulator and just keep it warm and just keep it running. Appium will use an existing simulator if it’s already running without the need to boot a new one, and that saves a lot of time.

If you don’t actively need your app to be reinstalled and cleaned for every test, you can set the noReset capability to true to save a few seconds there. Appium, in order to launch your app successfully, needs something called the bundle ID from your application, which is a unique identifier given to your app by the developers. Appium has to look that up, and it takes a second or two to parse the various files it needs to find that information. So you can just give it that information directly, and then Appium will not try to look for it. That saves a couple of seconds.

Same thing goes for WebDriverAgent. If you already have WebDriverAgent built on your machine somewhere, you can set the capability called usePrebuiltWDA to true, and then Appium will not try to build it for you. Instead it will look in the place specified by the derivedDataPath capability for the built WebDriverAgent binary. So if you can build WebDriverAgent in Xcode, get the magic directory from Xcode that acts as the derivedDataPath, and give those things to Appium, then you’re all set. You’ll save quite a bit of time there.

Finally, if WebDriverAgent has already been running on a device, it just stays running by default. So, you don’t necessarily need to shut it down and start it every single time you run a new session. For all the sessions except the first one, you could have a flag in your code that just tells Appium where to find WebDriverAgent on the device, which runs on port 8100 by default, I believe. That will cause Appium to skip everything to do with WebDriverAgent and just start talking to it, assuming that it’s there.

So if you use all these techniques together, stack them on top of each other, you can get a 75% reduction or thereabouts in test startup time, which is pretty great.
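As a sketch of how these tips stack, the capabilities might be assembled like this. The capability names are the XCUITest driver's names (noReset, bundleId, usePrebuiltWDA, derivedDataPath, webDriverAgentUrl), written here with the appium: vendor prefix where the talk refers to them unprefixed. The helper name, bundle ID, path, and URL are hypothetical placeholders, and the simulator is assumed to be already booted.

```python
from typing import Any, Dict, Optional


def fast_ios_caps(bundle_id: str,
                  derived_data_path: str,
                  wda_url: Optional[str] = None) -> Dict[str, Any]:
    """Build capabilities that skip the avoidable iOS startup work."""
    caps: Dict[str, Any] = {
        "platformName": "iOS",
        "appium:automationName": "XCUITest",
        # Skip reinstalling and cleaning the app between sessions.
        "appium:noReset": True,
        # Supply the bundle ID so Appium does not have to look it up.
        "appium:bundleId": bundle_id,
        # Reuse a WebDriverAgent build instead of rebuilding it.
        "appium:usePrebuiltWDA": True,
        "appium:derivedDataPath": derived_data_path,
    }
    if wda_url is not None:
        # For sessions after the first: talk to an already-running WebDriverAgent.
        caps["appium:webDriverAgentUrl"] = wda_url
    return caps


# First session: no running WebDriverAgent yet.
first = fast_ios_caps("com.example.MyApp", "/path/to/DerivedData/WebDriverAgent")
# Later sessions: point at the instance left running (port 8100 by default).
later = fast_ios_caps("com.example.MyApp", "/path/to/DerivedData/WebDriverAgent",
                      wda_url="http://localhost:8100")
print(sorted(later))
```

Keeping the WebDriverAgent URL optional reflects the point that only sessions after the first can rely on an instance that is already running.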

This is all pretty awesome for running Appium locally, running on a simulator, stuff like that. But what about running Appium in the cloud? The issue here is that you don’t run and manage the devices when they’re in the cloud. It’s the cloud providers that are managing everything to do with startup, especially for real devices, where cleaning a device and preparing it for new customers is extremely important.

On top of that, cloud providers have high reliability and security requirements. So, some of these tricks, like using noReset or keeping a WebDriverAgent server alive, might be considered either a security risk or a reliability risk. Cloud providers often have to sacrifice a bit of time to make sure that the device is in a perfectly clean and reliable state for a new test that’s coming in, potentially from a completely different customer.

But it’s not all bad news. Since cloud providers are in charge of the whole startup flow, the fact that they manage and control it can actually work for you if they are able to really tune it. I was talking with some folks who work at HeadSpin and thought it was pretty interesting that they’ve been able to work around some of the slowness inherent in Appium and WebDriverAgent by building custom tooling to launch WebDriverAgent and things like that. According to the statistics they gave me, this can cut 10 to 15 seconds off of every session start.

So, this is really great. It really mitigates some of the time that cloud providers have to spend making the phone secure and ready to go for a new test. If they can do stuff like this to really speed up other aspects of the process, you’ll get an all-around better experience. I thought that was pretty interesting and wanted to share it with you all. I’m also happy to say that when I was running all these tests, HeadSpin was the first cloud that was able to support both executeDriverScript and Direct Connect, and they gave me all the devices I used to come up with the numbers.

So, I thought that was pretty awesome. Thanks to HeadSpin for that, and thanks to you all for your attention and for learning with me about executeDriverScript, Direct Connect, and some of this iOS startup time stuff.

That’s it for the presentation. In case you’re unfamiliar with it, I run a weekly newsletter and blog called Appium Pro. We published an article about Direct Connect this morning and one about executeDriverScript last week. There’s a lot of cutting-edge Appium stuff that I write about there, plus tips and tricks and how-tos. So, definitely check that out if you get a chance.


Q: What about errors when the script fails? Will we get an alert?

A: So if you’re using executeDriverScript and one of the steps within the WebdriverIO code fails, what will happen is that you will receive an error in your client. It will say executeDriverScript failed. Here’s the message. You’ll get the error that was thrown in the JavaScript execution context. If you want more information about it, you can look in the Appium server logs and it should have a stack trace for that error message within the lines of code that you sent in the string. So you do get an error. It actually works pretty well in my opinion. It’s just not an error that you can link to a specific line in your test script.

Q: Our driver start for an iOS real device takes about 7 seconds on average. From your experience, is it possible to make it faster?

A: If you’re starting your tests in 7 seconds on a real device, I would say that’s already pretty good. I’m guessing you’re using some of the tips that we discussed here. That’s on the low end of the times that I’ve seen, so there might be ways to improve it a little bit, but you’re already doing pretty well.

Q: What exactly is command execution time? How different will it be between commands like click, send keys, etc?

A: Command execution time is the time that the Appium server takes to execute a specific command once it’s received that command. It varies quite a bit between commands. Click is a pretty fast command. Find element can be fast or very slow, depending on your locator strategy. If you’re sending keys to an input field, sending one character would be pretty fast, but sending 100 characters would be quite slow. So it varies a great deal.

Q: How do I enable this when using a proxy with Appium Desktop?

A: I’m not sure that I understand the question. I don’t think that you can currently enable executeDriverScript when using Appium Desktop. It’s a command-line flag at the moment. There’s probably a way to add it to Appium Desktop, though I don’t think that has been done. I’m not sure what proxies would have to do with executeDriverScript. If you’re talking about Direct Connect, Direct Connect isn’t an Appium feature, so it doesn’t matter whether you’re using Appium Desktop or something else. Appium itself doesn’t know anything about Direct Connect. That’s something that a load balancer at a cloud provider, or something like Selenium Grid, would know about, and that the clients would know about. It would be great to get Direct Connect support into Selenium Grid, but we haven’t looked at what that would entail or whether that would be something that the Selenium team would even be interested in at this point.

Q: Is there a locator strategy you recommend for faster execution?

A: Yeah, I have a whole Appium Pro post on this. Typically we say avoid XPath because it can be slow, except it doesn’t have to be slow. It depends a lot on the query that you use and on how many elements are on screen. There was a talk at Appium Conf 2019 by Jonah Stiennon, who’s a partner at CloudGrey. You can look up that video. He actually did some experimentation on how the performance of different locators changes depending on the number of elements that are on screen. I like to say you should probably just experiment with your own application and your own views to figure out what’s fast. But accessibility ID is typically promoted as the best locator strategy to use, because it’s both fast and has the potential of being unique if you design your app correctly. So, that’s the one that I would recommend for both speed and maintainability.
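One way to run that kind of experiment is a small timing harness like the sketch below. It is a generic helper, not part of Appium: the function name is hypothetical, and the two lookups are stand-ins that merely sleep. With a real session you would pass callables that call find_element on your driver with the locator strategies you want to compare.

```python
import statistics
import time
from typing import Callable, Dict


def time_lookups(lookups: Dict[str, Callable[[], object]],
                 repeats: int = 5) -> Dict[str, float]:
    """Return the median duration in seconds of each named lookup."""
    results: Dict[str, float] = {}
    for name, lookup in lookups.items():
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            lookup()
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


# Stand-ins for real element lookups; the sleep durations are arbitrary.
example = time_lookups({
    "accessibility id": lambda: time.sleep(0.001),
    "xpath": lambda: time.sleep(0.01),
})
for name, seconds in example.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Using the median rather than the mean keeps one slow outlier, such as a network hiccup, from dominating the comparison.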

Q: Does this work for iOS devices?

A: I’m not sure what “this” refers to, but all the things that I’ve described today work no matter what platform we’re talking about. They work on iOS and Android.


About HeadSpin

HeadSpin helps Telcos, media organizations, and large enterprises analyze and improve the user experience of their digital products through its global real device infrastructure, on the edge end-to-end testing, and ML-driven performance and quality of experience analytics.

The HeadSpin data science platform enables collaboration among global teams to accelerate release cycles, build for complex real user environments, and proactively detect and resolve issues whether at the code, device, or network layer. HeadSpin currently works alongside a number of global telco and media organizations today to:

  • Monitor and improve 5G user experience
  • Improve streaming experience for OTT apps
  • Test and optimize data, voice, and messaging services
  • Assess and validate device compatibility
  • Offer regression insights for accelerated development
  • Deploy software at the edge

About Infosys

Infosys is a global leader in next generation digital services and consulting. We enable clients in 50+ countries to navigate their digital transformation. With over three decades of experience in managing the system and workings of global enterprises, we expertly steer our clients through their digital journey. Visit to see how Infosys (NYSE:INFY) can help your enterprise navigate your next.