Good morning everyone, or good afternoon, or evening, or midnight, depending on where you might be in the world. I’m really excited to be here in what is my morning, talking about Appium drivers and Appium 2.0. So, Jason already gave a very complete introduction for me, so I’ll just add this picture. I’m even wearing the same hat today. This is, in fact, what I look like, and I do work on Appium, both the open-source side as well as helping my clients be successful with Appium and mobile automation.
So, today we’re going to have a brief introduction and tour of four drivers which are part of the Appium driver bucket that you might not have used before. We’re going to talk about the Espresso driver, the Windows Driver, the macOS driver, and the Raspberry Pi driver, which is for testing IoT devices. That’s definitely the most out-there of all we’re going to look at.
And of course, we will also touch on Appium 2.0. I’ve been working on Appium 2.0 a little bit, and I thought it was time to start circulating my ideas about what Appium 2.0 is, what’s the vision, and what are some of the technical possibilities and changes that you might expect when it’s released sometime next year.
So first of all, let’s drive a little bit beyond mobile. Let’s go beyond the typical Android and iOS drivers that you probably use with Appium and talk about some of the other possibilities. The basic idea here is that Appium uses something called the Webdriver protocol to facilitate the communication between your test script and the Appium server, which then ultimately performs the automation commands that need to happen.
The Webdriver protocol
The Webdriver protocol, as you can tell in the name, was originally designed for automating web browsers. And, you know, it’s already the case that Appium uses this same protocol to automate mobile applications. We simply extended it in a few ways and added some additional capabilities, but by and large, if you look at an Appium test script, it looks and reads very similar to a Selenium test script.
So the question is: why not go further? Why not go beyond mobile to other types of applications? When you get right down to it, applications are pretty much the same regardless of the platform, whether an app is running in a web browser or mobile device or on a desktop or laptop computer. They pretty much work the same way.
So, that is in fact, what Appium has done with some of these not so well-known drivers. We’ve extended the Webdriver protocol to support many other platforms, including desktop applications. The really nice feature of this capability of Appium is that you can write your test scripts in any language, mix and match them with any framework, and that will target pretty much any platform that’s out there.
Whereas, of course, if you wanted to automate Mac outside of Appium, you would be looking at hacking together your own System Events accessibility scripts and things like this, which would work, but then they’re not going to have anything in common with scripts that you might already have for the web version of your app or the mobile version of your app. So you get to leverage a lot of knowledge, a lot of code, and a lot of framework infrastructure and reuse all of that stuff when you rely on Appium as your automation framework. So, that’s part of Appium’s vision. So, let’s look at some of these drivers and what they can do.
First of all, let’s talk about the Espresso driver. Now, Espresso is the latest Android automation framework from Google. It’s not very new anymore.
It’s been out for years now, but it is the recommended approach for automation of Android applications from Google. So if you’re just an Android developer using Android studio, and just reading the Google developer documentation, you’ll see a lot about Espresso and using Espresso to automate your app for testing purposes.
Key things to understand about Espresso. First of all, Espresso relies on a gray box testing model rather than the black box testing model that Appium usually uses with all of its other platforms.
So, the difference here is that a black box testing model means the automation comes in from the outside, treating the application as a black box, and it can only do those sorts of things with the application that a user can do like: tapping on elements or reading things off the screen to get information about the state of the application. It doesn’t have access to the application internals. Now, in gray box testing or white box testing, application internals are available to the test script. So, the test script is running in a context where it has access to the application source code and can therefore trigger different commands or get different information about the state of the application from inside of the application itself.
It’s like being able to look a little bit into the application and do some things there. Another great feature of Espresso is what’s called idle synchronization – and this helps to address a huge problem in any kind of UI automation – which is that the automation is often trying to do things when the UI is not actually in a resting state.
So, if you’ve ever encountered element-not-found exceptions, it’s probably because your test code made an assumption about the state of the application and the application hadn’t painted that view yet or hadn’t navigated to that webpage yet. In other words, the application was doing something when you tried to interact with it, and it wasn’t ready. And, a human user uses their eyes and their brain to detect when an application is doing something, either there’s a spinner or maybe in a browser the little loading progress bar is still going across the top of the page, or maybe they’re looking for a specific element, and it’s not there yet, so they’re just waiting for it to show up.
What Espresso does is it actually detects when the application is in the middle of doing something, so that it knows that the application isn’t quite ready for user interaction. So if you send in a command to Espresso while it’s waiting, it just kind of keeps that command and doesn’t execute it until the app is actually ready. So this helps add stability and robustness to tests, because they don’t experience the same kind of flaky errors due to the app state not being what you expect. There’s some complexity here, and Espresso doesn’t know about all the different ways that your app could be busy. So, sometimes you have to teach it when your app is busy so that you can take advantage of this idle synchronization, but it’s a nice feature.
One of the other takeaways for Espresso is that because it is tied in with the application itself, it is isolated to the app under test. So, you can’t use Espresso to automate all aspects of the device UI, the home screen and things like that. For that you would need to use the existing UIAutomator2 driver, which works by automating the accessibility layer of the device which is kind of a layer that sits above all the applications. So it’s not tied to any specific application. With Espresso, we also have access to some advanced ways of getting ahold of elements. We can even find elements that haven’t shown up on the screen yet, if they’re part of a data set that is bound to, for example, a data grid or a list view of some kind.
We can also find elements by a view tag, which is an Android specific piece of information about elements that can be added by developers. This is especially helpful for React Native, because React Native puts the test IDs into the view tag on Android. And, right now the Espresso driver is the only Appium driver that actually gives you access to the view tag of an element.
If you want to use the Espresso driver, these are the kinds of capabilities you need to worry about. Basically, they’re just like any other Android automation capabilities with Appium, except that you need to set the automationName capability to Espresso.
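As a rough illustration of the capabilities just described, here is a minimal sketch that builds them as a plain Java map. Only the automationName value comes from the talk; the class name, the deviceName value, and the app path are placeholders chosen for this example, and in a real test the map would be handed to whatever capabilities object your Appium client library expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EspressoCaps {
    // Builds the capability set for an Espresso session.
    // appPath is a placeholder for the path to your own .apk file.
    static Map<String, Object> build(String appPath) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("platformName", "Android");
        caps.put("deviceName", "Android Emulator"); // assumed value for illustration
        caps.put("automationName", "Espresso");     // the one setting that selects the Espresso driver
        caps.put("app", appPath);
        return caps;
    }

    public static void main(String[] args) {
        // Prints the capabilities so they can be inspected before starting a session.
        System.out.println(build("/path/to/your/app.apk"));
    }
}
```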
On this screen, we have an example of a command which is only available on the Espresso driver. If you look kind of in the middle of the screen here, we have something that reads “driver.executeScript” and we’re executing this Appium script “mobile:backdoor.”
This is the backdoor method which enables us to get inside of our application code and call methods internal to our application that a user would never see from the outside. So this is kind of what I was talking about earlier with the gray box testing model. This is how Appium gives you access to this gray box aspect of Espresso.
In this case, to figure out what we’re trying to do in the application, we can take a look at the scriptArgs variable here, which is basically just a big map of maps and we’re basically saying: “we want to call a method that exists on the application. The method’s name is raiseToast.” If that’s appropriately named, it’s going to show some message on the screen. And the argument that we are passing to this method is a string, and it has the value “Hello, from the test script!”
If you look down at the very bottom line, you’d see what we would write in Java if we were writing code inside of the application that leverages this raiseToast method. We’d basically be calling the raiseToast method on the main application Java class and would be calling it with this string argument.
The scriptArgs variable above is just our way of encoding all the information as a way to pass it to Appium, so that it can then call it within the Espresso context. I’ll show you a video of what it looks like when that code is executed.
You can see it happened really quickly, but we saw this “Hello, from the test script!” message pop up. Let me see if I can show you this again. My app very quickly pops up and then we get this message: “Hello, from the test script!” That was not triggered by any kind of UI interaction; it was triggered by directly calling that method internal to the application using the Espresso driver.
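For readers without the slide, the “map of maps” described above could be assembled roughly like this. This is a sketch under assumptions: the key names (target, methods, name, args, value, type) follow my understanding of the Espresso driver’s backdoor arguments and are not confirmed by the talk itself, and the class name is invented for the example. The call that would send it to a live session is shown only as a comment.

```java
import java.util.List;
import java.util.Map;

class BackdoorArgs {
    // Encodes "call raiseToast(message) on the application class" as nested maps.
    // Key names are assumed; check the Espresso driver documentation before relying on them.
    static Map<String, Object> raiseToastArgs(String message) {
        Map<String, Object> arg = Map.of("value", message, "type", "String");
        Map<String, Object> method = Map.of("name", "raiseToast", "args", List.of(arg));
        return Map.of("target", "application", "methods", List.of(method));
    }

    public static void main(String[] args) {
        Map<String, Object> scriptArgs = raiseToastArgs("Hello, from the test script!");
        // With a live Espresso session, the call would look something like:
        //   driver.executeScript("mobile: backdoor", scriptArgs);
        System.out.println(scriptArgs.get("target"));
    }
}
```

Wrapping the encoding in a helper keeps the nesting out of the test body, which is where mistakes in this kind of structure tend to creep in.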
If you want to learn more about the Espresso driver, I have an article about it – specifically this back door method – here.
Okay, let’s talk about the Windows Driver. The Windows Driver is pretty awesome in that it’s actually powered by a tool provided by Microsoft itself. So, Microsoft decided to build an Appium-compatible automation tool. They call it WinAppDriver. So, the Appium Windows Driver is basically just a very small bridge to WinAppDriver, which is maintained by Microsoft. Actually, it’s up on GitHub, so you can go and check it out and ask for improvements if you want.
Some key things to understand about the Windows Driver. First of all, it requires developer mode to be turned on for the machine or for the user you’re logged in as. It also requires Appium to be run in a console which has administrator privileges. You need to do the whole trick where you find the command prompt in the Windows Start menu and launch it with “Run as administrator” (for example, by pressing Ctrl+Shift+Enter instead of Enter). Whatever you’re running Appium in, whether that’s the typical command prompt or PowerShell or some kind of bash equivalent on Windows, you need to make sure it has admin privileges.
The way that you launch applications with the Windows Driver is by an application ID. Usually with Appium you pass a path to the application on disk, but this doesn’t work for the Windows Driver. Instead, every application that’s installed and registered with the system has a particular ID, and these IDs can look pretty crazy. They’re unique ID’s.
I found this command you can run in PowerShell called Get-StartApps, and it will list all of the apps that are available and each of their IDs. If you want to automate a particular application, run this command, look for the app in the list, and you can get its ID. As far as I know these IDs, especially for Windows applications, are the same across all Windows installs. It’s not like they differ from computer to computer, but they’re unique in the world of Windows applications.
The other thing to be aware of with automating Windows applications is that apps don’t have an accessibility ID on Windows. But, Microsoft implemented this attribute called “AutomationId”, which is specifically for WinAppDriver and other automation tools.
It’s pretty nice that they built this in. If you get your page source from a Windows application, look through it, and see an attribute labeled “AutomationId”, you can find the element with that attribute using Appium’s accessibility ID locator strategy, so driver.findElement(MobileBy.AccessibilityId(...)) and so on. That’s how you would use that.
For the Windows Driver, the capabilities look something like this: platformName is Windows, platformVersion is 10.
This only works on Windows 10. Although it can automate Windows apps which are quite old, I think it has to run on a relatively new machine. The deviceName is Windows PC, and then here’s an example of an app. In this case, it’s the weather app that comes with Windows 10, and this was the ID that I found using the Get-StartApps command. This is what a test of the weather application could look like on a Windows machine.
You can see it looks basically like any other Appium test. We’re finding some things by accessibility ID, some other things by XPath. This particular test is trying to find the different days which are displayed in the weather application and get some information about each of the days, for example, the sunrise and the sunset of that particular day. When we run this example, it prints that out to the console.
It’s RPA for Windows, getting some weather data from the weather app. Obviously, there are more efficient ways to get weather data. We could use an API, but this is a fun demonstration of Windows automation. This is what it looks like when it’s running.
We actually tap through each of these different days, and then you can see sunrise is down in the day details. So, that’s what we’re scraping off as we tap through each of these different days. If you want to learn more about Windows Automation and get the full code for automating that weather app on Windows, check out Appium Pro Edition 81.
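To make the Windows capabilities discussed above concrete, here is a minimal sketch that collects them in a plain Java map. The class name is invented for the example, the deviceName is written as one word on the assumption that this is how the capability is usually spelled, and the app ID is deliberately left as a placeholder: use whatever Get-StartApps prints on your own machine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class WindowsCaps {
    // Builds the capability set for a Windows Driver session.
    // appId is the application ID reported by Get-StartApps, not a file path.
    static Map<String, Object> build(String appId) {
        Map<String, Object> caps = new LinkedHashMap<>();
        caps.put("platformName", "Windows");
        caps.put("platformVersion", "10");
        caps.put("deviceName", "WindowsPC"); // assumed spelling; the talk says "Windows PC"
        caps.put("app", appId);
        return caps;
    }

    public static void main(String[] args) {
        // Placeholder only; substitute the real ID from Get-StartApps.
        System.out.println(build("<AppID from Get-StartApps>"));
        // Once a session exists, elements exposing an AutomationId could be located
        // with the accessibility ID strategy of your Appium client, for example:
        //   driver.findElement(MobileBy.AccessibilityId("<AutomationId value>"));
    }
}
```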
Okay, let’s move on to the macOS Driver. The Mac Driver, as we call it, relies on the system accessibility frameworks to give control over the entire desktop. macOS comes with a System Events framework that allows applications to basically see everything that’s on the screen, to interact by moving the mouse or by clicking on different things. It’s a very powerful kind of automation.
However, the downside is that apps have to be specifically given the permissions to control the computer in this way. This is really good for security purposes. You don’t want random applications controlling your computer without your knowledge, but it does mean that there’s some manual setup that’s required to make sure that Appium has the ability to automate the system in this way.
Key things to understand about the Mac driver. First of all, there is an actual Mac application, a .app that runs on a Mac called AppiumForMac, and this has to be manually downloaded and installed and put into your applications folder.
AppiumForMac must also be granted accessibility control over the system, and it’s actually not just AppiumForMac that has to be granted accessibility control, but whoever is the parent process of AppiumForMac. So, if you’re running Appium from the terminal application and you’re running a Mac test, then Appium will be running inside the terminal, and it will attempt to start AppiumForMac as a subprocess. The parent process of AppiumForMac is the terminal. That means you have to grant the terminal accessibility control over the system.
This is something you may not want to do on your own machine for security reasons, or you might want to turn it off. Because if the terminal has accessibility control over your system, then you could imagine if somebody tricks you into running a malicious script in your terminal outside of your knowledge, something might happen on your machine.
It’s important to consider your device under test to be something separate from your development and test development workstation.
You probably want to have a separate Mac Mini lying around or use Mac Stadium or something like this in order to run an Appium Test. And, then you might not care so much what happens on that system or you can find ways to ensure that malicious things don’t happen using networking protocols or keeping it internal to your local office network.
Okay, so the way that you open and launch applications with the Mac driver is simply using driver.get(), so instead of putting in a URL for a web browser, we’re putting in the name of an application. If I say driver.get(‘Calculator’), it will look for Calculator.app in my applications folder. So that’s pretty simple.
One other wrinkle about the Mac driver is that there’s only one locator strategy, which is XPath. While in your test code you’ll be writing things like driver.findElement(By.xpath()), the actual flavor of XPath which is used by the Mac Driver is something called Absolute XPath, or AXPath: XPath without any relative nodes or queries or searches.
Every XPath query must be fully qualified. I’ll show you an example of what this looks like. What this means is that the selector for different elements can be quite long and quite tedious, and in some cases potentially quite brittle.
You have to do some imaginative thinking to make sure that your XPath locators are going to be robust. To figure out what the locator of an element is: there’s a special feature that AppiumForMac has, where if you have it loaded, you can put a mouse pointer over any element on the MacOS desktop or operating system or other applications and hold down the function key for a few seconds. It copies the XPath for that particular element to the clipboard.
Capabilities for this driver are pretty straightforward. Platform and device name should both be Mac, and if you want an app to start automatically, you can just put its name as the app capability.
Here’s what a sample test looks like for the Mac driver. I’m testing the activity monitor application in this particular example, so you can see that I have a bunch of strings here where I’m defining XPath selectors by building them up from a base accessibility or absolute XPath.
The base XPath, in this example, is AXApplication/AXTitle=Activity Monitor/AXWindow. And the AXTitle=Activity Monitor portion is extremely important, because that is what ensures that all of my other selectors, which I build off of this base string, take place within my application and not within some other application.
Then I’ve got something called a tabSelectorTemplate, which helps me to select different tabs in the application, so the different tabs like: memory, energy, disk, network, or CPU. You can see again how I’m using this XPath filter, making sure that the accessibility title of that particular tab is memory, or energy, or disk, or network, or CPU. This is a pretty reliable and robust way of finding these elements, time and time again, without relying completely on their position within the application.
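The idea of building selectors up from a base string and a template can be sketched as follows. This is an illustration under assumptions, not the code from the slide: the bracketed attribute syntax reflects my understanding of how AppiumForMac writes absolute AXPaths, the intermediate toolbar path is hypothetical, and the class and method names are invented for the example.

```java
class ActivityMonitorLocators {
    // Base selector: anchors every query to the Activity Monitor application window,
    // so nothing can accidentally match an element in another app.
    static final String BASE_AX_PATH =
        "/AXApplication[@AXTitle='Activity Monitor']/AXWindow";

    // Template for a tab button. The path between the window and the button is
    // hypothetical; copy the real one from AppiumForMac's clipboard feature.
    static final String TAB_SELECTOR_TEMPLATE =
        BASE_AX_PATH + "/AXToolbar/AXGroup/AXRadioGroup/AXRadioButton[@AXTitle='%s']";

    // Fills the template with the title of the tab we want to select.
    static String tabSelector(String tabTitle) {
        return String.format(TAB_SELECTOR_TEMPLATE, tabTitle);
    }

    public static void main(String[] args) {
        for (String tab : new String[] {"Memory", "Energy", "Disk", "Network", "CPU"}) {
            // With a live session: driver.findElement(By.xpath(tabSelector(tab))).click();
            System.out.println(tabSelector(tab));
        }
    }
}
```

Filtering on the title rather than on an index is what keeps the selector stable if the order of the tabs ever changes.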
All I’m doing in this little test is tapping through the various bits of the activity monitor application. Eventually, I am typing something into a search field and then getting the text of that search field and asserting that is what I ultimately typed into it. So, it doesn’t really do anything particularly useful, but you could see how you could potentially automate applications in a useful way using these commands.
Here’s an example. Tab through all the different aspects of the activity monitor and then type into the search field. This was us controlling a Mac application using Appium. If you want to learn more about this, check out Appium Pro’s Edition 52.
Raspberry Pi Driver
Okay, let’s move on to our last driver of the day: the RaspberryPi Driver. So, this is a special one. The idea behind this driver, and other similar drivers, is that applications don’t actually always have user interfaces. What about IoT devices?
These are physical things that take some kind of sensory input, whether it’s electrical or pressure or temperature or moisture or anything like that, and use that input to send data to something online or control something in your house using a feedback system. So, these are not traditional devices with user interfaces. But, of course, they could still be tested and they could still be tested in automatic fashion. So, why not use Appium to do this?
This was the experiment that I tried to do for myself when I was developing my presentation for the Appium conference earlier this year in Bengaluru, India. I took this circuit playground express from Adafruit Industries.
This is basically a little hackable circuit board that has a bunch of sensors built into it and can do different things like turn on or off a light or output some sound. I thought: “okay, I want to figure out how to test IoT devices with Appium, but first of all, I need to have an IoT device to test. So I need to build some kind of IoT device to begin with.”
So, I built this little drum machine. When I tap those different buttons, the light changes on the circuit playground express. Also, you can’t hear it on the webinar, but some sound is emitted from the headphone jack. Actually, it’s not a jack. It’s one of these little electronic pads that we can connect a headphone to with some wires and things like that. A little drum sample is played when I hit each of these buttons. I’ve got a kick, a snare, a hi-hat, and a tom that are attached to each of these different buttons. It’s a little drum machine that I developed.
Then I asked myself: “okay, I’ve got a drum machine. Now how would I want to test this?” There are a bunch of ways to think about testing this. There’s software running on the circuit board, which I wrote, so I could just write some other software that would send the same commands as the real software does, whenever it waits for someone to tap the button.
That would be kind of like one layer of testing, but I wanted to go a layer more real. What I thought is that each of these buttons works by changing the electrical signal that is going into the circuit playground board. Whenever I tap one of the buttons, with this particular design, a circuit is broken, and the circuit playground express software is listening for that circuit to be broken. And when it’s broken, it emits the appropriate sound and changes the light.
I thought: “I don’t need to physically tap a button to make this happen. I could just send electrical signals into the same points on the circuit board. Whether that signal counts as a high or a low signal will then cause the circuit playground software to believe that a button has been pressed.” This is like a functional approach to testing a circuit board: sending in or removing real electrical signals from the physical ports on this board.
So to do that, I got something called a Raspberry Pi. A Raspberry Pi is a little computer that’s just printed on a circuit board. It’s pretty awesome. I recommend playing around with them. The important part about this Raspberry Pi is it has something called the GPIO header. It’s up at the top here. It’s in two different rows of pins and GPIO stands for “general purpose input output”.
These are pins that the Raspberry Pi can use to send electrical signals through. What I wanted to do was take this Raspberry Pi and connect wires from the little pins here to the ports on the circuit playground circuit board. Then have the ability to send electrical signals from the Raspberry Pi, mimicking me pressing one of the drum machine’s buttons physically in reality.
What I was able to do then is develop an Appium Driver for the Raspberry Pi that lets me write an Appium test to say when these different pins on the Raspberry Pi should emit or stop emitting electrical signals. By doing that I was able to construct an Appium test script that drove the circuit playground IoT application of a drum machine without actually having to tap the drum buttons myself, but also without mocking the connection. I’m still sending electrical signals the exact same way that tapping the button would.
These are the elements that correspond to the different I/O ports on the circuit playground express circuit board, and I’m making electrical signals either go into or not go into those ports by using the sendKeys command. I can send a 0, which means no signal, or I can send a 1, which means signal.
That’s how this works. Let’s see. Now what’s happening in the other terminal window. I’m starting up the Appium server on the Raspberry Pi itself.
And, here’s a video of how it’s all connected. I’ve got the wires coming out of the Raspberry Pi connected on top of the other wires coming from the buttons because I’m not modifying my app under test, and now I’m running my test script.
You can see that as the commands are being called, the different drum lights are being activated. We can’t hear it, but the different drum sounds are being emitted as well. So that was how we ran an Appium test of an IoT device using electrical signals, a Raspberry Pi, without modifying the drum machine device under test. That was pretty fun.
The key thing to understand about this Raspberry Pi driver is that it’s not an official driver yet. It basically runs in standalone mode. It runs on the Raspberry Pi itself. So you have to install an operating system on the Raspberry Pi and clone and build this driver with Node.js, which means you have to have Node.js installed as well.
The idea of an app is a bit different in that there’s no software application, instead what we’re doing is defining a set of electrical inputs and outputs. So the app capability is actually just a JSON object, which defines which pins have which names and what their initial states should be – whether they should be high or low.
Then we can use the driver to find an element by ID, that is, a pin by the name that we gave it in the app capability, and then all we do is send a 0 or 1 to that pin using sendKeys to set the state of the pin to low or high. So, it’s pretty simple. There’s not a whole lot you can do with electrical signals like this.
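To picture what such an app capability and the matching test commands might look like, here is a small sketch. Everything in it is hypothetical: the talk does not show the driver’s actual schema, so the field names (pin, init), the pin numbers, and the class name are invented for illustration. The initial state is high because, in the drum machine described earlier, a button press breaks the circuit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DrumPinsSketch {
    // Hypothetical "app" capability: pin name -> header pin number and initial state.
    static Map<String, Map<String, Object>> appCapability() {
        Map<String, Map<String, Object>> pins = new LinkedHashMap<>();
        pins.put("kick", pin(11));   // pin numbers are made up for this example
        pins.put("snare", pin(13));
        pins.put("hihat", pin(15));
        pins.put("tom", pin(16));
        return pins;
    }

    // Describes one pin that starts in the high (circuit closed) state.
    private static Map<String, Object> pin(int number) {
        Map<String, Object> p = new LinkedHashMap<>();
        p.put("pin", number);
        p.put("init", "high");
        return p;
    }

    // Translates a desired pin state into the text sent with sendKeys.
    static String keysFor(boolean high) {
        return high ? "1" : "0";
    }

    public static void main(String[] args) {
        System.out.println(appCapability());
        // With a live session, simulating one kick drum hit might look like:
        //   driver.findElement(By.id("kick")).sendKeys(keysFor(false)); // break the circuit
        //   driver.findElement(By.id("kick")).sendKeys(keysFor(true));  // restore it
    }
}
```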
If you want to read the full scoop on this, I’ve got two editions on Appium Pro starting at number 74.
Okay. Now let us wrap up by discussing Appium 2.0, and then we can take some questions. The idea behind Appium 2.0, the big picture in my mind, is that Appium goes from being a tool and automation library to being a platform for a whole automation ecosystem that spans devices and platforms and frameworks and everything else.
So, the idea is that moving forward, Appium itself is going to be this one small piece of the puzzle and then we’ll have many different drivers and many different plugins – all of which have their own independent existence and development trajectories and everything else, but they are integrated with Appium as and when you need them to be. So they’re not all bundled together into one big package the way that they are now.
So, with Appium 2.0, we’ll have a whole new set of command line instructions you can run. For example, you’ll have the set of Appium driver commands. So you’ll be able to list which drivers are installed or available to install. You’ll be able to install a specific driver from the standard repository or from anywhere on npm, or anywhere on GitHub, or anywhere on your local file system.
You’ll be able to uninstall drivers. You’ll be able to update them. So, you’ll be able to take just one driver, just the XCUITest driver, and say: okay Appium, update this driver to the latest for me, but the UIAutomator2 driver, I like it how it is. You know, I don’t want to upgrade to the next version of that, because it has some breaking changes, and I’m not ready for them. So just update this one driver for me.
The idea here is that Appium’s different drivers are for completely unrelated platforms. They have completely unrelated development cycles and different technologies that are used in their development.
They shouldn’t really be bundled together in a way which combines a certain version of the XCUITest driver with a certain version of the UIAutomator2 driver, the way it is now in Appium, because those are just unrelated things. So they should be able to vary with respect to one another. We also will have a command to verify your driver manifest to make sure that all the drivers that Appium thinks are installed are actually installed and so on.
Let’s talk a little bit more about the new driver model for Appium 2.0. The basic idea here is that not just the Appium team, but anybody can create and publish a driver that anybody else can use. So right now, if you want to create a driver for your team to use you can do that, but you have to modify Appium Source code to get it plugged in, or you have to convince the Appium team that your driver is useful enough to be considered one of the standard drivers that’s a part of Appium. So the Appium team has been hand importing and coding connections to those drivers into the Appium source code, and that’s not going to happen in the future.
Instead, the Appium team will just maintain a list of supported drivers and what their names are. So, the official drivers will just be a list that everybody will be abl