Hello everyone and welcome back to our Appium Pro intro workshop. In this module, we are going to look at Appium's architecture. You might be asking, "Jonathan, why are we looking at architecture? I just want to use Appium. I don't want to know how it's built." Well, I find it useful to understand a little bit about how something is constructed when we use it just in case something goes wrong. That way we have an idea of what the problem might be. We can ask for help responsibly among other things. In general, it's just really interesting and I think good to know what's fundamentally going on when you're running an Appium test. So let's dig in.
Appium's overall architecture is basically this. Appium is nothing more than a web server, an HTTP server, which speaks a particular protocol. The protocol it speaks is called the WebDriver protocol. The WebDriver protocol was invented originally by the Selenium Project for the purpose of web automation which is why it's called WebDriver. The WebDriver protocol was later ratified as an official standard for web browsers by the World Wide Web Consortium or W3C. This means that every web browser you use supports the WebDriver protocol out of the box. So Appium is basically the same type of thing as Selenium. It's a web server which just speaks the same protocol that Selenium does.
What this means is that when you're writing an Appium test and you're writing your Java code or whatever it is, and you're calling what look like Appium commands, those commands are not, at the end of the day, implemented in the Java package that you're importing into your code and using in your test scripts. Instead, what you're importing into your code and using in your test scripts is actually a client library. This client library has the responsibility of turning the commands that you run into HTTP requests and sending them to Appium. When Appium receives these requests over the network, whether that's the local network or the wider internet, it turns those requests into automation behaviors on the device, but it does not do this directly.
Typically, it does this by forwarding the command that you've expressed with your HTTP request to something called a driver. Now a driver is something which is responsible for actually performing automation on a specific device, a specific platform. Once that automation is performed and the driver has figured out whether it was actually successful... or maybe it was a type of command that was designed to retrieve information from the device, and so there's some kind of response. Whatever that result is, it is returned to Appium from the driver and then Appium wraps it up according to the WebDriver protocol as an HTTP response, which is then sent back to your client library, your Appium client library. Your Appium client library then parses this response into a format which is native for your programming language. So in Java it might be a special kind of class, for example. You can then use this object the way you would use any other object that you use in your programming language, making it very convenient for you to perform automation using Appium from your particular programming language.
Let's talk a little bit more about the WebDriver protocol. The WebDriver protocol is an object-oriented browser automation API, exposed via HTTP. So what do we mean by this? Well, when I say object-oriented, I mean that there are different objects which are represented in automation and these objects have some kind of representation in the API as well as in the browser. Both the API and the browser share some kind of knowledge of these objects, which can then persist throughout the course of a test session or an automation session.
So the API has a bunch of things you can do with it, a bunch of commands you can run. Things like getting the source of an HTML page or finding an element or sending some keystrokes into a text box. These are called commands. Now some commands take parameters that are used to modify the way that the command works or to make sure that the command is doing the right thing on the device. There are a couple of different ways that parameters are expressed in the WebDriver protocol. One way is through the HTTP method. HTTP methods are, for example, GET, DELETE, POST. Those are actually the only three that are used in the WebDriver protocol. We also have the route. Now the route is the portion of a URL which comes after the server name. So if I am talking about a URL like example.com/foo, then example.com is the host and /foo is the route. Now with POST requests, we can also send a JSON body which can encode further parameters for a particular command.
Once a command is executed, the response is itself encoded as a JSON object and returned to our client script that way. So that's the basic structure of the WebDriver API. This is a little abstract just talking about it in words like this. So let's look at a few examples.
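To make the method, route, and JSON body idea a little more concrete, here is a minimal sketch in Python using only the standard library. The server address and the helper name build_request are assumptions made for illustration, and nothing is actually sent over the network; the request is only constructed so we can look at its parts.

```python
import json
import urllib.request

# Assumed address of a locally running Appium server (illustrative only).
BASE_URL = "http://127.0.0.1:4723"


def build_request(method, route, body=None):
    """Build (but do not send) the HTTP request for one WebDriver command."""
    data = None
    headers = {}
    if body is not None:
        # Parameters beyond the method and route travel as a JSON body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json; charset=utf-8"
    return urllib.request.Request(
        BASE_URL + route, data=data, headers=headers, method=method
    )


# Host plus route make up the URL; the HTTP method completes the command.
req = build_request("POST", "/session", {"capabilities": {}})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The point of the sketch is just that a command is fully described by three pieces of data, which is why any tool that can make HTTP requests can talk to the server.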
Here we have our first example. On the left hand side, we are describing the HTTP method (in this case POST) and the route (in this case, /session). Now together, these things define a particular WebDriver protocol command. When I combine a POST method with a /session route, what I'm saying to the WebDriver server, or in our case the Appium server is that we want to initialize a session. Now I'm not showing here the kind of parameters that are passed to tell the Appium server what kind of session we want, what sort of device and what sort of operating system and so on.
That would be encoded as a JSON body. So the WebDriver protocol specifies that when we call this command what we get back is a JSON object whose value contains a session ID. So in other words, the server, in this case the Appium server, when it receives this command from the client, will use the information provided to start a session and then it will create a unique ID used to refer to just this session. That's partly what I mean when I say that the WebDriver API is object-oriented; the session is the first object we're encountering here. The server has the concept of a session which exists over time. Even though individual HTTP requests and responses are stateless (they don't remember anything from one request to the next), the server knows that you're going to be sending multiple commands in the course of a given test, in the course of a given session.
So it saves this idea of a session internally and creates an ID that it maps to that session. It gives the client the ID so that the client can then send that ID back in connection with future requests. So let's look at an example of that. The next API example is a command called find element. This command uses an existing automation session ID to find an element in the UI. This command is defined as requiring a POST HTTP method and it exists at the route /session/:sid/element. Now when I say :sid here, what I mean is actually that this is the same as a session ID which we got in the response to a call to POST /session earlier. In other words, we can't use this find element command unless we've created a session and unless we have its ID. So here's an example of the client referring to a session object via its ID.
So it sends this ID back to the server so the server knows, hey, we are going to find an element on this particular session. We're not going to find an element in some other session that might be running at the same time. So when we find an element we, of course, need to say something about what we're trying to find, and that gets encoded as a JSON object and isn't shown here. If Appium is able to find the element, it associates it with a brand new unique ID and it returns that ID to the client because, of course, it can't send UI elements over the internet. The internet can only send text. So what we do is we create an ID to refer to that element and we pass that ID back to the client. That's our second type of object that we have in this API. We have the element object. So elements, as UI elements, only exist on the device, not in your test script.
Your test script merely knows about element IDs that it can pass to the Appium server, which can then associate those IDs with an actual real UI element that it wants to interact with. So here's an example of using an element ID. This is the command to click an element. It uses an HTTP method of POST and the route is defined as /session/:sid/element/:eid/click. Here the :sid and :eid refer to the session ID and element ID respectively. So in this example, we are referring to our session object and to one particular element object which we have found in a previous call to find element. We're telling Appium, "Click on that element for me."
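The ID bookkeeping described above can be sketched as a toy, in-memory registry. This is not how Appium is actually implemented; the names ToyServer and FakeElement and their methods are hypothetical, chosen only to show that the client ever sees IDs while the real objects stay on the server side.

```python
import uuid


class FakeElement:
    """Stands in for a real UI element, which only exists on the device."""

    def __init__(self, name):
        self.name = name
        self.clicked = False


class ToyServer:
    """Hypothetical stand-in for the server-side ID bookkeeping."""

    def __init__(self):
        self.sessions = {}  # session ID -> {element ID -> element}

    def new_session(self):
        # POST /session: create a session and hand back only its ID.
        sid = str(uuid.uuid4())
        self.sessions[sid] = {}
        return sid

    def find_element(self, sid, name):
        # POST /session/:sid/element: the element stays here, the client gets an ID.
        eid = str(uuid.uuid4())
        self.sessions[sid][eid] = FakeElement(name)
        return eid

    def click(self, sid, eid):
        # POST /session/:sid/element/:eid/click: map both IDs back to real objects.
        self.sessions[sid][eid].clicked = True


server = ToyServer()
sid = server.new_session()
eid = server.find_element(sid, "login button")
server.click(sid, eid)
print(server.sessions[sid][eid].clicked)
```

Because every lookup goes through the session ID first, an element ID from one session is meaningless in another, which matches the idea that a command only ever acts on the session it names.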
Let's look at just a few more examples, but this is basically the structure of the API. If you can understand what's going on here, nothing in the API will be a surprise to you any further. So the next example lives at a /value route after including the session and element IDs. What this does is take, again, a JSON body with particular text in it and input that text into an element in the application which can receive text input, for example, a text field. So this is how you type into things using Appium. So far we've only seen POST methods, but we do have GET methods as well in the WebDriver API. We can, for example, get the text of an element. Using its ID and also the session ID, of course, we can ask Appium to tell us the text which is being displayed on a particular element at a given point in time.
Once we're done working with elements and performing all the automation steps that we want to for the purpose of implementing our test, we need to close out our session. This is so that the Appium server knows that it can go ahead and run its cleanup routines and make a device available for testing again. To quit a session in this way we need the session ID, of course, so that the Appium server knows which session to quit. The HTTP method which is used in this case is DELETE, to reflect that we want to sort of kill or undo or get rid of a session one way or another. So this is basically the WebDriver API. All that we would need to do to understand the complete API would be to add more and more things to this list, but I think that you probably understand how it works.
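The commands we have walked through can be collected into one small table. This is a sketch in Python; the routes for getting text and quitting are written out here following the WebDriver spec's pattern, and the table name, the resolve helper, and the sample IDs are made up for illustration.

```python
# Command name -> (HTTP method, route template) for the commands covered so far.
COMMANDS = {
    "new_session": ("POST", "/session"),
    "find_element": ("POST", "/session/:sid/element"),
    "click": ("POST", "/session/:sid/element/:eid/click"),
    "send_keys": ("POST", "/session/:sid/element/:eid/value"),
    "get_text": ("GET", "/session/:sid/element/:eid/text"),
    "quit": ("DELETE", "/session/:sid"),
}


def resolve(name, sid=None, eid=None):
    """Fill the :sid and :eid placeholders of a command's route template."""
    method, route = COMMANDS[name]
    if sid is not None:
        route = route.replace(":sid", sid)
    if eid is not None:
        route = route.replace(":eid", eid)
    return method, route


# Sample IDs are invented; real ones come back from the server.
print(resolve("click", sid="abc123", eid="el-1"))
print(resolve("quit", sid="abc123"))
```

Extending the API is then just a matter of adding rows, which is really all the rest of the spec amounts to.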
Now, of course, you don't need to know any of this in order to write Appium test code because Appium test code doesn't use raw HTTP methods and routes and parameters. It's all encapsulated in something which is very convenient for you as a user of your particular programming language, but it's good to know that this is how the API works. You could, for example, just use cURL or some other API pinging tool to construct requests to an Appium server without any programming language involved at all. You don't need to write code in a programming language to run an Appium test. You can just send HTTP requests if you want.
We should say one further word about the WebDriver protocol, which is that there are actually sort of two versions of it. There is the old version, which isn't really around in the wild too much anymore, thankfully. That was called the JSON Wire Protocol.
This is what Selenium and Appium used for quite a few years. It predated the official W3C WebDriver spec. At this point, all Appium and Selenium servers support the new spec and so we hopefully don't need to worry about this old one. But I mention it just because you might be working with an older version of a client or a server, and if you run into any kind of odd responses or compatibility issues, it could be that the client or server you're working with has not been updated to use the W3C WebDriver spec, which is the current standard. It is the result of years of effort by a lot of different companies. Actually, these two APIs are very, very similar. Many of the commands, in fact, are identical. The only differences are that the actual request format and response format differ slightly and certain commands differ in the type of parameter that they take, but they are still pretty much identical.
Let's look at a more diagrammatic version of how a client might speak to an Appium server in order to make a test happen. So tests are always initiated by the client, of course. A client does something called constructing the desired capabilities for the test, which is basically creating the parameters for the new session. This is an example JSON object of what capabilities could look like. Here we have a set of keys and values where the keys are the names of the parameters for the session and the values specify what we care about for each parameter. So in this case, we're asking Appium to give us an iOS 10.2 simulator. That's what the combination of platform name, platform version and device name amounts to.
We are using the app capability to tell Appium where our native app file is for iOS. These top four capabilities are actually required, so we'll see them a lot. But I'm also showing a capability which is not required. It's called noReset. We can include the noReset capability to tell Appium not to run its normal reset routines before or after the session. There are different reasons we might want to do this. It's kind of irrelevant to the current slide. On the current slide we're just showing how we can include lots of capabilities. There are actually well over a hundred different capabilities that you can use with Appium, so something to dig into more later.
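Since the slide itself isn't reproduced here, this is a rough sketch in Python of what such a capabilities object might look like. The key names follow the description above; the device name and app path are assumed values, and the to_w3c helper is a hypothetical illustration of how newer W3C-style servers expect non-standard keys to carry a vendor prefix such as appium: and to be wrapped in an alwaysMatch object.

```python
import json

# Capabilities as described above; concrete values are assumptions.
caps = {
    "platformName": "iOS",
    "platformVersion": "10.2",
    "deviceName": "iPhone Simulator",  # assumed device name
    "app": "/path/to/my.app",          # hypothetical path to the app file
    "noReset": True,                   # optional: skip the normal reset routines
}

# A subset of the names the W3C spec treats as standard (no prefix needed).
STANDARD_KEYS = {"platformName", "browserName"}


def to_w3c(capabilities):
    """Hypothetical helper: prefix non-standard keys and wrap for a new session."""
    prefixed = {
        (key if key in STANDARD_KEYS or ":" in key else f"appium:{key}"): value
        for key, value in capabilities.items()
    }
    return {"capabilities": {"alwaysMatch": prefixed, "firstMatch": [{}]}}


# This is roughly what travels to the server as the JSON body of POST /session.
payload = json.dumps(to_w3c(caps), indent=2)
print(payload)
```

Older JSON Wire Protocol servers took the unprefixed keys directly, which is one of the small format differences between the two protocol versions.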
The client might construct this object in a way which doesn't look like a JSON string like it does here and might construct it in a way which looks much more appropriate. It might look like a Python dictionary in Python or a special object in Java. At the end of the day, this is what it looks like on its way to the Appium server. So the Appium client requests a session from the server with those capabilities, the server then parses those capabilities, figures out what kind of thing you're wanting to automate and spins up a driver that can automate that particular platform. At that point, we have a session that's been started. We may also have started your application or done any kind of initialization that's required based on the particular platform.
What the Appium server then does is send back a session ID which the Appium client stores for future calls. Now this isn't something that you do as a user of the Appium client. This is something the Appium client software does for you automatically so that it can make more convenient usages and methods available to you so you don't have to remember session IDs. The Appium client does that for you. At this point, the client can send arbitrary automation commands, so finding elements or interacting with them and so on. For each of these, the Appium server will parse the command request from the JSON and figure out what it is you want to do. It will do the thing that you've asked it to do by forwarding the appropriate command to the appropriate driver, which is active for your session.
Then some kind of result will surface. It could be a text that you've asked for or just a value of null, meaning that the command completed successfully. That result gets sent back to the client, which again parses it from JSON, converts it to whatever format is appropriate for your programming language, and then makes that available to you as the test author. Now as the test author, you can check the result. You can make verifications based on the value of something that you retrieved. In this way you can actually build up something which is a test of your software and not just a bunch of automation commands. These two or four steps, however you count them, can repeat as many times as you need in order to implement your particular test logic.
Once you're done with all of that, whether your test has passed or failed according to you, you need to quit the session. Your client will request a session quit when you call driver.quit(), for example, or something else based on your particular programming language. At that point, Appium will shut down the application, clean up any resources it needs to and free the device you were using for future sessions. So that is the basic flow of commands using the Appium API, which is, in fact, the WebDriver protocol.
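The whole flow, from starting a session to quitting it, can be sketched from the client's side. This is a toy in Python, not a real Appium client: ToyClient, fake_transport, and the canned responses are all hypothetical, and the transport is a plain function standing in for real HTTP so the sketch runs without a server.

```python
import json

calls = []  # record of (method, route) pairs, so we can inspect the traffic


def fake_transport(method, route, body):
    """Canned JSON responses standing in for a real Appium server."""
    calls.append((method, route))
    if (method, route) == ("POST", "/session"):
        return json.dumps({"value": {"sessionId": "s-1", "capabilities": {}}})
    if method == "GET" and route.endswith("/text"):
        return json.dumps({"value": "Hello"})
    return json.dumps({"value": None})  # null: command succeeded, nothing to return


class ToyClient:
    """Hypothetical client that remembers the session ID for the test author."""

    def __init__(self, transport):
        self.transport = transport
        self.session_id = None

    def _call(self, method, route, body=None):
        raw = self.transport(method, route, None if body is None else json.dumps(body))
        # Parse the JSON response into a native Python value.
        return json.loads(raw)["value"]

    def start(self, capabilities):
        value = self._call("POST", "/session", {"capabilities": {"alwaysMatch": capabilities}})
        self.session_id = value["sessionId"]

    def get_text(self, element_id):
        return self._call("GET", f"/session/{self.session_id}/element/{element_id}/text")

    def quit(self):
        self._call("DELETE", f"/session/{self.session_id}")
        self.session_id = None


client = ToyClient(fake_transport)
client.start({"platformName": "iOS"})
text = client.get_text("e-1")
assert text == "Hello"  # the verification is what turns automation into a test
client.quit()
```

Notice that the test author never touches the session ID; the client stores it after the first response and threads it into every later route.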
We need to say a few things about some extensions that Appium has made to this protocol. First of all, not everything we want to do with mobile apps is supported by the WebDriver spec. It was written for web browsers, not mobile applications. There are things you can do with mobile apps that you can't do with web browsers. So how does Appium handle this?
We want to make these features available to users, but they're just not available in the WebDriver API. So what we've done is we've added extensions to the WebDriver spec that provide access to these really useful automation behaviors. One example would be pushing a file to the device's file system. Browsers don't have file systems per se, so this wouldn't make sense in the context of a browser, but you might want to push a photo to the camera roll on your Android device or something like that. Appium has a command that lets you do this. The way we do it is that we create a new set of commands and add it to the list of commands that is supported in the WebDriver spec. These extensions match the form of the spec.
So they have the same route styles and HTTP method styles as the official API methods, but they're just not listed as individual API items in the official WebDriver docs. They are listed in the Appium docs. So a lot of Appium's commands go beyond what the WebDriver protocol specifies because we found them to be useful. Thankfully, the WebDriver spec was created with extensibility in mind and what we're doing is perfectly legal from the perspective of the spec. It's just not supported officially by any web browsers, which makes a lot of sense.
So we've talked about the client-server architecture, how clients and servers work together, but let's talk a little bit more specifically about the responsibility of the Appium client. Appium clients are typically built on top of Selenium clients. Because Appium uses the WebDriver protocol, you would expect to be able to use a Selenium WebDriver client to run Appium tests, and this is in fact the case.
You don't technically need anything else other than a Selenium WebDriver client to write tests in your programming language. But then, you wouldn't be able to access the extensions that we were just talking about because again, those aren't supported officially by the WebDriver API and so they won't be found in the Selenium client libraries either. So, for this reason, what we've done is create an Appium client for each major programming language. As far as possible, these Appium clients just wrap the existing Selenium client so that we don't have to duplicate any effort. We just basically wrap the existing Selenium client and then add support for all of the specific Appium extensions that we've made to the API.
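The wrapping idea can be sketched with two small classes in Python. These are hypothetical stand-ins, not the real Selenium or Appium client classes, and they return a description of the request rather than sending it. The push-file route and its path and data parameters reflect my understanding of the Appium docs and should be treated as an assumption.

```python
import base64


class BaseClient:
    """Stand-in for a Selenium-style client that knows only standard commands."""

    def __init__(self, session_id):
        self.session_id = session_id

    def command(self, method, route_suffix, body=None):
        # Return the request description instead of sending it anywhere.
        return (method, f"/session/{self.session_id}{route_suffix}", body)

    def click(self, element_id):
        return self.command("POST", f"/element/{element_id}/click", {})


class ExtendedClient(BaseClient):
    """Stand-in for an Appium client: inherits everything, adds extensions."""

    def push_file(self, device_path, content):
        # Assumed route and body shape; file bytes are sent as base64 text.
        body = {
            "path": device_path,
            "data": base64.b64encode(content).decode("ascii"),
        }
        return self.command("POST", "/appium/device/push_file", body)


client = ExtendedClient("s-1")
print(client.click("e-1"))
print(client.push_file("/sdcard/Pictures/photo.png", b"fake image bytes"))
```

The extension has the same shape as a standard command, a method plus a session-scoped route plus a JSON body, which is why it can sit alongside the inherited ones without any special handling.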
One thing to keep in mind is that Appium clients are very different from one another. So the Appium Python client looks and feels quite a bit different from the Appium Java client. They have different maintainers who care about different things. They're being written for different languages which have different conventions and idioms. By and large, they all will enable you to do the same things, but the way that they do that might look quite different. So it's good to pick a client in your language that you're comfortable with and learn that one. There's not much use in hopping around from language to language. All that said, there are some clients which are more complete than others. For example, the PHP client exists, but it's not very well maintained because it's basically not used as far as I can tell. So if you want to use PHP for your Appium tests, I would say, "Oh, maybe you'd consider Python instead or Ruby or whatever. Something that is going to track more closely with updates in the Appium server."
Here's a list of the current Appium clients, the ones that are more or less official. Python, JavaScript, Java, Ruby, .Net and PHP are all examples. JavaScript has quite a few clients actually, but there's two that are most commonly used.
In addition to Appium clients, we also need to talk about Appium drivers. So drivers are the piece, as we talked about before, that is responsible for automating a particular platform. Technically, a driver is a particular kind of code module which can be imported into the main Appium server and basically translates between the WebDriver protocol, which the Appium server speaks, and a particular platform's automation technologies. So as we mentioned in the previous video, there are a variety of automation technologies that exist that Appium builds on top of.
So an Appium driver for a particular platform is responsible for translating between the WebDriver protocol and one of these automation technologies, for example, Espresso or XCUITest or whatever. So the Appium server is basically a bundle of these drivers, which picks the right driver for you based on your new session request. Based on the type of capabilities that you've sent in, Appium will select a driver, and from that point on, all of your commands will be handled by that particular driver. There are a lot of drivers. There are 10 of them currently that are more or less officially supported: two for iOS, three for Android. There's a Mac driver, a Windows driver, a driver for a TV platform development framework called You.i TV, a driver for Tizen, which is Samsung's mobile or embedded operating system, and even Flutter from Google, the cross-platform app development framework.
So with all these different platforms, the driver authors try to translate a given WebDriver command in exactly the same way across the different drivers. We can think of it in terms of syntax and semantics for a language. Each driver is responsible for basically defining the semantics for a particular WebDriver command. Appium server is responsible for parsing the syntax and passing that on to the driver. But then each driver is responsible for deciding what to actually do with that. As far as possible, all the drivers do the same thing with the same commands. But sometimes it's not possible. Sometimes a given platform doesn't have a certain command or sometimes the way a certain behavior is implemented on one platform differs from another. In these cases, we do experience some slight differences in behavior across platforms. But as far as possible, the drivers work around those differences.
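The split between syntax and semantics can be sketched like this in Python. The driver classes, the DRIVERS table, and the strings they return are all hypothetical; real driver selection looks at more than the platform name, and real drivers obviously do much more than return a string.

```python
class ToyIOSDriver:
    def click(self, element):
        # What "click" means on this platform.
        return f"tap {element} via XCUITest"


class ToyAndroidDriver:
    def click(self, element):
        # Same command, different platform-specific behavior.
        return f"click {element} via UiAutomator2"


# Hypothetical lookup table from platform name to driver class.
DRIVERS = {"ios": ToyIOSDriver, "android": ToyAndroidDriver}


def pick_driver(capabilities):
    """Select a driver for the session based on the requested platform."""
    platform = capabilities["platformName"].lower()
    if platform not in DRIVERS:
        raise ValueError(f"no driver for platform {platform!r}")
    return DRIVERS[platform]()


def handle(driver, command, *args):
    # The server handles the syntax (which command was asked for);
    # the driver supplies the semantics (what it means on this platform).
    return getattr(driver, command)(*args)


ios = pick_driver({"platformName": "iOS"})
android = pick_driver({"platformName": "Android"})
print(handle(ios, "click", "OK button"))
print(handle(android, "click", "OK button"))
```

The test script sends the same click command in both cases; only the driver chosen at session start decides what actually happens on the device.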
Sometimes we have a platform that has more than one driver. The best example right now is Android, which has both a UIAutomator2 driver and an Espresso driver. In this case, we can't use both drivers at the same time. We should just pick one. So it's important to understand a little bit about the drivers and what they do and what they offer and their limitations so that you pick one for the life of your automation project, though that's not to say that you can't switch an Appium test suite from one driver to another. In fact, that's one of the main benefits of Appium, that because the API is more or less stable across platforms, you can with maybe a little bit of work transition from one driver to another. That being said, the drivers depend on different underlying technologies and some of them have very different feature sets. You can do different things with them, so it's good to understand a little bit about how the drivers work before you pick one.
Let's look at some of the drivers in more detail. On iOS, we have two drivers. One is called the UIAutomation driver, and this is deprecated and shouldn't be used. I really shouldn't even have it on the slide, but I list it here because you might see it around. The last supported iOS version for this driver was iOS 9.3, which was obviously quite a long time ago. The main driver, the only one that you should really use for iOS, is called the XCUITest driver, which is based on Apple's XCUITest API. Now the XCUITest API has to be accessed from Objective-C or Swift and kicked off in Xcode.
So Appium needs some kind of bridge between Appium's own Node.js-based code, and the XCUITest-based code. That bridge is called WebDriverAgent. So if you see a "WebDriverAgent" flying around, that's part of the XCUITest driver that actually does all the real heavy lifting. There is a part that is written in Node.js, which does a bunch of other stuff, but the part that actually lives on your iOS device and makes stuff happen is called WebDriverAgent and it uses the XCUITest APIs from Apple.
On Android, we have three drivers. The UIAutomator1 driver is deprecated. It's old, and you shouldn't use it. We also have the UIAutomator2 driver. This is the current standard supported by Google and it has quite broad Android platform support all the way back to 4.3. It provides the most useful, feature-rich all-around experience for Android testing right now. It's the one we're going to be using in this course.
There's also an Espresso driver. Now Espresso is another automation technology from Google, which has some really great benefits around test speed and reliability, so you can also use Espresso with Appium. One of the downsides of Espresso is that it is limited to testing an app that you actually have developed. You can't test arbitrary applications that are on the device. It's difficult even to test webviews with Espresso, or to handle a web browser that loads up as a result of a need to authenticate somebody for your application. That's a challenging scenario for Espresso, for example, but it does have some benefits, which you can certainly look into if you're beginning a new project and want to understand the pros and cons of each driver.
So to close out this module, I wanted to just show a diagram of all of the current Appium drivers and how they communicate with their respective platforms and with Appium itself. So starting from the top, we have the test script. That's the thing you're responsible for writing. Your test script will communicate with an Appium client library typically, by importing it into memory in your own test code. The Appium client communicates over the internet or over a local network with an Appium server and it speaks the WebDriver protocol to the Appium server. So all of your test commands are handled along this one communication path. Now, depending on what type of session you've asked for, Appium will pick one of these drivers to use in order to give you automation capabilities for your desired platform. So each of these drivers has some internal structure and complexity, sometimes quite a bit.
For example, the UIAutomator2 driver has a part that's written in JavaScript and gets embedded into Appium's memory. It has a part that is written in Java and gets launched on the Android device that you're trying to automate. That part itself imports the UIAutomator2 API from Google, which uses Google's own Android accessibility stack to make things happen on the device. So that's just one example. It's not important to know the details of this chart for your own automation, but it's a handy reference. Especially when you're debugging issues on a particular platform, or when you're using a particular driver and something is going wrong, it's helpful to know what some of these pieces are. They do show up in the logs, so it's helpful for you to understand them. That's it for understanding Appium's architecture! There is an awful lot more to learn about Appium on that front, but that's all you need to know before we get Appium set up and start actually running some sessions. All right. See you next time.