A reader recently asked if we could provide an example of how to zoom in and out using touch gestures. I decided to use Google Maps for the demo, since it is installed by default on Android emulators, and I thought this would be a straightforward task. Surprisingly, it ended up being much trickier than I expected, for the same reason that Google Maps often frustrates me while I'm driving!
Google Maps must implement a lot of custom logic for interpreting touch events. I'm sure multiple usability experts charted people's fingers and intentions, recorded video footage of people smearing finger-oil across screens for hours, and came up with a set of functions to describe the pattern for each gesture. The end result is that our sterile touch actions generated by machines don't trigger the UI reactions we'd expect. The same often happens when I take my life in my hands trying to change the view while driving.
(I've enabled touch gesture debugging on the emulator, so we can see the gestures that Appium simulates.)
The simplest version of the gesture, in which two touch inputs (fingers) are placed on the screen, moved toward or away from each other, and then lifted off, does not budge the UI at all.
What I ended up doing was making the gesture more complex, in order to better simulate the more organic and imperfect actions of a real person. I found that the most important variable leading to a successful zoom was speed: the gesture needed to be fast, around 25 to 50 milliseconds. I added a short segment of very quick movement, followed by a 100ms pause, and then completed the rest of the zoom gesture in 25 to 50 milliseconds.
Even this approach was not very satisfying: it is not always consistent, and the zoom out is more powerful than the zoom in. Plus, I couldn't find a simple way to perform a slow, controlled zoom, even though that is so easy to do manually.
We achieved our goal of demonstrating a zoom gesture though, and there is plenty to learn from in the code used for this.
First off, I took advantage of what we learned from our post about Android intents and activities to launch the Google Maps app directly to a view of a chosen set of geo coordinates:
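As a sketch of what that launch might look like, here is a capabilities setup using Appium's `optionalIntentArguments` capability to pass a `geo:` URI as intent data. The specific coordinates, zoom level, and server URL are illustrative assumptions, not values from the original article:

```java
import io.appium.java_client.android.AndroidDriver;
import java.net.URL;
import org.openqa.selenium.remote.DesiredCapabilities;

DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability("platformName", "Android");
caps.setCapability("deviceName", "Android Emulator");
// launch Google Maps directly rather than installing an app under test
caps.setCapability("appPackage", "com.google.android.apps.maps");
caps.setCapability("appActivity", "com.google.android.maps.MapsActivity");
// pass a geo: URI as intent data so Maps opens centered on our chosen coordinates
caps.setCapability("optionalIntentArguments", "-d geo:40.7484,-73.9857?z=10");

AndroidDriver driver = new AndroidDriver(new URL("http://localhost:4723/wd/hub"), caps);
```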
Next, I created a method to build our "zoom interaction". We'll go over the methods used here, but for the history of Appium's gesture API design and more examples, you can visit a previous article on the topic.
The new methods for building touch actions compliant with the W3C WebDriver specification are all located under the org.openqa.selenium.interactions namespace of the Appium and Selenium Java clients. The end goal is to construct a list of Sequence objects (each a series of Interactions), which we can then pass to driver.perform() in order to send the actions to the Appium server to run on the device.
So, my method to create a zoom interaction returns a list of interactions.
Each interaction in the list which we will later pass to driver.perform() represents the movement of one finger on the device's touchscreen. A pinch zoom requires two fingers, though the only difference between their movements is the direction relative to the center of the pinch, which I named the "locus". startRadius and endRadius refer to the distances from the locus at which the fingers start and stop. duration will be the length of time this action takes, and pinchAngle is how twisted from directly up/down the fingers are while pinching (the examples from earlier all had angles of 45 degrees).
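A minimal sketch of such a method follows. It builds one sequence per finger via a zoomSingleFinger helper (covered next in the article); the method and parameter names here are assumptions matching the prose, not necessarily the author's exact code:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import org.openqa.selenium.Point;
import org.openqa.selenium.interactions.Sequence;

List<Sequence> zoom(Point locus, int startRadius, int endRadius,
                    double pinchAngle, Duration duration) {
    // convert the pinch angle from degrees to radians, measured from vertical
    double angle = Math.PI / 2 - (2 * Math.PI / 360 * pinchAngle);
    Sequence fingerA = zoomSingleFinger("fingerA", locus, startRadius, endRadius, angle, duration);
    // the second finger mirrors the first, moving in the opposite direction from the locus
    Sequence fingerB = zoomSingleFinger("fingerB", locus, startRadius, endRadius, angle + Math.PI, duration);
    return Arrays.asList(fingerA, fingerB);
}
```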
Now let's look at the zoomSingleFinger method which actually uses the Appium client methods to create the actions for each finger. The action I decided to make, and which experimentally yielded acceptable results, was to first move the finger very quickly a small distance from the startRadius towards the endRadius. My finger then pauses for a moment, before resuming its path towards the endRadius position.
First we construct a PointerInput to represent a finger (as opposed to a mouse pointer) and use it to construct an empty Sequence which will hold individual Actions. Actions are like steps in the overall movement of a single finger.
I then calculate the screen coordinates between which the finger will move for each action.
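The coordinate calculation is plain trigonometry relative to the locus. A small helper like the following (a hypothetical name, shown here just to isolate the math) captures it; note that screen y-coordinates grow downward, so the sine term is subtracted:

```java
// Compute the point at the given distance and angle from the pinch locus.
// Screen y grows downward, so we subtract the sine term.
static int[] radialPoint(int locusX, int locusY, double radius, double angle) {
    int x = locusX + (int) Math.round(radius * Math.cos(angle));
    int y = locusY - (int) Math.round(radius * Math.sin(angle));
    return new int[]{x, y};
}
```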
Next, we create the actions, adding them to the sequence as we go. We start with a PointerMove action to move the finger into its starting position. Then we add a PointerDown action, putting our finger in contact with the touchscreen. Now comes the quick initial movement, which we've set to take just one millisecond to complete. We then add a special Pause action, which waits for 100 milliseconds in our case, followed by another PointerMove action which completes the gesture the rest of the way and takes as much time as was passed into the function as duration. Lastly, we add a PointerUp action to the sequence to lift our finger off the screen.
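Put together, zoomSingleFinger might look like this. It is a sketch consistent with the steps just described (1ms quick move, 100ms pause, then the timed move); the 20-pixel size of the initial quick movement is an illustrative assumption:

```java
import java.time.Duration;
import org.openqa.selenium.Point;
import org.openqa.selenium.interactions.Pause;
import org.openqa.selenium.interactions.PointerInput;
import org.openqa.selenium.interactions.Sequence;

Sequence zoomSingleFinger(String fingerName, Point locus, int startRadius,
                          int endRadius, double angle, Duration duration) {
    // a touch input (as opposed to a mouse pointer) and an empty sequence for it
    PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, fingerName);
    Sequence fingerPath = new Sequence(finger, 0);

    // the quick initial movement covers a small distance toward the end radius
    double midRadius = startRadius + (endRadius > startRadius ? 1 : -1) * 20;
    int startX = (int) (locus.x + startRadius * Math.cos(angle));
    int startY = (int) (locus.y - startRadius * Math.sin(angle));
    int midX = (int) (locus.x + midRadius * Math.cos(angle));
    int midY = (int) (locus.y - midRadius * Math.sin(angle));
    int endX = (int) (locus.x + endRadius * Math.cos(angle));
    int endY = (int) (locus.y - endRadius * Math.sin(angle));

    // move the finger into its starting position
    fingerPath.addAction(finger.createPointerMove(Duration.ofMillis(0),
            PointerInput.Origin.viewport(), startX, startY));
    // touch the screen
    fingerPath.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()));
    // the very quick initial movement
    fingerPath.addAction(finger.createPointerMove(Duration.ofMillis(1),
            PointerInput.Origin.viewport(), midX, midY));
    // pause briefly
    fingerPath.addAction(new Pause(finger, Duration.ofMillis(100)));
    // complete the gesture over the requested duration
    fingerPath.addAction(finger.createPointerMove(duration,
            PointerInput.Origin.viewport(), endX, endY));
    // lift the finger off the screen
    fingerPath.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
    return fingerPath;
}
```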
Because I wanted simple "zoomIn" and "zoomOut" methods and didn't want to specify all these parameters every time I wanted to zoom, I created two more functions which set some defaults.
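A sketch of those convenience methods, assuming the zoom method above and defaults in the range the article describes (a 200px starting radius, a 45-degree pinch angle, and a 25ms duration — illustrative values, not the author's exact ones):

```java
import java.time.Duration;
import org.openqa.selenium.Point;

void zoomIn(Point locus, int distance) {
    // fingers move outward from the locus
    driver.perform(zoom(locus, 200, 200 + distance, 45, Duration.ofMillis(25)));
}

void zoomOut(Point locus, int distance) {
    // fingers move inward toward the locus
    driver.perform(zoom(locus, 200 + distance, 200, 45, Duration.ofMillis(25)));
}
```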
All it takes to now reproduce the zooming from the video at the beginning of the article is to call our methods:
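For example, centering the pinch on the middle of the screen (the 100px distance here is an illustrative value):

```java
import org.openqa.selenium.Dimension;
import org.openqa.selenium.Point;

Dimension size = driver.manage().window().getSize();
Point center = new Point(size.width / 2, size.height / 2);
zoomIn(center, 100);
zoomOut(center, 100);
```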
I originally wrote the methods above because I wanted a generic way to build zoom gestures, but the complexity and trickiness of the Google Maps touch gesture logic resulted in having to build a rather specific set of actions. The timing is very important, but because I used fixed durations, specifying a longer path (on a larger device, say) makes the finger move faster, perhaps too fast. At least with the methods written this way, it was easy to experiment with many different combinations of values. A better solution would take the total distance the finger has to travel and calculate a duration from that.
Feel free to play with actions and experiment! They can get way more complex from here. I'm curious to see your solutions, if anyone has written a better set of actions for manipulating this UI.
Here's the test code in its entirety, when it's all put together:
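The original listing was not preserved here, so what follows is a reconstruction assembled from the pieces described above: a JUnit 4 test that launches Maps via an intent, then zooms in and out. Class, method, and capability values are assumptions consistent with the prose, and the crude sleeps stand in for whatever waiting the real test used:

```java
import io.appium.java_client.android.AndroidDriver;
import java.net.URL;
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.Point;
import org.openqa.selenium.interactions.Pause;
import org.openqa.selenium.interactions.PointerInput;
import org.openqa.selenium.interactions.Sequence;
import org.openqa.selenium.remote.DesiredCapabilities;

public class ZoomGestureTest {
    private AndroidDriver driver;

    @Before
    public void setUp() throws Exception {
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability("platformName", "Android");
        caps.setCapability("deviceName", "Android Emulator");
        caps.setCapability("appPackage", "com.google.android.apps.maps");
        caps.setCapability("appActivity", "com.google.android.maps.MapsActivity");
        caps.setCapability("optionalIntentArguments", "-d geo:40.7484,-73.9857?z=10");
        driver = new AndroidDriver(new URL("http://localhost:4723/wd/hub"), caps);
    }

    @After
    public void tearDown() {
        if (driver != null) driver.quit();
    }

    @Test
    public void testZoomGestures() throws InterruptedException {
        Thread.sleep(5000);  // crude wait for the map tiles to load
        Dimension size = driver.manage().window().getSize();
        Point center = new Point(size.width / 2, size.height / 2);
        zoomIn(center, 100);
        Thread.sleep(2000);
        zoomOut(center, 100);
    }

    private void zoomIn(Point locus, int distance) {
        driver.perform(zoom(locus, 200, 200 + distance, 45, Duration.ofMillis(25)));
    }

    private void zoomOut(Point locus, int distance) {
        driver.perform(zoom(locus, 200 + distance, 200, 45, Duration.ofMillis(25)));
    }

    private List<Sequence> zoom(Point locus, int startRadius, int endRadius,
                                double pinchAngle, Duration duration) {
        double angle = Math.PI / 2 - (2 * Math.PI / 360 * pinchAngle);
        Sequence a = zoomSingleFinger("fingerA", locus, startRadius, endRadius, angle, duration);
        Sequence b = zoomSingleFinger("fingerB", locus, startRadius, endRadius, angle + Math.PI, duration);
        return Arrays.asList(a, b);
    }

    private Sequence zoomSingleFinger(String fingerName, Point locus, int startRadius,
                                      int endRadius, double angle, Duration duration) {
        PointerInput finger = new PointerInput(PointerInput.Kind.TOUCH, fingerName);
        Sequence path = new Sequence(finger, 0);
        double midRadius = startRadius + (endRadius > startRadius ? 1 : -1) * 20;
        int startX = (int) (locus.x + startRadius * Math.cos(angle));
        int startY = (int) (locus.y - startRadius * Math.sin(angle));
        int midX = (int) (locus.x + midRadius * Math.cos(angle));
        int midY = (int) (locus.y - midRadius * Math.sin(angle));
        int endX = (int) (locus.x + endRadius * Math.cos(angle));
        int endY = (int) (locus.y - endRadius * Math.sin(angle));
        path.addAction(finger.createPointerMove(Duration.ofMillis(0),
                PointerInput.Origin.viewport(), startX, startY));
        path.addAction(finger.createPointerDown(PointerInput.MouseButton.LEFT.asArg()));
        path.addAction(finger.createPointerMove(Duration.ofMillis(1),
                PointerInput.Origin.viewport(), midX, midY));
        path.addAction(new Pause(finger, Duration.ofMillis(100)));
        path.addAction(finger.createPointerMove(duration,
                PointerInput.Origin.viewport(), endX, endY));
        path.addAction(finger.createPointerUp(PointerInput.MouseButton.LEFT.asArg()));
        return path;
    }
}
```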