
How to Integrate Audio APIs into Your Appium Automation Scripts

April 29, 2020
Scott Mercer

Have you ever had trouble testing the audio functionality of your mobile application? Capturing, injecting, and comparing audio has historically been a tricky area in mobile automation testing. Even validating something as simple as whether the correct audio plays during a prescribed action can be a daunting task. HeadSpin’s Audio APIs simplify these operations and provide developers with a convenient RESTful interface. This post gives a high-level overview of each API and shows how to integrate it into your Appium automation script.

What are these APIs? How do I use them?

In total, HeadSpin has seven different APIs for audio on mobile devices. This post focuses on three popular endpoints: uploading audio to our servers, capturing sound on a given mobile device in the HeadSpin cloud, and comparing two audio files to see if they match.


The primary use of the upload endpoint is to let the developer upload a reference file that is later compared against a test audio file captured from the mobile device during your automation test. Note that this endpoint expects audio in .wav format. The response is a JSON object that indicates whether the upload was successful and, if it was, provides a unique audio id.

Example Code

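import json
import requests

def upload_reference_file(api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/upload'
    # Send the reference .wav file as the raw request body
    with open('reference_audio.wav', 'rb') as data:
        r = requests.post(api_endpoint,
                          headers={'Authorization': 'Bearer {}'.format(api_token)},
                          data=data)
    response = json.loads(r.text)
    # The returned audio id identifies this file in later API calls
    return response['audio_id']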
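
Because the upload endpoint expects a .wav file, a quick local check before uploading can catch a wrong or empty file early. The following is a minimal sketch that uses Python's standard-library wave module; the helper name is_valid_wav is hypothetical and is not part of HeadSpin's API. It only confirms that the file parses as WAV and contains at least one audio frame.

```python
import wave

def is_valid_wav(path):
    """Return True if `path` parses as a WAV file with at least one frame."""
    try:
        with wave.open(path, 'rb') as wav_file:
            return wav_file.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or unreadable
        return False
```

A test script could call is_valid_wav('reference_audio.wav') and stop before the upload request if it returns False.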


You invoke the capture endpoint when you want to start capturing the audio output from a given device. The API lets you end the session in two ways: when you make the initial capture call to the server, you can specify how long it should run, or you can store the worker id included in the response and use it to poll for status updates and stop the capture yourself.

After the audio has been captured from the device, it is uploaded to a storage system that allows anyone in your organization to access it. The response from the capture endpoint includes the audio id, which is essential to store for later reference.

Example Code

import json
import requests

def capture_audio(device_address, duration, api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/capture/start'
    # Capture from the given device for at most `duration` seconds
    data = json.dumps({'device_address': device_address,
                       'max_duration': duration})
    r = requests.post(api_endpoint,
                      headers={'Authorization': 'Bearer {}'.format(api_token)},
                      data=data)
    response = json.loads(r.text)
    # Keep the audio id: the analysis call needs it later
    return response['audio_id']


The analysis, or match, API allows you to pass in two audio files, a reference and a test, and determine whether the reference audio is present in the test audio. We define the reference file as the original audio source, while the test audio is a longer captured recording that contains the reference.

The use case here is to detect exact audio matches and to locate the reference audio inside the test audio. It can also compare the audio quality of the test relative to the reference.

The response from this API includes multiple result parameters from the analysis. The first thing I check in this response is whether it was a success. Success does not indicate that the audio files match; it indicates that the algorithm was able to run correctly. Along with success, the response contains two objects: parameters and results. Parameters describes the thresholds and values used during the analysis, e.g. the sample rate. The most important part of a successful analysis is, of course, the results. The results object gives us the following stats:

  • Match: Full, Partial, No, or Error
  • Quality of Match
  • Start and End time of reference in terms of test

Example Code

import json
import requests

def compare_audio(test_id, reference_id, api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/analysis/match'
    data = json.dumps({'test_audio_id': test_id,
                       'ref_audio_id': reference_id})
    r = requests.post(api_endpoint,
                      headers={'Authorization': 'Bearer {}'.format(api_token)},
                      data=data)
    response = json.loads(r.text)
    # 'success' only means the analysis ran, not that the audio matched
    if response['success']:
        return response['result']
    return None
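
Since a successful analysis can still report a partial match or no match, a test usually needs to turn the match value into a pass or fail decision. The sketch below shows one way to do that. It assumes the match value arrives as one of the strings listed above (Full, Partial, No, or Error); the exact field name and spelling in the real response are not shown in this post, so the helper takes the value directly, and the name match_passes is hypothetical.

```python
def match_passes(match_status, allow_partial=False):
    """Decide pass/fail from a match value such as 'Full' or 'Partial'.

    The accepted strings are an assumption based on the list above.
    """
    status = str(match_status).strip().lower()
    if status == 'full':
        return True
    if status == 'partial':
        # Partial matches pass only when the caller opts in
        return allow_partial
    # 'No', 'Error', and anything unexpected are treated as failures
    return False
```

Treating unknown values as failures is a deliberate choice here: a test that cannot interpret the response should not report success.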

How can we tie all of these together?

To showcase how we can integrate all of these endpoints into a single automation run, we are going to look at a customer use case. The goal of this customer was to verify that the correct automated response was given when a user of their service made a call with no balance on their SIM card.

Given the proper reference audio, this is relatively straightforward with HeadSpin’s audio APIs. We executed this test in the following steps:

  1. Upload a reference file to HeadSpin’s storage system using the upload endpoint and store the audio id for later use.
  2. Launch an Android/iOS device that has audio enabled in HeadSpin’s mobile device cloud.
  3. Navigate to the device’s phone application using the Appium framework.
  4. Enter a given number and place the call using a SIM card provided by the customer.
  5. Once the call connects, call HeadSpin’s capture endpoint with a max duration of 20 seconds to record the automated response, and store the audio id from the response for later use.
  6. Send both the reference and test audio ids to HeadSpin’s analysis API to verify they are a match.
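
The steps above can be sketched as a single function that chains the three API helpers from this post. This is an illustrative outline rather than the customer's actual script: the helpers are passed in as arguments so the flow can be exercised without a device, and place_call is a hypothetical stand-in for the Appium steps (2 to 4) that launch the device, open the phone app, and dial.

```python
def run_audio_check(upload_reference, place_call, capture_audio, compare_audio,
                    device_address, api_token, duration=20):
    """Run the upload, call, capture, and compare steps in order."""
    # Step 1: upload the reference file and keep its audio id
    reference_id = upload_reference(api_token)
    # Steps 2-4: drive the device with Appium (hypothetical callable)
    place_call()
    # Step 5: record the automated response for up to `duration` seconds
    test_id = capture_audio(device_address, duration, api_token)
    # Step 6: ask the analysis API whether the reference is in the capture
    return compare_audio(test_id, reference_id, api_token)
```

In a real run, upload_reference_file, capture_audio, and compare_audio from the earlier examples would be passed in, along with a function that performs the Appium navigation.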


1. What are the essential prerequisites for writing Appium tests?

The essential prerequisites for creating Appium tests are:

  • Driver Commands: Appium provides a set of commands with which tests are written.
  • Appium Session: As Appium tests are executed in sessions, it is essential to initialize an Appium session before performing tests.
  • Desired Capabilities: These are the parameters that define the test automation required from the Appium server. The platform version, device, and network environment are among the desired capabilities.
  • Driver Client Library: The client library aids in creating Appium tests and wraps the procedures required to send them to the Appium server via HTTP.
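
As an illustration of desired capabilities, here is what a capability set for an Android session might look like as a Python dictionary. The key names follow the `appium:`-prefixed style used by recent Appium versions (older setups may use unprefixed keys), and every value is a placeholder to be replaced with your own device and app details.

```python
# Example capability set; all values below are placeholders
desired_capabilities = {
    'platformName': 'Android',
    'appium:automationName': 'UiAutomator2',
    'appium:deviceName': 'example-device',      # placeholder device name
    'appium:platformVersion': '10',             # placeholder OS version
    'appium:app': '/path/to/your/app.apk',      # placeholder path to the .apk
}
```

The client library sends this set to the Appium server when the session is created, and the server uses it to choose the device and automation driver.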

2. What is XPath, and how is it used to find elements?

XPath enables testers to navigate the XML structure of a document, including XML and HTML files, and provides a syntax for locating any element in it. It is a string-based element identifier. Appium client libraries offer lookup functions (for example, find_element in the Python client, or the 'FindBy' annotations in Java) that accept the XPath string and return the matching elements.
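
To make the idea concrete without a running device, the sketch below applies an XPath expression to a small, made-up XML page source using Python's standard-library ElementTree (which supports a subset of XPath). The element names and attributes are invented for illustration. In an Appium test, a similar XPath string would be passed to the client's lookup function instead.

```python
import xml.etree.ElementTree as ET

# Made-up page source resembling an Android dialer screen
PAGE_SOURCE = """
<hierarchy>
  <android.widget.FrameLayout>
    <android.widget.EditText text="" resource-id="digits"/>
    <android.widget.Button text="Call" resource-id="dial_button"/>
    <android.widget.Button text="Delete" resource-id="delete_button"/>
  </android.widget.FrameLayout>
</hierarchy>
"""

def find_by_xpath(page_source, xpath):
    """Return all elements in `page_source` that match `xpath`."""
    root = ET.fromstring(page_source.strip())
    return root.findall(xpath)

# Locate the button whose text attribute is exactly "Call"
call_buttons = find_by_xpath(PAGE_SOURCE, ".//android.widget.Button[@text='Call']")
```

The expression reads as "any Button element, at any depth, whose text attribute equals Call".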

3. What are Implicit and Explicit waits in Appium?

There are moments when certain elements have not yet loaded on the app screen during testing. These circumstances call for a wait before the test acts on the element in question. In Appium, an implicit wait sets a timeout that applies to every element lookup: the driver keeps looking for the element for up to that duration before failing. An explicit wait, on the other hand, delays a specific step until a given condition is met or a timeout expires.
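
The behavior of an explicit wait can be sketched in plain Python as a polling loop. This is an illustration of the idea, not the implementation used by Appium or its client libraries; the name wait_until and its parameters are hypothetical.

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} seconds'.format(timeout))
        time.sleep(poll_interval)
```

A test would pass a function that checks for the element or state it needs, so the step proceeds as soon as the condition holds instead of sleeping for a fixed time.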

4. What type of file is required for test automation in Android using Appium?

.apk files are needed for test automation in Android using Appium.