How to Integrate Audio APIs into Your Appium Automation Scripts

Have you ever had trouble testing the audio functionality of your mobile application? Capturing, injecting, and comparing audio has historically been a tricky area in mobile automation testing. Even validating something as simple as whether the correct audio plays during a prescribed action can be daunting. HeadSpin’s Audio APIs simplify these operations and provide developers with a convenient RESTful interface. This post gives a high-level overview of each API and shows how to integrate it into your Appium automation script.

What are these APIs? How do I use them?

In total, HeadSpin has seven different APIs for audio on mobile devices. This post focuses on three popular endpoints: uploading audio to our servers, capturing sound on a given mobile device in the HeadSpin cloud, and comparing two audio files to see if they match.

Uploading

The primary use of this endpoint is to upload a reference file that is later compared against a test audio file captured from the mobile device during your automation test. Note that this endpoint expects audio in .wav format. The response is a JSON object that indicates whether the upload succeeded and, if it did, provides a unique audio id.

Example Code

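import json
import requests

def upload_reference_file(api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/upload'
    # Open the .wav reference in binary mode; the context manager closes it after the upload.
    with open('reference_audio.wav', 'rb') as data:
        r = requests.post(api_endpoint,
                          headers={'Authorization': 'Bearer {}'.format(api_token)},
                          data=data)
    # The JSON response carries the audio id used to reference this file later.
    response = json.loads(r.text)
    return response['audio_id']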

Capturing

You invoke this endpoint when you want to start capturing the audio output from a given device. A capture can end in one of two ways: you can specify how long it should run when you make the initial capture call, or you can store the worker id included in the response and use it to poll for status updates and stop the capture yourself.

After the audio has been captured from the device, it is uploaded to a storage system that anyone in your organization can access. The response from the capture endpoint includes the audio id, which you should store for later reference.

Example Code

import json
import requests

def capture_audio(device_address, duration, api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/capture/start'
    # max_duration limits how long the capture runs on the device.
    data = json.dumps({
        'device_address': device_address,
        'max_duration': duration,
    })
    r = requests.post(api_endpoint,
                      headers={'Authorization': 'Bearer {}'.format(api_token)},
                      data=data)
    # Keep the returned audio id; the analysis call needs it.
    response = json.loads(r.text)
    return response['audio_id']

Analysis

The analysis, or match, API takes two audio files, a reference and a test, and determines whether the reference audio is present in the test audio. We define the reference file as the original audio source, while the test audio is a longer captured recording that contains the reference.

The use case here is to detect exact audio matches and to locate the reference audio inside the test audio. It can also compare the audio quality of the test relative to the reference.

The response from this API includes several result parameters from the analysis. The first thing to check is whether the call was a success. Success does not indicate that the audio files match; it indicates only that the algorithm was able to run correctly. Along with success, the response contains two objects: parameters and results. Parameters describes the thresholds and values used during the analysis, e.g. the sample rate. The most important part of a successful analysis is, of course, the results. The results object gives us the following stats:

  • Match: Full, Partial, No, or Error
  • Quality of Match
  • Start and End time of reference in terms of test

Example Code

import json
import requests

def compare_audio(test_id, reference_id, api_token):
    api_endpoint = 'https://api-dev.headspin.io/v0/audio/analysis/match'
    data = json.dumps({
        'test_audio_id': test_id,
        'ref_audio_id': reference_id,
    })
    r = requests.post(api_endpoint,
                      headers={'Authorization': 'Bearer {}'.format(api_token)},
                      data=data)
    response = json.loads(r.text)
    # 'success' only means the analysis ran; the match verdict is inside 'result'.
    if response['success']:
        return response['result']
    # The analysis did not run, so there is no result to return.
    return None
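
Because success and match are separate signals, a test still has to inspect the result before passing. The helper below is a minimal sketch of that check. The `match` key and its string values are assumptions based on the stats listed above, not confirmed field names, so adjust them to the response you actually receive.

```python
def audio_matches(result, accepted=("full",)):
    """Return True if the analysis result reports an accepted match level.

    `result` is the object returned by compare_audio(), or None when the
    analysis did not run. The 'match' key and its values are assumed names.
    """
    if not result:
        # No result means the analysis failed, so treat it as "no match".
        return False
    # Normalize case and whitespace so "Full" and "full" compare equal.
    match = str(result.get("match", "")).strip().lower()
    return match in accepted
```

Passing `accepted=("full", "partial")` would let a test tolerate partial matches, which may be useful when the capture starts after the reference audio has already begun playing.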

How can we tie all of these together?

To show how these endpoints fit into a single automation run, we are going to look at a customer use case. The customer's goal was to verify that the correct automated response played when a user of their service made a call with no balance on their SIM card.

Given the proper reference audio, this is relatively straightforward with HeadSpin’s audio APIs. We executed this test in the following steps:

  1. Upload a reference file to HeadSpin’s storage system using the upload endpoint and store the audio id for later use.
  2. Launch an Android/iOS device that has audio enabled in HeadSpin’s mobile device cloud.
  3. Navigate to the device’s phone application using the Appium framework.
  4. Enter a given number and place the call using a SIM card provided by the customer.
  5. Once the call connects, hit HeadSpin’s capture endpoint with a max duration of 20 seconds to record the automated response and store the audio id in the response for later use.
  6. Send both the reference and test audio ids to HeadSpin’s analysis API to verify they are a match.
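
The steps above can be sketched as a single function. This is an illustration rather than the customer's actual script: the endpoint helpers are passed in as arguments (for example `upload_reference_file`, `capture_audio`, and `compare_audio` from earlier in this post), `place_call` is a hypothetical callable wrapping your own Appium steps, and the wait before analysis assumes the capture call returns immediately and records for the full duration.

```python
import time


def verify_call_audio(upload, capture, compare, place_call,
                      device_address, api_token,
                      duration=20, wait=time.sleep):
    """Run the upload, call, capture, and analysis steps in order.

    Returns whatever `compare` returns (the analysis result, or None).
    `place_call` is a hypothetical callable that drives the Appium session.
    """
    # Step 1: upload the reference audio and keep its id.
    reference_id = upload(api_token)
    # Steps 2-4: launch the device, open the phone app, and place the call.
    place_call()
    # Step 5: start capturing the automated response on the device.
    test_id = capture(device_address, duration, api_token)
    # Assumption: the capture records for `duration` seconds after the call
    # returns, so wait that long before asking for the analysis.
    wait(duration)
    # Step 6: check whether the reference is present in the captured audio.
    return compare(test_id, reference_id, api_token)
```

Passing the helpers in as arguments keeps the flow easy to dry-run with stand-in functions before pointing it at a real device.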
