Including bots, Alexa, and mobile
Sarah-Lynn Brunner: Good morning and welcome to our webinar, Automate Testing for Next Generation Interfaces: Bots, Alexa, and Mobile. My name is Sarah-Lynn Brunner and I will be your host today. Today’s presenter is Andrew Morgan. Andrew is the director of product marketing at Apexon. He’s an experienced leader in strategic analysis, opportunity assessment, and roadmap execution.
With his guidance and expertise, he has helped companies expand their digital initiatives to groundbreaking levels. Andrew has over 10 years of experience working with a wide range of companies, including global automotive, pharmaceutical, and technology manufacturers. Additionally, he has directed the development of market firsts such as life science applications, customer engagement programs, and predictive analytics platforms. Let’s welcome Andrew.
Andrew Morgan: Thank you.
Sarah-Lynn: Before we begin, let me read you some housekeeping items. First, this webcast is being recorded and will be distributed via e-mail, allowing you to share it with your internal teams or watch it again later. Second, your line is currently muted. Third, please feel free to submit any questions during the call by utilizing the chat function at the bottom of your screen. We will answer all questions towards the end of the presentation, and we will do our best to keep this webinar to the 45-minute time allotment. At this time, I’d like to turn the presentation over to Andrew.
Andrew: Excellent. Hi, everyone. Thank you for joining. Thank you, Sarah-Lynn, for the introduction. Like Sarah-Lynn said, my name is Andrew Morgan, and I lead product marketing strategy here at Apexon. We are a digital-first services and consulting firm specializing in accelerating digital initiatives such as quality engineering, development and DevOps, and innovation.
We have offices all around the globe and focus on complex industries such as financial services, healthcare, and technology. Additionally, we have been doing this for over a decade. The reason I’m walking through the factors to consider and the approach to testing these technologies is that we’ve directly seen how they can enhance consumers’ ability to engage with companies or, on the flip side, impact an organization’s revenue.
Today, we’re going to cover six things. The transformation of consumer applications and what they look like today. New generation digital interfaces and the characteristics they carry. Some of the existing bugs within these new generation digital interfaces. Then we will get into both the factors to consider when testing digital applications and some demos of how we test a few of them. Lastly, scaling these new generation digital interfaces, and what it really means not only to go to market but to have satisfied customers.
Now, when we talk about the transformation of consumer applications, the term next-gen is often used. We even use it here today because they’re still coming up next from an adoption and scalability standpoint. A lot are already in the market, but this isn’t like a showcase at CES. These can be found every day if you know where to look. These technologies are really going to foster a completely new generation of consumer engagement.
We may all be working on different projects, but our consumers are the bottom line as to whether or not our companies are successful. Let’s look at these new generation digital interfaces. Mobile apps started as a fairly one-dimensional pull of information. They will now be instrumental to how we manage and engage all of our IoT devices. That ranges from gesture-based devices, being able to control the flight path of a drone with your hand, to more common voice bots such as Alexa and Google Home.
These devices’ actions are based on how we engage with them and how they interpret our actions. Wearables are no longer just the visual representation of a mobile app; these devices leverage a multitude of sensors to provide more useful engagement and insights to users. Looking at some more technologies, take a second and think about how many times you’ve interacted with your Alexa, and whether you were actually happy with those chatbot and voice bot interactions.
From what we’ve seen in the market, from both consumers and different companies, the answer tends to be no. Now, bots have a much larger task than just providing another medium for humanless interaction; they also have to actually interpret the user and carry out human-like conversation. Augmented, mixed, and virtual reality is something that we have heard about for a while, and we are seeing bits and pieces pop up nowadays, from the intelligent dashboard in the new BMW to HoloLens.
Like wearables, these technologies have to incorporate data and sensor readings to provide not only the best but the most accurate consumer engagement. Think about driving a car: you really need that information to be accurate. Lastly, we’ve seen tangible UI in the marketplace, like the Microsoft Surface Hub and the Google Jamboard, where multiple applications and engagement from multiple users need to integrate together correctly.
Now that we’ve considered the landscape of these new-gen interfaces, let’s look at a few interesting characteristics that they incorporate, all of which have an impact on testing. From the top right, starting at number one, we have understanding users. Number two, the conversational element. Number three, at the bottom, we have identifying and distinguishing both objects and users. Number four, these interfaces tend to have a form of cognitive intelligence. Number five, at the top left, the ability to augment reality with virtual objects.
Number one, understanding the user. Two interesting devices here: the Google Pixel Buds, which I believe currently only work with the Google Pixel device, and the Microsoft Kinect for Xbox. Pixel Buds translate audio automatically for you, and the Kinect reflects your actions automatically as well. In both these cases, they have to discern an intention and act on it. What language is being spoken in the outside surroundings? Is it Mandarin? Is it Spanish? What language does it need to be translated to? English, French, and so on. The Kinect has to make sure your actions are directed at the device while accounting for other moving objects that may be present.
What that means is that they are both subject to environmental conditions in the outside world: surrounding noises like cars, or moving objects like pets or children in the background. We’ve actually built harnesses in our lab to test these types of devices, like the Microsoft Kinect, to be able to establish what we call a gold standard while also being able to replicate all those real-world conditions.
All right. Now to number two. In terms of the conversational element, we have all seen IVR bots where the interaction is “press one to go here.” Like I mentioned, these new bots have a human-like conversation. They need to be able to understand the user’s intent using natural language processing and reply to the user with natural language generation. The same goes for voice bots. The users of these technologies expect the bot to be able to interpret a wide variety of dialects and phrases.
Something that we actually have built and rolled out for our own employees is an HR bot where you can quickly ask questions like how many days of PTO you have, apply for a sick day, or get immediate information on when open enrollment for benefits is.
Next is identifying and distinguishing, with face identification and biometrics. Face identification and biometrics have made huge leaps in the last few years, but conditions like lighting, wearing a hat, and greasy or dirty fingerprints have to be taken into account, as they are everyday real-world factors that can impact the user experience. Additionally, multiple moving objects are a very real scenario testers have to incorporate. They have to create test cases around the parameters in the scenario and then understand what it takes to actually scale that testing in that product.
For one of our financial customers, we actually worked with them on how to create a smart mirror that could interact with users to display the correct account information. Lighting and differences in identification were both major factors that we were able to identify and actually help them address. Next generation interfaces do have to have cognitive intelligence as well. A couple of examples here: a display that interacts with users’ devices and real-world objects to display more information about that object.
The challenge with cognitive interpretation is that you have to go really narrow for it to be useful. This is because cognitive intelligence has to take context into account, and context is almost always multi-dimensional. For one of our clients we built a customer service ChatBot that could understand the mood of the customer. I can tell you this, moods can be very hard to read sometimes and some customers don’t really mean what they say. I’m sure we’ve all, at one point, said or did something that we didn’t really mean.
Lastly, let’s look at the characteristics of virtual objects that augment reality. For the healthcare industry, the application of this technology is huge. Doctors can now immediately interact with multiple forms of data, like past medical history, current vitals, and the specific areas of the body that they are treating, to provide the best care to patients in a timely fashion. On the consumer side, customers can see how products will look in their home before they buy them.
Now, this is much more than just an e-commerce feature, those we have been testing for quite a while. It is about being able to incorporate the correct information, schematics and being able to provide value to the users by incorporating additional information. Looks like in this example we have a slow-cooker so it has to be able to quickly access previous history of the user to determine the best recipes and other information that may entice that user to purchase it right away.
Now that we’ve looked at the various attributes of these interfaces, let’s look at some examples of how these bugs manifest. I want to show you some real-world examples that have actually occurred. A New Zealand passport robot told an applicant of Asian descent to, “Open their eyes.” Oops. Now in hindsight, you know how to test it, but is it a boundary condition or a critical path error? It’s actually both because the cross between digital and the physical real-world is where the issue occurs.
Now let’s look at two conversation bots. The user on the left is asking a question about the weather. They ask, “What was the weather like this weekend?” The bot doesn’t understand it and can’t figure out what exact location they’re referring to. The user then asks a very specific question and the bot replies with that direct answer. The user wants affirmation that this is the correct information but, as you see, the bot doesn’t understand where that question is coming from.
Now, you can read the user’s intent in one sequence and see that they’re referring to the same topic, but the bot is only processing the last question asked, and it’s caught in a loop instead of reassuring the user about the previously supplied information. For the other example, instead of clicking the See More button, the user asks to see more of the product in the bot or the application they’re using.
The bot is not built or tested to understand this desired engagement, so it instead provides a canned response, letting the user know that they will be forced to proceed with a different medium for this desired action. Now, both of these have been built and tested, but not all the situations where the bot could fail have been accounted for.
Now, who has heard of the drop-in feature for Alexa? If you haven’t, it basically allows you to interact with an Alexa from anywhere in the world, but the person on the receiving end, where the Alexa actually is, is often very confused or scared by this interaction. If you’re in California dropping in on your sister in New York, or your parents in Europe, whatever it may be, there isn’t a lot of communication and clarity as to how the recipient receives this information.
Additionally, just in time for Halloween, you can see examples here of where the Alexa actually interacted on its own without the user’s direction. For example, Alexa decided to let out a creepy laugh. You can see by these tweets that the user thinks there is a good chance they get murdered tonight, albeit hopefully in a joking manner. This is not the customer sentiment that you want to have. Well, I think everyone joining us now has great context on the characteristics and potential liabilities of this new generation of technology.
Let’s look at some examples of how we can test, and the approaches that we take, for these specific applications. First, let’s start with mobile applications. You have a bunch of things going on: biometrics, Bluetooth, location, date and time, et cetera. You need to be able to test these in a scalable and quick manner. These tests will need to simulate those conditions and then test the application to make sure the functionality, performance, and UX are all accurate, while additionally accounting for other features and sensors involved, making sure that you are not enhancing one element while breaking another.
One example is biometric and authentication testing. Before we start, what is the approach? Are we testing the fingerprint or are we actually testing the application’s response to the fingerprint? It’s the latter. We have to test to make sure that, while incorporating all the real-world factors, the application is responding correctly: that the next feature or interface is the correct one, or that the application takes the correct path when there is a failure to identify that biometric.
This is one approach we take to testing it: using a library of simulations controlled by an external controller. We have a small SDK for all these libraries with these features. The test case library moves the hardware and software. This automation library and approach has not only saved time for our customers in being able to launch their products but has also drastically reduced the cost it takes to launch them.
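To illustrate the distinction drawn above, testing the application’s response to a biometric result rather than the fingerprint itself, here is a minimal sketch in Python. It is not Apexon’s SDK or simulation library; `BiometricResult`, `LoginFlow`, the screen names, and the three-attempt limit are all hypothetical stand-ins for a simulated sensor and an application under test.

```python
from enum import Enum


class BiometricResult(Enum):
    """Outcomes a simulated sensor can report (hypothetical set)."""
    MATCH = "match"
    NO_MATCH = "no_match"
    SENSOR_ERROR = "sensor_error"


class LoginFlow:
    """Hypothetical stand-in for the application under test."""
    MAX_ATTEMPTS = 3  # assumed policy, not taken from the webinar

    def __init__(self):
        self.failed_attempts = 0

    def on_biometric(self, result):
        """Return the name of the screen the app shows next."""
        if result is BiometricResult.MATCH:
            self.failed_attempts = 0
            return "home"
        if result is BiometricResult.SENSOR_ERROR:
            return "passcode"  # fall back when the sensor is unusable
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_ATTEMPTS:
            return "passcode"
        return "retry"


def run_scenario(results):
    """Feed a sequence of simulated results to a fresh login flow."""
    flow = LoginFlow()
    return [flow.on_biometric(r) for r in results]


# Each scenario pairs simulated sensor results with the expected screens.
SCENARIOS = {
    "clean match": ([BiometricResult.MATCH], ["home"]),
    "recover after one failure": (
        [BiometricResult.NO_MATCH, BiometricResult.MATCH],
        ["retry", "home"],
    ),
    "locked out to passcode": (
        [BiometricResult.NO_MATCH] * 3,
        ["retry", "retry", "passcode"],
    ),
    "sensor error": ([BiometricResult.SENSOR_ERROR], ["passcode"]),
}

if __name__ == "__main__":
    for name, (steps, expected) in SCENARIOS.items():
        status = "PASS" if run_scenario(steps) == expected else "FAIL"
        print(f"{status}: {name}")
```

The point of the design is that the sensor outcome is injected, so the failure paths can be exercised repeatedly without a real finger on real hardware.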
Now, let’s look at some testing nuances of the conversational UI. We have looked at chat and voice bots a few times already in terms of characteristics and software bugs. We talked about how they have both commonalities and differences: both need to mimic human conversations, but they also incorporate different scenarios. Let’s look at the factors to test and then we will show you a quick demo.
All of these are important to comprehensively test a bot. We helped move a client from an IVR to a chatbot, so let’s go through each of these. These were some of the commonalities that we identified within that scenario. Starting on the top line in red, we have different responses to the same query, such as, “Thank you. You’re welcome. My pleasure. No problem.” There are different ways to actually respond to that same question, and then there is how quick the response time from the bot is.
Are they able to determine those differences and apply the correct response in an appropriate fashion? Going down below the middle line, we have the bot’s understanding of intent. Different users ask the same query in different ways: “What’s the growth of my portfolio? What is the percentage change in my portfolio?” Then, additionally, there could be multiple queries in the same sentence.
How does it understand a question such as, “What is the growth of my portfolio and how much have I saved in the last year? What is the difference in that savings?” Now, how is that bot actually able to understand those multiple questions within one sentence while being able to provide the correct response? Then lastly, at the bottom, we see understanding typo errors as well as mixed languages.
How far can the bot understand that typo, and how well is it understanding the user’s intent, as well as mixed-language queries? I remember in Spanish class, we used to always say, [Spanish language] car or bank or whatever it may be. If you’re interacting with this chatbot and you’re typing in these different languages, how is this system able to actually understand what is being said by the user?
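The factors above (the same intent phrased differently, several queries in one sentence, and typos) can be turned into concrete test inputs. The sketch below is only an illustration under stated assumptions: `INTENT_KEYWORDS` and `detect_intents` are hypothetical, and the keyword matcher built on Python’s standard `difflib` is a naive stand-in for a real natural language processing model, used here just so the test cases have something to run against.

```python
import difflib

# Hypothetical intents and trigger words for a portfolio bot.
INTENT_KEYWORDS = {
    "portfolio_growth": ["growth", "change", "percentage"],
    "savings": ["saved", "savings"],
}


def detect_intents(utterance, cutoff=0.8):
    """Return every intent whose keywords approximately match a word.

    Fuzzy matching tolerates small typos; cutoff is an assumed threshold.
    """
    words = [w.strip("?.,!").lower() for w in utterance.split()]
    found = []
    for intent, keywords in INTENT_KEYWORDS.items():
        for word in words:
            if difflib.get_close_matches(word, keywords, n=1, cutoff=cutoff):
                found.append(intent)
                break
    return found


# Test cases: (utterance, intents the bot is expected to recognise).
CASES = [
    ("What's the growth of my portfolio?", ["portfolio_growth"]),
    ("What is the percentage change in my portfolio?", ["portfolio_growth"]),
    ("What is the growth of my portfolio and how much have I saved "
     "in the last year?", ["portfolio_growth", "savings"]),
    ("What is the grwth of my portfolio?", ["portfolio_growth"]),  # typo
]

if __name__ == "__main__":
    for utterance, expected in CASES:
        status = "PASS" if detect_intents(utterance) == expected else "FAIL"
        print(f"{status}: {utterance}")
```

Mixed-language queries would be added to `CASES` in the same way, although a keyword matcher like this one would not be expected to handle them.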
Additionally, as bots have become more prominent and these factors have to be considered, we’ve seen the following desired features requested for specific bot testers. There’s even a new role and position being formulated in the market. We call this a conversational designer. Now, for the demo. Here we have a chatbot tester. We can take already defined test cases in an Excel or CSV file, with the defined type of interaction that you are looking for from that bot in a spreadsheet.
Then the automation program will ask specific categorized questions, the bot will respond, and all of the answers from the bot, whether they are correct or not, are captured in a report so you can see the performance of the bot. This automates the testing for the bot. For one of our customers, it would have taken three weeks to manually test 1,700 interactions. We automated this to occur in just nine minutes. Now, that is a huge benefit, not only in saving time and money but also in being able to bring that feature to market quicker for your consumers. Here, you can see the results of the bot as we go down. You know what the desired response was and can make sure that the bot replied correctly.
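The spreadsheet-driven flow described above can be sketched in a few lines of Python. This is not the tool shown in the demo: the column names (`category`, `question`, `expected`), the sample rows, and `fake_bot` are assumptions made for illustration, and a real run would replace `fake_bot` with a call to the bot under test.

```python
import csv
import io

# Stand-in for an uploaded CSV file of predefined test cases.
SAMPLE_CASES = """category,question,expected
greeting,hello,Hi! How can I help?
pto,how many days of pto do i have,You have 12 days of PTO.
benefits,when is open enrollment,Open enrollment starts in November.
"""

# Hypothetical bot: knows two answers and falls back on everything else.
CANNED_ANSWERS = {
    "hello": "Hi! How can I help?",
    "how many days of pto do i have": "You have 12 days of PTO.",
}
FALLBACK = "Sorry, I did not understand that."


def fake_bot(question):
    return CANNED_ANSWERS.get(question.strip().lower(), FALLBACK)


def run_suite(case_file, ask):
    """Ask every question in the CSV and record expected vs. actual."""
    results = []
    for row in csv.DictReader(case_file):
        actual = ask(row["question"])
        results.append({
            "category": row["category"],
            "question": row["question"],
            "expected": row["expected"],
            "actual": actual,
            "passed": actual.strip() == row["expected"].strip(),
        })
    return results


def summarize(results):
    """Collapse per-case results into the totals a report would show."""
    passed = sum(1 for r in results if r["passed"])
    return {"total": len(results), "passed": passed,
            "failed": len(results) - passed}


if __name__ == "__main__":
    report = run_suite(io.StringIO(SAMPLE_CASES), fake_bot)
    for r in report:
        print("PASS" if r["passed"] else "FAIL", r["category"], r["question"])
    print(summarize(report))
```

Exact string comparison is the simplest possible check; a bot that may legitimately answer “You’re welcome” or “My pleasure” to the same query would need a set of accepted responses per case instead.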
Now let’s turn to voice bots and see some of their nuances. There are some pretty unique nuances with voice bots that are different from chatbots. Let’s see what they are, starting at the top in purple. There are different accents and genders, an American male versus a British female, and punctuation. For example, “Tools without any skills are helpless.” or “Tools without any skills are helpless?” The punctuation you provide within your responses needs to be interpreted by that bot. The same thing applies down below, with the same meaning for different utterances. Yes, yeah, true, exactly, and certainly can be used interchangeably, but the bot must understand that.
The background noise needs to be checked as well, to understand exactly where the user’s intent is coming from and trying to go. Down at the bottom, we have different pronunciations. People often pronounce the word accessory in different ways. Does that bot understand that user’s intention? Then there is speaking at a distance, in the case of listening to a device or being stationary: understanding where the voice is coming from, what echoes may be involved, and how the pitch is impacting the performance of the bot.
Environmental conditions do matter a lot when testing voice bots and they have to be accounted for. For a demo of the approach we’ve taken to testing voice bots, let’s hope the audio is working on this. This is similar to the chatbot: we take already defined test cases in an Excel or CSV file, with the defined type of interaction you are looking for from that bot in the spreadsheet. We upload that spreadsheet here, where an automated process of the testing can occur overnight and not disrupt anyone else in the office where work is critical.
Andrew: Here you have that web-based result. This showcases exactly what the expected output and actual output were. Another approach to testing voice bots is accounting for that distance and pitch that we alluded to earlier. This is used to take into account factors like background noise and other elements that better reflect real-world scenarios.
As you can see here, as the harness moves back and forth, there is a test case that passes or fails based on the distance from the bot. We’ve actually built this testing harness so our customers can bring the best quality products to market in the quickest time. Now, let’s quickly look into testing augmented reality applications. Most of you, I’m sure, are familiar with AR applications to a degree, but you probably haven’t tested or had to work on too many of them.
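The pass-or-fail-by-distance idea behind the harness can be sketched as a simple sweep. Everything numeric here is an assumption for illustration: `simulated_confidence` is a made-up linear model standing in for a real recognition result from the device, and the 0.5 threshold is arbitrary.

```python
def simulated_confidence(distance_m, noise_db):
    """Hypothetical model: confidence falls with distance and noise."""
    return max(0.0, 1.0 - 0.12 * distance_m - 0.005 * noise_db)


def sweep(distances_m, noise_db, recognize=simulated_confidence,
          threshold=0.5):
    """Run one test case per distance and mark it passed or failed."""
    results = []
    for d in distances_m:
        confidence = recognize(d, noise_db)
        results.append({
            "distance_m": d,
            "confidence": confidence,
            "passed": confidence >= threshold,
        })
    return results


def max_passing_distance(results):
    """Farthest distance at which the bot still passed, or None."""
    passing = [r["distance_m"] for r in results if r["passed"]]
    return max(passing) if passing else None


if __name__ == "__main__":
    for r in sweep([1, 2, 3, 4, 5, 6], noise_db=40):
        print(r["distance_m"], "m", "PASS" if r["passed"] else "FAIL")
```

Repeating the sweep at several noise levels would give a grid of distance against background noise, which is one way to summarise the environmental conditions discussed above.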
The capabilities and applications of these technologies are barely scratching the tip of the iceberg. Most likely you’ll be testing them much sooner than you anticipate. Let’s look at the nuances of AR applications. Starting at the top in purple, we have environmental conditions and the device under motion. Low lighting affects the behavior of the application and can affect how that application senses or recognizes different elements.
The same goes for moving around: making sure that, as the motion occurs and you are using the different applications, that consumer engagement experience is correct. Secondly, we have placing of objects as well as augmenting objects’ dimensions. Many objects need to be placed only if enough surface is detected, while some objects, like glasses, need to be placed only when specific items are detected.
Any object placed in the real world must have realistic dimensions. This is augmenting that object’s dimensions: understanding what they need to match, similar to that slow cooker that we mentioned previously. Then lastly, the user’s interactions: capturing the user’s interactions based on pinch-to-zoom, swipe, double tap, whatever it may be. Other factors to consider are that AR apps involve many compute- and memory-intensive tasks, like recognizing the scene and rendering objects into the real-world camera stream.
This is one of the approaches that we have and that we take to deploying AR applications. This shows you that we have an actual method to the madness. We upload the training data of sample images of AR objects and train machine learning models using TensorFlow. Objects are detected in the AR application and then validated. We simulate the user’s interactions with the AR object.
Now, this can be, again, many things based on how the application is designed and what you are actually intending the user to do, from that swiping or that 3D touch to the type of clicking that needs to be done within that application. These interactions are verified, such as the movement, the reorientation, and even more, based on those interactions. Finally, a detailed report is generated using any automation framework.
Here, we have some screenshots of examples from our AR testing. On the left is the training data set for the expected output; on the right you can see the real-world screenshots from the automated interactions. The process here is that the app must know how to identify the objects and distinguish the correct object that we want to augment, imitate the user’s gestures based on how and what type of information we want to showcase, and then identify new objects and rectify them: how those objects appear and what they need to look like for that user based on any device they’re using, any operating system, whatever it may be.
From augmented reality, I want to move to virtual reality. I also want to be conscious of your time because virtual reality can go very, very deep. [laughs] As we all know, there’s tons of opportunity within all these new generation technologies. I’m just going to cover these main elements to really consider because diving into each one will take a vast amount of time. We’d be happy to follow up with any of you for a specific virtual reality discussion and how these applications can actually improve your customer experience.
VR can incorporate a lot more elements into the application than AR. These are the four main areas of testing for VR applications that everyone has to consider. First, content, such as video, audio, Buller, animations, frame rate, and so on. Second, the performance of those applications, such as total frame time and application dropped frames, as well as warping scenarios. Third, the user experience, like gyroscope tracking, user gaze and interactions, and consistency of content navigation. Lastly, other parameters that we need to take into account, such as refresh rates, audio and video syncing, and any motion or audio latency.
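As one concrete example of the performance area, dropped frames can be estimated from recorded frame times. This is a minimal sketch under assumptions: `frame_report` is a hypothetical helper, the 90 Hz refresh rate is just a common headset value, and counting every frame that exceeds its time budget as dropped is a simplification of how real VR runtimes report this.

```python
def frame_report(frame_times_ms, refresh_hz=90):
    """Summarise frame timing against the display's per-frame budget."""
    budget_ms = 1000.0 / refresh_hz  # time available to render one frame
    dropped = sum(1 for t in frame_times_ms if t > budget_ms)
    total_ms = sum(frame_times_ms)
    return {
        "budget_ms": budget_ms,
        "dropped_frames": dropped,
        "worst_frame_ms": max(frame_times_ms),
        "average_fps": 1000.0 * len(frame_times_ms) / total_ms,
    }


if __name__ == "__main__":
    # Frame times in milliseconds, e.g. captured from a profiling run.
    sample = [11, 11, 25, 11, 12]
    for key, value in frame_report(sample).items():
        print(key, round(value, 2))
```

A test case would then assert on these numbers, for instance that no more than a chosen number of frames miss the budget during a scripted scene.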
Each one of these could have very deep test cases and use cases around how to actually make sure those specific scenarios are working within that virtual reality application. Finally, before we get into some questions, we have to call out machine learning and AI. As you saw earlier, the factors you need to consider when testing these new generation interfaces are vast if you want to scale these technologies and bring them to market in a state where they will provide true value to your customers.
Having those different technologies of machine learning and AI is really where the value will be for these new generation products. AI and ML need to be integrated into the specific areas to really provide that best user experience and account for every scenario, the real world, or the user intent. I don’t know if I have taken up enough of your time yet on these different technologies and how they can really be applied in the new testing scenarios that you need to take into account. I’ll pause here and ask Sarah-Lynn if there are any specific questions that we have.
Sarah-Lynn: Great. Thank you so much, Andrew. All right. At this time we’d like to get your questions answered and, as a reminder, you can at any time ask questions utilizing the chat feature at the bottom of your screen. We do have a couple of questions for you, Andrew. The first one is, at what stage of the STLC for these bots do you see companies doing the most testing?
Andrew: For us, we naturally want to shift left as much as possible in any of our testing efforts. We try to incorporate it at the very beginning to make sure that the true scope of the application can be successful. However, these bots and applications are often being developed in an innovation or R&D department. Those departments need to really showcase the full capability before they test for the rest of the organization and before they decide whether or not to roll it out or integrate it.
Testing isn’t always occurring as early as it needs to, and there has to be a top-down alignment in vision as to where and how the organization wants these technologies to increase its consumer experience.
Sarah-Lynn: Great. Thank you. Another one here, how many test cases are needed to begin testing these different bots?
Andrew: Isn’t that the $10 million question? That really depends on what your bot is intended to do. We’ve leveraged, and we continue to leverage, technologies like model-based testing and natural language processing to ensure that every one of our customers has the correct amount of coverage needed for their application and for the application quality. There isn’t a base number per se, but what we do is get you the right number so that we can ensure that quality and coverage.
Sarah-Lynn: Great. Thanks, Andrew. We have time for one more. We have a bot and it does what we want it to do well, but what we want it to do is very limited. What can we learn from testing to help increase its functionality?
Andrew: Taking testing and applying it back. That’s a great, visionary, forward-thinking approach by that person. It’s a great question because we’ve seen the move from zero testing to automated testing to now continuous testing, and there are a lot of approaches to determining how continuous testing and that automation can actually feed back into the product development and the business initiatives that you’re trying to achieve.
One of the main benefits from all these technologies is that, to be able to scale and increase the functionality, you will need automation, because the effort required for manual testing isn’t practical. With this automation, you’re now able to bring in more data, which allows you to then apply intelligent algorithms as well. You can also integrate activities in other departments such as operations and development, incorporating those environments and being able to take all that data and gather some actual insights. It will actually leverage the big data you create and yield insights into what functionality you should really bring to market.
Sarah-Lynn: Great. Thank you, Andrew. I do have another question real quick. How does it integrate with traditional test framework for web application or mobile application?
Andrew: It really depends on that specific framework. There are many frameworks out there that are much more advanced and developed and are able to incorporate our existing technologies. We do have a seamless integration with most of them. That would be a question we would want to dive deep into, to understand exactly what your traditional testing framework is used for, what those repositories look like, what different tools are actually being plugged in, and how you’re leveraging it within your software development lifecycle.
We would have to take in all those factors to be able to provide the best road map and course of action to make sure that these testing approaches would be able to integrate with that framework, or what potential shifts need to be accounted for to get these products to market.
Sarah-Lynn: Perfect. Thanks, Andrew. I believe that’s all the questions we are able to answer at this time. Also, please be sure to check out and subscribe to DTV, a new digital transformation channel that brings in industry experts. Many thanks to our speaker, Andrew. Thank you, everyone, for joining us today. You’ll receive an email in the next 24 hours with the slide presentation and a link to the webcast replay. If you have any questions, please contact us at firstname.lastname@example.org or call 1-408-727-1100 to speak with a representative. Thank you all. Enjoy the rest of your day.