If you think that communicating with computers is only possible through a keyboard, mouse, or touchscreen, you are mistaken. Touchless technologies are just around the corner and already permeate our daily lives: smart cars, online customer support, virtual assistants, smart homes, augmented and virtual reality games. But what powers them?
Humans communicate in several ways. We connect through voice, gesture, or direct touch, switching between them naturally and seamlessly. Human-computer interaction, however, is far more constrained. While in the past we controlled computers through direct touch only, that is, by pressing keys on a keyboard or swiping a screen, today touchless systems let us interact with a computer using voice or gesture.
Today, voice recognition and gesture recognition are generating considerable excitement in the computing world, and both are progressing rapidly. These technologies are based on Natural Language Processing (NLP) and Computer Vision (CV), two areas of artificial intelligence that enable machines to communicate the way human beings do, whether through spoken or non-verbal language.
But what exactly are Computer Vision and NLP?
Computer Vision, or Machine Vision, is a field of computer science that allows a computer system to understand images or visual data the same way human vision does, and to extract useful information in order to produce the appropriate output. Among its main applications are video surveillance, object detection/identification/avoidance, medical image analysis, augmented reality (AR) and virtual reality (VR) development, localization and mapping, converting documents into digital data, human emotion analysis, ad insertion into images and videos, face recognition, property development optimization, etc.
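To make the idea of "extracting useful information from pixels" concrete, here is a deliberately minimal sketch: gradient-based edge detection on a tiny grayscale image represented as a list of lists. This is illustrative only; real vision systems use libraries such as OpenCV and learned models, not a hand-rolled threshold.

```python
def detect_vertical_edges(image, threshold=50):
    """Mark pixels where brightness jumps sharply between horizontal
    neighbours -- the crudest possible notion of an 'edge'."""
    edges = []
    for row in image:
        edge_row = []
        for x in range(len(row) - 1):
            # A large brightness difference suggests an object boundary.
            edge_row.append(abs(row[x + 1] - row[x]) > threshold)
        edges.append(edge_row)
    return edges

# A 4x5 "image": dark region (10) on the left, bright region (200) on the right.
image = [
    [10, 10, 200, 200, 200],
    [10, 10, 200, 200, 200],
    [10, 10, 200, 200, 200],
    [10, 10, 200, 200, 200],
]

edges = detect_vertical_edges(image)
# The dark-to-bright boundary shows up between columns 1 and 2 in every row.
print(edges[0])  # [False, True, False, False]
```

Every application listed above, from face recognition to obstacle detection, is ultimately a far more sophisticated version of this same loop: turn raw pixel values into structured, actionable information.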
Whereas Computer Vision gives a machine the ability "to see", Natural Language Processing gives it the ability "to speak". NLP is a field of artificial intelligence that allows the automatic processing of human languages. In essence, NLP turns natural language into a machine language so that we can communicate with computers. Its main objective is to bridge the gap between human communication and computer understanding. This technology can be found in a variety of applications, such as information extraction, summarization, spelling and grammar checking, machine-assisted translation, information retrieval, document clustering, question answering, text segmentation, natural language interfaces to databases, e-mail understanding, optical character recognition (OCR), sentiment analysis, etc.
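The core job of NLP, mapping free-form language onto a structured machine command, can be sketched with a rule-based toy. Real systems (including the NLP platforms discussed later) use statistical models; the keyword lists and device names below are invented purely for illustration.

```python
# Toy intent + entity extraction: map an utterance to (intent, device).
INTENT_KEYWORDS = {
    "turn_on":  ["turn on", "switch on", "enable"],
    "turn_off": ["turn off", "switch off", "disable"],
}
KNOWN_DEVICES = ["lights", "heating", "music"]

def parse_command(utterance):
    """Return (intent, device) extracted from a spoken-style command."""
    text = utterance.lower()
    intent = next((name for name, phrases in INTENT_KEYWORDS.items()
                   if any(p in text for p in phrases)), None)
    device = next((d for d in KNOWN_DEVICES if d in text), None)
    return intent, device

print(parse_command("Please turn on the lights"))        # ('turn_on', 'lights')
print(parse_command("could you switch off the heating")) # ('turn_off', 'heating')
```

Bridging "human communication and computer understanding" means going from such brittle keyword matching to models that handle the ambiguity and variety of real language.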
Where are CV and NLP used today?
Today, computer vision is driven by a great number of industries and a variety of applications. Rapid technological development, improved software algorithms, advanced cameras, the increasing adoption of Industry 4.0, and falling costs all contribute significantly to the growth of this market. Global revenue from computer vision hardware, software, and services is expected to rise from $1.1 billion in 2016 to $26.2 billion by 2025, according to a report from market analyst Tractica. Moreover, in recent years, Computer Vision platforms have accounted for the largest share of acquisitions among all AI technologies. Venture Scanner's report estimated their worth at around $16 billion, or 72% of all AI acquisition activity. For example, Intel acquired Movidius in 2016 for $400 million. In 2016, there were 106 companies focusing on Computer Vision, among them 33 companies providing gesture control.
Thanks to this wide range of possible applications, vision-based technologies are used in a number of different market segments, such as automotive, consumer and mobile, robotics and machine vision, healthcare, sports and entertainment, security and surveillance, and retail.
The largest application market is automotive, covering technologies used in cars and other vehicles, such as Advanced Driver Assistance Systems (ADAS), autonomous vehicles, parking assistance, obstacle detection, and so on. Self-parking systems developed by Mercedes, Google's self-driving cars, and BMW's iDrive Controller with touchscreen control and gesture recognition are just a few examples. This market is expected to grow in the coming years due to the rising potential of vision-based technology for autonomous and semi-autonomous vehicles.
The second biggest market is consumer and mobile. Its applications run on consumer and mobile devices such as smartphones and tablets, which come with embedded digital cameras. The key applications are gesture recognition, mobile phone apps, VR and AR, and OCR. Microsoft's Kinect and the PlayStation Eye, both popular in the gaming industry, are good examples. Virtual reality is today's "killer app", and according to Tractica, it may create a brand-new segment within the consumer and mobile market.
The medical sector, despite its relatively low current volume, seems the most promising in terms of overall human benefit. CV provides solutions for oncology detection, medical imaging and diagnosis, image archiving and communication, surgical imaging, etc. A fine example is Microsoft's InnerEye initiative, which focuses on image diagnostics and has made progress in delineating malignant tumours.
The main Computer Vision platforms are the Amazon Rekognition API, Google Cloud Vision API, Microsoft Computer Vision API, IBM Watson Visual Recognition API, and Oracle (Apiary) CloudSight API.
Natural Language Processing
Compared with the CV market, the global NLP market is evolving less rapidly. However, according to Tractica's forecast, it is expected to grow from $136 million in 2016 to $5.4 billion by 2025.
While today NLP is used mainly as an interface technology driven largely by the consumer market, its enormous business potential lies in the processing of unstructured data such as text documents, audio, and video files. The technology is developing at a rapid pace, and the global market is ready and waiting.
According to the available statistics, in 2016 there were about 170 companies specializing in NLP technologies, including speech recognition and virtual personal assistants.
The main application markets for NLP are automotive, consumer and mobile, entertainment, healthcare, banking, financial services and insurance (BFSI), and education and research, with the consumer and BFSI markets leading the field.
The driving factor behind the predicted growth is the accelerating adoption of chatbots and virtual digital assistants.
A chatbot is a computer program that holds a conversation through voice commands and/or text chats. The global chatbot market was estimated at over $190 million in 2016 and is expected to keep growing, according to Forbes. The technology is already quite popular: many brand-name manufacturers have adopted it and built it into their products, such as smart cars, smart homes, and social networks. Large businesses are also riding the wave and actively using chatbots for their service needs. A well-known example is customer-care call support, which can offer a customer personalized virtual assistance.
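The simplest customer-support chatbots of the kind described above work by retrieval: pick the canned FAQ answer whose wording best overlaps the user's question. Production bots use trained NLP models; this sketch, with invented questions and answers, only shows the retrieval idea.

```python
# Toy FAQ chatbot: answer with the entry sharing the most words with the query.
FAQ = {
    "How do I reset my password?": "Use the 'Forgot password' link on the login page.",
    "What are your opening hours?": "Our support line is open 9am-6pm, Monday to Friday.",
    "How can I track my order?": "Enter your order number on the tracking page.",
}

def reply(user_question):
    """Return the answer of the FAQ entry with the largest word overlap."""
    query_words = set(user_question.lower().split())

    def overlap(faq_question):
        return len(query_words & set(faq_question.lower().split()))

    best = max(FAQ, key=overlap)
    return FAQ[best]

print(reply("where can I track my order status"))
# -> "Enter your order number on the tracking page."
```

Real chatbot platforms replace the word-overlap score with intent classification models, but the personalized-assistance loop, match the query, return the best response, is the same.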
In the BFSI market, alongside personal virtual assistance, NLP has found use in market intelligence. NLP technologies can extract the essential information a company needs from unstructured data, providing insight into the state of the market, employment changes, and other relevant information.
It is also worth mentioning that NLP has proven useful in the healthcare market. The key applications here are nursing assistants, automated care, management workflows, administrative workflows, and telemedicine networks.
The biggest NLP platforms are Wit.ai, Api.ai, Microsoft LUIS, Amazon Lex, and IBM Watson.
Voice vs. Gesture
Gesture recognition is a perceptual user interface, based on CV technology, that allows a computer to interpret human movements as commands. Voice recognition, likewise a perceptual user interface, is based on NLP and enables a machine or program to recognize spoken language and, as a result, understand and carry out voice commands.
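On the gesture side, "interpreting human movements as commands" usually means classifying landmarks that a vision pipeline has already extracted, such as fingertip and knuckle positions. The landmark format and gesture names below are invented for illustration; real hand trackers output much richer data.

```python
def count_extended_fingers(fingertips_y, knuckles_y):
    """A finger counts as extended when its tip is above its knuckle
    (smaller y = higher up in image coordinates)."""
    return sum(tip < knuckle for tip, knuckle in zip(fingertips_y, knuckles_y))

def classify_gesture(fingertips_y, knuckles_y):
    """Map the number of extended fingers to a named static gesture."""
    count = count_extended_fingers(fingertips_y, knuckles_y)
    return {0: "fist", 2: "peace", 5: "open palm"}.get(count, "unknown")

# Open palm: all five fingertips above their knuckles.
print(classify_gesture([0.2] * 5, [0.5] * 5))  # open palm
# Fist: all fingertips curled below the knuckles.
print(classify_gesture([0.7] * 5, [0.5] * 5))  # fist
```

The hard part in practice is the upstream vision stage, reliably finding those landmarks from camera frames under varying lighting, which is exactly the weakness discussed below.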
Both interfaces serve as an alternative to touch control, letting users communicate with a computer hands-free and making the mouse and keyboard unnecessary.
The major challenge for voice recognition is the complexity of human language, which is ambiguous, has an abundant lexicon, and offers countless ways of expressing the same thing. In addition, its dynamic nature requires regular updates. At present, the voice recognition interface works well only for simple tasks, with challenges posed by slang, regional accents, sarcasm and irony, mumbling, ambient noise, etc. still to be overcome. Despite all this, the voice recognition error rate is continuously improving: according to Google, it was 4.9% in 2017, down from 8.5% in 2016.
By contrast, gesture recognition has no such language-related difficulties. It can also differentiate between people, so an unauthorized person cannot use the system. This feature is important for smart homes, protecting them from intruders. Gesture recognition's main weakness is lighting conditions: since gesture control is based on computer vision, it relies heavily on cameras. These cameras are used to interpret gestures in 2D and 3D, so the extracted information can vary depending on the light source. As a result, the system cannot operate in a dark environment.
Speaking of cameras, this technology requires two to three camera sensors that detect thousands of points in order to interpret a gesture correctly. Such cameras can be expensive, but prices are expected to fall over time.
Unlike gesture recognition, speech recognition does not depend on lighting conditions, requires little hardware, and is inexpensive. On the other hand, gesture control beats voice control thanks to its natural, spontaneous character. This quality plays an essential role in bringing gesture recognition technologies into a variety of products, making interaction faster, easier, and safer.
So, what does the future look like: voice or gesture?
There is no definitive answer to this question. Voice and gesture control mostly apply in different fields, in response to the particular tasks that call for them. Nevertheless, there are some markets where the two technologies compete with one another. One of them is the automotive market.
The major concern for the automotive market is safety. Today, 20-40% of car accidents are caused by driver distraction. Mostly, the driver's attention is diverted by simple things such as changing radio stations, turning on the air conditioning, or setting directions in a navigation app. At first glance, voice control seems an excellent alternative to haptic control. Nevertheless, most car manufacturers, in particular BMW, Volkswagen, Subaru, Hyundai, and Seat, lean toward gesture control. The main reason for preferring gesture recognition is that gestures are made automatically, without distracting the driver. Using voice, by contrast, can affect visual attention: after giving a voice command, the driver needs a reference point to make sure the command was understood by the system. Besides, a car is too noisy an environment for voice recognition; conversations with fellow passengers, traffic noise, and other sounds can cause system errors. Another factor is that the voice system's output can become irritating over time, as it interrupts other auditory processes such as music. Finally, voice recognition is harder to implement in practice, as it requires development for different languages and regular system updates. For these reasons, gesture control appears to be the preferred and more widely used option in the automotive market.
Another sphere where these technologies compete is the consumer market. Both voice and gesture recognition have found an application in the smart home, a place that integrates advanced automation systems to provide remote control of electronics, heating, and lighting. A number of voice-automated devices, in particular the Amazon Echo and Google Home, are already on the mass market and widely popular.
Voice UIs clearly lead this market today; however, as Tractica forecasts, gesture control will gain more traction in the coming years. Moreover, no single vendor is expected to dominate the smart home UI platform space.
It is no surprise that in the mobile market voice control dominates among mobile interfaces today, and it is expected to stay that way, mainly due to the global trend of shrinking screens (e.g. smartwatches). Besides, many commands are simply easier and cheaper to issue by voice. The key players here are Google, Apple, and Amazon.
In the healthcare market, both technologies have great potential, but most applications are still at an early stage of implementation. As of today, there is no AI interface that offers a fully open API for different applications at low cost and with mass adoption. Still, voice recognition technology is in greater demand, owing to its wider range of application areas.
According to a report from Accenture, the top three AI applications in the healthcare market are robot-assisted surgery ($40 billion in value), virtual nursing assistants ($20 billion), and administrative workflow assistance ($18 billion), all of which are NLP-based. A good example is the virtual assistant Sensely, which raised $8 million for its virtual nurse app. On the other hand, gesture control technology is better suited to the operating room, and it can also be used in medical devices with remote control or navigation.
In summary, both voice and gesture recognition are developing technologies, and their application depends on the nature of the specific tasks that call for one or the other. In areas where both can be used interchangeably, the choice in favour of voice or gesture recognition is often made on the basis of technological maturity. In this respect, gesture recognition is currently more established; however, voice recognition is catching up fast.