Behind the Scenes: Learn About the NLP Engine of Radarr (Part 1)

  April 21, 2022
  Radarr Admin
  Behind The Scenes

You likely know Radarr as one of the top tools for social listening, monitoring and analytics. What makes us stand out is our technology, and in this post we share a behind-the-scenes look at Radarr's NLP engine, which powers the insights you derive from our dashboard.

Radarr leverages powerful Natural Language Processing (NLP) algorithms to uncover quick, actionable insights from billions of online conversations. In addition to English, the NLP engine can process data in more than 140 languages and specializes in Asian languages such as Indonesian, Chinese, Japanese, Vietnamese and Thai. It provides analytics both at a micro, per-post level and at a macro level.

This article comes in two parts. This first part walks you through the key steps Radarr's NLP engine takes to transform unstructured, messy data into a structured, machine-understandable format using data preprocessing techniques.


Steps for organizing your data in the NLP engine

Step 1: Language identification 

The first and most important step is language identification: automatically detecting the language(s) present in each conversation and in the queries built with the Query Engine.

Radarr uses an ensemble of models and techniques to infer the language of a text. 

Our language detection is based on language models trained on a dataset that mixes social media and formal language data.

For short-form text, where the language is hard to predict, we fall back on statistical approaches based on language-specific vocabularies that we have built in-house over time.

If the language still has not been identified, we use the post's country and locale as hints about the language.
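The fallback order described above (model first, then in-house vocabulary for short texts, then locale) can be sketched as a simple cascade. This is a minimal illustration, not Radarr's implementation. The character-trigram profiles, the vocabularies, the locale hints and the 20-character "short text" threshold are all hypothetical toy values. A crude trigram-overlap score stands in for the trained language models.

```python
from collections import Counter

# Hypothetical toy character-trigram profiles standing in for trained models.
TRIGRAM_PROFILES = {
    "en": Counter({"the": 5, "he ": 4, " th": 4, "ing": 3, "and": 3}),
    "id": Counter({"ang": 5, "an ": 4, "yan": 3, "kan": 3, "ada": 3}),
}

# Hypothetical in-house vocabularies used when the text is too short.
VOCABULARIES = {
    "en": {"hi", "ok", "thanks", "lol", "yes"},
    "id": {"wkwk", "makasih", "iya", "ga", "dong"},
}

# Hypothetical country-code -> language hints, used as a last resort.
LOCALE_HINTS = {"ID": "id", "US": "en", "SG": "en"}

MIN_CHARS_FOR_MODEL = 20  # assumed cut-off for "short-form" text


def trigrams(text):
    """Count overlapping character trigrams, padding with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def model_guess(text):
    """Score the text against each trigram profile; None if nothing matches."""
    grams = trigrams(text)
    scores = {
        lang: sum(min(grams[g], count) for g, count in profile.items())
        for lang, profile in TRIGRAM_PROFILES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def vocabulary_guess(text):
    """Pick the language whose vocabulary shares the most tokens with the text."""
    tokens = set(text.lower().split())
    hits = {lang: len(tokens & vocab) for lang, vocab in VOCABULARIES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def identify_language(text, country=None):
    """Model -> vocabulary -> locale cascade; returns None if all fail."""
    if len(text) >= MIN_CHARS_FOR_MODEL:
        lang = model_guess(text)
        if lang:
            return lang
    lang = vocabulary_guess(text)
    if lang:
        return lang
    return LOCALE_HINTS.get(country)
```

For example, `identify_language("makasih dong")` is too short for the model step and is resolved by the vocabulary step. An emoji-only post from Singapore has no matches in either step and falls through to the locale hint.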


Once the language has been identified, the data stream is split across multiple preprocessing pipelines, each handling a group of languages.

This step is essential because groups of languages differ in vocabulary, tokenization method and writing direction, among other things.

For example, Japanese and Chinese do not put spaces between words, while Arabic and Urdu are written from right to left.
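Routing identified posts into per-group pipelines amounts to a lookup from language code to group. The sketch below assumes a hypothetical grouping built from the differences mentioned above: languages without spaces between words, right-to-left scripts, and everything else. The post does not describe Radarr's actual groups, so the group names and memberships are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical language groups; the real grouping is not described in the post.
LANGUAGE_GROUPS = {
    "no_spaces": {"zh", "ja", "th"},      # need segmentation beyond whitespace
    "right_to_left": {"ar", "ur", "he"},  # written right to left
}
DEFAULT_GROUP = "space_delimited"


@dataclass
class Post:
    text: str
    lang: str  # output of the language identification step


def group_for(lang):
    """Return the preprocessing group for a language code."""
    for group, members in LANGUAGE_GROUPS.items():
        if lang in members:
            return group
    return DEFAULT_GROUP


def route(posts):
    """Distribute posts into per-group queues for separate preprocessing."""
    queues = {}
    for post in posts:
        queues.setdefault(group_for(post.lang), []).append(post)
    return queues
```

Keeping the grouping in a table rather than in branching code makes it easy to move a language to a different pipeline without touching the routing logic.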

Step 2: Tokenization 

The next step is to break streams of text down into words, terms, sentences or other meaningful elements called tokens. Tokenization has a major effect on the rest of the pipeline, because tokens are the basis for chunking text into meaningful pieces for further lexical analysis.

The simplest form of tokenization is whitespace tokenization, which splits text on whitespace alone and is useful for languages written in the Latin script. Radarr uses multiple models pre-trained on social data across industries to learn the vocabulary of each language and tokenize text meaningfully.
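To see why whitespace tokenization falls short for languages like Chinese, compare it with a dictionary-based segmenter. The sketch below uses greedy forward maximum matching, a classic baseline swapped in here for illustration. It is not Radarr's pre-trained-model approach. The toy vocabulary and the `max_word_len` default are assumptions.

```python
def whitespace_tokenize(text):
    """Split on any run of whitespace; punctuation stays attached to words."""
    return text.split()


def max_match_tokenize(text, vocabulary, max_word_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    vocabulary word that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy vocabulary for the example sentence "I like this phone".
VOCAB = {"我", "喜欢", "这个", "手机"}
print(whitespace_tokenize("我喜欢这个手机"))        # one unsplit token
print(max_match_tokenize("我喜欢这个手机", VOCAB))  # ['我', '喜欢', '这个', '手机']
```

Greedy matching is fast but can choose the wrong split when words overlap, which is one reason learned models trained on real social data tend to segment more reliably.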

Step 3: Normalization 

Once the posts have been segmented into meaningful tokens, they are converted into their normalized form. 

Normalization is the process of converting tokens to a standard base form so that conversations are easier to compare semantically. We use an ensemble of normalization methods, such as lemmatization and stemming.
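Combining lemmatization and stemming can be sketched as "look up a known lemma first, otherwise strip a suffix". In this minimal illustration, the lemma table and suffix list are hypothetical toy data and the stemmer is a crude suffix stripper, not the Porter algorithm. Radarr's actual ensemble is not described in the post.

```python
# Hypothetical lemma dictionary; real lemmatizers use full morphological lexicons.
LEMMAS = {"better": "good", "was": "be", "were": "be", "mice": "mouse", "ran": "run"}

# Suffixes stripped by the toy stemmer, checked longest first.
SUFFIXES = ("ing", "ly", "ed", "es", "s")


def stem(token):
    """Crude suffix-stripping stemmer that keeps at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(token):
    """Lowercase, then prefer a known lemma and fall back to stemming."""
    token = token.lower()
    if token in LEMMAS:
        return LEMMAS[token]
    return stem(token)
```

With this sketch, "Loving", "loves" and "loved" all normalize to the same stem "lov", and irregular forms such as "Mice" and "was" map to their lemmas "mouse" and "be". Stemming outputs need not be real words. They only need to be consistent so that related tokens compare as equal.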

Step 4: Vectorization 

Finally, before we glean insights from these billions of conversations, we convert all text data into a machine-understandable vector format, also called embeddings. This lets us apply advanced NLP techniques such as predicting words and sentences, finding similarities between words and sentences and understanding text semantics. Vectorization, the numerical representation of text, forms the foundation for all of Radarr's advanced NLP models.
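The sketch below shows how embeddings make similarity measurable: each token maps to a vector, a sentence vector is the average of its token vectors (mean pooling), and cosine similarity compares sentences. The 3-dimensional vectors are hypothetical toy values chosen by hand. Learned embeddings typically have hundreds of dimensions, and the post does not say which models or pooling Radarr uses.

```python
import math

# Hypothetical 3-dimensional word vectors (toy values, not learned).
EMBEDDINGS = {
    "great":    [0.9, 0.1, 0.0],
    "awesome":  [0.8, 0.2, 0.1],
    "terrible": [-0.9, 0.1, 0.0],
    "service":  [0.0, 0.9, 0.2],
    "support":  [0.1, 0.8, 0.3],
}
DIM = 3


def sentence_vector(tokens):
    """Mean-pool the vectors of known tokens; zero vector if none are known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(dims) / len(known) for dims in zip(*known)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


a = sentence_vector(["great", "service"])
b = sentence_vector(["awesome", "support"])
c = sentence_vector(["terrible", "service"])
print(cosine(a, b) > cosine(a, c))  # True: similar wording scores higher
```

"great service" and "awesome support" share no words, yet their vectors point in nearly the same direction. This is the property that word-matching approaches lack and that downstream models build on.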


So when it comes to listening to and monitoring online conversations in multiple languages, our NLP engine is one of the most robust out there.

But that's not all.

In Part 2 of this article, we will explain the insights we extract using advanced NLP techniques after this initial data preprocessing.

To be notified when Part 2 is published, subscribe to our blog or try Radarr.
