  • (+91) 9409548155
  • support@appdividend.com
Removing HTML Tags from a String in JavaScript

  • 16 Sep, 2024

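Removing HTML tags from a string gives you plain text with consistent formatting and lowers the risk of XSS (cross-site scripting) attacks, although tag stripping alone is not a complete XSS defense and untrusted input should still be escaped or sanitized before it is inserted into a page. If you are performing a word count, the text must be cleaned of all markup first; otherwise, the count will be inaccurate.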

But how do you distinguish HTML tags from normal text? HTML is a markup language, and each tag starts with an opening angle bracket (“<”) and ends with a closing angle bracket (“>”). Everything from the opening bracket to the closing bracket is a tag.

[Figure: text with HTML tags, and the same text with the tags stripped off]
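As a rough illustration of the bracket rule, and of why leftover markup skews a word count, the sketch below lists the tags in a string and counts words before and after removing them. The helper names `listTags` and `countWords` are made up for this example, and counting words by splitting on whitespace is a simplifying assumption.

```javascript
// Hypothetical helpers for illustration; not part of any library.

// Collect every "<...>" run, which is how a tag is identified above.
function listTags(str) {
  return str.match(/<[^>]*>/g) || [];
}

// Count words by splitting on whitespace (a simplifying assumption).
function countWords(str) {
  const trimmed = str.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const sample = '<p class="intro note">Hello <b>world</b></p>';
const tags = listTags(sample);
const stripped = sample.replace(/<[^>]*>/g, "");

console.log(tags);                 // the tags found in the sample
console.log(countWords(sample));   // 4, because attribute text is counted
console.log(countWords(stripped)); // 2, the real word count
```

The raw string reports four "words" because the space inside the opening tag splits it into extra tokens; only the stripped text gives the count a reader would expect.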

 

Here are three ways to strip HTML tags from a string in JavaScript:

  1. Using string.replace() (Regular Expression)
  2. Using DOMParser
  3. Using .textContent or .innerHTML

Method 1: Using string.replace() (Regular Expression)

The string.replace() method accepts a regular expression as its pattern, which lets you match every tag and replace it with an empty string. It is part of the JavaScript language rather than a browser API, so you can use it on the “Node.js” platform as well.

function stripHtmlTags(str) {
    // Identifying and removing HTML Tags
    return str.replace(/<[^>]*>/g, '');
}

// Example usage:
const input_text = "<p>Hey! <strong>Welcome to AppDividend</strong>!</p>";
console.log("Before stripping: ", input_text);
console.log("After stripping: ", stripHtmlTags(input_text));

Output

Before stripping:  <p>Hey! <strong>Welcome to AppDividend</strong>!</p>

After stripping:  Hey! Welcome to AppDividend!

This approach is basic and works well with simple, well-formed HTML. However, it only matches text patterns and does not parse the markup, so it misses several edge cases: HTML entities such as &amp; stay encoded, the contents of <script> and <style> elements are left in the output, and malformed tags or a “>” inside an attribute value can leave fragments behind.
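To make these edge cases concrete, here is a small sketch that runs the same regular expression against three inputs it does not handle cleanly. The sample strings are invented for illustration, and the function is repeated so the snippet runs on its own.

```javascript
// Same pattern as stripHtmlTags above, repeated so this runs standalone.
function stripHtmlTags(str) {
  return str.replace(/<[^>]*>/g, "");
}

// 1. Entities are left encoded, because nothing decodes them.
const entityCase = stripHtmlTags("<p>Fish &amp; chips</p>");

// 2. A ">" inside an attribute value ends the match too early.
const attributeCase = stripHtmlTags('<a title="a > b">link</a>');

// 3. Script contents are ordinary text as far as the regex is concerned.
const scriptCase = stripHtmlTags("<script>alert(1)</script>Hi");

console.log(JSON.stringify(entityCase));    // "Fish &amp; chips"
console.log(JSON.stringify(attributeCase)); // leftover attribute text before "link"
console.log(JSON.stringify(scriptCase));    // "alert(1)Hi"
```

None of these inputs throws an error; the function simply returns text that is not what a reader would see in a browser, which is why the regular expression suits only markup you already know to be simple.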

Method 2: Using DOMParser

The DOMParser() constructor, available in the browser, creates a DOMParser object whose parseFromString() method parses a string with either the HTML parser or the XML parser, depending on the MIME type you pass.

Reading the .textContent property of the parsed document's <body> returns the text of every node inside it with the tags left out, which gives you clean textual output.

DOMParser is a browser API and is not built into the “Node.js” environment, so you have to execute this program in the browser.

Here is the implementation code in the form of an HTML file:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Removing HTML Tags</title>
  </head>
  <body>
    <script>
      function stripHtmlTagsDOM(html) {
        // Identifying and extracting text from html
        const doc = new DOMParser().parseFromString(html, "text/html");
        return doc.body.textContent || "";
      }

      // Example usage:
      const input_text = "<div><p>Nested <span>tags</span> text</p></div>";
      console.log("Before stripping: ", input_text);
      console.log("After stripping: ", stripHtmlTagsDOM(input_text));
    </script>
  </body>
</html>

Output

Before stripping:  <div><p>Nested <span>tags</span> text</p></div>

After stripping:  Nested tags text

I highly recommend the DOMParser approach because it handles complex HTML structures and correctly decodes HTML entities. The document it returns is also inert, so scripts inside the parsed markup do not run. However, it is slightly more complex and can be overkill for simple markup removal.

If you are working with user-generated content, such as comments filled with malformed HTML, this method is a good fit.
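Because DOMParser exists only in the browser, code that may also run under Node.js can check for it before using it. The following is a minimal sketch of that check, assuming the regular expression from Method 1 is an acceptable fallback; the function name `stripHtml` is hypothetical.

```javascript
// Hypothetical helper: prefer DOMParser when the environment provides it.
function stripHtml(html) {
  if (typeof DOMParser !== "undefined") {
    // Browser path: let the HTML parser do the work.
    const doc = new DOMParser().parseFromString(html, "text/html");
    return doc.body.textContent || "";
  }
  // Fallback path (e.g. Node.js): plain regex, with the limits noted in Method 1.
  return html.replace(/<[^>]*>/g, "");
}

const result = stripHtml("<div><p>Nested <span>tags</span> text</p></div>");
console.log(result); // Nested tags text
```

For this simple input both paths return the same text. They can differ on inputs with entities or malformed markup, so the fallback is best kept for content you control.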

Method 3: Using .textContent or .innerHTML

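This method assigns the string to the .innerHTML property of a temporary element so the browser parses it, then reads .textContent, which returns the text between the tags and effectively removes them. The .innerText property serves as a fallback for older browsers, and if neither property is available, an empty string is returned as a last resort. Because the temporary element belongs to the live page, avoid this method for untrusted input: markup such as an <img> tag with an onerror attribute can still run code when it is assigned to .innerHTML.

Again, this approach is browser-based and will not work directly in a Node.js environment without additional libraries.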

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Stripping HTML Tags</title>
  </head>
  <body>
    <script>
      function stripHtmlTagsTextContent(html) {
        const temp = document.createElement("div");
        temp.innerHTML = html;
        return temp.textContent || temp.innerText || "";
      }

      // Example usage:
      const input_text =
        "<p>This is a <em>complex</em> example with &quot;html entities&quot;.</p>";
      console.log("Before stripping: ", input_text);
      console.log("After stripping: ", stripHtmlTagsTextContent(input_text));
    </script>
  </body>
</html>

Output

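Before stripping:  <p>This is a <em>complex</em> example with &quot;html entities&quot;.</p>

After stripping:  This is a complex example with "html entities".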

Conclusion

If you are dealing with a short string that contains simple HTML markup, use the “string.replace()” method.

If you are working with long strings, complex or nested HTML markup, or HTML entities, use the “DOMParser.parseFromString()” method.

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.

Copyright @2024 AppDividend. All Rights Reserved