.NET: Remove JavaScript and CSS code blocks from an HTML page

I have HTML as a string, containing JavaScript and CSS code blocks.

Something like this:

<script type="text/javascript">

  alert('hello world');

</script>

<style type="text/css">
  A:link {text-decoration: none}
  A:visited {text-decoration: none}
  A:active {text-decoration: none}
  A:hover {text-decoration: underline; color: red;}
</style>

But I don't need them. How can I remove those blocks with regular expressions?

The quick ‘n’ dirty method would be a regex like this:

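// Matches each <script>...</script> or <style>...</style> block, tags included.
var regex = new Regex(
   @"<script\b.*?</script\s*>|<style\b.*?</style\s*>",
   RegexOptions.Singleline | RegexOptions.IgnoreCase
);

string output = regex.Replace(input, "");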
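As a rough sketch, the same idea can be wrapped in a reusable helper. The names HtmlCleaner and StripScriptsAndStyles are made up for this illustration, and the pattern is just one plausible way to write it: a backreference (\1) ties each closing tag to whichever opening tag was matched. It still treats the HTML as flat text, so it is only suitable for reasonably well-formed input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
static class HtmlCleaner
{
    // Singleline lets '.' match newlines, so a block may span several lines.
    // IgnoreCase also covers <SCRIPT> and <Style>.
    // \1 refers back to the tag name captured by (script|style).
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Returns the input with every script and style block removed.
    public static string StripScriptsAndStyles(string html)
    {
        return ScriptOrStyle.Replace(html, "");
    }
}
```

For example, HtmlCleaner.StripScriptsAndStyles("<p>a</p><script>x()</script>") should return "<p>a</p>". Compiling the regex once into a static field avoids rebuilding it on every call.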

The better* (but possibly slower) option would be to use HtmlAgilityPack:

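HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(input);

// SelectNodes returns null, not an empty collection, when nothing matches.
var nodes = doc.DocumentNode.SelectNodes("//script|//style");

if (nodes != null)
    foreach (var node in nodes)
        node.ParentNode.RemoveChild(node);

string htmlOutput = doc.DocumentNode.OuterHtml;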

*) For a discussion about why it’s better, see this thread.


