.NET: Remove JavaScript and CSS code blocks from an HTML page

I have HTML as a string that contains JavaScript and CSS code blocks.

Something like this:

<script type="text/javascript">
  alert('hello world');
</script>

<style type="text/css">
  A:link {text-decoration: none}
  A:visited {text-decoration: none}
  A:active {text-decoration: none}
  A:hover {text-decoration: underline; color: red;}
</style>

But I don't need them. How can I remove those blocks with regular expressions?

The quick ‘n’ dirty method would be a regex like this:

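// Requires: using System.Text.RegularExpressions;
// Assumed pattern: matches each <script>...</script> and <style>...</style> block, contents included.
var regex = new Regex(
   @"<script\b[^>]*>.*?</script\s*>|<style\b[^>]*>.*?</style\s*>",
   RegexOptions.Singleline | RegexOptions.IgnoreCase);

string output = regex.Replace(input, "");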

The better* (but possibly slower) option would be to use HtmlAgilityPack:

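HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(input);

var nodes = doc.DocumentNode.SelectNodes("//script|//style");

// SelectNodes returns null when nothing matches
if (nodes != null)
{
    foreach (var node in nodes)
        node.Remove();
}

string htmlOutput = doc.DocumentNode.OuterHtml;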

*) For a discussion about why it’s better, see this thread.



