Monday, 19 January 2009

Diagnostic Console and Regular Expressions

In my last post I introduced a Diagnostic Console plugin for EPiServer. Today I would like to show you another example of how it can be used with regular expressions to scan pages for data.

In this example I want to find all images referenced by the "MainContent" (XHTML String) property. To do that, the following script can be used:


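    import clr
    clr.AddReference("System")
    from System.Text.RegularExpressions import RegexOptions

    # ProcessProperties calls this for every property of every page it visits
    def processProperty(property, page):
        # Only look at non-empty MainContent properties
        if property.Value and property.Name == "MainContent":
            # Matches src="..." where the URL contains only letters, digits and - _ . /
            regexp = r'src="[a-zA-Z0-9\-_./]*"'
            elements = DiagnosticUtils.FindMatchingElements(property.Value, regexp, RegexOptions.IgnoreCase)
            if elements.Length > 0:
                # Page reference and number of images found on it
                print str(page.PageLink) + " " + str(elements.Length) + "<br/>"
                for element in elements:
                    # Drop the leading 'src="' (5 characters) and cut at the closing quote
                    tempvalue = element.Substring(5)
                    url = tempvalue.Substring(0, tempvalue.IndexOf("\""))
                    print url + "<br/>"

    # Scan the page with ID 1 and everything below it
    DiagnosticUtils.ProcessProperties(PageReference("1"), processProperty)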
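
To make the matching logic easier to follow without an EPiServer installation, here is a rough plain-Python sketch of what the script above does, using CPython's `re` module. `find_matching_elements` and `process_properties` are hypothetical stand-ins for the plugin's `DiagnosticUtils.FindMatchingElements()` and `DiagnosticUtils.ProcessProperties()`; their behaviour (return the full text of every match; visit every property of every page under the root) is assumed here, and pages are modelled as plain dictionaries instead of EPiServer page objects.

```python
import re

# Same pattern as the script: src="..." with only letters, digits and - _ . / in the URL
SRC_PATTERN = r'src="[a-zA-Z0-9\-_./]*"'


def find_matching_elements(text, pattern, flags=0):
    """Hypothetical stand-in for DiagnosticUtils.FindMatchingElements:
    return the full text of every match as a list of strings."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


def process_properties(page, callback):
    """Hypothetical stand-in for DiagnosticUtils.ProcessProperties:
    call callback(name, value, page) for every property of the page
    and, recursively, of all its children."""
    for name, value in page["properties"].items():
        callback(name, value, page)
    for child in page.get("children", []):
        process_properties(child, callback)


def extract_url(element):
    """Strip the leading 'src="' (5 characters) and the closing quote,
    mirroring element.Substring(5) and IndexOf in the script."""
    rest = element[5:]
    return rest[:rest.index('"')]


found = {}  # page id -> list of image URLs


def collect_images(name, value, page):
    # Only non-empty MainContent properties are of interest
    if value and name == "MainContent":
        elements = find_matching_elements(value, SRC_PATTERN, re.IGNORECASE)
        if elements:
            found[page["id"]] = [extract_url(e) for e in elements]


# Hypothetical page tree used only for this illustration
root = {
    "id": 1,
    "properties": {"MainContent": '<p><img src="/images/a.jpg" /></p>'},
    "children": [
        {"id": 2,
         "properties": {"MainContent": '<IMG SRC="/upload/b-1.png"><img src="c.gif">'}},
        {"id": 3, "properties": {"MainContent": ""}},
    ],
}

process_properties(root, collect_images)
for page_id, urls in found.items():
    print(page_id, len(urls), urls)
```

Note that the character class in the pattern excludes `:`, `?`, `&` and `%`, so absolute URLs such as `http://...` and URLs with query strings are not matched at all; widening the class (for example to `[^"]*`) would pick those up too.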

I use the DiagnosticUtils class to keep these scripts short... writing code without IntelliSense can be very annoying! ProcessProperties() scans all pages under the selected root and calls the supplied function for each of their properties. The second method, FindMatchingElements(), was added just two days ago, so you have to download the latest version of the plugin to use it. It encapsulates the creation of the Regex object and returns an array of all matches. In the example above I'm simply displaying all URLs, but I could just as well replace them with a different URL and update the property like this:


    import clr
    clr.AddReference("System")
    from System.Text.RegularExpressions import RegexOptions

    def processProperty(property, page):
        if property.Value and property.Name == "MainContent":
            regexp = r'src="[a-zA-Z0-9\-_./]*"'
            elements = DiagnosticUtils.FindMatchingElements(property.Value, regexp, RegexOptions.IgnoreCase)
            if elements.Length > 0:
                # Edit a writable copy of the page, not the cached read-only instance
                writableClone = page.CreateWritableClone()
                seen = set()
                for element in elements:
                    tempvalue = element.Substring(5)
                    url = tempvalue.Substring(0, tempvalue.IndexOf("\""))
                    # Skip empty src values (String.Replace rejects an empty search string)
                    # and URLs already handled, so an image used twice is not rewritten twice
                    if not url or url in seen:
                        continue
                    seen.add(url)
                    # File name is everything after the last '/'
                    filename = url.Substring(url.LastIndexOf("/") + 1)
                    print str(page.PageLink) + "   [" + url + "]   [" + filename + "]<br/>"

                    # Point the image at /images/news/2008/ instead
                    updatedProperty = writableClone.Property.Get(property.Name).Value.Replace(url, "/images/news/2008/" + filename)
                    writableClone.Property.Get(property.Name).Value = updatedProperty

                # Save and publish the modified copy
                DataFactory.Instance.Save(writableClone, SaveAction.Publish)

    DiagnosticUtils.ProcessProperties(PageReference("1"), processProperty)
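
The rewrite step can also be sketched in plain Python. Instead of calling a string `Replace` on the whole property value once per match, this hypothetical `rewrite_image_urls` helper (not part of the plugin) substitutes each `src` attribute in place using `re.sub` with a replacement function. The target folder `/images/news/2008/` is taken from the script above; everything else is an assumption for illustration, and nothing is saved to EPiServer.

```python
import re

# Group 1 is the literal 'src="' prefix (original case kept), group 2 the URL itself
SRC_PATTERN = re.compile(r'(src=")([a-zA-Z0-9\-_./]*)"', re.IGNORECASE)


def rewrite_image_urls(html, target_folder="/images/news/2008/"):
    """Hypothetical helper: point every matched image URL at target_folder,
    keeping only the file name. Returns (new_html, list of (old, new) pairs)."""
    changes = []

    def replace(match):
        url = match.group(2)
        if not url:  # leave an empty src="" untouched
            return match.group(0)
        filename = url[url.rfind("/") + 1:]  # like url.Substring(url.LastIndexOf("/") + 1)
        new_url = target_folder + filename
        changes.append((url, new_url))
        return match.group(1) + new_url + '"'

    # re.sub visits each match exactly once, so a URL that appears twice is never
    # rewritten twice, and text outside src attributes is left alone
    return SRC_PATTERN.sub(replace, html), changes


html = ('<img src="/images/a.jpg"><IMG SRC="c.gif">'
        '<img src="c.gif"><p>data.gif</p><img src="">')
new_html, changes = rewrite_image_urls(html)
print(new_html)
print(changes)
```

The same idea should carry over to the console, since .NET's `Regex.Replace` has an overload that takes a `MatchEvaluator` callback; the result would still need to be written to a writable clone and saved, as in the script.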


Here you can download the latest DLLs and source code.
