Feb 13

XSS AntiSamy policy configuration

AntiSamy is an HTML, CSS and JavaScript filter for Java that sanitizes user input based on a policy file. AntiSamy is not an HTML, CSS and JavaScript validator. It is merely a way to make sure HTML, CSS and JavaScript input strictly follows rules defined by a policy file.

AntiSamy is a sanitization framework; it is up to the user how it does its sanitization. Using AntiSamy does not guarantee to filter of all malicious code. AntiSamy simply does what is described by the policy file.

Configuring Your AntiSamy Policy

If the default policy file wm-xss-policies.xml doesn’t meet your needs, you can tweak your policy file to your needs and upload. Each section of the policy file is broken down in the next few sections to give the developer background on tweaking the output.

Policy File Directives [<directives></directives>]

The following table shows the directives, their default values, and their impact on the AntiSamy filtering process.

Directive Type Default Description
useXHTML boolean false When this feature is on, AntiSamy will output the sanitized data in XHTML format as opposed to just regular HTML
omitXMLDeclaration boolean true When “omitXMLDeclaration” is turned on, AntiSamy will automatically prepend the XML header to output. Enabling this feature will tell AntiSamy not to do that.
omitDoctypeDeclaration boolean true When this feature is enabled, AntiSamy will automatically prepend the HTML doctype declaration. By default, AntiSamy will not prepend this header.
formatOutput boolean false When enabled, AntiSamy will automatically format the output according to some basic rules and indentation. Kind of like “pretty print.”
maxInputSize int 100000 This directive specifies the maximum size (in bytes) of user input before it’s validated.
embedStyleSheets boolean true When the developer chooses to allow CSS, this directive will specify whether or not remote style sheets found referenced in the user’s input will be pulled down and embedded into the current user input.
maxStyleSheetImports int 1 This feature allows developers to specify how many remote style sheets can be downloaded from any one input.
connectionTimeout int 1000 When “embedStyleSheets” is enabled, this timeout value (in milliseconds) will be used when fetching the offsite resource in question. This should be used to prevent validation threads from blocking when connecting to 3rd party systems that may purposefully act really, really slowly.
preserveComments boolean false When enabled, AntiSamy will keep HTML comments supplied in the input.
nofollowAnchors boolean false When enabled, AntiSamy will append rel=”nofollow” attributes to all anchor ( <a> ) tags supplied in the input. This is useful for telling search engines not to associate your site with sites that are under the control of your users
validateParamAsEmbed boolean false When enabled, AntiSamy will treat attributes of <param> tags in the policy the same as any tags nested inside the <embed>. This allows users to, according to policy, pass in data in either of those two methods with equal security. This is needed for sites that allow users to supply videos, etc.
preserveSpace boolean false When enabled, this feature is intended to preserve spaces as specified in the input without normalization.
onUnknownTag String remove When this directive is set to “encode,” it will encode the tag using HTML entities, instead of removing the tag, which is the default behavior.


Common Regular Expressions [<common-regexps> </common-regexps>]

You can declare regular expressions here that can be used in the rest of the Policy File.
Examples:

This regular expression is used to determine whether the text in an id attribute is valid or not.

We can define any number of regular expressions and each regex has its unique name to refer to other sections for validating over regex. The regexes defined here can be used to

Common Attributes [<common-attributes> </common-attributes>]

Here you can declare attributes here that are common to many different tags. Antisamy will match the attribute that encountered on any tag with regexes / literal-list defined for that attribute in this section. If doesn’t match then Antisamy will apply the action defined for that attribute with the onInvalid attribute.

Below the schema structure.

Attribute:

  • Name – The name of the attribute (required).
  • Description – Description of the attribute (optional ).
  • onInvalid – Defines the action to be taken if the value of attribute fails matching (optional, Default : removeAttribute ) and possible actions are [ removeAttribute, encodeTag, filterTag, removeTag ].
    • removeAttribute – Remove the attribute only
    • encodeTag – Encode the tag in which the attribute constitute.
    • filterTag – Remove the tag and children but keep the contents.
    • removeTag – Remove the tag along with the content.

Regexp-list – Value of the attribute should match all the regexp defined here.

  • Regexp
    • Name – The name of the regex that already defined in Common Regular Expressions.
    • Value – Value of regex instead of defining int Common Regular Expressions.

Literal-list – The value of the attribute should be one of the literal constants defined here.

  • Literal
    • Value – Defines the literal constant value of the attribute.

Example 1:

And sample html text as follows
<body><p id='$$test'> Test Me </p></body>
As id attribute contains $ it doesn’t match the regex htmlId so it will result as follows for different actions.
removeAttribute - <body><p> Test Me </p></body>
encodeTag – <body><p id="test"> Test Me </p></body>
filterTag - <body> Test Me </body>
removeTag - <body> </body>

Example 2:

Here the media attribute should match the regular expression and it should be in specified literal as well.

Note: In order to apply these validations on an attribute of the given tag, the attribute should be declared in section OR in under section otherwise that attribute will be removed without applying the attribute validation that defined in common-attributes.

Global Tag Attributes [<global-tag-attributes> </global-tag-attributes>]

You can declare attributes here that are global to all different tags.
Example:
<attribute name="id"/>

The id attribute is global to many different tags.

Tags to Encode [<tags-to-encode> </tags-to-encode>]

You can declare tags here that will not be removed, filtered, validated or truncated, but encoded using HTML entities.
Example:
<tag>g</tag>

The g tag doesn’t actually do anything, but it isn’t malicious either so we can encode it, rather than remove it.

Tag Rules [<tag-rules> </tag-rules>]

You can define parsing rules here that will be used for each tag individually. The following section shows what happens to tags according to what actions AntiSamy has decided to perform on it. This will help understand how to build your own tag-rules rules.

  • remove – Will remove entire tag when it encountered.
  • validate – Validate the tag attributes based on defined rules.
  • filter – Filter out the tag and children and preserve the content.
  • truncate – Remove the content and children but keep the tag.

Default: Behavior when the tag is not listed in <tag-rules>
Input:

<anewtag id=”newtag” anewattrib=”attrib”>Text goes here.

Output:
<div>Text goes <b>here</b>.</div>

remove: Behavior when the tag-rule action is set to “remove” for given tag. Tag is deleted with all of its child text.
Input:

<anewtag id=”newtag” anewattrib=”attrib”>Text goes here.

Output:
<div></div>

filter: Behavior when the tag-rule action is set to “filter” for given tag. Delete tag, but keep its child text.
Input:

<anewtag id=”newtag” anewattrib=”attrib”>Text goes here.

Output:
<div>Text goes <b>here</b>.</div>

validate: Behavior when the tag-rule action is set to “validate” for given tag. Verify that its attributes and children elements follow rules defined in policy file.
Input:

<anewtag id=”newtag” anewattrib=”attrib”>Text goes here.

Output:

truncate: Behavior when the tag rule action is set to “truncate” for given tag. Keep the element, but remove all attributes.
Input:

Output:
<div><anewtag >Text goes <b>here</b>.</anewtag></div>

Example:
<tag name="html" action="validate"/>
The html tag is included in the clean output after it has been validated that it has been properly used.

CSS Rules [<css-rules> </css-rules>]

You can define parsing rules here that will be used for each CSS property individually. Only CSS defined in this section is allowed.
Example:

The CSS width property is allowed only when it matches these rules when used. The value of the width element must be a length, percentage or one of the two literal values “auto” or “inherit”.

 
We use cookies to provide you with a better experience. By using our website you agree to the use of cookies as described in our Privacy Policy.