@NotThreadSafe public class HtmlPolicyBuilder extends Object
Conveniences for configuring policies for the HtmlSanitizer.
To create a policy, first construct an instance of this class; then call allow… methods to turn on tags, attributes, and other processing modes; and finally call build(renderer) or toFactory().

    // Define the policy.
    Function<HtmlStreamEventReceiver, HtmlSanitizer.Policy> policy
        = new HtmlPolicyBuilder()
            .allowElements("a", "p")
            .allowAttributes("href").onElements("a")
            .toFactory();

    // Sanitize your output.
    HtmlSanitizer.sanitize(myHtml, policy.apply(myHtmlStreamRenderer));
Embedded URLs are filtered by protocol. There is a canned policy (allowStandardUrlProtocols()) so you can easily white-list widely used protocols that don't violate the current page's origin. See "Customization" below for ways to do further filtering. If you allow links it might be worthwhile to require rel=nofollow.
This class simply throws out all embedded JS. Use a custom element or attribute policy to allow through signed or otherwise known-safe code. Check out the Caja project if you need a way to contain third-party JS.
This class does not attempt to faithfully parse and sanitize CSS. It does provide one styling option that allows through a few CSS properties that allow textual styling, but that disallow image loading, history stealing, layout breaking, code execution, etc.
You can easily do custom processing on tags and attributes by supplying your own element policy or attribute policy when calling allow… methods.
E.g. to convert headers into <div>s, you could use an element policy:

    new HtmlPolicyBuilder()
        .allowElements(
            new ElementPolicy() {
              public String apply(String elementName, List<String> attributes) {
                // Record the original header level as a class, e.g. "header-h2".
                attributes.add("class");
                attributes.add("header-" + elementName);
                // Returning a different name rewrites the element.
                return "div";
              }
            },
            "h1", "h2", "h3", "h4", "h5", "h6")
        .build(outputChannel);
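To make the effect of this element policy concrete, here is a minimal, self-contained sketch. ElementPolicySketch and HeaderToDiv are hypothetical stand-ins that only mirror the shape of the callback above so the example runs without the library. The assumption that the attribute list holds alternating names and values is inferred from the way the example appends "class" followed by its value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mirrors the shape of the element policy callback. */
interface ElementPolicySketch {
    String apply(String elementName, List<String> attributes);
}

/** Re-creates the header-to-div logic from the example above. */
final class HeaderToDiv implements ElementPolicySketch {
    @Override
    public String apply(String elementName, List<String> attributes) {
        // Assumption: attributes is a flat list of name, value, name, value, ...
        attributes.add("class");
        attributes.add("header-" + elementName);
        // The returned name replaces the original element name.
        return "div";
    }

    public static void main(String[] args) {
        List<String> attrs = new ArrayList<>(List.of("id", "intro"));
        String name = new HeaderToDiv().apply("h2", attrs);
        System.out.println(name + " " + attrs); // prints "div [id, intro, class, header-h2]"
    }
}
```

Under these assumptions, an <h2 id="intro"> would be emitted as a <div> carrying both the original id and the added class.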
Throughout this class, several rules hold:
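- Everything is denied by default. There are disallow… methods, but those reverse allows instead of rolling back overly permissive defaults.
- The order of allows and disallows does not matter. Disallows trump allows whether they occur before or after them. The only method that needs to be called in a particular place is build(org.owasp.html.HtmlStreamEventReceiver). Allows or disallows after build is called have no effect on the already built policy.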
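These rules can be illustrated with a toy model. AllowDenyRulesSketch is a hypothetical class, not part of the library and not its implementation. It assumes that a disallow overrides an allow for the same name regardless of call order, and that build takes a snapshot that later calls cannot change.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy model of the allow/disallow rules; not the library's implementation. */
final class AllowDenyRulesSketch {
    private final Set<String> allowed = new HashSet<>();
    private final Set<String> disallowed = new HashSet<>();

    AllowDenyRulesSketch allow(String... names) {
        allowed.addAll(Arrays.asList(names));
        return this;
    }

    AllowDenyRulesSketch disallow(String... names) {
        disallowed.addAll(Arrays.asList(names));
        return this;
    }

    /** Snapshot of what is permitted: allowed names that were never disallowed. */
    Set<String> build() {
        Set<String> snapshot = new HashSet<>(allowed);
        snapshot.removeAll(disallowed);
        return Collections.unmodifiableSet(snapshot);
    }

    public static void main(String[] args) {
        Set<String> first = new AllowDenyRulesSketch().allow("a", "p").disallow("p").build();
        Set<String> second = new AllowDenyRulesSketch().disallow("p").allow("a", "p").build();
        System.out.println(first.equals(second)); // prints "true"

        AllowDenyRulesSketch builder = new AllowDenyRulesSketch().allow("a");
        Set<String> built = builder.build();
        builder.allow("b"); // too late for the snapshot taken above
        System.out.println(built.contains("b")); // prints "false"
    }
}
```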
This class is not thread-safe. The resulting policy will not violate its security guarantees as a result of race conditions, but is not thread safe because it maintains state to track whether text inside disallowed elements should be suppressed.
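A simplified model shows why such state rules out sharing one policy between threads. TextSuppressionSketch and its methods are hypothetical and only imitate an event receiver; the real policy's internals may differ. The counter is plain mutable state, so events from two documents interleaved on the same instance would corrupt it.

```java
import java.util.Set;

/** Toy event receiver with per-document state; not the library's policy class. */
final class TextSuppressionSketch {
    private final Set<String> allowedElements;
    private final Set<String> textDisallowedIn;
    private final StringBuilder out = new StringBuilder();
    // Mutable state: number of currently open elements whose text must be dropped.
    private int suppressedDepth = 0;

    TextSuppressionSketch(Set<String> allowedElements, Set<String> textDisallowedIn) {
        this.allowedElements = allowedElements;
        this.textDisallowedIn = textDisallowedIn;
    }

    void openTag(String name) {
        if (textDisallowedIn.contains(name)) {
            suppressedDepth++;
        }
        if (allowedElements.contains(name)) {
            out.append('<').append(name).append('>');
        }
    }

    void closeTag(String name) {
        if (allowedElements.contains(name)) {
            out.append("</").append(name).append('>');
        }
        if (textDisallowedIn.contains(name) && suppressedDepth > 0) {
            suppressedDepth--;
        }
    }

    void text(String chars) {
        // Text is emitted only while no text-suppressing element is open.
        if (suppressedDepth == 0) {
            out.append(chars);
        }
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        TextSuppressionSketch policy =
            new TextSuppressionSketch(Set.of("p"), Set.of("script"));
        policy.openTag("p");
        policy.text("Hi ");
        policy.openTag("script");
        policy.text("evil()");
        policy.closeTag("script");
        policy.text("there");
        policy.closeTag("p");
        System.out.println(policy.result()); // prints "<p>Hi there</p>"
    }
}
```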
The resulting policy can be reused, but if you use the toFactory() method instead of build(org.owasp.html.HtmlStreamEventReceiver), then binding policies to output channels is cheap so there's no need.
Modifier and Type | Class and Description |
---|---|
class | HtmlPolicyBuilder.AttributeBuilder: Builds the relationship between attributes, the values that they may have, and the elements on which they may appear. |
Modifier and Type | Field and Description |
---|---|
static com.google.common.collect.ImmutableSet<String> | DEFAULT_RELS_ON_TARGETTED_LINKS: These rel attribute values prevent information from leaking to the linked site, and prevent the linked page from redirecting your page to a phishing site when opened from a third-party link on your site. |
static com.google.common.collect.ImmutableSet<String> | DEFAULT_SKIP_IF_EMPTY: The default set of elements that are removed if they have no attributes. |
Constructor and Description |
---|
HtmlPolicyBuilder() |
Modifier and Type | Method and Description |
---|---|
HtmlPolicyBuilder.AttributeBuilder | allowAttributes(String... attributeNames): Returns an object that lets you associate policies with the given attributes, and allow them globally or on specific elements. |
HtmlPolicyBuilder | allowCommonBlockElements(): A canned policy that allows a number of common block elements. |
HtmlPolicyBuilder | allowCommonInlineFormattingElements(): A canned policy that allows a number of common formatting elements. |
HtmlPolicyBuilder | allowElements(ElementPolicy policy, String... elementNames): Allow the given elements with the given policy. |
HtmlPolicyBuilder | allowElements(String... elementNames): Allows the named elements. |
HtmlPolicyBuilder | allowStandardUrlProtocols(): A canned URL protocol policy that allows http, https, and mailto. |
HtmlPolicyBuilder | allowStyling(): Convert style="<CSS>" to sanitized CSS which allows color, font-size, type-face, and other styling using the default schema; but which does not allow content to escape its clipping context. |
HtmlPolicyBuilder | allowStyling(CssSchema whitelist): Convert style="<CSS>" to sanitized CSS which allows color, font-size, type-face, and other styling using the given schema. |
HtmlPolicyBuilder | allowTextIn(String... elementNames): Allows text content in the named elements. |
HtmlPolicyBuilder | allowUrlProtocols(String... protocols): Adds to the set of protocols that are allowed in URL attributes. |
HtmlPolicyBuilder | allowUrlsInStyles(AttributePolicy newStyleUrlPolicy): Allow URLs in CSS styles. |
HtmlPolicyBuilder | allowWithoutAttributes(String... elementNames): Assuming the given elements are allowed, allows them to appear without attributes. |
HtmlSanitizer.Policy | build(HtmlStreamEventReceiver out): Produces a policy based on the allow and disallow calls previously made. |
<CTX> HtmlSanitizer.Policy | build(HtmlStreamEventReceiver out, HtmlChangeListener<? super CTX> listener, CTX context): Produces a policy based on the allow and disallow calls previously made. |
HtmlPolicyBuilder.AttributeBuilder | disallowAttributes(String... attributeNames): Reverse an earlier attribute allow. |
HtmlPolicyBuilder | disallowElements(String... elementNames): Disallows the named elements. |
HtmlPolicyBuilder | disallowTextIn(String... elementNames): Disallows text in elements with the given name. |
HtmlPolicyBuilder | disallowUrlProtocols(String... protocols): Reverses a decision made by allowUrlProtocols(java.lang.String...). |
HtmlPolicyBuilder | disallowWithoutAttributes(String... elementNames): Disallows the given elements from appearing without attributes. |
HtmlPolicyBuilder | requireRelNofollowOnLinks(): Adds rel=nofollow to links. |
HtmlPolicyBuilder | requireRelsOnLinks(String... linkValues) |
HtmlPolicyBuilder | skipRelsOnLinks(String... linkValues): Opts out of some of the DEFAULT_RELS_ON_TARGETTED_LINKS from being added to links, and reverses earlier requireRelsOnLinks(java.lang.String...) calls. |
PolicyFactory | toFactory(): Like build(org.owasp.html.HtmlStreamEventReceiver) but can be reused to create many different policies each backed by a different output channel. |
HtmlPolicyBuilder | withPostprocessor(HtmlStreamEventProcessor pp): Inserts a post-processor into the pipeline between the policy and the output sink. |
HtmlPolicyBuilder | withPreprocessor(HtmlStreamEventProcessor pp): Inserts a pre-processor into the pipeline between the lexer and the policy. |
public static final com.google.common.collect.ImmutableSet<String> DEFAULT_SKIP_IF_EMPTY
The default set of elements that are removed if they have no attributes. Since <img> is in this set, by default, a policy will remove <img src=javascript:alert(1337)> because its URL is not allowed and it has no other attributes that would warrant it appearing in the output.
public static final com.google.common.collect.ImmutableSet<String> DEFAULT_RELS_ON_TARGETTED_LINKS
These rel attribute values prevent information from leaking to the linked site, and prevent the linked page from redirecting your page to a phishing site when opened from a third-party link on your site.
public HtmlPolicyBuilder allowElements(String... elementNames)
Allows the named elements.
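public HtmlPolicyBuilder disallowElements(String... elementNames)
Disallows the named elements.
public HtmlPolicyBuilder allowElements(ElementPolicy policy, String... elementNames)
Allow the given elements with the given policy.
policy - May remove or add attributes, change the element name, or deny the element.
public HtmlPolicyBuilder allowCommonInlineFormattingElements()
A canned policy that allows a number of common formatting elements.
public HtmlPolicyBuilder allowCommonBlockElements()
A canned policy that allows a number of common block elements.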
public HtmlPolicyBuilder allowTextIn(String... elementNames)
Allows text content in the named elements. By default, text content is allowed in any allowed elements that can contain character data per the HTML5 spec, but text content is not allowed by default in elements that contain content of other kinds (like JavaScript in <script> elements).
To write a policy that whitelists <script> or <style> elements, first allowTextIn("script").
public HtmlPolicyBuilder disallowTextIn(String... elementNames)
Disallows text in elements with the given name.
This is useful when an element contains text that is not meant to be displayed to the end-user. Typically these elements are styled display:none in browsers' default stylesheets, or, like <template>, contain text nodes that are eventually for human consumption, but which are created in a separate document fragment.
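public HtmlPolicyBuilder allowWithoutAttributes(String... elementNames)
Assuming the given elements are allowed, allows them to appear without attributes.
public HtmlPolicyBuilder disallowWithoutAttributes(String... elementNames)
Disallows the given elements from appearing without attributes.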
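The interaction between these two methods and DEFAULT_SKIP_IF_EMPTY can be sketched as follows. SkipIfEmptySketch is a hypothetical toy model, not the library's code. It assumes that allowWithoutAttributes removes names from a skip-if-empty set and disallowWithoutAttributes adds them, and it seeds the set with only <img>, the one member the description of DEFAULT_SKIP_IF_EMPTY confirms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of skip-if-empty handling; not the library's implementation. */
final class SkipIfEmptySketch {
    // Assumption: only <img> is seeded here; the real default set may hold more names.
    private final Set<String> skipIfEmpty = new HashSet<>(Set.of("img"));

    SkipIfEmptySketch allowWithoutAttributes(String... elementNames) {
        skipIfEmpty.removeAll(Arrays.asList(elementNames));
        return this;
    }

    SkipIfEmptySketch disallowWithoutAttributes(String... elementNames) {
        skipIfEmpty.addAll(Arrays.asList(elementNames));
        return this;
    }

    /** True if the element is kept, given the attributes that survived filtering. */
    boolean keeps(String elementName, List<String> survivingAttributes) {
        return !(survivingAttributes.isEmpty() && skipIfEmpty.contains(elementName));
    }

    public static void main(String[] args) {
        SkipIfEmptySketch model = new SkipIfEmptySketch();
        // <img src=javascript:...> loses its only attribute, so the element is dropped.
        System.out.println(model.keeps("img", List.of())); // prints "false"
        model.allowWithoutAttributes("img");
        System.out.println(model.keeps("img", List.of())); // prints "true"
    }
}
```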
public HtmlPolicyBuilder.AttributeBuilder allowAttributes(String... attributeNames)
Returns an object that lets you associate policies with the given attributes, and allow them globally or on specific elements.
public HtmlPolicyBuilder.AttributeBuilder disallowAttributes(String... attributeNames)
Reverse an earlier attribute allow.
For this to have an effect you must call at least one of HtmlPolicyBuilder.AttributeBuilder.globally() and HtmlPolicyBuilder.AttributeBuilder.onElements(java.lang.String...).
Attributes are disallowed by default, so there is no need to call this with a laundry list of attribute/element pairs.
public HtmlPolicyBuilder requireRelNofollowOnLinks()
Adds rel=nofollow to links.
public HtmlPolicyBuilder requireRelsOnLinks(String... linkValues)
See also: skipRelsOnLinks(java.lang.String...)
public HtmlPolicyBuilder skipRelsOnLinks(String... linkValues)
Opts out of some of the DEFAULT_RELS_ON_TARGETTED_LINKS from being added to links, and reverses earlier requireRelsOnLinks(java.lang.String...) calls.
public HtmlPolicyBuilder allowUrlProtocols(String... protocols)
Adds to the set of protocols that are allowed in URL attributes.
Do not allow any *script protocols, such as javascript, if you might use this policy with untrusted code.
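As an illustration of protocol filtering, the following sketch checks a URL against an allow-list. UrlProtocolFilterSketch is a hypothetical helper, not the library's filter. It assumes that URLs without a protocol (relative URLs) pass through, that protocol names compare case-insensitively, and that null means the URL is vetoed.

```java
import java.util.Locale;
import java.util.Set;

/** Toy allow-list check for URL protocols; not the library's implementation. */
final class UrlProtocolFilterSketch {
    /** Returns the trimmed URL if its protocol is acceptable, or null to veto it. */
    static String filter(String url, Set<String> allowedProtocols) {
        String trimmed = url.trim();
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (c == ':') {
                // Everything before the first ':' is treated as the protocol.
                String protocol = trimmed.substring(0, i).toLowerCase(Locale.ROOT);
                return allowedProtocols.contains(protocol) ? trimmed : null;
            }
            if (c == '/' || c == '?' || c == '#') {
                break; // path, query, or fragment started before any ':'
            }
        }
        // Assumption: a URL that names no protocol is allowed.
        return trimmed;
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("http", "https", "mailto");
        System.out.println(filter("https://example.com/a", allowed)); // prints "https://example.com/a"
        System.out.println(filter("JavaScript:alert(1)", allowed));   // prints "null"
        System.out.println(filter("/docs?x=a:b", allowed));           // prints "/docs?x=a:b"
    }
}
```

Because the check is an allow-list, an unexpected spelling such as a protocol with embedded whitespace is vetoed rather than passed through.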
public HtmlPolicyBuilder disallowUrlProtocols(String... protocols)
Reverses a decision made by allowUrlProtocols(java.lang.String...).
public HtmlPolicyBuilder allowStandardUrlProtocols()
A canned URL protocol policy that allows http, https, and mailto.
public HtmlPolicyBuilder allowStyling()
Convert style="<CSS>" to sanitized CSS which allows color, font-size, type-face, and other styling using the default schema; but which does not allow content to escape its clipping context.
public HtmlPolicyBuilder allowStyling(CssSchema whitelist)
Convert style="<CSS>" to sanitized CSS which allows color, font-size, type-face, and other styling using the given schema.
public HtmlPolicyBuilder allowUrlsInStyles(AttributePolicy newStyleUrlPolicy)
Allow URLs in CSS styles. For example, <span style="background-image: url(http://example-com.njmu.s5.bt8.net/image.png)">.
Unlike links, URLs in CSS are typically loaded without user interaction, so a greater degree of scrutiny is warranted.
newStyleUrlPolicy - receives URLs from the CSS that pass the allowed protocol policies, and may return null to veto the URL or the URL to use. URLs will be reported as content in <img src=...>.
public HtmlPolicyBuilder withPreprocessor(HtmlStreamEventProcessor pp)
Inserts a pre-processor into the pipeline between the lexer and the policy.
public HtmlPolicyBuilder withPostprocessor(HtmlStreamEventProcessor pp)
Inserts a post-processor into the pipeline between the policy and the output sink.
Try doing what you want with a pre-processor instead of a post-processor, but if you're thinking of doing search/replace on a sanitized string, then definitely use either a pre- or post-processor instead.
public HtmlSanitizer.Policy build(HtmlStreamEventReceiver out)
Produces a policy based on the allow and disallow calls previously made.
out - receives calls to open only tags allowed by previous calls to this object. Typically a HtmlStreamRenderer.
public <CTX> HtmlSanitizer.Policy build(HtmlStreamEventReceiver out, @Nullable HtmlChangeListener<? super CTX> listener, @Nullable CTX context)
Produces a policy based on the allow and disallow calls previously made.
out - receives calls to open only tags allowed by previous calls to this object. Typically a HtmlStreamRenderer.
listener - is notified of dropped tags and attributes so that intrusion detection systems can be alerted to questionable HTML. If null then no notifications are sent.
context - if (listener != null) then the context value passed with alerts. This can be used to let the listener know from which connection or request the questionable HTML was received.
public PolicyFactory toFactory()
Like build(org.owasp.html.HtmlStreamEventReceiver) but can be reused to create many different policies each backed by a different output channel.
Copyright © 2017 OWASP. All rights reserved.