6.1. Hash Filter
- Plugin Key
mediatype_filter_factory, where mediatype is a media type like text/html
- Value Type
- Plugin Value Type
The string is the fully-qualified name of a Java class implementing the org.lockss.plugin.FilterFactory interface. For example:
<entry>
  <string>text/html_filter_factory</string>
  <string>edu.example.plugin.publisherx.PublisherXHtmlHashFilterFactory</string>
</entry>
To canonicalize content before it is compared between nodes in the LOCKSS audit and repair protocol, a plugin can define a hash filter for each affected media type. The goal is to pre-process content so that nodes can compare it logically even when their copies are not byte-identical. This is frequently necessary for HTML content that includes, besides the main content, personalizations ("You are logged in as..."), advertising, and other variable content ("You may also be interested in...", "Top 10 viewed articles this week...", "Recently added articles..."). Other media types, such as PDF and RIS, may need filtering too because of timestamps, watermarks, and other dynamic server behaviors.
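To make the goal concrete, here is a minimal sketch, assuming a hypothetical publisher that wraps its personalization message in a div element with class "personalization". Two nodes hold pages that differ only in the logged-in user name and in whitespace; their raw digests differ, but after canonicalization the digests match. This is illustrative Java, not LOCKSS code: the class name CanonicalHashDemo, the markup, and the choice of SHA-256 are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical demo class; not part of the LOCKSS API.
class CanonicalHashDemo {

    // Assumption: the variable content lives in <div class="personalization">...</div>.
    static String canonicalize(String html) {
        // (?s) lets '.' match newlines; '.*?' stops at the first closing </div>.
        String s = html.replaceAll("(?s)<div class=\"personalization\">.*?</div>", "");
        // Collapse whitespace runs so layout differences do not affect the hash.
        return s.replaceAll("\\s+", " ").trim();
    }

    // SHA-256 is used for illustration only; every Java platform must support it.
    static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String nodeA = "<p>Article text</p>\n<div class=\"personalization\">You are logged in as alice</div>";
        String nodeB = "<p>Article text</p>  <div class=\"personalization\">You are logged in as bob</div>";
        // Raw bytes differ, so the raw digests differ.
        System.out.println(Arrays.equals(sha256(nodeA), sha256(nodeB)));
        // After canonicalization both become "<p>Article text</p>", so the digests agree.
        System.out.println(Arrays.equals(sha256(canonicalize(nodeA)), sha256(canonicalize(nodeB))));
    }
}
```

Running main prints false and then true. The regular expression is deliberately simple; it would mis-handle nested div elements, which is one reason real filters tend to rely on HTML-aware processing.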
The org.lockss.plugin.FilterFactory interface defines a createFilteredInputStream method that accepts an InputStream of the URL's raw content and a string naming its character encoding, and returns an InputStream of the canonicalized byte stream. The filtered stream does not need to be a valid object of that media type, because it is only used to compute a checksum.
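A minimal sketch of a factory with this shape follows. To stay self-contained, it declares a hypothetical SimpleFilterFactory interface that mirrors the method described above instead of importing org.lockss.plugin.FilterFactory, whose actual signature (for example, additional parameters or checked exceptions) may differ. The HTML comments and the div with id "most-viewed" that it strips are assumed markup for an imaginary publisher, and the class name PublisherXHtmlHashFilterSketch is likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for org.lockss.plugin.FilterFactory; the real
// interface's exact signature may differ.
interface SimpleFilterFactory {
    InputStream createFilteredInputStream(InputStream in, String encoding);
}

// Hypothetical hash filter for an imaginary publisher.
class PublisherXHtmlHashFilterSketch implements SimpleFilterFactory {

    @Override
    public InputStream createFilteredInputStream(InputStream in, String encoding) {
        // Assumed fallback when no encoding is supplied; a real plugin may choose differently.
        Charset cs = (encoding == null) ? StandardCharsets.ISO_8859_1 : Charset.forName(encoding);
        String html;
        // Read and close the raw stream (buffers the whole document for simplicity).
        try (InputStream src = in) {
            html = new String(src.readAllBytes(), cs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String filtered = html
                // Drop HTML comments, which often carry generation timestamps.
                .replaceAll("(?s)<!--.*?-->", "")
                // Drop an assumed "most viewed" sidebar that changes from day to day.
                .replaceAll("(?s)<div id=\"most-viewed\">.*?</div>", "");
        // The result only feeds a checksum, so it need not be valid HTML.
        return new ByteArrayInputStream(filtered.getBytes(cs));
    }
}
```

Because the output is only hashed, the filter may freely produce fragments that are no longer well-formed HTML. Buffering the whole document and using regular expressions keeps the sketch short; a production filter would more likely process the stream incrementally with an HTML-aware parser, since regexes break on nested markup.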