2.4. Plugin Configuration Parameters

Plugin Key

plugin_config_props

Plugin Value Type

List of <org.lockss.daemon.ConfigParamDescr> stanzas

Sample
<entry>
  <string>plugin_config_props</string>
  <list>
    <org.lockss.daemon.ConfigParamDescr>
      <key>base_url</key>
      <displayName>Base URL</displayName>
      <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
      <type>3</type>
      <size>40</size>
      <definitional>true</definitional>
      <defaultOnly>false</defaultOnly>
    </org.lockss.daemon.ConfigParamDescr>
    <org.lockss.daemon.ConfigParamDescr>
      <key>journal_id</key>
      <displayName>Journal Identifier</displayName>
      <description>Identifier for journal (often used as part of file names)</description>
      <type>1</type>
      <size>40</size>
      <definitional>true</definitional>
      <defaultOnly>false</defaultOnly>
    </org.lockss.daemon.ConfigParamDescr>
    <org.lockss.daemon.ConfigParamDescr>
      <key>volume_name</key>
      <displayName>Volume Name</displayName>
      <type>1</type>
      <size>20</size>
      <definitional>true</definitional>
      <defaultOnly>false</defaultOnly>
    </org.lockss.daemon.ConfigParamDescr>
  </list>
</entry>
Description

A list of configuration parameter descriptors, defining the placeholders in use in the plugin's rules and code.

A plugin's rules and code (start and permission URLs, crawl rules, substance patterns...) are made general by identifying placeholders for AU-specific values and substituting them later. These placeholders for variable values are called plugin configuration parameters.

Defining the necessary configuration parameters for a given plugin comes mostly from studying the URL structure of the preservation target, finding patterns, and identifying the parts of those patterns that differ between Archival Units.
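
As a rough illustration of how placeholders make one rule serve many AUs, the sketch below substitutes AU-specific values into a single URL template. The template text, the `expand` helper, and the sample values are assumptions made for this example; they are not actual LOCKSS plugin syntax or code.

```python
# Illustrative only: one general template, many AUs.
# TEMPLATE, KEYS and expand() are hypothetical, not LOCKSS syntax.

def expand(template, keys, au_config):
    """Fill a printf-style template with the AU's values for the given keys."""
    return template % tuple(au_config[k] for k in keys)

# Hypothetical URL pattern shared by every AU of one plugin.
TEMPLATE = "%s%s/volume-%s/"
KEYS = ("base_url", "journal_id", "volume_name")

au_one = {"base_url": "http://www.example.com/", "journal_id": "jabc", "volume_name": "23A"}
au_two = {"base_url": "http://www.example.com/", "journal_id": "jabc", "volume_name": "24"}

print(expand(TEMPLATE, KEYS, au_one))  # http://www.example.com/jabc/volume-23A/
print(expand(TEMPLATE, KEYS, au_two))  # http://www.example.com/jabc/volume-24/
```

Only the values differ between the two AUs; the template and the list of parameter keys stay the same.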

Structure

Each plugin configuration parameter is represented by a <org.lockss.daemon.ConfigParamDescr> stanza that looks like this:

<org.lockss.daemon.ConfigParamDescr>
  <key>...</key>
  <type>...</type>
  <displayName>...</displayName>
  <description>...</description>
  <size>...</size>
  <definitional>...</definitional>  <!-- default: true -->
  <defaultOnly>...</defaultOnly>    <!-- default: false -->
</org.lockss.daemon.ConfigParamDescr>

Only <key> and <type> are required.

Each <org.lockss.daemon.ConfigParamDescr> stanza contains the following important elements:

  • <key>: the parameter key, an identifier for the configuration parameter, standing in as a placeholder for the AU-specific value in rules and code. Example: base_url for a base URL (URL prefix common to all or most URLs in an AU).

  • <type>: the parameter type, an integer describing the type of value the configuration parameter represents (string, integer, etc.). See Parameter Types below for details.

  • <definitional>: whether the parameter is a definitional parameter or non-definitional parameter, expressed as the booleans true or false. Most parameters are definitional (true), meaning the parameter is part of the set of parameters that together form the unique identity of the AU.

  • <defaultOnly>: set to false in almost all cases.

The other elements only play a role in the Manual Add/Edit screen in the LOCKSS Web user interface:

  • <displayName>: the parameter display name, a user-friendly name for the parameter in the Manual Add/Edit screen.

  • <description>: the parameter description, a user-friendly text string describing the parameter and giving an example value in the Manual Add/Edit screen.

  • <size>: the parameter display size in characters in the Manual Add/Edit screen.
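
To make the required elements and the documented defaults concrete, here is a minimal sketch that reads one stanza with Python's standard XML parser. The `read_descr` helper and the dictionary layout are assumptions for illustration; the LOCKSS software has its own loader, which this does not reproduce.

```python
import xml.etree.ElementTree as ET

# A stanza that omits <definitional> and <defaultOnly> on purpose.
STANZA = """
<org.lockss.daemon.ConfigParamDescr>
  <key>journal_id</key>
  <type>1</type>
  <displayName>Journal Identifier</displayName>
  <size>40</size>
</org.lockss.daemon.ConfigParamDescr>
"""

def read_descr(xml_text):
    """Return the stanza as a dict, filling in the documented defaults."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    # Only <key> and <type> are required.
    for required in ("key", "type"):
        if not fields.get(required):
            raise ValueError("missing required element <%s>" % required)
    return {
        "key": fields["key"],
        "type": int(fields["type"]),
        "displayName": fields.get("displayName"),
        "description": fields.get("description"),
        "size": int(fields["size"]) if "size" in fields else None,
        # Defaults from the stanza outline: definitional=true, defaultOnly=false.
        "definitional": fields.get("definitional", "true").lower() == "true",
        "defaultOnly": fields.get("defaultOnly", "false").lower() == "true",
    }

descr = read_descr(STANZA)
print(descr["key"], descr["type"], descr["definitional"], descr["defaultOnly"])
# journal_id 1 True False
```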

2.4.1. Parameter Types

The following plugin configuration parameter types are defined in the LOCKSS software:

Parameter Type Code    Parameter Type

1                      String
2                      Integer
3                      URL
4                      Year
5                      Boolean
6                      Non-Negative Integer
7                      String Range
8                      Numeric Range
9                      Set
10                     User Credentials
11                     Long Integer
12                     Time Interval
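
When reading plugin files, a small lookup from type code to type name can help. The sketch below mirrors the codes listed above; the member names are invented for this example and are not the constant names used in the LOCKSS source.

```python
from enum import IntEnum

class ParamType(IntEnum):
    """Type codes from the list above; member names are hypothetical."""
    STRING = 1
    INTEGER = 2
    URL = 3
    YEAR = 4
    BOOLEAN = 5
    NON_NEGATIVE_INTEGER = 6
    STRING_RANGE = 7
    NUMERIC_RANGE = 8
    SET = 9
    USER_CREDENTIALS = 10
    LONG_INTEGER = 11
    TIME_INTERVAL = 12

print(ParamType(3).name)         # URL
print(int(ParamType.SET))        # 9
```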

2.4.1.1. String

Parameter Type Code

1

Description

A non-empty string.

Built-In Examples

Volume Name, Journal Directory, Journal Abbreviation, Journal Identifier, Journal ISSN, Publisher Name, OAI Spec, Crawl Proxy, Crawl Test Substance Threshold

2.4.1.2. URL

Parameter Type Code

3

Description

Used most frequently as a URL prefix. This must be a valid URL string according to Java's java.net.URL constructor (https://docs.oracle.com/javase/8/docs/api/java/net/URL.html#URL-java.lang.String-).

Built-In Examples

Base URL, Second Base URL, OAI Request URL

See Also

Derivative URL Parameters

2.4.1.3. User Credentials

Parameter Type Code

10

Description

A colon-separated username and password, for instance myuser:mypass.

Built-In Examples

Username and Password

2.4.1.4. Integer

Parameter Type Code

2

Description

The integer can be negative. Represented internally as a 32-bit integer.

2.4.1.5. Non-Negative Integer

Parameter Type Code

6

Description

The integer can be zero but cannot be negative. Represented internally as a 32-bit integer.

Built-In Examples

Volume Number

2.4.1.6. Long Integer

Parameter Type Code

11

Description

The value can be negative. Represented internally as a 64-bit integer.
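
The three integer types differ only in their allowed range. A minimal sketch of the corresponding checks follows; the function names are made up for this example, and the actual LOCKSS validation code may report errors differently.

```python
# Bounds for the 32-bit and 64-bit representations described above.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def parse_integer(text):
    """Type code 2: may be negative, must fit in 32 bits."""
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError("does not fit in 32 bits: %r" % text)
    return value

def parse_non_negative_integer(text):
    """Type code 6: zero allowed, negatives rejected, must fit in 32 bits."""
    value = parse_integer(text)
    if value < 0:
        raise ValueError("must not be negative: %r" % text)
    return value

def parse_long_integer(text):
    """Type code 11: may be negative, must fit in 64 bits."""
    value = int(text)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError("does not fit in 64 bits: %r" % text)
    return value

print(parse_integer("-7"))                 # -7
print(parse_non_negative_integer("0"))     # 0
print(parse_long_integer("3000000000"))    # 3000000000
```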

2.4.1.7. Year

Parameter Type Code

4

Description

A four-digit year, or the special value 0 to denote an unspecified year.

Built-In Examples

Year

See Also

Derivative Year Parameters

2.4.1.8. Time Interval

Parameter Type Code

12

Description

Specified as a long integer followed by a suffix indicating a time unit: ms for milliseconds, s for seconds, m for minutes, h for hours, d for days, w for weeks (7 days), y for years (365 days). If there is no suffix, the default interpretation is milliseconds. The time unit suffixes are case-insensitive.
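
The suffix rules above can be sketched as a small parser that converts a time interval string to milliseconds. This is an illustration under the stated rules only; the `parse_interval_ms` name is hypothetical, and details such as whitespace handling are assumptions rather than documented LOCKSS behavior.

```python
import re

MS = 1
SECOND = 1000 * MS
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE
DAY = 24 * HOUR

# Milliseconds per unit, following the description above.
UNIT_MS = {
    "ms": MS, "s": SECOND, "m": MINUTE, "h": HOUR,
    "d": DAY, "w": 7 * DAY, "y": 365 * DAY,
}

# An integer followed by an optional, case-insensitive unit suffix.
_INTERVAL = re.compile(r"(-?\d+)(ms|s|m|h|d|w|y)?", re.IGNORECASE)

def parse_interval_ms(text):
    """Return the interval in milliseconds; no suffix means milliseconds."""
    match = _INTERVAL.fullmatch(text.strip())  # stripping is an assumption
    if match is None:
        raise ValueError("not a time interval: %r" % text)
    number, unit = match.groups()
    return int(number) * UNIT_MS[(unit or "ms").lower()]

print(parse_interval_ms("1500"))  # 1500
print(parse_interval_ms("2s"))    # 2000
print(parse_interval_ms("1W"))    # 604800000
```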

Built-In Examples

New Content Crawl Interval

2.4.1.9. String Range

Parameter Type Code

7

Description

The range is specified with two strings separated by a dash (-) and is inclusive. If there is a single string with no dash, the range is interpreted to contain only that string.
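
A minimal sketch of the string range rule follows. Two points are assumptions not stated above: the value is split at the first dash, and membership is tested with plain lexicographic comparison. The helper names are hypothetical.

```python
def parse_string_range(text):
    """Return (low, high); a value with no dash is a one-element range."""
    low, dash, high = text.partition("-")  # assumption: split at first dash
    low, high = low.strip(), high.strip()
    if not dash:
        return (low, low)
    return (low, high)

def in_string_range(value, bounds):
    """Inclusive membership test; lexicographic ordering is an assumption."""
    low, high = bounds
    return low <= value <= high

bounds = parse_string_range("aaa-zzz")
print(bounds)                            # ('aaa', 'zzz')
print(in_string_range("mmm", bounds))    # True
print(parse_string_range("s1"))          # ('s1', 's1')
```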

Built-In Examples

Issue Range

2.4.1.10. Numeric Range

Parameter Type Code

8

Description

The range is specified with two integers separated by a dash (-). If there is a single integer, the range is interpreted to contain only that integer.
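
The numeric range rule can be sketched the same way. This example assumes non-negative integers (so that the dash is unambiguous) and assumes the range includes both endpoints; the helper names are hypothetical.

```python
def parse_numeric_range(text):
    """Return (low, high); a single integer is a one-element range."""
    low, dash, high = text.partition("-")
    if not dash:
        return (int(low), int(low))
    return (int(low), int(high))

def issues_in(bounds):
    """List every integer in the range, endpoints included (assumed)."""
    low, high = bounds
    return list(range(low, high + 1))

print(parse_numeric_range("3-7"))               # (3, 7)
print(issues_in(parse_numeric_range("3-7")))    # [3, 4, 5, 6, 7]
print(issues_in(parse_numeric_range("12")))     # [12]
```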

Built-In Examples

Numeric Issue Range

2.4.1.11. Set

Parameter Type Code

9

Description

Specified as a comma-separated list of strings, with whitespace surrounding strings ignored, and empty strings discarded.

The string {n-m}, where n and m are integers, will be replaced by all the integers in the range from n to m inclusive. For instance, the set {2002-2005}, 2003Supp, 2004Supp is equivalent to 2002, 2003, 2003Supp, 2004, 2004Supp, 2005.
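
The set rules can be sketched as follows, using the brace range form that appears in the example above ({2002-2005}). The `parse_set` helper is hypothetical, and dropping duplicate members is an assumption that follows from treating the value as a set.

```python
import re

# A brace range such as {2002-2005}.
_RANGE = re.compile(r"\{\s*(\d+)\s*-\s*(\d+)\s*\}")

def parse_set(text):
    """Split on commas, trim whitespace, drop empties, expand brace ranges."""
    members = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # empty strings are discarded
        match = _RANGE.fullmatch(item)
        if match:
            low, high = int(match.group(1)), int(match.group(2))
            members.extend(str(n) for n in range(low, high + 1))
        else:
            members.append(item)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(members))

print(parse_set("{2002-2005}, 2003Supp, 2004Supp"))
# ['2002', '2003', '2004', '2005', '2003Supp', '2004Supp']
```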

Built-In Examples

Issue Set

2.4.1.12. Boolean

Parameter Type Code

5

Description

The canonical values are true or false, although yes, on and 1 are recognized as true, and no, off and 0 are recognized as false. All these value strings are case-insensitive.
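
The accepted spellings can be sketched as a lookup. The `parse_boolean` name is hypothetical, and trimming surrounding whitespace is an assumption.

```python
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}

def parse_boolean(text):
    """Map the recognized value strings to True or False, ignoring case."""
    word = text.strip().lower()  # stripping is an assumption
    if word in _TRUE:
        return True
    if word in _FALSE:
        return False
    raise ValueError("not a boolean value: %r" % text)

print(parse_boolean("Yes"))   # True
print(parse_boolean("OFF"))   # False
```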

Built-In Examples

AU Down, AU Off-Limits, AU Closed

2.4.2. Built-In Definitional Parameters

The LOCKSS software defines a number of built-in definitional parameters.

Definitional parameters give an AU its identity -- change the value for a definitional parameter and you will be describing a different slice of content (different year, different directory, etc.).

2.4.2.1. Base URL

Parameter Key

base_url

Parameter Type

URL

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>base_url</key>
  <type>3</type>
  <displayName>Base URL</displayName>
  <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.2. Second Base URL

Parameter Key

base_url2

Parameter Type

URL

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>base_url2</key>
  <type>3</type>
  <displayName>Second Base URL</displayName>
  <description>Use if AU spans two hosts</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.3. Year

Parameter Key

year

Parameter Type

Year

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>year</key>
  <type>4</type>
  <displayName>Year</displayName>
  <description>Four digit year (e.g., 2004)</description>
  <size>4</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.4. Volume Number

Parameter Key

volume

Parameter Type

Non-Negative Integer

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>volume</key>
  <type>6</type>
  <displayName>Volume No.</displayName>
  <description>Numeric volume number, e.g. 7</description>
  <size>8</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.5. Volume Name

Parameter Key

volume_name

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>volume_name</key>
  <type>1</type>
  <displayName>Volume Name</displayName>
  <description>Volume name, e.g. 23A</description>
  <size>20</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.6. Issue Range

Parameter Key

issue_range

Parameter Type

String Range

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>issue_range</key>
  <type>7</type>
  <displayName>Issue Range</displayName>
  <description>A Range of issues in the form: aaa-zzz</description>
  <size>20</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.7. Numeric Issue Range

Parameter Key

num_issue_range

Parameter Type

Numeric Range

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>num_issue_range</key>
  <displayName>Numeric Issue Range</displayName>
  <description>A Range of issues in the form: min-max</description>
  <type>8</type>
  <size>20</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.8. Issue Set

Parameter Key

issue_set

Parameter Type

Set

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>issue_set</key>
  <type>9</type>
  <displayName>Issue Set</displayName>
  <description>A comma delimited list of issues. (eg issue1, issue2)</description>
  <size>20</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.9. Journal Directory

Parameter Key

journal_dir

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>journal_dir</key>
  <type>1</type>
  <displayName>Journal Directory</displayName>
  <description>Directory name for journal content (i.e. 'american_imago').</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.10. Journal Abbreviation

Parameter Key

journal_abbr

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>journal_abbr</key>
  <type>1</type>
  <displayName>Journal Abbreviation</displayName>
  <description>Abbreviation for journal (often used as part of file names).</description>
  <size>10</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.11. Journal Identifier

Parameter Key

journal_id

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>journal_id</key>
  <type>1</type>
  <displayName>Journal Identifier</displayName>
  <description>Identifier for journal (often used as part of file names).</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.12. Journal ISSN

Parameter Key

journal_issn

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>journal_issn</key>
  <type>1</type>
  <displayName>Journal ISSN</displayName>
  <description>International Standard Serial Number.</description>
  <size>20</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.13. Publisher Name

Note

Use of this parameter is not recommended. It is unlikely the publisher name will appear in URLs, as opposed to a publisher abbreviation or code.

Parameter Key

publisher_name

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>publisher_name</key>
  <type>1</type>
  <displayName>Publisher Name</displayName>
  <description>Publisher Name for Archival Unit</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.14. OAI Request URL

Parameter Key

oai_request_url

Parameter Type

URL

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>oai_request_url</key>
  <type>3</type>
  <displayName>OAI Request URL</displayName>
  <description>Usually of the form http://&lt;journal-name&gt;.com/</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.2.15. OAI Spec

Parameter Key

oai_spec

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>oai_spec</key>
  <type>1</type>
  <displayName>OAI Spec</displayName>
  <description>Spec for journal in the OAI crawl</description>
  <size>40</size>
  <definitional>true</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>

2.4.3. Built-In Non-Definitional Parameters

The LOCKSS software also defines a number of non-definitional parameters.

Non-definitional parameters are necessary as placeholders in plugin rules and code, but they do not contribute to the AU's identity -- you may need to change the value of a non-definitional parameter but it will not change which slice of content the AU corresponds to.

Some non-definitional parameters might be listed in the plugin itself if all AUs are expected to supply a value for them, like the user_pass parameter for user credentials. Most others are involved in the lifecycle of an AU and need not be listed in the plugin, like the pub_down parameter for AUs that are not currently allowed to crawl.
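
The difference between the two kinds of parameter can be illustrated by deriving an identity from the definitional values only. This is a conceptual sketch: the `au_identity` helper and its tuple result are invented for this example and do not reproduce the identifier format used by the LOCKSS software.

```python
def au_identity(descriptors, au_config):
    """Key/value pairs of the definitional parameters, in a stable order."""
    definitional = [d["key"] for d in descriptors if d.get("definitional", True)]
    return tuple(sorted((key, au_config[key]) for key in definitional))

# Hypothetical plugin with two definitional parameters and one non-definitional.
DESCRIPTORS = [
    {"key": "base_url", "definitional": True},
    {"key": "year", "definitional": True},
    {"key": "user_pass", "definitional": False},
]

au_a = {"base_url": "http://www.example.com/", "year": "2004", "user_pass": "myuser:mypass"}
au_b = dict(au_a, user_pass="other:secret")  # same AU, different credentials
au_c = dict(au_a, year="2005")               # different slice of content

print(au_identity(DESCRIPTORS, au_a) == au_identity(DESCRIPTORS, au_b))  # True
print(au_identity(DESCRIPTORS, au_a) == au_identity(DESCRIPTORS, au_c))  # False
```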

2.4.3.1. Username and Password

Parameter Key

user_pass

Parameter Type

User Credentials

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>user_pass</key>
  <type>10</type>
  <displayName>Username:Password</displayName>
  <description>Colon-separated username and password string, e.g. myuser:mypass</description>
  <size>30</size>
  <definitional>false</definitional>
  <defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

Some harvesting processes may require user credentials (username and password). A non-definitional parameter is needed because the username and password might be different for different harvesting nodes, or may change over time, without changing the identity of the AU (for instance its year).

2.4.3.2. AU Down

Parameter Key

pub_down

Parameter Type

Boolean

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>pub_down</key>
  <type>5</type>
  <displayName>Pub Down</displayName>
  <description>If true, AU is no longer available from the publisher</description>
  <size>4</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used routinely in the title database files of LOCKSS networks, but does not need to appear explicitly in plugins.

When this parameter value is supplied as true for an AU, the AU is considered to be "down", meaning that it is currently unavailable from its source and that no attempt should be made to crawl or recrawl it.

The name pub_down, for "publisher down", reflects the idea that the entire publisher site (content provider) might be unavailable, but this parameter can be used to mark individual AUs as being down outside the context of an entire content provider being unavailable.

2.4.3.3. AU Off-Limits

Parameter Key

pub_never

Parameter Type

Boolean

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>pub_never</key>
  <type>5</type>
  <displayName>Pub Never</displayName>
  <description>If true, don't try to access any content from publisher</description>
  <size>4</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used routinely in the title database files of LOCKSS networks, but does not need to appear explicitly in plugins.

When this parameter value is supplied as true for an AU, the AU is considered to be "off-limits", meaning that the LOCKSS software will not satisfy a proxy request for a URL it determines to be in this AU by going to the original Web site.

2.4.3.4. AU Closed

Parameter Key

au_closed

Parameter Type

Boolean

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>au_closed</key>
  <type>5</type>
  <displayName>AU Closed</displayName>
  <description>If true, AU is complete, no more content will be added</description>
  <size>4</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used routinely in the title database files of LOCKSS networks, but does not need to appear explicitly in plugins.

When this parameter value is supplied as true for an AU, the AU is marked as "closed", meaning it is considered that no more content will be added to it in the future.

2.4.3.5. Crawl Proxy

Parameter Key

crawl_proxy

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>crawl_proxy</key>
  <type>1</type>
  <displayName>Crawl Proxy</displayName>
  <description>If set to host:port, crawls of this AU will be proxied. If set to DIRECT, crawls will not be proxied, even if the LOCKSS node has been configured with a default crawl proxy.</description>
  <size>40</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used routinely in the title database files of LOCKSS networks, but does not need to appear explicitly in plugins.

When this parameter value is supplied as a host:port pair (for example proxy.myuniversity.edu:8080) for an AU, crawls of the AU will be proxied through the given proxy. When this parameter value is supplied as the special value DIRECT for an AU, crawls of the AU will not be proxied, even if the LOCKSS node is configured to always use a crawl proxy.
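
A minimal sketch of interpreting the two forms of this value follows. The `parse_crawl_proxy` helper is hypothetical; matching DIRECT exactly (case-sensitively) and splitting at the last colon are assumptions, and IPv6 literals are not handled.

```python
def parse_crawl_proxy(value):
    """Return None for DIRECT (no proxy), else a (host, port) tuple."""
    if value == "DIRECT":  # assumption: exact, case-sensitive match
        return None
    host, colon, port = value.rpartition(":")
    if not colon or not host or not port.isdigit():
        raise ValueError("expected host:port or DIRECT, got %r" % value)
    return (host, int(port))

print(parse_crawl_proxy("proxy.myuniversity.edu:8080"))  # ('proxy.myuniversity.edu', 8080)
print(parse_crawl_proxy("DIRECT"))                       # None
```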

2.4.3.6. New Content Crawl Interval

Parameter Key

nc_interval

Parameter Type

Time Interval

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>nc_interval</key>
  <type>12</type>
  <displayName>Crawl Interval</displayName>
  <description>The interval at which the AU should crawl the publisher site.</description>
  <size>10</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used routinely in the title database files of LOCKSS networks, but does not need to appear explicitly in plugins.

When this parameter value is supplied as a time interval for an AU, crawls of the AU will be attempted at the requested interval rather than at the LOCKSS node's default new content crawl interval.

2.4.3.7. Crawl Test Substance Threshold

Parameter Key

crawl_test_substance_threshold

Parameter Type

String

Canonical Form
<org.lockss.daemon.ConfigParamDescr>
  <key>crawl_test_substance_threshold</key>
  <type>1</type>
  <displayName>Crawl Test Substance Threshold</displayName>
  <description>Minimum number of substance URLs necessary for successful abbreviated crawl test.</description>
  <size>20</size>
  <definitional>false</definitional>
  <defaultOnly>true</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
Description

This non-definitional parameter is used in special circumstances, for networks set up to perform abbreviated test crawls.

2.4.4. Derivative Parameters

For parameters of type URL and Year, the system automatically brings into existence derivative parameters with special names, as if those parameters had also been defined by the plugin.

Tip

Derivative parameters have fallen out of favor. The contemporary way to achieve the same effect is through parameter functors.

2.4.4.1. Derivative URL Parameters

For any parameter of type URL with key urlkey, the following derivative parameters are automatically defined:

  • urlkey_host of type String, whose value is just the host portion of the corresponding URL value. For example, if base_url has a value of https://www.publisher.com/jabc/, base_url_host has a value of www.publisher.com.

  • urlkey_path of type String, whose value is just the path portion of the corresponding URL value. For example, if base_url has a value of https://www.publisher.com/jabc/, base_url_path has a value of /jabc/.
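
The two derived values can be reproduced with a standard URL parser. The sketch below uses Python's `urllib.parse` only to illustrate the host and path portions; the `derive_url_params` helper is hypothetical, and the LOCKSS software computes these values with its own Java code.

```python
from urllib.parse import urlsplit

def derive_url_params(key, url):
    """Return the _host and _path derivative values for a URL parameter."""
    parts = urlsplit(url)
    return {
        key + "_host": parts.hostname,  # host only, without any port
        key + "_path": parts.path,
    }

print(derive_url_params("base_url", "https://www.publisher.com/jabc/"))
# {'base_url_host': 'www.publisher.com', 'base_url_path': '/jabc/'}
```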

2.4.4.2. Derivative Year Parameters

For any parameter of type Year with key yearkey, the following derivative parameter is automatically defined:

  • au_short_yearkey of type Integer, whose value is the corresponding year value modulo 100. For example, if year has a value of 1998, au_short_year has a value of 98; if year has a value of 2002, au_short_year has a value of 2 (the integer 2, not the string 02).

    Tip

    In many cases, what is useful is the zero-padded, two-character string from the derivative short year, not the potentially single-digit integer; use %02d in the printf format string.
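
The short year arithmetic and the effect of zero-padding can be checked with a few lines. Python's `%` string formatting is used here only because it accepts the same %02d conversion; the `short_year` helper is hypothetical.

```python
def short_year(year):
    """Derivative short year: the year value modulo 100, as an integer."""
    return year % 100

for year in (1998, 2002):
    value = short_year(year)
    # %d prints the bare integer; %02d pads it to two characters.
    print(year, "%d" % value, "%02d" % value)
# 1998 98 98
# 2002 2 02
```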