public interface IProcessActivity extends IVersionActivity
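Here is one plausible way for a connector to combine checkDocumentNeedsReindexing, ingestDocumentWithException, and noDocument. It skips documents whose version string is unchanged, ingests the rest, and calls noDocument for documents that should stay out of the index. This is a self-contained illustration under stated assumptions, not ManifoldCF code:

- ProcessActivityStandIn is a hypothetical, trimmed stand-in for three IProcessActivity methods. Content is a String instead of a RepositoryDocument, and the checked exceptions are omitted.
- RecordingActivity is a hypothetical in-memory fake. It assumes "needs reindexing" simply means "the version string changed".
- The MAX_LENGTH exclusion rule is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed stand-in for three IProcessActivity methods. Unlike the real
// interface, content is a String (not a RepositoryDocument) and checked exceptions are omitted.
interface ProcessActivityStandIn {
  boolean checkDocumentNeedsReindexing(String documentIdentifier, String newVersionString);

  void ingestDocumentWithException(String documentIdentifier, String version, String documentURI, String content);

  void noDocument(String documentIdentifier, String version);
}

// Hypothetical in-memory fake that records versions and logs each call.
class RecordingActivity implements ProcessActivityStandIn {
  final Map<String, String> recordedVersions = new HashMap<>();
  final List<String> log = new ArrayList<>();

  @Override
  public boolean checkDocumentNeedsReindexing(String documentIdentifier, String newVersionString) {
    // Assumption: reindexing is needed whenever the version differs from the recorded one.
    return !newVersionString.equals(recordedVersions.get(documentIdentifier));
  }

  @Override
  public void ingestDocumentWithException(String documentIdentifier, String version, String documentURI, String content) {
    recordedVersions.put(documentIdentifier, version);
    log.add("ingest " + documentIdentifier + " -> " + documentURI);
  }

  @Override
  public void noDocument(String documentIdentifier, String version) {
    // Not indexed, but the version is still recorded.
    recordedVersions.put(documentIdentifier, version);
    log.add("noDocument " + documentIdentifier);
  }
}

class ReindexFlowSketch {
  // Hypothetical exclusion rule: content longer than this is not indexed.
  static final int MAX_LENGTH = 20;

  static void processDocument(ProcessActivityStandIn activity, String id, String version, String uri, String content) {
    if (!activity.checkDocumentNeedsReindexing(id, version)) {
      return; // unchanged since it was last recorded
    }
    if (content.length() > MAX_LENGTH) {
      activity.noDocument(id, version); // excluded: keep it out of the index, remember the version
      return;
    }
    activity.ingestDocumentWithException(id, version, uri, content);
  }

  public static void main(String[] args) {
    RecordingActivity activity = new RecordingActivity();
    processDocument(activity, "doc1", "v1", "http://example.com/doc1", "short text");
    processDocument(activity, "doc1", "v1", "http://example.com/doc1", "short text"); // skipped
    processDocument(activity, "doc2", "v1", "http://example.com/doc2", "this content is far too long");
    System.out.println(activity.log); // prints [ingest doc1 -> http://example.com/doc1, noDocument doc2]
  }
}
```

The noDocument branch is the design choice worth noting. Per its description, noDocument removes the document from the index and also updates the recorded version information. In this fake, a later check with the same version therefore returns false, and the excluded document is not re-examined.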
| Modifier and Type | Field and Description |
|---|---|
| static String | _rcsid |

Fields inherited from superinterfaces: BAD_URL, EXCLUDED_CONTENT, EXCLUDED_DATE, EXCLUDED_LENGTH, EXCLUDED_MIMETYPE, EXCLUDED_URL, NULL_URL

| Modifier and Type | Method and Description |
|---|---|
| void | addDocumentReference(String documentIdentifier)<br>Add a document description to the current job's queue. |
| void | addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType)<br>Add a document description to the current job's queue. |
| void | addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues)<br>Add a document description to the current job's queue. |
| void | addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues, Long originationTime)<br>Add a document description to the current job's queue. |
| void | addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues, Long originationTime, String[] prereqEventNames)<br>Add a document description to the current job's queue. |
| boolean | checkDocumentNeedsReindexing(String documentIdentifier, String newVersionString)<br>Check if a document needs to be reindexed, based on a computed version string. |
| boolean | checkDocumentNeedsReindexing(String documentIdentifier, String componentIdentifier, String newVersionString)<br>Check if a document needs to be reindexed, based on a computed version string. |
| void | deleteDocument(String documentIdentifier)<br>Delete the specified document permanently from the search engine index and from the status table, along with all its components. |
| void | deleteDocument(String documentIdentifier, String version)<br>Deprecated. |
| void | ingestDocument(String documentIdentifier, String version, String documentURI, RepositoryDocument data)<br>Deprecated. |
| void | ingestDocumentWithException(String documentIdentifier, String version, String documentURI, RepositoryDocument data)<br>Ingest the current document. |
| void | ingestDocumentWithException(String documentIdentifier, String componentIdentifier, String version, String documentURI, RepositoryDocument data)<br>Ingest the current document. |
| void | noDocument(String documentIdentifier, String version)<br>Remove the specified document from the search engine index, and update the recorded version information for the document. |
| void | noDocument(String documentIdentifier, String componentIdentifier, String version)<br>Remove the specified document from the search engine index, and update the recorded version information for the document. |
| void | recordDocument(String documentIdentifier, String version)<br>Record a document version WITHOUT reindexing or removing it. |
| void | recordDocument(String documentIdentifier, String componentIdentifier, String version)<br>Record a document version WITHOUT reindexing or removing it. |
| void | removeDocument(String documentIdentifier)<br>Remove the specified document's primary component permanently from the search engine index and from the status table. |
| void | retainAllComponentDocument(String documentIdentifier)<br>Retain all existing document components of a primary document. |
| void | retainDocument(String documentIdentifier, String componentIdentifier)<br>Retain an existing document component. |
| void | setDocumentOriginationTime(String documentIdentifier, Long originationTime)<br>Override a document's origination time. |
| void | setDocumentScheduleBounds(String documentIdentifier, Long lowerRecrawlBoundTime, Long upperRecrawlBoundTime, Long lowerExpireBoundTime, Long upperExpireBoundTime)<br>Override the schedule for the next time a document is crawled. |

Methods inherited from superinterfaces: recordActivity, beginEventSequence, completeEventSequence, retryDocumentProcessing, createConnectionSpecificString, createGlobalString, createJobSpecificString, checkJobStillActive, checkDateIndexable, checkDocumentIndexable, checkLengthIndexable, checkMimeTypeIndexable, checkURLIndexable, retrieveParentData, retrieveParentDataAsFiles

static final String _rcsid
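The addDocumentReference overloads document several constraints on their carry-down arguments:

- parentIdentifier must be present when carry-down data is supplied.
- Each name in dataNames is limited to 255 characters.
- dataValues may be null only if dataNames is null.
- Each value must be a String or a CharacterInput.

The following sketch checks those rules before a reference would be added. It is illustrative only. CarrydownArgsSketch and validate are hypothetical names, and CharacterInputStandIn is a placeholder for ManifoldCF's CharacterInput type. Matching one row of dataValues to each name by index, and treating a missing row as an error, are assumptions rather than documented behavior.

```java
// Hypothetical placeholder standing in for ManifoldCF's CharacterInput type.
interface CharacterInputStandIn {
}

class CarrydownArgsSketch {
  // Documented limit on the length of each carry-down data name.
  static final int MAX_NAME_LENGTH = 255;

  // Returns null when the arguments satisfy the documented rules,
  // otherwise a description of the first violation found.
  static String validate(String parentIdentifier, String[] dataNames, Object[][] dataValues) {
    if (dataNames == null) {
      return null; // no carry-down data, so no further constraints apply
    }
    if (parentIdentifier == null) {
      return "parentIdentifier must be present when carry-down data is supplied";
    }
    if (dataValues == null) {
      return "dataValues may be null only if dataNames is null";
    }
    if (dataValues.length != dataNames.length) {
      // Assumption: one row of values per data name, matched by index.
      return "expected " + dataNames.length + " value rows but got " + dataValues.length;
    }
    for (int i = 0; i < dataNames.length; i++) {
      if (dataNames[i].length() > MAX_NAME_LENGTH) {
        return "data name #" + i + " exceeds " + MAX_NAME_LENGTH + " characters";
      }
      if (dataValues[i] == null) {
        // Assumption: a missing row of values is treated as an error.
        return "no values for data name " + dataNames[i];
      }
      for (Object value : dataValues[i]) {
        if (!(value instanceof String) && !(value instanceof CharacterInputStandIn)) {
          return "value for " + dataNames[i] + " must be a String or a CharacterInput";
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String[] names = {"title"};
    Object[][] values = {{"Annual report"}};
    System.out.println(validate("parentDoc", names, values)); // prints null
    System.out.println(validate(null, names, values)); // prints the missing-parent message
  }
}
```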
boolean checkDocumentNeedsReindexing(String documentIdentifier, String newVersionString) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier.
- newVersionString - is the newly-computed version string.
Throws:
- ManifoldCFException

boolean checkDocumentNeedsReindexing(String documentIdentifier, String componentIdentifier, String newVersionString) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier.
- componentIdentifier - is the component document identifier, if any.
- newVersionString - is the newly-computed version string.
Throws:
- ManifoldCFException

void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues, Long originationTime, String[] prereqEventNames) throws ManifoldCFException
Parameters:
- documentIdentifier - is the local document identifier to add (for the connector that fetched the document).
- parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship. MUST be present in the case of carry-down information.
- relationshipType - is the string describing the kind of relationship this reference represents. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
- dataNames - is the list of carry-down data from the parent to the child. May be null. Each name is limited to 255 characters.
- dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
- originationTime - is the time, in ms since epoch, that the document originated. Pass null if none or unknown.
- prereqEventNames - are the names of the prerequisite events which this document requires prior to processing. Pass null if none.
Throws:
- ManifoldCFException

void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues, Long originationTime) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier to add (for the connector that fetched the document).
- parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship. MUST be present in the case of carry-down information.
- relationshipType - is the string describing the kind of relationship this reference represents. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
- dataNames - is the list of carry-down data from the parent to the child. May be null. Each name is limited to 255 characters.
- dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
- originationTime - is the time, in ms since epoch, that the document originated. Pass null if none or unknown.
Throws:
- ManifoldCFException

void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType, String[] dataNames, Object[][] dataValues) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier to add (for the connector that fetched the document).
- parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship. MUST be present in the case of carry-down information.
- relationshipType - is the string describing the kind of relationship this reference represents. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
- dataNames - is the list of carry-down data from the parent to the child. May be null. Each name is limited to 255 characters.
- dataValues - are the values that correspond to the data names in the dataNames parameter. May be null only if dataNames is null. Each object must be either a String or a CharacterInput.
Throws:
- ManifoldCFException

void addDocumentReference(String documentIdentifier, String parentIdentifier, String relationshipType) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier to add (for the connector that fetched the document).
- parentIdentifier - is the document identifier that is considered to be the "parent" of this identifier. May be null if no hopcount filtering is desired for this kind of relationship.
- relationshipType - is the string describing the kind of relationship this reference represents. This must be one of the strings returned by the IRepositoryConnector method "getRelationshipTypes()". May be null.
Throws:
- ManifoldCFException

void addDocumentReference(String documentIdentifier) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier to add (for the connector that fetched the document).
Throws:
- ManifoldCFException

void ingestDocumentWithException(String documentIdentifier, String version, String documentURI, RepositoryDocument data) throws ManifoldCFException, ServiceInterruption, IOException
Parameters:
- documentIdentifier - is the document's identifier.
- version - is the version of the document, as reported by the getDocumentVersions() method of the corresponding repository connector.
- documentURI - is the URI to use to retrieve this document from the search interface (and is also the unique key in the index).
- data - is the document data. The data is closed after ingestion is complete.
Throws:
- IOException - only when data stream reading fails.
- ManifoldCFException
- ServiceInterruption

void ingestDocumentWithException(String documentIdentifier, String componentIdentifier, String version, String documentURI, RepositoryDocument data) throws ManifoldCFException, ServiceInterruption, IOException
Parameters:
- documentIdentifier - is the document's identifier.
- componentIdentifier - is the component document identifier, if any.
- version - is the version of the document, as reported by the getDocumentVersions() method of the corresponding repository connector.
- documentURI - is the URI to use to retrieve this document from the search interface (and is also the unique key in the index).
- data - is the document data. The data is closed after ingestion is complete.
Throws:
- IOException - only when data stream reading fails.
- ManifoldCFException
- ServiceInterruption

@Deprecated void ingestDocument(String documentIdentifier, String version, String documentURI, RepositoryDocument data) throws ManifoldCFException, ServiceInterruption
Parameters:
- documentIdentifier - is the document's identifier.
- version - is the version of the document, as reported by the getDocumentVersions() method of the corresponding repository connector.
- documentURI - is the URI to use to retrieve this document from the search interface (and is also the unique key in the index).
- data - is the document data. The data is closed after ingestion is complete. NOTE: Any data stream IOExceptions will be converted to ManifoldCFExceptions and ServiceInterruptions according to standard best practices.
Throws:
- ManifoldCFException
- ServiceInterruption

void noDocument(String documentIdentifier, String version) throws ManifoldCFException, ServiceInterruption
Parameters:
- documentIdentifier - is the document's local identifier.
- version - is the version string to be recorded for the document.
Throws:
- ManifoldCFException
- ServiceInterruption

void noDocument(String documentIdentifier, String componentIdentifier, String version) throws ManifoldCFException, ServiceInterruption
Parameters:
- documentIdentifier - is the document's local identifier.
- componentIdentifier - is the component document identifier, if any.
- version - is the version string to be recorded for the document.
Throws:
- ManifoldCFException
- ServiceInterruption

void removeDocument(String documentIdentifier) throws ManifoldCFException, ServiceInterruption
Parameters:
- documentIdentifier - is the document's identifier.
Throws:
- ManifoldCFException
- ServiceInterruption

void retainDocument(String documentIdentifier, String componentIdentifier) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document's identifier.
- componentIdentifier - is the component document identifier, which cannot be null.
Throws:
- ManifoldCFException

void retainAllComponentDocument(String documentIdentifier) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document's identifier.
Throws:
- ManifoldCFException

void recordDocument(String documentIdentifier, String version) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier.
- version - is the document version.
Throws:
- ManifoldCFException

void recordDocument(String documentIdentifier, String componentIdentifier, String version) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document identifier.
- componentIdentifier - is the component document identifier, if any.
- version - is the document version.
Throws:
- ManifoldCFException

void deleteDocument(String documentIdentifier) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document's identifier.
Throws:
- ManifoldCFException

@Deprecated void deleteDocument(String documentIdentifier, String version) throws ManifoldCFException, ServiceInterruption
Parameters:
- documentIdentifier - is the document's local identifier.
- version - is the version string to be recorded for the document.
Throws:
- ManifoldCFException
- ServiceInterruption

void setDocumentScheduleBounds(String documentIdentifier, Long lowerRecrawlBoundTime, Long upperRecrawlBoundTime, Long lowerExpireBoundTime, Long upperExpireBoundTime) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document's identifier.
- lowerRecrawlBoundTime - is the time in ms since epoch that the reschedule time should not fall BELOW, or null if none.
- upperRecrawlBoundTime - is the time in ms since epoch that the reschedule time should not rise ABOVE, or null if none.
- lowerExpireBoundTime - is the time in ms since epoch that the expire time should not fall BELOW, or null if none.
- upperExpireBoundTime - is the time in ms since epoch that the expire time should not rise ABOVE, or null if none.
Throws:
- ManifoldCFException

void setDocumentOriginationTime(String documentIdentifier, Long originationTime) throws ManifoldCFException
Parameters:
- documentIdentifier - is the document's identifier.
- originationTime - is the document's origination time, or null if unknown.
Throws:
- ManifoldCFException
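setDocumentScheduleBounds takes four optional bounds, each in milliseconds since epoch and each null when unused. The documentation says only that the reschedule (or expire) time should not fall below the lower bound or rise above the upper bound. It does not say how the crawler arrives at the time it then bounds. The sketch below therefore assumes simple clamping of a proposed time. The applyBounds helper, the one-hour and one-day offsets, and the five-minute proposal are all hypothetical illustration values.

```java
import java.util.concurrent.TimeUnit;

class ScheduleBoundsSketch {
  // Assumption: bounds are applied by clamping a proposed time (ms since epoch)
  // into [lowerBound, upperBound]; a null bound means "no bound on that side".
  static long applyBounds(long proposedTime, Long lowerBound, Long upperBound) {
    long result = proposedTime;
    if (lowerBound != null && result < lowerBound) {
      result = lowerBound; // must not fall BELOW the lower bound
    }
    if (upperBound != null && result > upperBound) {
      result = upperBound; // must not rise ABOVE the upper bound
    }
    return result;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Hypothetical values a connector might pass as lowerRecrawlBoundTime / upperRecrawlBoundTime.
    Long lowerRecrawl = now + TimeUnit.HOURS.toMillis(1);
    Long upperRecrawl = now + TimeUnit.DAYS.toMillis(1);
    long proposed = now + TimeUnit.MINUTES.toMillis(5); // e.g. what a default schedule suggested
    long rescheduled = applyBounds(proposed, lowerRecrawl, upperRecrawl);
    System.out.println("recrawl in " + TimeUnit.MILLISECONDS.toMinutes(rescheduled - now) + " minutes"); // prints recrawl in 60 minutes
  }
}
```

In a connector, bounds computed this way would be passed as activities.setDocumentScheduleBounds(documentIdentifier, lowerRecrawl, upperRecrawl, null, null), where activities is an IProcessActivity reference. Passing null for the last two arguments leaves the expire time unbounded.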