Package org.apache.nutch.parse
Class HTMLMetaTags
java.lang.Object
    org.apache.nutch.parse.HTMLMetaTags
public class HTMLMetaTags extends Object
This class holds the information about HTML "meta" tags extracted from a page. Some special tags have convenience methods for easy checking.
Constructor Summary
Constructor         Description
HTMLMetaTags()
Method Summary
Modifier and Type  Method                              Description
URL                getBaseHref()
Metadata           getGeneralTags()
Properties         getHttpEquivTags()
boolean            getNoCache()                        Gets the current value of noCache.
boolean            getNoFollow()                       Gets the current value of noFollow.
boolean            getNoIndex()                        Gets the current value of noIndex.
boolean            getRefresh()                        Gets the current value of refresh.
URL                getRefreshHref()
int                getRefreshTime()
void               reset()                             Sets all boolean values to false.
void               setBaseHref(URL baseHref)           Sets the baseHref.
void               setCache()                          Sets noCache to false.
void               setFollow()                         Sets noFollow to false.
void               setIndex()                          Sets noIndex to false.
void               setNoCache()                        Sets noCache to true.
void               setNoFollow()                       Sets noFollow to true.
void               setNoIndex()                        Sets noIndex to true.
void               setRefresh(boolean refresh)         Sets refresh to the supplied value.
void               setRefreshHref(URL refreshHref)     Sets the refreshHref.
void               setRefreshTime(int refreshTime)     Sets the refreshTime.
String             toString()
Method Detail
-
reset
public void reset()
Sets all boolean values to false. Clears all other tags.
-
setNoFollow
public void setNoFollow()
Sets noFollow to true.
-
setFollow
public void setFollow()
Sets noFollow to false.
-
setNoIndex
public void setNoIndex()
Sets noIndex to true.
-
setIndex
public void setIndex()
Sets noIndex to false.
-
setNoCache
public void setNoCache()
Sets noCache to true.
-
setCache
public void setCache()
Sets noCache to false.
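The robots and cache setters above are normally driven by a page's `<meta name="robots">` tag. The sketch below shows one plausible mapping from a robots `content` value onto these flags. `RobotsFlagsSketch` is a hypothetical stand-in that mirrors the documented setter names so the example runs without Nutch on the classpath; the directive rules (comma-separated, case-insensitive, `none` meaning `noindex,nofollow`, `all` meaning `index,follow`, and `noarchive` treated as "do not cache") are assumptions, not a description of Nutch's own parser.

```java
import java.util.Locale;

/**
 * Hypothetical stand-in for the robots/cache flags of HTMLMetaTags.
 * Setter names mirror the documented API; the parsing rules are assumptions.
 */
final class RobotsFlagsSketch {
    boolean noIndex;
    boolean noFollow;
    boolean noCache;

    void setNoIndex()  { noIndex = true; }
    void setIndex()    { noIndex = false; }
    void setNoFollow() { noFollow = true; }
    void setFollow()   { noFollow = false; }
    void setNoCache()  { noCache = true; }

    /** Applies a robots meta "content" value such as "noindex, nofollow". */
    void applyRobots(String content) {
        for (String raw : content.split(",")) {
            String directive = raw.trim().toLowerCase(Locale.ROOT);
            switch (directive) {
                case "none":      setNoIndex(); setNoFollow(); break; // assumed: noindex + nofollow
                case "all":       setIndex();   setFollow();   break; // assumed: index + follow
                case "noindex":   setNoIndex();  break;
                case "index":     setIndex();    break;
                case "nofollow":  setNoFollow(); break;
                case "follow":    setFollow();   break;
                case "noarchive": setNoCache();  break; // assumption: treated as "do not cache"
                default:          break;                // unknown directives are ignored
            }
        }
    }
}
```

Directives are applied left to right, so in this sketch a later directive overrides an earlier one: `"noindex, index"` ends with indexing allowed.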
-
setRefresh
public void setRefresh(boolean refresh)
Sets refresh to the supplied value.
Parameters:
refresh - value to set
-
setBaseHref
public void setBaseHref(URL baseHref)
Sets the baseHref.
Parameters:
baseHref - value to set
-
setRefreshHref
public void setRefreshHref(URL refreshHref)
Sets the refreshHref.
Parameters:
refreshHref - value to set
-
setRefreshTime
public void setRefreshTime(int refreshTime)
Sets the refreshTime.
Parameters:
refreshTime - value to set
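The three refresh setters correspond to the parts of an `http-equiv="refresh"` tag such as `<meta http-equiv="refresh" content="5; url=next.html">`. As a minimal sketch, assuming the content is a delay in seconds optionally followed by `;` and a `url=` target that is resolved against the page URL, the hypothetical `RefreshSketch.parse` below extracts the values that would be passed to `setRefresh`, `setRefreshTime`, and `setRefreshHref`. Nutch's actual parsing rules may differ; treating a missing URL as "reload this page" is also an assumption.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

/** Hypothetical holder for the three refresh-related values. */
final class RefreshSketch {
    boolean refresh;
    int refreshTime;
    URL refreshHref;

    /**
     * Parses content like "5; url=next.html". If the delay is not a valid
     * integer, the returned instance keeps refresh == false.
     */
    static RefreshSketch parse(String content, URL pageUrl) {
        RefreshSketch r = new RefreshSketch();
        String[] parts = content.split(";", 2);
        try {
            r.refreshTime = Integer.parseInt(parts[0].trim());
        } catch (NumberFormatException e) {
            return r; // malformed delay: leave refresh == false
        }
        r.refresh = true;
        if (parts.length == 2) {
            String target = parts[1].trim();
            if (target.toLowerCase(Locale.ROOT).startsWith("url=")) {
                target = target.substring(4).trim();
            }
            try {
                r.refreshHref = new URL(pageUrl, target); // resolve relative targets
            } catch (MalformedURLException e) {
                r.refreshHref = null; // unusable target; the delay is still recorded
            }
        } else {
            r.refreshHref = pageUrl; // assumption: no URL means "reload this page"
        }
        return r;
    }
}
```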
-
getNoIndex
public boolean getNoIndex()
Gets the current value of noIndex.
Returns:
true if the page should not be indexed, false otherwise
-
getNoFollow
public boolean getNoFollow()
Gets the current value of noFollow.
Returns:
true if links on the page should not be followed, false otherwise
-
getNoCache
public boolean getNoCache()
Gets the current value of noCache.
Returns:
true if the page should not be cached, false otherwise
-
getRefresh
public boolean getRefresh()
Gets the current value of refresh.
Returns:
true if a refresh is requested, false otherwise
-
getBaseHref
public URL getBaseHref()
Returns:
the baseHref, if set, or null otherwise.
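When a page declares `<base href="...">`, relative links on that page are resolved against it rather than against the page URL. A minimal sketch of how a caller might use the value returned by `getBaseHref()`, falling back to the page URL when it is `null`; the class `BaseHrefSketch` and its `resolveLink` helper are hypothetical and rely only on `java.net.URL`.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class BaseHrefSketch {
    /**
     * Resolves a link found on a page. baseHref stands in for the value of
     * getBaseHref(), which is null when the page has no base tag.
     */
    static URL resolveLink(URL pageUrl, URL baseHref, String link)
            throws MalformedURLException {
        URL context = (baseHref != null) ? baseHref : pageUrl;
        return new URL(context, link); // standard relative-URL resolution
    }
}
```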
-
getRefreshHref
public URL getRefreshHref()
Returns:
the refreshHref, if set, or null otherwise. The value may be invalid if getRefresh() returns false.
-
getRefreshTime
public int getRefreshTime()
Returns:
the current value of refreshTime. The value may be invalid if getRefresh() returns false.
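Because `getRefreshHref()` and `getRefreshTime()` may hold invalid values when `getRefresh()` returns `false`, a caller would normally check the flag before reading them. A minimal sketch of that guard follows. `RefreshView` is a hypothetical interface that mirrors the three documented getters (`HTMLMetaTags` does not implement it); it exists only so the example compiles on its own.

```java
import java.net.URL;
import java.util.Optional;

/** Hypothetical view of the three documented refresh getters. */
interface RefreshView {
    boolean getRefresh();
    URL getRefreshHref();
    int getRefreshTime();
}

final class RefreshGuard {
    /** Returns the refresh target only when getRefresh() is true and a target is set. */
    static Optional<URL> refreshTarget(RefreshView tags) {
        if (!tags.getRefresh()) {
            return Optional.empty(); // href and time are not meaningful here
        }
        return Optional.ofNullable(tags.getRefreshHref());
    }
}
```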
-
getGeneralTags
public Metadata getGeneralTags()
Returns:
all collected values of the general meta tags. Property names are tag names; property values are the corresponding "content" values.
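If several `<meta name="...">` tags on a page share a name (for example, repeated `keywords` tags), a multi-valued collection can keep all of them. The sketch below uses a `LinkedHashMap<String, List<String>>` as a stand-in for Nutch's `Metadata` class; the class `GeneralTagsSketch` and the choice to lowercase tag names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical multi-valued stand-in for the general meta tag collection. */
final class GeneralTagsSketch {
    private final Map<String, List<String>> tags = new LinkedHashMap<>();

    /** Records one name/content pair; names are lowercased (assumption). */
    void add(String name, String content) {
        tags.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>())
            .add(content);
    }

    /** Returns all content values for a tag name, or an empty list if none were seen. */
    List<String> values(String name) {
        return Collections.unmodifiableList(
            tags.getOrDefault(name.toLowerCase(Locale.ROOT), Collections.emptyList()));
    }
}
```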
-
getHttpEquivTags
public Properties getHttpEquivTags()
Returns:
all collected values of the "http-equiv" meta tags. Property names are tag names; property values are the corresponding "content" values.
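`java.util.Properties` holds a single string value per key, so if a page repeats an `http-equiv` name, only one value can be kept. The hypothetical `HttpEquivSketch.collect` below illustrates this with `setProperty`, where a later pair overwrites an earlier one. Lowercasing the names is an assumption made so that lookups such as `getProperty("content-type")` are case-insensitive; it is not documented behavior of `HTMLMetaTags`.

```java
import java.util.Locale;
import java.util.Properties;

final class HttpEquivSketch {
    /**
     * Collects http-equiv name/content pairs into a Properties object.
     * Later duplicates overwrite earlier ones, since Properties is single-valued.
     */
    static Properties collect(String[][] pairs) {
        Properties props = new Properties();
        for (String[] pair : pairs) {
            // Assumption: names are lowercased so lookups are case-insensitive.
            props.setProperty(pair[0].toLowerCase(Locale.ROOT), pair[1]);
        }
        return props;
    }
}
```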