TextServer



Functions that Operate on Texts

XPATH_VALUE(text, xpath)
Returns an output text corresponding to the input text, in which the nodes that the xpath pattern selects for marking are marked.

MARK_UNION(text1, text2, ...)
Returns a new text that is the smallest text including each input text. Its set of marks is the union of the marks occurring in all input texts. All input texts must reference the same external document.

MARK_INTERSECT(text1, text2, ...)
Returns a new text that is the smallest text specified within the input texts which is contained within all input texts. Its set of marks is the intersection of the marks occurring in all input texts. All input texts must reference the same external document.

MARK_EXCEPT(text1, text2)
Returns a new text identical to text1, except that all marks contained within text2 are absent from the new text.
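
The mark handling of MARK_UNION, MARK_INTERSECT and MARK_EXCEPT can be pictured as ordinary set algebra. The following is only a sketch in Python, not TextServer code: it assumes a mark can be modelled as a hypothetical (node, span) pair, and it ignores how the root of the resulting text is chosen.

```python
# Sketch only: marks are modelled as hypothetical (node, span) tuples.
# The function names mirror the documented ones but are not the real API.

def mark_union(*mark_sets):
    """Union of the marks occurring in all inputs."""
    result = set()
    for marks in mark_sets:
        result |= set(marks)
    return result

def mark_intersect(*mark_sets):
    """Marks that occur in every input."""
    sets = [set(marks) for marks in mark_sets]
    return set.intersection(*sets) if sets else set()

def mark_except(marks1, marks2):
    """Marks of the first input that do not occur in the second."""
    return set(marks1) - set(marks2)

a = {(2, 5), (9, 2)}
b = {(9, 2), (20, 0)}
print(sorted(mark_union(a, b)))
print(sorted(mark_intersect(a, b)))
print(sorted(mark_except(a, b)))
```

Treating "marks contained within text2" as exact (node, span) matches is an assumption of this sketch; the real function may compare marks differently.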

COUNT_MARKS(text)
This function is deprecated, since it is now supported by the more general TEXT_TO_INTEGER() class of functions (see the 'marks' method).

KEEP_MARKS(text, integer1 [, integer2])
Returns a new text identical to the input text but containing only the indicated marks, extracted from its list of marks. Integer1 is the offset within the list of the first mark to keep (the first mark occurs at offset 1). Integer2, if present, specifies the number of marks within the list to keep. If it is absent, all marks occurring at or after offset integer1 are kept.
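
The offset and count arguments of KEEP_MARKS behave like a 1-based slice of the mark list. A minimal Python sketch, assuming the marks are available as an ordered list; the function name and the handling of offsets below 1 are assumptions, not documented behaviour.

```python
# Sketch only: models KEEP_MARKS(text, integer1 [, integer2]) on a plain list.

def keep_marks(marks, start, count=None):
    """Keep `count` marks beginning at 1-based offset `start`.

    If `count` is None, every mark at or after `start` is kept.
    """
    if start < 1:
        start = 1  # assumption: the source does not say what happens here
    begin = start - 1  # convert the 1-based offset to a 0-based index
    end = None if count is None else begin + max(count, 0)
    return list(marks[begin:end])

marks = ["m1", "m2", "m3", "m4"]
print(keep_marks(marks, 2))     # all marks from offset 2 onwards
print(keep_marks(marks, 2, 2))  # two marks starting at offset 2
```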

SUBTREE(text [, integer])
Returns a new text rooted at the nth subtree identified by the marks in the input text. The desired subtree is specified by the integer, which defaults to 1. The first subtree is rooted at the earliest mark in the list of marks, the second subtree is rooted at the first mark not occurring under this first mark, and so on. The marks of the resulting text are exactly those marks of the input text that occur under the specified subtree.
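
The way SUBTREE picks its roots can be sketched in Python. This is an illustration under stated assumptions rather than the actual implementation: marks are modelled as hypothetical (node, span) pairs, a mark is taken to lie "under" another when its node range falls inside the other's range, the root mark itself is counted among the marks under its subtree, and None is returned when the requested subtree does not exist.

```python
# Sketch only: marks are hypothetical (node, span) tuples covering
# node numbers node .. node + span.

def subtree_roots(marks):
    """Return the marks that are not nested under an earlier root."""
    roots = []
    covered_to = None
    # Earliest node first; for equal nodes the widest span comes first.
    for node, span in sorted(marks, key=lambda m: (m[0], -m[1])):
        if covered_to is not None and node <= covered_to:
            continue  # this mark lies under the current root
        roots.append((node, span))
        covered_to = node + span
    return roots

def subtree(marks, n=1):
    """Return (root, marks under root) for the nth subtree, or None."""
    roots = subtree_roots(marks)
    if n < 1 or n > len(roots):
        return None  # assumption: out-of-range requests yield nothing
    node, span = roots[n - 1]
    last = node + span
    inner = [(m, s) for m, s in sorted(marks) if node <= m and m + s <= last]
    return (node, span), inner

marks = [(2, 5), (3, 1), (5, 0), (9, 2), (10, 0), (20, 0)]
print(subtree_roots(marks))
print(subtree(marks, 2))
```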

MARK_TO_TEXT(text [, integer])
Returns a new text rooted at the nth mark within the input text. The integer defaults to 1 if not specified. Returns NULL if the mark requested is less than 1 or greater than the number of marks currently in the text.

TEXT_TO_STRING(text, method [, tagid])
Casts the input text as a string. If not specified, tagid defaults to "Marked". The method qualifies how the text is to be converted into a string. This may be any of:
default
Equivalent to omitting the string. Some default method is chosen, consistent with the method employed when text is explicitly cast as a string.
clear
Removes all markup and comments from the input text. Replaces such markup by whitespace.
nodes
Returns an XML string documenting the node numbers and offsets used within the text.
offsets
Returns an XML string documenting the text offsets and lengths within the external text.
raw
Returns the raw text spanned by the input text.
content
Returns the raw text after removing any outer tags.
tagged
Returns an XML string containing the original text, with each marked subtext enclosed in <tagid> .. </tagid> tags.
keep
Similar to tagged but returns only those parts of the tagged text tagged with <tagid> .. </tagid> tags.
omit
Similar to tagged but returns only those parts of the tagged text not tagged with <tagid> .. </tagid> tags.
markup
Similar to raw but returns only those parts of the tagged text which constitute markup.
cmarkup
As per markup but also preserves comments.
rootname
Returns the string representing the name of the root node in the text. This will be a FILE name, a DOCTYPE name, an ELEMENT name, ATTRIBUTE name, or word.
roottype
Returns a string defining the type of the root element. This is one of:
  • FILE
  • DOCUMENT
  • BODY
  • PI
  • PI WORD
  • COMMENT
  • COMMENT WORD
  • ELEMENT
  • ATTRIBUTE
  • ATTRIBUTE VALUE
  • WORD
start
Returns the raw text from this node up to but not including the start of the next node.
prefix
Returns the namespace prefix associated with this element or attribute node, else the empty string.
localname
Returns the local part of the name associated with this element or attribute node, else the empty string.
uri
Returns the namespace URI associated with this node, or the empty string if none.
name
Returns the original name of this element or attribute node, else the empty string.
attributes
Returns the attributes in the text associated with this element, else the empty string.
baseattrs
Returns all attributes in the text associated with this element except 'xmlns' attributes, else the empty string.
namespaces
Returns as XML text the set of namespaces which are applicable to this node and its descendants.
value
Returns the value associated with this attribute node, else the empty string.
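
The relationship between the tagged, keep and omit methods can be illustrated with a small Python sketch. It rests on assumptions that the source does not state: marks are modelled as non-overlapping, 0-based (offset, length) character ranges, and the keep method is assumed to return the marked parts together with their tags. The function name is hypothetical.

```python
# Sketch only: models the tagged / keep / omit methods on a plain string.

def tag_text(raw, marks, method="tagged", tagid="Marked"):
    """Wrap marked ranges of `raw` in <tagid> tags, then filter by method."""
    if method not in ("tagged", "keep", "omit"):
        raise ValueError("unsupported method: " + method)
    open_tag, close_tag = "<" + tagid + ">", "</" + tagid + ">"
    pieces = []
    pos = 0
    for offset, length in sorted(marks):
        unmarked = raw[pos:offset]
        marked = open_tag + raw[offset:offset + length] + close_tag
        if method in ("tagged", "omit"):
            pieces.append(unmarked)  # text outside any mark
        if method in ("tagged", "keep"):
            pieces.append(marked)    # text inside a mark, with its tags
        pos = offset + length
    if method in ("tagged", "omit"):
        pieces.append(raw[pos:])     # trailing unmarked text
    return "".join(pieces)

raw = "one two three"
marks = [(4, 3)]  # the word "two"
print(tag_text(raw, marks, "tagged"))
print(tag_text(raw, marks, "keep"))
print(tag_text(raw, marks, "omit"))
```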

TEXT_TO_INTEGER(text [, string])
Returns some property of the root of the input text as a 64-bit integer value. An optional method qualifies what value is desired. This may be any of:
default
Equivalent to specifying 'node'.
node
The node number of the root of the text is returned.
span
The number of nodes spanned by the root of the text is returned. This is 0 if the root of the text is also a leaf.
tonode
The last node number spanned by this text. This is equivalent to node+span.
truespan
The number of nodes actually spanned by the root node of the text. Note that the span associated with the root node of a text may differ from the true number of nodes spanned by this root node.
offset
The physical character offset within the text is returned.
length
The number of characters in the text spanned by the specified span.
truelength
The number of characters in the text spanned by the true span of the root node.
end
The physical character offset beyond the end of the text is returned.
parent
The parent of the root node is returned. Node number 1 represents the entire text. Node number 0 is returned for nodes that have no parent.
marks
Returns the number of marks in the text.
children
Returns the number of children beneath the root.
child
Returns the sibling number of the root with respect to the siblings under its parent. 0 is the leftmost sibling. For a parent with 'n' children, sibling number 'n-1' is the rightmost sibling under this parent.
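
Some of these values are related to each other. The Python sketch below models a few of them on a hypothetical record; only the identity tonode = node + span and the default method 'node' come from the descriptions above. Treating 'end' as offset + length is an assumption made for illustration.

```python
# Sketch only: a hypothetical record holding the root properties of a text.
from dataclasses import dataclass

@dataclass
class RootInfo:
    node: int    # node number of the root
    span: int    # number of nodes spanned (0 for a leaf)
    offset: int  # physical character offset of the text
    length: int  # number of characters spanned

    @property
    def tonode(self):
        return self.node + self.span      # stated in the description above

    @property
    def end(self):
        return self.offset + self.length  # assumption, not confirmed

METHODS = ("node", "span", "tonode", "offset", "length", "end")

def text_to_integer(info, method="default"):
    """Look up one integer property; 'default' is equivalent to 'node'."""
    if method == "default":
        method = "node"
    if method not in METHODS:
        raise ValueError("unsupported method: " + method)
    return getattr(info, method)

root = RootInfo(node=5, span=3, offset=100, length=40)
print(text_to_integer(root))
print(text_to_integer(root, "tonode"))
```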

TEXT_TO_DOUBLE(text [, string])
Returns some property of the marks in the input text as a double-precision floating-point value. An optional method qualifies what value is desired. This may be any of:
default
Equivalent to specifying 'sum'.
sum
The sum of the marks when interpreted as numeric values.
max
The maximum value of any mark when the text spanned by this mark is interpreted as a numeric value.
min
The minimum value associated with any mark in the text.
avg
The average of the values associated with the marks in the text.
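
The four methods are simple aggregates over the marks once each mark's text has been read as a number. A minimal Python sketch, assuming the text spanned by each mark is available as a string; the function name and the None result for an empty mark list are assumptions.

```python
# Sketch only: aggregates the numeric values of a list of mark strings.

def text_to_double(mark_strings, method="default"):
    """Aggregate marks interpreted as numbers; 'default' means 'sum'."""
    values = [float(s) for s in mark_strings]
    if method in ("default", "sum"):
        return float(sum(values))
    if not values:
        return None  # assumption: no marks means no max, min or avg
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    if method == "avg":
        return sum(values) / len(values)
    raise ValueError("unsupported method: " + method)

marks = ["1.5", "2", "4"]
print(text_to_double(marks))         # sum
print(text_to_double(marks, "avg"))  # average
```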

TEXT_ROOT(text [, integer1 [, integer2]])
Returns the input text rooted at the indicated node integer1 (or at the start of the document if absent) and, if integer2 is present, having span integer2. A negative node is adjusted upwards to 1, with the span decremented accordingly. A node greater than the maximum node number is adjusted downwards.
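
The node and span adjustment can be sketched in Python as follows. This is a guess at the arithmetic, not the actual implementation: it assumes that any node below 1 is raised to 1 with the span reduced by the same amount, that an oversized node is lowered to the maximum node number, and that the span is trimmed so that it never runs past the maximum node. The function name and the max_node parameter are hypothetical.

```python
# Sketch only: clamps a requested (node, span) pair into the valid range.

def clamp_root(node, span, max_node):
    """Return an adjusted (node, span) lying within 1 .. max_node."""
    if node < 1:
        span -= 1 - node  # shrink the span by the distance moved upwards
        node = 1
    if node > max_node:
        node = max_node
    # Assumption: the span is never negative and never passes max_node.
    span = max(0, min(span, max_node - node))
    return node, span

print(clamp_root(-2, 10, 100))
print(clamp_root(150, 4, 100))
```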

ADD_MARKS(text, integer1, integer2, ...)
Adds the specified node/span pairs as marks to the input text, provided that these marks are subsumed by the root of the input text. Otherwise the marks are ignored.

TEXT_SECTION(text, integer1, integer2)
Returns as a string the section of text beginning at offset integer1 and having length integer2.

GRAMMAR_TO_STRING(text, string)
Casts the grammar associated with the input text as a string. An optional method qualifies how the grammar is to be converted into a string. This may be any of:
default
Equivalent to omitting the string. Some default method is chosen, consistent with the method employed.
type
Returns a string documenting the type of the grammar employed within this input text.
content
Returns a string documenting the grammar associated with the text.
dtdroot
The element name which defines the root of the DTD.
indexname
The name of the index file supporting access to this text.
encoding
The character encoding used on the text. This is one of:
  • ASCII
  • UNICODE FFFE (low endian)
  • UNICODE FEFF (high endian)
endian
The endian architecture that the index was built on. This is one of:
  • LOW
  • HIGH
  • DAMAGED (Index is corrupt)
sourcetype
The type of input used to construct the index. This is currently one of:
  • UNKNOWN
  • FILE
  • STRING
  • NETWORK
locale
The GUID of the class (supporting the interface IQsm2Locale) used to encode the text into words, numerics, etc.
sourcename
The name of the input source from which the index was constructed if known.
internal
Returns YES if the text is stored internally within the index. Returns NO if it is stored separately.

GRAMMAR_TO_INTEGER(text [, string])
Returns some property of the grammar as a 64-bit integer value. An optional method qualifies what value is desired. This may be any of:
default
Equivalent to specifying 'sot'.
sot
Returns the start of text character offset.
eot
Returns the number of characters in the text.
maxnode
Returns the maximum node number used within the original text.
version
The version of the indexer used to index the text.
elements
The number of elements in the grammar.
attributes
The number of attributes in the grammar.
blksize
The blocking factor used to index the text and grammar.
indexsize
The number of bytes in the index.
nodesperblock
The number of node descriptions per block.
filetime
The filetime when the index was created.

Cast functions

CAST(text as string)
This operation uses the default TEXT_TO_STRING method to convert the text to a string.

CAST(text as integer)
This operation attempts to cast the contents of the text as an integer. A null is returned rather than an exception if this is not possible. If a run-time exception is desired, cast the text to a string and then the string to an integer.

CAST(text as real)
This operation attempts to cast the contents of the text as a real. A null is returned rather than an exception if this is not possible. If a run-time exception is desired, cast the text to a string and then the string to a real.

CAST(text as date)
This operation attempts to cast the contents of the text as a date. A null is returned rather than an exception if this is not possible. If a run-time exception is desired, cast the text to a string and then the string to a date.

CAST(text as time)
This operation attempts to cast the contents of the text as a time. A null is returned rather than an exception if this is not possible. If a run-time exception is desired, cast the text to a string and then the string to a time.

CAST(text as timestamp)
This operation attempts to cast the contents of the text as a timestamp. A null is returned rather than an exception if this is not possible. If a run-time exception is desired, cast the text to a string and then the string to a timestamp.
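
The difference between the two casting routes (null on failure versus an exception) can be illustrated in Python for the integer case. Both function names are hypothetical, and Python's int() stands in for whatever parsing rules TextServer actually applies.

```python
# Sketch only: contrasts a lenient cast with a strict cast.
from typing import Optional

def cast_text_as_integer(content: str) -> Optional[int]:
    """Lenient route: mirrors CAST(text as integer), None when not possible."""
    try:
        return int(content.strip())
    except ValueError:
        return None

def cast_string_as_integer(content: str) -> int:
    """Strict route: mirrors casting a string, raising when not possible."""
    return int(content.strip())

print(cast_text_as_integer("42"))
print(cast_text_as_integer("forty-two"))
```

Casting the text to a string first and then calling the strict route corresponds to the two-step cast recommended above when a run-time exception is wanted.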

Functions that Operate on Strings

TRANSLATE_STRING(string1, string2)
The XML string contained in string1 is translated into a new string according to the rules specified in string2, and this new string is returned.

PLAN_TO_STRING(string, query)
The plan produced by the query is converted into a string, according to the method documented in string. Note that string is specified before the query since this simplifies parsing of the construct. Method may be any of:
default
Some default method is chosen.
all
Returns a string documenting the entire plan.
tables
Returns a string documenting the tables used within the plan.

DOCUMENT(string1 [, string2])
Casts the input string1 as an object of type text, using the text indexer. String2, if present, provides runtime instructions to the indexer, as would be provided on the command line. If the file() function wraps string1, then the string is taken to be the name of a file or a URL whose contents are to be cast as a text.

STRING_TO_STRING(string1, string2)
Converts string1 according to the method specified in string2. This may be one of:
XML
Encode the special XML markup characters.
Specifically maps (1) '>' to '&gt;' (2) '<' to '&lt;' (3) '&' to '&amp;' (4) '#' to '&#35;' (5) " to '&quot;' and (6) ' to '&apos;'
CGI
Encode the special CGI markup characters. This is the appropriate translation to apply to XML encoded data.
Specifically maps (1) whitespace (cr, nl, blank, tab) to '+' (2) '%' to '%25' (3) '+' to '%2B' (4) '&' to '%26' and (5) '?' to '%3F'.
For safety (although these symbols are not expected to be seen in the input string) also maps " to '%22', ' to '%27', '<' to '%3C' and '>' to '%3E'.
URI
First encode the string according to XML.
Then encode the resulting string using CGI. This is the appropriate CGI translation to perform when embedding plain text in a CGI command line.
Unquote
Remove any leading/trailing single or double quote from the input string.
Noblanklines
Removes all the blank lines from the input string.
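
The character mappings listed above can be written down as translation tables. The Python sketch below is an illustration only: it assumes the apostrophe maps to the standard XML entity &apos;, that method names are matched case-insensitively, that Unquote removes at most one quote from each end, and that a line containing only whitespace counts as blank.

```python
# Sketch only: models the STRING_TO_STRING methods with translation tables.
# str.translate works in a single pass, so characters introduced by a
# replacement (such as the '&' in '&lt;') are not encoded a second time.

XML_MAP = str.maketrans({
    ">": "&gt;", "<": "&lt;", "&": "&amp;",
    "#": "&#35;", '"': "&quot;", "'": "&apos;",
})

CGI_MAP = str.maketrans({
    "\r": "+", "\n": "+", " ": "+", "\t": "+",
    "%": "%25", "+": "%2B", "&": "%26", "?": "%3F",
    '"': "%22", "'": "%27", "<": "%3C", ">": "%3E",
})

def string_to_string(s, method):
    m = method.lower()  # assumption: method names are case-insensitive
    if m == "xml":
        return s.translate(XML_MAP)
    if m == "cgi":
        return s.translate(CGI_MAP)
    if m == "uri":
        # XML encoding first, then CGI encoding of the result.
        return s.translate(XML_MAP).translate(CGI_MAP)
    if m == "unquote":
        if s[:1] in ("'", '"'):
            s = s[1:]
        if s[-1:] in ("'", '"'):
            s = s[:-1]
        return s
    if m == "noblanklines":
        lines = s.splitlines(keepends=True)
        return "".join(line for line in lines if line.strip())
    raise ValueError("unsupported method: " + method)

print(string_to_string('a<b & "c"', "XML"))
print(string_to_string("a b+c?", "CGI"))
print(string_to_string("<a>", "URI"))
```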

XSLT(string1, string2)
Performs the desired XSLT transformation on the XML in string1, using the XSLT rule file string2, and returns the result. If the file() function wraps either input string, then that string is presented to XSLT as the name of a file to be opened, rather than as a string to be processed. Providing XSLT with filenames avoids the need to construct memory-resident images of the file contents in huge strings. For this function to work, MSXML 4.0 (available as a free download from Microsoft) must first be installed.

Other Functions

FILE_TO_TEXT(string)
Assumes that the filename provided in the input string names a text index constructed using tokenize, and returns the internal representation of this text value. For security reasons, this function returns null if the input string does not have a file extension listed in the value of the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Dealers Choice Software\TextServer\FileTypes\Text.
(See also TEXT_FILES.)
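
The extension check that guards FILE_TO_TEXT and FILE_TO_STRING can be sketched in Python. The format of the registry value is not documented here, so the sketch simply takes the permitted extensions as a set; the function names, the case-insensitive comparison and the read_file callback are all assumptions.

```python
# Sketch only: refuses to read files whose extension is not on an allow-list.
import os

def extension_allowed(filename, allowed_extensions):
    """True when the file extension appears in the allow-list."""
    ext = os.path.splitext(filename)[1].lower()
    return ext != "" and ext in {e.lower() for e in allowed_extensions}

def file_to_string(filename, allowed_extensions, read_file):
    """Return the file contents, or None when the extension is refused."""
    if not extension_allowed(filename, allowed_extensions):
        return None
    return read_file(filename)

allowed = {".xml", ".txt"}  # hypothetical allow-list
print(extension_allowed("report.XML", allowed))
print(file_to_string("secret.exe", allowed, lambda name: "data"))
```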

FILE_TO_STRING(string)
Assumes that the file named by the input string contains an ASCII or Unicode string, and returns the contents of that file as a string. This function will fail if the string exceeds the maximum string length allowed. For security reasons, this function returns null if the input string does not have a file extension listed in the value of the registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Dealers Choice Software\TextServer\FileTypes\String.
(See also STRING_FILES.)

FILE(string)
This function merely returns a copy of its input. The resulting string has a special attribute set indicating that where possible it is to be considered to represent a filename. At present this special attribute is employed only by xslt() and document() functions.
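
One way to picture a string that carries a "treat me as a filename" attribute is a string subclass. The Python sketch below is an analogy only, not a description of how TextServer stores the attribute; FileName, file and load_input are hypothetical names.

```python
# Sketch only: a string copy that is flagged as representing a filename.

class FileName(str):
    """A string whose value should, where possible, be read as a filename."""

def file(s):
    """Return a copy of the input carrying the filename flag."""
    return FileName(s)

def load_input(arg, read_file):
    """Consumer side: open flagged strings, use plain strings directly."""
    if isinstance(arg, FileName):
        return read_file(str(arg))
    return arg

reader = lambda name: "contents of " + name  # stand-in for real file access
print(load_input("<doc/>", reader))
print(load_input(file("rules.xsl"), reader))
```

A function that does not know about the flag still sees an ordinary string, which matches the description that FILE() merely returns a copy of its input.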

Maintainer
webmaster@textserver.com