BigOWLIM Advanced Features


OWLIM 3.5 (this version)

Full-Text Indexing and Search

Full-text search (FTS) concerns retrieving text documents out of a large collection by keywords or, more generally, by tokens (represented as sequences of characters). Formally, the query represents an unordered set of tokens and the result is a set of documents relevant to the query. In a simple FTS implementation, relevance is Boolean: a document is either relevant to the query, when it contains all the query tokens, or not. More advanced FTS implementations deal with a degree of relevance of the document to the query, usually judged by some measure of the frequency of appearance of each of the tokens in the document, normalized against the frequency of their appearance in the entire document collection. Such implementations return an ordered list of documents, where the most relevant documents come first.
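The difference between Boolean and graded relevance can be illustrated with a small Python sketch. It is not part of OWLIM: the toy collection, the whitespace tokenizer and the particular TF-IDF weighting are assumptions chosen only to make the two result forms concrete.

```python
import math
from collections import Counter

# Hypothetical toy collection; the ids and texts are illustrative only.
DOCS = {
    "d1": "the president of the united states",
    "d2": "united states united nations",
    "d3": "the states of matter",
}


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def boolean_search(query_tokens, docs):
    """Boolean relevance: a document matches only if it contains every query token."""
    wanted = set(query_tokens)
    return {doc_id for doc_id, text in docs.items() if wanted <= set(tokenize(text))}


def ranked_search(query_tokens, docs):
    """Graded relevance: sum over query tokens of term frequency times inverse document frequency."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for tok in query_tokens:
            # Number of documents in the collection that contain the token.
            df = sum(1 for toks in tokenized.values() if tok in toks)
            if df == 0 or counts[tok] == 0:
                continue
            tf = counts[tok] / len(tokens)
            idf = math.log(1 + n_docs / df)
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    # Most relevant documents first.
    return sorted(scores, key=scores.get, reverse=True)
```

With the query tokens "united" and "states", the Boolean search returns an unordered set containing only the documents that hold both tokens, whereas the ranked search returns an ordered list that also includes partially matching documents further down.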
FTS and structured queries, like those in database management systems (DBMS), are different information access methods based on a different query syntax and semantics, where the results are also displayed in a different form. FTS and databases usually require different types of indices too. The ability to combine these two types of information access methods is very useful for a wide range of applications. Many relational DBMS support some sort of FTS (which is integrated into the SQL syntax) and maintain additional indices that allow efficient evaluation of FTS constraints. Typically, relational DBMS allow the user to define a query, which requires specific tokens to appear in a specific column of a specific table. In SPARQL there is no standard way for the specification of FTS constraints. In general, there is neither a well defined nor widely accepted concept for FTS in RDF data. Nevertheless, some semantic repository vendors offer some sort of FTS in their engines. This section documents the FTS supported by BigOWLIM.
Two approaches are implemented in BigOWLIM: a proprietary implementation called 'Node Search' and a Lucene-based implementation called 'RDF Search'. The two approaches are collectively referred to in this guide as 'full-text indexing'. Both enable OWLIM to perform complex queries against character data, which significantly speeds up the query process. To select one of them, consider their functional differences, which are outlined in the table below. Furthermore, there can be considerable differences between the indexing and search speed of the two FTS implementations. Performance-conscious users are therefore advised to experiment with both methods using a dataset and queries that are representative of the intended application.

| | Node Search | RDF Search |
|---|---|---|
| FTS query form | List of tokens | List of tokens (with Lucene query extensions) |
| Result form | Unordered set of nodes | Ordered list of URIs |
| Textual representation | For literals: the string value. For URIs and B-nodes: tokenized URL | Concatenation of the text representations of the nodes from the molecule (1-step neighbourhood in the graph) of the URI |
| Relevance | Boolean, based on presence of the query tokens in the text | Vector-space model, reflecting the degree of relevance of the text and the RDF rank of the URI |
| Implementation | Proprietary full-text indexing and search implementation | The Lucene engine is integrated and used for indexing and search |


Node Search (with the parameter ftsLiteralsOnly set to true) resembles typical FTS implementations in relational DBMS. RDF Search, by contrast, is a novel information retrieval concept that allows for efficient extraction of RDF resources from huge datasets, where ordering of the results by relevance is crucial.

Node Search – Proprietary Full-Text Search

The parameters for OWLIM's full-text index control when/if the index is to be created, the index cache size, and whether literals only or all types of nodes should be indexed. See the parameters ftsIndexPolicy, fts-memory and ftsLiteralsOnly in the configuration section.
The following example configures the database engine to create a 20 megabyte cache for the full-text index on start up that indexes all literals and URIs:

owlim:ftsIndexPolicy "onStartup" ;
owlim:fts-memory "20m" ;
owlim:ftsLiteralsOnly "false"

Full-text search patterns are embedded in SPARQL and SeRQL queries by adding extra statement patterns that use special system predicates:

<String:> <Algorithm predicate> <Binding> .

Each of the elements of this triple is explained below:

  • <String:> the search string - a list of tokens separated by colons ':', whose use is determined by the choice of predicate, see below;
  • <Algorithm predicate> specifies the search method, i.e. how the tokens in the search string are to be used, see below;
  • <Binding> the variable containing the result, i.e. the values (URIs or literals) that match with the given search string and method.
| Predicate | Description |
|---|---|
| fts:exactMatch | Matches literals that contain all tokens, considering the case. For example, searching for <United:States> will match "The president of the United States", but not "United Statesless", "united states" or "notUnited notStates". |
| fts:matchIgnoreCase | Similar to the above but ignores case. <United:States> will match "The president of the United States" and "united states", but not "United Statesless" or "notUnited notStates". |
| fts:prefixMatch | Matches tokens that begin with the given search tokens, considering the case. For example, <United:States> will match "The president of the United States" and "United Statesless", but not "notUnited notStates" or "united states". |
| fts:prefixMatchIgnoreCase | Similar to the above but ignores case. For example, <United:States> will match "The president of the United States", "United Statesless" and "united states", but not "notUnited notStates". |

The namespace prefix fts: in the above table is <http://www.ontotext.com/owlim/fts#>
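The behaviour of the four predicates can be mimicked outside the repository with a short Python sketch. This is not OWLIM's implementation: the tokenizer (runs of ASCII letters and digits) and the function names are assumptions, chosen so that the sketch reproduces the examples given in the table above.

```python
import re


def tokenize(text):
    """Split text into alphanumeric tokens (an assumed tokenizer, not OWLIM's)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def parse_search_string(search):
    """Turn a colon-separated search string such as 'United:States' into tokens."""
    return [tok for tok in search.split(":") if tok]


def node_match(search, text, prefix=False, ignore_case=False):
    """Return True if every search token matches some token of the text.

    prefix=False, ignore_case=False  ~ fts:exactMatch
    prefix=False, ignore_case=True   ~ fts:matchIgnoreCase
    prefix=True,  ignore_case=False  ~ fts:prefixMatch
    prefix=True,  ignore_case=True   ~ fts:prefixMatchIgnoreCase
    """
    query = parse_search_string(search)
    tokens = tokenize(text)
    if ignore_case:
        query = [q.lower() for q in query]
        tokens = [t.lower() for t in tokens]
    if prefix:
        return all(any(t.startswith(q) for t in tokens) for q in query)
    return all(q in tokens for q in query)
```

For example, node_match("United:States", "United Statesless") is False, while the same call with prefix=True is True, because "Statesless" begins with "States".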
There follow some query examples for Node search in SPARQL and SeRQL:

  • Example 1: Get all values that contain a token that matches exactly with 'abstract'
    SPARQL query:
    PREFIX fts: <http://www.ontotext.com/owlim/fts#>
    SELECT ?label
    WHERE { <abstract:> fts:exactMatch ?label . }
    

    SeRQL query:

    SELECT L
    FROM {<abstract:abstract>} fts:exactMatch {L}
    USING NAMESPACE
    fts = <http://www.ontotext.com/owlim/fts#>
    

    Note that in SeRQL, abstract: is not a valid URI, so abstract:abstract is used instead, which works the same and also conforms with what the parser expects.

  • Example 2: Get all values that contain both tokens 'Remorselessness' and 'books' using case-insensitive search (SPARQL):
    PREFIX fts: <http://www.ontotext.com/owlim/fts#>
    SELECT ?label
    WHERE { <Remorselessness:books> fts:matchIgnoreCase ?label. }
    

    The corresponding SeRQL query is omitted due to its similarity with the above SPARQL query.

  • Example 3: Find everything that has a label that starts with "3d" regardless of the language or the case (SPARQL):
    PREFIX fts: <http://www.ontotext.com/owlim/fts#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        ?X rdfs:label ?label .
        <3d:> fts:prefixMatchIgnoreCase ?label. }
    

    This query cannot be expressed in SeRQL using full-text search predicates, because the SeRQL parser won't accept a URI starting with a digit.
    The above example is hard to formulate without a full-text search capability. For example, the trivial query below won't match an entry with the label "3d"@en, because a literal that carries a language tag is a different RDF term from the plain literal "3d", so the two never compare as equal in a statement pattern.

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?x
    WHERE { ?x rdfs:label "3d" }
    

RDF Search - Full-Text Search using Lucene

Apache Lucene is a high-performance, full-featured text search engine written entirely in Java. BigOWLIM supports full text search capabilities using Lucene with a variety of indexing options and the ability to simultaneously use multiple, differently configured indices in the same query.
In order to use Lucene full-text search in BigOWLIM a Lucene index must first be computed. Before being created, each index can be parameterised in a number of ways using SPARQL ASK queries. This provides the ability to:

  • select what kinds of nodes are indexed (URIs/literals/blank-nodes)
  • select what is included in the 'molecule' (explained below) associated with each node
  • select literals with certain language tags
  • choose the size of the RDF 'molecule' to index
  • choose whether to boost the relevance of nodes using RDF Rank values
  • select alternative analysers
  • select alternative scorers

In order to use the indexing behaviour of Lucene, a text document must be created for each node in the RDF graph to be indexed. This text document is called the 'RDF molecule' and is made up of other nodes reachable via the predicates that connect nodes to each other. Once a molecule has been created for each node, Lucene creates an index over these molecules. During search (query answering), Lucene identifies the matching molecules and OWLIM uses the associated nodes as variable substitutions when evaluating the enclosing SPARQL query.
The scope of an RDF molecule includes the starting node and its neighbouring nodes that are reachable via the specified number of predicate arcs. What type of nodes are indexed and what type of nodes are included in the molecule can be specified for each Lucene index. Furthermore, the size of the molecule can be controlled by specifying the number of allowed traversals of predicate arcs starting from the molecule centre (the node being indexed). Note that blank nodes themselves are never included in the molecule. If a blank node is encountered the search is extended via any predicate to the next nearest entity and so on. Therefore even when the molecule size is 1, entities reachable via several intermediate predicates can still be included in the molecule if all the intermediate entities are blank nodes.
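The way blank nodes extend a molecule can be sketched in Python as follows. This is an illustrative model only, not the OWLIM algorithm: it assumes that only outgoing arcs are followed, that blank nodes are written with a '_:' prefix, and that the centre node itself is left out (in OWLIM that is governed by the include parameter described below). The function and node names are hypothetical.

```python
from collections import defaultdict


def is_blank(node):
    """Blank nodes are written with a '_:' prefix in this sketch."""
    return node.startswith("_:")


def build_molecule(triples, centre, size):
    """Collect the non-blank nodes reachable from `centre` within `size` hops.

    Assumptions (not confirmed by the documentation): only outgoing arcs are
    followed, and passing through a blank node does not consume a hop.
    """
    outgoing = defaultdict(list)
    for s, _p, o in triples:
        outgoing[s].append(o)

    molecule = set()
    visited = {centre}
    frontier = [centre]
    for _hop in range(size):
        next_frontier = []
        stack = list(frontier)
        while stack:
            node = stack.pop()
            for neighbour in outgoing[node]:
                if neighbour in visited:
                    continue
                visited.add(neighbour)
                if is_blank(neighbour):
                    # Blank nodes are never included; keep expanding through them.
                    stack.append(neighbour)
                else:
                    molecule.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return molecule


# Hypothetical data: a person whose address is modelled as a blank node.
TRIPLES = [
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:alice", "ex:address", "_:b1"),
    ("_:b1", "ex:city", '"Sofia"'),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", '"Bob"'),
]
```

With a molecule size of 1, the literal "Sofia" is part of the molecule of ex:alice even though it is two arcs away, because the only intermediate node is blank. The literal "Bob" is only reached with a molecule size of 2.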
The parameters are described in more detail as follows:

Parameter Exclude
Predicate http://www.ontotext.com/owlim/lucene#exclude
Description Provides a regular expression to identify nodes that will be excluded from the molecule. Note that the regular expression will be applied case-sensitively to literals and URI local names.
The example given below will cause matching URIs (e.g. <http://example.com/uri#helloWorld> ) and literals (e.g. "hello world!") not to be included.
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:exclude luc:setParam "hello.*" }


Parameter Exclude entities
Predicate http://www.ontotext.com/owlim/lucene#excludeEntities
Description A comma/semi-colon/white-space separated list of entities that will NOT be included in an RDF molecule.
The example below will include any URI in a molecule, except the two listed.
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:excludeEntities luc:setParam "http://www.w3.org/2000/01/rdf-schema#Class http://www.example.com/dummy#E1" }


Parameter Exclude predicates
Predicate http://www.ontotext.com/owlim/lucene#excludePredicates
Description A comma/semi-colon/white-space separated list of properties that will NOT be traversed in order to build an RDF molecule.
The example below will prevent any entities being added to an RDF molecule if they can only be reached via the two given properties.
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:excludePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#subClassOf http://www.example.com/dummy#p1" }


Parameter Include
Predicate http://www.ontotext.com/owlim/lucene#include
Description Indicates what kinds of nodes are to be included in the molecule. The value can be a list of values from: uri, literal, centre (the plural forms are also allowed: uris, literals, centres). The value of centre causes the node for which the molecule is built to be added to the molecule (provided it is not a blank node). This can be useful, for example, when indexing URI nodes with molecules that contain only literals, but the local part of the URI should also be searchable.
Default "literals"
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:include luc:setParam "literal uri" . }


Parameter Include entities
Predicate http://www.ontotext.com/owlim/lucene#includeEntities
Description A comma/semi-colon/white-space separated list of entities that can be included in an RDF molecule.
Any other entities will be ignored. The example below will build molecules that only contain the two entities.
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:includeEntities luc:setParam "http://www.w3.org/2000/01/rdf-schema#Class http://www.example.com/dummy#E1" }


Parameter Include predicates
Predicate http://www.ontotext.com/owlim/lucene#includePredicates
Description A comma/semi-colon/white-space separated list of properties that can be traversed in order to build an RDF molecule.
The example below will allow any entities to be added to an RDF molecule, but only if they can be reached via the two given properties.
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#subClassOf http://www.example.com/dummy#p1" }


Parameter Index
Predicate http://www.ontotext.com/owlim/lucene#index
Description Indicates what kinds of nodes are to be indexed. The value can be a list of values from: uri, literal, bnode (the plural forms are also allowed: uris, literals, bnodes).
Default "literals"
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:index luc:setParam "literals, bnodes" . }


Parameter Language(s)
Predicate http://www.ontotext.com/owlim/lucene#languages
Description A comma separated list of language tags. Only literals with the indicated language tags will be included in the index. To include literals that have no language tag, use the special value 'none'.
Default "" (which is used to indicate that literals with any language tag are used, including those with no language tag)
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:languages luc:setParam "en,fr,none" . }


Parameter Molecule size
Predicate http://www.ontotext.com/owlim/lucene#moleculeSize
Description Set the size of the molecule associated with each entity. A value of zero indicates that only the entity itself should be indexed. A value of 1 indicates that the molecule will contain all entities reachable by a single 'hop' via any predicate (predicates not included in the molecule). Note that blank nodes themselves are never included in the molecule. If a blank node is encountered the search is extended via any predicate to the next nearest entity and so on. Therefore even when the molecule size is 1, entities reachable via several intermediate predicates can still be included in the molecule if all the intermediate entities are blank nodes. Molecule sizes of 2 and upwards are allowed, but with large datasets it can take a very long time to create the index.
Default 0
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:moleculeSize luc:setParam "1" . }


Parameter Use RDF rank
Predicate http://www.ontotext.com/owlim/lucene#useRDFRank
Description Indicates whether the RDF weights (if they have been computed already) associated with each entity should be used as boosting factors when computing the relevance to a given Lucene query. Allowable values are 'no', 'yes' and 'squared'. This last value indicates that the square of the RDF Rank value is to be used.
Default "no"
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:useRDFRank luc:setParam "yes" . }


Parameter Set alternative analyser
Predicate http://www.ontotext.com/owlim/lucene#analyzer
Description Used to set an alternative analyser for processing text to produce terms to index. By default, this parameter has no value and the default analyser used is:
org.apache.lucene.analysis.standard.StandardAnalyzer
An alternative analyser must be derived from:
org.apache.lucene.analysis.Analyzer
To use an alternative analyser, use this parameter to identify the name of a Java factory class that can instantiate it. The factory class must be available on the Java virtual machine's classpath and must implement this interface:
com.ontotext.trree.plugin.lucene.AnalyzerFactory
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:analyzer luc:setParam "com.ex.MyAnalyserFactory" . }


Parameter Set alternative scorer
Predicate http://www.ontotext.com/owlim/lucene#scorer
Description Used to set an alternative scorer that provides boosting values that adjust the relevance (and hence the ordering) of results to a Lucene query. By default, this parameter has no value and no additional scoring takes place. However, if the useRDFRank parameter is set to 'yes' or 'squared', then the RDF Rank scores are used (see the RDF Rank section).
An alternative scorer must implement this interface:
com.ontotext.trree.plugin.Scorer
In order to use an alternative scorer, use this parameter to identify the name of a Java factory class that can instantiate it. The factory class must be available on the Java virtual machine's classpath and must implement this interface:
com.ontotext.trree.plugin.ScorerFactory
Default <none>
Example PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:scorer luc:setParam "com.ex.MyScorerFactory" . }


Once the parameters for an index have been set, the index is created and named using a SPARQL ASK query of this form, where the index name appears as the subject in the query statement pattern:

PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
ASK { luc:myIndex luc:createIndex "true" . }

The index name must have the http://www.ontotext.com/owlim/lucene# namespace and the local part can contain only alphanumeric characters and underscores.
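The naming rule can be checked on the client side before the ASK query is sent. The Python helper below is a hypothetical convenience, not part of OWLIM, and it assumes that 'alphanumeric' means ASCII letters and digits.

```python
import re

LUCENE_NS = "http://www.ontotext.com/owlim/lucene#"

# Assumption: 'alphanumeric' is taken to mean ASCII letters and digits.
_LOCAL_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_index_name(uri):
    """Check the namespace and the local part of a candidate index name."""
    if not uri.startswith(LUCENE_NS):
        return False
    local_part = uri[len(LUCENE_NS):]
    return _LOCAL_NAME.fullmatch(local_part) is not None
```

For example, the name luc:myIndex used above expands to http://www.ontotext.com/owlim/lucene#myIndex and passes the check, whereas a local part such as my-index does not.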
Creating an index can take some time, although usually no more than a few minutes when the molecule size is 1 or less. During this process, for each node in the repository its surrounding molecule is computed. Then each such molecule is converted into a single string document (by concatenating the textual representation of all the nodes in the molecule) and this document is indexed by Lucene. If RDF Rank weights are used (or an alternative scorer is specified) then the computed values are stored in Lucene's index as a boosting factor that will later on influence the selection order.
To use a custom Lucene index in a SPARQL query, use the index's name as the predicate in a statement pattern, with the Lucene query as the object. The full Lucene query vocabulary can be used.

The following query will produce bindings for ?s from entities in the repository, where the RDF molecule associated with that entity (for the given index) contains terms that begin with "United". Furthermore, the bindings will be ordered by relevance (with any boosting factor):

PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
SELECT ?s
WHERE { ?s luc:myIndex "United*" . }

The Lucene score for a bound entity for a particular query can be exposed using a special predicate:

http://www.ontotext.com/owlim/lucene#score

This can be useful when Lucene query results should be ordered in a manner based on, but different from, the original Lucene score. For example, the following query will order results by a combination of the Lucene score and some ontology-defined importance value:

PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
PREFIX ex: <http://www.example.com/myontology#>
SELECT * {
  ?node luc:myIndex "lucene query string" .
  ?node ex:importance ?importance .
  ?node luc:score ?score .
} ORDER BY ( ?score + ?importance )

The luc:score predicate will work only on bound variables. There is no problem disambiguating multiple indices because each variable will be bound from exactly one Lucene index and hence its score.

The combination of ranking RDF molecules together with full-text search provides a powerful mechanism for querying/analysing datasets even when the schema is not known. This allows for keyword-based search over both literals and URIs with the results ordered by importance/interconnectedness. For an example of this kind of 'RDF Search', see FactForge.

Geo-spatial Extensions

BigOWLIM has special support for 2-dimensional geo-spatial data that uses the WGS84 Geo Positioning RDF vocabulary (World Geodetic System 1984). Special indices can be used for this data that permit the efficient evaluation of special query forms and extension functions that allow:

  • locations to be found that are within a certain distance of a point, i.e. within the specified circle on the surface of the sphere (Earth), using the nearby(...) construction;
  • locations that are within rectangles and polygons, where the vertices are defined using spherical polar coordinates, using the within(...) construction.

The WGS84 ontology can be found at: http://www.w3.org/2003/01/geo/wgs84_pos and contains several classes and predicates:

| Element | Description |
|---|---|
| SpatialThing | Class used to represent anything with spatial extent, i.e. size, shape or position. |
| Point | Class used to represent a point (relative to Earth) defined using latitude, longitude (and altitude). subClassOf: http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing |
| location | The relation between a thing and where it is. range: SpatialThing; subPropertyOf: http://xmlns.com/foaf/0.1/based_near |
| lat | The WGS84 latitude of a SpatialThing (decimal degrees). domain: http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing |
| long | The WGS84 longitude of a SpatialThing (decimal degrees). domain: http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing |
| lat_long | A comma-separated representation of a latitude, longitude coordinate. |
| alt | The WGS84 altitude of a SpatialThing (decimal meters above the local reference ellipsoid). domain: http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing |

Before the geo-spatial extensions can be used, the geo-spatial index must be built. This is achieved using a special predicate as follows:

PREFIX ontogeo: <http://www.ontotext.com/owlim/geo#>
ASK { _:b1 ontogeo:createIndex _:b2. }

If the indexing is successful, the above query will return true, false otherwise. Information about the indexing process and any errors can be found in the log.

Geo-spatial query syntax

The special syntax used to query geo-spatial data makes use of SPARQL's RDF Collections syntax. This syntax uses round brackets as a shorthand for the statements connecting a list of values using rdf:first and rdf:rest predicates with terminating rdf:nil. Statement patterns that use one of the special geo-spatial predicates supported by BigOWLIM are treated differently by the query engine. The following special syntax is supported when evaluating SPARQL queries (the descriptions all use the namespace omgeo: <http://www.ontotext.com/owlim/geo#>):

Construct Nearby (lat long distance)
Syntax ?point omgeo:nearby(?lat ?long ?distance)
Description This statement pattern will evaluate to true if the following constraints hold:
  • ?point geo:lat ?plat .
  • ?point geo:long ?plong .
  • Shortest great circle distance from (?plat, ?plong) to (?lat, ?long) <= ?distance

    Such a construction will use the geo-spatial indices to find bindings for ?point that lie within the defined circle.
    Constants are allowed for any of ?lat ?long ?distance, where latitude and longitude are specified in decimal degrees and distance is specified in either kilometres ('km' suffix) or miles ('mi' suffix). If the units are not specified, then 'km' is assumed.
Restrictions Latitude is limited to the range -90 (South) to +90 (North)
Longitude is limited to the range -180 (West) to +180 (East)
Examples Find the names of airports that are within 50 miles of Seoul:
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo: <http://www.ontotext.com/owlim/geo#>
SELECT distinct ?airport
WHERE {
  ?base geo-ont:name "Seoul" .
  ?base geo-pos:lat ?latBase .
  ?base geo-pos:long ?longBase .
  ?link omgeo:nearby(?latBase ?longBase "50mi") .
  ?link geo-ont:name ?airport  .
  ?link geo-ont:featureCode geo-ont:S.AIRP .
}



Construct Within (rectangle)
Syntax ?point omgeo:within(?lat1 ?long1 ?lat2 ?long2)
Description This statement pattern is used to test/find points that lie within the rectangle specified by diagonally opposite corners ?lat1 ?long1 and ?lat2 ?long2. The corners of the rectangle must be either constants or bound values.
It will evaluate to true if the following constraints hold:
  • ?point geo:lat ?plat .
  • ?point geo:long ?plong .
  • ?lat1 <= ?plat <= ?lat2
  • ?long1 <= ?plong <= ?long2

    Note that the corners must be specified most westerly and southerly (first) and most northerly and easterly (second). Proper account is taken for rectangles that cross the +/-180 degree meridian.
    Constants are allowed for any of ?lat1 ?long1 ?lat2 ?long2, where latitude and longitude are specified in decimal degrees. If ?point is unbound then bindings for all points within the rectangle will be produced.
Restrictions Latitude is limited to the range -90 (South) to +90 (North)
Longitude is limited to the range -180 (West) to +180 (East)
Rectangle vertices must be specified in the order lower-left followed by upper-right
Examples Find tunnels lying within a rectangle enclosing Tirol, Austria:
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo:   <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
  ?link omgeo:within(45.85 9.15 48.61 13.18) .
  ?link geo-ont:featureCode geo-ont:R.TNL .
  ?link geo-ont:name ?feature .
  ?link geo-pos:lat ?lat .
  ?link geo-pos:long ?long .
}


Construct Within (polygon)
Syntax ?point omgeo:within(?lat1 ?long1 ... ?latn ?longn)
Description This statement pattern is used to test/find points that lie within the polygon whose vertices are specified by three or more latitude/longitude pairs. The values for the vertices must be either constants or bound values.
It will evaluate to true if the following constraints hold:
  • ?point geo:lat ?plat .
  • ?point geo:long ?plong .
  • the position ?plat ?plong is enclosed by the polygon
    The polygon is closed automatically if the first and last vertices do not coincide. The vertices must be constants or bound values. Coordinates are specified in decimal degrees. If ?point is unbound then bindings for all points within the polygon will be produced.
Restrictions Latitude is limited to the range -90 (South) to +90 (North)
Longitude is limited to the range -180 (West) to +180 (East)
Examples Find caves in the sides of cliffs lying within a polygon approximating the shape of England:
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo:   <http://www.ontotext.com/owlim/geo#>
SELECT ?feature ?lat ?long
WHERE {
  ?link omgeo:within( "51.45" "-2.59"
                      "54.99" "-3.06"
                      "55.81" "-2.03"
                      "52.74"  "1.68"
                      "51.17"  "1.41" ) .
  ?link geo-ont:featureCode geo-ont:S.CAVE .
  ?link geo-ont:name ?feature .
  ?link geo-pos:lat ?lat .
  ?link geo-pos:long ?long .
}

Extension query functions

At present there is just one SPARQL extension function.

Function Distance function
Syntax double omgeo:distance(?lat1, ?long1, ?lat2, ?long2)
Description This SPARQL extension function computes the distance between two points in kilometres and can be used in FILTER and ORDER BY clauses.
Restrictions Latitude is limited to the range -90 (South) to +90 (North)
Longitude is limited to the range -180 (West) to +180 (East)
Examples Find all the airports within 80 miles of Bournemouth and filter out those that are more than 80 kilometres from Brize Norton, order the results with the closest to Brize Norton first:
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX geo-ont: <http://www.geonames.org/ontology#>
PREFIX omgeo:   <http://www.ontotext.com/owlim/geo#>

SELECT distinct ?airport_name
WHERE {
  ?a1 geo-ont:name "Bournemouth" .
  ?a1 geo-pos:lat ?lat1 .
  ?a1 geo-pos:long ?long1 .
  ?airport omgeo:nearby(?lat1 ?long1 "80mi" ) .
  ?airport geo-ont:name ?airport_name .
  ?airport geo-ont:featureCode geo-ont:S.AIRP .
  ?airport geo-pos:lat ?lat2 .
  ?airport geo-pos:long ?long2 .
  ?a2 geo-ont:name "Brize Norton" .
  ?a2 geo-pos:lat ?lat3 .
  ?a2 geo-pos:long ?long3 .
  FILTER( omgeo:distance(?lat2, ?long2, ?lat3, ?long3) < 80)
}
ORDER BY ASC( omgeo:distance(?lat2, ?long2, ?lat3, ?long3) )
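To sanity-check results returned by omgeo:nearby and omgeo:distance, the great circle distance can be recomputed outside the repository. The Python sketch below uses the haversine formula, which is an assumption (the documentation does not state which formula BigOWLIM uses), together with the spherical Earth radius of 6371.009 km given under Implementation details. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.009  # spherical Earth radius quoted in this documentation
KM_PER_MILE = 1.609344      # statute mile


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def parse_distance_km(value):
    """Convert '50mi', '80km' or a bare number (kilometres assumed) to kilometres."""
    text = str(value).strip()
    if text.endswith("mi"):
        return float(text[:-2]) * KM_PER_MILE
    if text.endswith("km"):
        return float(text[:-2])
    return float(text)


def nearby(plat, plon, lat, lon, distance):
    """Mimic the constraint tested by the nearby construct for a single point."""
    return great_circle_km(plat, plon, lat, lon) <= parse_distance_km(distance)
```

One degree of longitude on the equator comes out at roughly 111.2 km with this radius, so nearby(0, 0, 0, 1, "112km") holds while nearby(0, 0, 0, 1, "111km") does not.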

Implementation details

Knowledge of the implementation's algorithms and assumptions will allow users to make the best use of the BigOWLIM geo-spatial extensions. The following points are significant and can affect the expected behaviour during query answering:

  • Spherical Earth – the current implementation treats the Earth as a perfect sphere with a radius of 6371.009km;
  • Only 2-Dimensional points are supported, i.e. there is no special handling of geo:alt (metres above the reference surface of the Earth);
  • All latitude and longitude values must be specified using decimal degrees, where East and North are positive and -90 <= latitude <= +90 and -180 <= longitude <= +180;
  • Distances must be in units of kilometres (suffix 'km') or statute miles (suffix 'mi'). If the suffix is omitted, kilometres are assumed;
  • omgeo:within( rectangle ) construct uses a 'rectangle' whose edges are lines of latitude and longitude, so the north-south distance is constant and the rectangle described forms a band around the Earth that starts and stops at the given longitudes;
  • omgeo:within( polygon ) joins vertices with straight lines on a cylindrical projection of the Earth tangential to the equator. A straight line starting at the point under test and continuing East out of the polygon is examined to see how many polygon edges it intersects. If the number of intersections is even then the point is outside the polygon, if the number of intersections is odd, the point is inside the polygon. With the current algorithm, the order of vertices is not relevant (clockwise or anticlockwise);
  • omgeo:within() may not work correctly when the region (polygon or rectangle) spans the +/-180 meridian;
  • omgeo:nearby() uses the great circle distance between points.
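The even-odd rule described above for omgeo:within( polygon ) can be sketched in Python as follows. This is an illustration of the general ray-casting technique, not the BigOWLIM source: latitude and longitude are simply treated as planar coordinates, the function name is hypothetical, and no attempt is made to handle polygons that span the +/-180 meridian.

```python
def point_in_polygon(plat, plon, vertices):
    """Even-odd test: cast a ray eastwards from the point and count edge crossings.

    `vertices` is a list of (lat, long) pairs in decimal degrees. The polygon is
    closed automatically by joining the last vertex back to the first.
    """
    inside = False
    n = len(vertices)
    for i in range(n):
        lat_a, lon_a = vertices[i]
        lat_b, lon_b = vertices[(i + 1) % n]
        # Only edges that straddle the latitude of the test point can be crossed.
        if (lat_a > plat) != (lat_b > plat):
            # Longitude at which the edge crosses the test latitude.
            cross_lon = lon_a + (plat - lat_a) * (lon_b - lon_a) / (lat_b - lat_a)
            if cross_lon > plon:
                inside = not inside  # each crossing to the east flips the state
    return inside


# The polygon approximating England from the example query above.
ENGLAND = [
    (51.45, -2.59),
    (54.99, -3.06),
    (55.81, -2.03),
    (52.74, 1.68),
    (51.17, 1.41),
]
```

A point at latitude 53.0, longitude -1.0 crosses exactly one edge on its way east and is therefore inside, whereas a point at latitude 53.0, longitude -5.0 crosses two edges and is outside. Reversing the vertex order gives the same answers, in line with the note that vertex order is not relevant.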

RDF Rank

RDF Rank is an algorithm that identifies the more important or more popular entities in the repository by examining their interconnectedness. The popularity of entities can then be used to order query results in a similar way to internet search engines, such as how Google orders search results using PageRank (http://en.wikipedia.org/wiki/PageRank).
The RDF Rank component computes a numerical weighting for all the nodes in the entire RDF graph stored in the repository, including URIs, blank nodes and literals. The weights are floating point numbers with values between 0 and 1 that can be interpreted as a measure of a node's relevance/popularity.
Since the values range from 0 to 1, the weights can be used for sorting a result set (the lexicographical order works fine even if the rank literals are interpreted as plain strings). Here is an example SPARQL query that uses RDF rank for sorting results by their popularity:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX opencyc-en: <http://sw.opencyc.org/2008/06/10/concept/en/>
SELECT * WHERE {
  ?Person a opencyc-en:Entertainer .
  ?Person rank:hasRDFRank ?rank .
}
ORDER BY DESC(?rank) LIMIT 100

As seen in the example query, RDF Rank weights are made available via a special system predicate. Triple patterns with the predicate http://www.ontotext.com/owlim/RDFRank#hasRDFRank are handled specially by OWLIM, where the object of the statement pattern is bound to a literal containing the RDF Rank of the subject.
In order to use this mechanism the RDF ranks for the whole repository must be computed in advance. This is done by executing a series of SPARQL ASK queries to parameterise the weighting algorithm, followed by a query that triggers the computation itself.

Parameter Maximum iterations
Predicate http://www.ontotext.com/owlim/RDFRank#maxIterations
Description Sets the maximum number of iterations of the algorithm over all entities in the repository.
Default 20
Example PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { rank:maxIterations rank:setParam "16" . }


Parameter Epsilon
Predicate http://www.ontotext.com/owlim/RDFRank#epsilon
Description Used to terminate the weighting algorithm early when the total change of all RDF Rank scores has fallen below this value.
Default 0.01
Example PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { rank:epsilon rank:setParam "0.05" . }
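
How the two parameters interact can be illustrated with a PageRank-style iteration. The exact RDF Rank formula is not given in this documentation, so the following Python sketch is an assumption throughout: the function name, the damping factor of 0.85 and the handling of nodes without outgoing links are all hypothetical. Only the roles of the maximum iteration count and of epsilon (stop once the total change falls below it) follow the descriptions above.

```python
def rank_nodes(edges, max_iterations=20, epsilon=0.01, damping=0.85):
    """PageRank-style weighting over (source, target) pairs.

    Returns (rank, iterations_used). The damping factor is an assumption,
    not a documented RDF Rank parameter.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # Assumption: nodes without outgoing links spread evenly to all.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        total_change = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if total_change < epsilon:
            break  # early termination, the role played by epsilon
    return rank, iterations
```

A larger epsilon ends the loop sooner at the cost of less settled scores; the iteration cap bounds the run time when the scores converge slowly.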


To trigger the computation of the RDF Rank weights, use the following query:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:compute _:b2. }

The computed weights can be exported to an external file using a query of this form:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:export "/home/user1/rdf_ranks.txt" . }

The query will return true if the export was successful, false otherwise. If the export failed then an error message will be recorded in the log file.
Lastly, when using RDF Priming, the RDF Rank weights can be used as the initial activation values. To set this up, use the following query:

PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
ASK { _:b1 rank:ranksAsWeights _:b2 . }

RDF Priming

RDF Priming is a technique that selects a subset of available statements for use as the input to query answering. It is based upon the concept of 'spreading activation' as developed in cognitive science.
RDF Priming is a scalable and customizable implementation of the popular connectionist method on top of RDF graphs. It allows "priming" of large datasets with respect to concepts relevant to the context and to the query. It is implemented in the TRREE engine and controlled using SPARQL ASK queries. This section provides an overview of the mechanism and explains the necessary SPARQL queries used to manage and set up RDF Priming.

RDF Priming Configuration

To enable RDF Priming over the repository, the repository-type configuration parameter should be set to weighted-file-repository.
The current implementation of RDF Priming does not store activation values, which means that they are only available at runtime and are lost when the repository is shut down. However, they can be exported and imported using the special query directives shown below. Another side effect is that the activation values are global, because they are stored within the shared Entity pool.
The initialization and management of the RDF Priming module is achieved by performing SPARQL ASK queries.

Controlling RDF Priming

RDF Priming is controlled using SPARQL ASK queries, which allows all the parameters and default values to be set. These queries use special system predicates, which are described below:

Function Enable Activation Spreading
Predicate http://www.ontotext.com/owlim/RDFPriming#enableSpreading
Description Used to enable or disable the RDF Priming module. The Object value of the statement pattern should be a Literal whose value is either "true" or "false"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:enableSpreading "true".}


Function Set Activation Decay
Predicate http://www.ontotext.com/owlim/RDFPriming#decayActivations
Description Used to alter all the activation values for the nodes in the RDF graph by multiplying them by a factor specified as a Literal in the Object position of the Statement pattern of the query. The following example will reset all the activation values to zero by multiplying them by "0.0"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:decayActivations "0.0".}


Function Trigger Activation Spreading Cycle
Predicate http://www.ontotext.com/owlim/RDFPriming#spreadActivation
Description Used to trigger an Activation spreading cycle that starts from the nodes that were scheduled for activation for this round. No special values are required for the Subject or Object part of the statement pattern – blank nodes suffice
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK {_:b1 prim:spreadActivation _:b2.}


Function Set Statement Weight
Predicate http://www.ontotext.com/owlim/RDFPriming#assignWeight
Description Used to set a non-default weight factor for statements with a specific predicate. The Subject of the Statement pattern is the predicate to which the new value should be set. The Object of the pattern is the new weight value as a Literal. The example query sets 0.5 as a weight factor to all the rdfs:subClassOf statements
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { rdfs:subClassOf prim:assignWeight "0.5" . }


Function Schedule Nodes for Activation
Predicate http://www.ontotext.com/owlim/RDFPriming#activateNode
Description Used to schedule the nodes specified as Subject or Object of the statement pattern for activation. Scheduling for activation can also be performed by evaluating an ASK query with variables in the body, in which case the nodes bound to the variables used in the query will be scheduled for activation. The behaviour of such an ASK query is altered, so that all the solutions are exhausted before returning the query result. This could take a long time, since LIMIT and OFFSET are not available in this case. The first example activates two nodes gossip:hasTrack and prel:hasChild and the second example activates many nodes identifying people (and their names) that have an album called "American Life".
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX gossip: <http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#>
PREFIX prel: <http://proton.semanticweb.org/2007/10/proton_rel#>
ASK { gossip:hasTrack prim:activateNode prel:hasChild }

PREFIX gossip: <http://www.ontotext.com/rascalli/2008/04/gossipdb.owl#>
PREFIX onto: <http://www.ontotext.com#>
ASK {
?person gossip:hasAlbum ?album .
?album gossip:name "American Life" .
?person gossip:name ?name }


The following URIs are used in conjunction with the <http://www.ontotext.com/owlim/RDFPriming#setParam> predicate to change the parameters of the RDF Priming module. In general, the name of the parameter is the Subject of the statement pattern and the new value is passed as its Object.

Parameter Activation Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#activationThreshold
Description During activation spreading, activations are accumulated in nodes and can grow indefinitely. The activationThreshold allows the user to trim those values to a certain threshold. The default value of this parameter is 1.0, which means that all values bigger than 1.0 are set to 1.0. This parameter is applied on every iteration of the process and guarantees that no activations larger than the parameter value will be encountered.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:activationThreshold prim:setParam "0.9" . }


Parameter Decay Factor
Predicate http://www.ontotext.com/owlim/RDFPriming#decayFactor
Description Is used during spreading activation to control how much a node's activation level is transferred to nodes that it affects. The following example query sets the new decayFactor to "0.55"
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:decayFactor prim:setParam "0.55" . }


Parameter Default Activation Value
Predicate http://www.ontotext.com/owlim/RDFPriming#defaultActivation
Description Sets the default activation value for all nodes in the repository. If the default activation is not preset then the default activation for all repository nodes is 0. This does not affect the activation origin nodes, whose activation values are set by using http://www.ontotext.com/owlim/RDFPriming#initialActivation
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:defaultActivation prim:setParam "0.4" . }


Parameter Default Weight
Predicate http://www.ontotext.com/owlim/RDFPriming#defaultWeight
Description Edges in the RDF graph can be given weights that are multiplied by the source node activation in order to compute the activation that is spread across the edge to the destination node (see assignWeight). If the predicate of the edge is not given any specific weight (via assignWeight) then the edge weight is assumed to be 1/3 (one third). This default weight can be changed by using the defaultWeight parameter. Any floating point value in the range [0,1] can be used.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:defaultWeight prim:setParam "0.2" . }


Function Export Activation Values
Predicate http://www.ontotext.com/owlim/RDFPriming#exportActivations
Description Is used to export activation values for a set of nodes. The values are stored in a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:exportActivations prim:setParam "file:///D/work/my_activations.txt" . }


Parameter Filter Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#filterThreshold
Description Sets the new filter threshold value used to decide when a statement is visible depending on the activation level of its subject, predicate and object.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:filterThreshold prim:setParam "0.50" . }


Parameter Firing Threshold
Predicate http://www.ontotext.com/owlim/RDFPriming#firingThreshold
Description Sets the threshold above which a node will activate its neighbours
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:firingThreshold prim:setParam "0.25" . }


Function Import Activation Values
Predicate http://www.ontotext.com/owlim/RDFPriming#importActivations
Description Is used to import activation values for a set of nodes. The values are loaded from a file identified by the URL given as the Object of the statement pattern. The format of the data in the file is simply one line per URI followed by a tab character and the floating-point value of its activation value.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:importActivations prim:setParam "file:///D/work/my_activations.txt" . }
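
The file format described for export and import (one line per node: URI, tab character, activation value) is simple enough to read or write outside the repository. A minimal Python sketch, with hypothetical function names and assuming the file contains no header or comment lines:

```python
def parse_activations(text):
    """Parse 'URI<TAB>value' lines into a dict of URI -> float."""
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Split on the last tab so the value is always the final field.
        uri, value = line.rsplit("\t", 1)
        values[uri] = float(value)
    return values


def format_activations(values):
    """Serialise a dict of URI -> float as one 'URI<TAB>value' line each."""
    return "".join(f"{uri}\t{value}\n" for uri, value in values.items())
```

This can be handy for inspecting an exported file or for preparing activation values to import later.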


Parameter Initial Activation Value
Predicate http://www.ontotext.com/owlim/RDFPriming#initialActivation
Description Sets the initial activation value for each of the nodes from which the activation process starts. The nodes that are scheduled for activation will receive that amount at the beginning of the spreading activation process.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:initialActivation prim:setParam "0.66" . }


Parameter Maximum Nodes Fired Per Cycle
Predicate http://www.ontotext.com/owlim/RDFPriming#maxNodesFiredPerCycle
Description Sets the number of nodes that should fire activations during one spreading activation cycle. The default value is 100000.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:maxNodesFiredPerCycle prim:setParam "10000" . }


Parameter Number of Cycles
Predicate http://www.ontotext.com/owlim/RDFPriming#cycles
Description Sets the number of activation spreading cycles to perform when the process is initiated.
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:cycles prim:setParam "4" . }


Parameter Number of Worker Threads
Predicate http://www.ontotext.com/owlim/RDFPriming#workerThreads
Description Sets the number of worker threads that will perform the spreading activation (the default is 2).
Example PREFIX prim: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { prim:workerThreads prim:setParam "4" . }
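
To give a feel for how the parameters above work together, here is a small spreading activation sketch in Python. It is not the TRREE implementation, and several details are assumptions made for illustration: activation flows only from the subject to the object of a statement, each node fires at most once, and the amount passed along an edge is the source activation multiplied by the edge weight and the decay factor. The function and argument names are hypothetical; the default values are the documented defaults where the text gives one (activation threshold, default weight) and otherwise the values used in the example queries.

```python
def spread_activation(statements, origins, predicate_weights=None, *,
                      initial_activation=0.66, decay_factor=0.55,
                      firing_threshold=0.25, activation_threshold=1.0,
                      default_weight=1.0 / 3.0, cycles=4):
    """Spread activation over (subject, predicate, object) statements."""
    predicate_weights = predicate_weights or {}
    # Subjects, predicates and objects are all nodes; all start at 0.
    activation = {}
    for statement in statements:
        for node in statement:
            activation.setdefault(node, 0.0)
    # Nodes scheduled for activation receive the initial activation value.
    for node in origins:
        activation[node] = initial_activation

    already_fired = set()
    for _ in range(cycles):
        firing = {node for node, level in activation.items()
                  if level > firing_threshold and node not in already_fired}
        if not firing:
            break
        received = {}
        for subject, predicate, obj in statements:
            if subject in firing:
                weight = predicate_weights.get(predicate, default_weight)
                amount = activation[subject] * weight * decay_factor
                received[obj] = received.get(obj, 0.0) + amount
        for node, amount in received.items():
            # Trim accumulated activation to the activation threshold.
            activation[node] = min(activation[node] + amount,
                                   activation_threshold)
        already_fired |= firing
    return activation
```

In this sketch a low weight on a predicate (such as the 0.1 given to rdf:type in the example that follows) keeps activation from flooding across edges with that predicate.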


RDF Priming Example

The following example uses data from DBpedia (http://dbpedia.org/About), which was imported into BigOWLIM with the RDF Priming mode enabled. The management queries are evaluated through the Sesame console application for convenience. The initial step is to evaluate a demo query that retrieves all the instances of the dbpedia:V8 concept:

SELECT *
WHERE {?x <http://dbpedia.org/property/class> <http://dbpedia.org/resource/V8>. }

The above query returns the following results:

?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:BMW_N62
dbpedia3:Chrysler_Flathead_engine
dbpedia3:Duramax_V8_engine
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Power_Stroke_engine
dbpedia3:Ford_Y-block_engine
dbpedia3:Ford_Yamaha_V8_engine
dbpedia3:GM_Premium_V_engine
dbpedia3:Lincoln_Y-block_V8_engine
dbpedia3:Mercedes-Benz_M113_engine
dbpedia3:Nissan_VH_engine
dbpedia3:Nissan_VK_engine
dbpedia3:BMW_N63
dbpedia3:Toyota_UR_engine
dbpedia3:Toyota_UZ_engine

As can be seen, the query returns many engines from different manufacturers. The RDF Priming module can be used to reduce the number of results returned by this query by targeting the query to specific parts of the global RDF graph, i.e. the parts of the graph that have been activated.
The following text shows an example of setting up and configuring the RDF Priming module for the purpose of making the example query return a smaller set of more specific results. It is assumed that a SPARQL endpoint is available that is connected to a running repository instance.
Enable the RDF Priming module:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "true" . }

Change the default decay factor:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:decayFactor onto:setParam "0.55" . }

Change the firing threshold parameter:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:firingThreshold onto:setParam "0.25" . }

Change the filter threshold:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:filterThreshold onto:setParam "0.60" . }

The initial Activation Level is changed to reflect the specifics of the data set:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { onto:initialActivation onto:setParam "0.66" . }

Adjust the Weight factors for a specific predicate so that it activates the relevant sub-set of the RDF graph, in this case the rdfs:subClassOf predicate:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
ASK { rdfs:subClassOf onto:assignWeight "0.5" . }

The next step alters the Weight Factor of the rdf:type predicate so that it does not propagate activations to the classes from the activated instances. This is a useful technique when there are a lot of instances and a very large classification taxonomy which should not be broadly activated (as is the case with the DBpedia dataset).

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
ASK { rdf:type onto:assignWeight "0.1" . }

If the example query is executed at this stage, it will return no results, because the RDF graph has no activated nodes at all. Therefore the next step is to activate two particular nodes, the Ford Motor Company dbpedia3:Ford_Motor_Company and one of the cars they build dbpedia3:1955_Ford, which came out of the factory with a very nice V8 engine:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
PREFIX dbpedia3: <http://dbpedia.org/resource/>
ASK { dbpedia3:1955_Ford onto:activateNode dbpedia3:Ford_Motor_Company }

Finally, tell the RDF Priming module to spread the activations from these two nodes:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b0 onto:spreadActivation _:b1 . }

This will normally take 8-10 seconds after which the example query can be re-evaluated with the following results:

?x
------------------------------------
dbpedia3:Jaguar_AJ-V8_engine
dbpedia3:BMW_M62
dbpedia3:Ford_385_engine
dbpedia3:Ford_MEL_engine
dbpedia3:Ford_Y-block_engine

As can be seen, the result set is smaller and most of the engines retrieved are made by Ford. However, there is an engine made by Jaguar, which is most probably there because Ford owned Jaguar for some time in the past, so both manufacturers are related to each other. Something similar might be the case for the other non-Ford engine returned, since BMW owned Land Rover before selling it to Ford. Of course, these remarks are a free interpretation of the results.
Finally, disable the RDF Priming module to return to the normal operating mode:

PREFIX onto: <http://www.ontotext.com/owlim/RDFPriming#>
ASK { _:b1 onto:enableSpreading "false" . }
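
Since all the parameter-setting steps in this walkthrough share the same query shape, they can be generated rather than typed by hand. The Python helper below is a hypothetical convenience, not part of BigOWLIM; it only builds query strings, which would then be submitted through whatever SPARQL client is in use.

```python
PRIMING_NS = "http://www.ontotext.com/owlim/RDFPriming#"


def set_param_query(name, value):
    """Build the ASK query that sets one RDF Priming parameter."""
    return (f"PREFIX prim: <{PRIMING_NS}>\n"
            f'ASK {{ prim:{name} prim:setParam "{value}" . }}')


def priming_setup_queries(params):
    """Build one ASK query per (parameter name, value) entry, in order."""
    return [set_param_query(name, value) for name, value in params.items()]


# The parameter values used in the walkthrough above.
EXAMPLE_PARAMS = {
    "decayFactor": "0.55",
    "firingThreshold": "0.25",
    "filterThreshold": "0.60",
    "initialActivation": "0.66",
}
```

Calling priming_setup_queries(EXAMPLE_PARAMS) yields four query strings equivalent to the setParam queries shown earlier, using the prim: prefix instead of onto:.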

Local Notifications

Notifications are a publish/subscribe mechanism for registering and receiving events from a BigOWLIM repository whenever triples matching a certain graph pattern are inserted or removed. The Sesame API provides such a mechanism, where a RepositoryConnectionListener can be notified of changes to a NotifyingRepositoryConnection. However, the BigOWLIM notifications API works at a lower level and uses the internal raw entity IDs for subject, predicate and object instead of Java objects. The benefit of this is that much higher performance is possible. The downside is that the client must do a separate lookup to get the actual entity values and, because of this, the notification mechanism will only work when the client is running inside the same JVM as the repository instance. See the next section for the remote notification mechanism.
The user of the notifications API registers for notifications by providing a SPARQL query. The SPARQL query is interpreted as a plain graph pattern by ignoring all the more complicated SPARQL constructs like FILTER, OPTIONAL, DISTINCT, LIMIT, ORDER BY, etc. Therefore the SPARQL query is interpreted as a complex graph pattern involving triple patterns combined by means of joins and unions at any level. The order of the triple patterns is not significant.
Here is an example of how to register for notifications based on a given SPARQL query:

In the example code, the caller would be asynchronously notified about incoming statements matching the pattern ?s rdf:type ?o. In general, notifications will be sent for all incoming triples that contribute to a solution of the query. The integer parameters in the notifyMatch method can be mapped to values using the EntityPool object. Furthermore, any statements inferred from newly inserted statements will also be subject to handling by the notification mechanism, i.e. new implicit statements will also be notified to clients when the requested triple pattern matches.
The subscriber should not rely on any particular order or distinctness of the statement notifications. Duplicate statements might be delivered in response to a graph pattern subscription, in an order that is not even bound to the chronological order in which the statements were inserted into the underlying triple store.
The purpose of the notification services is to enable the efficient and timely discovery of newly added RDF data. Therefore it should be treated as a mechanism for giving the client a hint that certain new data is available and not as an asynchronous SPARQL evaluation engine.

Remote notifications

OWLIM's remote notification mechanism provides filtered statement add/remove and transaction begin/end notifications for a local or a remote BigOWLIM repository. Subscribers for this mechanism use patterns of subject, predicate and object (with wildcards) to filter the statement notifications. JMX is used internally as a transport mechanism.
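
The idea of filtering with subject, predicate and object patterns can be shown with a short Python sketch. It is purely illustrative and does not use the OWLIM API: the function name is hypothetical, and representing a wildcard as None is an assumption made for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def matches(pattern, statement):
    """Return True if a (subject, predicate, object) statement fits a pattern.

    A None entry in the pattern acts as a wildcard for that position.
    """
    return all(wanted is None or wanted == actual
               for wanted, actual in zip(pattern, statement))


def filter_notifications(pattern, statements):
    """Keep only the statements a subscriber with this pattern would see."""
    return [statement for statement in statements
            if matches(pattern, statement)]
```

For instance, the pattern (None, RDF_TYPE, None) selects every rdf:type statement regardless of its subject and object.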

Using remote notifications

Registering and deregistering for notifications is achieved through the NotifyingOwlimConnection class, which wraps a RepositoryConnection object connected to an OWLIM repository and provides an API to add/remove notification listeners of type RepositoryNotificationsListener. Here is a simple example of the API usage:

Note that the transactionStarted() and transactionComplete() events are not bound to any statement; they are dispatched to all subscribers, regardless of what they are subscribed to. This means that pairs of start/complete events can be detected by the client without receiving any statement notifications in between.

The above example will work when the OWLIM repository is initialized in the same JVM that runs the example (local repository). If a remote repository is used (e.g. HTTPRepository) the notifying repository connection should be initialized differently:

where host (String) and port (int) are the host name of the remote machine where the repository resides and the port number of the JMX service in the repository JVM. The other part of the example remains valid for the remote case. The repository connection used to initialize a NotifyingOwlimConnection instance could be a ReplicationClusterConnection in which case notifications will work in cluster mode (transparently to the user) - no changes on the client side are required.

Remote Notification Configuration

For remote notifications, where the subscriber and the repository are running in different JVM instances (possibly on different hosts), a JMX remote service should be configured in the repository JVM. This is done by adding the following parameters to the JVM command line:

-Dcom.sun.management.jmxremote.port=1717
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false

If the repository is running inside a servlet container, then these parameters must be passed to the JVM that runs the container and OWLIM. For Tomcat, this can be done using the JAVA_OPTS or CATALINA_OPTS environment variable.
The port number used should be exactly the port number that is passed to the NotifyingOwlimConnection constructor (as in the example above). One should make sure that the specified port (e.g. 1717) is accessible remotely, i.e. no firewalls or NAT redirection prevent access to it.
In a replication cluster setup, all the worker nodes should have their JMX service configured properly in order to enable notifications for the whole cluster. The master node assumes that each worker is exposing its JMX service on port 1717, but this can be overridden when nodes are added to the cluster (the third parameter to the addClusterNode() operation is the JMX service port of that node) or by editing the cluster.properties configuration file and adding the following parameter:

jmxport<N> = <PORTN>

where N is the consecutive number of the node we want to configure and PORTN is the port number of that node's JMX service. Cluster workers should also have their com.sun.management.jmxremote.* JVM parameters properly configured. Replication cluster master nodes will therefore be controlled and emit notifications using the same JMX port number.

Query modifiers/extensions

Managing Explicit and Implicit Statements

In order to control whether only explicit or only implicit statements are considered during SPARQL query evaluation, some special context identifiers can be used with the FROM and FROM NAMED SPARQL constructs. The following table gives details:

Clause Behaviour
FROM <http://www.ontotext.com/explicit> The default graph (used in triple patterns with such scope) will include only explicit statements (with or without a context)
FROM <http://www.ontotext.com/implicit> The default graph (used in triple patterns with such scope) will include only inferred statements
FROM NAMED <http://www.ontotext.com/explicit> The NAMED graph (used in triple patterns with such scope, e.g. { GRAPH ?g { ?s ?p ?o } }) will include only explicit statements (with or without a context)
FROM NAMED <http://www.ontotext.com/implicit> The NAMED graph (used in triple patterns with such scope, e.g. { GRAPH ?g { ?s ?p ?o } }) will include only inferred statements

Effectively, statements behave as though they have a context of http://www.ontotext.com/implicit or http://www.ontotext.com/explicit independent of whether they have an actual context or not. Various combinations of FROM and FROM NAMED are allowed in alignment with SPARQL semantics.

Other special query behaviour

There are several more special graph URIs used in BigOWLIM that can be used to control query evaluation.

Clause Behaviour
FROM/FROM NAMED <http://www.ontotext.com/disable-sameAs> Switches off the enumeration of the equivalence classes produced by owl:sameAs during triple pattern matching (enumeration is the default behaviour), so that the additional solutions it would produce are excluded. Its purpose is to reduce the number of results to only those that are valid for a single representative of the class (this is a rough description and not fully explanatory). For example, given a triple test:Inst rdf:type test:SomeClass that matches a pattern, where test:Inst is owl:sameAs test:Inst2, by default there would be 2 triples matching the pattern, one for test:Inst and another for test:Inst2. Using the above system graph in FROM/FROM NAMED clauses excludes such redundancies. BE AWARE that if the query uses filters over the textual representation of a node, this modifier may skip some valid solutions, since not all the nodes within an equivalence class will be matched against such a FILTER.
FROM/FROM NAMED <http://www.ontotext.com/count> Will trigger the evaluation of the query so that it will give a single result in which all the variable bindings in the projection will be replaced with a plain literal holding the value of the total number of solutions of the query, i.e. the equivalent of COUNT(*) from SQL. In the case of a CONSTRUCT query in which the projection contains three variables (?subject, ?predicate, ?object), the subject and the predicate will be bound to <http://www.ontotext.com/> and the object will hold the literal value. This is because there cannot exist a statement with literal in the place of the subject or predicate.
FROM/FROM NAMED <http://www.ontotext.com/skip-redundant-implicit> Will trigger the exclusion of implicit statements when an explicit one exists within a specific context (even the default one). Initially implemented to allow filtering of redundant rows where the context part is not taken into account, which leads to 'duplicate' results.
FROM <http://www.ontotext.com/distinct> Using this special graph name in DESCRIBE and CONSTRUCT queries will cause only distinct triples to be returned. This is useful when several resources are being described, where the same triple can be returned more than once, i.e. when describing its subject and its object.
FROM <http://www.ontotext.com/owlim/cluster/control-query> Identifies the query to the replication cluster as needing to be routed to all worker nodes.