Monday, July 16, 2012

Solr Concepts and Architecture

Conceptually, Solr can be divided into four areas:

    Schema (schema.xml)
    Configuration (solrconfig.xml)
    Indexing
    Searching

A Document contains one or more Fields. A Field consists of a name, content, and metadata describing how the content should be processed. Content is analyzed to make it searchable. Analysis is performed by chaining a Tokenizer to zero or more TokenFilters: the Tokenizer splits the input stream into words (tokens), and a TokenFilter can change (for example, stem) or remove tokens. The Solr schema makes it easy to configure this analysis process without writing code. It also provides stronger typing, allowing you to designate a Field specifically as a String, int, float, or other primitive or custom type.

The Solr schema for the index is defined in schema.xml. It contains field type definitions within the <types> element and the index's fields within the <fields> element. You may also notice <copyField> elements, which copy an input field, as provided, to another field.
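As a rough illustration of that layout, a minimal schema.xml might look like the sketch below. The field names (id, title, body, text) and the type names are made up for this example, and the available classes and attributes depend on your Solr version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: every field and type name here is illustrative -->
<schema name="example" version="1.5">
  <types>
    <!-- Untokenized string type, suitable for identifiers -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <!-- Tokenized, lowercased text type -->
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="true"/>
    <!-- Catch-all search field, filled only through copyField -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <!-- Copy the raw input of title and body into the catch-all field -->
  <copyField source="title" dest="text"/>
  <copyField source="body" dest="text"/>
</schema>
```

Here the copyField elements let a single query against text match content from either title or body, while each source field keeps its own stored value.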

  1. Schema design decisions, in which you map your source data to Solr's limited document structure.
  2. The structure of the schema.xml file, where the schema is defined. This file holds both the definitions of the field types and the fields of those types that store your data.
  3. Text analysis: the configuration of how text is processed (tokenized and so on) for indexing. This configuration affects whether or not a particular search is going to match a particular document.
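To make the text analysis point more concrete, here is a sketch of a field type whose analyzer chains one tokenizer to several token filters. The type name text_en_example and the stopwords.txt file are assumptions for illustration. The factory classes are standard Solr ones, but check them against your version.

```xml
<!-- Hypothetical field type; it would sit inside <types> in schema.xml -->
<fieldType name="text_en_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Split the input stream into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase, so "Solr" and "solr" index as the same token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Remove common words listed in stopwords.txt (assumed to exist in conf/) -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Change tokens by stemming: "searching" becomes "search" -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain at query time, so query terms and indexed terms line up -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, a search for "searching" can match a document that contains "search", because both are reduced to the same token. If the stemming filter is dropped from either side, that match no longer happens.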


On the configuration side, the solrconfig.xml file specifies not only how Solr handles indexing, highlighting, faceting, searching, and other requests, but also the properties that control caching and the way Lucene manages the index. The configuration depends on the schema, but the schema does not depend on the configuration.
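As a hedged sketch of what the caching and index properties can look like, the fragment below shows a few commonly seen settings. The values are illustrative rather than recommendations, and element names differ between releases (older versions, for instance, use <indexDefaults> and <mainIndex> instead of <indexConfig>).

```xml
<config>
  <!-- Lucene index settings; values are illustrative only -->
  <indexConfig>
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexConfig>

  <!-- Cache settings live in the query section -->
  <query>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  </query>
</config>
```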

Solr's solrconfig.xml file contains lots of parameters that can be tweaked. At the moment, we're just going to take a peek at the request handlers, which are defined with <requestHandler> elements. They make up about half of the file. In our first query, we didn't specify any request handler, so Solr used the one marked default="true".

<requestHandler name="search" class="solr.SearchHandler" default="true">
   <!-- default values for query parameters can be specified, these will be overridden by parameters in the request -->
   <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
   </lst>
</requestHandler>

Each HTTP request to Solr, including posting documents and searches, goes through a particular request handler. Handlers can be registered against certain URL paths by naming them with a leading "/".

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
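A search handler can be registered against a path in the same way. The sketch below is an assumption-laden example: the handler name /products and the default field text are hypothetical, and the host and port in the comment are only the usual defaults for a local install.

```xml
<!-- Hypothetical handler, reachable at a URL such as
     http://localhost:8983/solr/products?q=ipod -->
<requestHandler name="/products" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Field searched when the query names none (assumed to exist in the schema) -->
    <str name="df">text</str>
    <int name="rows">20</int>
  </lst>
</requestHandler>
```

Because the name starts with "/", the handler is selected by the URL path itself. As with the default handler shown earlier, parameters sent in the request override the values under defaults.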

