Configuring replication in Apache Solr

Written by Nirmal Prabhu, Former Cloud Engineer, Powerupcloud Technologies

In this Solr replication example, we will set up replication in Apache Solr and demonstrate how a new record is replicated from the master core to the slave core. We will use one master and one slave server. In a production environment, the master and the slave would be hosted on different machines.

Step 1: [Install Java]

Install Java and set the JAVA_HOME environment variable.

Step 2: [Install Apache Solr]

To begin with, let's download Apache Solr from the official download page. The paths in this example assume version 5.0.0.

Once the Solr zip file is downloaded, unzip it into a folder.

We can start the server using the command line script. Go to the bin directory from the command prompt and issue the following command:

  • solr start

This will start the Solr server on the default port 8983.

We can now open the following URL in the browser and validate that our Solr instance is running. The specifics of the Solr admin tool are beyond the scope of this example.

http://localhost:8983/solr/
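The same check can be scripted instead of done in the browser. The following is a minimal sketch in Python (standard library only), assuming the instance exposes the system-info endpoint at /solr/admin/info/system and that it returns JSON with a responseHeader; the function names are ours, not part of Solr.

```python
import json
import urllib.request


def system_info_url(host="localhost", port=8983):
    """Build the URL of the system-info endpoint for one Solr instance."""
    return f"http://{host}:{port}/solr/admin/info/system?wt=json"


def solr_is_up(host="localhost", port=8983, timeout=5):
    """Return True if the instance answers the request with status 0."""
    try:
        with urllib.request.urlopen(system_info_url(host, port), timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a reply that is not JSON.
        return False
    return body.get("responseHeader", {}).get("status") == 0
```

Calling solr_is_up() checks the default port 8983, and solr_is_up(port=9000) would check an instance started on port 9000.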

Step 3: [Configuring Solr — master]

In this section, we will show you how to configure the master core for a Solr instance. Apache Solr ships with an option called Schemaless mode, which allows users to construct an effective schema without manually editing the schema file. For this example, however, we will use the reference configset sample_techproducts_configs.

Step 4: [Creating master Core]

First, we need to create a core for indexing the data. The Solr create command has the following options:

  • -c <name> — Name of the core or collection to create (required).
  • -d <confdir> — The configuration directory, useful in the SolrCloud mode.
  • -n <configName> — The configuration name. This defaults to the same name as the core or collection.
  • -p <port> — Port of a local Solr instance to send the create command to; by default the script tries to detect the port by looking for running Solr instances.
  • -s <shards> — Number of shards to split a collection into, default is 1.
  • -rf <replicas> — Number of copies of each document in the collection. The default is 1.

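In this example, we will use the -c parameter for the core name, the -d parameter for the configuration directory, the -p parameter for the port and the -rf parameter for the replication factor. Note that -s and -rf are SolrCloud options, so for a standalone core like this one they should have no effect.

Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command: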

solr create -c master -d sample_techproducts_configs -p 8983 -rf 3

Once the command completes, we can navigate to the following URL and see the master core listed in the core selector, along with the statistics of the core.

http://localhost:8983/solr/#/master
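Whether the core was created can also be checked over HTTP. The sketch below assumes the CoreAdmin STATUS action at /solr/admin/cores and assumes that an unknown core is reported as an empty entry under "status"; the helper names and the sample responses in the usage note are illustrative only.

```python
import json
import urllib.parse
import urllib.request


def core_status_url(core, host="localhost", port=8983):
    """Build the CoreAdmin STATUS URL for a single core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"http://{host}:{port}/solr/admin/cores?{query}"


def core_exists(status_response, core):
    """Treat a missing or empty status entry as 'core does not exist'."""
    return bool(status_response.get("status", {}).get(core))


def fetch_core_status(core, host="localhost", port=8983):
    """Fetch and decode the STATUS response (requires a running Solr)."""
    with urllib.request.urlopen(core_status_url(core, host, port), timeout=5) as resp:
        return json.load(resp)
```

With the master running, core_exists(fetch_core_status("master"), "master") should be true once the create command has succeeded.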

Step 5: [Modify solrconfig]

Open the file solrconfig.xml under the folder server\solr\master\conf and add the following replication request handler.

solrconfig.xml

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:true}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://privateip:8983/solr</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
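Before restarting Solr, it can help to confirm that the handler definition is well-formed XML. In particular, the attribute quotes must be plain ASCII double quotes, because typographic quotes pasted from a web page make the file unparsable. The sketch below parses a fragment with Python's standard library and lists its settings; SAMPLE is a shortened copy of the master section, and replication_settings is a hypothetical helper, not a Solr tool.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the master section of the replication handler.
SAMPLE = """
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:true}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
"""


def replication_settings(xml_text):
    """Return {section: {setting: value}} for each <lst> under the handler.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    A setting that is repeated keeps only its last value in this sketch.
    """
    root = ET.fromstring(xml_text.strip())
    settings = {}
    for lst in root.findall("lst"):
        settings[lst.get("name")] = {s.get("name"): s.text for s in lst.findall("str")}
    return settings
```

For the fragment above, replication_settings(SAMPLE)["master"]["replicateAfter"] is "commit". The ${enable.master:true} value is returned as written, since resolving the property placeholder is done by Solr, not by this sketch.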

Since we have modified solrconfig.xml, we have to restart the Solr server. Navigate to solr-5.0.0\bin in the command window and issue the following commands:

  • solr stop -all
  • solr start

Step 6: [Configuring Solr — slave]

The data from the master core will be replicated to the slave. We will run the slave on the same machine as the master, on a different port. To do so, extract another copy of the Solr server to a folder called solr1. Navigate to the solr-5.0.0\bin folder of solr1 in the command window and issue the following command:

  • solr start -p 9000

The -p option starts the Solr server on a different port. For the slave, we will use port 9000.

Now create the slave core. Navigate to the solr-5.0.0\bin folder of the slave in the command window and issue a create command like the one used for the master, this time with the core name slave and the port 9000.

Now open the file solrconfig.xml under the folder server\solr\slave\conf and add the configuration for the slave under the request handler tag. In the configuration, we point the slave to the masterUrl for replication. The poll interval, which is the time between two poll requests made by the slave, is set to 20 seconds.

solrconfig.xml

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master. It is possible
         to pass on this as a request param for the fetchindex command -->
    <str name="enable">${enable.slave:true}</str>
    <str name="masterUrl">http://privateip:8983/solr/master/replication</str>
    <!-- Interval in which the slave should poll master. Format is HH:mm:ss. If
         this is absent slave does not poll automatically.
         But a fetchindex can be triggered from the admin or the http API -->
    <str name="pollInterval">00:00:20</str>
    <str name="httpBasicAuthUser">Administrator</str>
    <str name="httpBasicAuthPassword">2z)DVL.7FNs</str>
  </lst>
</requestHandler>
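The pollInterval value uses the HH:mm:ss format, so 00:00:20 means 20 seconds. As a small illustration of how the format maps to a duration, here is a sketch of a converter; poll_interval_seconds is a hypothetical helper of our own and not something Solr provides.

```python
def poll_interval_seconds(value):
    """Convert an HH:mm:ss poll interval into a number of seconds.

    Raises ValueError if the value does not have exactly three
    colon-separated integer parts.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```

For example, poll_interval_seconds("00:00:20") gives 20, and the value 00:00:60 used in the master's configuration gives 60.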

Since we have modified solrconfig.xml, we have to restart the Solr server. Navigate to solr-5.0.0\bin in the command window and issue the following commands:

  • solr stop -all
  • solr start -p 9000

Now open the slave console using the following URL. The replication section shows the settings we made in solrconfig.xml.

http://localhost:9000/solr/#/slave/replication
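The replication handler can also be queried over HTTP, which is handy for scripted checks. The sketch below assumes the handler is mounted at /solr/slave/replication on port 9000 as configured above, and uses its details and fetchindex commands; the helper names are ours.

```python
import json
import urllib.parse
import urllib.request


def replication_url(core, command, host="localhost", port=9000):
    """Build a replication-handler URL for one command, asking for JSON."""
    query = urllib.parse.urlencode({"command": command, "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/replication?{query}"


def replication_command(core, command, host="localhost", port=9000):
    """Send the command and return the decoded reply (requires a running Solr)."""
    url = replication_url(core, command, host, port)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

With the slave running, replication_command("slave", "details") returns the replication status, and replication_command("slave", "fetchindex") asks the slave to pull the index immediately instead of waiting for the next poll.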

Step 7: [Indexing and Replication]

Now we will index the example data into the master core. Apache Solr comes with a standalone Java program called the SimplePostTool. This program is packaged as a JAR (post.jar) and is available with the installation under the folder example\exampledocs.

Navigate to the example\exampledocs folder in the command prompt and type the following command to list the options the tool supports.

java -jar post.jar -h

The usage format, in general, is as follows:

Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

We will index the data in the books.csv file shipped with the Solr installation. Navigate to solr-5.0.0\example\exampledocs in the command prompt and issue the following command:

java -Dtype=text/csv -Durl=http://localhost:8983/solr/master/update -jar post.jar books.csv

The System Properties used here are:

  • -Dtype: the type of the data file.
  • -Durl: the update URL of the master core.
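For readers who prefer to script the upload, the post.jar call corresponds roughly to an HTTP POST of the file to the update URL with the content type text/csv. The following Python sketch shows that idea under stated assumptions: the commit=true parameter is our addition so the documents become visible right away, and the function names are hypothetical.

```python
import urllib.request


def build_csv_update_request(csv_bytes, core="master", host="localhost", port=8983):
    """Prepare a POST of CSV data to the core's update handler."""
    url = f"http://{host}:{port}/solr/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def post_csv_file(path, **kwargs):
    """Read a CSV file, send it, and return the HTTP status (requires a running Solr)."""
    with open(path, "rb") as handle:
        request = build_csv_update_request(handle.read(), **kwargs)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return resp.status
```

Run from the exampledocs folder, post_csv_file("books.csv") would send the same file to the master core.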

The file books.csv will now be indexed, and the tool prints a summary of the run in the command prompt.

Now open the console of the slave core. Once the next poll has completed, the data appears there without any further action.

http://localhost:9000/solr/#/slave
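One simple way to confirm that replication has caught up is to compare the document counts of the two cores. The sketch below assumes the standard select handler on both cores and reads numFound from a match-all query; the function names are ours.

```python
import json
import urllib.parse
import urllib.request


def count_url(core, host, port):
    """Build a match-all query that returns only the hit count (rows=0)."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/select?{query}"


def num_found(select_response):
    """Extract the total number of matching documents from a select reply."""
    return select_response["response"]["numFound"]


def document_count(core, host, port):
    """Query the core and return its document count (requires a running Solr)."""
    with urllib.request.urlopen(count_url(core, host, port), timeout=10) as resp:
        return num_found(json.load(resp))
```

When replication is complete, document_count("master", "localhost", 8983) and document_count("slave", "localhost", 9000) should return the same number.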

Step 8: [Add new record]

Now we validate the replication further by adding a record to the master core. To do so, let's open the master console URL.

http://localhost:8983/solr/#/master/documents

Navigate to the documents section, choose CSV as the document type, enter the following content into the document text area and click Submit.

id,cat,name,price,inStock,author,series_t,sequence_i,genre_s

123,book,Apache Solr,6.99,TRUE,Ram,JCG,1,Technical

The data will be added to the master core and replicated to the slave server. To validate this, let's navigate to the slave core, where the document count should have increased to 11. We can also use the query section in the slave admin console to validate it. Open the following URL.

http://localhost:9000/solr/#/slave/query

Enter name:apache in the q text area and click Execute Query. The new record we inserted on the master core appears in the results from the slave core.
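Because the slave polls every 20 seconds, the new record may take a moment to appear. The sketch below runs the same name:apache query against the slave and retries until the record with id 123 shows up. It assumes the select handler returns JSON with a response.docs list; the helper names, the retry count and the delay are arbitrary choices of ours.

```python
import json
import time
import urllib.parse
import urllib.request


def query_url(core, q, host="localhost", port=9000):
    """Build a select URL for the given query string."""
    query = urllib.parse.urlencode({"q": q, "wt": "json"})
    return f"http://{host}:{port}/solr/{core}/select?{query}"


def matching_ids(select_response):
    """Return the ids of the documents in a select reply."""
    return [doc["id"] for doc in select_response["response"]["docs"]]


def fetch_ids(core, q, host="localhost", port=9000):
    """Run the query and return the matching ids (requires a running Solr)."""
    with urllib.request.urlopen(query_url(core, q, host, port), timeout=10) as resp:
        return matching_ids(json.load(resp))


def wait_for_document(get_ids, doc_id, attempts=6, delay=5.0, sleep=time.sleep):
    """Call get_ids() until doc_id is among the results; True on success."""
    for attempt in range(attempts):
        if doc_id in get_ids():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the slave time to complete its next poll
    return False
```

A possible call is wait_for_document(lambda: fetch_ids("slave", "name:apache"), "123"), which returns True as soon as the record is visible on the slave and False if it has not appeared after the last attempt.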