At the end of November, we’ll be migrating the Sematext Logs backend from Elasticsearch to OpenSearch

Solr 6 as JDBC Data Source

April 26, 2016

Table of contents

Last week, in the Solr 6, SolrCloud and SQL Queries post, we described how the recent release of Solr 6 in its SolrCloud mode is able to understand SQL. But this is not the only SolrCloud / Solr 6. Another addition that we can use is the Solr JDBC driver. We can use it just like any other JDBC driver. In this blog post we will show how to use Solr JDBC driver from our code, which should give you an idea of how to proceed when using this functionality elsewhere, say with Apache Zeppelin or any other data exploration or visualization tool that has JDBC support.

Let’s look at using the standard JDBC driver API to retrieve data from Solr. That should be doable as we now have the ability to run SQL queries and SolrJ includes the JDBC driver.

We will start with setting up a very simple Maven project that will include all the needed libraries. To setup the project we will use the following command:

mvn archetype:generate -DgroupId=com.sematext.blog -DartifactId=solrjdbc -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

That will generate a simple project structure for us:

JDBC project structure

To use Solr using the JDBC bridge we need to modify the created pom.xml file to include the SolrJ version 6 there. The modified pom.xml file should look as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.sematext.blog</groupId>
  <artifactId>solrjdbc</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>solrjdbc</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>6.0.0</version>
    </dependency>
  </dependencies>
</project>

We do that so that Maven sees the additional library and allow JDBC driver usage.

Using the JDBC driver

We can now start using the driver. For the purpose of this blog post we will use the same collection that we’ve used when we were discussing the SQL support in SolrCloud. We will run a simple : query ordered by the id field and print out the id and the name field for each document that has been returned. The code that does that using the JDBC driver looks as follows:

package com.sematext.blog;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class App {
  public static void main(String[] args) {
    Connection connection = null;
    Statement statement = null;
    ResultSet resultSet = null;
    try {
      // define connection string - point to ZooKeeper and collection
      String connectionString = "jdbc:solr://127.0.0.1:9983?collection=gettingstarted&aggregationMode=map_reduce&numWorkers=1";
      // get the connection
      connection = DriverManager.getConnection(connectionString);
      // create statement
      statement = connection.createStatement();
      // run a simple query
      resultSet = statement.executeQuery("SELECT id, name FROM gettingstarted ORDER BY id");
      // iterate through the results
      while (resultSet.next()) {
        // read two fields
        String id = resultSet.getString("id");
        String name = resultSet.getString("name");
        // print out results
        System.out.println(id + ": " + name);
      }
    } catch (SQLException sqlEx) {
      System.out.println("Error while running SQL query to Solr: " + sqlEx.getMessage());
    } finally {
      // close the objects
      if (resultSet != null) {
        try {
          resultSet.close();
        } catch (Exception ex) {
        }
      }
      if (statement != null) {
        try {
          statement.close();
        } catch (Exception ex) {
        }
      }
      if (connection != null) {
        try {
          connection.close();
        } catch (Exception ex) {
        }
      }
    }
  }
}

We can run the code using Maven by running the following command:

mvn exec:java -Dexec.mainClass="com.sematext.blog.App"

The execution results should look as follows:

0579B002: [Canon PIXMA MP500 All-In-One Photo Printer]
100-435805: [ATI Radeon X1900 XTX 512 MB PCIE Video Card]
3007WFP: [Dell Widescreen UltraSharp 3007WFP]
6H500F0: [Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300]
9885A004: [Canon PowerShot SD500]
EN7800GTX/2DHTV/256M: [ASUS Extreme N7800GTX/2DHTV (256 MB)]
EUR: [One Euro]
F8V7067-APL-KIT: [Belkin Mobile Power Cord for iPod w/ Dock]
GB18030TEST: [Test with some GB18030 encoded characters]
GBP: [One British Pound]
IW-02: [iPod & iPod Mini USB 2.0 Cable]
MA147LL/A: [Apple 60 GB iPod with Video Playback Black]
NOK: [One Krone]
SOLR1000: [Solr, the Enterprise Search Server]
SP2514N: [Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133]
TWINX2048-3200PRO: [CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail]
USD: [One Dollar]
UTF8TEST: [Test with some UTF-8 encoded characters]
VA902B: [ViewSonic VA902B - flat panel display - TFT - 19"]
VDBDB1A16: [A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM]
VS1GB400C3: [CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail]
adata: null
apple: null
asus: null
ati: null
belkin: null
canon: null
corsair: null
dell: null
maxtor: null
samsung: null
viewsonic: null

As you can see we’ve got all the documents from our gettingstarted collection (32 documents). Not all documents have the name field, thus we have some null values, but that is not important. What is important is that the JDBC driver allowed us to go through the results using the JDBC bridge.

Pay attention to four things

There are four things that you should pay attention to when using SolrJ JDBC driver:

  1. You need to specify the ZooKeeper address and the collection name when providing the connection string. You can see that on line 16 in the source code listing. We’ve specified the ZooKeeper address just after the jdbc:solr://, in our case we are using the embedded ZooKeeper, so it runs on the port that is 1000 larger than the one Solr is using. We also specified the collection using the collection parameter. We can also specify the aggregation mode and the number of workers that should be used.
  2. We use the defined connection parameters in the line 18 to obtain java.sql.Connection object that we will use as the connection to our collection.
  3. Once we have the Connection object, we can use it (line 20) to obtain the java.sql.Statement object, to be able to run SQL queries.
  4. Finally, on line 22, we are running the query itself. We do that by executing the executeQuery method on the Statement object that we already got on line 20. The executeQuery method will return the java.sql.ResultSet object which allows us to work with the results. We iterate through the results until the next method of the ResultSet instance returns false.

What’s next?

The future is bright as they say. As you can see, using the JDBC driver is very simple and developers used to connecting to and working with SQL databases should have an easy start when working with Solr and its JDBC driver. Don’t wait too long, start using it and let us know what kind of nice features you developed using Solr and its SQL support! 🙂
If you need any help with Solr / SolrCloud – don’t forget @sematext does Solr Consulting, Production Support, as well as Solr Training!

Java Logging Basics: Concepts, Tools, and Best Practices

Imagine you're a detective trying to solve a crime, but...

Best Web Transaction Monitoring Tools in 2024

Websites are no longer static pages.  They’re dynamic, transaction-heavy ecosystems...

17 Linux Log Files You Must Be Monitoring

Imagine waking up to a critical system failure that has...