Last week, in the Solr 6, SolrCloud and SQL Queries post, we described how the recent release of Solr 6 in its SolrCloud mode is able to understand SQL. But this is not the only SolrCloud / Solr 6. Another addition that we can use is the Solr JDBC driver. We can use it just like any other JDBC driver. In this blog post we will show how to use Solr JDBC driver from our code, which should give you an idea of how to proceed when using this functionality elsewhere, say with Apache Zeppelin or any other data exploration or visualization tool that has JDBC support.
Let’s look at using the standard JDBC driver API to retrieve data from Solr. That should be doable as we now have the ability to run SQL queries and SolrJ includes the JDBC driver.
We will start with setting up a very simple Maven project that will include all the needed libraries. To setup the project we will use the following command:
mvn archetype:generate -DgroupId=com.sematext.blog -DartifactId=solrjdbc -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
That will generate a simple project structure for us:
To use Solr using the JDBC bridge we need to modify the created pom.xml file to include the SolrJ version 6 there. The modified pom.xml file should look as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>com.sematext.blog</groupId> <artifactId>solrjdbc</artifactId> <packaging>jar</packaging> <version>1.0-SNAPSHOT</version> <name>solrjdbc</name> <url>http://maven.apache.org</url> <dependencies> <dependency> <groupId>org.apache.solr</groupId> <artifactId>solr-solrj</artifactId> <version>6.0.0</version> </dependency> </dependencies> </project>
We do that so that Maven sees the additional library and allow JDBC driver usage.
Using the JDBC driver
We can now start using the driver. For the purpose of this blog post we will use the same collection that we’ve used when we were discussing the SQL support in SolrCloud. We will run a simple : query ordered by the id field and print out the id and the name field for each document that has been returned. The code that does that using the JDBC driver looks as follows:
package com.sematext.blog; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; public class App { public static void main(String[] args) { Connection connection = null; Statement statement = null; ResultSet resultSet = null; try { // define connection string - point to ZooKeeper and collection String connectionString = "jdbc:solr://127.0.0.1:9983?collection=gettingstarted&aggregationMode=map_reduce&numWorkers=1"; // get the connection connection = DriverManager.getConnection(connectionString); // create statement statement = connection.createStatement(); // run a simple query resultSet = statement.executeQuery("SELECT id, name FROM gettingstarted ORDER BY id"); // iterate through the results while (resultSet.next()) { // read two fields String id = resultSet.getString("id"); String name = resultSet.getString("name"); // print out results System.out.println(id + ": " + name); } } catch (SQLException sqlEx) { System.out.println("Error while running SQL query to Solr: " + sqlEx.getMessage()); } finally { // close the objects if (resultSet != null) { try { resultSet.close(); } catch (Exception ex) { } } if (statement != null) { try { statement.close(); } catch (Exception ex) { } } if (connection != null) { try { connection.close(); } catch (Exception ex) { } } } } }
We can run the code using Maven by running the following command:
mvn exec:java -Dexec.mainClass="com.sematext.blog.App"
The execution results should look as follows:
0579B002: [Canon PIXMA MP500 All-In-One Photo Printer] 100-435805: [ATI Radeon X1900 XTX 512 MB PCIE Video Card] 3007WFP: [Dell Widescreen UltraSharp 3007WFP] 6H500F0: [Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300] 9885A004: [Canon PowerShot SD500] EN7800GTX/2DHTV/256M: [ASUS Extreme N7800GTX/2DHTV (256 MB)] EUR: [One Euro] F8V7067-APL-KIT: [Belkin Mobile Power Cord for iPod w/ Dock] GB18030TEST: [Test with some GB18030 encoded characters] GBP: [One British Pound] IW-02: [iPod & iPod Mini USB 2.0 Cable] MA147LL/A: [Apple 60 GB iPod with Video Playback Black] NOK: [One Krone] SOLR1000: [Solr, the Enterprise Search Server] SP2514N: [Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133] TWINX2048-3200PRO: [CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail] USD: [One Dollar] UTF8TEST: [Test with some UTF-8 encoded characters] VA902B: [ViewSonic VA902B - flat panel display - TFT - 19"] VDBDB1A16: [A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM] VS1GB400C3: [CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail] adata: null apple: null asus: null ati: null belkin: null canon: null corsair: null dell: null maxtor: null samsung: null viewsonic: null
As you can see we’ve got all the documents from our gettingstarted collection (32 documents). Not all documents have the name field, thus we have some null values, but that is not important. What is important is that the JDBC driver allowed us to go through the results using the JDBC bridge.
Solr 6 comes with a JDBC Driver. Here's how to use it: https://t.co/JkKMXPaTtW #solr6 #jdbc #solrcloud
— Sematext Group, Inc. (@sematext) April 27, 2016
Pay attention to four things
There are four things that you should pay attention to when using SolrJ JDBC driver:
- You need to specify the ZooKeeper address and the collection name when providing the connection string. You can see that on line 16 in the source code listing. We’ve specified the ZooKeeper address just after the jdbc:solr://, in our case we are using the embedded ZooKeeper, so it runs on the port that is 1000 larger than the one Solr is using. We also specified the collection using the collection parameter. We can also specify the aggregation mode and the number of workers that should be used.
- We use the defined connection parameters in the line 18 to obtain java.sql.Connection object that we will use as the connection to our collection.
- Once we have the Connection object, we can use it (line 20) to obtain the java.sql.Statement object, to be able to run SQL queries.
- Finally, on line 22, we are running the query itself. We do that by executing the executeQuery method on the Statement object that we already got on line 20. The executeQuery method will return the java.sql.ResultSet object which allows us to work with the results. We iterate through the results until the next method of the ResultSet instance returns false.
What’s next?
The future is bright as they say. As you can see, using the JDBC driver is very simple and developers used to connecting to and working with SQL databases should have an easy start when working with Solr and its JDBC driver. Don’t wait too long, start using it and let us know what kind of nice features you developed using Solr and its SQL support! 🙂
If you need any help with Solr / SolrCloud – don’t forget @sematext does Solr Consulting, Production Support, as well as Solr Training!