Core Solr – 2 Day Workshop in London, UK


For those of you interested in comprehensive Solr training taught by an expert from Sematext who knows it inside and out, we’re running a super hands-on training workshop in London on April 4 and 5. In two days of training Rafal will:
  1. Bring Solr novices to the level where they are comfortable taking Solr to production
  2. Give experienced Solr users proven and practical advice based on years of experience designing, tuning, and operating numerous Solr clusters to help with their most advanced and pressing issues

This two-day workshop will be taught by Rafal Kuć, Sematext engineer and author of Solr books.

Audience / Level / Pre-requisites


Developers who want to configure, tune and manage Solr at scale and learn about a wide array of Solr features.


Attendees are encouraged to arrive at least 20 minutes before class starts and to return on time after each break. Important: all participants must bring their own laptop to the workshop, running Mac, Linux or Windows with the latest version of Java installed. You should be comfortable using a terminal or command line.

Course Structure

Each section is followed by a lab with multiple hands-on exercises.

What We Provide

For this training Sematext provides:
  • A digital copy of the training materials, available on the portal 48 hours prior to the training course. Please read our Public Training Agreement.
  • Refreshments. This usually includes coffee, tea, juices, soft drinks, and water to keep you hydrated.
  • Snacks. This usually includes croissants, bagels, danishes, or other pastries.

Course Outline


  1. Introduction to Solr
    • What is Solr and use cases
    • Solr master-slave architecture
    • SolrCloud architecture
    • Solr master-slave vs SolrCloud
    • Starting Solr with schema-less configuration
    • Inverted index
    • TF/IDF basics
    • Indexing documents
    • Retrieving documents using URI request
    • Deleting documents
  2. Indexing data
    • Data structure
    • Index structure configuration
    • Defining custom field types
    • String vs Text based types
    • Basic field usage examples
    • Tokenizers
    • Char filters
    • Filters
    • Stemming
    • Dynamic fields
    • Copy fields
    • Running Solr with our own configuration
    • XML data format explained
    • JSON data format explained
    • CSV data format explained
    • Batch indexing
    • Doc values
    • Additional field properties
    • Nested documents support
  3. Searching
    • Simple URI search
    • Paging
    • Sorting
    • Choosing display fields
    • Pseudo fields
    • Debug query
    • Lucene query language
    • Standard query parser
    • Dismax query parser
    • Extended dismax query parser
    • Examples of other parsers
    • Timing out searches
    • Using cursor for deep paging
    • Nested documents support
    • Dealing with relevancy
  4. Data analysis
    • Introduction to faceting
    • Basic use cases
    • Field faceting
    • Field prefix faceting
    • Sorting faceting results
    • Limiting faceting
    • Faceting execution control
    • Range faceting
    • Query faceting
    • Hierarchical faceting
    • Interval faceting
    • Clustering component
  5. JSON facets
    • JSON facets
    • Facet functions
    • Nested JSON facets
    • Execution type
  6. Spatial search
    • Indexing spatial data
    • Spatial filters
    • Distance function queries
    • Bounding box field
    • Heatmap faceting
  7. Beyond Search - Highlighting and More Like This
    • Introduction to highlighting
    • Highlighting query hits
    • Specifying fields to highlight
    • Choosing highlighting tags
    • Using FastVectorHighlighter
    • Using PostingsHighlighter
    • Finding similar documents
    • Prerequisites for More Like This functionality
    • Configuring More Like This functionality
  8. Beyond Search - Spellchecking
    • Spellchecker with its own index
    • File based spellchecker
    • Index based spellchecker
    • Building spellchecker
    • Including spell checking results with queries
    • Querying spellchecker independently
    • Maximum number of suggestions
    • Collation
    • Controlling collation
    • Accuracy
    • Extended results
  9. Beyond Search - Suggesters
    • What are suggesters
    • Suggester types
    • Configuring suggesters
    • Using different dictionary factories
  10. Beyond Search - Documents grouping
    • Grouping documents by field value
    • Grouping documents by function value
    • Grouping documents by query
    • Paging in grouped results
    • Controlling number of groups and documents count
    • Sorting inside groups
    • Documents grouping and faceting
    • Using collapse query parser
    • Using expand component
  11. Controlling relevance algorithm
    • Types of similarity models
    • Configuring global similarity
    • Configuring per field similarity
    • Use cases for similarity models
  12. Function queries
    • Using function queries
    • Math function queries
    • Term function queries
    • Example use cases
    • Boosting by using functions
    • Sorting by function
    • External file field type
    • Using external file field type for boosting
  13. Search under control
    • Routing
    • Index time routing
    • Query time routing
    • Basic syntax for local params
    • Parameter dereferencing
    • Using parameter dereferencing in handlers configuration
    • Using filters tagging
    • Using faceting exclusions
    • Re-ranking query results
  14. Configuring Solr
    • General solrconfig.xml sections
    • Lucene directory configuration
    • Schema factory settings
    • Merge policy
    • Merge scheduler
    • Transaction log configuration
    • Replication
    • Update request processors
    • Language detection
    • Schema API
    • Managed resources
  15. Tuning Solr
    • Indexing threads
    • Indexing buffer size
    • Auto commit tuning
    • Caches
    • Replication throttling
    • Warming up
  16. Scaling Solr
    • Proper Solr master configuration
    • Proper Solr slaves configuration
    • Multiple masters architecture
    • Setting up Solr slaves for multiple masters
    • Indexing data in multi-master environment
    • Querying in multi-master environment
  17. Scaling SolrCloud
    • ZooKeeper role explained
    • Uploading configuration to ZooKeeper
    • Sharding
    • Using collections API
    • Cluster state explained
    • Creating replicas
    • Removing replicas
    • Caches in SolrCloud
    • Shard splitting
    • Migrating data between collections
    • Aliases
  18. Streaming aggregations
    • Streaming expressions basics
    • Types of functions
    • Export request handler
    • Requirements
  19. Term Vectors
    • What are term vectors
    • Retrieving additional information from Solr
    • Understanding term vector component
  20. Operations
    • Running Solr as a service on Linux and Windows systems
    • Backing up Solr master-slave
    • Backing up SolrCloud
    • Current cluster state view
    • Creating new handlers
    • Authentication and authorization
    • Monitoring using JMX
    • Monitoring using SPM
  21. Data Import Handler
    • Configuring data import handler
    • Indexing data from SQL database
    • Indexing data from SQL database using delta imports
    • Deleting data from Solr when data changes
    • Indexing data from XML files
  22. Developer APIs
    • Connecting to Solr using Java
    • Connecting to SolrCloud using Java
    • Using SolrJ to index data
    • Using SolrJ to query Solr
    • Connecting to Solr using Python
    • Using pySolr to index data
    • Using pySolr to query Solr
    • Streaming aggregations explained
  23. Ecosystem
    • Using Logstash with Solr
    • Solritas as the out of the box tool for data discovery
    • Visualizing data using Banana