sematext

Sematext implements Open-Source Search, Natural Language Processing, and Text Analytics technology in the enterprise.

We focus on the design and development of scalable, high-performance search solutions.

"We've worked closely with Sematext since day 1 of developing our Salesforce Content product. Their real-world expertise in designing and scaling Lucene and SOLR based solutions has proved invaluable."
-- Tim Barker, Sr. Director Product Management, Salesforce Content.
Recent projects:
  • Horizontally scalable, multi-cluster, distributed search architecture for Solr indexing and searching close to 1 billion documents
  • 30% Solr search performance improvement for an e-commerce site handling 100 million queries/month
  • Solr cluster for multilingual indexing and searching in over a dozen languages
  • Content-processing framework providing Topic Classification, Named Entity Recognition, Sentiment Detection, and Key Phrase Extraction, combined with Solr multi-core and Distributed Search over more than 350M documents
  • Integration of AutoComplete and DYM ReSearcher with Solr for the largest bookstore on its continent
  • Advertising click and impression log mining and reporting, utilizing Hadoop's MapReduce
  • Nutch and Solr-based country-wide search engine with cultural and linguistic awareness, utilizing Amazon's EC2 and S3
  • ...