
Presenting Fuse 1.1: Document Search… Simplified


Written by Jason Simon

April 24, 2015

We know things can always be better, faster, and simpler. As software and product developers, we’re always on the hunt for improvement, and over the past few months we’ve focused that hunt on improving and simplifying document search. If you need more precise, more efficient, unified document search that combines structured and unstructured data… keep reading.

Document Service (Document Ingest, Text Extraction, and OCR)

Our new document service makes it extraordinarily simple to get content from your documents into the Fuse platform. Ingest documents through directory monitoring or API polling, or push them to Fuse through our RESTful API. Everything else is completely hands-off: the document service automatically extracts content from any document, regardless of type, and even reads text from images using our built-in OCR technology. Fancy!
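
To make the directory-monitoring path more concrete, here is a minimal sketch of one pass of a polling loop, assuming a simple dispatch on file extension. The extractor functions, the EXTRACTORS table, and poll_directory are hypothetical illustrations, not Fuse’s API, and the OCR function is only a stub marking where a real OCR engine would run.

```python
from pathlib import Path
from typing import Callable, Dict, Set


def extract_plain_text(path: Path) -> str:
    # Hypothetical extractor for plain-text files.
    return path.read_text(encoding="utf-8", errors="replace")


def extract_with_ocr(path: Path) -> str:
    # Hypothetical stub: a real service would run an OCR engine here.
    return f"<ocr text from {path.name}>"


# Hypothetical dispatch table: file extension -> extractor.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".txt": extract_plain_text,
    ".png": extract_with_ocr,
    ".jpg": extract_with_ocr,
}


def poll_directory(directory: Path, seen: Set[Path]) -> Dict[str, str]:
    """Scan the directory once and extract content from files not seen before."""
    extracted: Dict[str, str] = {}
    for path in sorted(directory.iterdir()):
        if not path.is_file() or path in seen:
            continue
        seen.add(path)  # remember the file so later polls skip it
        extractor = EXTRACTORS.get(path.suffix.lower())
        if extractor is None:
            continue  # no extractor registered for this type in the sketch
        extracted[path.name] = extractor(path)
    return extracted
```

A monitoring service would call poll_directory on a timer, passing the same seen set each time, so every new file is extracted exactly once.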

Faster Data Ingest and Higher Compression

Getting you up to speed with document search quickly means not only fast deployment and implementation, but also fast data ingest. We replaced our MongoDB content layer with LevelDB, dramatically improving write times. Oh, and the swap also lets us use compressed storage, which makes more efficient use of your valuable hardware.
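
The compressed-storage idea can be sketched as a toy sorted key-value store that compresses values on write and decompresses them on read. CompressedKVStore is hypothetical, and zlib stands in for LevelDB’s own compression (LevelDB supports Snappy). For simplicity the toy compresses each value separately, whereas LevelDB compresses blocks of its on-disk tables; it also does not model LevelDB’s log-structured write path, so it illustrates only the storage-size side of the swap, not write speed.

```python
import bisect
import zlib
from typing import Iterator, List, Optional, Tuple


class CompressedKVStore:
    """Toy sorted key-value store that keeps every value zlib-compressed."""

    def __init__(self, level: int = 6) -> None:
        self._keys: List[bytes] = []    # kept in sorted order, as in LevelDB
        self._values: List[bytes] = []  # compressed blobs, parallel to _keys
        self._level = level

    def put(self, key: bytes, value: bytes) -> None:
        blob = zlib.compress(value, self._level)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = blob  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, blob)

    def get(self, key: bytes) -> Optional[bytes]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return zlib.decompress(self._values[i])
        return None

    def scan(self, prefix: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], zlib.decompress(self._values[i])
            i += 1

    def stored_bytes(self) -> int:
        """Total size of the compressed values actually held in memory."""
        return sum(len(v) for v in self._values)
```

Repetitive document text compresses well, which is where the hardware savings come from; keeping keys sorted is what makes prefix scans such as “all keys starting with doc:” cheap.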

Lemmatization

While we champion structured data as the way to quickly find, analyze, and understand your data, the bulk of what documents contain is unstructured. To that end, we’ve bolstered our search over unstructured data with lemmatization. Lemmatization is a more sophisticated form of root-word matching: search for one form of a word and get all forms of that word back in your results. For example, a search for “run” can also return documents containing “runs”, “running”, or “ran”. The illustration below explains it better…

[Illustration: lemmatization]
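
A minimal sketch of lemma-based search follows, assuming a hypothetical, hand-written lemma table (LEMMAS). Both the indexed text and the query word are reduced to their lemma, so “ran”, “runs”, and “running” all meet at “run”. LemmaIndex and its methods are illustrative only; a production lemmatizer would rely on a full morphological dictionary and part-of-speech information rather than a small lookup table.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical lemma table: inflected form -> lemma. Real lemmatizers use a
# full dictionary plus part-of-speech rules; this is a tiny sample.
LEMMAS: Dict[str, str] = {
    "ran": "run", "runs": "run", "running": "run",
    "mice": "mouse",
    "better": "good", "best": "good",
    "is": "be", "was": "be", "were": "be",
}


def lemmatize(token: str) -> str:
    token = token.lower()
    return LEMMAS.get(token, token)  # unknown words are their own lemma


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class LemmaIndex:
    """Inverted index keyed by lemma, so any form of a word matches the others."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._postings[lemmatize(token)].add(doc_id)

    def search(self, word: str) -> Set[str]:
        # .get does not create empty entries on a defaultdict
        return set(self._postings.get(lemmatize(word), set()))
```

Unlike a suffix-stripping stemmer, the dictionary lookup also handles irregular forms such as “ran” to “run” and “mice” to “mouse”.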

Other Improvements

Simplified Data Model
Our data model already provides a more appropriate, understandable, and simplified view of your data. Now we’ve made building the data model easier by following the principle of convention over configuration for several attributes, including primary keys.
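
Convention over configuration for primary keys might look like the following sketch. The fallback order (an explicit setting, then a field named id, then a field named after the entity plus _id) is a hypothetical convention chosen for illustration; the post does not say which convention Fuse uses, and EntityModel is not part of its API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityModel:
    name: str
    fields: List[str]
    primary_key: Optional[str] = None  # explicit configuration always wins

    def resolve_primary_key(self) -> str:
        """Return the configured key, else fall back to a hypothetical convention."""
        if self.primary_key is not None:
            return self.primary_key
        for candidate in ("id", f"{self.name.lower()}_id"):
            if candidate in self.fields:
                return candidate
        raise ValueError(
            f"{self.name}: no primary key configured and no conventional key field found"
        )
```

The design point is that most models need no key configuration at all, while an explicit setting still overrides the convention when the data does not follow it.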

Stability, Stability, Stability
As with every release, bug squashing is a necessity. We took immense pleasure in squashing bugs in this release, but more importantly, we spent a lot of time making our testing, integration, and release processes more sophisticated. And boy, does it feel good!

Early Access Features

We’re well on our way with big features for our next release, still focused on document search. Two of these features are available for early access, so if you’re interested, let us know.

Entity Extraction Service
What happens when you don’t have enough structured data to perform high-precision search and analysis? You use the entity extraction service we’ve built. It generates structured data from your text through pattern matching and named-entity lists. Awesome!
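
Here is a minimal sketch of the two techniques the service combines, assuming hypothetical PATTERNS (regular expressions) and ENTITY_LISTS (named-entity lists, sometimes called gazetteers). extract_entities is illustrative only and not the service’s API; it turns free text into a dictionary of structured fields.

```python
import re
from typing import Dict, List

# Hypothetical pattern rules: field name -> regular expression.
PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),  # ISO-style dates only
}

# Hypothetical named-entity lists: field name -> known names.
ENTITY_LISTS: Dict[str, List[str]] = {
    "company": ["Verizon", "Acme Corp"],
    "city": ["Boston", "San Francisco"],
}


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Generate structured fields from free text via patterns and name lists."""
    found: Dict[str, List[str]] = {}
    for field, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[field] = matches
    for field, names in ENTITY_LISTS.items():
        # Whole-word, case-insensitive match against each known name.
        hits = [n for n in names
                if re.search(r"\b" + re.escape(n) + r"\b", text, re.IGNORECASE)]
        if hits:
            found[field] = hits
    return found
```

Patterns suit entities with a regular shape, such as emails and dates, while lists suit names that have no shape to match; the word-boundary check keeps “Boston” from matching inside a longer word.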

Docker Deployment
We’ve been impressed by Docker’s rapid adoption, so much so that we’re jumping on board to fully support it. Deploying our platform has never been easier.

New and Upcoming

New Dashboard
If you haven’t seen it already, we launched an awesomely interactive dashboard in the Verizon Innovation Center (VIC). Over the past few weeks, we’ve worked on simplifying its setup and customization so anyone can use this type of dashboard for their own data. Take a look at the VIC demo.

Statistics Index
Usage and monitoring statistics are at the heart of any well-oiled application, so we’ve been busy making them magnificent. By loading all of this data into a Fuse index, we put the full power of search and analysis to work monitoring and understanding your Fuse implementation.
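
Treating usage statistics as searchable documents means the same filtering and faceting you use on content also works for monitoring. The sketch below uses a plain Python list in place of a Fuse index; UsageEvent, its fields, facet_counts, and slow_zero_result_queries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical fields describing one logged search request.
    user: str
    query: str
    latency_ms: int
    results: int


def facet_counts(events: Iterable[UsageEvent], field: str) -> Dict[str, int]:
    """Count events per value of one field, like a facet in a search UI."""
    return dict(Counter(getattr(e, field) for e in events))


def slow_zero_result_queries(events: Iterable[UsageEvent], threshold_ms: int) -> List[str]:
    """Distinct queries that were slow and returned nothing: tuning candidates."""
    return sorted({e.query for e in events
                   if e.latency_ms > threshold_ms and e.results == 0})
```

Because monitoring data is just more documents, questions such as “who searches most?” or “which queries are slow and empty?” become ordinary searches rather than separate reporting code.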

In Closing

In the spirit of the hunt, our developers are continuing to build a better, faster, and simpler document search. Keep a lookout for our next release in a few months, or get in touch for early-access feature testing.