Now that I have migrated my pocket bookmarks to readeck, I wanted to create a unified local search that will index these bookmarks and my internal documentation - stored in outline. Fortunately both readeck and outline have APIs, so it was ‘just’ a case of finding a suitable search application. Enter typesense.
Typesense is an open-source, typo-tolerant search engine optimized for instant (typically sub-50ms) search-as-you-type experiences and developer productivity.
Ultimately, I deployed typesense in kubernetes but for a quick start, the documentation provides a docker compose file. With this, we can have typesense up and running easily:
services:
  typesense:
    image: typesense/typesense:27.1
    restart: on-failure
    ports:
      - "8108:8108"
    volumes:
      - ./typesense-data:/data
    command: '--data-dir /data --api-key=xyz --enable-cors'
Create the typesense-data directory and off we go:
mkdir "$(pwd)"/typesense-data
docker-compose up
Within typesense, data is stored in collections. Before we can start searching we need to create a collection and add some data to it.
My collection is quite simple, with just three fields (title, text and url) to start with. In the snippet below, I am using https and an internal subdomain, but we could also just connect to localhost with port 8108 and http.
const Typesense = require('typesense')
let client = new Typesense.Client({
'nodes': [{
'host': 'typesense.example.com', // For Typesense Cloud use xxx.a1.typesense.net
'port': 443, // For Typesense Cloud use 443
'protocol': 'https' // For Typesense Cloud use https
}],
'apiKey': '<API_KEY>',
'connectionTimeoutSeconds': 2
})
let homeSchema = {
'name': 'home',
'fields': [
{'name': 'title', 'type': 'string' },
{'name': 'text', 'type': 'string'},
{'name': 'url', 'type': 'string' },
]
}
client.collections().create(homeSchema)
.then(function (data) {
console.log(data)
})
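For switching between the local docker compose instance and the internal subdomain, the node settings can be derived from a single URL rather than edited by hand. This is a minimal sketch; nodeFromUrl is a hypothetical helper of my own, not part of the typesense client:

```javascript
// Hypothetical helper: turn a base URL into a Typesense node entry.
function nodeFromUrl(baseUrl) {
  const u = new URL(baseUrl);
  const protocol = u.protocol.replace(':', ''); // 'https:' -> 'https'
  // URL.port is an empty string when the scheme's default port is in use.
  const port = u.port ? Number(u.port) : (protocol === 'https' ? 443 : 80);
  return { host: u.hostname, port, protocol };
}

console.log(nodeFromUrl('http://localhost:8108'));
console.log(nodeFromUrl('https://typesense.example.com'));
```

The returned object has the same shape as the entries in the nodes array above, so it could be dropped straight into the client configuration.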
At this point, I have not set a default_sorting_field as I haven’t yet decided how I want the results to be sorted.
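If a sort order is settled on later, Typesense expects default_sorting_field to name a numeric field in the schema. The sketch below shows the shape such a schema might take. The priority field and the withDefaultSorting helper are hypothetical and not part of my collection:

```javascript
// Sketch: return a copy of a schema with a numeric default sorting field added.
function withDefaultSorting(schema, fieldName, type = 'int32') {
  return {
    ...schema,
    fields: [...schema.fields, { name: fieldName, type }],
    default_sorting_field: fieldName,
  };
}

const baseSchema = {
  name: 'home',
  fields: [
    { name: 'title', type: 'string' },
    { name: 'text', type: 'string' },
    { name: 'url', type: 'string' },
  ],
};

// 'priority' is a hypothetical field; every document would then need a value for it.
const sortedSchema = withDefaultSorting(baseSchema, 'priority');
console.log(JSON.stringify(sortedSchema, null, 2));
```

The original schema object is left untouched, so both variants can be kept around while experimenting.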
The typesense documentation provides example code to load a json file. I experimented with this initially, but for the actual search I wanted to load data from the outline and readeck APIs. Luckily, the two APIs are very similar, each with a list endpoint and a content endpoint. All that is needed is to loop through the documents and post them to typesense. Here’s the code for outline. It uses an API token for access; these are created in the outline UI. I’m also stripping the markdown from the outline documents so they index better and display cleanly in the typesense search results:
const axios = require('axios');
const rateLimit = require('axios-rate-limit');
const axiosRateLimited = rateLimit(axios.create(), { maxRequests: 1, perMilliseconds: 1000});
const removeMarkdown = require('remove-markdown');
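let bearerOutline = "outline_bearer_code"
let urlOutline = "https://outline.example.com"
// Page size for documents.list; 25 is an assumed value, adjust as needed
const apiListLimit = 25;
// Note: this snippet reuses the Typesense client created earlier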
async function getOutlineDocumentList(offset)
{
let apiUrl = urlOutline + "/api/documents.list"
try {
let data = {
'limit': apiListLimit,
'offset': offset
};
let config = {
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer ' + bearerOutline
}}
let r = await axiosRateLimited.post(apiUrl, data, config)
return r.data.data;
} catch (error) {
console.error('Error getting document list')
}
}
async function getOutlineDocument(d)
{
let apiUrl = urlOutline + "/api/documents.info"
try {
let data = {
'id': d.id
};
let config = {
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer ' + bearerOutline
}}
let r = await axiosRateLimited.post(apiUrl, data, config)
return r.data.data;
} catch (error) {
console.error('Error getting document')
}
}
async function loadOutline()
{
let offset = 0;
while (true) {
let docs = await getOutlineDocumentList(offset * apiListLimit);
// docs is undefined when the list request failed, so stop in that case too
if (!docs || docs.length === 0) {
console.log('Finished getting outline documents');
return;
}
for (const d of docs)
{
console.log(`Getting ${d.id}`)
let outlineDocument = await getOutlineDocument(d);
let newSearchEntry = {};
newSearchEntry.id = outlineDocument.id;
newSearchEntry.title = outlineDocument.title;
newSearchEntry.url = urlOutline + outlineDocument.url;
newSearchEntry.text = removeMarkdown(outlineDocument.text);
// Wait for each upsert so failures surface here and requests don't pile up
await client.collections('home').documents().upsert(newSearchEntry);
}
offset++;
}
}
loadOutline();
Amazingly, there is only one line for typesense:
client.collections('home').documents().upsert(newSearchEntry);
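Upserting one document per request is fine for a few hundred documents. For larger sets, the typesense-js client also offers documents().import(docs, { action: 'upsert' }), which takes an array. Below is a minimal sketch of batching entries before such a call. The chunk helper and the batch size of 100 are my own assumptions, and the import call is left commented out because it needs a running server:

```javascript
// Split an array into batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Stand-in entries shaped like the ones built in loadOutline().
const entries = Array.from({ length: 250 }, (_, i) => ({
  id: String(i),
  title: `Doc ${i}`,
  text: '',
  url: `https://outline.example.com/doc/${i}`,
}));

const batches = chunk(entries, 100);
console.log(batches.map((b) => b.length)); // [ 100, 100, 50 ]

// With a configured client, each batch could then be sent in one request:
// for (const batch of batches) {
//   await client.collections('home').documents().import(batch, { action: 'upsert' });
// }
```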
Before we create a search UI application, we can check everything is working from the command line.
curl "https://typesense.example.com/collections/home/documents/search?q=typesense&query_by=title&x-typesense-api-key=<API_KEY>"
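The same check can be scripted. The sketch below builds the search URL with the URL and URLSearchParams classes so that values are escaped properly, and it keeps the key out of the URL by sending it in the X-TYPESENSE-API-KEY header instead. buildSearchRequest is a hypothetical helper name, and the fetch call is left commented out because it needs a reachable server:

```javascript
// Hypothetical helper: assemble the URL and headers for a Typesense search.
function buildSearchRequest(baseUrl, collection, params, apiKey) {
  const path = `/collections/${encodeURIComponent(collection)}/documents/search`;
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return { url: url.toString(), headers: { 'X-TYPESENSE-API-KEY': apiKey } };
}

const request = buildSearchRequest(
  'https://typesense.example.com',
  'home',
  { q: 'typesense', query_by: 'title,text' },
  '<API_KEY>'
);
console.log(request.url);

// fetch(request.url, { headers: request.headers })
//   .then((r) => r.json())
//   .then(console.log);
```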
The typesense documentation is pretty helpful. Again I’m following it here… The folks over at Algolia have built and open-sourced Instantsearch.js, which is a collection of out-of-the-box UI components that you can use to build interactive search experiences quickly. Typesense have built an adapter that uses the same Instantsearch widgets, but sends the queries to Typesense instead. There is a basic javascript example (without using any package managers) available at https://github.com/typesense/typesense-instantsearch-demo-no-npm-yarn . I cloned this repo as the basis of a simple app. I haven’t yet even changed the title or page header…
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="theme-color" content="#000000">
<link rel="manifest" href="./manifest.webmanifest">
<link rel="shortcut icon" href="./favicon.png">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/instantsearch.css@7/themes/algolia-min.css">
<link rel="stylesheet" href="index.css">
<title>Typesense InstantSearch.js Demo</title>
</head>
<body>
<header class="header">
<h1 class="header-title">
<a href="/">Instant Search Demo</a>
</h1>
<p class="header-subtitle">
using
<a href="https://github.com/algolia/instantsearch.js">
Typesense + InstantSearch.js
</a>
</p>
</header>
<div class="container">
<div class="search-panel">
<div class="search-panel__results">
<div id="searchbox"></div>
<div id="hits"></div>
</div>
</div>
<div id="pagination"></div>
</div>
<script src="https://cdn.jsdelivr.net/npm/instantsearch.js@4.44.0"></script>
<script src="https://cdn.jsdelivr.net/npm/typesense-instantsearch-adapter@2/dist/typesense-instantsearch-adapter.min.js"></script>
<script>
// Adapted from https://github.com/typesense/typesense-instantsearch-demo-no-npm-yarn
// Search API parameters are at https://www.algolia.com/doc/api-reference/search-api-parameters/
function getQueryParam(param) {
const urlParams = new URLSearchParams(window.location.search);
return urlParams.get(param);
}
function getSearchParam() {
// URLSearchParams.get returns null (not undefined) when the parameter is missing
const param = getQueryParam('q');
return param === null ? '' : param;
}
const typesenseInstantsearchAdapter = new TypesenseInstantSearchAdapter({
server: {
apiKey: 'API-KEY', // Be sure to use an API key that only allows searches, in production
nodes: [
{
host: 'typesense.example.com',
port: '443',
protocol: 'https',
},
],
},
// The following parameters are directly passed to Typesense's search API endpoint.
// So you can pass any parameters supported by the search endpoint below.
// queryBy is required.
// filterBy is managed and overridden by InstantSearch.js. To set it, you want to use one of the filter widgets like refinementList or use the `configure` widget.
additionalSearchParameters: {
queryBy: 'title,text,url',
},
});
const searchClient = typesenseInstantsearchAdapter.searchClient;
const search = instantsearch({
searchClient,
indexName: 'home',
});
search.addWidgets([
instantsearch.widgets.searchBox({
container: '#searchbox',
}),
instantsearch.widgets.configure({
query: getQueryParam('q'),
distinct: 1,
attributeForDistinct: 'url',
hitsPerPage: 16,
}),
instantsearch.widgets.hits({
container: '#hits',
templates: {
item(item) {
return `
<div>
<div class="hit-name">
${item._highlightResult.title.value}
</div>
<div class="hit-text">
${item._snippetResult.text.value}
</div>
<a href="${item.url}">${item._highlightResult.url.value}</a>
</div>
`;
},
},
}),
instantsearch.widgets.pagination({
container: '#pagination',
}),
]);
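window.onload = () => {
const searchParam = getSearchParam();
// #searchbox is the container div; the text field is the input rendered inside it
const input = document.querySelector('#searchbox input');
if (searchParam && input) {
input.value = searchParam;
}
};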
search.start();
</script>
</body>
</html>
There are a couple of points to call out - firstly, we update the config to point to our server:
server: {
apiKey: 'API-KEY', // Be sure to use an API key that only allows searches, in production
nodes: [
{
host: 'typesense.example.com',
port: '443',
protocol: 'https',
},
],
},
additionalSearchParameters: {
queryBy: 'title,text,url',
},
And our collection:
const search = instantsearch({
searchClient,
indexName: 'home',
});
We use our field names for the results (hits) page:
instantsearch.widgets.hits({
container: '#hits',
templates: {
item(item) {
return `
<div>
<div class="hit-name">
${item._highlightResult.title.value}
</div>
<div class="hit-text">
${item._snippetResult.text.value}
</div>
<a href="${item.url}">${item._highlightResult.url.value}</a>
</div>
`;
},
},
}),
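The template assumes every hit carries highlight and snippet data for all three fields. A defensive variant, sketched below as plain functions so it can be tried outside the browser, falls back to the raw field when a highlight is missing and escapes that raw value before it goes into the HTML. The helper names are hypothetical and the hit object is hand-made for illustration:

```javascript
// Escape a raw value so it is safe to place inside HTML text or attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Prefer the highlighted value; otherwise fall back to the escaped raw field.
function highlighted(item, field, source = '_highlightResult') {
  const entry = item[source] && item[source][field];
  return entry && entry.value ? entry.value : escapeHtml(item[field] ?? '');
}

function renderHit(item) {
  return [
    '<div>',
    `<div class="hit-name">${highlighted(item, 'title')}</div>`,
    `<div class="hit-text">${highlighted(item, 'text', '_snippetResult')}</div>`,
    `<a href="${escapeHtml(item.url)}">${escapeHtml(item.url)}</a>`,
    '</div>',
  ].join('\n');
}

// Hand-made hit: only the title has highlight data.
const sampleHit = {
  title: 'Typesense <notes>',
  text: 'Plain text body',
  url: 'https://outline.example.com/doc/1',
  _highlightResult: { title: { value: '<mark>Typesense</mark> &lt;notes&gt;' } },
};
console.log(renderHit(sampleHit));
```

A function like renderHit could be returned from the item(item) template in place of the inline template string.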
There’s one last change in the code, for integrating with homepage: I wanted to be able to pass a search query using the standard ?q=search_term page url. To do this we need to get the q parameter from the URL, if it exists:
function getQueryParam(param) {
const urlParams = new URLSearchParams(window.location.search);
return urlParams.get(param);
}
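function getSearchParam() {
// URLSearchParams.get returns null (not undefined) when the parameter is missing
const param = getQueryParam('q');
return param === null ? '' : param;
}
window.onload = () => {
const searchParam = getSearchParam();
// #searchbox is the container div; the text field is the input rendered inside it
const input = document.querySelector('#searchbox input');
if (searchParam && input) {
input.value = searchParam;
}
};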
We can then use it to drive the query:
instantsearch.widgets.configure({
query: getQueryParam('q'),
distinct: 1,
attributeForDistinct: 'url',
hitsPerPage: 16,
}),
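One detail that is easy to trip over: URLSearchParams.get returns null, not undefined, when a parameter is absent. Here is a small sketch of the lookup as a pure function that takes the query string as an argument, so it can be tried in Node as well as the browser. The name queryFromSearch is hypothetical:

```javascript
// Return the value of a query-string parameter, or '' when it is absent.
function queryFromSearch(search, name = 'q') {
  const value = new URLSearchParams(search).get(name);
  return value === null ? '' : value.trim();
}

console.log(queryFromSearch('?q=typesense'));     // typesense
console.log(queryFromSearch('?q=home%20lab'));    // home lab
console.log(JSON.stringify(queryFromSearch(''))); // ""

// In the browser this would be called as queryFromSearch(window.location.search)
```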
I use homepage as my default browser start page and homepage has a search widget. We can easily enable our search engine with a little yaml:
widgets.yaml: |
  - search:
      provider: custom
      focus: true
      url: http://search.example.com/?q=
      target: _blank
      suggestionUrl: http://search.example.com/search/?q= # Optional
      showSearchSuggestions: true # Optional
The homepage search widget also supports suggestions. We can enable these with a small express json server. The homepage documentation provides the format needed in the response body for the URL given in suggestionUrl: the first entry of the array contains the search query, and the second is an array of the suggestions. In the example below, the search query was home.
[
"home",
[
"home depot",
"home depot near me",
"home equity loan",
"homeworkify",
"homedepot.com",
"homebase login",
"home depot credit card",
"home goods"
]
]
The code is essentially the typesense documentation javascript example with a little extra code to handle the ?q= url parameter and create the response json in the correct format.
const express = require('express');
const url = require('url');
const typesense = require('typesense')
const app = express();
const router = express.Router();
const port = 8080;
app.use('/', express.static('search'))
// start the server on the configured port (8080)
app.listen(port, () => console.log(`Listening on port ${port}`));
app.get('/search', function (req, res) {
var urlParts = url.parse(req.url, true);
var parameters = urlParts.query;
var q = '';
if (typeof(parameters.q) == 'undefined')
{
res.json({text: 'search parameter is required'});
return;
} else {
q = parameters.q;
}
let searchParameters = {
'q' : q,
'query_by' : 'title,text'
}
let client = new typesense.Client({
'nodes': [{
'host': 'typesense.example.com', // For Typesense Cloud use xxx.a1.typesense.net
'port': 443, // For Typesense Cloud use 443
'protocol': 'https' // For Typesense Cloud use https
}],
'apiKey': 'API-KEY',
'connectionTimeoutSeconds': 2
});
let searchResult = [];
let searchResultReturn = [];
client.collections('home')
.documents()
.search(searchParameters)
.then(function (searchResults) {
for (const h of searchResults.hits)
{
searchResult.push(
h.document.title
);
}
searchResultReturn.push(q);
searchResultReturn.push(searchResult);
res.json(searchResultReturn);
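})
.catch(function (error) {
// Return an empty suggestion list rather than leaving the request hanging
console.error(error);
res.status(500).json([q, []]);
});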
});
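The response-shaping step can be pulled out into a pure function, which makes it easy to check against the format homepage expects. This is a minimal sketch. The function name, the de-duplication and the limit of 8 are my own assumptions, and the result object is a hand-made stand-in for a Typesense search response:

```javascript
// Build the [query, [suggestions]] array from a Typesense-style result object.
function toSuggestionResponse(q, searchResults, limit = 8) {
  const hits = (searchResults && searchResults.hits) || [];
  const titles = [];
  for (const hit of hits) {
    const title = hit.document && hit.document.title;
    // Skip hits without a title and titles that were already added.
    if (title && !titles.includes(title)) titles.push(title);
    if (titles.length === limit) break;
  }
  return [q, titles];
}

// Hand-made stand-in for a search response.
const fakeResults = {
  hits: [
    { document: { title: 'Home network' } },
    { document: { title: 'Home network' } },
    { document: { title: 'Homelab backups' } },
  ],
};

console.log(JSON.stringify(toSuggestionResponse('home', fakeResults)));
// ["home",["Home network","Homelab backups"]]
```

Inside the route handler, the loop and the two push calls could then collapse to res.json(toSuggestionResponse(q, searchResults)).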
Using the docker compose example above, it was straightforward to deploy a basic typesense kubernetes instance.
The only tricky part was getting the correct statement for the container command. This seems to be working for me:
command: ['/opt/typesense-server', '--data-dir', '/data', '--api-key', 'API-KEY', '--enable-cors']
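The difference from the compose file is that a Kubernetes command replaces the image entrypoint, so the binary path has to be included and each argument becomes a separate list item. Below is a small sketch of that mapping, using a hypothetical typesenseCommand helper. The flags are only the ones used in this post:

```javascript
// Hypothetical helper: build the container command as a list of arguments.
function typesenseCommand({ dataDir, apiKey, enableCors = false }) {
  const args = ['/opt/typesense-server', '--data-dir', dataDir, '--api-key', apiKey];
  if (enableCors) args.push('--enable-cors');
  return args;
}

const command = typesenseCommand({ dataDir: '/data', apiKey: 'API-KEY', enableCors: true });
console.log(JSON.stringify(command));
```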
Full yaml:
---
apiVersion: v1
kind: Service
metadata:
  name: typesense
  namespace: search
spec:
  selector:
    app: typesense
  ports:
    - protocol: TCP
      port: 8108
      targetPort: 8108
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: typesense
  namespace: search
  labels:
    app: typesense
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: typesense
  template:
    metadata:
      labels:
        app: typesense
    spec:
      containers:
        - name: typesense
          image: typesense/typesense:27.1
          command: ['/opt/typesense-server', '--data-dir', '/data', '--api-key', 'xyz', '--enable-cors']
          ports:
            - containerPort: 8108
          volumeMounts:
            - name: typesense
              mountPath: /data # must match --data-dir so the index lands on the PVC
      volumes:
        - name: typesense
          persistentVolumeClaim:
            claimName: typesense
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: typesense
  namespace: search
  labels:
    app: typesense
spec:
  storageClassName: ceph-block
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: typesense
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`typesense.example.com`)
      kind: Rule
      services:
        - name: typesense
          port: 8108
That’s it for now. I have a few improvements to make, such as: creating a CronJob to update the search data, tidying up the search UI results page, adding more content to the search index and investigating a high-availability deployment for typesense, but this is good for now!