Coryssa gets a search makeover


rasiel

I'm happy to announce that a significant improvement to the search code on Coryssa.org was posted a few hours ago and is now available for everyone to try out.

The main complaints users had been running into were fruitless searches, whether because they got few hits or because they were flooded with irrelevant ones. I did my best to steer people away from over-reliance on text search (as that chucks out most of the advantages of an indexed database) but, hey, I realize that swimming against the current isn't the best method for winning over converts.

So for those of you who do your magic in a text box, I think you will be very pleased with the new code. In addition to the old "natural language" method of search (yeah, it does kinda suck) you now have these new features:

  • Find exact text strings by placing them within quotes. Before, if you typed a common two-word pattern like RIC 123, the search would retrieve all records containing either or both of these words. Obviously, this can drown you in low-relevancy results. By searching for "RIC 123" you will now find only records where RIC is followed by 123, which should greatly aid in getting you the results you actually meant.
  • Support for Boolean operators. Use of AND, OR and NOT in between your target text is a basic but effective way to filter results. For example, typing: gold AND stater NOT silver  <-- this is pretty much self-explanatory
  • Support for wildcards. This one is especially handy for those cases when your keyword is spelled twenty different ways. Use a question mark as a stand-in for any one character and an asterisk to mean any string of letters immediately following. So, for example: si?il* will find Sicily, Sicilia, Sizilien, etc.
  • Mix of filters, dates and field restrictions. The control panel now allows for multiple simultaneous filters to augment your text search. You can go wild with this! Say you vaguely remember a Julius Caesar denarius selling ten years ago for around $500 but you can't recall any details on the description. Maybe it was written in French. Alright, so you cast a wide net on the search string with a clever "jul* caesar" den* (this covers spellings Julius and Jules along with denarius and denier) but since you anticipate this will yield an avalanche of hits you can now additionally specify a date range between 2010 and 2020 and also a weight range between 3 and 4 grams. You get the idea :- )
  • New "search within search" tickbox lets you run a new search that limits the search space to the current set of results. This is useful in cases where you were looking for Justinian folles but your initial search dumped in a bunch of coins of Justin. Rather than start afresh you can just make a secondary search using a NOT Justin. Less hay to make the needle easier to find!
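
For anyone curious how the wildcard rule behaves, here is a rough sketch in Python. It is not the site's code: it leans on Python's standard fnmatch module, where ? stands for exactly one character and * for any run of characters (slightly broader than "any string of letters"), and the word list is made up for illustration.

```python
import fnmatch

def wildcard_hits(pattern, words):
    """Return the words matching a ?/* wildcard pattern, ignoring case."""
    pat = pattern.lower()
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pat)]

# Hypothetical sample vocabulary, not taken from the database.
words = ["Sicily", "Sicilia", "Sizilien", "Sicyon", "Silver"]

print(wildcard_hits("si?il*", words))
# prints ['Sicily', 'Sicilia', 'Sizilien']
```

Sicyon and Silver drop out because the letters "il" do not sit in the fourth and fifth positions, which is what the single ? pins down.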

Two more options are available if you're willing to make an account. They sit behind a login because they put more strain on the server, and requiring an account should help cut down on bot-generated searches.

  • Pattern matching: You can now get really creative with your searchcraft by adding leading wildcards and concatenating keywords. Honestly, I haven't begun to scratch the surface here and haven't played around with complex queries, but a basic outline would be something like ?obol -(owl athen*) which would find records containing Obol, Diobol, Hemi-obol, etc. but omit any that contain the string "owl" or words beginning with "athen" (Athens, Athena, etc.)
  • Regular Expressions: This right here is the ultimate weapon in data search. Regex is a set of syntax protocols generally used by programmers to parse user input into a database - or to retrieve data according to a list of rather cryptic parameters. A bit intimidating for beginners but a must-learn for researchers. Once you get the hang of it you'll probably never go back to any of the other methods described above. You activate this feature by enclosing your search in backslashes. Even a basic overview is beyond the scope here but, to give you just a glimpse, entering \SNG\s+(COP|Copenhagen)\s+([1-8]\d\d|9[0-4]\d)\D*\ will find all instances of the string SNG COP or SNG Copenhagen followed by any number from 100 through 949 (yyyiiiikes!!)
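
If you want to poke at that expression before trying it on the site, a minimal sketch in Python follows. It assumes the enclosing backslashes are only the site's delimiters and so leaves them off, and it uses Python's re engine, which may differ from whatever the server runs (case handling in particular). The sample strings are invented.

```python
import re

# The pattern from the post, minus the enclosing backslashes
# (treating those as delimiters is an assumption).
PATTERN = re.compile(r"SNG\s+(COP|Copenhagen)\s+([1-8]\d\d|9[0-4]\d)\D*")

def sng_number(text):
    """Return the matched catalogue number as an int, or None if no match."""
    m = PATTERN.search(text)
    return int(m.group(2)) if m else None

# Hypothetical description snippets.
for sample in ["SNG Copenhagen 345", "SNG COP 949",
               "SNG Copenhagen 950", "SNG Copenhagen 99"]:
    print(sample, "->", sng_number(sample))
# prints:
# SNG Copenhagen 345 -> 345
# SNG COP 949 -> 949
# SNG Copenhagen 950 -> None
# SNG Copenhagen 99 -> None
```

One thing the sketch turns up: in Python, "SNG Copenhagen 1000" also matches, because the first three digits (100) satisfy the pattern and \D* is allowed to match nothing. If four-digit numbers matter, the expression would need some kind of boundary after the digits; whether the site's engine behaves the same way is untested here.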

Now that we have a mature set of tools for data retrieval it's time to work on the other subsystems. The programmer has a good-sized list of items to keep him busy for months but I'm happy to take feedback and refocus efforts where it's most needed. Bugs, new features, redo of this or that... share your own wishlist!

Rasiel

Posted · Supporter
2 minutes ago, Rand said:

Thank you!

Checking ... AND Boolean operator increased not decreased the number of identified coins. What could be the issue?

Now it's picking up all listings with "solidus" in addition to "anastasius"?

Posted · Supporter

Thanks @rasiel for the improvements! I will be making use of them. I was just using the site a few minutes ago before you posted looking for past sales of a particularly rare coin. Works great! 👍

17 minutes ago, Rand said:

Checking ... AND Boolean operator increased not decreased the number of identified coins. What could be the issue?

The boolean search syntax is a bit unusual (click on "Query Help" below the search box) ... What you want is: +anastasius +solidus

It seems that "boolean mode" doesn't use boolean expressions, but rather is an implicit AND of a bunch of terms prefixed by '+' (must include) or '-' (must exclude), and may include parenthesized lists of items which are implicitly OR'd.

e.g.

+(foo bar) -(cat dog)  is equivalent to (foo OR bar) AND NOT (cat OR dog)

I think.
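
To make that reading concrete, here is a toy model in Python. It is only a guess at the behaviour, not the site's implementation: the function names are made up, matching is on whole lowercase words, nesting and wildcards are ignored, and terms with no prefix are assumed to be optional (a record needs at least one of them when nothing is marked '+'), which would explain why typing AND widened the results.

```python
import re

# One clause: optional sign, then either a parenthesized list or a bare word.
TOKEN = re.compile(r"([+-]?)(?:\(([^)]*)\)|(\S+))")

def parse(query):
    """Split a query into (sign, [terms]) clauses; sign is '+', '-' or ''."""
    clauses = []
    for sign, group, word in TOKEN.findall(query):
        terms = (group or word).lower().split()
        if terms:
            clauses.append((sign, terms))
    return clauses

def matches(query, text):
    """Toy check of whether a record's text satisfies a +/- style query."""
    words = set(re.findall(r"\w+", text.lower()))
    has_required = False
    optional_hit = False
    for sign, terms in parse(query):
        hit = any(t in words for t in terms)  # terms in a clause are OR'd
        if sign == "+":
            has_required = True
            if not hit:
                return False   # a must-include clause is missing
        elif sign == "-":
            if hit:
                return False   # a must-exclude clause is present
        else:
            optional_hit = optional_hit or hit
    return has_required or optional_hit

record = "Justinian I, AV solidus"  # hypothetical listing text
print(matches("anastasius AND solidus", record))  # prints True
print(matches("+anastasius +solidus", record))    # prints False
```

Under these assumptions the bare query treats AND as just another optional word, so any listing with "solidus" gets through, while the '+' form insists on both names.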
