11 Nov 2020

How semantic search/automated patent analysis tools can save researchers time

Co-published

Time is a critical factor in prior art searches. Research professionals can search by keyword, class or citation to compile an exhaustive list of prior art relevant to the invention claimed in the disclosure. However, semantic searches – a concept popularised by Google in the 2010s – can be effective and time-saving, as they combine a research professional’s subject-matter expertise with an algorithm that learns how to break down complex queries into manageable chunks from which it can infer the intended meaning.

What are semantic searches?

A semantic search is an automated search technique that relies on a search engine’s ability to consider the contextual meaning of search phrases and the intention of the user, and to rank results according to how well they match the concepts in the query. Semantic searching can help a user to dig through information and find connections that they might not otherwise have realised existed. The basic difference between keyword searching and semantic searching is that semantic searching delivers results based on the concepts behind the keywords used, rather than on exact keyword matching. Further, semantic searching does not merely count the repetition of keywords and measure the proximity of terms as keyword searches do, but rather uses AI to predict and understand the contextual meaning of query phrases.
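The difference between the two approaches can be illustrated with a minimal sketch. The concept table below is a hand-written assumption used only for illustration; real engines learn such relationships from large bodies of text, and none of the names here come from any actual search tool.

```python
import re

# Hypothetical concept map: each surface term points to a shared concept label.
# A real engine learns these relationships; this small table is an assumption.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "vehicle": "vehicle",
    "brake": "braking",
    "braking": "braking",
    "decelerate": "braking",
    "decelerating": "braking",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_match(query, document):
    """Exact keyword search: every query term must appear verbatim."""
    doc_terms = set(tokenize(document))
    return all(term in doc_terms for term in tokenize(query))


def to_concepts(text):
    """Map each token to its concept label, dropping unknown tokens."""
    return {CONCEPTS[t] for t in tokenize(text) if t in CONCEPTS}


def concept_match(query, document):
    """Concept search: every query concept must be among the document's concepts."""
    query_concepts = to_concepts(query)
    return bool(query_concepts) and query_concepts <= to_concepts(document)


document = "An apparatus for decelerating an automobile"
query = "car brake"
print(keyword_match(query, document))  # False
print(concept_match(query, document))  # True
```

The keyword search misses the document because neither "car" nor "brake" appears in it, whereas the concept search finds it because both texts map to the same two concepts.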

Sometimes prior art is intentionally written in a vague or unclear manner – in such cases, semantic searching tools can quickly reveal hidden information. Semantic searching can prove extremely useful when a researcher has extensive knowledge of the domain and market usability. It can rightly be said that, with semantic searching, what a researcher loses in terms of transparency (ie, trusting AI algorithms without knowing the logic behind them) and control (ie, no authority over how a query is constructed), they gain in terms of the intrinsic relationship extracted between the concept and the documents.

There are several prior art analysis tools available in third-party search databases, which allow semantic search engines to construct a conceptually relevant list of prior art. A searcher inputs claims or specific claim elements, abstracts, phrases or the subject patent number as a query and the search engine establishes a relationship between this and what exists in the body of prior art, before delivering a list of conceptually appropriate prior art documents.

Overview of semantic searching in different prior art analysis tools

PatSeer
ReleSense, the NLP text-processing engine used by PatSeer, has been trained on scientific and patent literature using more than 12 million semantic rules. The engine constantly learns from publicly available patents, scientific journals, clinical trials and associated data sources.

Further, PatSeer offers a semantic search suggester as an icon adjacent to the text-based search fields in the quick search format. It takes the text currently entered in the field as input and looks up an intrinsic semantic index for related terms. The database then offers a list of associated technical terms in a dropdown list, which can be selected or copied for inclusion in the search query.
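The lookup step of such a suggester can be pictured as follows. This is only a sketch under stated assumptions: the index contents and the function name `suggest_terms` are hypothetical and do not reflect how PatSeer stores or queries its semantic index.

```python
# Hypothetical semantic index: term -> related technical terms.
# The entries are illustrative assumptions, not data from any real tool.
SEMANTIC_INDEX = {
    "battery": ["accumulator", "electrochemical cell", "energy storage device"],
    "antenna": ["aerial", "radiating element"],
}


def suggest_terms(field_text, index=SEMANTIC_INDEX):
    """Return related terms for each indexed word currently in the search field."""
    suggestions = []
    for word in field_text.lower().split():
        for related in index.get(word, []):
            if related not in suggestions:  # keep order, skip duplicates
                suggestions.append(related)
    return suggestions


print(suggest_terms("Battery antenna"))
```

The user would then pick terms from the returned list to add to the query.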

Questel Orbit
The semantic searching offered by the Questel Orbit database lets a user input freely entered text phrases to find relevant similar patent families. The different types of semantic searching offered by the database include the following:

  • A free-form text search, which allows the user to paste a text excerpt or freely enter a technical description of the disclosure to find relevant prior art. At least one paragraph of text is required to run this type of search. The field also supports the use of Google Translate for users who wish to work in a language other than English.
  • Concept selection, which works by entering a keyword and running a search. It is then possible to drill down into the displayed concept tree to select the concepts relevant to the query. Concept selection then leads to a list of relevant families ordered by relevance score.
  • A similarity search, which can be run on the list of selected relevant families, using International Patent Classification and Cooperative Patent Classification codes, as well as concepts, to find similar patents.

PatSnap
PatSnap’s Semantic Search can be an efficient tool for finding conceptually corresponding patents based on a portion of text or a provided application/publication number across a database of approximately 142 million patent documents.

The underlying algorithm rapidly calculates a similarity score for each patent, which is then used to return a list of the 1,000 most relevant patents.
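Once every patent has a score, producing a capped result list is a simple selection step. The sketch below assumes the scores already exist; how PatSnap actually computes them is not described here, and the patent identifiers are placeholders.

```python
import heapq


def top_matches(scores, limit=1000):
    """Return (patent_id, score) pairs for the `limit` highest scores, best first."""
    return heapq.nlargest(limit, scores.items(), key=lambda item: item[1])


# Placeholder identifiers and scores, for illustration only.
scores = {"US-A": 0.91, "EP-B": 0.47, "JP-C": 0.78, "CN-D": 0.12}
print(top_matches(scores, limit=2))  # [('US-A', 0.91), ('JP-C', 0.78)]
```

Using a bounded selection rather than sorting the whole collection keeps the step cheap when the database is much larger than the result limit.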

PatBase
Accessible via the search menu, PatBase’s Semantic Search is a recently introduced feature that extracts a list of concepts from the searched text and allows users to select or deselect concepts from this list, as well as add new ones. Based on the chosen concepts, the database unlocks information in the full text of millions of patent documents and retrieves relevant patent families, before allowing the user to narrow down the list based on technological areas of interest.

Ambercite
Often mistakenly grouped with semantics-based search engines, Ambercite differs from other similarity-based algorithms in that it uses citation analytics, rather than text, to compile a list of potentially relevant documents.
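To show what a citation-based notion of relevance can look like, the sketch below ranks patents by how many cited documents they share with a seed patent. This is a generic overlap measure chosen for illustration, not Ambercite's algorithm, which is not described in this article; the citation graph is invented.

```python
# Hypothetical citation graph: patent -> set of documents it cites.
CITES = {
    "P1": {"A", "B", "C"},
    "P2": {"B", "C", "D"},
    "P3": {"D", "E"},
    "P4": {"A", "B", "C", "F"},
}


def citation_overlap(seed, candidate, cites=CITES):
    """Jaccard overlap between the citation sets of two patents."""
    a, b = cites[seed], cites[candidate]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_by_citation(seed, cites=CITES):
    """Rank all other patents by citation overlap with the seed, best first."""
    others = [p for p in cites if p != seed]
    return sorted(others, key=lambda p: citation_overlap(seed, p, cites), reverse=True)


print(related_by_citation("P1"))  # ['P4', 'P2', 'P3']
```

No text is compared at any point, which is the key contrast with the semantic tools discussed elsewhere in this overview.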

Patentcloud
InQuartik’s machine-learning-powered semantic search engine, Patentcloud, lets users copy and paste specific claims, abstracts, summaries and sections of the disclosure, or simply enter a query in natural language, to obtain a list of closely related results sorted by relevance. The database has an extensive library of millions of patent documents from patent offices including the USPTO, China National IP Administration, Japan Patent Office, EPO, WIPO and Korea Intellectual Property Office. The content of the search query is converted into vectors, which are compared against the pre-vectorised database so that the similarity of the query is calculated against the entire patent document collection.
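The vector comparison described above can be sketched in a few lines. The example uses plain word counts and cosine similarity as stand-ins, which is an assumption made for illustration: the actual vector model and similarity measure used by the tool are not disclosed here, and the documents are invented.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-count vector (a simple bag-of-words stand-in)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Documents are vectorised once, ahead of time, so only the query is processed at search time.
DOCUMENTS = {
    "doc-1": "A lithium battery with a solid electrolyte",
    "doc-2": "A foldable bicycle frame with a locking hinge",
    "doc-3": "Solid electrolyte composition for a rechargeable battery",
}
DOC_VECTORS = {doc_id: vectorize(text) for doc_id, text in DOCUMENTS.items()}


def search(query):
    """Score the query vector against every stored vector, most similar first."""
    q = vectorize(query)
    scored = [(doc_id, cosine_similarity(q, vec)) for doc_id, vec in DOC_VECTORS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for doc_id, score in search("battery using a solid electrolyte"):
    print(doc_id, round(score, 3))
```

Word counts only capture shared vocabulary; a learned embedding would also place differently worded but related texts close together, which is what gives a semantic engine its advantage.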

The engine combines the latest machine-learning technology with a regularly updated patent database of millions of documents to provide a broad list of relevant patent documents.

Further, the database provides a ‘more like’ feature for every search result. The researcher can select a patent from the list of results and press the ‘more like’ button, which re-orders the result list based on similarity to the selected patent document. Another distinguishing feature is that pressing this button does not alter the original query, but rather focuses it to narrow down the similar results.
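The re-ordering behaviour can be sketched as a sort over the existing results, leaving the query untouched. The term-overlap measure and the function name `more_like` are illustrative assumptions; the similarity measure the tool really uses is not stated in this article.

```python
def terms(text):
    """Split text into a set of lower-case words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of terms two sets have in common; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def more_like(results, selected_id):
    """Re-order an existing result list by similarity to one selected document.

    `results` is a list of (doc_id, text) pairs. The input list and the
    original query are left unchanged; a new ordered list is returned.
    """
    selected_terms = terms(dict(results)[selected_id])
    return sorted(results, key=lambda r: jaccard(selected_terms, terms(r[1])), reverse=True)


# Invented result list, for illustration only.
results = [
    ("doc-1", "drone with foldable rotor arms"),
    ("doc-2", "camera gimbal for aerial imaging"),
    ("doc-3", "stabilised camera gimbal for a drone"),
]
reordered = more_like(results, "doc-2")
print([doc_id for doc_id, _ in reordered])  # ['doc-2', 'doc-3', 'doc-1']
```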

Comment

Semantic searching performed using any of these tools can serve as an efficient and easy way to access relevant prior art, providing researchers with a technical edge. Further, the choice of tool can be based on the relevance of the results provided by the database, as well as the limit on their number, making it more convenient for the researcher to zero in on the final list of potentially relevant prior art.

Useful Links:

https://help.patsnap.com/hc/en-us/articles/210698769-What-Is-Semantic-Search-How-Does-It-Work-

https://www.patbase.com/pb201610.pdf

https://minesoft.com/2016/11/28/minesoft-release-semantic-search-module-patbase/

https://www.ambercite.com/amberblog/2019/06/11/ambercite-patent-office-assessment

https://www.inquartik.com/inq-semantic-search/

https://patseer.com/2019/04/semantic-patent-search-analysis-relesense/

http://www.gtuipr.gtu.ac.in/PDF/Annexure%20-PatSeer%20User%20Manual.pdf

For further information contact:

Rahul Bhattacharya
Effectual Knowledge Services Pvt Ltd

This is a co-published article whose content has not been commissioned or written by the IAM editorial team, but which has been proofed and edited to run in accordance with the IAM style guide.