Legal tech and biotech patents
This is an Insight article, written by a selected partner as part of IAM's co-published content.
Patents are a true form of capital: nice to have, hard to get and looking at others will rarely make you happy. Part of this equation is the impenetrable ‘patentese’ that one must embrace while drafting and analysing patents and patent applications. Is there a shortcut?
Heralded as a revolution in the legal industry, legal tech’s advent could transform the way that practitioners deal with legal documents. Could some of these solutions ease the burden of drafting and analysing patents and patent applications, especially in the biotech arena? This chapter explores the promise of legal tech in three fields:
- small tools;
- patent landscape reports; and
- AI.
The promise of legal tech
‘Legal tech’ is the neologism for any technology used in the legal industry. In its broadest sense, it can refer to mundane software applications such as docketing systems used in daily practice. Most commonly, however, the term is reserved for specialised solutions dedicated to specific legal tasks. Legal-tech software has, for example, been developed to determine the parties to an agreement and check signatures for a large collection of documents in a matter of seconds.
In recent years, legal tech has been all the rage and almost every major law firm has jumped on the bandwagon. Behind the podcasts and the white papers lies the realisation that it could have the power to upend a traditionally conservative industry. The number of legal-tech patent applications filed worldwide has followed suit, rising from 831 in 2017 to more than 1,025 in 2018 and up to 1,369 in 2019 (Thomson Reuters).
The full impact of the legal-tech revolution remains to be seen. While its believers enthusiastically foretell the disruption of the white collar industry, sceptics point to the need for human intervention to deal with legal nuance, complexity and liability. Taking a cautious stance, it is safe to say that any task requiring patterned or repetitive analyses could benefit from legal tech.
What about legal tech in patent law?
Today, most highly skilled patent practitioners regularly perform repetitive, algorithmic or mind-numbing tasks that could easily be automated, at least in principle. Beyond these quick wins, the question arises of whether legal tech, and particularly AI, could be used to interpret the language of patents or patent applications. There is no reason why not. For all its apparent shortcomings, patentese is a highly patterned and verbose language, and reading it seems to be a teachable and repetitive task.
While the promise exists, few advanced legal-tech solutions are found in daily practice beyond ‘vanilla’ office suites or docketing systems. Notwithstanding the issues regarding development and widespread adoption, there are already numerous examples of, and experiments with, tech-driven solutions that could prove useful in daily practice.
At one end of the spectrum, legal tech can be used to automate any task involving only simple repetition and no decision making. Outsourcing such tasks to a machine is straightforward from an IT perspective, as it does not require the training of an artificial agent that is typical in AI. All assumptions and decisions ultimately lie with the human patent practitioner.
Patent drafting offers many examples of tasks that are still performed by hand but could be automated. Long, explicit lists of parameter values (60.0%, 60.1% and so on) could be generated at the click of a button via a text-editor plug-in. Functional, extended clipboards could be used to recycle boilerplate passages and make it easier to write combination embodiments. Reformatting R&D reports into patent-ready experimental sections often involves repeated, simple operations that could be programmed.
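As an illustration of the parameter-list example, the following is a minimal Python sketch rather than an existing plug-in. The function names `parameter_values` and `as_prose` are hypothetical, and the sketch assumes that the drafter supplies the start value, end value and step as text so that the number of decimals is preserved.

```python
from decimal import Decimal


def parameter_values(start: str, stop: str, step: str, unit: str = "%") -> list[str]:
    """Return every value from start to stop (inclusive) as formatted strings."""
    low, high, increment = Decimal(start), Decimal(stop), Decimal(step)
    if increment <= 0:
        raise ValueError("step must be positive")
    values = []
    current = low
    while current <= high:
        # Decimal arithmetic is exact and keeps trailing zeros, so 60.0 stays "60.0".
        values.append(f"{current}{unit}")
        current += increment
    return values


def as_prose(values: list[str]) -> str:
    """Join values as they would appear in a description: 'a, b and c'."""
    if len(values) < 2:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


if __name__ == "__main__":
    print(as_prose(parameter_values("60.0", "60.5", "0.1")))
```

Using `Decimal` rather than floating-point numbers avoids rounding artefacts such as 60.300000000000004 creeping into the text of an application.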
Sequence listings are another field in which quick wins can be made. If a patent application discloses a biological sequence such as a nucleic acid or a protein, a separate file containing a list of the complete sequences is mandated in most jurisdictions. In the claims and the description of an application, the complete sequence is replaced by references to the listing (eg, SEQ ID NO).
Currently, the common standard for sequence listings is the WIPO’s ST.25. Generating these files is often a hassle and the PatentIn and BiSSAP tools provided by the USPTO and the EPO respectively do not provide a smooth ride, especially when dealing with non-natural residues or long sequence listings.
The new XML-based ST.26 standard applies to applications filed on or after 1 July 2022. The WIPO has taken this opportunity to release a new set of tools called ‘WIPO Sequence’ and ‘WIPO Sequence Validator’ to alleviate the existing friction, even dedicating several workshops to their use. At first glance, these tools appear simple, powerful and user-friendly. However, a major problem remains. Sequence listings are not stand-alone documents; they are linked to the claims and the description of a patent application. Often, sequence listings are generated at the end of the drafting process based on the text of the application, to avoid any misnumbering. This process could be simplified by a legal-tech tool that truly links an application with its sequence listing. One could easily draw inspiration from the myriad tools available for managing scientific bibliographies.
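The bibliography-manager idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a feature of WIPO Sequence: the function `renumber_by_first_use` is hypothetical, references are assumed to be written as "SEQ ID NO: 7" or "SEQ ID NO 7", and grouped references such as "SEQ ID NOs: 1 and 2" are not handled.

```python
import re

# Matches "SEQ ID NO: 7" and "SEQ ID NO 7" (assumed citation styles).
SEQ_REF = re.compile(r"SEQ ID NO:?\s*(\d+)")


def renumber_by_first_use(text: str, sequences: dict[int, str]) -> tuple[str, list[str]]:
    """Renumber SEQ ID NO references in order of first appearance in the text.

    `sequences` maps the drafter's working numbers to residue strings.
    Returns the updated text and the sequences in their new listing order.
    Sequences that are never cited in the text are left out of the listing.
    """
    new_number: dict[int, int] = {}
    for match in SEQ_REF.finditer(text):
        old = int(match.group(1))
        if old not in sequences:
            raise KeyError(f"SEQ ID NO: {old} is cited but not defined")
        # Assign the next free number the first time a sequence is cited.
        new_number.setdefault(old, len(new_number) + 1)

    def replace(match: re.Match) -> str:
        return f"SEQ ID NO: {new_number[int(match.group(1))]}"

    updated = SEQ_REF.sub(replace, text)
    ordered = [sequences[old] for old in new_number]  # dicts keep insertion order
    return updated, ordered


if __name__ == "__main__":
    draft = "The peptide of SEQ ID NO: 7 binds the antibody of SEQ ID NO: 2."
    working = {2: "EVQLVESGG", 7: "MKTAYIAK"}  # toy residues, for illustration only
    print(renumber_by_first_use(draft, working))
```

Because the text and the listing are derived from the same mapping, a sequence added or removed late in the drafting process cannot leave the two out of step.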
Whether it involves parameter lists or sequence listings, mindless, ceaseless button-clicking is the modern office’s equivalent of the cogwheels of yore. Anyone involved in drafting a patent application has contemplated the idea that some of those clicks could be automated, soon realising that they lack the time or experience to do so, before moving on to click once more. Similar issues pop up during examination and prosecution.
Although some commercial legal-tech tools have already been developed to help with patent drafting, few have gained much traction. There is clearly an untapped market. Whether to develop such tools in-house or subscribe to a third-party service is an important consideration. On the one hand, the necessary IT expertise may not be found in the ranks of patent practitioners. On the other hand, keeping development close might be beneficial, as interoperability with existing applications is crucial both for the effectiveness of the tools and for trust in them.
The bottom line is that any repetitive task should prompt the question of whether it is worth investing effort in finding a suitable legal-tech tool. While over-engineering a one-off menial task should be avoided, eliminating repetition will save money and reduce errors in the long run.
Searching for patent publications in a database and analysing the results is another common example of legal tech. A typical application is a patent landscape report, which should address a particular question about the patent situation of a given technology. The specific question varies greatly and may range from providing a high-level snapshot to giving partial freedom-to-operate or patentability advice.
The role of the patent practitioner is to combine the results of several searches and analyses in a shrewd way to answer the question. Legal tech is used to execute these searches and analyses on vast data sets but not to make intelligent decisions.
An interesting example can be found in “Data Mining Patented Antibody Sequences” (Konrad et al, Mabs, volume 13(1), 2021). The authors of this article try to answer the question of whether patent literature can be used to map the field of therapeutically relevant antibodies. This is far from evident, as patents “are not designed to convey scientific knowledge, but rather legal protection”. Clearly, this question can be answered only by performing a patent search in public databases followed by a bulk analysis of the results.
In this particular case, a naive sequence search in public registers of the WIPO and the USPTO led to 16,526 patent families disclosing antibody sequences, totalling 245,109 unique antibody chains. Based on the Cooperative Patent Classification (CPC), it was found that most antibody chains were disclosed for medicinal purposes. Then, a list of therapeutic antibodies in clinical use together with their targets was compiled as a separate list from a number of databases (eg, the World Health Organisation’s lists of international non-proprietary names, the international ImMunoGeneTics information system, the Antibody Society and Thera-SAbDab). Cross-linking all these lists, the patent landscape was mapped to the clinical landscape, showing a significant overlap in terms of antibody sequence and target identity.
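The cross-linking step can be pictured with a small Python sketch. It is not the authors' pipeline: it assumes exact matching of chain sequences after trivial normalisation, whereas a real analysis would more plausibly rely on sequence identity thresholds. The functions `normalise` and `cross_link`, the family identifiers and the antibody names are all hypothetical, and the residues are toy data.

```python
def normalise(sequence: str) -> str:
    """Remove whitespace and upper-case so formatting differences do not block a match."""
    return "".join(sequence.split()).upper()


def cross_link(
    patent_chains: dict[str, set[str]],
    clinical_antibodies: dict[str, str],
) -> dict[str, list[str]]:
    """Map each clinical antibody to the patent families disclosing its exact chain.

    patent_chains: patent family identifier -> chain sequences disclosed in it
    clinical_antibodies: antibody name -> chain sequence
    """
    # Invert the patent data once: sequence -> families disclosing it.
    index: dict[str, set[str]] = {}
    for family, chains in patent_chains.items():
        for chain in chains:
            index.setdefault(normalise(chain), set()).add(family)
    return {
        name: sorted(index.get(normalise(chain), set()))
        for name, chain in clinical_antibodies.items()
    }


if __name__ == "__main__":
    families = {
        "FAM-001": {"EVQLVESGGGLVQ", "DIQMTQSPSSLSA"},
        "FAM-002": {"evqlvesggglvq"},
        "FAM-003": {"QVQLQQSGAELVR"},
    }
    clinical = {"mab-A": "EVQLVESGGGLVQ", "mab-B": "AAAA"}
    print(cross_link(families, clinical))
```

An empty list for an antibody is itself informative: it flags either a gap in the patent data or a sequence that only matches approximately.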
A similar approach could be used for many patent landscape reports. Regardless of the specific form, a few issues must be resolved at the start. What question needs to be answered? Which data are needed to answer that question? Are metadata such as title, abstract and patent classification sufficient, or is some full-text analysis needed? Which database provides these data in a suitable format and at what cost? In the Konrad article, for example, the EPO’s databases were used only to collect metadata via the Open Patent Services API, as bulk retrieval of sequence listings is not free.
Identifying which approximations can be made is arguably the most important issue. For example, is the patent classification a good filter or should the abstract be checked? Whereas crude yardsticks may be enough for a run-of-the-mill landscape report, they may be unacceptable for partial freedom-to-operate analyses or patentability advice. Taking all these issues and approximations into account, the right legal-tech tool should be chosen. If a blunt and absolute filter on patent classification is ill-advised, for example, AI might be an intelligent alternative.
The most advanced legal tech in patent law is probably the AI used by commercial participants and patent offices to search the patent literature. Their search algorithms have been trained to retrieve relevant patent publications for a given search query more effectively and efficiently than conventional searches.
AI is defined broadly according to Andreas Kaplan as “a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation”. As such, AI encompasses a plethora of techniques such as machine learning, deep learning, Bayesian networks and evolutionary algorithms to solve problems once considered accessible to human intelligence alone. When the artificial agent AlphaGo defeated a human player in the board game Go in 2015, it fuelled popular interest in AI, auguring a legal-tech revolution.
The EPO is a key actor when it comes to AI-assisted searches. It has developed an in-house searching algorithm based on a number of open-source and proprietary software libraries, and trained in large part on an immense database of historical search results. This system is used for various tasks, including text and image-based prior art searches. Although the inner workings of this system are not publicly known, it is clear that the EPO is aware of the reputation of its search reports and monitors the quality of the AI-assisted searches closely.
The USPTO has developed a similar AI tool to search the prior art and even suggest better search strategies. After achieving promising results with its beta version in March 2020, it is now taking steps to use the tool on a broader scale.
In April 2020, the UK Intellectual Property Office (UKIPO) published an insightful report on the use of AI in prior art searches. It evaluated its own efforts in developing AI-assisted searches and patent classifications. Although limited to English prior art searches in some non-biotech fields, it concluded that no fully automatic algorithm can currently do all the tasks of a patent office’s search department. Most notably, setting up a search query is a task for which a human and technically skilled patent examiner is still required. On the other hand, the search itself and several parts of the subsequent analysis such as ranking documents, suggesting synonyms and classifications, and clustering and visualising hits can be automated with good results.
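To give a feel for the ranking step, the sketch below scores documents against a query using classical TF-IDF weighting and cosine similarity. This is a textbook technique chosen for brevity, not the trained models used by any patent office. The function names and the three toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Return (document id, score) pairs, best match first."""
    doc_counts = {doc_id: Counter(tokenise(text)) for doc_id, text in documents.items()}
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    df: Counter = Counter()
    for counts in doc_counts.values():
        df.update(counts.keys())

    def vector(counts: Counter) -> dict[str, float]:
        # Smoothed inverse document frequency: rarer terms weigh more.
        return {t: tf * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, tf in counts.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    query_vector = vector(Counter(tokenise(query)))
    scores = [(doc_id, cosine(query_vector, vector(counts))) for doc_id, counts in doc_counts.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "D1": "antibody binding the receptor for treating cancer",
        "D2": "gear assembly for a bicycle transmission",
        "D3": "nucleic acid encoding an antibody light chain",
    }
    for doc_id, score in rank("antibody light chain", docs):
        print(doc_id, round(score, 3))
```

The sketch also shows why the query remains the human's job: the ranking can only be as good as the words the examiner chooses to put in.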
In addition, several offices such as the UKIPO, the EPO and the USPTO use AI to (pre)classify patent applications according to the CPC system and annotate the patent literature with other metadata.
It is an attractive prospect to apply AI beyond mere prior art searches and metadata. In various non-patent domains, techniques such as natural language processing (NLP) have made it possible to deduce meaning and context from a large volume of text. This leads to the question of whether more complex patenting questions can be answered by AI. For example, would a machine be able to determine the scope of protection of a claim?
At this stage, a deep analysis of patent claims or descriptions seems beyond the scope of AI. The UKIPO report points out the challenges of meaningfully parsing patent publications: complex and long sentences full of legal and technical vocabulary and acronyms. Similar obstacles are mentioned throughout scientific literature.
These inherent challenges are only part of the picture. A compelling factor to consider is that the world of patents is alien to most AI, which has been developed to cope with other types of text. Even algorithms specialised in recognising chemical names (chemical named-entity recognition) or parsing text teeming with biotech lingo (BioNLP) do not fare well in patent analysis. AI research groups dedicating effort to biotech patents are few and far between compared to those parsing scientific literature.
Likewise, attempts to use AI in determining the disclosure or the scope of Markush formulas (ie, generic structures in chemistry patents representing a group of molecules) have yet to prove successful. Only in the simplest cases can side groups be connected to the general structure, so that only the extraction of key compounds is feasible. The same holds for biological sequences. Even though the highly standardised sequence listings make the retrieval of core sequences straightforward, AI is not yet capable of deriving degrees of freedom such as substitutions or allowable sequence identity percentages from the text of an application.
In its most disruptive form, however, AI has the potential to render the patent literature a standardised, machine-readable library, changing the tide for patent practitioners and refocusing the role of patents in biotech. While it takes a technological leap of the imagination, it is a prophecy to be reckoned with.
In the meantime, at least the EPO is reported to use various flavours of AI during its examination phase. Some of these implementations are discreet attempts at taking things up a notch. For example, AI is used to automatically detect problems and solutions in the description and determine exclusions from patentability. Moreover, AI-driven machine translations are used as non-certified translations by various patent offices throughout the world.
Legal tech has tremendous potential to simplify or automate patent management, be it in the form of small tools for drafting or data-analysis techniques. AI could even mould the patent literature into a machine-readable library. Any innovator trying to achieve that promise will face hurdles, both from a technological perspective and from practitioners’ healthy prudence.