Claude rules: An evaluation of large language models’ applicability to solve cases in German business law
In the evolving field of legal information systems, Claude 3 and other advanced conversational agents (CAs) are emerging as transformative forces. This interdisciplinary study combines quantitative methods, legal analysis, and digital transformation approaches to evaluate the efficacy of leading commercially available CAs in the German legal environment. Using a corpus of 200 unique legal tasks, the research benchmarks Claude 3 against notable systems such as Google Gemini and ChatGPT versions 4 and 3.5. Automated evaluation of 1,600 responses generated by these CAs shows Claude 3 to be the most effective system: it successfully addresses realistic legal challenges and passes a German business law examination with an overall score of 60%, significantly surpassing the 50% achieved by the previous performance leader, ChatGPT-4. Despite its superior performance, Claude 3, like the other evaluated systems, exhibits considerable limitations that can be difficult to identify. Based on these findings, legal professionals are advised to verify all CA-generated content thoroughly before use. Novices should also be cautious when relying on CA-generated legal advice, because evaluating it properly requires specialized knowledge. This study contributes to ongoing research on digital transformation in the legal domain and offers insights for both academic and industry stakeholders.
Author of HS Reutlingen | Schweitzer, Sascha; Conrads, Markus; Naeve, Jörg |
---|---|
URN: | urn:nbn:de:bsz:rt2-opus4-53416 |
DOI: | https://doi.org/10.1016/j.procs.2024.09.406 |
ISSN: | 1877-0509 |
Published in: | Procedia computer science |
Publisher: | Elsevier |
Place of publication: | Amsterdam |
Document Type: | Conference proceeding |
Language: | English |
Publication year: | 2024 |
Tags: | conversational agents; digital transformation; large language models; legal information systems; performance assessment |
Volume: | 246 |
Number of pages: | 9 |
First Page: | 2675 |
Last Page: | 2683 |
DDC classes: | 004 Computer science |
Open access?: | Yes |