TY - CPAPER
U1 - Conference publication
A1 - Schweitzer, Sascha
A1 - Conrads, Markus
A1 - Naeve, Jörg
T1 - Claude rules: An evaluation of large language models’ applicability to solve cases in German business law
T2 - Procedia computer science
N2 - In the evolving field of legal information systems, Claude 3 and other advanced conversational agents (CAs) are emerging as transformative forces. This interdisciplinary study combines quantitative methods, legal analysis, and digital transformation approaches to evaluate the efficacy of leading commercially available CAs in the German legal environment. Employing a corpus of 200 unique legal tasks, the research benchmarks Claude 3 against notable systems such as Google Gemini and ChatGPT versions 4 and 3.5. Through automated evaluations of 1,600 responses generated by these CAs, Claude 3 is demonstrated to be the most effective system, capable of successfully addressing realistic legal challenges and passing a German business law examination with an overall score of 60%, significantly surpassing the 50% score of the previous performance leader ChatGPT-4. Despite its superior performance, Claude 3, along with other evaluated systems, exhibits considerable limitations that can be difficult to identify. Based on these insights, it is recommended that legal professionals thoroughly verify all CA-generated content before use. Additionally, caution is advised for novices utilizing CA-generated legal advice, due to the specialized knowledge required for proper evaluation. This study contributes to the ongoing study of digital transformation in the legal domain, offering insights for both academic and industry stakeholders.
KW - legal information systems
KW - digital transformation
KW - conversational agents
KW - large language models
KW - performance assessment
Y1 - 2024
UN - https://nbn-resolving.org/urn:nbn:de:bsz:rt2-opus4-53416
SN - 1877-0509
SS - 1877-0509
U6 - https://doi.org/10.1016/j.procs.2024.09.406
DO - https://doi.org/10.1016/j.procs.2024.09.406
VL - 246
SP - 2675
EP - 2683
S1 - 9
PB - Elsevier
CY - Amsterdam
ER -