TL;DR
We gave ChatGPT one LSAT Logical Reasoning section and it answered 19 of 25 questions correctly. That's 76% accuracy, which means it's not ready to replace human testing experts. It's a promising start, though: we plan to improve the AI's performance through fine-tuning and error analysis, while still relying on our expert testers for final decisions in high-stakes situations.
Ever wondered how a cutting-edge AI like ChatGPT would fare against the formidable LSAT Logical Reasoning questions? Well, we got curious and put it to the test.
The Challenge & Method
We presented ChatGPT with Section 2 of LSAT PrepTest 93, a tough mix of logical brain-teasers.
We used the zero-shot chain-of-thought approach described by Takeshi Kojima et al. (2022): the model sees no worked examples, only the question plus a reasoning cue. Our cue was the prompt: "Let's think step by step before answering the question."
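For readers who want to try something similar, here is a minimal sketch of how such a prompt could be assembled. Only the cue sentence comes from our test; the function name, the argument layout, and the choice to place the cue after the answer choices are assumptions made for illustration, and the sample question is a made-up placeholder rather than real LSAT content.

```python
from typing import Dict

# The cue sentence quoted in this post; everything else below is hypothetical.
COT_CUE = "Let's think step by step before answering the question."


def build_zero_shot_cot_prompt(stimulus: str, question: str, choices: Dict[str, str]) -> str:
    """Assemble one prompt with no worked examples (zero-shot) plus the reasoning cue."""
    # List the answer choices in label order: (A), (B), ...
    choice_lines = "\n".join(
        f"({label}) {text}" for label, text in sorted(choices.items())
    )
    # Assumed layout: passage, question stem, choices, then the cue.
    return f"{stimulus}\n\n{question}\n{choice_lines}\n\n{COT_CUE}"


if __name__ == "__main__":
    # Placeholder content, not an actual LSAT item.
    demo = build_zero_shot_cot_prompt(
        stimulus="All widgets are gadgets. Some gadgets are blue.",
        question="Which one of the following must be true?",
        choices={"A": "All widgets are blue.", "B": "Every widget is a gadget."},
    )
    print(demo)
```

The resulting string can be pasted into a chat session or sent through whatever client you use; no few-shot examples are included, which is what makes the setup zero-shot.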
Performance
ChatGPT secured only 19 correct answers out of 25, yielding a modest 76% accuracy rate against these logical reasoning questions. In contrast, our testing experts average 23-25 correct answers in the logical reasoning section.
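The arithmetic behind those figures is simple enough to sketch. The snippet below is an illustration only: `score_section` is a hypothetical helper, and the answer lists are synthetic placeholders built to produce 19 matches out of 25, not the real PrepTest 93 key or ChatGPT's actual responses.

```python
from typing import List, Tuple


def score_section(model_answers: List[str], answer_key: List[str]) -> Tuple[int, float]:
    """Return (number correct, accuracy as a fraction) for one section."""
    if len(model_answers) != len(answer_key):
        raise ValueError("answer lists must be the same length")
    correct = sum(m == k for m, k in zip(model_answers, answer_key))
    return correct, correct / len(answer_key)


if __name__ == "__main__":
    # Synthetic data: 25 questions, 19 of which match the placeholder key.
    answer_key = ["A"] * 25
    model_answers = ["A"] * 19 + ["B"] * 6

    correct, accuracy = score_section(model_answers, answer_key)
    print(f"Model: {correct}/25 = {accuracy:.0%}")

    # The expert range of 23-25 correct, expressed as accuracy.
    print(f"Experts: {23 / 25:.0%} to {25 / 25:.0%}")
```

On the same 25-question scale, the expert range of 23-25 correct works out to 92% to 100%, so the gap to ChatGPT's 76% is at least four questions.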
What's Next
At 76% accuracy, ChatGPT in its current state is not going to replace our human testing experts any time soon.
While this is a good start, there's room for improvement. Here's what we plan to do:
- Fine-tune the baseline model and analyze the questions it answered incorrectly to improve the AI's performance.
- Use AI to assist in problem-solving, while making sure our top-scoring testing experts have the final say in high-stakes scenarios.