Recent research conducted at the Icahn School of Medicine at Mount Sinai has found that large language models (LLMs) have limited accuracy when reproducing original medical codes. The study tested several LLMs, including GPT-4, GPT-3.5, Gemini-pro, and Llama-2-70b, and found that their accuracy in extracting medical codes fell below 50%. Among the models tested, GPT-4 achieved the highest exact-match rates for ICD-9-CM (45.9%), ICD-10-CM (33.9%), and CPT codes (49.8%). Even when the models produced technically valid codes, a significant number of errors remained. The researchers suggest that combining LLMs with expert knowledge could improve the automation of medical code extraction, potentially enhancing billing accuracy and reducing administrative costs in healthcare [7a8d42bf].
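In practice, the kind of evaluation described here amounts to prompting a model with a code's description and checking whether it returns the original code verbatim. The sketch below illustrates such an exact-match check in Python; the prompt wording, model choice, and example code pairs are illustrative assumptions, not the study's actual protocol.

```python
# Hypothetical sketch: prompt an LLM to reproduce a billing code from its
# description, then score exact-match accuracy. The prompt, model name, and
# example pairs are illustrative assumptions, not the study's methodology.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def extract_code(description: str, code_system: str) -> str:
    """Ask the model for the single best code matching a description."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Give only the {code_system} code for: {description}",
        }],
    )
    return response.choices[0].message.content.strip()


def exact_match_rate(pairs: list[tuple[str, str]], code_system: str) -> float:
    """Fraction of descriptions whose predicted code matches the original exactly."""
    hits = sum(
        extract_code(desc, code_system).upper() == code.upper()
        for code, desc in pairs
    )
    return hits / len(pairs)


# Example ICD-10-CM pairs: (ground-truth code, human-readable description).
icd10_pairs = [
    ("E11.9", "Type 2 diabetes mellitus without complications"),
    ("I10", "Essential (primary) hypertension"),
]
print(f"ICD-10-CM exact match: {exact_match_rate(icd10_pairs, 'ICD-10-CM'):.1%}")
```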
This study highlights the importance of human oversight in the use of artificial intelligence (AI) systems, especially in critical areas such as healthcare. While LLMs have shown great potential in various applications, their limitations and potential errors emphasize the need for human expertise to ensure accuracy and reliability. By combining the capabilities of LLMs with the knowledge and experience of medical professionals, healthcare organizations can leverage AI technology to streamline processes and improve outcomes while maintaining the necessary oversight and quality control [7a8d42bf].
In a recent development in AI language translation, Unbabel, a tech company that provides translation services, claims that its new AI model, TowerLLM, has outperformed OpenAI's GPT-4 and other AI systems in translating between English and six commonly spoken European and Asian languages. TowerLLM demonstrated higher translation accuracy, particularly in English-Korean translations. Unbabel also tested TowerLLM on document translations in specific professional domains, where it scored 1-2% better than OpenAI's models. The results have not been independently verified, but they suggest that GPT-4 may be losing ground to newer AI systems. TowerLLM was trained on a large multilingual dataset and fine-tuned on a curated dataset of high-quality translations. Unbabel plans to expand the number of supported languages and improve translation for specific tasks [4c3d95b8].
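Head-to-head claims like this ultimately rest on scoring each system's output against reference translations on a shared test set. Unbabel has not published its benchmark harness, so the snippet below is only a rough sketch of that kind of comparison using the open-source sacrebleu library; the sentences and the choice of metric are placeholders.

```python
# Rough sketch of a head-to-head translation comparison using BLEU via the
# sacrebleu library. The test data and metric choice here are placeholders,
# not Unbabel's actual benchmark setup.
import sacrebleu

# One reference stream: a reference translation per source sentence (illustrative only).
references = [["The contract enters into force on the first of June."]]

# Hypothetical outputs from two competing systems for the same source sentence.
system_a = ["The contract enters into force on June first."]
system_b = ["The contract comes into effect on the first of June."]

score_a = sacrebleu.corpus_bleu(system_a, references)
score_b = sacrebleu.corpus_bleu(system_b, references)

print(f"System A BLEU: {score_a.score:.1f}")
print(f"System B BLEU: {score_b.score:.1f}")
```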
Chinese AI startup DeepSeek has released DeepSeek Coder V2, an open-source coding model that outperforms closed-source models such as GPT-4 Turbo on coding and math tasks. DeepSeek Coder V2 supports over 300 programming languages while maintaining comparable performance in general reasoning and language capabilities. DeepSeek achieved these advances by further pre-training its base V2 model on an additional 6 trillion tokens sourced from GitHub and CommonCrawl. DeepSeek Coder V2 is offered under a permissive license that allows research and unrestricted commercial use. Users can download the models or access them via API through DeepSeek's platform [c42fa83c].
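For the API route, DeepSeek's platform exposes an OpenAI-compatible interface, so a minimal call can reuse the standard OpenAI Python client. The endpoint URL and model identifier below are assumptions to verify against DeepSeek's current documentation.

```python
# Hypothetical sketch of calling DeepSeek Coder V2 through DeepSeek's
# OpenAI-compatible API. The base URL and model identifier are assumptions
# to be checked against DeepSeek's documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # issued by DeepSeek's platform
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-coder",               # assumed model name for Coder V2
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that checks whether a string is a palindrome.",
        },
    ],
)

print(response.choices[0].message.content)
```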
SelfDecode, a leader in personalized health and wellness, has launched DecodyGPT, the world's first precision health GPT. DecodyGPT is an innovative tool that uses artificial intelligence and personalized health data to provide tailored insights and guidance. It integrates genetic information, lab test results, lifestyle factors, symptoms, conditions, and health goals to offer personalized guidance and support. The tool continually evolves and learns from user interactions to provide up-to-date recommendations. SelfDecode aims to empower individuals to take control of their health and revolutionize the approach to wellness and disease prevention [e9369c96].