🧑🏼‍💻 Research - October 12, 2024

Improving tabular data extraction in scanned laboratory reports using deep learning models.

⚡ Quick Summary

This study develops a deep learning pipeline for extracting tabular data from scanned laboratory reports. The full pipeline, which combines a fine-tuned DETR R18 table detector with a fine-tuned EDD table recognizer, achieves a TEDS score of 0.699. The authors present this as a step toward making lab results available sooner for clinical decision-making.

🔍 Key Details

  • 📊 Dataset: 650 tables from 632 laboratory test reports
  • ⚙️ Technology: DETR R18 and YOLOv8s for table detection; PaddleOCR and EDD for table recognition
  • 🏆 Performance Metrics: Average Precision (AP, AP50, AP75) and Average Recall (AR) for table detection; Tree-Edit-Distance-based Similarity (TEDS) for table recognition
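
For readers unfamiliar with the metric: TEDS, as commonly defined in the table-recognition literature, compares the predicted table and the ground-truth table as HTML trees. The abstract does not restate the formula, so the definition below is the standard one and is assumed to match what the study used.

```latex
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max\left(|T_a|, |T_b|\right)}
```

Here $T_a$ and $T_b$ are the two table trees, $\mathrm{EditDist}$ is the tree edit distance between them, and $|T|$ is the number of nodes in a tree. A score of 1 means the trees are identical, and lower scores mean more edits are needed. The "TEDS structure" score is typically the same measure computed on the table structure alone, ignoring cell text; that reading is also an assumption here.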

🔑 Key Takeaways

  • 📊 Advanced OCR technology is crucial for extracting lab results from scanned documents.
  • 🤖 DETR R18 achieved impressive detection metrics: AP50 of 0.774 and AP75 of 0.644.
  • 🏆 EDD model outperformed others in table recognition with a TEDS score of 0.815.
  • 📈 The combined OCR pipeline showed a TEDS score of 0.699, indicating effective data extraction.
  • 🌍 Implications for clinical data analysis and timely decision-making are significant.
  • 🧠 Deep learning models can enhance the efficiency of healthcare data processing.
  • 📅 Study published in the Journal of Biomedical Informatics in 2024.

📚 Background

In the realm of healthcare, medical laboratory testing plays a pivotal role in diagnosing and treating patients. However, the traditional method of transferring lab results via fax often leads to delays in accessing critical information. This study addresses the need for innovative technologies that can accurately extract lab testing data from scanned reports, thereby facilitating timely clinical decision-making.

🗒️ Study

The research aimed to develop a sophisticated Optical Character Recognition (OCR) method tailored for scanned laboratory reports. The study involved two main stages: table detection, which identifies the area of a table, and table recognition, which extracts the tabular structures and contents. A total of 650 tables from 632 randomly selected laboratory test reports were annotated for training and evaluation of the models.
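
The two-stage design can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Box` type, the `extract_tables` function, and the `fake_detect` and `fake_recognize` stand-ins are invented for illustration and are not the study's code or the API of DETR, YOLOv8, PaddleOCR, or EDD. The page is a toy grid of numbers rather than a real scan.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[int]]  # toy grayscale page: rows of pixel values


@dataclass(frozen=True)
class Box:
    """Axis-aligned table region in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def crop(page: Image, box: Box) -> Image:
    """Cut the detected table region out of the page."""
    return [row[box.left:box.right] for row in page[box.top:box.bottom]]


def extract_tables(
    page: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[str]:
    """Stage 1 finds table regions; stage 2 turns each crop into table markup."""
    return [recognize(crop(page, box)) for box in detect(page)]


# Hypothetical stand-ins for a fine-tuned detector and recognizer.
def fake_detect(page: Image) -> List[Box]:
    return [Box(left=1, top=1, right=3, bottom=3)]


def fake_recognize(table_image: Image) -> str:
    rows = "".join(
        "<tr>" + "".join(f"<td>{value}</td>" for value in row) + "</tr>"
        for row in table_image
    )
    return f"<table>{rows}</table>"


page = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
tables = extract_tables(page, fake_detect, fake_recognize)
print(tables[0])
```

Because the recognizer only sees what the detector crops, a box that clips a row or column limits what the second stage can recover. That is one plausible reason a full pipeline can score lower than a recognizer evaluated on its own.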

📈 Results

The results were promising, with the fine-tuned DETR R18 model demonstrating superior performance in table detection, achieving an AP of 0.601 and an AR of 0.766. For table recognition, the fine-tuned EDD model excelled with a TEDS score of 0.815. The overall OCR pipeline showed a TEDS score of 0.699 and a TEDS structure score of 0.764, indicating effective extraction capabilities.
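
The detection scores are easier to read with the matching rule in mind. Assuming the study follows the common COCO-style convention (the abstract does not say so explicitly), AP50 counts a predicted table box as correct when its intersection over union (IoU) with the annotated box is at least 0.50, AP75 requires at least 0.75, and plain AP averages over a range of stricter thresholds. The sketch below uses made-up boxes and hypothetical function names to show how one prediction can pass the first test and fail the second.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """True when the prediction overlaps the annotation enough to count as a hit."""
    return iou(pred, truth) >= threshold


truth = (0.0, 0.0, 100.0, 100.0)  # annotated table region (illustrative values)
pred = (0.0, 0.0, 100.0, 60.0)    # prediction that misses the bottom rows
overlap = iou(pred, truth)

print(f"IoU = {overlap:.2f}")
print("counts at IoU 0.50:", is_match(pred, truth, 0.50))
print("counts at IoU 0.75:", is_match(pred, truth, 0.75))
```

Under this convention a detector's AP50 is at least as high as its AP75, which is consistent with the 0.774 and 0.644 reported for DETR R18.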

🌍 Impact and Implications

The study points to practical benefits for clinical data handling. If lab results that arrive as scans or faxes can be converted into structured tables automatically, they can reach clinicians and downstream analyses sooner than manual transcription allows. That would streamline workflows and support more timely decision-making, although the reported scores (a pipeline TEDS of 0.699) suggest that extracted tables would still need review before clinical use.

🔮 Conclusion

This study highlights the remarkable potential of deep learning in improving the extraction of tabular data from scanned laboratory reports. The high TEDS scores achieved by the proposed OCR pipeline underscore its effectiveness and promise for future applications in healthcare. Continued research and development in this area could lead to significant advancements in clinical data management and patient care.

Improving tabular data extraction in scanned laboratory reports using deep learning models.

Abstract

OBJECTIVE: Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients’ lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports. This study aims to develop an advanced deep learning-based Optical Character Recognition (OCR) method to identify tables containing lab testing results in scanned laboratory reports.
METHODS: Extracting tabular data from scanned lab reports involves two stages: table detection (i.e., identifying the area of a table object) and table recognition (i.e., identifying and extracting tabular structures and contents). The DETR R18 and YOLOv8s algorithms were used for table detection, and we compared the performance of PaddleOCR and the encoder-dual-decoder (EDD) model for table recognition. A total of 650 tables from 632 randomly selected laboratory test reports were annotated and used to train and evaluate those models. For table detection evaluation, we used Average Precision (AP), Average Recall (AR), AP50, and AP75. For table recognition evaluation, we employed Tree-Edit-Distance-based Similarity (TEDS).
RESULTS: For table detection, fine-tuned DETR R18 demonstrated superior performance (AP50: 0.774; AP75: 0.644; AP: 0.601; AR: 0.766). In terms of table recognition, fine-tuned EDD outperformed other models with a TEDS score of 0.815. The proposed OCR pipeline (fine-tuned DETR R18 and fine-tuned EDD) demonstrated impressive results, achieving a TEDS score of 0.699 and a TEDS structure score of 0.764.
CONCLUSIONS: Our study presents a dedicated OCR pipeline for scanned clinical documents, utilizing state-of-the-art deep learning models for region-of-interest detection and table recognition. The high TEDS scores demonstrate the effectiveness of our approach, which has significant implications for clinical data analysis and decision-making.

Authors: Li Y, Wei Q, Chen X, Li J, Tao C, Xu H

Journal: J Biomed Inform

Citation: Li Y, et al. Improving tabular data extraction in scanned laboratory reports using deep learning models. J Biomed Inform. 2024; (unknown volume):104735. doi: 10.1016/j.jbi.2024.104735
