NLP-centric MultiModal Biological Experiment Agent

Designing an intelligent agent for protocol retrieval, generation, and question answering in biological experiments

Large Language Model, MultiModal, LangChain, KBQA, Information Retrieval

Project Snapshot

  • Timeframe: Sep 2023 – Mar 2024
  • Category: Research, Natural Language Processing, MultiModal Systems
  • Role: Research Assistant

Project Description

  • This project develops an NLP-centric multimodal agent that supports biological experiment design and knowledge access. Over 20,000 experimental protocols were crawled from journal websites, and exploratory data analysis was used to characterize their structure and distribution. Guided by these findings, a protocol generation agent was built on the LangChain framework using few-shot prompts. A knowledge-based question answering (KBQA) system was also built on markdown-formatted protocol data, using Faiss indexing and keyword embeddings to improve retrieval accuracy and recall. Together, these components demonstrate how large language models and retrieval-augmented generation can assist scientific experimentation and workflow automation.
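
The retrieval step of the KBQA system can be illustrated with a minimal sketch. It is not the project's implementation: it assumes protocols are chunked by markdown heading, it swaps in a hashed bag-of-words vector for the keyword embeddings, and it uses a brute-force inner-product search in place of a Faiss index so the example runs with the Python standard library alone. The sample protocol text and all names (`split_markdown`, `embed`, `FlatIndex`, `build_prompt`) are hypothetical.

```python
import hashlib
import math
import re

DIM = 1024  # assumed vector size for this toy embedding

# Made-up protocol text, used only to exercise the sketch.
PROTOCOL = """# Plasmid Miniprep
## Reagents
Lysis buffer, neutralization buffer, ethanol wash buffer.
## Procedure
Pellet the bacterial culture by centrifugation, then resuspend and lyse the cells.
"""


def split_markdown(doc: str) -> list[str]:
    """Split a markdown document into chunks, one per heading."""
    chunks = re.split(r"(?m)^(?=#{1,6} )", doc)
    return [c.strip() for c in chunks if c.strip()]


def embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised (stand-in for a real embedding model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class FlatIndex:
    """Brute-force inner-product search (stand-in for a flat Faiss index)."""

    def __init__(self) -> None:
        self.vectors: list[list[float]] = []
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        self.vectors.append(embed(chunk))
        self.chunks.append(chunk)

    def search(self, query: str, k: int = 1) -> list[tuple[float, str]]:
        """Return the k chunks with the highest similarity to the query."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), chunk)
            for v, chunk in zip(self.vectors, self.chunks)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, context: str) -> str:
    """Place the retrieved chunk in front of the question for the language model."""
    return (
        "Answer using only the protocol excerpt below.\n\n"
        f"Excerpt:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


index = FlatIndex()
for chunk in split_markdown(PROTOCOL):
    index.add(chunk)

question = "Which reagents and buffer are needed?"
score, best_chunk = index.search(question, k=1)[0]
print(build_prompt(question, best_chunk))
```

In a real deployment, `embed` would call an embedding model and `FlatIndex` would be replaced by a Faiss index over the full protocol corpus; the flow of chunking, indexing, searching, and prompting stays the same.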