Multimodal Multilingual Clinical NLP

Supervisor: Farhad Nooralahzadeh

Radiology Report Generation as a Question Answering task

There is a growing need to model interactions between data modalities (e.g., vision, language) both to improve AI predictions on existing tasks and to enable new applications.
In the emerging field of multimodal medical AI, integrating multiple modalities has gained widespread popularity, as multimodal models have been shown to improve performance and robustness, require fewer training samples, and add complementary information.
Given the rapid influx of AI technology into the medical field, we envision that most common radiology reports will be processed by AI in the near future. This will free radiologists from routine work and give them more time to read the remaining cases, which are "difficult" for AI.

To transfer the knowledge of domain experts to future AI systems, the AI should be able to respond to questions from those experts. The current framework of automatic text generation from images does not satisfy this need for interactive capability.
However, automatic text generation could be readily adapted to this QA task, because the model already knows what is shown in the image. The only additional step is to choose the right sentence from the generated text (the knowledge in the AI) to present to the domain expert.
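
As a rough illustration of the sentence-selection step described above, the following minimal sketch picks the sentence of a generated report that best matches a question. It assumes a simple bag-of-words cosine similarity as the scoring function, which is only a stand-in and not the method proposed for this project. The function names (`tokenize`, `split_sentences`, `cosine`, `answer_from_report`) and the example report are hypothetical and chosen purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(report: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_from_report(question: str, report: str) -> str:
    """Return the report sentence most similar to the question ('' if none)."""
    sentences = split_sentences(report)
    if not sentences:
        return ""
    q = Counter(tokenize(question))
    # Assumption: word overlap is a placeholder for a learned relevance model.
    return max(sentences, key=lambda s: cosine(q, Counter(tokenize(s))))


if __name__ == "__main__":
    # Made-up report text, standing in for the output of a report generator.
    report = (
        "The heart size is normal. "
        "There is no pleural effusion. "
        "The lungs are clear."
    )
    print(answer_from_report("Is there a pleural effusion?", report))
    # prints: There is no pleural effusion.
```

In a real system the overlap score would presumably be replaced by a learned question-sentence relevance model, and the report would come from an image-conditioned generator rather than a fixed string.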

This framing follows naturally from the purpose of radiology reports: they are written not to show “what can be seen in the image”, but to “answer the questions from the clinicians to aid decisions for the patient”.
Therefore, we would like to address this research question to strengthen the capability and applicability of the model in the clinical context. A model that can answer both simple and complicated questions, depending on the expertise of the person asking, could serve as non-intrusive decision support.


  • Programming knowledge in Python

  • Familiarity with the PyTorch framework

  • Foundational knowledge in the field of machine learning


  1. Information Maximizing Visual Question Generation. Ranjay Krishna, Michael Bernstein, Li Fei-Fei.
  2. Progressive Transformer-Based Generation of Radiology Reports. Farhad Nooralahzadeh, Nicolas Perez Gonzalez, Thomas Frauenfelder, Koji Fujimoto, Michael Krauthammer.