Research, Technology & Development Evaluation
Syed Latifi, PhD, MS, MEd.
Acting Director- Office of Educational Development
Weill Cornell Medicine - Qatar, Qatar
Mark Healy, MSc.
Education Assessment Analyst
Weill Cornell Medicine - Qatar, Qatar
Location: Grand Ballroom 8
Abstract Information: Recent advances in artificial intelligence (AI) have opened new avenues for program evaluators. One such advancement is generative AI, an emerging field that utilises algorithms to produce content in various formats (e.g., text, image, voice, video) that resembles human-generated content.
This has the potential to support evaluators with a range of tasks, such as predictive modelling, identifying themes and sentiments, and generating summaries from surveys, interviews and focus groups. One branch of generative AI, large language models (LLMs), can assist evaluators with thematic and sentiment analysis and with generating concise, actionable summaries from documents, reports, surveys, interviews and focus groups. LLMs can also help produce drafts of data-collection instruments and even pilot them [1,2]. By automating tasks such as these, evaluators gain more time to focus on enhancing the program through the findings and their subject-matter expertise. Generative AI can also help tailor survey and interview invitations to different segments of a population, which may improve response rates. Users can refine the output through subsequent prompts and, by adding their own knowledge of the topic, improve the validity of the output [3].
The focus of this session will be to co-learn and discuss the challenges and opportunities of using generative AI for programmatic evaluation. A high-level session plan is as follows:
Part-1: (Overview) What generative AI is, and its current and potential applications [15 minutes]
Part-2: (e-polling) Perceived challenges and opportunities for programmatic evaluation [5 minutes]
Part-3: (Small-groups*) — Discuss the contextual factors and propose guidelines [20 minutes for semi-structured discussion]
Part-4: (Reconvene /large-group) — Share enhanced understanding with the large group [15 minutes]
Part-5: (Take-home) — Guiding principles and take-home message [5 minutes]
*Small-group outline:
Within small groups, participants will discuss issues surrounding the use of generative AI, such as possible use-case scenarios, data privacy, algorithmic bias, cost-effectiveness and the evolving landscape, and, drawing on shared knowledge, develop guiding principles for program evaluators to reflect upon when considering its use. Specifically, the discussion will revolve around four contextual factors relevant to the use of generative AI for program evaluation: first, distilling the needs and wants of evaluators for potential applications of this technology; second, reflecting on ethical concerns around privacy, rights, algorithmic bias and transparency; third, discussing the cost-effectiveness of generative AI compared with alternatives; and finally, proposing guiding principles for evaluators using generative AI technologies. [Please see references in the relevance statement]
Relevance Statement:
While there are benefits to being an early adopter of such technologies, the path can also be fraught with risk. If a 'black box' technology is used without knowing its inner workings, it is inappropriate to trust its output blindly. One can, for example, place some trust in the output of statistical software because the user understands how the results are computed. The algorithms underlying LLMs, however, are newer, more complex, and consequently less comprehensible to the average user. As such, it would be remiss of us not to highlight some of the risks posed by such technologies if they are used inappropriately or trusted blindly [4]. The session will revolve around the four contextual factors described in the small-group outline above: evaluators' needs and wants, ethical concerns, cost-effectiveness, and guiding principles. In summary, this session will provide a solid grounding in the capabilities, opportunities and challenges evaluators face when considering the use of generative AI. Through the consensus of numerous subject-matter experts in evaluation, the findings from the group activity will help produce a refined set of guidelines for practitioners to consider, ensuring that such technologies are used in an effective and responsible manner.
1. Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., ... & Kim, H. (2022). Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615.
2. Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J. R., Rytting, C., & Wingate, D. (2022). Out of one, many: Using language models to simulate human samples. Political Analysis, 1-15.
3. Shen, Y., Heacock, L., Elias, J., Hentel, K. D., Reig, B., Shih, G., & Moy, L. (2023). ChatGPT and other large language models are double-edged swords. Radiology, 230163.
4. Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., ... & Gabriel, I. (2022, June). Taxonomy of risks posed by language models. In 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 214-229).