🎄 Let's code and celebrate this holiday season with Advent of Haystack 🎄

Advent of Haystack

Welcome back to another year of Haystack challenges, with 10 challenges in the month of December 🎉

Complete and submit all challenges by December 31 for a chance to win gift cards, swag, and more! 🎁 Learn more in Advent of Haystack.

✨🎄 Plus: Share Your Haystack Story This Holiday Season! 🎄✨

Spread the cheer and get perks by sharing your journey with Haystack. See How.

Day 10: Jingle Metrics All the Way 🔔


The Haystack Elves worked tirelessly this year to make the holiday season stress-free and joyful. Determined to innovate, they tackled challenges with cutting-edge AI solutions.

They enhanced pipelines with speech-to-text models, explored various LLM providers, and customized Haystack pipelines for unique needs. They built AI Agents with tool-calling and self-reflection, added tracing mechanisms, and developed faster with deepset Studio. To ensure a top-notch tech stack, they partnered with tools like Weaviate, AssemblyAI, NVIDIA NIMs, Arize Phoenix, and MongoDB.

However, there’s one crucial step remaining before taking anything into production: 📊 Evaluation 📊

Haystack equips the elves with the tools they need, including integrations with evaluation frameworks and built-in evaluators. On top of this, the Haystack ecosystem now features a powerful new tool: EvaluationHarness. It streamlines the evaluation of Haystack pipelines: it eliminates the need to build a separate evaluation pipeline, and it makes comparing configurations easier through overrides.

For this challenge, you'll help the Haystack Elves evaluate a simple RAG pipeline using RAGEvaluationHarness, a specialized extension of EvaluationHarness designed to simplify and optimize evaluation specifically for RAG pipelines.

🎯 Requirements:

๐Ÿ’ Some Hints:

โญ Bonus Task: Take it a step further by incorporating hybrid retrieval into your pipeline. Use EvaluationHarness with customizations to test whether hybrid retrieval improves Recall and MRR ๐Ÿ‘€

🩵 Here is the Starter Colab