Blogs, Ideas, Train of Thoughts

Publisher:  Unclaimed!
Message frequency:  0 / week

Message History

In my earlier post, “Understanding Re-Rankers: The Key to Smarter Search Results,” we explored the basics of rerankers: what they are, why they matter, and how they turn...

Read full story

Imagine searching for the “best CrossFit shoes” on a search engine. The initial results will bring up hundreds of options — some highly relevant, others not so much. Somewhere in the mix you’ll see the perfect training shoes, but you might also see casual sneakers, hiking boots, or even sandals.

This is where a reranker comes in. ...

Read full story

Large Language Models (LLMs) like GPT-4, Claude, or Gemini are powerful tools, but they don’t run with infinite capacity. Just like your laptop has CPU and memory limits, LLMs exposed via APIs have token limits and throughput constraints you need to design around. ...

Read full story

When we talk about system performance, the first number people usually quote is the average response time.

For example: “Our API responds in 200ms on average.”

In system performance, the average response time is a common benchmark. Yet, this metric hides a critical truth: users don’t encounter an average — t

Read full story
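
To make the point about averages concrete, here is a minimal sketch comparing the mean with percentiles. The latency values are hypothetical, chosen so that the mean comes out at the 200ms quoted above, and the `percentile` helper is an illustrative nearest-rank implementation rather than anything taken from the post.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical data: 95 fast requests (100 ms) and 5 slow ones (2100 ms).
latencies_ms = [100] * 95 + [2100] * 5

mean_ms = statistics.mean(latencies_ms)   # 200
p50_ms = percentile(latencies_ms, 50)     # 100
p99_ms = percentile(latencies_ms, 99)     # 2100

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

With this made-up data the API "responds in 200ms on average", yet the typical request takes 100 ms and one request in twenty takes 2100 ms. No request in the sample actually takes 200 ms.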

When you walk into a coffee shop, you notice something:

  • If the server is quick, customers don’t wait long.
  • If the line moves slowly, the shop gets crowded — even if the number of customers arriving stays the same.

This simple idea is captured by Little’s Law, a principle from queuing the...

    Read full story
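
As a rough sketch of the coffee shop example, Little’s Law says the average number of customers in the shop equals the average arrival rate multiplied by the average time each customer spends there (L = λ × W). The function name and the numbers below are assumptions made for illustration only.

```python
def average_in_system(arrival_rate, avg_time_in_system):
    """Little's Law: L = lambda * W.

    arrival_rate       -- average arrivals per unit of time (lambda)
    avg_time_in_system -- average time one customer spends in the system (W)
    Both arguments must use the same time unit.
    """
    return arrival_rate * avg_time_in_system


# Hypothetical coffee shop: 2 customers arrive per minute in both cases.
arrivals_per_minute = 2

quick_service = average_in_system(arrivals_per_minute, 3)  # 3 minutes each
slow_service = average_in_system(arrivals_per_minute, 6)   # 6 minutes each

print(f"quick service: {quick_service} customers in the shop on average")
print(f"slow service:  {slow_service} customers in the shop on average")
```

The arrival rate is identical in both cases. Doubling the time each customer spends in the shop doubles the average crowd, from 6 to 12 customers.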