Shivam Mehta

The need for sampling temperature and differences between Whisper, GPT-3, and probabilistic models' temperature

Introduction: When I published my new work OverFlow to arXiv and added the model to the Coqui-TTS framework, someone on the Discord channel asked me about a parameter in our system OverFlow called ...

Deriving categorical cross entropy and softmax

Introduction: Recently, on the PyTorch discussion forum, someone asked a question about the derivation of categorical cross entropy and softmax. So I thought it would be a good idea to write a bl...

Welcome to the blog

Hello and welcome to the blog (again)! Again, because I used to post on shivammehta.me/blog but decided to move to a new platform. I will try to migrate my old posts here to keep everything...

Universal approximation theorem - The intuition

(Migrated from old blog) Recently, I attended a course on Deep Learning and found this very nice intuition for how the Universal Approximation Theorem works. What is Universal Approximation Theor...

PyTorch - Computation graph

(Migrated from old blog) I started with Deep Learning when PyTorch was already a big name with extensive community support, which is just growing every day. The more I use it, the more I fall in...

Data structure Trie - Prefix trees, spell checkers

(Migrated from old blog) Ever wonder how Microsoft Word checks whether the spelling you wrote is correct? There are various language models that can be used, but one of the most maj...

An idea to test programming solutions

New edit: I have moved from Emacs to VSCode + Vim key bindings IDE-wise, but the idea is still the same and useful. (Migrated from my old blog) Alright! Today I just had a cool idea while trying ...

Your own mini Google search - inverted indexes and boolean retrieval

(Migrated from my old blog) Ever wondered how Google gets relevant documents for your query within milliseconds despite the huge amount of information it contains? Recently, I was looking i...

Paper summary - Distractor generation for multiple choice questions using learning to rank

(Migrated from old blog) A paper by Chen Liang, Xiao Yang, Neisarg Dave, Drew Wham, Bart Pursel, C. Lee Giles from Pennsylvania State University. Recently, I was starting with such topics in Fiel...

Kadane's algorithm - Maximum subarray problem

(Migrated from my old blog) Okay, so I was brushing up on my algorithm skills at CodeSignal and I found this maxSubArrayProblem: https://app.codesignal.com/challenge/LrAwpTnYZR6NMCbfs So we will sol...