Poisson regression is a relatively rare classical statistics technique. I almost never use Poisson regression because in most cases, in my opinion, a neural network creates a better prediction model.
The goal of a Poisson regression problem is to predict a discrete count value, where the probability of getting a “success” at any given point in time is quite small. For example, you might want to predict the number of defects on a just-produced computer chip, based on the physical size of the chip, the number of “transistor A” components, and the number of “transistor B” components. The number of defects will be a count of 0, 1, 2, 3, etc. and the probability of a defect is small.
Poisson regression first assumes that the count of defects has a Poisson distribution. This is quite restrictive because the Poisson distribution is a “pure” mathematical concept and real data has quirks. For example, a Poisson distribution has a variance equal to its mean, and real count data is often more spread out than that. The more your data deviates from a pure Poisson distribution, the less accurate the resulting Poisson regression prediction model will be.
Poisson regression essentially creates a prediction equation where the log of the expected count is a weighted sum of the input variables plus a constant. In this respect, Poisson regression is quite similar to logistic regression, where the log-odds is a weighted sum of the inputs. Finding the weights for Poisson regression involves minimizing a negative log-likelihood function.
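Here is a minimal sketch of that prediction equation and the negative log-likelihood, using NumPy. The function names, the tiny dataset, the learning rate, and the use of plain gradient descent are all my assumptions for illustration; a statistics library would normally use a more sophisticated optimizer.

```python
import numpy as np

def predict_count(X, w, b):
    # log(expected count) = b + sum(w * x), so expected count = exp(b + X @ w)
    return np.exp(b + X @ w)

def neg_log_likelihood(X, y, w, b):
    # Poisson negative log-likelihood, dropping the log(y!) term
    # because it does not depend on the weights
    z = b + X @ w
    return np.sum(np.exp(z) - y * z)

def fit(X, y, lr=0.05, epochs=2000):
    # plain gradient descent on the average negative log-likelihood
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = predict_count(X, w, b) - y   # gradient with respect to z
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

# made-up, scaled inputs: chip size, transistor A count, transistor B count
X = np.array([[0.2, 0.1, 0.4],
              [0.5, 0.3, 0.1],
              [0.9, 0.6, 0.7],
              [0.3, 0.8, 0.2],
              [0.7, 0.2, 0.9],
              [1.0, 0.9, 0.5]])
y = np.array([0, 1, 3, 1, 2, 4])           # made-up defect counts

w, b = fit(X, y)
print("weights:", w, "bias:", b)
print("expected counts:", predict_count(X, w, b))
```

Notice that the predicted value is an expected count, which is always positive but usually not a whole number.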
When I’m faced with predicting a count value, where the count value will be a small number, instead of using Poisson regression, which has many assumptions, I’ll just throw the training data at a neural network. Interestingly, the count-to-be-predicted can be predicted directly, but because the output of a neural network is a real value like 2.73 you’d have to round to the nearest integer, which is 3 in this case. Note that a plain integer cast would truncate 2.73 to 2.
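Turning a real-valued network output such as 2.73 into a count can be sketched like this. The helper name to_count and the optional max_count parameter are hypothetical. Clipping at zero is my assumption, added because a regression-style output can come out slightly negative.

```python
import numpy as np

def to_count(raw_output, max_count=None):
    # round to the nearest integer (np.rint rounds exact halves to even)
    count = np.rint(raw_output)
    # a count cannot be negative; optionally cap at max_count
    count = np.clip(count, 0, max_count)
    return count.astype(int)

raw = np.array([2.73, -0.4, 0.2])   # made-up network outputs
print(to_count(raw))                # rounded, non-negative counts
print(int(2.73))                    # a plain cast truncates instead of rounding
```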
Alternatively, you can encode the count to predict with 1-of-N (also called one-hot) encoding. For example, if the possible counts are 0, 1, 2, 3-or-more, you could encode 0 = (1, 0, 0, 0), 1 = (0, 1, 0, 0), 2 = (0, 0, 1, 0), 3-or-more = (0, 0, 0, 1). The output of the neural network will be a vector like (0.13, 0.65, 0.20, 0.02). The largest value, 0.65, is at index [1], so the output maps to (0, 1, 0, 0), which is a predicted count of 1.
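The encoding and decoding steps can be sketched as follows. The function names are hypothetical, and the output vector is the example from the text rather than the result of a trained network.

```python
import numpy as np

def encode_count(count, n_classes=4):
    # counts of 3 or more all share the last slot
    idx = min(count, n_classes - 1)
    vec = np.zeros(n_classes)
    vec[idx] = 1.0
    return vec

def decode_output(output):
    # the predicted class is the index of the largest output value
    return int(np.argmax(output))

print(encode_count(1))    # one-hot vector for a count of 1
print(encode_count(7))    # falls into the 3-or-more slot
print(decode_output(np.array([0.13, 0.65, 0.20, 0.02])))
```

With this scheme a decoded value of 3 means "3 or more", so you give up the exact count for large values.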
The advantage of using a neural network is that, in my experience, it usually creates a more accurate prediction model because it doesn't depend on the data following a Poisson distribution. The disadvantage is that you lose some interpretability because a neural network is basically a black box.