Posts

Showing posts from April, 2022

Imbalance Data Problem

When applying technology to real-world problems, two of the most common obstacles are noisy data and severe data imbalance, which can take many forms. In this blog, we would like to share our efforts to resolve data imbalances.

1. What is imbalanced data?

1-1. A notion

Imbalanced data is data in which the number of observations in the normal category differs greatly from the number of observations in the abnormal category. For example, far fewer people have cancer than do not, and fraudulent credit card transactions are far rarer than normal ones. Such data can be seen as imbalanced data.

1-2. The point at issue

Of the two goals, accurately classifying the normal cases and accurately classifying the abnormal cases, the second is generally more important. This is because the abnormal data is usually the target value. When you look at the picture, blue represents normal ob...
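
To make the point about abnormal cases concrete, here is a minimal sketch in plain Python. The labels are a made-up toy set (10 abnormal, 990 normal), not data from the post, and the "classifier" simply predicts the normal class every time. It scores high on accuracy while finding none of the abnormal cases, which is why accuracy alone can mislead on imbalanced data.

```python
# Hypothetical toy labels (an assumption for illustration):
# 1 = abnormal (e.g. fraud), 0 = normal.
y_true = [1] * 10 + [0] * 990

# A trivial "classifier" that always predicts the majority (normal) class.
y_pred = [0] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all observations that were classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positive (abnormal) cases that were detected."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall on abnormal class = {rec:.2f}")
```

On this toy set the accuracy is 0.99 and the recall on the abnormal class is 0.00, so a metric that focuses on the abnormal class is usually the more informative one.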

Visualizing decision tree partition and decision boundaries

This visualization shows where the trained decision tree predicts that Titanic passengers survived (blue regions) or did not survive (red regions), based on their Age and Pclass.
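
A plot like this is typically built by evaluating the classifier at every point of a grid over the two features and coloring each point by the predicted class. The sketch below shows that idea in plain Python. The function `toy_tree_predict` and its thresholds are hypothetical and hand-written for illustration; they are not the tree from the post and were not fitted to the Titanic data.

```python
def toy_tree_predict(age, pclass):
    """Hypothetical hand-written tree (NOT fitted to the Titanic data).

    Returns 1 for "survived" and 0 for "did not survive".
    """
    if pclass <= 2:
        return 1 if age < 50 else 0
    return 1 if age < 10 else 0


def partition_grid(predict, ages, pclasses):
    """Evaluate a classifier on every (age, pclass) grid point."""
    return {pc: [predict(age, pc) for age in ages] for pc in pclasses}


ages = list(range(0, 80, 5))  # 0, 5, ..., 75
pclasses = [1, 2, 3]
grid = partition_grid(toy_tree_predict, ages, pclasses)

# Text rendering of the partition: B = blue (survived), R = red (did not).
for pc in pclasses:
    row = "".join("B" if label == 1 else "R" for label in grid[pc])
    print(f"Pclass {pc}: {row}")
```

Because a decision tree splits on one feature at a time, each region in such a grid is an axis-aligned rectangle, which is what gives these plots their blocky look.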

What Is a Probability Distribution?

A probability distribution is a statistical function that describes all the possible values a random variable can take within a given range, together with their probabilities. This range is bounded by the minimum and maximum possible values, but where a possible value falls on the probability distribution is determined by a number of factors. The mean (average), standard deviation, skewness, and kurtosis of the distribution are among these factors.
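
As a rough illustration of those four factors, the sketch below computes them for a small made-up sample using only the standard library. The helper name `describe` and the sample values are assumptions for this example. It uses the population (divide by n) formulas, with skewness defined as the third central moment over the cubed standard deviation and kurtosis as the fourth central moment over the squared variance (not excess kurtosis).

```python
import math


def describe(sample):
    """Return mean, standard deviation, skewness and kurtosis of a sample.

    Population (divide-by-n) moments are used throughout.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,
    }


# Hypothetical, symmetric toy sample.
sample = [1, 2, 3, 4, 5]
stats = describe(sample)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

For this symmetric sample the skewness is zero; a sample with a long right tail would give a positive value, and a long left tail a negative one.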