
Datascience

Website title: Data Science Stack Exchange

Message frequency:  8.07 / day

Message History

I checked many posts to figure out how the random forest (RF) learning algorithm (an ensemble of many decision trees (DT) constructed by the Rain forest algorithm) selects split points at each leaf within bagging. There are some close questions which ...


Categories to learn and predict:

df.race.unique()
array(['0', '1', '3', '2', '4'], dtype=object)

Data:

train_generator = image_gen.flow_from_dataframe(
    df_train,
    x_col="img_name",
    y_col="race",
    directory=str(data_folder),
    class_mode="sparse",
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=True,
)
val_generator = image_gen.flow_from_d...


Suppose I have three embeddings (Anchor, Positive, Negative) from a Siamese model trained with triplet loss, using Euclidean distance as the distance metric.

During inference, can cosine similarity be used instead?

I have noticed that if I calculate the Euclidean distance between the model's embeddings for A, P, and N, the results seem somewhat consistent, with matching images getting smaller dista...
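As a rough illustration of how the two measures relate, the sketch below compares Euclidean distance and cosine similarity on three made-up embedding vectors. The vectors and helper names are hypothetical and do not come from the question. It assumes the embeddings are L2-normalized before comparison; under that assumption the squared Euclidean distance equals 2 - 2 * cosine similarity, so both measures rank pairs identically. Whether the asker's model normalizes its outputs is not stated, and without normalization the two rankings can differ.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, chosen only for illustration.
anchor = [0.9, 0.1, 0.3]
positive = [0.8, 0.2, 0.35]
negative = [-0.2, 0.9, 0.1]

# Assumption: embeddings are L2-normalized before comparison.
a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))

for name, other in (("positive", p), ("negative", n)):
    d = euclidean(a, other)
    c = cosine_similarity(a, other)
    # For unit vectors, d**2 and 2 - 2*c agree up to rounding error.
    print(name, "distance:", round(d, 4), "cosine:", round(c, 4))
```

With unit-length vectors, a smaller distance always corresponds to a larger cosine similarity, so a threshold chosen on one measure can be converted to the other.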


I am new to ML and data science and am struggling with a simple problem. I am given a series of data points $X_i = (x_{i1}, x_{i2})$, each with a label $y_i \in \{-1, 1\}$.

My first task is the following: given a weight vector $w$, write a function to compute the logistic loss (also known as the neg...
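A minimal sketch of such a function follows. It assumes labels take the values -1 and +1, a linear score $w \cdot X_i$ with no bias term, and a loss averaged over the data points, that is, the mean of $\log(1 + e^{-y_i \, w \cdot X_i})$. The question preview is truncated, so the exact form the asker needs (sum versus mean, with or without a bias) is an assumption, and the function name is hypothetical.

```python
import math

def logistic_loss(w, X, y):
    """Mean logistic loss for labels in {-1, +1}.

    Assumptions: linear score w . x with no bias term, and the loss
    is averaged (not summed) over the data points.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        # Evaluate log(1 + exp(-margin)) without overflowing exp().
        if margin > 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(X)

# Hypothetical two-dimensional data points and labels.
X = [(1.0, 2.0), (3.0, -1.0)]
y = [1, -1]

# With a zero weight vector every margin is 0, so the loss is log(2).
print(logistic_loss([0.0, 0.0], X, y))
```

Branching on the sign of the margin keeps the argument of `exp` non-positive, which avoids overflow for badly misclassified points.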


I have some data in a CSV that pertains to bandwidth tests, like so:

2025/12/24 12:06:46 88382 6046
2025/12/24 12:22:59 93813 3986
2025/12/24 13:36:06 91530 8136
2025/12/24 13:49:28 86613 12586
... ... ...
2026/02/26 09:56:33 53294 19979
2026/02/26 10:10:16 33435 16331

In RStudio, that data is in a frame with some other derived columns ("Day.of.Week"...
