
Datascience


Website title: Data Science Stack Exchange

Message frequency:  14 / day

Message History

I suppose an image is worth too many words, so here is the image:

As you can see, in the middle where there are voxels to be segmented, no artifacts are present. Whereas on the top and bott...



My task is to forecast the number of upvotes a Reddit post has at time t after posting (t being the number of hours since it was posted), based on the text, the title, and t. The current architecture is basically a transformer encoder that takes the text as input, followed by a linear network that takes 'how long ago it was posted' and the encoder's outputs as input and outputs the regression value.

...


I've read somewhere (I forgot the source) that we perform 2 types of EDA:

Light EDA:
See the shape of the data with df.shape
See nulls with df.isna().mean() or df.isna().sum()
See duplicates with df.duplicated().sum()
See central tendency, dispersion, etc. with df.describe().T
See skewness and kurtosis with df.skew() and df.kurt()

In-depth EDA:
Here we should test hypotheses against ...
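The light EDA checks listed in this excerpt can be collected into one small helper. This is only a sketch: the toy DataFrame and the helper name `light_eda` are made up for illustration and do not come from the original question.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 32, None, 47],
    "income": [40000.0, 52000.0, 52000.0, 61000.0, 150000.0],
})

def light_eda(frame):
    """Run the light EDA checks and return them in a dictionary."""
    numeric = frame.select_dtypes(include="number")  # skew/kurt need numeric columns
    return {
        "shape": frame.shape,                        # (rows, columns)
        "null_fraction": frame.isna().mean(),        # share of missing values per column
        "null_count": frame.isna().sum(),            # number of missing values per column
        "duplicate_rows": int(frame.duplicated().sum()),
        "summary": frame.describe().T,               # count, mean, std, quartiles per column
        "skewness": numeric.skew(),
        "kurtosis": numeric.kurt(),
    }

report = light_eda(df)
for name, value in report.items():
    print(f"--- {name} ---")
    print(value)
```

Note that `df.isna().sum` without parentheses only references the method; it has to be called as `df.isna().sum()` to produce the counts.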


I am clustering unlabeled time-series datasets (no ground truth) and I want to measure the quality of the clusters. Could you please suggest clustering performance evaluation methods that can be used for time-series clustering?
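One commonly used internal measure that needs no ground truth is the silhouette coefficient. The sketch below is one possible illustration, not an answer taken from the thread: it assumes equal-length series and plain Euclidean distance, and the names `euclidean`, `silhouette_score`, and the toy data are hypothetical. For time series, the `dist` argument could be swapped for another distance such as dynamic time warping.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length series (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(series, labels, dist=euclidean):
    """Mean silhouette coefficient over all series; ranges from -1 to 1."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        a = sum(dist(series[i], series[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(series[i], series[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(series)

# Hypothetical toy series: two clearly separated groups.
toy = [[0, 0, 0], [0, 1, 0], [10, 10, 10], [10, 11, 10]]
good = silhouette_score(toy, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(toy, [0, 1, 0, 1])   # labels mix the groups
print(good, bad)
```

A score close to 1 suggests compact, well-separated clusters, while a score near or below 0 suggests overlapping or misassigned clusters.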



I am working on a multi-class text classification problem with the following constraints:

A relatively small number of target classes (~20)
Very limited labeled data per class (single-digit examples)
Significant feature overlap across classes (shared vocabulary and descriptors)

Problem

Standard approaches based on semantic similarity or embedding proximity perfo...

