Rashik.
  Academic research

Heart Disease NLPTransformer comparison on medical reports

A transformer comparison on medical report classification — DistilBERT, DistilRoBERTa, and BETO evaluated honestly on the same preprocessing pipeline.

Academic research
Year2024
RoleResearch project · pipeline + comparison
StackTransformers · PyTorch · Hugging Face · Python
StatusAcademic research
Academic research. Not intended for clinical diagnosis or medical use.
  The problem

What it had to solve.

The interesting question wasn't whether a transformer can classify medical text — it was which transformer family generalises well when you hold preprocessing, tokenisation, and the evaluation split constant. Comparison papers in this space often quietly vary all three; the goal here was to vary only the model.
  The methodology

Pipeline, end-to-end.

Shared preprocessing (cleaning, tokenisation per model, attention-mask construction) into three fine-tuned encoders. Each model trained with the same hyperparameter envelope; results reported on a single held-out test split.

  Results

What the comparison actually said.

Model comparison

All three transformers fine-tuned on identical preprocessing and split. DistilRoBERTa came out ahead, with DistilBERT and BETO tied behind it.

DistilBERT
Test accuracy91.91%
DistilRoBERTa
Test accuracy93.38%
BETO
Test accuracy91.91%
  Approach

Choices that shaped the comparison.

01 / 04

Same preprocessing for every model

Tokeniser changes per model (BERT vs RoBERTa vs BETO use different subword vocabularies), but the cleaning, normalisation, and split were identical. The comparison only means something if everything except the model is constant.

02 / 04

Attention masks built explicitly

Padding masks, special-token masks, and segment IDs constructed per model's expectations rather than relying on tokeniser defaults. A few percentage points of accuracy lived in this detail alone.

03 / 04

Reported the comparison, not just the winner

DistilRoBERTa led, but DistilBERT and BETO were close. The honest takeaway is 'three lightweight encoders are all viable on this task' — not 'DistilRoBERTa is the answer'.

04 / 04

Held the core hyperparameters fixed

Same learning rate (5e-6) and batch sizes across all three models. DistilRoBERTa ran for 100 training epochs; DistilBERT and BETO ran for 50. No early stopping was applied. Per-model tuning could shift the ranking; the goal was to read base behaviour, not optimise a leaderboard.

Approach
  What I learned

The honest part.

The piece of writing that matters most here is the comparison itself, not any single accuracy number. Two of the three models tied; one led by a margin. Calling the winner without showing the table would have been the easiest sentence to write and the least useful — which is, in retrospect, why so many comparison papers do exactly that.
  Current status

Where this lives now.

Completed as part of coursework / research at NSU. Notebooks archived; not actively developed.