Natural Language Processing (NLP), the computational analysis of human language, has gained attention in several fields. NLP models can help professionals such as doctors extract and summarize essential information from large pools of data during decision-making in hospitals (Jain et al., 2018). The performance of transformer-based models, including BERT, XLNet, and RoBERTa, varies under stress testing (Aspillaga et al., 2020), which is a concern when such models are used to support differential diagnosis. Among the technologies used for information extraction, sentence embedding has several applications: it encodes sentences into fixed-length vectors that capture their meaning.
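To make the idea of a fixed-length sentence vector concrete, the sketch below averages per-token vectors into a single sentence vector, which is the mean-pooling strategy SBERT uses by default. The token vectors are made-up three-dimensional numbers standing in for the contextual embeddings a transformer would produce, and the helper `mean_pool` is hypothetical, not part of any library.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average token vectors element-wise into one fixed-length sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must share one dimension")
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Hypothetical token vectors, not real model output.
short_sentence = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]        # 2 tokens
long_sentence = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                 [3.0, 3.0, 3.0], [2.0, 0.0, 6.0]]          # 4 tokens

print(mean_pool(short_sentence))  # [2.0, 1.0, 1.0]
print(mean_pool(long_sentence))   # [2.0, 1.5, 3.0]
```

Both sentences map to three numbers despite having different lengths, which is what allows their embeddings to be compared directly.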
BERT and RoBERTa have achieved state-of-the-art performance on sentence-pair regression tasks such as semantic textual similarity (STS). However, both sentences of a pair must be fed into the network together, which causes massive computational overhead when such models are used to help clinicians perform differential diagnosis (Ndukwe et al., 2020). For instance, when extracting relevant clinical information from massive data, finding the most similar pair in a collection of about 10,000 sentences requires roughly 50 million inference computations, one for each sentence pair. Such information is meant to help clinicians conduct a differential diagnosis from the facts gathered by the models, yet BERT needs approximately 65 hours for this computation (Reimers & Gurevych, 2019). This construction makes BERT unsuitable for semantic similarity search as well as for unsupervised tasks such as clustering.
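The figure of roughly 50 million computations follows from counting unordered pairs: a cross-encoder such as BERT runs once per pair, and n sentences yield n(n-1)/2 pairs. The short calculation below checks this for n = 10,000 and backs out the throughput implied by the 65-hour figure; that throughput is derived here purely for illustration and is not a number reported by Reimers and Gurevych (2019).

```python
import math

n_sentences = 10_000

# A cross-encoder scores each unordered pair in its own forward pass.
pairs = math.comb(n_sentences, 2)        # n * (n - 1) / 2
print(pairs)                             # 49995000

# Implied pairs per second if the whole search takes about 65 hours.
hours = 65
pairs_per_second = pairs / (hours * 3600)
print(round(pairs_per_second))           # 214

# A bi-encoder such as SBERT needs only one forward pass per sentence.
print(pairs / n_sentences)               # 4999.5 times fewer passes
```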
Sentence-BERT (SBERT), a modification of the pretrained BERT network, uses Siamese and triplet network structures to derive semantically meaningful sentence embeddings. Embeddings of clinical text can then be compared using cosine similarity (Wang & Kuo, 2020). Because each sentence is encoded only once, SBERT reduces the time needed to identify the most similar pair from approximately 65 hours with BERT/RoBERTa to approximately 5 seconds, saving clinicians time and effort. SBERT is also computationally efficient: on a GPU, it is roughly 9 percent faster than the well-known InferSent model and approximately 55 percent faster than the Universal Sentence Encoder (Reimers & Gurevych, 2019). Therefore, clinicians using information extracted via SBERT can access relevant data more quickly and easily during differential diagnosis. Finally, SBERT outperforms several other sentence embedding methods on various STS tasks, which makes it a suitable choice for clinical practice.
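The sketch below illustrates why the search becomes cheap once embeddings exist: each sentence is encoded once, and finding the most similar pair reduces to cosine similarity between vectors. The embeddings are invented three-dimensional stand-ins for SBERT output (real SBERT-base vectors have 768 dimensions), and `cosine_similarity` and `most_similar_pair` are hypothetical helpers, not part of any library API.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar_pair(embeddings: Dict[str, List[float]]) -> Tuple[str, str, float]:
    """Return the two sentences whose embeddings have the highest cosine similarity."""
    if len(embeddings) < 2:
        raise ValueError("need at least two sentences")
    best = ("", "", -2.0)  # cosine similarity is never below -1
    for (s1, v1), (s2, v2) in combinations(embeddings.items(), 2):
        score = cosine_similarity(v1, v2)
        if score > best[2]:
            best = (s1, s2, score)
    return best


# Hypothetical pre-computed embeddings (one encoder pass per sentence).
embeddings = {
    "chest pain on exertion": [0.9, 0.1, 0.0],
    "exertional chest discomfort": [0.8, 0.2, 0.1],
    "persistent dry cough": [0.1, 0.9, 0.2],
    "rash on both forearms": [0.0, 0.2, 0.9],
}

s1, s2, score = most_similar_pair(embeddings)
print(s1, "|", s2, "|", round(score, 3))
# chest pain on exertion | exertional chest discomfort | 0.984
```

The loop still visits every pair, but each visit is a handful of multiplications rather than a full transformer forward pass, which is where the speed-up comes from.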
References
Aspillaga, C., Carvallo, A., & Araujo, V. (2020). Stress test evaluation of transformer-based models in natural language understanding tasks. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 1882-1894). European Language Resources Association.
Jain, A., Kulkarni, G., & Shah, V. (2018). Natural language processing. International Journal of Computer Sciences and Engineering, 6(1), 161-167.
Ndukwe, I., Amadi, C., Nkomo, L., & Daniel, B. (2020). Automatic grading system using Sentence-BERT network. Lecture Notes in Computer Science, 12164, 224-227.
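Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 3982-3992). Association for Computational Linguistics.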
Wang, B., & Kuo, C. (2020). SBERT-WK: A sentence embedding method by dissecting BERT-based word models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2146-2157.