Abstract
- Machine learning (ML), a subset of artificial intelligence, is crucial for spine care and research due to its ability to improve treatment selection and outcomes, leveraging the vast amounts of data generated in health care for more accurate diagnoses and decision support.
- ML's potential in spine care is particularly notable in radiological image analysis, including the localization and labeling of anatomical structures, detection and classification of radiological findings, and prediction of clinical outcomes, thereby paving the way for personalized medicine.
- The manuscript discusses ML's application in spine care, detailing supervised and unsupervised learning, regression, classification, and clustering, and highlights the importance of both internal and external validation in assessing ML model performance.
- Several ML algorithms, such as linear models, support vector machines, decision trees, neural networks, and deep convolutional neural networks, can be used in the spine domain to analyze diverse data types (visual, tabular, omics, and multimodal).
Relevance of machine learning for spine care and research
Machine learning (ML) is a field of artificial intelligence (AI) that seeks to construct algorithms that enhance their performance through experience (1). It is a rapidly expanding technical discipline that lies at the intersection of computer science and statistics and has been driven by the huge growth in data availability over the last few years (2).
The rise of machine learning has led to a shift toward data-driven decision-making in numerous areas including manufacturing, education, financial modeling, policing, marketing, and health care. In health care, machine learning is transforming the industry by diagnosing diseases and predicting their course from clinical data or images (3, 4). Although evidence-based studies demonstrating better patient outcomes are still relatively limited, the potential of ML in this respect is clear, paving the way for personalized medicine (5, 6). In addition, the use of machine learning in health care could enhance the efficiency of health-care systems (7, 8).
Spine care is not an exception. Many recent papers have focused on the role of ML in the spine domain, underlining its great potential in radiological image analysis, specifically in tasks such as localization and labeling of the various anatomical structures (vertebrae, intervertebral discs, etc.), detection and classification of radiological findings, and segmentation (9, 10, 11). Additionally, these studies also underscore the potential of ML in predicting clinical outcomes.
This manuscript begins with an overview of ML theory, starting with the differences between supervised learning, which encompasses tasks like regression and classification, and unsupervised learning, which includes clustering. Emphasis is given to the distinction between internal and external validation, highlighting the importance of both in assessing the performance of ML models. An overview of the most commonly used ML algorithms follows, detailing their applications in the aforementioned tasks. The manuscript then moves to its core topic, the application of ML to the spine domain. The types of data under consideration are diverse, ranging from visual data such as images, including MRI scans or X-rays, to structured tabular data, including patient demographics, clinical measurements, and questionnaire data. The discussion also extends to omics data, a term that refers to the collective technologies in biological research that end in -omics, such as genomics, proteomics, and metabolomics (12, 13). More recently, multimodal data has gained a lot of attention in health care (14). This refers to datasets that integrate several of the aforementioned data modalities, offering a more comprehensive overview of the subject matter (15). In the context of spinal studies, this could mean combining image data, tabular data, and omics data to create a more complete picture of a patient’s condition. This integrated approach allows for more nuanced analysis and can potentially lead to more accurate diagnoses and treatments. Finally, a brief overview of large language models (LLMs) is given, as they are changing the way people work and could be a powerful resource in the health-care domain. In light of these methodologies, some guidelines for their application in a clinical environment are outlined, along with some insights into future perspectives.
An introduction to machine learning
ML has been described as a discipline that enables computers to learn without being explicitly programmed (16), and it has demonstrated superior results compared to conventional non-data-driven techniques, particularly in the domain of computer vision (17). This approach contrasts with the traditional methods in which computer scientists leverage their expertise to manipulate input data and generate an output (18). Familiarity with the specific learning paradigms and terminology of ML is crucial for understanding this field.
The terms ‘supervised’ and ‘unsupervised’ refer to whether the learning algorithm is trained using labeled or unlabeled data, respectively (19) (Fig. 1). In supervised learning, the model learns from labeled data, i.e. data paired with the correct output (ground truth), and then applies what it has learned to new data (20). Unsupervised learning, in contrast, finds patterns in unlabeled data, offering a more exploratory approach to understanding data (21). Both these learning paradigms form the backbone of many ML applications and systems today. It should be noted that other learning paradigms have been described, including semi-supervised and self-supervised learning, which are not covered in this paper for the sake of simplicity.
Supervised and unsupervised learning can be further split into regression, classification, and clustering tasks. Regression is a supervised learning technique that predicts a continuous outcome based on the value of one or more predictor variables. It is used extensively in fields where the prediction of a range of numeric outcomes is essential, such as predicting an outcome score or localizing a vertebral landmark. Classification, another supervised learning technique, is used to identify the category of new observations based on training data. It is particularly useful in scenarios where the output is categorical, such as classifying vertebral fractures or whether a patient has cancer or not. On the other hand, clustering is an unsupervised learning method that groups unlabeled data based on the inherent similarities among different data points. It is often used to discover patterns and structures in data to form groups of similar patients and then draw some conclusions about the formed groups.
Internal/external validation
It is crucial to understand the difference between internal and external validation in the ML field (22). Internal validation refers to the process of assessing a model’s performance within the same dataset that has been used to develop the model. It helps to ensure that the model is reliable and can accurately predict outcomes based on the given data. It is the most commonly used type of validation as it requires a simple train/test split or cross-validation (in which the split is repeated multiple times with different partitions), which are common practices in ML (23). On the other hand, external validation tests the model’s performance on a dataset that has never been seen during the development phase. This dataset can potentially have different statistical properties with respect to the dataset used for the model’s development, and may therefore result in a worse performance of the model if the model is not capable of generalizing well. External validation indeed serves as a test for the applicability and effectiveness of a model beyond the context of its initial development. It has paramount importance since the model’s clinical utility lies in its ability to be generalizable to other health-care settings that can have different patient demographics, disease prevalence, and clinical protocols. There is general consensus about the fact that models should not be recommended for clinical use before external validity is established.
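As an illustration of the internal validation schemes mentioned above, the following minimal Python sketch generates the partitions used in k-fold cross-validation. The function name `k_fold_indices` and the sample count are hypothetical and chosen only for illustration. External validation cannot be simulated by such a split, since it requires a dataset collected independently of the development data.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute the shuffled samples as evenly as possible over k folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the i-th one is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset of 10 patients split into 5 folds.
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Each sample appears in exactly one test fold, so every observation is used once for testing and k - 1 times for training.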
Nevertheless, external validation is often overlooked in health care: many models are developed, but only a small number are externally validated (24). This could be due to the fact that external validation studies often require data-sharing agreements between institutions, which can be challenging to obtain (25). Besides, since the performance of prediction models is generally poorer in patients coming from an external dataset, authors may be tempted to report only the most positive results that are usually obtained within the internal validation.
Supervised learning methods
This section provides a comprehensive overview of the most common supervised algorithms (Fig. 1). Supervised learning is the most used paradigm in medical research as it aims at mapping an input, such as an image or a set of clinical variables, to an output value. In simpler terms, supervised learning is like teaching a computer to mimic an expert. For instance, a spine specialist might mark up a scan to highlight the angles of the spine or a fracture. The computer model then learns from these expert annotations. The strength of supervised learning is that it leverages the expertise of one or more humans to establish a connection between the input (like the scan) and the output (the expert’s annotations). So, it is like the computer is being taught by an expert.
Supervised learning algorithms are heavily dependent on the quality and quantity of the training data. High-quality data ensure that the model is learning from accurate and reliable examples. A large quantity of data allows the model to learn from a diverse set of examples, improving its ability to generalize to new, unseen data. The latter aspect is especially important to prevent two common challenges in machine learning: overfitting and underfitting. Overfitting occurs when the model learns the training data too well, to the point where it captures the noise or outliers in the data. This results in a model that performs well on the training data but poorly on new, unseen data. On the other hand, underfitting occurs when the model is too simple to capture the complexity of the data. This results in a model that performs poorly on both the training data and new, unseen data. Therefore, it is crucial to select a model with the right level of complexity for the given task and to ensure that the training data is representative of the data the model will encounter in the real world. This balance is key to achieving good performance in supervised learning algorithms.
Linear models
Linear models are the simplest models in machine learning. They aim to map a set of input variables to an output assuming that the relation is linear. They can be used both for regression and classification, using linear regression and logistic regression, respectively.
In linear regression, the mapping takes the form $Y = WX + b$, where X and Y are the input variables and the output value, respectively, and they are known. W and b are the coefficients of the mapping function that are estimated during the model fit. X is the set of input variables, which can be very simple, such as age, gender, body mass index, or image pixels, or more complicated, such as features previously extracted using feature extraction algorithms like principal component analysis (PCA) (26) or Uniform Manifold Approximation and Projection (UMAP) (27). Y is the response variable, which can be a measure or a score associated with the input data.
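In logistic regression, the mapping takes the form $\log \frac{P(y = 1)}{1 - P(y = 1)} = WX + b$, where P(y = 1) is the probability that the output belongs to class 1, and the linearity refers to the linear relationship between the parameters W and the log-odds of the event ‘being in class 1’.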
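The two linear models can be sketched in a few lines of Python, assuming the standard forms Y = WX + b for linear regression and log-odds = WX + b for logistic regression. The coefficients and the patient values below are hypothetical and serve only to show that the log-odds of the logistic output recover the linear score.

```python
import math

def linear_predict(x, w, b):
    """Linear regression output: weighted sum of the inputs plus intercept."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def logistic_predict(x, w, b):
    """Logistic regression output: probability of class 1."""
    z = linear_predict(x, w, b)  # z is the log-odds of class 1
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and one hypothetical patient (age, BMI).
w = [0.04, 0.10]
b = -5.0
x = [60.0, 28.0]

score = linear_predict(x, w, b)
p = logistic_predict(x, w, b)
log_odds = math.log(p / (1.0 - p))
print(f"linear score = {score:.3f}, P(y = 1) = {p:.3f}, log-odds = {log_odds:.3f}")
```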
Non-linear models
Sometimes linear models cannot find a good fit to the data, and more complex models can come into play. These models include support vector machines (SVMs), tree-based models (decision trees, random forests, boosting trees), and artificial neural networks. They can model non-linear relationships, finding more complex patterns in the data.
Support vector machines
SVMs are supervised learning models that have been developed for classification problems but can be adapted for regression analysis. They were first introduced in the work by Vapnik et al. (28). The basic idea of SVMs for classification is to map data to a high-dimensional feature space so that the data points can be categorized even if they are not linearly separable in the original space. SVMs are quite flexible because different functions (called kernels) can be used to project data points to the high-dimensional space. The kernel can also be linear, so SVMs can also deal with linear classification tasks.
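The idea of mapping data to a higher-dimensional space can be illustrated with a toy example. The sketch below is not an SVM solver: it uses an explicit, hand-picked feature map (a kernel achieves the same effect implicitly through inner products), and the data points are hypothetical. It only shows that one-dimensional points that no single threshold can separate become separable once a squared feature is added.

```python
def feature_map(x):
    """Explicit map from 1D to 2D: x -> (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold on `values` splits the two classes perfectly."""
    ordered = [lab for _, lab in sorted(zip(values, labels))]
    # Along the sorted axis, the label may change at most once.
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1

# Hypothetical data: class 1 lies on both sides of class 0.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, 0, 0, 0, 1, 1]

original_ok = separable_by_threshold(xs, ys)
mapped_ok = separable_by_threshold([feature_map(x)[1] for x in xs], ys)
print("separable in original space:", original_ok)
print("separable along the x^2 feature:", mapped_ok)
```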
Tree-based models
Tree-based models apply a sequence of conditional rules to formulate predictions (29). These models can be employed for both regression tasks, which predict numerical outcomes, and classification tasks, which predict categorical outcomes. A major advantage of tree-based models is that they are easy to interpret as they create a set of if-else rules in a human-understandable way to predict an outcome (30, 31). A drawback is that these if-else rules can become very specific to the dataset on which they are fitted, potentially causing overfitting (32).
The simplest tree-based models are decision trees (33). The tree structure grows by creating partitions based on binary decisions applied to input features (Fig. 2). It starts from a root node including all data and branches into child nodes each time a decision is made, until a stop condition, such as the maximum depth of the tree, is reached. A node without children, at which the final prediction is made, is called a leaf node. The splits are chosen by a learning algorithm that aims to minimize a certain criterion (like Gini impurity for classification tasks, or mean squared error for regression tasks). To make a prediction for a new instance, we traverse the tree from the root to a leaf, following the decisions (splits) that apply to this instance’s features.
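How a single split is chosen can be sketched as follows, assuming one numeric feature, binary labels, and Gini impurity as the criterion. The function names and the toy data (a bone density score against fracture status) are hypothetical. A full decision tree would apply the same search recursively to each child node.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best binary split on one feature."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        thr = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= thr]
        right = [l for v, l in zip(values, labels) if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (thr, score)
    return best

# Hypothetical toy data: a bone density score and fracture status (1 = fracture).
t_scores = [-3.1, -2.8, -2.6, -1.0, -0.5, 0.3]
fracture = [1, 1, 1, 0, 0, 0]

threshold, impurity = best_split(t_scores, fracture)
print("if score <= %.2f: predict fracture, else: predict no fracture" % threshold)
```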
Random forests (RFs) have been proposed to overcome the limitations of decision trees (34, 35). They combine the predictions made by many decision trees into a single model: predictions made by individual trees may not be accurate, but combined, they can produce a more accurate and stable prediction (Fig. 2). Random forests can reduce the overfitting of decision trees as they average predictions from many trees. It is important to note that the trees of an RF are created independently on subsets of data using the so-called bagging approach (36).
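The bagging approach can be sketched as follows: each tree is fitted on a bootstrap sample (drawn with replacement) and the final prediction is a majority vote. This is a simplified illustration under stated assumptions, not a full random forest: the trees are one-split stumps on a single feature, the random feature selection used by random forests is omitted, and all names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree that predicts 1 if x > threshold, else 0."""
    best_thr, best_err = xs[0], len(xs) + 1
    for thr in sorted(set(xs)):
        err = sum((x > thr) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

def predict(thresholds, x):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(int(x > thr) for thr in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical one-feature dataset with two classes.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_bagged_stumps(xs, ys)
print("prediction for x = 0.5:", predict(forest, 0.5))
print("prediction for x = 9.5:", predict(forest, 9.5))
```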
Similar to RFs, Boosting Trees (BTs) are an ensemble of many decision trees (37). However, trees in BTs are weak learners (very simple trees with a small depth) and are built additively one after the other, where each new tree is added to correct the errors of the previous ones. The new tree is added based on an optimization algorithm that uses a loss and a gradient (Fig. 2). In simple words, the goal is to minimize the loss by using the gradient of the loss to build the new tree in the model. The most popular BT models are eXtreme Gradient Boosting (XGBoost) (38), Light Gradient-Boosting Machine (LightGBM) (39), and CatBoost (40). The main differences lie in the symmetry of the trees and in the splitting methods used.
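The additive construction can be sketched for a regression task, assuming a squared-error loss, for which the negative gradient is (up to a constant factor) simply the residual between target and current prediction. The sketch below is a minimal illustration with one-split regression stumps and hypothetical data; it does not reproduce the specific algorithms of XGBoost, LightGBM, or CatBoost.

```python
def fit_stump(xs, residuals):
    """Regression stump: best threshold and the mean residual on each side."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # skip the largest value: right side stays non-empty
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]

def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Add stumps one after the other, each fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, lm, rm = fit_stump(xs, residuals)
        stumps.append((thr, lm, rm))
        preds = [p + learning_rate * (lm if x <= thr else rm)
                 for x, p in zip(xs, preds)]
    return base, stumps, preds

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical one-feature regression data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

base, stumps, preds = boost(xs, ys)
initial_mse = mse(ys, [base] * len(ys))
final_mse = mse(ys, preds)
print(f"MSE before boosting: {initial_mse:.4f}, after boosting: {final_mse:.4f}")
```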
Artificial neural networks
Artificial neural networks (ANNs) are computational networks inspired by biological neural networks that form the structure of the biological brain (41). They are the ML models that have seen the biggest advancements in the last few years, even if the precursors of ANNs date back to the 1940s. The first neuron model was proposed in 1943 (42), and it was a very simple binary logic model. After a few years, the concept of Hebbian learning, which is still in use for neural networks, was developed (43). It was only in 1986 that the backpropagation algorithm, crucial for training ANNs to perform a specific task, was introduced (44). Similar to what happens to the receptors of our brain, ANNs receive the raw data as input, and each neuron in the first (input) layer receives one feature of the dataset (age, gender, pixel of an image, etc.) (Fig. 3). The inputs are processed by the neurons of the following layers, the so-called hidden layers, which apply transformations to the input data to discover hidden patterns. Each neuron in a hidden layer computes a weighted sum of its inputs, adds a bias, and then applies a simple activation function (like rectified linear unit (ReLU) or sigmoid) to the result. The final layer of the network constitutes its output layer, which provides the result for given inputs.
The learning process in a neural network involves adjusting the weights and biases of each layer. This adjustment is based on the difference between the network’s predictions and the actual outputs (also known as the ground truth). The method used to calculate these adjustments is the core of the backpropagation algorithm, which computes the gradients of the network parameters. Following this, an optimization algorithm, such as gradient descent, is employed to update these parameters. The goal of these updates is to improve the accuracy of the network’s predictions, i.e. minimizing the difference between the prediction and the ground truth, in subsequent iterations compared to the current one.
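The learning loop described above can be sketched for the simplest possible case: a single neuron with a sigmoid activation, trained by gradient descent on a cross-entropy loss. In a multilayer network, backpropagation applies the same chain rule layer by layer. The data, learning rate, and number of epochs are hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(xs, ys, lr=0.5, epochs=200):
    """Train one sigmoid neuron (weight w, bias b) with batch gradient descent."""
    w, b = 0.0, 0.0
    losses = []
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = loss = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)  # forward pass: prediction
            loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Chain rule: for sigmoid plus cross-entropy, dLoss/dz = p - y.
            grad_w += (p - y) * x
            grad_b += (p - y)
        # Gradient descent update of the parameters.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
        losses.append(loss / n)
    return w, b, losses

# Hypothetical one-feature dataset with binary ground truth.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

w, b, losses = train_neuron(xs, ys)
print(f"loss at first epoch: {losses[0]:.3f}, at last epoch: {losses[-1]:.3f}")
```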
In broad terms, ANNs can be divided into fully connected neural networks (FCNNs) and convolutional neural networks (CNNs). FCNNs have neurons in a layer fully connected to all neurons in the previous layer, making them suitable for problems where all inputs are equally relevant to the output. They are mainly used for tabular data where each variable in the dataset is fed into the network and processed in subsequent layers. On the other hand, CNNs are primarily used for image processing tasks, taking advantage of the spatial structure of the data. Like ANNs in general, CNNs are inspired by what happens in the mammalian brain when it comes to vision. In the 1960s, Hubel and Wiesel described that specific groups of neurons in the visual cortex are stimulated only by small areas of the visual field and extract features and information from those areas (45).
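The way a convolutional layer looks only at small areas of the image can be sketched with a plain two-dimensional convolution. In the minimal example below, the kernel is set by hand to respond to vertical edges, whereas in a CNN the kernel values are learned during training. The tiny image and the function name `conv2d` are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the operation used in convolutional layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Each output value depends only on a small kh x kw patch of the image.
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 4 x 4 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-set kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
```

The output is non-zero only in the column where the intensity changes, which is the sense in which early layers act as edge detectors.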
Within ANNs and CNNs, ‘deep’ architectures, i.e. networks with multiple layers, have been shown to be successful in modeling complex patterns in data. These multilayered networks, which are generically described under the name of ‘deep learning’ (DL), enable the automatic learning of hierarchical feature representations from raw input data, significantly enhancing the capability for tasks such as image recognition, natural language processing, and predictive analytics.
AI in spine care and research
In the subsequent sections, we will provide a summary of various AI applications in the field of spine research. The studies will be organized based on the type of data used, in particular imaging data, clinical tabular data, omics data, and multimodal data.
Imaging
ANNs, and in particular deep CNNs, have been widely applied to various imaging analysis tasks. One example is the use of deep learning to speed up MR image reconstruction, which has been documented in a paper aiming to compare the image quality and interobserver agreement in evaluations of neuroforaminal stenosis between 1.5T cervical spine MRI with deep learning reconstruction (DLR) and 3T MRI without DLR (46). The results showed that the use of DLR improved both the image quality (noise value of 8.4 vs 10.3) and interobserver agreement (0.92 vs 0.89) in evaluations of neuroforaminal stenosis on 1.5T cervical spine MRI compared to 3T MRI without DLR. Another study compared the image quality and diagnostic accuracy between standard turbo spin-echo (TSE) MRI and accelerated MRI with DLR for degenerative lumbar spine diseases (47). The acquisition time was reduced by 32.3%, and the quality of the DL-reconstructed images was similar to or better than the standard acquisition with similar sensitivity and specificity for the detection of degenerative diseases. Although MRI DLR is an interesting field with a lot of potential, it mainly concerns imaging research, while its clinical significance in the field of spine care may be relatively limited.
The integration of ML into radiological image analysis for spinal disorders represents a transformative leap in the diagnosis, treatment planning, and prognostic assessment within the field of spinal care. At the heart of this integration is the capability of ML algorithms to process and analyze vast datasets of radiological images, such as X-rays, MRI scans, and CT images, with precision and efficiency that can surpass traditional manual methods. This not only streamlines the diagnostic process but also enhances the accuracy of identifying and classifying spinal disorders, such as disc herniation, spinal stenosis, vertebral fractures, and degenerative spine diseases. ML algorithms, particularly those based on deep CNNs, have the potential to detect subtle patterns and features in spinal images that might elude even experienced radiologists. This capability stems from the deep learning models' ability to learn from data in a hierarchical fashion, where lower layers capture basic image features like edges and textures, and higher layers assimilate these features into more complex patterns corresponding to specific pathologies.
Deep CNNs are increasingly being utilized to carry out clinically relevant radiological evaluations automatically, for example assessing the severity of scoliosis and the sagittal curvature of the spine. In 2019, Galbusera et al. introduced a CNN that could accurately measure various parameters, such as lumbar lordosis, thoracic kyphosis, sagittal vertical alignment, coronal Cobb angle, and spinopelvic parameters (pelvic incidence, pelvic tilt, and sacral slope) in biplanar images of the entire trunk, achieving an average accuracy on par with human evaluators (48). An enhanced version of this model was subsequently applied to analyze a substantial dataset of 9832 patients, aiming to investigate the correlations among different radiological parameters in a sizable cohort (49). This demonstrated that automated deep learning-based models are highly effective for conducting retrospective analyses on archived data, tasks that would otherwise demand considerable time and resources if done manually. Furthermore, numerous studies have reported outstanding outcomes regarding the automatic measurement of spinopelvic parameters (50), sagittal alignment (51), and the Cobb angle for scoliosis (52, 53), using varied technical methodologies. These findings underscore the significant potential of deep learning in the areas of landmark localization and measurement within the field of spine radiology.
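Once the relevant landmarks have been localized, angular parameters follow from simple geometry. The sketch below is an assumption-laden illustration of that final step only, not the method of the cited studies: it supposes that two points on each end vertebra's endplate are already available (for example, as landmark predictions) and computes the angle between the two endplate lines. The coordinates are hypothetical, and folding the result into the 0 to 90 degree range is a simplification that would not hold for very severe curves.

```python
import math

def line_angle_deg(p1, p2):
    """Orientation of the line through p1 and p2, in degrees."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

def cobb_angle_deg(upper_endplate, lower_endplate):
    """Angle between two endplate lines, each given as a pair of (x, y) points."""
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate)) % 180.0
    # Lines have no direction, so the angle is folded into [0, 90] degrees.
    return min(diff, 180.0 - diff)

# Hypothetical landmarks: upper endplate tilted by +15 degrees,
# lower endplate tilted by -20 degrees.
upper = ((0.0, 0.0), (math.cos(math.radians(15)), math.sin(math.radians(15))))
lower = ((0.0, 0.0), (math.cos(math.radians(-20)), math.sin(math.radians(-20))))

angle = cobb_angle_deg(upper, lower)
print(f"angle between endplates: {angle:.1f} degrees")
```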
The application of deep learning algorithms has shown promising results in fully automated radiological image segmentation, using well-known techniques such as UNet as well as more refined ones customized for spinal imaging. Studies like the one conducted by Liebl et al. (54) on a dataset of 300 spine CT scans highlight the significance of machine learning in overcoming the limitations of traditional segmentation algorithms, especially in cases with pathological signs and anatomical variants not frequently present in training datasets. Regarding CT imaging, it is worth mentioning the CTSpine1K dataset (55), which contains annotated images of more than 11,000 vertebrae, either healthy or pathological, that constitutes an excellent basis for the development and testing of automated segmentation algorithms. Automatic tissue segmentation from MRI data showcases the ability of machine learning to perform complex tasks, such as vertebrae recognition and segmentation with a high degree of accuracy, comparable to manual segmentation by physicians (56, 57, 58). Attention has also been devoted to the segmentation of paravertebral muscles (59, 60, 61), the properties of which (such as cross-sectional areas and fatty infiltration) are known to be associated with disability and treatment outcomes (62).
Integrated tools that can perform several tasks including localization and morphological analysis of anatomical structures as well as detection and classification of radiological findings have also been reported. A notable example is SpineNet, introduced by Jamaludin et al. (63), which leverages deep learning to automate the detection and labeling of vertebral bodies in T2-weighted sagittal MRI scans (Fig. 4). By doing so, SpineNet provides a framework for enhancing diagnostic accuracy, reducing manual labor, and facilitating a more efficient assessment of spinal pathologies. Besides localizing all visible vertebral bodies in the image stack, SpineNet can evaluate several parameters for each vertebral level: the Pfirrmann grade of disc degeneration, presence of disc narrowing, presence and severity of central canal stenosis, endplate defect and marrow changes, foraminal stenosis, spondylolisthesis, and disc herniation. Two distinct recent studies performed by independent research groups described an external validation of SpineNet, demonstrating its robustness and reliability (64, 65).
Outcome prediction
The advancement of predictive modeling in the clinical field has been possible thanks to the adoption of electronic health records (EHRs) and the development of national registries (66). These data sources contain a large amount of clinical data that could be used to develop models to support clinical decisions and get to the implementation of the concept of personalized medicine, where treatment is tailored toward the patient’s characteristics (5, 67). The spine field is not an exception. A lot of work has been done to predict mechanical complications, infections, risk of reoperation, and health-related quality of life (HRQOL) outcomes (68, 69, 70, 71). Some studies estimated that 2–23% of patients undergoing spinal surgery will have a complication (72, 73). For this reason, predictive modeling can help identify patients at risk of a mechanical complication in advance to give health-care providers the chance to take action. Many studies predicting mechanical complications made use of the Global Alignment and Proportion (GAP) score (74) as the main predictor for mechanical complications. The GAP score can be used as a threshold (74) or it can be embedded in a multivariable logistic model together with other demographic and clinical data. Indeed, a study tried to incorporate body mass index (BMI) and bone mineral density (BMD) into the GAP score, achieving an area under the curve (AUC) of 0.885, indicating higher accuracy than using the GAP score alone (75). All the previous studies implemented simple linear models, but more complex ML models could have been used. A study aimed to create a machine learning model to predict mechanical complications in adult spinal deformity (ASD) surgery (76). The authors tested a set of ML models including logistic regression, random forest, gradient boosting, and deep neural network, and found the best accuracy to be equal to 73.2% for the gradient boosting model. Another work by Shah et al. focused on predicting the risk of major complications and readmission using different boosting algorithms (77). XGBoost gave the best performance (AUC = 0.68), and interpretability techniques were applied to identify the variables most important for predicting major complications, giving better insight into the model’s decisions.
The risk of infection prediction has also been widely investigated employing linear logistic models. The models developed showed good discriminative ability in detecting infection with AUCs around 0.72 (78, 79). Regarding HRQOL, the primary goals of the research were to forecast follow-up scores or to classify a clinically significant change in the scores starting from baseline data. These scores are collected through questionnaires, with the Oswestry Disability Index (ODI) (80) and the Core Outcome Measure Index (COMI) (81) being the most commonly used in the spine field. They are administered before the treatment (baseline) and at different follow-ups to assess the patient’s conditions in terms of different life domains such as pain and disability. The work by Halicka et al. aimed to create and externally validate models predicting outcomes of spinal surgery in terms of COMI significant change from the baseline to follow-up and the COMI itself at follow-up (82). The AUCs in classifying the significant change in score were 0.63 and 0.62 for the logistic model and the random forest respectively. The R² in score predictions was around 0.25 for both models. Another study applied a multivariable linear regression model to predict COMI and its subitems achieving R² on the test set ranging from 0.09 for the leg pain sub-item to 0.16 for the COMI score (83). Staartjes et al. focused on the classification of the minimum clinically important difference at 12 months postoperatively that was defined as a reduction from baseline of at least 15 points for ODI, 2.2 points for COMI, or two points for pain severity (back and leg) (84). The AUCs for functional impairment (ODI/COMI), back pain, and leg pain were 0.67 (95% CI: 0.59–0.74), 0.72 (95% CI: 0.64–0.79), and 0.64 (95% CI: 0.54–0.73), respectively, indicating a moderate ability to identify patients likely to benefit from surgery. All the presented models could help clinicians have better insights into which are the main characteristics of patients that recover better and define personalized treatments.
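To help interpret the metrics reported above, the following minimal sketch computes the AUC (as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one) and the coefficient of determination R². The scores, labels, and function names are hypothetical and are not taken from the cited studies.

```python
def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def r_squared(y_true, y_pred):
    """Fraction of the variance of y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical predicted risks and observed outcomes (1 = event occurred).
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])

# Predicting the mean for every patient explains none of the variance.
example_r2 = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])

print("AUC:", example_auc)
print("R^2:", example_r2)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, while an R² of 0 means the model does no better than always predicting the mean outcome.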
Omics and multimodal data
The latest developments in genetics could facilitate the acquisition of more comprehensive data, offering a deeper genetic understanding of a patient’s health status. The term omics includes a multitude of areas (85) such as genomics, transcriptomics, proteomics, interactomics, metabolomics, phenomics, and pharmacogenomics. The combination of omics data with other types of data presented in this study, such as images and EHR, is known as multimodal data (Fig. 5). This integration could facilitate the realization of precision medicine, a tailored approach to health care (5, 6). The advancements in ML and in particular DL have been shown capable of representing and learning predictable relationships in many diverse forms of data (86). Indeed, there was an attempt to develop a general multimodal artificial intelligence framework for health-care applications using four data modalities (tabular, time-series, text, and images) (15).
In the spine domain, omics and multimodal approaches have been overlooked. However, it has been shown that the variables currently used (EHR, images, questionnaires) allow for around 70% accuracy in prediction, suggesting that more variables coming from the biological field need to be investigated (87, 88). There have been some attempts to implement multimodal models, but without including omics and using only a few modalities. One study simply compared two image modalities, namely MR and CT, to predict osteoporosis, showing that the combination outperformed the unimodal setting (89). A multimodal model combining radiomics features from CT scans and X-ray images with clinical risk factors achieved an AUC of 0.91 on a test set to predict osteoporosis (90). Nevertheless, the number of modalities used was not enough to have a comprehensive picture of the patient’s health status. This suggests a need for further research to fill this gap and have a fully integrated multimodal model with many data modalities.
Natural language processing and large language models
The integration of natural language processing (NLP) into spine research and care represents a significant advance in the use of computational linguistics to enhance clinical decision-making and research capabilities. NLP techniques enable the extraction, analysis, and synthesis of valuable information from unstructured text data, such as clinical notes, radiology reports, and scientific literature. For instance, NLP can automate the extraction of diagnostic findings from radiology reports, thus streamlining the identification of conditions such as lumbar disc herniation or spinal stenosis. Galbusera et al. reported the use of NLP tools, namely Bidirectional Encoder Representations from Transformers (BERT) models, to facilitate the annotation of images by automatically extracting knowledge about degenerative findings from radiological reports (91). These annotations were then used to train deep CNNs to detect the findings directly in the images, thus avoiding tedious and time-consuming human annotation of the dataset. Biswas et al. described the development of NLP models to detect intra-operative information, such as incidental durotomy, wound drains, and skin clips, in free-text notes describing lumbar spine surgeries (92), demonstrating the potential of NLP for monitoring and reporting surgery-related information without any impact on the efficiency of the workflow.
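To make the idea of turning free-text reports into image labels more concrete, the sketch below labels a report for two findings using keyword patterns and a crude negation check. This is a deliberately simple rule-based stand-in, not the BERT models used in the cited work (91); the finding names, patterns, and negation cues are assumptions chosen for illustration only.

```python
import re

# Hypothetical finding patterns; a real system would need a far richer vocabulary.
FINDINGS = {
    "disc_herniation": re.compile(r"\b(disc|disk)\s+herniation\b", re.IGNORECASE),
    "spinal_stenosis": re.compile(r"\b(spinal|canal|foraminal)\s+stenosis\b", re.IGNORECASE),
}

# Hypothetical negation cues, checked only within the same sentence.
NEGATION = re.compile(r"\b(no|without|negative for|absence of)\b", re.IGNORECASE)


def label_report(report):
    """Return {finding: bool}; a finding is True if it is mentioned without a preceding negation."""
    labels = {name: False for name in FINDINGS}
    # Naive sentence split; it would also break on decimals such as "1.5 cm".
    for sentence in re.split(r"[.;\n]+", report):
        for name, pattern in FINDINGS.items():
            match = pattern.search(sentence)
            if match is None:
                continue
            # Treat the mention as negated if a cue appears before it in the sentence.
            negated = NEGATION.search(sentence[: match.start()]) is not None
            if not negated:
                labels[name] = True
    return labels


report = "Broad-based disc herniation at L4-L5. No evidence of spinal stenosis."
print(label_report(report))
```

Labels produced this way could serve as weak annotations for the corresponding images, which is the role the report-derived annotations play in the approach described above; a learned language model would be expected to handle phrasing and negation far more robustly than these rules.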
The application of large language models (LLMs), with their advanced understanding and generation of natural language, further amplifies the potential of NLP by providing more nuanced interpretations of text data, facilitating the identification of patterns, trends, and correlations that might not be immediately apparent to human readers. ChatGPT, the first LLM to gain widespread recognition, has so far been tested for several applications, such as summarizing a radiological report in patient-friendly language (93), providing patient-oriented answers to frequently asked questions about spine disorders and treatments (94, 95), and automating billing from free-text spine surgery operative notes; these tests revealed good performance, although with limitations and occasional failures. Additionally, LLMs can assist in summarizing research findings from extensive literature, making it easier for practitioners to stay up to date with the latest advancements in spine care. Considering the pace at which these technologies are evolving and their promising outlook, their integration into spine research and care is expected to optimize the efficiency of health-care delivery while improving patient information and, ultimately, treatment outcomes.
Recommendations for use and future perspectives
The use of ML tools in spine care and research presents both unprecedented opportunities and significant challenges. For practical application and integration into clinical practice, ML models must undergo rigorous external validation and obtain the necessary regulatory approvals from the local competent authorities to ensure safety, efficacy, and compliance with health-care standards (96, 97). In the USA, the Food and Drug Administration (FDA) mandates pre-market submissions, including clinical validation and a comprehensive plan for software modifications to account for the evolving nature of ML models. Similarly, in the European Union, ML-based medical devices must obtain the CE Mark under the Medical Device Regulation (MDR), demonstrating adherence to safety, health, and environmental protection standards. This process involves rigorous assessment of the technology, including clinical evaluation and the implementation of a quality management system. Both the FDA and the EU's MDR frameworks emphasize the importance of transparency, data protection, and post-market surveillance to ensure the integrity of patient care. Ethical considerations must also be at the forefront of ML deployment, including patient consent, data privacy, and the mitigation of biases in algorithm development to ensure equitable care across diverse populations (98).
Another aspect to be considered is the need for large, high-quality datasets, which could be fostered by sharing data in repositories freely accessible to the research community (99). While this practice can accelerate innovation by providing researchers with rich datasets to train more sophisticated and accurate models, it raises important considerations for ethics and safety. Sharing data openly promotes a collaborative environment that can lead to breakthroughs in diagnosing and treating spinal conditions. However, it necessitates stringent data anonymization to protect patient privacy, robust data quality standards to prevent the propagation of biases, and careful adherence to ethical guidelines to ensure equitable care outcomes. By balancing the drive for innovation with the imperatives of ethics, safety, and privacy, the spine research community can leverage the full potential of ML to improve patient care while upholding high standards of professional responsibility.
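As a small, hedged illustration of one step that data sharing typically involves, the Python sketch below pseudonymizes a tabular patient record before release: it drops direct identifiers, replaces the patient ID with a keyed hash, and coarsens the age into a ten-year band. The field names and the record are hypothetical, and pseudonymization of this kind is only one ingredient of anonymization; it does not by itself guarantee that a dataset cannot be re-identified or that it satisfies any particular regulation.

```python
import hashlib
import hmac

# Hypothetical direct identifiers to remove before sharing.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record, secret_key):
    """Return a copy of the record with identifiers removed and a keyed pseudo-ID added."""
    # Keyed hash: the same patient maps to the same token, but only for holders of the key.
    token = hmac.new(secret_key, record["patient_id"].encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "patient_id"
    }
    cleaned["pseudo_id"] = token
    # Coarsen age into ten-year bands to reduce re-identification risk.
    if "age" in cleaned:
        cleaned["age_band"] = f"{(cleaned.pop('age') // 10) * 10}s"
    return cleaned


record = {"patient_id": "P001", "name": "Jane Doe", "age": 47, "odi_score": 32}
print(pseudonymize(record, secret_key=b"keep-this-key-private"))
```

The secret key would have to stay with the data custodian; anyone holding it could recompute the mapping from patient IDs to pseudo-IDs.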
Despite these challenges, the potential for ML to revolutionize spine care is immense, from enhancing diagnostic precision to personalizing treatment plans and predicting outcomes. Future efforts should focus on fostering multidisciplinary collaborations between clinicians, computer scientists, and ethicists to drive innovation while ensuring ethical, transparent, and patient-centered approaches to care. By navigating these regulatory and ethical landscapes thoughtfully, the spine community can leverage ML to significantly improve research and care, making it more accurate, efficient, and accessible for all patients.
ICMJE Conflict of Interest Statement
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the instructional lecture.
Funding Statement
This instructional lecture did not receive any specific grant from any funding agency in the public, commercial, or not-for-profit sector.
References
- 1↑
Jordan MI, & Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015 349 255–260. (https://doi.org/10.1126/science.aaa8415)
- 2↑
Gupta D, & Rani R. A study of big data evolution and research challenges. Journal of Information Science 2019 45 322–340. (https://doi.org/10.1177/0165551518789880)
- 3↑
Dalal KR. Analysing the implementation of machine learning in healthcare. International Conference on Electronics and Sustainable Communication Systems (ICESC), IEEE, pp. 133–137 2020. (https://doi.org/10.1109/ICESC48915.2020.9156061)
- 4↑
Badawy M, Ramadan N, & Hefny HA. Healthcare predictive analytics using machine learning and deep learning techniques: a survey. Journal of Electrical Systems and Information Technology 2023 10 1–45. (https://doi.org/10.1186/s43067-023-00108-y)
- 5↑
Mathur S, & Sutton J. Personalized medicine could transform healthcare. Biomedical Reports 2017 7 3–5. (https://doi.org/10.3892/br.2017.922)
- 6↑
Seyhan AA, & Carini C. Are innovation and new technologies in precision medicine paving a new era in patients centric care? Journal of Translational Medicine 2019 17 114. (https://doi.org/10.1186/s12967-019-1864-9)
- 7↑
Free C, Phillips G, Watson L, Galli L, Felix L, Edwards P, Patel V, & Haines A. The effectiveness of mobile-health technologies to improve health care service delivery processes: a systematic review and meta-analysis. PLoS Medicine 2013 10 e1001363. (https://doi.org/10.1371/journal.pmed.1001363)
- 8↑
Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, Aldairem A, Alrashed M, Bin Saleh K, Badreldin HA, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Medical Education 2023 23 689. (https://doi.org/10.1186/s12909-023-04698-z)
- 9↑
Galbusera F, Casaroli G, & Bassani T. Artificial intelligence and machine learning in spine research. JOR Spine 2019 2 e1044. (https://doi.org/10.1002/jsp2.1044)
- 10↑
Chang M, Canseco JA, Nicholson KJ, Patel N, & Vaccaro AR. The role of machine learning in spine surgery: the future is now. Frontiers in Surgery 2020 7 54. (https://doi.org/10.3389/fsurg.2020.00054)
- 11↑
Ren G, Yu K, Xie Z, Wang P, Zhang W, Huang Y, Wang Y, & Wu X. Current applications of machine learning in spine: from clinical view. Global Spine Journal 2022 12 1827–1840. (https://doi.org/10.1177/21925682211035363)
- 12↑
Subramanian I, Verma S, Kumar S, Jere A, & Anamika K. Multi-omics data integration, interpretation, and its application. Bioinformatics and Biology Insights 2020 14 1177932219899051. (https://doi.org/10.1177/1177932219899051)
- 13↑
Ali AM, & Mohammed MA. A comprehensive review of artificial intelligence approaches in omics data processing: evaluating progress and challenges. International Journal of Mathematics, Statistics, and Computer Science 2024 2 114–167. (https://doi.org/10.59543/ijmscs.v2i.8703)
- 14↑
Cai Q, Wang H, Li Z, & Liu X. A survey on multimodal data-driven smart healthcare systems: approaches and applications. IEEE Access 2019 7 133583–133599. (https://doi.org/10.1109/ACCESS.2019.2941419)
- 15↑
Soenksen LR, Ma Y, Zeng C, Boussioux L, Villalobos Carballo K, Na L, Wiberg HM, Li ML, Fuentes I, & Bertsimas D. Integrated multimodal artificial intelligence framework for healthcare applications. npj Digital Medicine 2022 5 149. (https://doi.org/10.1038/s41746-022-00689-4)
- 16↑
Mahesh B. Machine learning algorithms-a review. International Journal of Science and Research 2020 9 381–386.
- 17↑
O’Mahony N, Campbell S, Carvalho A, Harapanahalli S, Hernandez GV, Krpalkova L, Riordan D, & Walsh J. Deep learning vs. traditional computer vision. Advances in Computer 2020 128–144. (https://doi.org/10.1007/978-3-030-17795-9_10)
- 18↑
Alassadi A, & Ivanauskas T. Classification Performance Between Machine Learning and Traditional Programming in Java 2019. Available at: diva-portal.org
- 19↑
Alloghani M, Al-Jumeily D, Mustafina J, Hussain A, & Aljaaf AJ. A systematic review on supervised and unsupervised machine learning algorithms for data science. In Supervised and Unsupervised Learning for Data Science, pp. 3–21. Berry MW, Mohamed A, Yap BW, Eds. Cham: Springer International Publishing 2020. (https://doi.org/10.1007/978-3-030-22475-2_1)
- 20↑
Nasteski V. An overview of the supervised machine learning methods. Horizons.B 2017 4 51–62. (https://doi.org/10.20544/HORIZONS.B.04.1.17.P05)
- 21↑
Gentleman R, & Carey VJ. Unsupervised machine learning. In Bioconductor Case Studies, pp. 137–157. Hahne F, Huber W, Gentleman R, & Falcon S, Eds. New York, NY: Springer New York 2008. (https://doi.org/10.1007/978-0-387-77240-0_10)
- 22↑
Steyerberg EW, & Harrell FE. Prediction models need appropriate internal, internal-external, and external validation. Journal of Clinical Epidemiology 2016 69 245–247. (https://doi.org/10.1016/j.jclinepi.2015.04.005)
- 23↑
Tan J, Yang J, Wu S, Chen G, & Zhao J. A critical look at the current train/test split in machine learning. arXiv [csLG] 2021. (https://doi.org/10.48550/arXiv.2106.0452)
- 24↑
Ramspek CL, Jager KJ, Dekker FW, Zoccali C, & van Diepen M. External validation of prognostic models: what, why, how, when and where? Clinical Kidney Journal 2021 14 49–58. (https://doi.org/10.1093/ckj/sfaa188)
- 25↑
External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ 2019 365 l4379. (https://doi.org/10.1136/bmj.l4379)
- 26↑
Maćkiewicz A, & Ratajczak W. Principal components analysis (PCA). Computers and Geosciences 1993 19 303–342. (https://doi.org/10.1016/0098-3004(93)90090-R)
- 27↑
McInnes L, Healy J, & Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv [statML] 2018. (https://doi.org/10.48550/arXiv.1802.0342)
- 28↑
Cortes C, & Vapnik V. Support-vector networks. Machine Learning 1995 20 273–297. (https://doi.org/10.1007/BF00994018)
- 29↑
Cutler A, Cutler DR, & Stevens JR. Tree-based methods. In High-Dimensional Data Analysis in Cancer Research, pp. 1–19. Li X, & Xu R, Eds. New York, NY: Springer New York 2009. (https://doi.org/10.1007/978-0-387-69765-9_5)
- 30↑
Barceló P, Monet M, Pérez JA, & Subercaseaux B. Model interpretability through the lens of computational complexity. Advances in Neural Information Processing Systems 2020 33 15487–15498.
- 31↑
Sepiolo D, & Ligęza A. Towards explainability of tree-based ensemble models. A critical overview. In New Advances in Dependability of Networks and Systems, pp. 287–296. Springer International Publishing 2022. (https://doi.org/10.1007/978-3-031-06746-4_28)
- 32↑
Carvalho J, Santos JPV, Torres RT, Santarém F, & Fonseca C. Tree-based methods: concepts, uses and limitations under the framework of resource selection models. Journal of Environmental Informatics 2018 32 112–124. (https://doi.org/10.3808/JEI.201600352)
- 33↑
Suthaharan S. Decision tree learning. In Machine Learning Models and Algorithms for Big Data Classification: Thinking with Examples for Effective Learning, pp. 237–269. Suthaharan S, Ed. Boston, MA: Springer US 2016. (https://doi.org/10.1007/978-1-4899-7641-3_10)
- 34↑
Breiman L. Random forests. Machine Learning 2001 45 5–32. (https://doi.org/10.1023/A:1010933404324)
- 35↑
Biau G, & Scornet E. A random forest guided tour. Test 2016 25 197–227. (https://doi.org/10.1007/s11749-016-0481-7)
- 36↑
Altman N, & Krzywinski M. Ensemble methods: bagging and random forests. Nature Methods 2017 14 933–934. (https://doi.org/10.1038/nmeth.4438)
- 37↑
Sutton CD. Classification and regression trees, bagging, and boosting. In Data Mining and Data Visualization, pp. 303–329. Rao CR, Wegman EJ, & Solka JL, Eds. Elsevier 2005. (https://doi.org/10.1016/S0169-7161(04)24011-1)
- 38↑
Chen T, & Guestrin C. XGBoost: reliable large-scale tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016), pp. 785–794. (https://doi.org/10.1145/2939672.2939785)
- 39↑
Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, & Liu TY. LightGBM: a highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS) (2017), pp. 3149–3157.
- 40↑
Ostroumova L, Gusev G, Vorobev A, Dorogush AV, & Gulin A. CatBoost: unbiased boosting with categorical features. Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS) (2018), pp. 6639–6649.
- 41↑
da Silva IN, Hernane Spatti D, Andrade Flauzino R, Liboni LHB, & dos Reis Alves SF. Artificial neural network architectures and training processes. In Artificial Neural Networks: A Practical Course, pp. 21–28. da Silva IN, Hernane Spatti D, Andrade Flauzino R, Liboni LHB, & dos Reis Alves SF, Eds. Cham: Springer International Publishing 2017. (https://doi.org/10.1007/978-3-319-43162-8_2)
- 42↑
McCulloch WS, & Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 1943 5 115–133. (https://doi.org/10.1007/BF02478259)
- 43↑
Hebb DO. Organization of behavior. New York: Wiley. Journal of Clinical Psychology 1949 6 335–307.
- 44↑
Rumelhart D, Hinton GE, & Williams RJ. Learning internal representations by error propagation. Nature 1986 673–695. (https://doi.org/10.1016/B978-1-4832-1446-7.50035-2)
- 45↑
Hubel DH, & Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. Journal of Physiology 1962 160 106–154. (https://doi.org/10.1113/jphysiol.1962.sp006837)
- 46↑
Yasaka K, Tanishima T, Ohtake Y, Tajima T, Akai H, Ohtomo K, Abe O, & Kiryu S. Deep learning reconstruction for the evaluation of neuroforaminal stenosis using 1.5T cervical spine MRI: comparison with 3T MRI without deep learning reconstruction. Neuroradiology 2022 64 2077–2083. (https://doi.org/10.1007/s00234-022-03024-6)
- 47↑
Yoo H, Yoo RE, Choi SH, Hwang I, Lee JY, Seo JY, Koh SY, Choi KS, Kang KM, & Yun TJ. Deep learning-based reconstruction for acceleration of lumbar spine MRI: a prospective comparison with standard MRI. European Radiology 2023 33 8656–8668. (https://doi.org/10.1007/s00330-023-09918-0)
- 48↑
Galbusera F, Niemeyer F, Wilke HJ, Bassani T, Casaroli G, Anania C, Costa F, Brayda-Bruno M, & Sconfienza LM. Fully automated radiological analysis of spinal disorders and deformities: a deep learning approach. European Spine Journal 2019 28 951–960. (https://doi.org/10.1007/s00586-019-05944-z)
- 49↑
Galbusera F, Bassani T, Panico M, Sconfienza LM, & Cina A. A fresh look at spinal alignment and deformities: automated analysis of a large database of 9832 biplanar radiographs. Frontiers in Bioengineering and Biotechnology 2022 10 863054. (https://doi.org/10.3389/fbioe.2022.863054)
- 50↑
Orosz LD, Bhatt FR, Jazini E, Dreischarf M, Grover P, Grigorian J, Roy R, Schuler TC, Good CR, & Haines CM. Novel artificial intelligence algorithm: an accurate and independent measure of spinopelvic parameters. Journal of Neurosurgery. Spine 2022 37 893–901. (https://doi.org/10.3171/2022.5.SPINE22109)
- 51↑
Löchel J, Putzier M, Dreischarf M, Grover P, Urinbayev K, Abbas F, Labbus K, & Zahn R. Deep learning algorithm for fully automated measurement of sagittal balance in adult spinal deformity. European Spine Journal 2024. (https://doi.org/10.1007/s00586-023-08109-1)
- 52↑
Tu Y, Wang N, Tong F, & Chen H. Automatic measurement algorithm of scoliosis Cobb angle based on deep learning. Journal of Physics: Conference Series 2019 1187 042100. (https://doi.org/10.1088/1742-6596/1187/4/042100)
- 53↑
Alukaev D, Kiselev S, Mustafaev T, Ainur A, Ibragimov B, & Vrtovec T. A deep learning framework for vertebral morphometry and Cobb angle measurement with external validation. European Spine Journal 2022 31 2115–2124. (https://doi.org/10.1007/s00586-022-07245-4)
- 54↑
Liebl H, Schinz D, Sekuboyina A, Malagutti L, Löffler MT, Bayat A, El Husseini M, Tetteh G, Grau K, Niederreiter E, et al. A computed tomography vertebral segmentation dataset with anatomical variations and multi-vendor scanner data. Scientific Data 2021 8 284. (https://doi.org/10.1038/s41597-021-01060-0)
- 55↑
Deng Y, Wang C, Hui Y, Li Q, Li J, Luo S, Sun M, Quan Q, Yang S, Hao Y, et al. CTSpine1K: a large-scale dataset for spinal vertebrae segmentation in computed tomography. arXiv [eessIV] 2021. (https://doi.org/10.48550/arXiv.2105.1471)
- 56↑
Andrew J, DivyaVarshini M, Barjo P, & Tigga I. Spine magnetic resonance image segmentation using deep learning techniques. 6th International Conference on Advanced Computing and Communication Systems (ICACCS), IEEE, pp. 945–950 2020. (https://doi.org/10.1109/ICACCS48705.2020.9074218)
- 57↑
Huang J, Shen H, Wu J, Hu X, Zhu Z, Lv X, Liu Y, & Wang Y. Spine Explorer: a deep learning based fully automated program for efficient and reliable quantifications of the vertebrae and discs on sagittal lumbar spine MR images. Spine Journal 2020 20 590–599. (https://doi.org/10.1016/j.spinee.2019.11.010)
- 58↑
Shen H, Huang J, Zheng Q, Zhu Z, Lv X, Liu Y, & Wang Y. A deep-learning–based, fully automated program to segment and quantify major spinal components on axial lumbar spine magnetic resonance images. Physical Therapy 2021 101. (https://doi.org/10.1093/ptj/pzab041)
- 59↑
Niemeyer F, Zanker A, Jonas R, Tao Y, Galbusera F, & Wilke H-J. An externally validated deep learning model for the accurate segmentation of the lumbar paravertebral muscles. bioRxiv 2021. (https://doi.org/10.1101/2021.10.25.21265466)
- 60↑
Zhang Y, Shi Z, Wang H, Yan C, Wang L, Mu Y, Liu Y, Wu S, & Liu T. LumNet: a deep neural network for lumbar paraspinal muscles segmentation. Advances in Artificial Intelligence 2019 574–585. (https://doi.org/10.1007/978-3-030-35288-2_46)
- 61↑
Li H, Luo H, & Liu Y. Paraspinal muscle segmentation based on deep neural network. Sensors 2019 19. (https://doi.org/10.3390/s19122650)
- 62↑
He K, Head J, Mouchtouris N, Hines K, Shea P, Schmidt R, Hoelscher C, Stricsek G, Harrop J, & Sharan A. The implications of paraspinal muscle atrophy in low back pain, thoracolumbar pathology, and clinical outcomes after spine surgery: a review of the literature. Global Spine Journal 2020 10 657–666. (https://doi.org/10.1177/2192568219879087)
- 63↑
Jamaludin A, Lootus M, Kadir T, Zisserman A, Urban J, Battié MC, Fairbank J, McCall I & Genodisc Consortium. ISSLS PRIZE IN BIOENGINEERING SCIENCE 2017: automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist. European Spine Journal 2017 26 1374–1383. (https://doi.org/10.1007/s00586-017-4956-3)
- 64↑
Grob A, Loibl M, Jamaludin A, Winklhofer S, Fairbank JCT, Fekete T, Porchet F, & Mannion AF. External validation of the deep learning system “SpineNet” for grading radiological features of degeneration on MRIs of the lumbar spine. European Spine Journal 2022 31 2137–2148. (https://doi.org/10.1007/s00586-022-07311-x)
- 65↑
McSweeney TP, Tiulpin A, Saarakkala S, Niinimäki J, Windsor R, Jamaludin A, Kadir T, Karppinen J, & Määttä J. External validation of SpineNet, an open-source deep learning model for grading lumbar disk degeneration MRI features, using the Northern Finland birth cohort 1966. Spine 2023 48 484–491. (https://doi.org/10.1097/BRS.0000000000004572)
- 66↑
Lubelski D, Hersh A, Azad TD, Ehresman J, Pennington Z, Lehner K, & Sciubba DM. Prediction models in degenerative spine surgery: a systematic review. Global Spine Journal 2021 11 79S–88S. (https://doi.org/10.1177/2192568220959037)
- 67↑
Khan O, Badhiwala JH, Grasso G, & Fehlings MG. Use of machine learning and artificial intelligence to drive personalized medicine approaches for spine care. World Neurosurgery 2020 140 512–518. (https://doi.org/10.1016/j.wneu.2020.04.022)
- 68↑
Wilson JR, Grossman RG, Frankowski RF, Kiss A, Davis AM, Kulkarni AV, Harrop JS, Aarabi B, Vaccaro A, Tator CH, et al. A clinical prediction model for long-term functional outcome after traumatic spinal cord injury based on acute clinical and imaging factors. Journal of Neurotrauma 2012 29 2263–2271. (https://doi.org/10.1089/neu.2012.2417)
- 69↑
Lee MJ, Cizik AM, Hamilton D, & Chapman JR. Predicting medical complications after spine surgery: a validated model using a prospective surgical registry. Spine Journal 2014 14 291–299. (https://doi.org/10.1016/j.spinee.2013.10.043)
- 70↑
Tetreault LA, Côté P, Kopjar B, Arnold P, Fehlings MG & AOSpine North America and International Clinical Trial Research Network. A clinical prediction model to assess surgical outcome in patients with cervical spondylotic myelopathy: internal and external validations using the prospective multicenter AOSpine North American and international datasets of 743 patients. Spine Journal 2015 15 388–397. (https://doi.org/10.1016/j.spinee.2014.12.145)
- 71↑
Kawabata A, Yoshii T, Sakai K, Hirai T, Yuasa M, Inose H, Utagawa K, Hashimoto J, Matsukura Y, Tomori M, et al. Identification of predictive factors for mechanical complications after adult spinal deformity surgery: a multi-institutional retrospective study. Spine 2020 45 1185–1192. (https://doi.org/10.1097/BRS.0000000000003500)
- 72↑
Nasser R, Yadla S, Maltenfort MG, Harrop JS, Greg Anderson DG, Vaccaro AR, Sharan AD, & Ratliff JK. Complications in spine surgery. Journal of Neurosurgery 2010 13 144–157. (https://doi.org/10.3171/2010.3.SPINE09369)
- 73↑
Yeramaneni S, Robinson C, & Hostin R. Impact of spine surgery complications on costs associated with management of adult spinal deformity. Current Reviews in Musculoskeletal Medicine 2016 9 327–332. (https://doi.org/10.1007/s12178-016-9352-9)
- 74↑
Yilgor C, Sogunmez N, Boissiere L, Yavuz Y, Obeid I, Kleinstück F, Pérez-Grueso FJS, Acaroglu E, Haddad S, Mannion AF, et al. Global alignment and proportion (GAP) score: development and validation of a new method of analyzing spinopelvic alignment to predict mechanical complications after adult spinal deformity surgery. Journal of Bone and Joint Surgery. American Volume 2017 99 1661–1672. (https://doi.org/10.2106/JBJS.16.01594)
- 75↑
Noh SH, Ha Y, Obeid I, Park JY, Kuh SU, Chin DK, Kim KS, Cho YE, Lee HS, & Kim KH. Modified global alignment and proportion scoring with body mass index and bone mineral density (GAPB) for improving predictions of mechanical complications after adult spinal deformity surgery. Spine Journal 2020 20 776–784. (https://doi.org/10.1016/j.spinee.2019.11.006)
- 76↑
Noh SH, Lee HS, Park GE, Ha Y, Park JY, Kuh SU, Chin DK, Kim KS, Cho YE, Kim SH, et al. Predicting mechanical complications after adult spinal deformity operation using a machine learning based on modified global alignment and proportion scoring with body mass index and bone mineral density. Neurospine 2023 20 265–274. (https://doi.org/10.14245/ns.2244854.427)
- 77↑
Shah AA, Devana SK, Lee C, Bugarin A, Lord EL, Shamie AN, Park DY, van der Schaar M, & SooHoo NF. Prediction of major complications and readmission after lumbar spinal fusion: a machine learning–driven approach. World Neurosurgery 2021 152 e227–e234. (https://doi.org/10.1016/j.wneu.2021.05.080)
- 78↑
Lee MJ, Cizik AM, Hamilton D, & Chapman JR. Predicting surgical site infection after spine surgery: a validated model using a prospective surgical registry. Spine Journal 2014 14 2112–2117. (https://doi.org/10.1016/j.spinee.2013.12.026)
- 79↑
Janssen DMC, van Kuijk SMJ, d’Aumerie B, & Willems P. A prediction model of surgical site infection after instrumented thoracolumbar spine surgery in adults. European Spine Journal 2019 28 775–782. (https://doi.org/10.1007/s00586-018-05877-z)
- 80↑
Fairbank JC, Couper J, Davies JB, & O’Brien JP. The Oswestry low back pain disability questionnaire. Physiotherapy 1980 66 271–273.
- 81↑
Mannion AF, Elfering A, Staerkle R, Junge A, Grob D, Semmer NK, Jacobshagen N, Dvorak J, & Boos N. Outcome assessment in low back pain: how low can you go? European Spine Journal 2005 14 1014–1026. (https://doi.org/10.1007/s00586-005-0911-9)
- 82↑
Halicka M, Wilby M, Duarte R, & Brown C. Predicting patient-reported outcomes following lumbar spine surgery: development and external validation of multivariable prediction models. BMC Musculoskeletal Disorders 2023 24 333. (https://doi.org/10.1186/s12891-023-06446-2)
- 83↑
Müller D, Haschtmann D, Fekete TF, Kleinstück F, Reitmeir R, Loibl M, O’Riordan D, Porchet F, Jeszenszky D, & Mannion AF. Development of a machine-learning based model for predicting multidimensional outcome after surgery for degenerative disorders of the spine. European Spine Journal 2022 31 2125–2136. (https://doi.org/10.1007/s00586-022-07306-8)
- 84↑
Staartjes VE, Stumpo V, Ricciardi L, Maldaner N, Eversdijk HAJ, Vieli M, Ciobanu-Caraus O, Raco A, Miscusi M, Perna A, et al. FUSE-ML: development and external validation of a clinical prediction model for mid-term outcomes after lumbar spinal fusion for degenerative disease. European Spine Journal 2022 31 2629–2638. (https://doi.org/10.1007/s00586-022-07135-9)
- 85↑
Pirih N, & Kunej T. Toward a taxonomy for multi-omics science? Terminology development for whole genome study approaches by omics technology and hierarchy. Omics 2017 21 1–16. (https://doi.org/10.1089/omi.2016.0144)
- 86↑
Grapov D, Fahrmann J, Wanichthanarak K, & Khoomrung S. Rise of deep learning for genomic, proteomic, and metabolomic data integration in precision medicine. Omics 2018 22 630–636. (https://doi.org/10.1089/omi.2018.0097)
- 87↑
Haddad S, Pizones J, Raganato R, Safaee MM, Scheer JK, Pellisé F, & Ames CP. Future data points to implement in adult spinal deformity assessment for artificial intelligence modeling prediction: the importance of the biological dimension. International Journal of Spine Surgery 2023 17 S34–S44. (https://doi.org/10.14444/8502)
- 88↑
Vo NV, Piva SR, Patterson CG, McKernan GP, Zhou L, Bell KM, Anderst W, Greco CM, Schneider MJ, Delitto A, et al. Toward the identification of distinct phenotypes: research protocol for the low back pain biological, biomechanical, and behavioral (LB3P) cohort study and the BACPAC Mechanistic Research Center at the University of Pittsburgh. Pain Medicine 2023 24(Supplement 1) S36–S47. (https://doi.org/10.1093/pm/pnad009)
- 89↑
Küçükçiloğlu Y, Şekeroğlu B, Adalı T, & Şentürk N. Prediction of osteoporosis using MRI and CT scans with unimodal and multimodal deep-learning models. Diagnostic and Interventional Radiology 2024 30 9–20. (https://doi.org/10.4274/dir.2023.232116)
- 90↑
Cheng L, Cai F, Xu M, Liu P, Liao J, & Zong S. A diagnostic approach integrated multimodal radiomics with machine learning models based on lumbar spine CT and X-ray for osteoporosis. Journal of Bone and Mineral Metabolism 2023 41 877–889. (https://doi.org/10.1007/s00774-023-01469-0)
- 91↑
Galbusera F, Cina A, Bassani T, Panico M, & Sconfienza LM. Automatic diagnosis of spinal disorders on radiographic images: leveraging existing unstructured datasets with natural language processing. Global Spine Journal 2023 13 1257–1266. (https://doi.org/10.1177/21925682211026910)
- 92↑
Biswas S, McMenemy L, Sarkar V, MacArthur J, Snowdon E, Tetlow C, & George KJ. Natural language processing for the automated detection of intra-operative elements in lumbar spine surgery. Frontiers in Surgery 2023 10 1271775. (https://doi.org/10.3389/fsurg.2023.1271775)
- 93↑
Kuckelman IJ, Wetley K, Yi PH, & Ross AB. Translating musculoskeletal radiology reports into patient-friendly summaries using ChatGPT-4. Skeletal Radiology 2024. (https://doi.org/10.1007/s00256-024-04599-2)
- 94↑
Shrestha N, Shen Z, Zaidat B, Duey AH, Tang JE, Ahmed W, Hoang T, Restrepo Mejia M, Rajjoub R, Markowitz JS, et al. Performance of ChatGPT on NASS clinical guidelines for the diagnosis and treatment of low back pain: a comparison study. Spine 2024. (https://doi.org/10.1097/BRS.0000000000004915)
- 95↑
Rajjoub R, Arroyave JS, Zaidat B, Ahmed W, Mejia MR, Tang J, Kim JS, & Cho SK. ChatGPT and its role in the decision-making for the diagnosis and treatment of lumbar spinal stenosis: a comparative analysis and narrative review. Global Spine Journal 2023 21925682231195783. (https://doi.org/10.1177/21925682231195783)
- 96↑
Petersen E, Potdevin Y, Mohammadi E, Zidowitz S, Breyer S, Nowotka D, Henn S, Pechmann L, Leucker M, Rostalski P, et al. Responsible and regulatory conform machine learning for medicine: a survey of challenges and solutions. IEEE Access 2022 10 58375–58418. (https://doi.org/10.1109/ACCESS.2022.3178382)
- 97↑
Muehlematter UJ, Daniore P, & Vokinger KN. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health 2021 3 e195–e203. (https://doi.org/10.1016/S2589-7500(20)30292-2)
- 98↑
Keskinbora KH. Medical ethics considerations on artificial intelligence. Journal of Clinical Neuroscience 2019 64 277–282. (https://doi.org/10.1016/j.jocn.2019.03.001)
- 99↑
Li J, Zhu G, Hua C, Feng M, Bennamoun B, Li P, Lu X, Song J, Shen P, Xu X, et al. A systematic collection of medical image datasets for deep learning. ACM Computing Surveys 2024 56 1–51. (https://doi.org/10.1145/3615862)
- 100↑
Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, Wang F, Cheng F, & Luo Y. Multimodal machine learning in precision health: a scoping review. npj Digital Medicine 2022 5 171. (https://doi.org/10.1038/s41746-022-00712-8)