Published in Vol 3 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/75688.
Peer Review of “Machine Learning Ensemble Identifies Distinct Age-Related Response to Spaceflight in Mammary Tissue (Preprint)”



This is the peer-review report for the preprint “Machine Learning Ensemble Identifies Distinct Age-Related Response to Spaceflight in Mammary Tissue.”

This review is the result of a virtual collaborative live review discussion organized and hosted by PREreview and JMIR Publications on March 21, 2025. The discussion was joined by 21 people: 3 facilitators from the PREreview team, 1 member of the JMIR Publications team, 1 author, and 16 live review participants, including 3 who agreed to be named: Matthew W Darlison, Luciana Gallo, and Meghal Gandhi. The authors of this review have dedicated additional asynchronous time over the course of 2 weeks to help compose this final report using the notes from the live review. We thank all participants who contributed to the discussion and made it possible for us to provide feedback on this preprint.


In this study [1], a small existing dataset of mammary gland cell gene transcription in female mice was subjected to various combinations of artificial intelligence binary classification techniques to distinguish young versus old female mice and those exposed or not exposed to a prolonged stay in space. The authors applied machine learning (ML) methods to analyze data on mammary gland gene expression in mice newly returned from a prolonged stay in Earth orbit as compared to controls remaining on the ground in order to identify which genes were affected by the spaceflight experience and how the age of a mouse influenced this response. The underlying theory is that a cell’s “strategy” for adapting to certain stressors may change as a mouse ages, a qualitative change rather than an overall quantitative deterioration of resiliency.

Discovery of key genes involved pinpointing which ones stood out as enabling classification of mice as either young or old and, separately, as having flown in space or not. Because of the small size of the dataset (which had been collected for other research), a conventional random forest approach lacked sufficient power to identify critical genes. Instead, the authors describe trying various ensembles of ML tools until eventually selecting several candidate genes. By associating those genes with metabolic pathways, they then suggest a plausible description of cells of younger mice activating cell structure/cell adhesion–related mechanisms, while older mice activate pathways involved in cortisol synthesis and cardiac muscle contraction.

The application of innovative computational techniques (eg, ML and algorithms to better understand gene expression) offers fresh insight into spaceflight in animal models. More specifically, the research sheds new light on molecular pathways implicated in spaceflight-related health risks. This is particularly important for understanding the pathogenesis of the many diseases, such as cancer, that are often characterized by the development of abnormal tissues. However, the study has a few shortfalls, as outlined below. A section of the paper should perhaps be devoted to the limitations of the research. A brief ethics statement could clarify the research approach. It should be made clear early that the experiment/analysis was done “in silico.” Additionally, experimentation on mice may overlook biological properties in humans; therefore, conclusions should be limited in scope to mice.


  • The title should be more specific with respect to the source of mammary tissue: identify “mouse mammary gland tissue” in the title or, perhaps, simply “murine mammary tissue.”
  • While the methodology is interesting and the findings certainly warrant further study, this should be clearly identified as formative research. There was no preregistration of hypotheses and methods, and the findings (list of key genes and of pathways differing according to age) are only suggestive, not robust or convincing. Accordingly, some details about the experiences of the mice and their physiological values are beside the point, so we suggest moving them to a “Supplements” section along with more specifics about ML parameters, etc, that could help researchers attempting similar approaches.
  • With respect to the OSD-511 dataset, the details of Rodent Research Reference Mission 1 need revision, as it was mentioned that there are 40 female BALB/cAnNTac mice, while the total number of animals used was 43: 21 younger mice and 22 older mice. Moreover, the 8 younger mice that were kept in standard cages were exposed to different conditions from the 7 older mice that were housed in flight hardware.
  • In addition, it was mentioned that each group of space-flown mice had corresponding control groups (ground control), but it is not clear which basal controls (10 mice euthanized 1 day post launch) were compared with which group. This is important for explaining the single group called “non-flight” that is mentioned later in the paragraph. Please also indicate whether these details from the original experiment are unavailable to the authors.
  • In the Discussion section, or in a separate Limitations section, consider explicitly pointing out that data from the experimental mice, collected just once after 40 days in space and 2 days of post-return recovery, are only cross-sectional and do not capture changes in the mice that could be evident while in space or longer after return from space. Also, the description for Figure 1 mentions Figure 1E and F, which are not available in the figure.
  • The small sample size should be acknowledged, as it means the resulting models may not generalize well to unseen data in downstream tasks.
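
To illustrate why the sample size matters, the following minimal sketch computes a Wilson score interval for a classification accuracy at the group sizes quoted above (21 younger and 22 older mice). The function name `wilson_interval` and the count of 39 correctly classified mice are hypothetical, chosen only for illustration; they are not taken from the preprint.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Group sizes quoted in this review: 21 younger + 22 older mice.
n_mice = 21 + 22  # 43 animals, versus the 40 stated in the preprint

# Hypothetical assumption: a classifier labels 39 of the 43 mice correctly.
n_correct = 39
low, high = wilson_interval(n_correct, n_mice)
print(f"accuracy = {n_correct / n_mice:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With so few animals, the interval around any single accuracy figure is wide, which is one way the authors could quantify the limitation in the text.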

  • The title could be enhanced to make it clear that this was an experiment based on a model organism (mouse) and not human.
  • The reviewers acknowledge the availability of details that enable the reproducibility of the study, such as publicly accessible data sources and a detailed description of data handling and analysis procedures. However, the reviewers wondered whether the source code could also be made available to enhance reproducibility.
  • The stated total number of mice used in the study does not match the sum of the individual group numbers. The authors need to cross-check the numbers to ensure that they tally.
  • Clarify the composition of the control cohort, refer to those mice in a consistent way, and discuss differences that were found to exist between the subsets of controls.
  • On page 4, under the Data Transformation section, it is stated that “four filtering methods were performed,” but Figure 2B only represents three filters. Kindly clarify if the fourth filtering method was used but not included in the figure or whether there was a mistake in either the figure or the text for the sake of consistency.
  • On page 6, in the last paragraph, a linear regression model was used to predict the weight of mice at euthanasia, but the significance of this prediction was not discussed. Add a brief discussion of the significance of the model, which may include statistical validation such as P values and/or CIs, for a better understanding of its applicability.
  • On page 15, under the Conclusion section, it is also mentioned that “The dysregulation of ECM remodeling, cytoskeletal function, and stress response pathways was observed in radiation-exposed mice,” but radiation exposure was not the intervention applied. Revise this statement to accurately reflect the intervention applied in this study (spaceflight) and ensure the conclusion is consistent with the experimental conditions.
  • In the Discussion section, some results are repeated instead of being analyzed in depth. Focus more on interpreting the results, compare them with similar studies, and discuss their significance.
  • Only accuracy is reported for model performance metrics. Add other metrics, including area under the receiver operating characteristic curve, sensitivity, specificity, and F1-score, to enhance the assessment of the model’s predictive ability.
  • Under the algorithms discussion, remove possessive apostrophe from the “1950’s.”
  • It may help to add a statement making explicit whether ethics approval was necessary for the study. In addition, it would add value to discuss the ethical implications of collecting the dataset used in the manuscript, with reference to any discussion in previous publications or from the authors who collected the original data.
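
As a minimal sketch of the additional performance metrics suggested above, the following computes sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve alongside accuracy for a binary classifier. The function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only; none of them come from the preprint.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, F1, and ROC AUC for binary labels (0/1)."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as one half (the Mann-Whitney formulation).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": wins / (len(pos) * len(neg)),
    }


# Toy data invented for illustration: 1 = flight, 0 = ground control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = binary_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the threshold-free AUC next to the threshold-dependent metrics shows whether a given accuracy reflects good ranking of samples or merely a favorable cutoff.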

  • Most figures have poor resolution, which makes them difficult to understand or interpret. It would be helpful to regenerate the figures with better resolution.
  • It would be helpful to add details to the captions to include what’s represented in each panel and any elements of statistics.
  • Creating a table to present the various groups and their characteristics, including ground control, would help improve readability.
  • Figure 1 lacks an adequate explanation of each panel, which would clarify what the panels represent.
  • Table 1 is not clear, making it difficult to read. The top and left parts of Figure 7 are cropped, and it is possible that important information is omitted.
  • The legend refers to plots by layout (left/right), duplicating the role of (a)-(d) labels. Also, plot titles are not the most prominent text and are not referenced in the text.
  • In Figure 4, the term “accuracy” is used without clarification.
  • Abbreviations used in Figures 2 and 3 are not explained.
  • The Figure 3 legend does not clearly describe the difference between the left and right diagrams.
  • The manuscript refers to Table 1 subsections “e” and “f,” which are not present. Some figures are also unclear and not explanatory enough.
  • Figure 5: Fonts are too small to read, and part of the legend is cropped.
  • In Figure 1, the caption states that the left plots represent ground mice and the right plots represent space mice, which is not reflected in the figure.
  • On page 4, the principal components analysis statement interpreting Figure 1A and D is misleading. The statement suggests that both Figure 1A and D show the principal components analysis for spaceflight, whereas Figure 1A only represents ground mice.
  • The text for Figure 1 describes Figure 1E and F, but these panels are not present.

  • Consider revising the title and abstract to identify that the study was conducted with data collected in a model organism or murine model.
  • On the second page, in the second sentence of the first paragraph: “Female astronauts in particular have an increased risk of breast cancer due to exposure to galactic cosmic radiation (7).” Please revise the reference, as Kumar et al [2] did not investigate or report the cited finding.
  • On the second page, in the last sentence of the first paragraph, “Female astronauts...this increased vulnerability.” Please provide a reference for the mentioned data.
  • The second page, second paragraph: “Machine learning (ML) has been leveraged but to a much lesser extent (15).” Please revise the reference Larrañaga et al [3], as ML’s role in bioinformatics has been widely expanded since 2006.
  • Page 6, second paragraph: It was mentioned that “The support vector machine was created by Hava Siegelmann and Vladimir Vapnik,” with a reference to Cortes and Vapnik [4], whereas the work coauthored by Siegelmann and Vapnik [5] was published in 2001. Please reconcile the attribution with the cited reference.
  • Page 11, pathway enrichment analysis: Please identify the abbreviation “KEGG” as “Kyoto Encyclopedia of Genes and Genomes.”
  • Page 11, pathway enrichment analysis: Please identify the abbreviation “FDR” as “False Discovery Rate.”

  • In the Data Transformation section, the groups “FLT vs GC and YNG vs OLD” are introduced for the first time in the manuscript; these categories are defined later, but it would be good to spell out the names the first time they are mentioned. The same applies to any other acronym used.
  • The article does not include a Limitations section. It would help the reader to emphasize the limitations of the methods.

Acknowledgments

PREreview and JMIR Publications thank the authors of the preprint for posting their work openly for feedback. We also thank all participants of the live review for their time and for engaging in the lively discussion that generated this review.

Conflicts of Interest

None declared.

  1. Casaletto JA, Zhao T, Yeung J, et al. Machine learning ensemble identifies distinct age-related response to spaceflight in mammary tissue. Bioinformatics. Preprint posted online on Feb 23, 2025.
  2. Kumar K, Angdisen J, Ma J, Datta K, Fornace AJ, Suman S. Simulated galactic cosmic radiation exposure-induced mammary tumorigenesis in ApcMin/+ mice coincides with activation of ERα-ERRα-SPP1 signaling axis. Cancers (Basel). Nov 26, 2024;16(23):3954.
  3. Larrañaga P, Calvo B, Santana R, et al. Machine learning in bioinformatics. Brief Bioinform. Mar 2006;7(1):86-112.
  4. Cortes C, Vapnik V. Support-vector networks. Mach Learn. Sep 1995;20(3):273-297.
  5. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V. Support vector clustering. J Mach Learn Res. Dec 2001;2:125-137. URL: https://www.jmlr.org/papers/volume2/horn01a/horn01a.pdf


Abbreviations

ML: machine learning


Edited by Amy Schwartz. This is a non–peer-reviewed article. Submitted 08.04.25; accepted 08.04.25; published 23.04.25.

Copyright

© Sylvester Sakilay, Mitchell Collier, Arya Rahgozar, Toba Olatoye, Simon Muhindi Savai, Myron Pulier, Randa Salah Gomaa Mahmoud, Clara Amaka Nkpoikanke Akpan, Sayan Mitra, Julie Moonga. Originally published in JMIRx Bio (https://bio.jmirx.org), 23.4.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIRx Bio, is properly cited. The complete bibliographic information, a link to the original publication on https://bio.jmirx.org/, as well as this copyright and license information must be included.