Identifying cancer-associated leukocyte profiles using a high-resolution flow cytometry screening pipeline

Methods

Animals
C57BL/6 (B6) and BALB/c (BC) female mice aged 6-10 weeks (from the Australian Phenomics Facility, ANU) were used in the study. Animals were housed in a specific-pathogen-free environment and used under strict adherence to protocols approved by the institutional Animal Experimentation Ethics Committee (AEEC), ANU, under protocol A2020/39. At experimental end points, animals were euthanised by cervical dislocation according to AEEC-approved procedures.

Cell lines
The mammary carcinoma cell lines 4T1 [24] (ATCC), 4T1.2 [25] (kindly provided by Dr Robin Anderson, Olivia Newton-John Cancer Research Institute), 4T1Br4 [26] (kindly provided by Dr Normand Pouliot, Olivia Newton-John Cancer Research Institute) and AT-3-OVA [27] (kindly provided by Dr Di Yu), the colorectal carcinoma cell lines CT26 [28] (ATCC) and MC38 [29] (kindly provided by Dr Di Yu), and the melanoma cell line B16-F10 [30] (kindly provided by Dr Chris Parish) were used in this study. Cell lines were confirmed clear of specific pathogens by Cerberus Sciences (ISO 9001 Licence No. AU843-QC). Cell lines were cultured and subcultured as described previously [18] in supplemented RPMI-1640 (sRPMI; RPMI-1640: 11875093, ThermoFisher Scientific).

Tumour establishment
Tumour cells (1 × 10^5) were injected subcutaneously in the right hind flank (primary tumour) and then 3 days later in the left hind flank (secondary tumour) of syngeneic mice (cell lines 4T1, 4T1.2, 4T1Br4 and CT26 injected in BC mice, and cell lines AT-3-OVA, MC38 and B16-F10 injected in B6 mice) randomised across housing cages as described previously [18]. Tumours were left to grow for 17-21 days. At endpoint, mice were humanely sacrificed, and their tumours and spleens excised and weighed.

Blood and spleen collection and processing
Blood and spleens from mice were collected at the experimental end point. Blood was collected and processed as described previously [31].
Spleens were harvested and processed to single cells as described previously, except that the red blood cell lysis step was not performed [32].

Spleen cell barcoding
Spleen cells from three (of the 3-5) replicate mice bearing the largest mass of 14-17-day-old tumours from the 4T1, 4T1.2, 4T1Br4, AT-3-OVA, CT26, MC38 or B16-F10 cell lines, or from healthy controls (host B6 or BC mice), were pooled by group into nine separate tubes, each in a total of 10 mL of phosphate buffered saline (PBS). Each of the nine spleen cell groups was adjusted to the equivalent of two normal spleen masses, based on spleen weights, by removing the appropriate volume of cell suspension from each tube. Volumes were then made up to 10 mL with PBS, the cells sedimented by centrifugation (300 g for 10 min), the supernatant aspirated and the cells resuspended in 2.9 mL sRPMI. Each spleen cell group was then barcoded separately with a unique combination of carboxyfluorescein diacetate succinimidyl ester (CFSE) and/or cell trace violet (CTV) concentrations (S1 Table), and all groups were then pooled into one sample as previously described [33]. Cells were then suspended in 10 mL of sRPMI, passed through a 70 µm filter mesh and counted. A total of 400 × 10^6 leukocytes was then suspended in 10 mL sRPMI, passed through a 70 µm filter mesh, sedimented by centrifugation (300 g for 10 min) and the supernatant aspirated, ready for immediate backbone antibody labelling.

S1 Table: Barcoding vital dye cell-labelling concentrations

| Group | Sequence | CFSE concentration | CTV concentration |
|---|---|---|---|
| Nil BALB/c | 1 | 74 nM | 0 nM |
| CT26 | 2 | 74 nM | 1500 nM |
| 4T1 | 3 | 74 nM | 20300 nM |
| B16-F10 | 4 | 11 nM | 0 nM |
| MC38 | 5 | 11 nM | 1500 nM |
| AT-3-OVA | 6 | 11 nM | 20300 nM |
| 4T1Br4 | 7 | 0 nM | 0 nM |
| 4T1.2 | 8 | 0 nM | 1500 nM |
| Nil C57BL/6 | 9 | 0 nM | 20300 nM |

Backbone antibody labelling
Barcoded pooled cells were resuspended in 0.6 mL of Labelling Buffer (PBS containing 5 mM EDTA and 1% (w/v) BSA) with 5 mg/mL TruStain FcX™ (anti-mouse CD16/32) antibody (101320, BioLegend) for 15 min at 4 °C.
Samples were then incubated with a backbone panel of antibodies (S2 Table) by adding 0.6 mL of Labelling Buffer with 10% (v/v) Brilliant Stain Buffer (BD) containing a 2X stock of each antibody (S2 Table), for 30 min at 4 °C. The pooled barcoded and backbone antibody-labelled cells were then resuspended to 13.4 × 10^6 cells per mL (i.e. 1 × 10^6 cells per 75 µL) in Labelling Buffer and passed through a 70 µm filter mesh, ready for aliquoting into the wells of the LEGENDScreen plates.

S2 Table: Antibodies
Isotype abbreviations: r = rat; ah = Armenian hamster; m = mouse.

| Panel | Antigen | Clone | Fluorochrome | Cat. # | Isotype | Source | 2X stock dilution factor |
|---|---|---|---|---|---|---|---|
| Backbone | CD45 | 30-F11 | PerCP-Cy5.5 | 103132 | r-IgG2b, k | BioLegend | 1/50 |
| Backbone | CD90.2 | 53-2.1 | PE-Cy7 | 105326 | r-IgG2b, k | BioLegend | 1/200 |
| Backbone | CD4 | RM4-5 | AF-700 | 100536 | r-IgG2a, k | BioLegend | 1/400 |
| Backbone | CD8a | 53-6.7 | BV650 | 100742 | r-IgG2a, k | BioLegend | 1/50 |
| Backbone | PD-1 | 29F.1A12 | APC | 135210 | r-IgG2a, k | BioLegend | 1/400 |
| Backbone | CD25 | PC61 | APC-F750 | 102054 | r-IgG2a, l | BioLegend | 1/400 |
| Backbone | B220 | RA3-6B2 | AF-700 | 103232 | r-IgG2a, k | BioLegend | 1/50 |
| Backbone | CD11c | N418 | APC | 117310 | ah-IgG | BioLegend | 1/100 |
| Backbone | CD11b | M1/70 | APCFire750 | 101262 | r-IgG2b, k | BioLegend | 1/400 |
| Backbone | Ly-6C | HK1.4 | BV711 | 128037 | r-IgG2c, k | BioLegend | 1/50 |
| Backbone | Ly-6G | 1A8 | BV650 | 127606 | r-IgG2a, k | BioLegend | 1/100 |
| Backbone | F4/80 | BM8 | PE-Cy7 | 123114 | r-IgG2a, k | BioLegend | 1/100 |
| Backbone | I-A/I-E (MHC-II) | M5/114.15.2 | BV605 | 107639 | r-IgG2b, k | BioLegend | 1/50 |
| Backbone | PD-L1 | 10F.9G2 | PE-Dazzle594 | 124324 | r-IgG2b, k | BioLegend | 1/50 |
| Backbone | Siglec-F | E50-2440 | BV786 | 740956 | r-IgG2a, k | BD | 1/50 |
| Backbone | CD49b | DX5 | BUV395 | 740250 | ah-IgG1, k | BD | 1/400 |
| Backbone | TCRb | H57-597 | BV605 | 109241 | ah-IgG | BioLegend | 1/50 |
| Backbone + screen markers | CD62L | MEL-14 | BV570 | 104433 | r-IgG2a, k | BioLegend | 1/50 |
| Backbone + screen markers | CD44 | IM7 | BUV737 | 612799 | r-IgG2b, k | BD | 1/50 |
| Backbone + screen markers | CD24 | M1/69 | PE | 101808 | r-IgG2b, k | BioLegend | 1/200 |
| Backbone + screen markers | CD45RB | C363-16A | FITC | 103305 | r-IgG2a, k | BioLegend | 1/100 |
| Backbone + screen markers | IgD | 11-26c.2a | BV421 | 405725 | r-IgG2a, k | BioLegend | 1/100 |
| Backbone + screen markers | CD66a | Mab-CC1 | BV650 | 134529 | m-IgG1, k | BioLegend | 1/100 |

LEGENDScreen assay
A LEGENDScreen Mouse PE Kit (BioLegend) was used to screen spleen leukocytes for cancer-specific cell-surface markers.
Plates from the kit were prepared according to the manufacturer’s instructions, with the lyophilised antibodies in each well of the assay plates resuspended in 25 µL of deionised H2O. The pooled barcoded and backbone antibody-labelled cells were added at 75 µL (i.e. 1 × 10^6 cells) to each well containing the reconstituted antibodies and incubated in the dark for 30 min at 4 °C. Cells were then washed in the Legend Screen Wash provided in the kit, pelleted and resuspended in 0.04 mL Labelling Buffer containing 0.001 mg/mL of the viability dye Hoechst 33285 and the equivalent of 500 Flow-Count Fluorospheres (7547053, Beckman Coulter) per 0.04 mL, and stored at 4 °C overnight before flow cytometry.

Immunophenotyping of blood leukocytes by flow cytometry
Blood samples (0.005 mL) were labelled with antibodies that included the backbone panel and the screen-identified antibodies (S2 Table) and prepared for flow cytometry analysis using methods described previously [31].

Flow cytometry
Flow cytometry was performed on a BD X-20 (BD Biosciences) flow cytometer with FACSDiva software. Application Settings were applied to standardise fluorescence intensity readings between experiments, and fluorescence intensities were monitored using Sphero™ 8-peak Rainbow Beads (110620, BD Biosciences). Voltages were initially set up using unlabelled RBC-lysed blood leukocytes. BD CompBeads (552843, BD Biosciences) were used as compensation controls as previously described [31]. Blood cell samples were acquired until a total of 2000 Flow-Count Fluorosphere beads were collected, based on side scatter (log) and forward scatter (linear) plot gating. LEGENDScreen samples were acquired at 10,000 events/second, using the sample fine adjust and a low sample flow rate, to collect a total of ~1-3 × 10^5 live CD45+ cells.
Every 36th sample was followed by a 3 min run at a high sample flow rate with 10% sodium hypochlorite, then a 2 min run at a high sample flow rate with ddH2O, and the stability of fluorescence in each channel was assessed by acquiring Sphero™ 8-peak Rainbow Beads. Raw Flow Cytometry Standard (FCS) files of the data are available upon request at the ANU Data Commons repository (https://dx.doi.org/10.25911/6153a8ab5747c).

Flow cytometry analysis
Flow cytometry analysis was performed using FlowJo v10 software (BD Biosciences) and the R package CytoExploreR version 2.0.0 [34] (https://dillonhammill.github.io/CytoExploreR/). A combination of manual gating and unsupervised Pairwise Controlled Manifold Approximation Projection (PaCMAP) analysis was used to delineate cell populations and to assess how well the manually gated populations segregated. Cell groups were then named based on marker expression, represented by the median fluorescence intensity (MedFI) of each marker and plotted as heat map dot plots made using the R package ggplot2 (https://ggplot2.tidyverse.org) (see Results section).

Data normalisation and processing

Blood leukocyte data
To reduce the influence of inter-experimental technical variability on the independent blood analysis experiments, their data was normalised at several levels. Firstly, cell numbers in each flow cytometry acquisition set were normalised to counting beads spiked into the sample, with each sample normalised to 5000 Flow-Count Fluorospheres (5/5 of the spiked load), to give the number of cells in ~0.005 mL of blood (“counting bead normalised” values). Secondly, these normalised counts were normalised to the mean counts of the respective blood leukocytes from non-tumour-bearing control animals within each experiment, giving the “nil normalised values”.
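The bead and nil normalisations above, together with the final rescaling to the overall control mean described in the remainder of this subsection, can be sketched for a single cell population as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the record layout (`group`, `cells`, `beads`) and the toy counts are hypothetical, and control animals are assumed to carry the group label "Nil".

```python
from statistics import mean

BEADS_TARGET = 5000  # bead count each sample is normalised to (value taken from the text)


def bead_normalise(cell_count, beads_acquired, beads_target=BEADS_TARGET):
    """Level 1: scale a cell count to what it would be had `beads_target` beads been acquired."""
    return cell_count * beads_target / beads_acquired


def normalise_experiments(experiments):
    """Apply all three normalisation levels to one cell population.

    `experiments` maps an experiment id to a list of sample records, each a dict
    with hypothetical keys 'group', 'cells' and 'beads'.
    """
    rows = []
    for exp_id, samples in experiments.items():
        bead_norm = [bead_normalise(s["cells"], s["beads"]) for s in samples]
        # Level 2: divide by the mean of the control ("Nil") animals of the same experiment.
        nil_mean = mean(b for s, b in zip(samples, bead_norm) if s["group"] == "Nil")
        for s, b in zip(samples, bead_norm):
            rows.append({"experiment": exp_id, "group": s["group"],
                         "bead_norm": b, "nil_norm": b / nil_mean})
    # Level 3: rescale by the mean bead-normalised count of all controls across experiments.
    overall_nil_mean = mean(r["bead_norm"] for r in rows if r["group"] == "Nil")
    for r in rows:
        r["normalised"] = r["nil_norm"] * overall_nil_mean
    return rows


if __name__ == "__main__":
    toy = {  # invented counts, for illustration only
        "exp1": [{"group": "Nil", "cells": 1000, "beads": 2000},
                 {"group": "Nil", "cells": 1200, "beads": 2000},
                 {"group": "4T1", "cells": 2000, "beads": 2000}],
        "exp2": [{"group": "Nil", "cells": 400, "beads": 1000},
                 {"group": "Nil", "cells": 600, "beads": 1000},
                 {"group": "CT26", "cells": 1000, "beads": 1000}],
    }
    for row in normalise_experiments(toy):
        print(row)
```

Because the nil normalised values of the controls average to 1 within each experiment, the rescaled control values average to the overall control mean, so tumour groups from different experiments are expressed on one common scale.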
To get “normalised cell counts” per 0.005 mL of blood (as an estimate of the overall cells across the groups), the “nil normalised values” were multiplied by the overall mean of the “counting bead normalised” values from all non-tumour-bearing animals for each cell population across all experiments.

LEGENDScreen data
Leukocyte marker expression changes in cancer samples were compared to healthy levels as follows. The background PE MedFI (from matched healthy controls) of each LEGENDScreen marker on each cell population was subtracted from the corresponding marker MedFI of the same cell population in each tumour type. This MedFI difference was then divided by the maximum PE MedFI change of each marker for each population, and any values less than -1 were assigned -1. This gave a cancer-specific marker change scaled from -1 to 1 (with 0 being normal). These values were visualised as a heat map dot plot using the R package ComplexHeatmap [35].

Supervised machine learning
Supervised machine learning was performed using Orange 3 software. Random Forest and CatBoost modelling used 100 trees for predictions or 500 for ranking feature importance, with a maximum tree depth of 4 (Random Forest) or 6 (CatBoost). For Random Forest, the maximum number of features considered at each node was 5 and subsets smaller than 5 were not split. In addition, for CatBoost, the learning rate was 0.3, the regularisation lambda was 3 and subsampling was 1. For classification of groups using monocytes, CatBoost was trained on 66% of randomly sampled data and tested on the remaining data; this was repeated 100 times and the predicted and actual classes were displayed as a confusion matrix. Feature ranking was done using both Random Forest and CatBoost (built into the models in Orange 3 software).
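The repeated random sampling scheme used for the monocyte classification (train on 66%, test on the remainder, repeat 100 times, pool predicted against actual classes) can be sketched as below. This is only an illustration of the resampling and confusion-matrix bookkeeping: a nearest-centroid rule stands in for CatBoost, the study itself used Orange 3, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in classifier (NOT CatBoost): the per-class mean of each feature."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist2(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(sorted(centroids), key=dist2)


def repeated_holdout(X, y, train_fraction=0.66, repeats=100, seed=0):
    """Pool (actual, predicted) counts over repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(X))
    confusion = Counter()
    for _ in range(repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            confusion[(y[i], predict(model, X[i]))] += 1
    return confusion


def classification_accuracy(confusion):
    """Proportion of pooled test predictions that match the actual class."""
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / sum(confusion.values())


if __name__ == "__main__":
    # Invented two-feature data for three groups, for illustration only.
    offsets = [(-0.3, 0.1), (0.2, -0.2), (0.0, 0.3), (0.3, 0.0), (-0.1, -0.3), (0.1, 0.2)]
    centres = {"Nil": (1.0, 1.0), "CT26": (5.0, 1.0), "4T1": (1.0, 5.0)}
    X = [(cx + dx, cy + dy) for (cx, cy) in centres.values() for dx, dy in offsets]
    y = [label for label in centres for _ in offsets]
    cm = repeated_holdout(X, y)
    print(dict(cm), classification_accuracy(cm))
```

Pooling counts over all repeats, rather than reporting a single split, reduces the dependence of the confusion matrix on any one random partition of a small data set.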
For the learning curve as a function of decreasing numbers of features (populations), CatBoost was trained on 66% of randomly sampled data and tested on the remaining data; this was repeated 100 times and the results assessed using the area under the receiver operating characteristic curve (AUC; to assess separability of the classes), classification accuracy (CA; proportion of correct classifications), precision (ratio of correct positive predictions to all predicted positives), recall (ratio of correct positive predictions to all actual positives) and F1 score (harmonic mean of precision and recall). Final CatBoost predictions on an optimised subset of features used 66% of randomly sampled data for training, and predictions were made on the remaining data. Orange 3 workflows are provided in S1 File.

Statistical analysis and data presentation
For comparisons of means between Nil, CT26 and 4T1 cohorts, data was transformed using the formula Y=Log(Y+1) to help normalise distributions and equalise variance, and then assessed by 2-way ANOVA using GraphPad Prism software. Analysis was corrected for multiple comparisons using the two-stage step-up method of Benjamini, Krieger and Yekutieli [36] with a false discovery rate of 0.05, and p values are reported to test the null hypothesis that the means are equal or that the distributions were from the same population. PaCMAP used the pacmap Python package through CytoExploreR. Multidimensional scaling used the cmdscale function in the R package stats (v 3.6.2) with Euclidean distances, and was displayed using the cyto_plot function in the CytoExploreR R package. Heat map dot plots were made using several R packages, including ggplot2, ComplexHeatmap and HeatmapR. Log ratio (M) versus log average (A) (MA) plots were constructed using the ggpubr, ggplot2 and ggrepel R packages. Pythagorean trees and confusion matrices were made in Orange 3 software. Circular bar plots were made using ggplot2 in R. Prism was also used for plotting data.
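For readers unfamiliar with the multiple-comparison correction named above, the following is a minimal sketch of the log transform and of the two-stage step-up procedure of Benjamini, Krieger and Yekutieli. It is not a reproduction of GraphPad Prism's implementation. The function names and the example p values are hypothetical, and base 10 is assumed for the logarithm.

```python
import math


def log_transform(values):
    """Y = log(Y + 1); base 10 is an assumption, the text only says 'Log'."""
    return [math.log10(v + 1) for v in values]


def bh_rejections(p_sorted, q):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule at level q.

    `p_sorted` must be in ascending order; the largest rank i with
    p_(i) <= q * i / m is returned (0 if there is none).
    """
    m = len(p_sorted)
    k = 0
    for i, p in enumerate(p_sorted, start=1):
        if p <= q * i / m:
            k = i
    return k


def bky_two_stage(p_values, q=0.05):
    """Two-stage step-up FDR control; returns True (discovery) per input p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    p_sorted = [p_values[i] for i in order]
    q1 = q / (1 + q)                       # stage 1 runs the step-up rule at q' = q / (1 + q)
    r1 = bh_rejections(p_sorted, q1)
    if r1 == 0 or r1 == m:
        k = r1                             # reject nothing, or everything, and stop
    else:
        m0_hat = m - r1                    # estimated number of true null hypotheses
        k = bh_rejections(p_sorted, q1 * m / m0_hat)  # stage 2 at the adjusted level
    discoveries = [False] * m
    for rank, i in enumerate(order):
        discoveries[i] = rank < k
    return discoveries


if __name__ == "__main__":
    p = [0.5, 0.001, 0.04, 0.2, 0.008, 0.02]   # invented p values
    print(bky_two_stage(p, q=0.05))
```

With these invented p values, the first stage rejects three hypotheses, which lowers the estimated number of true nulls and lets the second stage reject a fourth (p = 0.04) that a single step-up pass at the stage 1 level would have retained.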
Type
collection
Title
Identifying cancer-associated leukocyte profiles using a high-resolution flow cytometry screening pipeline
Collection Type
Repository
Access Privileges
The John Curtin School of Medical Research
DOI - Digital Object Identifier
10.25911/zrp3-nd51
Metadata Language
English
Data Language
English
Significance Statement
A screening approach to identify definitive leukocyte biomarkers to predict cancer outcomes.
Brief Description
Here we developed a multiparameter cell-surface marker screening pipeline aimed at resolving, at high resolution and using flow cytometry, systemic leukocyte population profiles that associate with cancer. The rationale is that identifying numerous cancer-specific leukocyte population fluctuations in the blood could result in more definitive machine learning models for cancer identification and classification.
Contact Email
ben.quah@anu.edu.au
Contact Address
John Curtin School of Medical Research 131 Garran Rd, Acton, ACT 2601
Principal Investigator
Benjamin Quah
Supervisors
Benjamin Quah
Collaborators
David A Simon Davis; Melissa Ritchie; Dillon Hammill; Jessica Garrett; Robert Slater; Naomi Otoo; Anna Orlov; Katharine Gosling; Jason Price; Farhan M Syed; Ines I Atmosukarto
Fields of Research
3299 - Other biomedical and clinical sciences
Socio-Economic Objective
280103 - Expanding knowledge in the biomedical and clinical sciences; 280112 - Expanding knowledge in the health sciences
Keywords
Cancer immunology
Type of Research Activity
Experimental development
Date of data creation
2022
Year of data publication
2023
Creator(s) for Citation
Quah
Benjamin
Publisher for Citation
The Australian National University Data Commons
Access Rights Type
Conditional
Rights held in and over the data
CC-BY-SA
Licence Type
CC-BY-SA - Attribution-ShareAlike (Version 4.0)
Data Location
URL
Retention Period
Indefinitely
Data Size
GB
Data Management Plan
No
Status: Published
Published to:
  • Australian National University
  • Australian National Data Service