November 4, 2019
Prepared by Caleb Jones and Ross Davis in collaboration with Greg Hellbourg and Ian Morrison for the NenuFAR SETI Project.
The following is a network analysis of 96 exoplanets observed during September 23rd-24th, 2019 by the Search for Extraterrestrial Intelligence (SETI) project that used the French radio telescope NenuFAR.
The analysis is illustrated through a series of visualizations and tables based on standard statistical techniques used in network analysis. It can help narrow the search for technosignatures (signs of ET technology such as radio, optical, or near-infrared signals) by highlighting where in the 96-exoplanet network a potential technosignature is more likely to be found, based on prominent characteristics of the network as depicted by the visualizations and tables.
The analysis can be used in tandem with other SETI research models to search for exoplanet technosignatures, such as a 3-planet communications model used to identify exoplanet pair-Earth alignments (e.g., heightened focus on alignments that would be within or adjacent to prominent network features).
Network analysis can model information propagation through interconnected systems, whether those systems are biological systems, computer networks, populations, social networks, or even scientific journals (e.g., Erdős number). Standard statistical techniques associated with network analysis include, but are not limited to, degree ranking, HITS analysis, and betweenness centrality. In the context of SETI, this kind of network analysis can help to narrow the search for technosignatures by examining the composition and distribution of exoplanets as networks.
Additionally, the network analysis can potentially help to enhance non-SETI astronomical research, where networks and the key aspects thereof would be relevant (e.g., quantitatively enhancing interstellar topologies).
For this initial analysis, the following goals have been identified:
To achieve these goals the following methodology is used:
Planet data is loaded from nenufar-seti-field-10-17-19.csv, a customized data file derived from the online NASA Exoplanet Archive. Different "max distance" edge values are then tried in order to create a network with a single component.
To create the network with its nodes and edges, host star systems are loaded from the planet data and then added to a network. Planets are not modeled as nodes in this network because the distances between planets and their host stars are insignificant compared to interstellar distances. Host stars are effective proxies for their exoplanets when modeling networks at interstellar scale. The number of exoplanets in each system is modeled as an attribute of the host star.
To map positions of host stars onto a Euclidean space (X,Y,Z coordinates) the Astropy library for Python was used to convert from Right Ascension (ra) and Declination (dec) coordinates. The ra/dec coordinates are kept on the host star nodes themselves as attributes, but the distance calculation and visualizations (2D and 3D) use the X/Y/Z Euclidean coordinates.
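The coordinate conversion described above can be sketched with plain trigonometry. This is not the Astropy code used for the report; it is a minimal standard-library sketch that assumes RA and Dec are given in degrees and distance in parsecs. The function name `radec_to_cartesian` is hypothetical.

```python
import math


def radec_to_cartesian(ra_deg, dec_deg, distance_pc):
    """Convert RA/Dec (degrees) and distance (parsecs) to X/Y/Z (parsecs).

    Assumption: the Sun is at the origin, the X axis points towards
    RA = 0, Dec = 0, and the Z axis points towards Dec = +90.
    """
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = distance_pc * math.cos(dec) * math.cos(ra)
    y = distance_pc * math.cos(dec) * math.sin(ra)
    z = distance_pc * math.sin(dec)
    return (x, y, z)
```

The resulting tuple can be stored on a host star node alongside its original ra/dec attributes.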
Straight-line, Euclidean pathways are then calculated between host stars. Rather than include a pathway edge from every host star to every other host star in the network, pathway edges are filtered based on distances which are less than or equal to a "max distance". A max distance is imposed as a way to highlight natural topologies in terms of proximity between host stars. Also, imposing a max edge distance models information relay pathways similar to what emerges in computer networks, biological systems, road systems, or social networks where not every node is connected to every other node but instead utilizes hubs, routers, relays, brokers, and clusters of connections to interact with the wider network. A hypothetical information network between systems may have similar constraints and emergent traits which many other kinds of networks exhibit. An inverse weight is then associated to the edges, giving edges with a smaller distance greater weight which is incorporated into various analysis algorithms.
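The edge filtering and weighting described above might look like the following sketch. The report does not state the exact weighting formula, so the weight `1 / distance` used here is an assumption, as are the function name `build_edges` and the dictionary-based data layout.

```python
import itertools
import math


def build_edges(positions, max_distance):
    """Return {(a, b): weight} for host star pairs within max_distance.

    positions: {host_star_name: (x, y, z)} with coordinates in parsecs.
    Assumption: weight = 1 / distance, so closer pairs weigh more.
    """
    edges = {}
    pairs = itertools.combinations(sorted(positions.items()), 2)
    for (a, pos_a), (b, pos_b) in pairs:
        distance = math.dist(pos_a, pos_b)
        # Pairs at zero distance are skipped to avoid dividing by zero.
        if 0.0 < distance <= max_distance:
            edges[(a, b)] = 1.0 / distance
    return edges
```

Each edge is stored once with its endpoints in alphabetical order, which suits an undirected network.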
There is some nuance to determining a minimum max distance to use in the edge modeling. That nuance and the justification for the minimum max distance chosen for this analysis are detailed below in the "Results" section.
Once the network is loaded, the following network-wide analyses are performed using Gephi, an open-source network visualization and analysis platform. For this analysis, the Gephi UI was used (see the automation discussion below for ideas on how this might be automated, perhaps using the Gephi Toolkit SDK):
After the data was loaded, it quickly became apparent that there are outlier star systems which make a strict "single component" approach untenable. This is illustrated in the histogram in Figure 1, wherein the WTS-1 star system with 1 exoplanet is 3200 parsecs away from our solar system, with the next furthest system, CoRoT-26, being 1670 parsecs away.
Figure 1. Distance from Sun Histogram
Ideally, a minimum max distance is selected which results in the network having a single connected component, wherein any two nodes are connected via a path. For this dataset, a single connected component requires a max edge distance of at least ~1997 parsecs. This is likely an artifact of incomplete data, as many more exoplanet discoveries and confirmations have yet to be made.
Ignoring WTS-1 and other outliers, the max edge distance can be as low as 300 parsecs while retaining 90% of the host stars in the network. Figure 2 charts the fraction of nodes (y-axis, 0.0-1.0) included in the largest connected component when different max distance parameters are used (x-axis, in parsecs). A "knee" can be seen at around 300 parsecs: higher max edge distances yield diminishing returns, while the primary component shrinks drastically at distances much lower than 300 parsecs.
Figure 2. Max Distance Component Analysis
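A sweep like the one charted in Figure 2 could be reproduced with a sketch along these lines. It is not the code used for the report; the function names and the union-find approach are assumptions, and it expects a non-empty dictionary of Euclidean positions in parsecs.

```python
import itertools
import math


def largest_component_fraction(positions, max_distance):
    """Fraction of nodes in the largest connected component.

    positions: non-empty {host_star_name: (x, y, z)} in parsecs.
    Two stars are linked when their distance is <= max_distance.
    """
    parent = {name: name for name in positions}

    def find(node):
        # Walk up to the root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in itertools.combinations(positions, 2):
        if math.dist(positions[a], positions[b]) <= max_distance:
            parent[find(a)] = find(b)

    sizes = {}
    for name in positions:
        root = find(name)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(positions)


def sweep(positions, max_distances):
    """Return [(max_distance, fraction)] pairs suitable for plotting."""
    return [(d, largest_component_fraction(positions, d)) for d in max_distances]
```

Plotting the output of `sweep` over a range of candidate distances would make a knee like the one described above visible.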
The host stars excluded when using 300 parsecs as an edge max-distance are:
For this analysis, a max edge distance of 300 parsecs will be used.
Figure 3 is a 2D visualization of the resulting 300 parsec network. Host stars are colored as described above (black indicates no temperature data for that star system) and sized based on the number of exoplanets in that system.
Figure 3. 2D network with host stars sized by exoplanet number
Here, a primary community exists towards the top with a secondary community below.
Within this single connected component, a modularity analysis can be done to identify communities of host stars which are more strongly connected to each other compared to the rest of the graph. This analysis takes into account the edge weight calculated above. This results in 4 different communities (using a default resolution parameter of 1.0 for the algorithm). These communities are visualized in Figure 4:
Figure 4. 2D network with host stars and edges colored by community
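To make the modularity idea concrete, the sketch below scores a given assignment of host stars to communities using a common weighted modularity formula with a resolution parameter. It does not reproduce Gephi's community detection, which searches for a good partition and may define its resolution parameter differently. The function name and data layout are assumptions, and at least one edge is expected.

```python
def modularity(edges, community_of, resolution=1.0):
    """Weighted modularity of a given partition.

    edges: {(a, b): weight} for an undirected network (at least one edge).
    community_of: {node: community_label}.
    Uses Q = sum_c [ L_c / m - resolution * (d_c / (2 m))^2 ], where m is
    the total edge weight, L_c the edge weight inside community c, and
    d_c the total weighted degree of the nodes in c.
    """
    total_weight = sum(edges.values())
    inside = {}
    degree = {}
    for (a, b), weight in edges.items():
        ca, cb = community_of[a], community_of[b]
        degree[ca] = degree.get(ca, 0.0) + weight
        degree[cb] = degree.get(cb, 0.0) + weight
        if ca == cb:
            inside[ca] = inside.get(ca, 0.0) + weight
    return sum(
        inside.get(c, 0.0) / total_weight
        - resolution * (degree[c] / (2.0 * total_weight)) ** 2
        for c in degree
    )
```

Higher scores indicate that edge weight is concentrated inside communities rather than between them.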
Degree is a measure of the number of edges connected to a given node. The top 10 host stars with the highest degree are ranked in Figure 5:
Figure 5. Top 10 host stars ranked by degree
Host Star | Degree |
---|---|
K2-294 | 42 |
K2-293 | 41 |
K2-217 | 41 |
K2-207 | 41 |
WASP-44 | 40 |
WASP-28 | 40 |
K2-213 | 40 |
WASP-151 | 40 |
WASP-147 | 39 |
WASP-158 | 39 |
An 11th host star also with a degree of 39 is K2-205. A visualization of this ranking is in Figure 6:
Figure 6. 2D network colored by community detection with host stars sized by degree
Degree can also be analyzed within the context of communities. Figure 7 shows this ranking:
Figure 7. Top 3 host stars ranked by degree in each community (using color corresponding to community visualization)
Degree can also be calculated taking into account the weights of edges. Figure 8 ranks the top 10 host stars with the highest weighted degree:
Figure 8. Top 10 host stars ranked by highest weighted degree
Figure 9 visualizes the network with nodes sized by weighted degree:
Figure 9. 2D network colored by community detection with host stars sized by weighted degree
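The degree and weighted degree rankings in this section can be sketched as follows. The function name and the `{(a, b): weight}` edge layout are assumptions; ties are broken alphabetically here, which is a choice made for the sketch and not something the report specifies.

```python
def degree_rankings(edges, top_n=10):
    """Return (top by degree, top by weighted degree).

    edges: {(a, b): weight} for an undirected network.
    Each result is a list of (host_star, score), highest score first.
    """
    degree = {}
    weighted = {}
    for (a, b), weight in edges.items():
        for node in (a, b):
            degree[node] = degree.get(node, 0) + 1
            weighted[node] = weighted.get(node, 0.0) + weight

    def top(scores):
        # Sort by descending score, then by name to break ties.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

    return top(degree), top(weighted)
```

Because ties are possible (as with the 11th host star noted above), a fixed top-10 cutoff can exclude stars with the same score as the last entry.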
A HITS analysis gives each node two scores: "Authority" and "Hub". "Authority" measures how valuable information stored at that node is and "Hub" measures the value of its edge links. These two scores are defined in terms of one another and thus the algorithm runs repeatedly until a desired convergence is reached. This analysis (using epsilon parameter of 1.0E-4 for the algorithm's convergence limit) results in "authority" scores which closely correlate with the degree of host stars but with some exceptions. Figure 10 shows the top 10 host stars based on their authority score:
Figure 10. Host stars ranked by their HITS authority score - degree included for comparison
Host Star | Authority | Degree |
---|---|---|
K2-294 | 0.190 | 42 |
K2-293 | 0.187 | 41 |
WASP-44 | 0.185 | 40 |
WASP-151 | 0.185 | 40 |
WASP-28 | 0.184 | 40 |
WASP-147 | 0.182 | 39 |
K2-207 | 0.181 | 41 |
K2-217 | 0.178 | 41 |
K2-213 | 0.178 | 40 |
WASP-158 | 0.177 | 39 |
Figure 11 visualizes this ranking:
Figure 11. 2D network colored by community detection with host stars sized by HITS authority score
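The mutually defined authority and hub scores can be illustrated with a short iterative sketch. This is not Gephi's implementation: it ignores edge weights, normalizes scores to unit Euclidean length, and stops when no score changes by more than epsilon, all of which are assumptions. The function name is hypothetical and the network is expected to have at least one edge.

```python
import math


def hits(edges, epsilon=1.0e-4, max_iter=1000):
    """Return (authority, hub) score dictionaries for an undirected network.

    edges: iterable of (a, b) pairs; a {(a, b): weight} dict also works,
    with the weights ignored.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {node: v / norm for node, v in scores.items()}

    hub = normalize(dict.fromkeys(adjacency, 1.0))
    authority = dict(hub)
    for _ in range(max_iter):
        # Authority comes from neighbouring hubs, hub from neighbouring authorities.
        new_authority = normalize(
            {n: sum(hub[m] for m in adjacency[n]) for n in adjacency}
        )
        new_hub = normalize(
            {n: sum(new_authority[m] for m in adjacency[n]) for n in adjacency}
        )
        change = max(
            max(abs(new_authority[n] - authority[n]), abs(new_hub[n] - hub[n]))
            for n in adjacency
        )
        authority, hub = new_authority, new_hub
        if change < epsilon:
            break
    return authority, hub
```

In an undirected network every edge acts as both an incoming and an outgoing link, which is consistent with authority scores tracking degree closely.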
A Network Diameter analysis gives each node a betweenness centrality score, which is a measure of how often that node appears in shortest pathways in the network. This is another metric which can be used to quantify a node's importance or influence in a network (often highlighting broker nodes). Figure 12 shows a ranking of these scores. Figure 13 visualizes these scores in the network. Note the prominence of HATS-10 in Figure 13, as it brokers connections between two loosely coupled communities in the network. While this prominence might be an artifact of the data (e.g., are there actually fewer exoplanets in that region of space, or have we simply detected fewer?), the methodology is sound in its scoring:
Figure 12. Top 10 host stars ranked by centrality score
Figure 13. 2D network colored by community detection with host stars sized by centrality score
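Betweenness centrality can be computed with Brandes' algorithm, sketched below for an undirected network. The sketch measures path length in hops, ignores edge weights, and leaves scores unnormalized. These choices are assumptions and may not match the settings used in Gephi. The function name is hypothetical.

```python
from collections import deque


def betweenness_centrality(edges):
    """Unnormalized betweenness for an undirected, unweighted network.

    edges: iterable of (a, b) pairs.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # Breadth-first search counting shortest paths from source.
        order = []
        predecessors = {node: [] for node in adjacency}
        path_count = dict.fromkeys(adjacency, 0)
        path_count[source] = 1
        hops = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    queue.append(neighbour)
                if hops[neighbour] == hops[node] + 1:
                    path_count[neighbour] += path_count[node]
                    predecessors[neighbour].append(node)
        # Accumulate dependencies from the farthest nodes back to source.
        dependency = dict.fromkeys(adjacency, 0.0)
        while order:
            node = order.pop()
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted from both endpoints.
    return {node: value / 2.0 for node, value in score.items()}
```

A node that bridges two otherwise separate clusters scores highly because every shortest path between the clusters passes through it.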
Finally, a PageRank analysis generates a ranking score based on how likely random traversals are to encounter a node. Like the HITS algorithm, this algorithm is defined recursively and is computed iteratively until it converges. An epsilon parameter of 0.001 (for termination) and a probability parameter of 0.85 were used for the algorithm. The algorithm is also configured to take edge weight into account. Figure 14 shows this ranking, with Figure 15 visualizing the rankings in the network:
Figure 14. Top 10 host stars ranked by PageRank score
Figure 15. 2D network colored by community detection with host stars sized by PageRank score
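A weighted PageRank iteration using the parameters mentioned above (probability 0.85, epsilon 0.001) can be sketched as follows. This is not Gephi's implementation. The stopping rule (largest per-node change below epsilon) and the function name are assumptions, and the sketch expects at least one edge with positive weights.

```python
def pagerank(edges, damping=0.85, epsilon=0.001, max_iter=1000):
    """Weighted PageRank for an undirected network.

    edges: {(a, b): weight} with positive weights.
    A random walker follows an edge with probability proportional to its
    weight, or jumps to a uniformly random node with probability 1 - damping.
    """
    adjacency = {}
    for (a, b), weight in edges.items():
        adjacency.setdefault(a, {})[b] = weight
        adjacency.setdefault(b, {})[a] = weight

    count = len(adjacency)
    strength = {node: sum(nbrs.values()) for node, nbrs in adjacency.items()}
    rank = dict.fromkeys(adjacency, 1.0 / count)
    for _ in range(max_iter):
        new_rank = {}
        for node, neighbours in adjacency.items():
            incoming = sum(
                rank[other] * weight / strength[other]
                for other, weight in neighbours.items()
            )
            new_rank[node] = (1.0 - damping) / count + damping * incoming
        change = max(abs(new_rank[node] - rank[node]) for node in adjacency)
        rank = new_rank
        if change < epsilon:
            break
    return rank
```

The scores sum to 1, so each value can be read as the long-run share of visits a node receives from the random walker.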
Placing the aforementioned rankings side by side and comparing how often a host star appears in each can reveal whether any particular systems are important in the network across multiple measures (accounting for the bias of individual algorithms). This is done in Figure 16. Note that the following table only includes host stars which were in 2 or more of the top-10 rankings across the degree, weighted degree, HITS, or PageRank algorithms. The centrality algorithm's bias produces a non-overlapping top ten for this dataset and so is excluded from this table. Those host stars may deserve special consideration on their own. A "-" indicates that a host star was not in the top 10 for that measure:
Figure 16. Ranking of host stars
Host Star | Degree (rank) | Weighted Degree (rank) | HITS (rank) | PageRank (rank) |
---|---|---|---|---|
K2-294 | 1 | 2 | 1 | 2 |
K2-293 | 2 | 1 | 2 | 1 |
K2-217 | 3 | 8 | 8 | 10 |
K2-207 | 4 | 4 | 7 | 5 |
WASP-44 | 5 | 7 | 3 | - |
WASP-28 | 6 | 3 | 5 | 3 |
K2-213 | 7 | 6 | 9 | 7 |
WASP-151 | 8 | - | 4 | - |
WASP-147 | 9 | - | 6 | - |
WASP-158 | 10 | - | 10 | - |
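A side-by-side comparison like Figure 16 could be assembled with a sketch like the one below. The function name, parameters, and input layout are assumptions, and the example names in the usage note are placeholders, not results from this analysis.

```python
def compare_rankings(rankings, top_n=10, min_lists=2):
    """Build comparison rows from several ordered rankings.

    rankings: {measure_name: [host stars ordered best first]}.
    Returns rows of [host_star, rank_in_measure_1, rank_in_measure_2, ...],
    with "-" where the star is outside that measure's top_n. Only stars
    appearing in at least min_lists top_n lists are kept.
    """
    positions = {}
    for measure, ordered in rankings.items():
        for rank, star in enumerate(ordered[:top_n], start=1):
            positions.setdefault(star, {})[measure] = rank

    rows = []
    for star, ranks in positions.items():
        if len(ranks) >= min_lists:
            rows.append([star] + [ranks.get(m, "-") for m in rankings])
    return rows
```

For example, `compare_rankings({"degree": ["X", "Y"], "hits": ["Y", "Z"]}, top_n=2)` keeps only the placeholder star "Y", since it is the only one present in both lists.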
This network can be visualized in 3D using NAViGaTOR. NAViGaTOR is a network analysis and visualization tool created by the Krembil Research Institute. The latest version (NAViGaTOR 3) does not support 3D visualization of networks. However, the previous version (NAViGaTOR 2.3) does provide that functionality. NAViGaTOR 2.3 (for Windows, Mac, or Linux) can be downloaded here. For this analysis, NAViGaTOR 2.3 only worked on Windows.
When opening a graph file in NAViGaTOR 2.3 (this analysis used GML format), NAViGaTOR 2.3 will, by default, apply a force-directed layout. However, this behavior can be disabled, which causes NAViGaTOR 2.3 to use standard graphics properties including any X/Y/Z coordinates. When loading the network (see "Network Modeling" above), the Euclidean X/Y/Z coordinates are stored in this graphics property. Figure 17 shows the results of this 3D visualization.
Figure 17. 3D, rotating visualization of the network using NAViGaTOR 2.3
The following are data and artifacts produced in this project:
The following are structured data used or created in this analysis (with the corresponding file format or software used listed in parenthesis):
The following are the 2D network visualizations in scalable vector graphics (SVG) format. This enables arbitrary zooming to see the labels of host stars in the network which are often too small to read in raster image formats like PNG:
Much of this data may be biased due to data limitations, considering that scientific research involving exoplanets and SETI/technosignatures/astrobiology is still in its formative stages relative to other types of scientific research (i.e., on-going exoplanet discovery and lack of surrounding topology due to missing data for some parameters). While a higher emphasis on systems with exoplanets may give higher weight to those systems when modeling hypothetical interstellar networks, systems without planets could serve as relay points. A similar analysis could include all stars, regardless of exoplanet status, and then measure how planetary systems are situated relative to key influencing systems. An analysis on a larger body of data which fills out the surrounding space may be able to minimize this bias for this set of stars. One example of such an analysis on a different dataset has been done by Caleb Jones previously. That analysis is published here.
As noted above, default tuning values were used for the network-wide analysis algorithms. Additional effort could be made to automatically try several different tuning configurations and compare the results, similar to the hyperparameter optimization used in tuning neural networks. This process would be different, however, since the analysis (as is) is not a learning algorithm; it could instead create comparison charts looking for plateaus or inflection points in each parameter's effect on the score distributions.
Additional efforts to automate this analysis could be made. In particular, automating the min-of-max edge distance discovery, analysis rankings, and 2D visualizations could almost entirely automate the generation of a report like this. Standardizing the import format or making the loader more dynamic could also reduce the time necessary to model the network. Besides writing code from scratch to create a 3D file, no practical method for automating the 3D visualization is known, and that step may remain a manual process when replicating this analysis.
With additional automation, this kind of analysis could be packaged as a general-purpose tool to analyze interstellar structures and/or model hypothetical information propagation networks across star systems in the context of SETI research (and potentially non-SETI astronomical research).