API Reference
CLI
Command-line entry point for ALMA Search.
This module turns user input files into ALMA archive queries, collects the raw matches, applies deduplication and cleaner-row selection, and finally writes the public CSV output.
- alma_search.cli.parse_args(argv=None)[source]
Parse command-line arguments for the alma_search executable.
- Parameters:
argv (sequence[str] | None, optional) – Optional argument list. When None, argparse reads from the live command line.
- Returns:
Parsed options and positional arguments used by main().
- Return type:
argparse.Namespace
- alma_search.cli.main(argv=None)[source]
Execute the end-to-end ALMA search workflow.
The workflow is:
Parse command-line options.
Load and normalize target coordinates.
Query the ALMA TAP service once per target.
Convert raw archive rows into output rows.
Deduplicate, optionally apply the cleaner filter, and write CSV.
- Parameters:
argv (sequence[str] | None, optional) – Optional command-line arguments to parse.
- Returns:
Exit status code. 0 means success, 1 means a local validation or file-processing error, and 2 means every remote ALMA query failed so no trustworthy output could be produced.
- Return type:
int
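The exit-code contract described for main() can be illustrated with a minimal sketch. Here run_workflow and its query callback are hypothetical stand-ins, not part of alma_search, and treating an empty target list as a local validation error is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, float, float]  # (name, ra_deg, dec_deg)


def run_workflow(targets: Iterable[Target],
                 query: Callable[[float, float], List[dict]]) -> int:
    """Hypothetical driver that mirrors the documented exit codes."""
    targets = list(targets)
    if not targets:
        return 1  # assumed: an empty catalog counts as a local validation error
    failures = 0
    rows: List[dict] = []
    for name, ra_deg, dec_deg in targets:
        try:
            matches = query(ra_deg, dec_deg)  # one remote query per target
        except Exception:
            failures += 1  # remember the failure and move on to the next target
            continue
        rows.extend({"Name": name, **match} for match in matches)
    if failures == len(targets):
        return 2  # every remote query failed, so no output can be trusted
    # A real run would deduplicate `rows` and write the CSV at this point.
    return 0
```

In this sketch a partial failure still returns 0, because the documented code 2 is reserved for the case where every query failed.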
Input and Output Helpers
Input parsing and output-table helpers.
This module is responsible for normalizing user-supplied target catalogs, combining and cleaning intermediate result rows, computing the final observed-species flag, and writing the canonical CSV schema.
- alma_search.io.get_output_columns(species)[source]
Return the exported column order for a specific observed-species label.
- Parameters:
species (Any) – User-supplied species name, for example "CO" or "HCN".
- Returns:
Public CSV columns in the exact order used for export, with the final internal flag column renamed to the user-facing label.
- Return type:
list[str]
- alma_search.io.compute_observed_species_flag(target_lines, distance_arcsec, fov_arcsec, observed_species, observed_distance_threshold_arcsec, observed_fov_threshold_arcsec)[source]
Compute the internal observed-species score for one result row.
The internal score is later converted to a simple Yes or No in the exported CSV. A positive value means the selected species is considered observed for that source. The score values are:
1.0 when the inferred line is present and the ALMA pointing is within the distance threshold.
0.5 when the inferred line is present, the pointing is farther away, but the field of view is large enough to still count as coverage.
0.0 otherwise.
- Parameters:
target_lines (Any) – Comma-separated inferred line names for the row.
distance_arcsec (Any) – Angular separation between the input target and ALMA pointing center.
fov_arcsec (Any) – Approximate ALMA field of view in arcseconds.
observed_species (Any) – Species to test, such as "CO" or "HCN".
observed_distance_threshold_arcsec (float) – Distance threshold for a definite match.
observed_fov_threshold_arcsec (float) – FOV threshold for the looser coverage case.
- Returns:
Internal score used during post-processing.
- Return type:
float
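The three score values can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the thresholds are compared inclusively, "large enough" is read as fov_arcsec >= the FOV threshold, and species matching is a plain prefix comparison. The default thresholds are the ones shown in the finalize_results() signature.

```python
def observed_score(target_lines, distance_arcsec, fov_arcsec, species,
                   distance_threshold_arcsec=30.0, fov_threshold_arcsec=100.0):
    """Hypothetical re-creation of the documented 1.0 / 0.5 / 0.0 scoring."""
    tokens = [token.strip() for token in str(target_lines).split(",")]
    # Assumed matching rule: compare the text before "(" with the species name.
    has_line = any(token.split("(")[0].upper() == species.upper()
                   for token in tokens)
    if not has_line:
        return 0.0
    if distance_arcsec <= distance_threshold_arcsec:
        return 1.0  # line present and pointing close to the target
    if fov_arcsec >= fov_threshold_arcsec:
        return 0.5  # pointing farther away, but the field of view still covers it
    return 0.0


def score_to_label(score):
    """Any positive score becomes "Yes" in the exported CSV."""
    return "Yes" if score > 0 else "No"
```

For example, a row with lines "CO(1-0)" at 45 arcsec and a 120 arcsec field of view would score 0.5 under these assumptions and be exported as "Yes".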
- alma_search.io.observed_species_flag_to_label(value)[source]
Convert an internal observed-species score into a CSV label.
- Parameters:
value (Any) – Numeric score produced by compute_observed_species_flag().
- Returns:
"Yes" when the score is greater than zero, otherwise "No".
- Return type:
str
- alma_search.io.build_no_match_row(input_name, input_ra_deg, input_dec_deg)[source]
Create a placeholder row for a target with no returned ALMA matches.
- Parameters:
input_name (str) – Original target name from the input catalog.
input_ra_deg (float) – Input right ascension in decimal degrees.
input_dec_deg (float) – Input declination in decimal degrees.
- Returns:
Output row with coordinate fields filled and science metadata set to missing values.
- Return type:
dict[str, Any]
- alma_search.io.load_targets_from_table(df)[source]
Normalize a tabular input catalog into Name, ra_deg, dec_deg.
Supported schemas are:
Name, ra_deg, dec_deg for decimal-degree coordinates.
Name, ra, dec for sexagesimal or decimal text coordinates.
- Parameters:
df (pandas.DataFrame) – Raw table read from a CSV-like input file.
- Returns:
Normalized target table with decimal-degree coordinates.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the required columns are missing or any coordinate row cannot be parsed.
- alma_search.io.load_targets_from_text(path)[source]
Load targets from a plain-text coordinate list.
Each non-comment line must have the form Name,RA DEC where the coordinate tokens can be decimal degrees or sexagesimal text.
- Parameters:
path (str) – Path to the plain-text input file.
- Returns:
Normalized target table with columns Name, ra_deg, and dec_deg.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If any line cannot be parsed into a valid name and coordinate pair.
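The Name,RA DEC line format can be sketched with a small splitter. parse_target_line is a hypothetical helper, and treating lines that start with # as comments is an assumption, since the comment character is not stated here. The sketch only separates the three tokens and leaves coordinate conversion to a later step.

```python
def parse_target_line(line):
    """Split one 'Name,RA DEC' line into (name, ra_token, dec_token).

    Returns None for blank lines and for lines starting with '#'
    (the comment marker is an assumption of this sketch).
    """
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    name, separator, coords = text.partition(",")
    tokens = coords.split()
    if not separator or not name.strip() or len(tokens) != 2:
        raise ValueError(f"cannot parse target line: {line!r}")
    return name.strip(), tokens[0], tokens[1]
```

Because the split happens at the first comma only, a name may contain spaces, while the two coordinate tokens must be free of internal whitespace (for example colon-separated sexagesimal text).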
- alma_search.io.load_targets(path, logger=None)[source]
Load a target catalog from CSV-like or plain-text input.
The function first attempts structured CSV parsing. If that fails, it falls back to the plain-text parser used for line-based coordinate lists.
- Parameters:
path (str) – Input file path.
logger (Any | None, optional) – Optional logger used to report fallback decisions.
- Returns:
Normalized target catalog in decimal degrees.
- Return type:
pandas.DataFrame
- alma_search.io.combine_arrays(values)[source]
Combine classified array labels into the canonical order.
- Parameters:
values (sequence[str]) – Array classifications such as "12m", "7m", or comma-separated combinations from multiple rows.
- Returns:
Unique array labels joined in 12m,7m,TP order.
- Return type:
str
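A minimal sketch of the canonical ordering follows. combine_array_labels is a hypothetical name, and dropping labels outside the three known arrays is an assumption of this sketch rather than documented behavior.

```python
CANONICAL_ORDER = ("12m", "7m", "TP")


def combine_array_labels(values):
    """Merge array labels from several rows into 12m,7m,TP order."""
    seen = set()
    for value in values:
        # Each value may itself be a comma-separated combination.
        for label in str(value).split(","):
            label = label.strip()
            if label:
                seen.add(label)
    # Emit only known labels, always in the canonical order.
    return ",".join(label for label in CANONICAL_ORDER if label in seen)
```

With the input ["7m", "12m,TP", "7m"] this sketch returns "12m,7m,TP".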
- alma_search.io.combine_bands(values)[source]
Combine ALMA band metadata into a sorted comma-separated string.
- Parameters:
values (sequence[Any]) – Raw band_list field values, potentially containing repeated values or multiple delimiters.
- Returns:
Unique bands sorted numerically when possible.
- Return type:
str
- alma_search.io.combine_lines(values)[source]
Combine inferred line labels from multiple rows.
- Parameters:
values (sequence[str]) – Comma-separated line lists, often from multiple observations being merged into one output row.
- Returns:
Unique line names in first-seen order. Returns "Unknown" only when no explicit line name is available but at least one source row was marked as unknown.
- Return type:
str
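The "Unknown" rule can be sketched as below. combine_line_lists is a hypothetical name, and returning an empty string when there are neither explicit names nor unknown markers is an assumption.

```python
def combine_line_lists(values):
    """Merge comma-separated line lists, keeping first-seen order."""
    names = []
    saw_unknown = False
    for value in values:
        for name in str(value).split(","):
            name = name.strip()
            if not name:
                continue
            if name == "Unknown":
                saw_unknown = True  # remember it, but prefer explicit names
            elif name not in names:
                names.append(name)
    if names:
        return ",".join(names)
    # Assumed fallback: empty text when nothing at all was reported.
    return "Unknown" if saw_unknown else ""
```

Merging ["CO(1-0),HCN(1-0)", "Unknown", "CO(1-0)"] gives "CO(1-0),HCN(1-0)" here, because an explicit name suppresses the unknown marker.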
- alma_search.io.blank_string_to_na(value)[source]
Convert blank strings to pandas.NA.
- Parameters:
value (Any) – Scalar value to normalize.
- Returns:
pandas.NA for empty strings, otherwise the original value.
- Return type:
Any
- alma_search.io.finalize_results(df, observed_species='CO', observed_distance_threshold_arcsec=30.0, observed_fov_threshold_arcsec=100.0)[source]
Apply final cleanup and derive the observed-species flag column.
- Parameters:
df (pandas.DataFrame) – Intermediate result table using the internal schema.
observed_species (Any, optional) – Species used for the final observed-in-ALMA decision.
observed_distance_threshold_arcsec (float, optional) – Distance threshold for a definite observed flag.
observed_fov_threshold_arcsec (float, optional) – FOV threshold for the looser observed flag.
- Returns:
Cleaned result table with normalized missing values and a groupwise observed-species score propagated across rows with the same source Name.
- Return type:
pandas.DataFrame
- alma_search.io.select_cleaner_rows(df, observed_species='CO', max_observed_rows_per_name=5)[source]
Reduce the final table to a smaller human-review subset.
Rules
Keep one unmatched row when a source has no ALMA match.
Keep up to max_observed_rows_per_name closest rows when the selected observed species exists.
Otherwise keep the single closest row for that source.
- Parameters:
df (pandas.DataFrame) – Finalized result table.
observed_species (Any, optional) – Species used to decide whether a source has relevant rows.
max_observed_rows_per_name (int, optional) – Maximum number of closest species-matching rows to keep per source.
- Returns:
Filtered table sorted by source name and distance.
- Return type:
pandas.DataFrame
- alma_search.io.deduplicate_results(df, dedup_level, observed_species='CO', observed_distance_threshold_arcsec=30.0, observed_fov_threshold_arcsec=100.0)[source]
Deduplicate raw result rows before final export.
- Parameters:
df (pandas.DataFrame) – Raw per-observation result rows in the internal schema.
dedup_level (str) – Deduplication mode. Supported values are "none", "project", and "project_target".
observed_species (Any, optional) – Species used for the final observed-species flag.
observed_distance_threshold_arcsec (float, optional) – Distance threshold for a definite observed flag.
observed_fov_threshold_arcsec (float, optional) – FOV threshold for the looser observed flag.
- Returns:
Deduplicated and finalized output rows.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If dedup_level is not one of the supported values.
- alma_search.io.write_csv(df, path, observed_species='CO')[source]
Write the public CSV output file.
- Parameters:
df (pandas.DataFrame) – Final result table in the internal schema.
path (str) – Destination CSV file path.
observed_species (Any, optional) – Species name used to label the final Observed ... in ALMA? column.
- Return type:
None
Archive Query and Search Logic
ALMA archive querying and result-row construction.
This module contains the logic that talks to the ALMA TAP service, interprets frequency metadata, infers likely spectral lines from frequency coverage, and translates archive rows into the package’s internal output schema.
- alma_search.search.create_tap_service(tap_url='https://almascience.eso.org/tap')[source]
Create a TAP client for the ALMA science archive.
- Parameters:
tap_url (str, optional) – TAP endpoint URL. The default points to the public ALMA archive.
- Returns:
pyvo.dal.TAPService instance.
- Return type:
Any
- Raises:
ImportError – If pyvo is not installed in the current Python environment.
- alma_search.search.build_adql_query(ra_deg, dec_deg, radius_deg)[source]
Build the ADQL cone-search query used against ALMA ObsCore.
- Parameters:
ra_deg (float) – Cone center right ascension in decimal degrees.
dec_deg (float) – Cone center declination in decimal degrees.
radius_deg (float) – Search radius in decimal degrees.
- Returns:
ADQL query string selecting the ObsCore fields needed by this package.
- Return type:
str
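An ADQL cone search against an ObsCore table typically combines CONTAINS, POINT, and CIRCLE. The sketch below shows that general shape only. build_cone_query is a hypothetical name, and both the table name ivoa.obscore and the selected columns are assumptions, since the fields actually requested by the package are not listed here.

```python
def build_cone_query(ra_deg, dec_deg, radius_deg):
    """Return an ADQL cone-search string (column list is illustrative)."""
    return (
        "SELECT obs_id, s_ra, s_dec, em_min, em_max, instrument_name "
        "FROM ivoa.obscore "
        "WHERE 1 = CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra_deg:.6f}, {dec_deg:.6f}, {radius_deg:.6f}))"
    )


# A 1-arcminute radius has to be converted to degrees first.
query = build_cone_query(10.0, -5.0, 1.0 / 60.0)
```

Formatting the numbers with a fixed precision keeps the query text stable across runs, which helps when queries are logged or compared.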
- alma_search.search.query_alma_cone(service, ra_deg, dec_deg, radius_arcmin)[source]
Query the ALMA archive around one target position.
- Parameters:
service (Any) – TAP service client, typically created by create_tap_service().
ra_deg (float) – Cone center right ascension in decimal degrees.
dec_deg (float) – Cone center declination in decimal degrees.
radius_arcmin (float) – Cone-search radius in arcminutes.
- Returns:
Query results as a pandas table. Returns an empty frame when the query completes successfully but finds no rows.
- Return type:
pandas.DataFrame
- alma_search.search.parse_frequency_support(frequency_support)[source]
Parse the ALMA frequency_support metadata field into GHz intervals.
The field often contains text fragments like:
[87.30..89.17GHz, …]
1.23456E+11..1.24567E+11Hz
230.1 .. 232.0 GHz U 234.0 .. 236.0 GHz
The parser is intentionally permissive and extracts every interval that looks like number .. number unit.
- Parameters:
frequency_support (Any) – Raw ObsCore frequency_support value.
- Returns:
List of (low_ghz, high_ghz) intervals. The list is empty when no recognizable interval is present.
- Return type:
list[tuple[float, float]]
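A permissive "number .. number unit" extractor can be sketched with one regular expression. parse_intervals is a hypothetical name, and the set of accepted units (GHz, MHz, kHz, Hz) is an assumption. The package's own pattern may differ.

```python
import re

_NUMBER = r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"
_INTERVAL = re.compile(
    rf"({_NUMBER})\s*\.\.\s*({_NUMBER})\s*(GHz|MHz|kHz|Hz)", re.IGNORECASE
)
_TO_GHZ = {"ghz": 1.0, "mhz": 1e-3, "khz": 1e-6, "hz": 1e-9}


def parse_intervals(text):
    """Extract every 'low .. high unit' fragment as a (low_ghz, high_ghz) pair."""
    intervals = []
    for low, high, unit in _INTERVAL.findall(str(text)):
        scale = _TO_GHZ[unit.lower()]
        lo, hi = float(low) * scale, float(high) * scale
        intervals.append((min(lo, hi), max(lo, hi)))  # tolerate swapped bounds
    return intervals
```

Anything that does not match the pattern is skipped silently, so unrelated text in the field (brackets, separators such as U, resolution annotations) does not raise an error.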
- alma_search.search.coarse_frequency_interval_from_em(em_min, em_max)[source]
Convert wavelength bounds into a coarse frequency interval.
- Parameters:
em_min (Any) – Minimum wavelength in meters from ObsCore.
em_max (Any) – Maximum wavelength in meters from ObsCore.
- Returns:
Single coarse (low_ghz, high_ghz) interval, or an empty list when the wavelength bounds are unavailable or invalid.
- Return type:
list[tuple[float, float]]
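The conversion rests on the relation frequency = c / wavelength, so the shortest wavelength gives the highest frequency. The sketch below is one way to express it. coarse_interval is a hypothetical name, and treating non-positive or non-finite bounds as invalid is an assumption.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def coarse_interval(em_min, em_max):
    """Convert wavelength bounds in meters to one (low_ghz, high_ghz) interval."""
    try:
        short_wave, long_wave = sorted((float(em_min), float(em_max)))
    except (TypeError, ValueError):
        return []  # missing or non-numeric bounds
    if not (math.isfinite(short_wave) and math.isfinite(long_wave)):
        return []
    if short_wave <= 0:
        return []
    low_ghz = SPEED_OF_LIGHT_M_S / long_wave / 1e9   # longest wavelength
    high_ghz = SPEED_OF_LIGHT_M_S / short_wave / 1e9  # shortest wavelength
    return [(low_ghz, high_ghz)]
```

As a sanity check, a wavelength of 1.3 mm corresponds to roughly 230.6 GHz.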
- alma_search.search.infer_lines(frequency_support, em_min, em_max, line_velocity_tolerance_kms, line_catalog_ghz=None)[source]
Infer likely spectral lines covered by an ALMA observation.
- Parameters:
frequency_support (Any) – Raw spectral-coverage metadata string from ObsCore.
em_min (Any) – Minimum wavelength in meters, used as a fallback when frequency_support is absent or unparsable.
em_max (Any) – Maximum wavelength in meters, used alongside em_min.
line_velocity_tolerance_kms (float) – Velocity tolerance used to widen the rest-frequency matching window.
line_catalog_ghz (dict[str, float] | None, optional) – Optional replacement catalog mapping line labels to rest frequencies in GHz.
- Returns:
Comma-separated matched line names, or "Unknown" when no reliable coverage interval could be inferred.
- Return type:
str
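A velocity tolerance widens the matching window by roughly rest_frequency × v / c on each side. The sketch below applies that to a tiny illustrative catalog. match_lines and the three catalog entries are assumptions (the built-in catalog is not reproduced here), and returning an empty string when intervals exist but nothing matches is also an assumption.

```python
SPEED_OF_LIGHT_KMS = 299_792.458

# Illustrative rest frequencies in GHz, not the package's built-in catalog.
EXAMPLE_CATALOG_GHZ = {
    "CO(1-0)": 115.2712,
    "HCN(1-0)": 88.6318,
    "CO(2-1)": 230.538,
}


def match_lines(intervals, tolerance_kms, catalog=EXAMPLE_CATALOG_GHZ):
    """Return comma-separated line names whose rest frequency is covered."""
    if not intervals:
        return "Unknown"  # no coverage information at all
    matched = []
    for name, rest_ghz in catalog.items():
        pad = rest_ghz * tolerance_kms / SPEED_OF_LIGHT_KMS  # Doppler widening
        if any(low - pad <= rest_ghz <= high + pad for low, high in intervals):
            matched.append(name)
    return ",".join(matched)
```

With a 500 km/s tolerance the window around CO(1-0) grows by about 0.19 GHz, so an interval starting at 115.3 GHz still counts as covering the line.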
- alma_search.search.classify_array(instrument_name)[source]
Classify an ALMA observation using only the instrument name field.
- Parameters:
instrument_name (Any) – Raw ALMA instrument metadata.
- Returns:
Comma-separated array labels such as "12m" or "7m,TP".
- Return type:
str
- alma_search.search.classify_array_from_metadata(instrument_name, antenna_arrays)[source]
Classify ALMA array usage from instrument and antenna metadata.
- Parameters:
instrument_name (Any) – Raw ObsCore instrument metadata.
antenna_arrays (Any) – Raw antenna-array metadata, often containing antenna IDs that reveal the array type.
- Returns:
Canonical array classification assembled from the available metadata.
- Return type:
str
- alma_search.search.rows_from_query_results(input_name, input_ra_deg, input_dec_deg, query_df, line_velocity_tolerance_kms)[source]
Transform raw ALMA query rows into the package’s internal row schema.
- Parameters:
input_name (str) – Source name from the user’s input catalog.
input_ra_deg (float) – Input source right ascension in decimal degrees.
input_dec_deg (float) – Input source declination in decimal degrees.
query_df (pandas.DataFrame) – Raw ALMA query results for this source.
line_velocity_tolerance_kms (float) – Velocity tolerance passed through to infer_lines().
- Returns:
One internal output row per valid ALMA archive row.
- Return type:
list[dict[str, Any]]
Line Matching
Line-catalog and observed-species helpers.
This module stores the small built-in spectral-line catalog used for coarse coverage inference and provides string-matching helpers for the final observed-species flag.
- alma_search.lines.normalize_observed_species_label(species)[source]
Normalize a user-supplied observed-species label.
- Parameters:
species (Any) – Raw species name from CLI or API input.
- Returns:
Cleaned species label, or the package default when the input is blank.
- Return type:
str
- alma_search.lines.observed_species_column_name(species)[source]
Return the exported column header for the observed-species flag.
- Parameters:
species (Any) – Species label selected by the user.
- Returns:
Human-readable column title such as "Observed CO in ALMA?".
- Return type:
str
- alma_search.lines.extract_line_species_token(line_name)[source]
Extract the species part of a line label.
- Parameters:
line_name (str) – Full line label such as "HCN(1-0)".
- Returns:
Prefix before the first (, used for species comparison.
- Return type:
str
- alma_search.lines.normalize_species_token(token)[source]
Normalize a species token for comparison.
- Parameters:
token (str) – Raw species token.
- Returns:
Upper-case token with whitespace and square brackets removed.
- Return type:
str
- alma_search.lines.line_matches_observed_species(line_name, observed_species)[source]
Return whether an inferred line should count for the chosen species.
- Parameters:
line_name (str) – Inferred line label from the built-in catalog.
observed_species (Any) – User-selected species label.
- Returns:
True when the line belongs to the requested species family. The special query CO matches common isotopologues such as 12CO, 13CO, C18O, and C17O.
- Return type:
bool
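The token extraction, normalization, and CO-family rule described above can be combined in a short sketch. The function names are hypothetical, and the CO family is limited to the isotopologues named in the text. The package may recognize more.

```python
import re

# Isotopologues listed in the description above; the real set may be larger.
CO_FAMILY = {"CO", "12CO", "13CO", "C18O", "C17O"}


def species_token(line_name):
    """Return the text before the first '(' of a line label."""
    return line_name.split("(", 1)[0]


def normalize_token(token):
    """Upper-case the token and drop whitespace and square brackets."""
    return re.sub(r"[\s\[\]]", "", token).upper()


def line_matches(line_name, observed_species):
    """Decide whether a line label counts for the requested species."""
    line_token = normalize_token(species_token(line_name))
    wanted = normalize_token(str(observed_species))
    if wanted == "CO":
        return line_token in CO_FAMILY  # special case: whole CO family
    return line_token == wanted
```

Under these rules "13CO(1-0)" counts for the query CO, while "HCN(1-0)" does not.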
- alma_search.lines.has_observed_species_line(target_lines, observed_species)[source]
Test whether a row’s inferred line list includes the chosen species.
- Parameters:
target_lines (Any) – Comma-separated inferred line list.
observed_species (Any) – User-selected species label.
- Returns:
True when at least one inferred line matches the requested species.
- Return type:
bool
Utilities
Shared utility helpers for ALMA archive search workflows.
These helpers are intentionally small and reusable. They normalize missing values, parse coordinate strings, and combine repeated metadata values into stable CSV-friendly text.
- alma_search.utils.configure_logging(verbose)[source]
Configure the package-wide logging format and level.
- Parameters:
verbose (bool) – When True, enable debug logging. Otherwise use info-level logging.
- Return type:
None
- alma_search.utils.safe_get(record, key, default='')[source]
Read a dictionary-like value while normalizing null-like entries.
- Parameters:
record (dict[str, Any]) – Mapping to read from.
key (str) – Key to retrieve.
default (Any, optional) – Fallback value used when the key is missing or null-like.
- Returns:
Stored value or the supplied default.
- Return type:
Any
- alma_search.utils.is_blank(value)[source]
Return whether a value should be treated as missing text/data.
- Parameters:
value (Any) – Value to test.
- Returns:
True for None, pandas missing values, and empty strings.
- Return type:
bool
- alma_search.utils.normalize_whitespace(value)[source]
Collapse repeated whitespace in a scalar value.
- Parameters:
value (Any) – Value to normalize.
- Returns:
String with internal whitespace collapsed to single spaces, or an empty string when the value is blank.
- Return type:
str
- alma_search.utils.unique_preserve_order(items)[source]
Return unique items while preserving first-seen order.
- Parameters:
items (iterable[str]) – Candidate string values.
- Returns:
Non-blank unique values in their original encounter order.
- Return type:
list[str]
- alma_search.utils.stable_sort_numeric_strings(values)[source]
Sort string values numerically when possible, otherwise lexically.
- Parameters:
values (iterable[str]) – String values to sort.
- Returns:
Unique values sorted with numeric strings before non-numeric ones.
- Return type:
list[str]
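One way to get "numeric first, then lexical" ordering is to split the unique values into two groups and sort each separately. sort_numeric_strings is a hypothetical name, and using float() to decide what counts as numeric is an assumption.

```python
def sort_numeric_strings(values):
    """Return unique values with numeric strings first, then the rest."""
    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(
        str(value).strip() for value in values if str(value).strip()
    ))
    numeric = sorted((v for v in unique if is_number(v)), key=float)
    other = sorted(v for v in unique if not is_number(v))
    return numeric + other
```

Sorting with key=float places "3" before "10", which a plain string sort would not do.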
- alma_search.utils.parse_ra_dec_to_degrees(ra_value, dec_value)[source]
Parse RA and Dec values into decimal degrees.
- Parameters:
ra_value (Any) – Right ascension value in decimal degrees or sexagesimal text.
dec_value (Any) – Declination value in decimal degrees or sexagesimal text.
- Returns:
Parsed (ra_deg, dec_deg) pair.
- Return type:
tuple[float, float]
- Raises:
ValueError – If either coordinate is blank or cannot be parsed.
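A dependency-free sketch of the parsing logic is shown below. parse_ra_dec is a hypothetical name, and two conventions are assumed: sexagesimal fields are separated by colons or spaces, and sexagesimal right ascension is given in hours (multiplied by 15 to obtain degrees). The package itself may delegate this to an astronomy library.

```python
import re


def _sexagesimal(text):
    """Convert 'D:M:S' (or a plain decimal) to a float in the same unit."""
    parts = re.split(r"[:\s]+", text.strip())
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"too many fields in coordinate: {text!r}")
    sign = -1.0 if parts[0].startswith("-") else 1.0
    # float() raises ValueError for non-numeric fields, which is what we want.
    numbers = [abs(float(part)) for part in parts] + [0.0, 0.0]
    return sign * (numbers[0] + numbers[1] / 60.0 + numbers[2] / 3600.0)


def parse_ra_dec(ra_value, dec_value):
    """Return (ra_deg, dec_deg); raise ValueError for blank or bad input."""
    ra_text, dec_text = str(ra_value).strip(), str(dec_value).strip()
    if not ra_text or not dec_text:
        raise ValueError("blank coordinate")
    # Assumed convention: sexagesimal RA is in hours, decimal RA in degrees.
    ra_is_sexagesimal = re.search(r"[:\s]", ra_text) is not None
    ra_deg = _sexagesimal(ra_text) * (15.0 if ra_is_sexagesimal else 1.0)
    dec_deg = _sexagesimal(dec_text)
    return ra_deg, dec_deg
```

Reading the sign from the text rather than from the parsed number keeps declinations such as -00:30:00 negative.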
- alma_search.utils.format_ra_dec_strings(ra_deg, dec_deg)[source]
Format decimal-degree coordinates as sexagesimal strings.
- Parameters:
ra_deg (float) – Right ascension in decimal degrees.
dec_deg (float) – Declination in decimal degrees.
- Returns:
(ra_text, dec_text) formatted with colon separators.
- Return type:
tuple[str, str]
- alma_search.utils.to_optional_float(value, scale=1.0, digits=3)[source]
Convert a scalar to a rounded float when possible.
- Parameters:
value (Any) – Input value to convert.
scale (float, optional) – Multiplicative scale factor applied before rounding.
digits (int, optional) – Number of decimal places to keep.
- Returns:
Rounded float result, or pandas.NA when conversion fails.
- Return type:
float | pandas.NA
- alma_search.utils.format_float_text(value, digits=3)[source]
Format a scalar value as compact text for merged CSV fields.
- Parameters:
value (Any) – Input scalar value.
digits (int, optional) – Number of decimal places used when formatting numeric values.
- Returns:
Blank string for missing input, a cleaned text value for non-numeric input, or a trimmed numeric string.
- Return type:
str
- alma_search.utils.combine_scalar_values(values, digits=3)[source]
Combine repeated scalar values into a unique CSV-friendly string.
- Parameters:
values (sequence[Any]) – Scalar values collected across rows.
digits (int, optional) – Number of decimal places for numeric formatting.
- Returns:
Comma-separated unique values, or pandas.NA when nothing usable is available.
- Return type:
str | pandas.NA