frank
I asked in the chat: "how and what are the best practices, reference frameworks and workflows to human genomics data curation process ?"
The response was: "My apologies, I couldn't determine the nature of your request. Can you please provide more details? Click the chat input field, then use keyboard up arrow to restore your previous prompts."
I tried with different AI assistants.
frank
I tried with different agents, and they don't recognize the selected text in the document. First, I select the document, and then I ask the agents to search for appropriate references. Sometimes it works, sometimes it doesn't. I believe it lacks consistency, as if a different language model were acting in each chat session.
frank
The message is: "Semantic Scholar did not return any relevant articles, please try again in a few seconds!"
p.krawczyk
I would be happy to give you a better explanation in a quick Zoom session. Please just email me at info@figlinq.com!
p.krawczyk
Sorry, I misunderstood your prompt; I thought you were busy processing data. For a general chat or question, please select the "chat assistant". Here is the answer that I got to your question from the chat assistant:
Quote:
What a great question! Human genomics data curation is a crucial process that requires careful attention to ensure the accuracy, completeness, and consistency of the data. Here are some best practices, reference frameworks, and workflows that can help:
**Best Practices:**
1. **Data Standardization**: Use standardized formats, such as VCF (Variant Call Format) or GVF (Genome Variation Format), to ensure consistency across different datasets.
2. **Data Validation**: Validate data against reference genomes, such as GRCh37 or GRCh38, to ensure accuracy and detect errors.
3. **Data Annotation**: Annotate data with relevant metadata, such as sample IDs, experimental conditions, and sequencing platforms.
4. **Data Quality Control**: Perform quality control checks to detect and remove low-quality or contaminated data.
5. **Data Versioning**: Track changes to the data through versioning to ensure reproducibility and transparency.
**Reference Frameworks:**
1. **FAIR Principles**: Follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles to ensure that data is easily discoverable, accessible, and reusable.
2. **GA4GH (Global Alliance for Genomics and Health) Standards**: Use GA4GH standards, such as the Genomic Data Sharing (GDS) framework, to ensure interoperability and standardization across different datasets.
3. **Human Genome Variation Society (HGVS) Guidelines**: Follow HGVS guidelines for variant annotation and nomenclature to ensure consistency across different datasets.
**Workflows:**
1. **Data Ingestion**: Develop a workflow for ingesting data from various sources, such as sequencing platforms or public databases.
2. **Data Processing**: Develop a workflow for processing data, including quality control, annotation, and standardization.
3. **Data Storage**: Develop a workflow for storing data in a secure and scalable manner, such as using cloud-based storage solutions.
4. **Data Access**: Develop a workflow for accessing data, including authentication and authorization mechanisms.
Some popular tools and platforms for human genomics data curation include:
1. **Genomics Data Management Platforms**: Platforms like Genome Nexus, Genomics Data Manager, and genomicsDB provide integrated solutions for data curation, analysis, and visualization.
2. **Workflow Management Systems**: Systems like Nextflow, Snakemake, and Apache Airflow provide workflow management and automation capabilities for data curation and analysis.
3. **Data Integration Tools**: Tools like Apache NiFi, Apache Beam, and Data Integration Framework provide data integration and processing capabilities for various data sources.
If you'd like to explore these concepts further or discuss specific use cases, please let me know!