Data Science

Data products provide actionable information without exposing decision makers to the underlying data or analytics (e.g., buy/sell strategies for financial instruments, a set of actions to improve product yield, or steps to improve product marketing).

Data Science is the art of turning data into actions. This is accomplished through the creation of data products, which provide actionable information without exposing decision makers to the underlying data or analytics (e.g., buy/sell strategies for financial instruments, a set of actions to improve product yield, or steps to improve product marketing). Performing Data Science requires the extraction of timely, actionable information from diverse data sources to drive data products. Examples of data products include answers to questions such as: “Which of my products should I advertise more heavily to increase profit? How can I improve my compliance program, while reducing costs? What manufacturing process change will allow me to build a better product?” The key to answering these questions is: understand the data you have and what the data inductively tells you.

A data product provides actionable information without exposing decision makers to the underlying data or analytics.

Examples include: Movie Recommendations, Weather Forecasts, Stock Market Predictions, Production Process Improvements
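As a rough illustration of the idea above, the sketch below answers the question "Which of my products should I advertise more heavily?" with a single recommendation, so the caller never sees the underlying records. Everything in it is hypothetical: the `Sale` record, the `recommend_product_to_advertise` function, the sample figures, and the choice of "profit per unit of ad spend" as the ranking rule are assumptions made for illustration, not a method taken from the sources listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    """One hypothetical sales record; field names are illustrative only."""
    product: str
    revenue: float
    cost: float
    ad_spend: float


def recommend_product_to_advertise(sales: list[Sale]) -> str:
    """Return an action, not the data: the product with the highest
    total profit per unit of advertising spend (an assumed heuristic)."""
    profit: dict[str, float] = {}
    spend: dict[str, float] = {}
    for s in sales:
        profit[s.product] = profit.get(s.product, 0.0) + (s.revenue - s.cost)
        spend[s.product] = spend.get(s.product, 0.0) + s.ad_spend

    # Products with no recorded ad spend are skipped to avoid dividing by zero.
    ratios = {p: profit[p] / spend[p] for p in profit if spend[p] > 0}
    if not ratios:
        raise ValueError("no product has recorded ad spend")

    best = max(ratios, key=ratios.get)
    return f"Advertise '{best}' more heavily"


# Made-up figures, used only to exercise the function.
SALES = [
    Sale("widget", revenue=1000.0, cost=600.0, ad_spend=100.0),
    Sale("gadget", revenue=2000.0, cost=1500.0, ad_spend=250.0),
    Sale("widget", revenue=500.0, cost=300.0, ad_spend=50.0),
]

if __name__ == "__main__":
    print(recommend_product_to_advertise(SALES))
```

The decision maker receives one sentence they can act on; the aggregation and ranking stay inside the function. A real data product would replace the heuristic with whatever model the analysis supports.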

Analytics Projects


One-off analysis


The goal of your analysis is to tell an actionable story. The output is a report, a notebook, or a presentation. The following projects illustrate this:

  • Predicting Spatial Risk of Opioid Overdoses in Providence, RI
  • Tidying and mapping Toronto open data
  • Where to live in the US
  • Text analysis of Trump's tweets confirms he writes only the (angrier) Android half

Also see: https://www.opencasestudies.org and academic papers

Best practices: http://jtleek.com/ads2020/week-1.html


Advanced Analytics in Spark (classic)

  • Recommending Music and the Audioscrobbler Data Set
  • Predicting Forest Cover with Decision Trees
  • Anomaly Detection in Network Traffic with K-means Clustering
  • Understanding Wikipedia with Latent Semantic Analysis
  • Analyzing Co-occurrence Networks with GraphX
  • Geospatial and Temporal Data Analysis on the New York City Taxi Trip Data
  • Analyzing Genomics Data and the BDG Project
  • Analyzing Neuroimaging Data with PySpark and Thunder
  • Estimating Financial Risk through Monte Carlo Simulation

Beautiful Data (classic)

  • Personal Data
  • Government and Urban Planning data
  • Analysing Housing data
  • DNA - Genomics Data Analysis
  • Analysing Political Data
  • Astronomy data - Image Processing on Mars
  • Photography meta-data analysis
  • Social Media data

https://learning.oreilly.com/library/view/beautiful-data/9780596801656/


Fast Forward Labs (based on recent research)

  1. Structural Time Series
  2. Meta-Learning
  3. Automated Question Answering
  4. Causality for Machine Learning
  5. Interpretability: 2020 Edition
  6. Deep Learning for Anomaly Detection
  7. Transfer Learning for NLP
  8. Multi-task learning
  9. Semantic recommendations
  10. Interpretability
  11. Probabilistic programming
  12. Summarization

AI and Healthcare


Text Analytics

  1. NLP: https://paperswithcode.com/area/natural-language-processing
  2. Semantic Code Search by GitHub: https://towardsdatascience.com/semantic-code-search-3cd6d244a39c, https://github.com/github/codesearchnet

ML in Software Engineering

Big list: https://ml4code.github.io/tags.html

  1. The Case for Learned Index Structures
  2. https://research.fb.com/wp-content/uploads/2020/09/Machine-Learning-in-Compilers-Past-Present-and-Future.pdf
  3. Automatic Recommendation of Pythonic Idiom Usage - https://youtu.be/vOCQReSvBxA
  4. Using machine learning for code recommendation: https://ai.facebook.com/blog/aroma-ml-for-code-recommendation/
  5. Deep learning to translate between programming languages: https://ai.facebook.com/blog/deep-learning-to-translate-between-programming-languages/
  6. More broadly, AI has the potential to help with other programming tasks. For example, Facebook AI has previously shared Neural Code Search, a method for using natural language in queries about code, and Getafix, a tool that learns to automatically suggest fixes for coding bugs.
  7. Introducing TF-Coder, a tool that writes tricky TensorFlow expressions for you!
  8. Learn to rank: https://medium.com/@nikhilbd/intuitive-explanation-of-learning-to-rank-and-ranknet-lambdarank-and-lambdamart-fe1e17fac418 https://arxiv.org/pdf/1812.00073.pdf

“We’re entering a new world in which data may be more important than software.” — Tim O’Reilly

“You can best learn data mining and data science by doing, so start analyzing data as soon as you can! However, don’t forget to learn the theory, since you need a good statistical and machine learning foundation to understand what you are doing and to find real nuggets of value in the noise of big data.”

“The best way to learn data science is to do data science.” — Chanin Nantasenamat

Data Literacy:

“The era of Data Technology is here and it will surpass the Information Technology era. The DT era is about transparency, sharing of information and enabling others. Alibaba is excited about the possibilities of the DT era and how it can bring value to society.” — Jack Ma

“Many believe that Big Data is over-hyped, but seeing the fantastic use cases popping up around the globe I would say Big Data is under-hyped! In the coming years, Big Data will revolutionize every industry unlike we have seen before!”

“I like to think of data as the new soil, Get in and get your hands dirty.” — David McCandless

“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” — Josh Wills

“Despite an awful lot of marketing hype, big data are here to stay and big data analytics (i.e. data science and statistics) will remain aids to human thinking and not replacements for it!” — Diego Kuonen

“For me, data science is a mix of three things: quantitative analysis (for the rigor necessary to understand your data), programming (so that you can process your data and act on your insights), and storytelling (to help others understand what the data means).” — Edwin Chen

“A data scientist is someone who can obtain, scrub, explore, model, and interpret data, blending hacking, statistics, and machine learning. Data scientists not


  • Fourier Analysis
  • Graph theory
  • Computational theory
  • Coding theory
  • Learning theory
  • Sorting
  • Streaming Algorithms
  • Randomized Algorithms
  • Lists and Sequences
  • Geometry
  • Hash Tables
  • Graphs and Network Science
  • Combinatorial Optimization