This course, provided by Google Cloud, is an introduction to machine learning for business professionals. It taught me how to translate business problems into machine learning use cases and vet them for feasibility and impact. The following are the notes I took during the course.
1. What is Machine Learning?
ML is a way to use standard algorithms to derive predictive insights from data and make repeated decisions.
Phases of ML
- Collecting data.
- Labeling data.
- Training using chosen metrics and objectives.
- Evaluating the model.
- Deploying the model.
2. Characteristics of good data
- Has coverage.
- Is clean.
- Is complete.
3. ML vs AI
ML is a type of artificial intelligence (AI).
Logic vs Machine Learning.
Neural networks & Deep learning.
Use AI responsibly: responsible AI = successful AI.
4. Why ML now?
- Increasing availability of data.
- Increasing maturity and sophistication of ML algorithms.
- Increasing power and availability of computing hardware and software.
5. Labeling data
A label is the true answer for a given input.
Regression vs Classification.
Every example needs to have features and a label.
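To make the notes above concrete, here is a minimal sketch of what an example with features and a label could look like. The `Example` class, the field names, and the sample values are hypothetical and not from the course. The rule of thumb used here (numeric label means regression, categorical label means classification) is a simplification, since class labels are sometimes encoded as numbers.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Example:
    """One training example: input features plus the true answer (label)."""
    features: Dict[str, float]
    label: Union[float, str]


def problem_type(example: Example) -> str:
    """Guess the problem type from the label (simplified rule of thumb)."""
    is_number = isinstance(example.label, (int, float)) and not isinstance(example.label, bool)
    return "regression" if is_number else "classification"


# Hypothetical examples: predicting a house price vs. flagging spam.
house = Example(features={"square_meters": 85.0, "rooms": 3.0}, label=250000.0)
email = Example(features={"num_links": 7.0, "num_exclamations": 4.0}, label="spam")

print(problem_type(house))  # regression
print(problem_type(email))  # classification
```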
Ways to label your data:
- Use a proxy label.
- Build a labeling system.
- Use a labeling service.
6. Model Training
Continuous training keeps models fresh.
TensorFlow: https://github.com/tensorflow
Formulating the ML problem
- Choose input features.
- Get labels.
- Choose an objective.
7. Model Evaluation
Test data (20%): hold out about 20% of the data to evaluate the model.
Confusion matrices measure performance relative to expectations for classification.
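As a rough illustration of a confusion matrix for a two-class problem, the sketch below counts true/false positives and negatives by comparing actual labels with predictions. The function name, the "spam"/"ham" labels, and the sample data are made up for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive="spam"):
    """Count tp, fp, fn, tn for a binary classification problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["tp"] += 1  # predicted positive, really positive
        elif a != positive and p == positive:
            counts["fp"] += 1  # predicted positive, really negative
        elif a == positive and p != positive:
            counts["fn"] += 1  # predicted negative, really positive
        else:
            counts["tn"] += 1  # predicted negative, really negative
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


# Hypothetical labels for six test emails.
actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]

cm = confusion_matrix(actual, predicted)
print(cm)  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}

precision = cm["tp"] / (cm["tp"] + cm["fp"])  # share of predicted spam that was spam
recall = cm["tp"] / (cm["tp"] + cm["fn"])     # share of real spam that was caught
```

Which cell matters most depends on the business case: a missed positive and a false alarm rarely cost the same.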
8. ML Best Practices
- ML involves experimentation.
- Start simple.
- Don’t use your test data during experimentation.
- Do pilot projects with end-users.
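One common way to keep test data out of experimentation is to carve out a separate validation set and touch the test set only once, at the end. The sketch below assumes a 60/20/20 split; the 20% validation share and the function name are my own assumptions, not something the course prescribes.

```python
import random


def split_dataset(examples, test_fraction=0.2, validation_fraction=0.2, seed=0):
    """Shuffle once, then split into train / validation / test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


# 100 placeholder examples, represented here by their ids.
train, validation, test = split_dataset(range(100))
print(len(train), len(validation), len(test))  # 60 20 20
```

Compare models on `validation` while experimenting, and use `test` only for the final check.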
9. Human Bias in ML
The decisions you make while doing ML have real-world impact on you and your customers.
Unconscious biases exist in data.
Fairness in ML.
10. Discovering ML Use Cases
Simplifying rule-based systems.
Streamlining business processes.
Understanding unstructured data.
Personalizing experiences: Adds significant value to users.
Recommender systems.
ML in Series.
11. Data Strategy
ML is about repeated decisions:
- Design a system so that you will have more data next year.
- Break down data silos.
- Transition from data lakes to data warehouses.
- Learn about your data.
- Integrate pilots into your tools.
- Run ML models on real-time data to extract the most value.
- Collect more data.
12. Data Governance
Data access must be balanced against security.
Three goals for ML and Privacy:
- Identify sensitive data.
- Protect sensitive data by removing, masking or coarsening.
- Create public governance documentation.
Types of sensitive data:
- Specific columns in structured datasets.
- Patterned text, e.g., credit card numbers.
- Unstructured data, like audio, video and images.
- Combination of fields.
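The three protection techniques listed above (removing, masking, coarsening) can be sketched on a single record. Everything here is illustrative: the field names, the `[REDACTED CARD]` placeholder, the ten-year age buckets, and the card-number pattern are assumptions, and the regular expression is a simple approximation, not a production-grade detector.

```python
import re

# Assumption: 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

# Hypothetical list of columns considered sensitive on their own.
SENSITIVE_FIELDS = ("name", "email")


def mask_card_numbers(text):
    """Masking: replace patterned text that looks like a card number."""
    return CARD_PATTERN.sub("[REDACTED CARD]", text)


def coarsen_age(age):
    """Coarsening: replace an exact age with a ten-year range."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def protect_record(record):
    """Apply removing, coarsening, and masking to one record."""
    # Removing: drop sensitive columns entirely.
    protected = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    if "age" in protected:
        protected["age_range"] = coarsen_age(protected.pop("age"))
    if "note" in protected:
        protected["note"] = mask_card_numbers(protected["note"])
    return protected


record = {
    "name": "Jane Doe",
    "age": 37,
    "note": "Paid with card 4111 1111 1111 1111 yesterday.",
}
protected = protect_record(record)
print(protected)
```

Note that a combination of fields can still identify someone after each field is protected on its own, so a check like this is only a starting point.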
Common principles for establishing a policy framework:
- Establish a secure location for documentation.
- Exclude sensitive information from documentation.
- Document all sources and processes.
- Establish a process to review and enforce policies.
Build your ML team:
- Data engineers.
- ML engineers.
- Data analysts.
13. Create a Culture of Innovation
Starts with a dedicated mindset.
Focus on the user.
10X thinking.
Launch and iterate.
Change is inevitable.