Fundamentals of Data Science: 2023/2024 (Regular) (NEP) Solved Question Paper

Time: 2 Hrs | Max. Marks: 60

Section - A

I. Answer any TEN questions, 2 marks each (2x10=20)

1. What is Data Science?

Data Science is a field that uses scientific methods, processes, algorithms, and systems to extract insights and knowledge from structured and unstructured data.

Or

Data Science is the process of collecting, analyzing, and interpreting large amounts of data to help make better decisions.

Components: 

  • Data Collection – Gathering data from various sources (web, databases, sensors, etc.)

  • Data Analysis – Finding patterns, trends, or summaries

2. Define Datafication.
  • Datafication is the process of converting various aspects of life into data that can be analyzed and used for decision-making.
  • Datafication is a fundamental concept in Data Science, enabling the extraction of value from digitalized information.
3. Define mean and mode.
  • Mean: The average of all numbers in a dataset, calculated by summing all values and dividing by the number of values.

  • Mode: The value that appears most frequently in a dataset.
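
A quick illustration of both measures using Python's standard statistics module (toy numbers):

```python
# Mean and mode of a small sample, via the standard library.
from statistics import mean, mode

data = [2, 4, 4, 6, 9]

print(mean(data))  # (2 + 4 + 4 + 6 + 9) / 5 = 5
print(mode(data))  # 4 appears twice, more than any other value
```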

4. Name the languages used for data science.
  • Python
  • R Programming
  • SQL
  • Julia
  • Java
  • Scala
  • JavaScript 
5. What is data munging?

Data munging (or data wrangling) is the process of cleaning, structuring, and enriching raw data into a desired format for better decision making.

6. What is the formula used to calculate BMI (Body Mass Index)?
  • BMI = weight (kg) / (height (m))²
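
A direct translation of the formula into Python (the sample weight and height are arbitrary):

```python
# BMI = weight (kg) / (height (m))^2
def bmi(weight_kg: float, height_m: float) -> float:
    return weight_kg / height_m ** 2

print(round(bmi(70, 1.75), 2))  # 70 / 3.0625 ≈ 22.86
```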

7. What do you mean by Population and Sample?
  • Population: The entire group you’re interested in studying.

  • Sample: A subset of the population used to represent the whole.

8. What is exploratory data analysis?

Exploratory Data Analysis (EDA) is an approach to analyzing datasets to summarize their main characteristics, often using visual methods.
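A minimal EDA sketch computing summary statistics for a small numeric sample (hypothetical data; in practice EDA usually combines pandas' describe() with plots):

```python
# Summarize the main characteristics of a small dataset.
from statistics import mean, median

ages = [23, 25, 31, 35, 35, 40, 52]

summary = {
    "count": len(ages),
    "mean": round(mean(ages), 2),
    "median": median(ages),
    "min": min(ages),
    "max": max(ages),
}
print(summary)
```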

9. What is machine learning?

Machine Learning (ML) is a branch of artificial intelligence (AI) that focuses on developing systems and algorithms that can learn from data and make decisions or predictions without being explicitly programmed for every task.

Applications:

  • Business

  • Healthcare

  • Smart Cities

  • Social Media

10. What is big data?

Big data refers to extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

11. Define Decision tree.

A decision tree is a flowchart-like structure where each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or decision.

12. What is interactive visualization?

Interactive visualization is a form of data visualization that allows users to directly manipulate and explore graphical representations of data.

Section - B

II. Answer any FOUR questions, 5 marks each (4x5=20)

13. Explain any five data science job profiles.
  • Data Analyst: Analyzes data to identify trends.
  • Data Scientist: Develops models using ML algorithms.
  • Data Engineer: Manages and optimizes data pipelines.
  • ML Engineer: Designs and implements ML models.
  • Business Intelligence Analyst: Provides data-driven insights to help businesses.
14. Explain the steps of data cleaning.

Data cleaning (or data cleansing) is the process of detecting and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality before analysis.

  • Handle missing values: Fill in missing values using the mean, median, or mode (for numeric columns).

  • Remove duplicates: Check for and delete repeated rows or records.

  • Fix data types and formats: Standardize inconsistent data types and formats. (Example: “M” and “Male” → both converted to “Male”)

  • Remove outliers: Detect and remove unusual or extreme values that don’t fit the pattern.

  • Normalize or standardize values: Convert data to a common scale for fair comparison.
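
The steps above can be sketched on a toy record list (hypothetical data; in practice this is usually done with pandas):

```python
# Fill missing values, remove duplicates, and standardize formats.
from statistics import mean

rows = [
    {"name": "Ann", "age": 28, "gender": "Female"},
    {"name": "Bob", "age": None, "gender": "M"},
    {"name": "Ann", "age": 28, "gender": "Female"},   # duplicate record
    {"name": "Raj", "age": 35, "gender": "Male"},
]

# 1. Handle missing values: fill missing ages with the mean of known ages.
known = [r["age"] for r in rows if r["age"] is not None]
for r in rows:
    if r["age"] is None:
        r["age"] = round(mean(known))

# 2. Remove duplicates while preserving order.
seen, cleaned = set(), []
for r in rows:
    key = tuple(r.items())
    if key not in seen:
        seen.add(key)
        cleaned.append(r)

# 3. Fix inconsistent formats: "M" -> "Male".
for r in cleaned:
    if r["gender"] == "M":
        r["gender"] = "Male"

print(cleaned)
```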

15. What are the advanced ranking techniques? Explain.
  • PageRank: Algorithm used by Google to rank web pages based on link structure.

  • Learning to Rank: Machine learning approach to ranking problems.

  • RankNet: Neural network-based ranking algorithm.

  • LambdaMART: Combines boosted decision trees with ranking loss function.

  • Pairwise Ranking: Compares pairs of items to determine ranking.

Or
  • PageRank: Ranks web pages based on links.

  • TF-IDF: Ranks keywords based on frequency.

  • Gradient Boosting Models: Used in Kaggle competitions.

  • Borda Count and Reciprocal Rank Fusion (RRF) for aggregating rankings.
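
A minimal TF-IDF sketch over a toy corpus (hypothetical documents), showing how a rarer term outranks a frequent-but-common one:

```python
# TF-IDF = term frequency in a document * log(N / document frequency).
import math

docs = [
    "data science uses data",
    "machine learning uses data",
    "cooking recipes",
]

def tf_idf(term: str, doc: str, corpus: list) -> float:
    words = doc.split()
    tf = words.count(term) / len(words)                 # term frequency
    df = sum(1 for d in corpus if term in d.split())    # document frequency
    idf = math.log(len(corpus) / df)                    # inverse document frequency
    return tf * idf

print(tf_idf("data", docs[0], docs))     # frequent term, but common across the corpus
print(tf_idf("science", docs[0], docs))  # rarer term, so it scores higher
```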

16. Explain the different chart types.
  • Bar charts: Compare quantities across categories.

  • Line charts: Show trends over time.

  • Pie charts: Display parts of a whole.

  • Scatter plots: Show relationship between two variables.

  • Histograms: Display distribution of continuous data.

  • Heatmaps: Show magnitude of phenomena as color in two dimensions.

  • Box plots: Display distribution through quartiles.
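
Three of the chart types above can be drawn in a few lines of matplotlib (toy data; assumes matplotlib is installed):

```python
# Draw a bar chart, a line chart, and a scatter plot side by side.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(9, 3))

# Bar chart: compare quantities across categories.
ax1.bar(["A", "B", "C"], [5, 3, 7])
ax1.set_title("Bar")

# Line chart: show a trend over time.
ax2.plot([2020, 2021, 2022, 2023], [10, 14, 13, 18])
ax2.set_title("Line")

# Scatter plot: relationship between two variables.
ax3.scatter([1, 2, 3, 4], [2, 4, 5, 8])
ax3.set_title("Scatter")

fig.tight_layout()
fig.savefig("charts.png")
```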

17. Explain decision tree classifiers.

Decision tree classifiers are supervised learning algorithms that create a model predicting the value of a target variable by learning simple decision rules inferred from data features.

  • They work by recursively partitioning the data into subsets based on feature values.

  • The tree consists of decision nodes (tests on attributes) and leaf nodes (class labels).

  • Advantages: Easy to understand/interpret, requires little data preparation.

  • Disadvantages: Can create over-complex trees that don’t generalize well.
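
The core step of the algorithm, choosing the split that minimises impurity, can be sketched as a single decision stump (toy data; real classifiers such as scikit-learn's DecisionTreeClassifier apply this step recursively):

```python
# Find the single best threshold split on one feature by Gini impurity.
def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n                 # fraction of class 1
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(xs, ys):
    """Return (threshold, weighted_gini) of the best split x <= threshold."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

# Toy data: feature = hours studied, label = 1 if the student passed.
hours = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]
print(best_split(hours, passed))  # splits cleanly at hours <= 3
```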

Section - C

III. Answer any TWO questions, 10 marks each (10x2=20)

18. a) Explain the data science process with a neat diagram.
Data Science Diagram & Explanation

The Data Science Process refers to a structured workflow that data scientists follow to extract insights and build predictive models from raw data.

1. Problem Definition

  • Understand the business or research problem.

  • Define clear goals and objectives.

2. Data Collection

  • Gather data from various sources such as:

    • Databases

    • APIs

    • Files (CSV, Excel)

    • Web scraping

3. Data Cleaning (Preprocessing)

  • Handle missing values.

  • Remove duplicates and outliers.

  • Convert data types.

  • Normalize or scale data.

4. Data Exploration (EDA – Exploratory Data Analysis)

  • Visualize data using charts and graphs.

  • Find patterns, correlations, and trends.

  • Use summary statistics.

5. Data Modeling

  • Choose appropriate machine learning models.

  • Train models using training data.

  • Evaluate performance (accuracy, precision, etc.).

6. Model Evaluation & Validation

  • Test the model on new (test) data.

  • Perform cross-validation.

  • Tune hyperparameters.

7. Deployment

  • Integrate the model into a real-world application.

  • Use cloud services or APIs to make predictions.

8. Monitoring & Maintenance

  • Monitor model performance.

  • Update model with new data regularly.

18. b) Explain the methods of data collection.
  • Surveys/Questionnaires: Structured data collection from respondents.

  • Web Scraping: Automated extraction of data from websites.

  • APIs: Programmatic access to data from services.

  • Databases: Extracting data from SQL or NoSQL databases.

  • Sensors/IoT: Automated data collection from devices.

  • Public Datasets: Using openly available datasets.

  • Experiments: Controlled data generation for research.

19. a) Write a note on statistical distributions.

A statistical distribution describes how values of a random variable are spread or distributed. It tells us the probability or frequency of each possible value the variable can take.

Types of Statistical Distributions:

  1. Normal Distribution (Gaussian Distribution)
  2. Binomial Distribution
  3. Poisson Distribution
  4. Uniform Distribution
  5. Exponential Distribution

Properties:

  • Mean (μ) – Average value

  • Median – Middle value

  • Variance (σ²) – Spread of data

  • Skewness – Measure of asymmetry

  • Kurtosis – “Peakedness” of the distribution
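
Two of the distributions above can be sampled with the standard library's random module; with a large sample, the statistics land close to the distribution's parameters (seed fixed for reproducibility):

```python
# Sample from a normal and a uniform distribution, then check the statistics.
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so every run gives the same sample

# Normal distribution with mean 100 and standard deviation 15.
normal_sample = [random.gauss(100, 15) for _ in range(10_000)]
print(round(mean(normal_sample), 1), round(stdev(normal_sample), 1))

# Uniform distribution on [0, 1): every value equally likely, mean near 0.5.
uniform_sample = [random.random() for _ in range(10_000)]
print(round(mean(uniform_sample), 2))
```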

19. b) Explain the different visualization tools.
  • Tableau: Powerful business intelligence tool with drag-and-drop interface.

  • Power BI: Microsoft’s business analytics service.

  • Matplotlib/Seaborn: Python libraries for static, animated, and interactive visualizations.

  • Plotly/Dash: Tools for interactive web-based visualizations.

  • D3.js: JavaScript library for producing dynamic, interactive data visualizations.

  • ggplot2: R’s data visualization package based on the grammar of graphics.

  • Excel: Basic visualization capabilities for simple datasets.

20. a) Explain the concept of machine learning.

Machine learning is the study of algorithms that improve automatically through experience.

Machine learning is a method of data analysis that automates analytical model building. It uses algorithms that learn from data to make predictions or decisions.

Three main types:

    1. Supervised Learning: Model learns from labeled training data.

    2. Unsupervised Learning: Model finds patterns in unlabeled data.

    3. Reinforcement Learning: Model learns through trial and error with rewards.

Applications:

  • Voice assistants (Siri, Alexa)

  • Recommendation systems (Netflix, Amazon)

  • Fraud detection in banking

  • Medical diagnosis

  • Chatbots and virtual assistants

Key concepts: training data, features, model, prediction, evaluation metrics.
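
A minimal supervised-learning sketch: a 1-nearest-neighbour classifier that “learns” from labelled training data (toy fruit measurements, hypothetical):

```python
# 1-NN: predict the label of the closest training example.
def predict(train, features):
    """Return the label of the training point nearest to `features`."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda item: dist(item[0], features))
    return nearest[1]

# Training data: (weight in g, diameter in cm) -> fruit label.
train = [
    ((150, 7), "apple"),
    ((170, 8), "apple"),
    ((120, 6), "orange"),
    ((110, 6), "orange"),
]

print(predict(train, (160, 7)))  # nearest to the apples
print(predict(train, (115, 6)))  # nearest to the oranges
```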

20. b) Explain the role of data scientist in data science process.
  • Problem Formulation: Work with stakeholders to define business problems.

  • Data Acquisition: Identify and collect relevant data.

  • Data Preparation: Clean and transform raw data.

  • Exploratory Analysis: Discover patterns and insights.

  • Model Development: Build and train predictive models.

  • Model Evaluation: Assess model performance and refine.

  • Deployment: Implement models in production.

  • Communication: Explain findings to non-technical stakeholders.

  • Ethical Considerations: Ensure responsible use of data and algorithms.
