Top 50 SQL for Mastery
Recyclable and Low Fat Products

    import pandas as pd

    def find_products(products: pd.DataFrame) -> pd.DataFrame:
        # return products[(products["low_fats"] == "Y") & (products["recyclable"] == "Y")].loc[:, ["product_id"]]
        return products[(products["low_fats"] == "Y") & (products["recyclable"] == "Y")][["product_id"]]

    select product_id from Products where low_fats = 'Y' and recyclable = 'Y'

Note: can use df.loc[:, ["column_name"]] or df[["column_name"]]; df["column_name"] returns a Series.

Find Customer Referee

    def find_customer_referee(customer: pd.DataFrame) -> pd.DataFrame:
        return customer[(customer["referee_id"] != 2) | (customer["referee_id"].isnull())][["name"]]

or

    def find_customer_referee(customer: pd.DataFrame) -> pd.DataFrame:
        # Replace missing referee ids with 0 so they pass the != 2 filter (modifies the input frame)
        customer.fillna(0, inplace=True)
        return customer.loc[customer["referee_id"] != 2, ["name"]]

Note: the filter condition can go directly inside loc.

    select name from Customer where referee_id != 2 or referee_id is null

Big Countries

    def big_countries(world: pd.DataFrame) -> pd.DataFrame:
        return world[(world["area"] >= 3000000) | (world["population"] >= 25000000)][["name", "population", "area"]]

    select name, population, area from world where population >= 25000000 or area >= 3000000

Article Views

    def article_views(views: pd.DataFrame) -> pd.DataFrame:
        # Parentheses let the method chain span several lines
        return (
            views[views["viewer_id"] == views["author_id"]][["author_id"]]
            .drop_duplicates()
            .sort_values("author_id")
            .rename(columns={"author_id": "id"})
        )

    select distinct author_id as id from Views where author_id = viewer_id order by id

Invalid Tweets

    select tweet_id from Tweets where length(content) > 15

    def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
        return tweets.query("content.str.len() > 15")[["tweet_id"]]

or ...
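The note on single versus double brackets can be checked on a toy frame. This is only a minimal sketch: the three rows below are made up for illustration and are not taken from the original problem data.

```python
import pandas as pd

# Hypothetical toy rows, invented for this illustration only.
products = pd.DataFrame(
    {
        "product_id": [0, 1, 2],
        "low_fats": ["Y", "Y", "N"],
        "recyclable": ["N", "Y", "Y"],
    }
)

# Rows that are both low fat and recyclable.
mask = (products["low_fats"] == "Y") & (products["recyclable"] == "Y")

as_series = products.loc[mask, "product_id"]    # single label -> Series
as_frame = products.loc[mask, ["product_id"]]   # list of labels -> DataFrame
same_frame = products[mask][["product_id"]]     # double brackets -> DataFrame

print(type(as_series).__name__)     # Series
print(type(as_frame).__name__)      # DataFrame
print(as_frame.equals(same_frame))  # True
```

Both DataFrame forms return the same single-column result, so the choice between them is mostly a matter of taste.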
Machine learning Box
Imagine a box where you put all of your machine learning stuff. Here it is. [WIP] will update the structure.

Bias vs Variance

Metrics: Precision, Recall, Accuracy, F1-score

Cross-Validation

How do you choose which cross-validation technique to use for your project? Think about how your model will be used and how it will interact with the data in a deployed setting. If the dataset is huge, use hold-out, which is basically the 80-20 method ...
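The hold-out (80-20) idea mentioned above can be sketched with the standard library alone. The function name `holdout_split`, its parameters, and the fixed seed are assumptions made for this illustration, not something the post defines.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and return (train, test) lists.

    Hypothetical helper for illustration; test_fraction=0.2 gives the 80-20 split.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = holdout_split(range(100))
print(len(train), len(test))  # 80 20
```

A single random split like this is cheap, which is why it suits very large datasets. It also ignores any time ordering or grouping in the data, so it may not match how the model will see data once deployed.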
Setting up Nvidia SDK Manager and Torch Library in Jetson Board
This blog serves as a backup for setting up the Nvidia Jetson AGX Orin for development/production. Please note: review all the steps before proceeding. I was updating from JetPack 5.x to 6.x, and my Jetson AGX Orin had already been flashed.

Installation

The blog covers installing the Nvidia SDK Manager. Make sure the host machine runs Ubuntu 22.04, not 24.04, for installing JetPack 6 (as of August 19, 2024). Similarly, upgrade the host machine and install all the necessary Nvidia drivers and CUDA on it before installing the Nvidia SDK Manager. More information on installing Nvidia drivers on Ubuntu ...
Installing CUDA 12.x on Ubuntu 24.04
MOTIVE

So, Ubuntu 24.04 LTS was released on 25 April 2024. It had been more than 3 months since the release, and I thought it would be safe enough to install it on my main working computer. Wrong!! I wanted documentation for setting up CUDA 12.2 on Ubuntu 24.04 but ended up reinstalling everything. Thus, here is the blog for future me to set up everything. Thanks to all the great help I found on the Internet [ referenced as links ]. ...
Case study: Automated Image Clustering for E-commerce company
Background

In this case study, an Artificial Intelligence based system is used to make business decisions. The study is based on an e-commerce company that ingests lots of data such as images, quarterly reports, invoices, customer feedback, customer reviews with images, etc. It would be great if this large amount of data could be automatically categorized or grouped together. Previously, a machine learning team in the company had created a model to classify the input data. But the nature of the e-commerce data, new product launches, new customer feedback, etc. is not predictable. Thus, the model needs to be retrained frequently. The company is looking for a solution that can do this automatically and help the analyst get insight from the unstructured data. ...