Notes:
- Due to a sudden demand from the mines, I was unable to devote significant time and energy to the planned tasks. As explained in the channel, this should ease off in another week, or two at the most; I am working specifically towards making that happen. Meanwhile, I will work on the higher priority tasks.
- I’ve actually completed more than I thought I would manage, though I was stuck for a while with v.py before suddenly seeing some light in some areas.
- I’ve improved at rejecting shiny distractions and at pacing myself. I’ve found that poor pacing is also a reason I missed earlier review deadlines: I floundered around ‘working’ till I had no energy left to review, and did not want to review till I had finished the task. Such an approach is fueled by feelings rather than thought, and does not produce the intended result. I also have to improve my estimates of the time a review takes.
- Based on the discussion today, I’ve decided to make an effort to savor progress and not become overly dejected when I cannot attain a goal, i.e. to strive for a more balanced way of thinking and to give progress the time it needs.
Review:
[ ] v.py: 10 lines, minimum.
- I have made (minor) progress with the vdiff algorithm. It became apparent that I need to understand at least the terminology and fundamental concepts of encryption better in order to understand what V does, and why. For example, I am now familiar with sending encrypted messages and the basic idea of a GPG key pair, but I had not explored the terms symmetric and asymmetric, or the process of hashing. I have just started reading the book Serious Cryptography, which I found after today’s discussion on #ossasepia. There appears to be a significant gain in comfort from reading a structured text, even though I’ve barely finished a single chapter.
- Refer to the updated prelim notes.
- Some questions still remain as shown in the post – but I would like to make one more attempt at cracking it, and think I should be able to, though I will not wait long before asking for help.
[ ] Complete partially finished / pending Python basics revision.
- Dropped this task due to the redirection of focus.
[X] GLMnet results issue – spillover from Week 5.
- This was resolved. Luckily, it was almost resolved even before the redirection of focus. Additionally, as hoped, I was able to make more inroads into grokking linear regression, and in particular penalty-based extensions like glmnet. Re-reading things I read in the past was invaluable in making these new connections. I know enough now to apply the glmnet algorithm as well, but I strongly think I need a little more depth, and definitely a revision.
- I started writing a summary of the problem, but realised that I also wanted to interweave a summary of the algorithm itself, and did not have enough clarity to do so. This was paused to work on v.py.
[X] Hash out project plan for exploring Open Data, and using it as practice. Datasets of ‘interest’:
- Computer price index
- Decided to start with the wages dataset as it appeared more interesting at this point, based on a brief exploration of the datasets.
[X] Wages
- Generic plan is laid out. A brief exploration of the dataset is complete. It appears to be a good dataset to work with, i.e. there are challenges in the data wrangling and exploration. Refer to the blog post.
- Computer price index
[ ] Learning about software and the software industry
- None of these subtasks were completed.
[X] Improve summary of ‘Your feelings are to get you’, based on Diana’s comment. (1 paragraph).
- This was completed, but I am yet to respond to Diana’s subsequent comment about the improved summary appearing like a first pass. I did draft a response, but I was not satisfied with it and wanted to give it more attention.
[ ] Learn + Summarise: systemd | init (refer to log)
- This task is at the lowest priority, as advised in earlier comments.
[X] Formulate response to Diana’s comment on ‘what is data science’. Worth noting that this was discussed before. My ‘view’ has improved since then, but I’m not sure I would answer very differently.
- This was discussed in the comments on the previous post. My definition has in fact changed slightly since then, which is expected as I progress in my learning. However, I will keep note of Diana’s advice and focus on specific projects, rather than the ambiguous and rather unhelpful goal of ‘conquering data science’.
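As an aside on the hashing terminology mentioned under the v.py item: a minimal sketch of what "hashing" means in practice, using Python's standard `hashlib`. The choice of SHA-256 and the sample strings are assumptions made purely for illustration; this is not a claim about which hash V itself uses or how v.py calls it.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical file contents, invented for this example.
original = "line one\nline two\n"
modified = "line one\nline twO\n"  # a single character differs

digest_a = sha256_hex(original)
digest_b = sha256_hex(original)
digest_c = sha256_hex(modified)

# Hashing is deterministic: the same input always gives the same digest.
print(digest_a == digest_b)

# Any change to the input, however small, gives a different digest.
print(digest_a == digest_c)

# The digest has a fixed length regardless of the input size.
print(len(digest_a))
```

Unlike symmetric or asymmetric encryption, hashing involves no key and is not meant to be reversed; it only lets two parties check that they hold identical content.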
1. You still get some time but note that you seem to have no trouble in asking for help in random/outside-of-here places, while at the same time you keep postponing asking in #ossasepia where the author of the code you are studying even repeatedly stated his availability to answer. Are you trying to preserve some sort of self-made public image of yourself here or what’s going on?
3.1 “Re-reading things I read in the past was invaluable in making these new connections.” – yes, that’s precisely how it works, always.
3.2 “but realised that I also wanted to interweave a summary of the algorithm itself” – why interweave though? And more to the point, why delay/not do X because of an urge to add to it some Y that is not yet available? Do what X can be done *now* and then you’ll also have it as a reference when you do the Y.
Anytime you follow some urge/want/unexpected turn, you *should* at the very least examine and explicitly state the reason *why* you decided it’s better to follow it than not to follow it!
8. The questions I asked there are questions you should have asked yourself before diving in – it’s not as much that I need their answers but that *you* need to learn to question more and better than you seem to do so far.
Comment by Diana Coman — August 26, 2019 @ 1:52 pm
1. I’ve only asked for help in #gnupg, and that one posted conversation was all of it; after that I haven’t asked for help anywhere on this topic. In fact, I did mention in the channel that I realised right after asking that I should have posted in #ossasepia, but I incidentally got a response and the convo continued. This has nothing to do with any self-perceived image, but has a lot to do with the desire, or self-imposed mandate, to put in more work and push myself before asking a question or admitting my brain is frozen. It would not matter on any other channel, because they are not mentoring me and I don’t particularly care about asking before putting in some more work. This is also connected with valuing your time. I don’t claim the approach is efficient, or that I always recognize the thin line where I should stop treating each thing as a personal crusade of discovery and ask questions.
3.2. Okay, I agree I should have completed my summary, and I will rectify this. As to why interweave: while the goal was to solve that specific problem posted in the course forum, I had to learn how the algo worked to do so. The specific problem actually reached across 2 separate packages, one of which is a wrapper around several such ML algos and was actually the root cause of the issue. My approach to solving it was backward, in that I studied the actual glmnet algo and package (not the wrapper) with the intention of using this opportunity to master it, because it will be used down the line, like in (4). Perhaps this should have been treated as a separate project once the specific problem was solved.
8. Well, I certainly did consider the question “what is data science” and the reasons supporting a dive, which as you may have gathered is a significant pivot for me, and thus not something I could do without more than a superficial evaluation, an evaluation which is in fact always ongoing. While stating that I want to ‘break into data science’ is ambiguous, the fundamental approach in my head has always been to take up an ‘interesting’ data set, analyse it, apply ML, then rinse and repeat in different areas until I find a specific area that I want to focus on, while in parallel looking for suitable opportunities. Now, these opportunities / companies etc. have to be identified as belonging to a particular sector demanding a specific set of skills. Most job profile descriptions do have a lot of overlap, and I did not find one ‘sector’, like Technology or Finance, significantly preferable to another purely in terms of my perception of the typical work involved. Though that is also being refined with exposure, there is a basic set of skills / knowledge base that is still common across all of them, say a solid understanding of stats, knowledge of at least some commonly used algos, or familiarity with a type of analysis (like A/B testing). I’ve been successful in acquiring some parts, and not so much in others.
Comment by Shreyas Ragavan — August 26, 2019 @ 4:36 pm
1. Ah, I see. The trouble with your updated notes is that you don’t mark in any way when/which parts got updated and so it seemed as if there was yet another round of asking, what can I tell you. It’s fine to keep it all in one post if you want it that way, but mark prominently “Updated on Day Month Year” at the top of new sections as you add them or something.
8. Hm, that sounds like you could do with a more detailed discussion on this really. There are indeed some basic skills and there is even some overlap with crypto anyway. Get yourself up to date with V and then FFA and you’ll move on to Eulora and loads of interesting *things* to find from data. It’s never data in itself that is interesting really and going blindly at it sounds more like fishing for results – you’ll end up finding something just by chance pretty much i.e. not something meaningful, despite what it might seem.
Re questions I meant more the bunch of questions I asked on your blog, not specifically this one.
Comment by Diana Coman — August 26, 2019 @ 7:11 pm