Monday, October 2, 2017

Which algorithms/data structures should I "recognize" and know by name?

This quote from 

https://softwareengineering.stackexchange.com


Which algorithms/data structures should I "recognize" and know by name? 



An objective response:


While my initial response to this question was based on my empirical experience as a soon-to-graduate CS student and my projected opinion of the type of people I wanted to work with in the CS field. There is actually an objective (with respect to the subjective opinions of the ACM SIGCSE and IEEE computing societies) answer. Every 10 years the ACM and the IEEEbodies cooperate on a joint publication that details suggestions for undergraduate computer science curriculum based on professional knowledge of the state of the computing industry. More information can be found at cs2013.org. The committee publishes a final report listing their curriculum recommendation.


That said, I still think my list is pretty good.


Original answer below.


What Should I Know?


Minimum


I think an adept programmer should have at least undergraduate level knowledge in Computer Science. Sure, you can be effective at many jobs with only a small subset of Computer Science because of the rock solid community CS sits upon, and the narrowed focus of most professional positions. Also, many people will further specialize after undergraduate study. However, I do not think either are an excuse to not be privy of foundational CS knowledge.


To answer the title question, here is what an undergraduate CS student (the foundation for an adept programmer) should know upon graduation:


Data Structures


Machine Data Representation

Ones, Two's Complement, and Related Arithmetic


Words, Pointers, Floating Point


Bit Access, Shifting, and Manipulation


Linked Lists


Hash Tables (maps or dictionaries)


Arrays


Trees


Stacks


Queues


Graphs


Databases


Algorithms


Sorting:

Bubble Sort (to know why it's bad)


Insertion Sort


Merge Sort


Quick Sort


Radix style sorts, Counting Sort and Bucket Sort


Heap Sort


Bogo and Quantum Sort (=


Searching:

Linear Search


Binary Search


Depth First Search


Breadth First Search


String Manipulation


Iteration


Tree Traversal


List Traversal


Hashing Functions


Concrete implementation of a Hash Table, Tree, List, Stack, Queue, Array, and Set or Collection


Scheduling Algorithms


File System Traversal and Manipulation (on the inode or equivalent level).


Design Patterns


Modularization


Factory


Builder


Singleton


Adapter


Decorator


Flyweight


Observer


Iterator


State [Machine]


Model View Controller


Threading and Parallel Programming Patterns


Paradigms


Imperative


Object Oriented


Functional


Declarative


Static and Dynamic Programming


Data Markup


Complexity Theory


Complexity Spaces


Computability


Regular, Context Free, and Universal Turing Machine complete Languages


Regular Expressions


Counting and Basic Combinatorics


Beyond


To get into what you're asking about later in your question, if you are familiar with the above, you should be easily able to identify the appropriate pattern, algorithm, and data structure for a given scenario. However, you should recognize that there is often no best solution. Sometimes you may be required to pick the lesser of two evils or even simply choose between two equally viable solutions. Because of this, you need the general knowledge to be able to defend your choice against your peers.


Here are some tips for algorithms and data structures:


Binary Search can only (and should) be used on sorted data.


Radix style sorts are awesome, but only when you have finite classes of things being sorted.


Trees are good for almost anything as are Hash Tables. The functionality of a Hash Table can be extrapolated and used to solve many problems at the cost of efficiency.


Arrays can be used to back most higher level data structures. Sometimes a "data structure" is no more than some clever math for accessing locations in an array.


The choice of language can be the difference between pulling your hair out over, or sailing through, a problem.


The ASCII table and a 128 element array form an implicit hash table (=


Regular expressions can solve a lot of problems, but they can't be used to parse HTML.


Sometimes the data structure is just as important as the algorithm.


Some of the above might seem like no brainers, and some may seem vague. If you want me to go into more detail, I can. But, my hope is when encountered with a more concrete question such as, "Design a function that counts the number of occurrences of every character in a String", you look to the tip about the ASCII table and 128 element arrays forming neat implicit hash tables for the answer.


Based off these ideas, I will propose an answer the locker problem outlined in your question.


Answer to the problem posed in your question.


This may not be the best answer to your question, but I think it's an interesting one that doesn't require anything too complex. And it will certainly beat the time complexity of using a queue, or stack which require linear time to determine whether a locker is free or not.


You have 0-999 lockers. Now, because you have a fixed number of lockers, you can easily conceive a hashing function with no collisions on the range 0-999. This function is simply h(x) = x mod 1000. Now, [conceptually] construct a hash table with integer keys and the contents of a 1000 element char array as your values. If a customer wants to reserve locker 78 for use, simply put 78 into the hash function (returning 78), and then add that number to the base pointer of the array -- storing a true value at the location pointed to by the offset value. Similarly, if you need to check whether 78 is in use, simply read the value stored at that location and check against true.


This solution operates in constant time for lookups and storage as opposed to a log(n) time storage and lookup in the case of a priority queue backed by a binary tree. The description is intentionally verbose so you can see the higher concepts being boiled down into an efficient algorithm.


Now, you might ask, what if I need to know all of the available lockers, wouldn't a priority queue be better? If there are k available lockers in the priority queue, iterating over all of them will take k steps. Further, depending on your priority queue implementation, you might have to rebuild your priority queue as you look at it all.. which would take k*log(k) : (k < 1000) steps. In the array solution, you only have to iterate of a 1000 element array and check which ones are open. You can also add an available or used list to the implementation to check in k time only.